alt.hn

6/26/2026 at 6:02:11 AM

A Fake Shell for Pangenomics

https://www.cs.cornell.edu/~asampson/blog/flash.html

by matt_d

6/30/2026 at 9:42:52 PM

> It might seem odd to prefer shell scripting over a full-featured dynamic scripting language, but shell scripts like this have some material advantages over Python:

And thus 99% of bioinformatics pipelines are shell at their heart... You need 10 packages, written in 4 different programming languages, and the common interfaces are files and pipes.

And for that matter, this could use a named pipe rather than a file (assuming `odgi depth` only uses streaming access):

    odgi depth -i chr8.pan.og -r chm13#chr8 | \
        bedtools makewindows -b /dev/stdin -w 5000 > chm13.chr8.w5kbps.bed
    
    odgi depth -i chr8.pan.og -b chm13.chr8.w5kbps.bed --threads 2 | \
        bedtools sort > chr8.pan.depth.w5kbps.bed
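
For reference, here is a minimal sketch of the named-pipe pattern on its own. It uses stand-in commands (`printf`, `sort`, `paste`) instead of `odgi` and `bedtools` so it runs anywhere, and the FIFO name `windows.fifo` and the variable names are made up for illustration. Whether `odgi depth -b` would accept a FIFO this way depends on the streaming-access assumption above, which is not verified here.

```shell
# Sketch only: stand-in commands, not the real odgi/bedtools pipeline.
workdir=$(mktemp -d)
fifo="$workdir/windows.fifo"   # hypothetical name for the intermediate "file"
mkfifo "$fifo"

# Producer: plays the role of `odgi depth ... | bedtools makewindows ...`.
# It runs in the background because opening a FIFO for writing blocks
# until some reader opens the other end.
printf '%s\n' 30 10 20 > "$fifo" &

# Consumer: plays the role of the second `odgi depth -b <path>` call.
# It is handed the FIFO by path, exactly as it would be handed a regular file.
result=$(sort -n "$fifo" | paste -sd, -)

wait                 # reap the background producer
rm -rf "$workdir"    # remove the FIFO and its temporary directory

echo "$result"       # prints 10,20,30
```

The consumer only ever sees a path, so nothing intermediate lands on disk, but any tool that needs to seek or re-read its input will fail on the FIFO.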

And Bash process substitution allows writing it all without an explicitly named pipe, though it may look a bit ugly:

    odgi depth -i chr8.pan.og --threads 2 -b <( \
            odgi depth -i chr8.pan.og -r chm13#chr8 | \
            bedtools makewindows -b /dev/stdin -w 5000
        ) | \
        bedtools sort > chr8.pan.depth.w5kbps.bed

Which is why bioinformaticians get a bad reputation among software engineers. (I still have a fair amount of misplaced pride for once adding a shebang to a Makefile, several decades ago, to turn a pipeline into a command...)

by epistasis

6/30/2026 at 10:04:07 PM

I added a shebang to a readme once (written in literate style) so the poor engineers on the other side wouldn't have to deal with the multi-step monstrosity within.

by AlotOfReading

6/30/2026 at 11:08:57 PM

> It might seem odd to prefer shell scripting over a full-featured dynamic scripting language, but shell scripts like this have some material advantages over Python

Nothing strange. Shell is the most natural dynamic language. It's a shame we don't have better shells.

by aboardRat4

6/30/2026 at 8:56:04 PM

I really like the IR-based approach. It solves something that's always bothered me about shell pipelines: you're forced to think in terms of serializing bytes, even when both ends of the pipe are the same program and could just share memory. Flash makes that optimization explicit and easy to compose with the rest of the pipeline. One question, though: have you run into any issues with the "opportunistic" binary format substitution (the .flatgfa fallback) when scripts are shared across machines where some files have already been converted and others haven't?

by gianiac