You’re still coding in shell; it’s just that each “line” is a pretty powerful function. Plus, each function has a pretty simple interface: it takes a text stream (stdin) and some positional or named arguments, and it produces two text streams (stdout, which is usually “the output”, and stderr, which is usually an “out of band” stream) and an integer (the exit status).
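To make that interface concrete, here is a minimal sketch. The function name count_matches is made up for illustration; the only real tool it leans on is grep -c, which prints the number of matching lines and exits non-zero when nothing matches.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not from any library).
# Input:  a text stream on stdin plus one positional argument.
# Output: a count on stdout, diagnostics on stderr, and an integer exit status.
count_matches() {
    pattern=$1
    if [ -z "$pattern" ]; then
        echo "usage: count_matches PATTERN" >&2   # the "out of band" stream
        return 2                                  # the integer
    fi
    # grep -c writes the number of matching lines to stdout and
    # exits 0 if at least one line matched, 1 if none did.
    grep -c -- "$pattern"
}

printf 'apple\nbanana\navocado\n' | count_matches '^a'
```

Every stage in a pipeline has this same shape, which is why stages compose so easily.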
And within these constraints, you can write really powerful programs. Sometimes ridiculously more powerful.
Looking at McIlroy’s code there’s another thing to note:
The commands leading up to each sort can run in parallel, each consuming the output of the previous step in the pipeline as it is produced.
The sorts are bottlenecks where the data builds up until all previous pipeline steps complete and the sort begins spewing output.
So in the example above, first the two trs and the first sort run in parallel. Once the trs complete, the first sort begins to sort the data and then starts to output. At that point the first sort and uniq run in parallel. Once the first sort and the uniq complete, the second sort begins to sort, and once output happens it and sed run in parallel, with the sed quitting causing the second sort to exit before finishing output (in all likelihood).
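Both behaviours described above are easy to see for yourself. What follows is a rough sketch, not McIlroy’s code: the two function names are made up, and the only tools involved are yes, sed, sort, echo and sleep.

```shell
#!/bin/sh
# Streaming stages run concurrently. yes never stops on its own, but once
# sed quits after 3 lines, the next write by yes fails (typically it is
# killed by SIGPIPE) and the whole pipeline ends.
early_exit_demo() {
    yes | sed 3q
}

# sort is a barrier. "a" arrives a full second after "b", yet it is
# printed first, so sort must have waited for end of input before
# writing anything.
barrier_demo() {
    { echo b; sleep 1; echo a; } | sort
}

early_exit_demo
barrier_demo
```

The first demo is the same effect as the sed at the end of the pipeline cutting the second sort short; the second is why each sort holds up everything downstream of it.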
The shell-v-hadoop article above goes into way more detail on the built-in, implicit parallelization of shell, but it’s an impressive feature of the language, and one that can bring surprising performance benefits if used correctly.