I was reminded of this article, which is a nice little example of how to use Unix shell tools to solve problems faster than if you were to “write a program.”
You’re still coding in shell; it’s just that each “line” is a pretty powerful function. Plus each function has a pretty simple interface: it takes a text stream and some positional or named arguments, and it produces two text streams (one is usually “the output,” i.e. stdout, and the other is usually an “out of band” stream, i.e. stderr) and an integer exit status.
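A minimal sketch of that interface in action, using `grep` as the “function” (any standard filter behaves the same way):

```sh
# stdin in; stdout and stderr out; an integer exit status back.
printf 'alpha\nbeta\n' | grep 'alpha' >out.txt 2>err.txt
echo "exit status: $?"   # 0 - a match was found
cat out.txt              # the "output" stream: alpha

# grep exits 1 when nothing matches (and >1 on error):
printf 'alpha\n' | grep 'gamma'
echo "exit status: $?"   # 1
```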
And within these constraints, you can write really powerful programs. Sometimes ridiculously more powerful.
Looking at McIlroy’s code, there’s another thing to note:
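For reference, since the analysis below walks through it stage by stage, here is the pipeline as it is usually reproduced from McIlroy’s review (it prints the `$1` most frequent words in its input):

```sh
tr -cs A-Za-z '\n' |   # squeeze every run of non-letters into a newline
tr A-Z a-z |           # downcase
sort |                 # group identical words together
uniq -c |              # count each group
sort -rn |             # order by count, descending
sed ${1}q              # quit after the top $1 lines
```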
The commands leading up to each `sort` can run in parallel, consuming and using the output of the previous step in the pipeline as it is created. The `sort`s are bottlenecks where the data builds up until all previous pipeline steps complete and the `sort` begins spewing output.
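The difference between a streaming step and a bottleneck step is easy to see directly. A sketch (the `sleep` just spaces the input out in time):

```sh
# A streaming filter like sed emits each line as soon as it arrives:
(echo one; sleep 2; echo two) | sed 's/^/seen: /'
# "seen: one" prints immediately; "seen: two" follows two seconds later.

# sort cannot emit anything until it has seen its whole input:
(echo one; sleep 2; echo two) | sort
# Nothing appears for two seconds, then both lines arrive at once.
```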
So in the example above, first the two `tr`s and the first `sort` run in parallel. Once the `tr`s exit, `sort` begins to sort the data and then starts to output. At that point the first `sort`, the `uniq` and the second `sort` run in parallel. Once the first `sort` and the `uniq` complete, the second `sort` begins to sort, and once output happens it and the `sed` run in parallel - with the `sed` quitting causing the second `sort` to exit before finishing output (in all likelihood).
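That last bit is the kernel’s doing: once `sed` exits, the second `sort`’s next write lands in a broken pipe and raises SIGPIPE, which kills it. A tiny sketch of the same effect (bash-specific, via `PIPESTATUS`):

```sh
# head quits after one line; yes is then killed by SIGPIPE on its next write.
yes | head -1
echo "${PIPESTATUS[@]}"   # "141 0": 141 = 128 + 13 (SIGPIPE); head exited 0
```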
The shell-v-hadoop article above goes into way more detail on the built-in, implicit parallelization of shell, but it’s an impressive feature of the language and one that can bring surprising performance benefits if used correctly.
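As the cheapest possible demonstration, every stage of a pipeline is forked up front and runs concurrently, so the stages overlap in time rather than adding up:

```sh
# Three stages of five seconds each, but they all run at once:
time (sleep 5 | sleep 5 | sleep 5)   # real: ~5s, not 15s
```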