Sometimes I need to take two newline-delimited lists and do set operations on them. These are generally outputs from commands.
You can do unions, intersections and differences, and in this post I’m
going to explore the latter. Specifically, A - B.
Say you have three directories, A, B and C, and you want all the
files in A that aren’t in B copied into C. To do this, you can do
the following:
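A minimal sketch of this in POSIX shell, assuming paths are compared relative to each directory (so A/x/y matches B/x/y) and that no file name contains a newline, since the lists are newline-delimited. B is listed twice, so every path in B appears at least twice and `sort | uniq -u` keeps only the paths that occur once, i.e. those unique to A. The function name copy_difference is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy every file under $1 whose relative path does
# not also exist under $2 into $3, preserving subdirectories.
# Assumes no file name contains a newline.
copy_difference() {
  src=$1 exclude=$2 dest=$3
  mkdir -p "$dest"
  # The excluded directory is listed twice, so its paths occur at least
  # twice and `uniq -u` drops them; paths only in $src occur exactly once.
  {
    (cd "$src" && find . -type f)
    (cd "$exclude" && find . -type f)
    (cd "$exclude" && find . -type f)
  } | sort | uniq -u | while IFS= read -r f; do
    # Recreate the relative directory under $dest, then copy the file.
    mkdir -p "$dest/$(dirname "$f")"
    cp "$src/$f" "$dest/$f"
  done
}

# Usage: copy_difference A B C
```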
This works fine, but duplicating the find is kind of
annoying. Without it you’d end up with the symmetric difference, A ⊖ B
(the lines that A and B don’t have in common), rather than the difference.
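To see why the duplication matters, here is a small worked example using two literal lists as stand-ins for command output (the fruit names are made up for illustration). With each list passed once, `uniq -u` keeps every line that appears exactly once overall, which is the symmetric difference; listing B a second time pushes everything in B to a count of at least two, leaving only A - B.

```shell
#!/bin/sh
# Two example lists standing in for command output.
a='apple
banana
cherry'
b='banana
cherry
date'

# Each list once: lines appearing exactly once form A ⊖ B.
printf '%s\n' "$a" "$b" | sort | uniq -u
# apple
# date

# B listed twice: anything in B now appears at least twice, leaving A - B.
printf '%s\n' "$a" "$b" "$b" | sort | uniq -u
# apple
```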
But can it be done without running the command twice? For this find
it’s probably not too bad, but some commands are more compute- or
IO-intensive. The answer, as it so often is in shell, is sed:
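A sketch of how that might look, assuming B’s command is run once and its output is piped through `sed p` before being merged with A’s. The functions list_a and list_b are hypothetical stand-ins for the real commands, for example `(cd A && find . -type f)` and `(cd B && find . -type f)`.

```shell
#!/bin/sh
# Hypothetical stand-ins for the two real commands.
list_a() { printf '%s\n' apple banana cherry; }
list_b() { printf '%s\n' banana cherry date; }

# B's command runs once; `sed p` emits each of its lines twice, so every
# line from B reaches `uniq -u` at least twice and is discarded.
{ list_a; list_b | sed p; } | sort | uniq -u
# apple
```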
This duplicates each line: sed prints every line by default, and
the p command prints it one more time.
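A quick way to check this behaviour in isolation: with the default automatic printing, `p` doubles every line, while `-n` turns the automatic printing off so that `p` alone echoes each line once.

```shell
#!/bin/sh
# Default: sed prints each line itself, and `p` prints it again.
printf '%s\n' one two | sed p
# one
# one
# two
# two

# With -n the automatic print is suppressed, so only `p` prints.
printf '%s\n' one two | sed -n p
# one
# two
```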