Python vs shell

One of the nice parts about shell is that you can quickly prototype things. But eventually the scripts become too brittle, limited, or complex, and you need to switch to a more capable language. Pipelines are powerful, but real data structures are better.

The brevity is nice though. From the ia-save script, these four lines of shell…

  find posts -name '*.md' -print0 \
    | xargs -0 awk '/^\[[^]]*\]: / && $2 ~ /^http/ {print $2}' \
    | sed 's/#.*//' \
    | sort -u

…become twelve lines of Python. In the Python version the data ends up in a set (implemented here as a dict), which is easier to work with than the stream the shell version produces.

In addition, it’s easy to see how to give it a list of URLs not to save.

  urls = {}
  for root, _, files in os.walk('posts', topdown=False):
    for name in files:
      if name.endswith('.md'):
        post = os.path.join(root, name)
        with open(post) as f:
          for line in f:
            for url in re.findall(r'https?://[^ #")\n\]]+', line):
              urls[url] = 1
  for url in urls:
    print(url)  # placeholder: the script's real per-URL step isn't shown in this excerpt
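
Here is one way that skip list might look. This is a rough sketch, not what ia-save actually does: the function names, the `skip` argument, and the example URL are all made up for illustration. It uses a real set instead of the dict, so the filtering is a single set difference.

```python
import os
import re

# Same pattern as the loop above: stop at a space, '#', '"', ')', newline, or ']'.
URL_RE = re.compile(r'https?://[^ #")\n\]]+')


def collect_urls(top):
    """Return the set of URLs found in every .md file under `top`."""
    urls = set()
    for root, _, files in os.walk(top):
        for name in files:
            if name.endswith('.md'):
                with open(os.path.join(root, name), encoding='utf-8') as f:
                    for line in f:
                        urls.update(URL_RE.findall(line))
    return urls


def urls_to_save(top, skip):
    """Drop every URL listed in `skip` (a hypothetical exclusion list)."""
    return collect_urls(top) - set(skip)


if __name__ == '__main__':
    # Hypothetical skip list; in practice it might be read from a file.
    for url in sorted(urls_to_save('posts', ['https://example.com/already-saved'])):
        print(url)
```

Doing the same in the pipeline would mean another stage, probably a `grep -v -F -f` against a file of URLs. Here it is one subtraction.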

Sometimes “more” code is better.