Brain Phrye



Python vs shell


One of the nice parts about shell is that you can quickly prototype things. But eventually scripts become too brittle, too limited, or too complex, and you need to switch to a better language. Pipelines are powerful, but real data structures are better.

The brevity is nice though. From the ia-save script, these four lines of shell…

  find posts -name '*.md' -print0 \
    | xargs -0 awk '/^\[[^]]*\]: / && $2 ~ /^http/ {print $2}' \
    | sed 's/#.*//' \
    | sort -u

…become about a dozen lines of Python. The Python version collects the URLs into a set (here a dict whose keys are the URLs), which is easier to work with than the stream of lines the shell version produces.

It's also easy to see how to give it a list of URLs not to save.

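  import os
  import re

  urls = {}  # dict used as an ordered set of URLs
  for root, _, files in os.walk('posts', topdown=False):
    for name in files:
      if name.endswith('.md'):
        post = os.path.join(root, name)
        with open(post) as f:
          for line in f:
            # raw string so \n and \] reach the regex engine unescaped
            for url in re.findall(r'http[s]?://[^ #")\n\]]+', line):
              urls[url] = 1
  for url in sorted(urls):  # sorted, like `sort -u` in the shell version
    print(url)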
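One way to add that list of URLs not to save, sketched under a few assumptions: the exclusion list holds exact URLs, and the names `collect_urls`, `urls_to_save` and `SKIP` are hypothetical, not taken from the ia-save script. Because the URLs already live in a dict, skipping some is a plain membership test rather than another stage bolted onto a pipeline.

```python
import os
import re

# Same pattern as the script above: an http(s) URL up to a space, '#',
# double quote, closing paren, newline or closing bracket.
URL_RE = re.compile(r'https?://[^ #")\n\]]+')


def collect_urls(top):
    """Return the unique URLs found in *.md files under `top` (hypothetical helper).

    A dict is used as an ordered set: keys are URLs in first-seen order.
    """
    urls = {}
    for root, _, files in os.walk(top):
        for name in files:
            if name.endswith('.md'):
                with open(os.path.join(root, name), encoding='utf-8') as f:
                    for line in f:
                        for url in URL_RE.findall(line):
                            urls[url] = 1
    return urls


def urls_to_save(urls, skip):
    """Drop every URL that appears exactly in `skip` (hypothetical helper)."""
    skip = set(skip)
    return [url for url in urls if url not in skip]


if __name__ == '__main__':
    # Hypothetical exclusion list: URLs that should not be archived.
    SKIP = {'https://example.com/private'}
    for url in sorted(urls_to_save(collect_urls('posts'), SKIP)):
        print(url)
```

Matching exact strings keeps the sketch short; a real skip list might match on prefixes or domains instead, which would only change `urls_to_save`.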

Sometimes “more” code is better.