Brain Phrye

code cooking diy fiction personal photos politics reviews tools


Shell scripts: drawing the line

At geek brekkie yesterday P— mentioned the idea of archiving links that you use with the Internet Archive. This seemed like a great idea to use in deploying my blog.

I’ve wanted to add a general link checker to look for broken links. This isn’t quite the same thing but it would be an option for remediating link rot when found. Plus it seemed simple to do.

My proof of concept for this also provides an excellent answer for a common question: when have you gone too far for a shell script and should switch to a “real language.” This script has gotten past that line so thought I’d share it.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
#!/bin/bash

(
  find posts -name '*.md' -print0 \
    | xargs -0 awk '/^\[[^]]*\]: / && $2 ~ /^http/ {print $2}' \
    | sed 's/#.*//' \
    | sort -u; \
  awk '{print $1}' .ia-urls
) | sort | uniq -u \
  | grep -v //web.archive.org/ \
  | xargs -I XX bash -c 'echo "XX" $(curl -s -I "http://web.archive.org/save/XX" |
    awk "tolower(\$1) ~ /^content-location:/ {print \"https://web.archive.org/\" \$2}
         tolower(\$1) ~ /^x-archive-wayback-runtime-error:/ {print \"ERROR\"}")' \
  | tr -d '\r' \
  >> .ia-urls

I’ve written longer shell scripts that are fine as shell scripts. It’s not length - it’s complexity and brittleness. This script is both.

It’s a good proof of concept to learn how the IA “api” works. And just generally to think through the data structures and the work flow of deployment. Right now I have ./scripts/ia-save and ./scripts/ia-check (also doesn’t work as a shell script) which I see how to wire into my deployment pipeline and possibly git hooks.