At geek brekkie yesterday P— mentioned the idea of archiving the links you use with the Internet Archive. This seemed like a great thing to fold into my blog deployment.
I’ve wanted to add a general link checker to look for broken links. This isn’t quite the same thing, but it would be an option for remediating link rot when it’s found. Plus it seemed simple to do.
My proof of concept for this also provides an excellent answer to a common question: when have you gone too far with a shell script and should switch to a “real language”? This script has gotten past that line, so I thought I’d share it.
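The shape of it is roughly this. It’s a simplified sketch rather than the script itself: the content/*.md glob is a stand-in for wherever the posts live, and it goes through a temp file where the real pipeline reads and writes .ia-urls in one go (more on that below).

```bash
#!/usr/bin/env bash
# ia-save (sketch): collect outbound links and ask the Internet Archive
# to save them.
set -eu

# Pull the inline-style markdown links out of the posts, merge them with
# anything already recorded in .ia-urls, and dedupe.
( cat .ia-urls 2>/dev/null || true
  awk '{
    while (match($0, /\]\(https?:\/\/[^)]+\)/)) {
      print substr($0, RSTART + 2, RLENGTH - 3)
      $0 = substr($0, RSTART + RLENGTH)
    }
  }' content/*.md
) | sort -u > .ia-urls.new
mv .ia-urls.new .ia-urls

# Hit the Save Page Now endpoint for each URL, prefixing the output with
# the URL being saved so the log stays readable. bash -c is what lets the
# URL ($0) show up in both places.
xargs -n1 bash -c \
  'echo "saving $0"; curl -sS -o /dev/null "https://web.archive.org/save/$0" || echo "  failed: $0"' \
  < .ia-urls
```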
I’ve written longer shell scripts that are fine as shell scripts. It’s not length - it’s complexity and brittleness. This script is both. A few of the rough edges:
- Shellcheck will complain that I’m reading and writing `.ia-urls` in a single pipeline, but in this case it’s wrong: the `sort -u` after the subshell acts as a barrier, and it won’t output anything until the awk exits. That’s getting pretty far into the weeds of shell esoterica, though.
- The xargs running `bash -c`. It’s needed to prefix the archive-save output with the url being saved. If the urls were arguments to a `for` loop (or the `while read url` variant) that would make this a bit less brittle, but those would introduce their own problems.
- Getting the list of urls is incomplete and misses a different markdown url style that can cross over lines (see the example below).
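On that last point: reference-style links are one example of what a line-at-a-time extraction misses, since the url lives on its own line away from the link text:

```markdown
Reading [the Wayback Machine docs][wb] is a good place to start.

[wb]: https://web.archive.org/
```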
It’s a good proof of concept to learn how the IA “api” works, and just generally to think through the data structures and the workflow of deployment. Right now I have `./scripts/ia-save` and `./scripts/ia-check` (which also doesn’t really work as a shell script), and I can see how to wire them into my deployment pipeline and possibly git hooks.
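For the checking side, the interesting part of the IA api is the availability endpoint: ask it about a URL and it reports the closest snapshot, if one exists. A minimal sketch of that check (assuming jq is installed and hand-waving past URL-encoding) looks something like this:

```bash
#!/usr/bin/env bash
# ia-check (sketch): report which URLs in .ia-urls have a Wayback snapshot.
set -u

while read -r url; do
  # The availability API returns JSON describing the closest snapshot,
  # or an empty archived_snapshots object if there isn't one.
  snapshot=$(curl -sS "https://archive.org/wayback/available?url=$url" \
    | jq -r '.archived_snapshots.closest.url // empty')
  if [ -n "$snapshot" ]; then
    echo "ok       $url -> $snapshot"
  else
    echo "missing  $url"
  fi
done < .ia-urls
```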