Brain Phrye



Activity - a one-off script


So after using Go to do the initial activity collection, I went a bit more old school to extract it from the git repos directly. Note that this and the previous post use the data capabilities of Hugo.

Yep: shell, sed, awk, sort and uniq, baby! It’s not ideal, but this script only runs once, so it’s good enough.

The reason for this one-off extraction is that both GitLab and GitHub limit how long they keep activity data (two years for GitLab, 90 days for GitHub - though some GitLab instances hold more and some less). Future activity recording will be done via the Go programs described before; this just runs once-ish. If I find older code repos I’ll likely add them to this script.

personal=(
# local paths to repos.
)
github=(
# local paths to repos.
)

die() {
  echo "FATAL: $*"
  exit 1
}

# Each of these prints one author-email:commit-date line per commit
# made before the relevant cut-off date.
personal_git() {
  cd "$1" || die "Couldn't change to '$1'"
  git log --before '2017-10-28' --all --date='format:%Y-%m-%d' \
          --pretty='format:%ae:%ad'
  cd - > /dev/null || die "Couldn't change back from '$1'"
}

github_git() {
  cd "$1" || die "Couldn't change to '$1'"
  git log --before '2019-02-07' --all --date='format:%Y-%m-%d' \
          --pretty='format:%ae:%ad'
  cd - > /dev/null || die "Couldn't change back from '$1'"
}

# Gather the author:date lines from every repo, keep only my commit
# emails, count commits per day, then emit and run printf commands that
# append "MM-DD":count pairs to a per-year file under data/activity/.
{
for d in "${personal[@]}"; do
  personal_git "$d"
  echo
done
for d in "${github[@]}"; do
  github_git "$d"
  echo
done
} | cat \
  | grep -E 'kevin.lyda@aptarus.com|kevin@ie.suberic.net|devnull@localhost' \
  | sed 's/.*://;s/-/ /' \
  | sort -r \
  | uniq -c \
  | awk '{printf("printf \"\\\"%s\\\":%s,\" >> ../../../data/activity/older_%s\n", $3, $1, $2)}' \
  | bash

# Wrap each per-year file in the JSON envelope the Hugo data dir expects.
for f in ../../../data/activity/older*; do
  t="$f.json"
  printf '{"update":"","stats":{' > "$t"
  sed 's/,$/}}/' "$f" >> "$t"
  rm "$f"
done
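
Each generated older_YYYY.json file ends up looking something like this (the dates and counts here are made up, but the shape follows from the printf and sed above):

{"update":"","stats":{"11-20":2,"03-04":5,"01-15":1}}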

This isn’t an uncommon sort of shell reporting pipeline: pull in a bunch of data, filter out the bits you don’t want (the grep) and then format it for future consumption. The awk -> bash bit is something I don’t usually do - in fact I can’t think of a time I’ve done it before - but it’s not dissimilar to piping awk into xargs, say.
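
To make that concrete, here’s roughly what the middle of the pipeline sees (the dates are invented; the emails are the ones the grep keeps):

printf '%s\n' \
  'kevin@ie.suberic.net:2017-03-04' \
  'kevin@ie.suberic.net:2017-03-04' \
  'devnull@localhost:2018-11-20' \
  | sed 's/.*://;s/-/ /' \
  | sort -r \
  | uniq -c
# gives:
#       1 2018 11-20
#       2 2017 03-04

The awk stage then rewrites each of those lines into a command like printf "\"03-04\":2," >> ../../../data/activity/older_2017, and piping that into bash runs it.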

For something I’d run more often it’s not what I’d use, but for a one-off script this sort of thing works well.