Importing an old code repo

I’ve converted code repos a number of times over my career: from cvs to subversion, from subversion to mercurial and git, and from mercurial to git. However, until the other day I’d never converted from cvs to git directly.

This time it was an old CVS repo I’d found on some backup CD-ROMs that were being tossed out.

It went well, but there were a few hiccups. First, there are a number of ways to do this. My first port of call was git’s own cvsimport command, but its documentation makes it pretty clear there are better choices. From there I went with cvs2git, which is, confusingly, part of the cvs2svn project. Historically that makes sense: early cvs migrations were to subversion because git and other modern version control systems didn’t yet exist.
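The exact commands aren’t shown here, so as a rough guide, here is a Python sketch of the workflow the cvs2git documentation describes. cvs2git writes two files, a blob file and a dump file, and both are streamed into git fast-import (blob file first) inside a fresh bare repository. The function names, the cvs2git-tmp directory and the --username value are my own placeholders, so check the flags against the cvs2git docs for your version before relying on this.

```python
import shutil
import subprocess
from pathlib import Path


def cvs2git_command(cvs_repo, tmp_dir="cvs2git-tmp"):
    """Build the cvs2git argv; returns (argv, blob_path, dump_path)."""
    tmp = Path(tmp_dir)
    blob = tmp / "git-blob.dat"   # file contents (blobs)
    dump = tmp / "git-dump.dat"   # the rest of the fast-import stream
    argv = [
        "cvs2git",
        f"--blobfile={blob}",
        f"--dumpfile={dump}",
        "--username=cvs2git",     # name for commits cvs2git creates itself
        str(cvs_repo),
    ]
    return argv, blob, dump


def convert(cvs_repo, git_dir, tmp_dir="cvs2git-tmp"):
    """Convert a CVS repository into a new bare git repository at git_dir."""
    argv, blob, dump = cvs2git_command(cvs_repo, tmp_dir)
    Path(tmp_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(argv, check=True)

    subprocess.run(["git", "init", "--bare", str(git_dir)], check=True)
    # Equivalent to: cat git-blob.dat git-dump.dat | git fast-import
    # (the blob file has to come first).
    with subprocess.Popen(["git", "fast-import"], cwd=git_dir,
                          stdin=subprocess.PIPE) as proc:
        for part in (blob, dump):
            with open(part, "rb") as f:
                shutil.copyfileobj(f, proc.stdin)
        proc.stdin.close()
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, "git fast-import")
```

Something like convert("/path/to/cvsroot/project", "project.git") would leave a bare repository you can then clone and inspect. Keeping the command construction separate from running it makes it easy to print and eyeball the exact cvs2git invocation first.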

Once I started, I discovered the other, larger issue: cvs repositories often have corruption problems. There are lots of reasons for this. cvs had a number of bugs over the years, certain repository operations required admins to make changes by hand, and cvs really lacked a way to check a repository’s correctness.

One issue is that cvs is, in many ways, a wrapper around an older system called rcs. And a rather light wrapper at that: source was largely kept in rcs files (,v files), while some (but not all) repository actions were tracked in a separate history file. That’s usually where problems in migrating cvs crop up.

This time that wasn’t the problem. Instead, the trouble was old bugs in cvs itself; the issue I hit was discussed on a mailing list back in 2000. It turns out that reanimating old data often means digging up old bug reports!

I just glanced through your RCS file and it looks like perhaps some revisions were deleted improperly. There used to be bugs in the ‘cvs admin -o’ command which may have caused something like this, or someone may have attempted to edit the file by hand and caused the trouble.

In this case, two files in the CVSROOT directory were corrupt: cvsignore,v and modules,v. These are administrative files and weren’t actually needed to migrate the repository from cvs to git.

Still, it helps to understand the issue. The corrupt files looked something like this (skipping past the header info):

1.2
log
@foo two
@
text
@moo
foo 2
@


1.1
log
@foo 1
@
text
@d3 1
@

This is just an rcs file I made (foo,v) and then corrupted myself. If I try to check out revision 1.1 with co -r1.1 foo, it fails with this error:

foo,v  -->  foo
revision 1.1
writable foo exists; remove it? [ny](n): y
co: foo,v:40: edit script refers to line past end of file
co aborted

Line 40 in this case is the d3 1 line. The rcs file format stores the full text of the latest revision: here, the moo and foo 2 lines (minus the @ delimiters) in the revision 1.2 section. Older revisions are stored as edit commands that turn the newer text back into the older one. In this case the command was supposed to delete one line starting at line 2, but I changed it to delete one line starting at line 3 (@d3 1). Revision 1.2 only has two lines, so there is no line 3 to delete, hence the error. Change it back to @d2 1 and the checkout works fine.
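To make the mechanism concrete, here is a small Python sketch of applying such an edit script. It handles only the two commands rcs uses, dL N (delete N lines starting at line L) and aL N (add the N lines that follow, after line L), where L always counts lines in the text the script is applied to. The function name apply_rcs_delta and its error messages are my own (the first one deliberately mirrors co’s); this is an illustration, not how co is actually implemented.

```python
def apply_rcs_delta(lines, script):
    """Apply an RCS-style edit script to `lines` and return the new list.

    "dL N" deletes N lines starting at line L; "aL N" inserts the N lines
    that follow the command after line L. L refers to the numbering of the
    input text, and commands must appear in increasing order.
    """
    out = []
    pos = 0  # number of input lines already copied or deleted
    i = 0
    while i < len(script):
        command = script[i]
        i += 1
        op = command[:1]
        start_text, count_text = command[1:].split()
        start, count = int(start_text), int(count_text)
        if op == "d":
            if start - 1 + count > len(lines):
                raise ValueError("edit script refers to line past end of file")
            if start < 1 or start - 1 < pos:
                raise ValueError("edit script commands overlap or are out of order")
            out.extend(lines[pos:start - 1])  # keep lines before the deletion
            pos = start - 1 + count           # skip over the deleted lines
        elif op == "a":
            if start > len(lines):
                raise ValueError("edit script refers to line past end of file")
            if start < pos:
                raise ValueError("edit script commands overlap or are out of order")
            new_lines = script[i:i + count]
            if len(new_lines) != count:
                raise ValueError("edit script ends in the middle of added text")
            out.extend(lines[pos:start])      # keep lines up to and including L
            out.extend(new_lines)
            pos = start
            i += count
        else:
            raise ValueError(f"unknown edit command: {command!r}")
    out.extend(lines[pos:])                   # keep everything after the last edit
    return out


head = ["moo", "foo 2"]                       # full text of revision 1.2
print(apply_rcs_delta(head, ["d2 1"]))        # prints ['moo'] (revision 1.1)
try:
    apply_rcs_delta(head, ["d3 1"])           # the corrupted delta
except ValueError as err:
    print(err)                                # prints: edit script refers to line past end of file
```

The intact d2 1 script rebuilds revision 1.1 as the single line moo, while the corrupted d3 1 fails the same way co does, because line 3 doesn’t exist in the two-line revision 1.2.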

Working out how to fix that in files that are nearly 20 years old is a bit harder, but that’s effectively the process.
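Finding the broken files in the first place is its own step. One approach, sketched here under the assumption that the classic rcs tools rlog and co are installed, is to ask co to rebuild every revision of every ,v file and collect the ones that fail. The function names are hypothetical, and pulling revision numbers out of rlog output with a regex is a simplification.

```python
import os
import re
import subprocess

# In rlog output each revision block starts with a line of dashes
# followed by "revision N.N".
REVISION_RE = re.compile(r"^-+\nrevision (\d+(?:\.\d+)+)", re.MULTILINE)


def revisions(rlog_output):
    """Return the revision numbers listed in rlog output, in order."""
    return REVISION_RE.findall(rlog_output)


def broken_revisions(cvsroot):
    """Yield (rcs_file, revision, error) for each revision co cannot rebuild.

    revision is None when rlog itself cannot read the file.
    """
    for dirpath, dirnames, filenames in os.walk(cvsroot):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if not name.endswith(",v"):
                continue
            path = os.path.join(dirpath, name)
            log = subprocess.run(["rlog", path], capture_output=True)
            if log.returncode != 0:
                yield path, None, log.stderr.decode(errors="replace")
                continue
            for rev in revisions(log.stdout.decode(errors="replace")):
                # -p sends the revision to stdout instead of a working file;
                # -q drops the usual "foo,v --> foo" status messages.
                co = subprocess.run(["co", "-q", "-p", f"-r{rev}", path],
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.PIPE)
                if co.returncode != 0:
                    yield path, rev, co.stderr.decode(errors="replace")
```

Looping over broken_revisions("/path/to/cvsroot") and printing each tuple gives a to-do list of ,v files and revisions whose edit scripts need the kind of hand repair described above.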

Sometime later I’ll see how well 20-year-old C and C++ code builds on current systems. My guess for the C++ code: poorly.