I’ve converted code repos a number of times through my career: from cvs to subversion, from subversion to mercurial and git, and from mercurial to git. However, until the other day, I’d never converted from cvs to git directly.
In this case, I was converting an old CVS repo I’d found on some old backup CD-ROMs that were being tossed out.
It went well, but there were a few hiccups. First, there are a number of ways to do this. My first port of call was git’s own `cvsimport` command, but its documentation makes it pretty clear there are better choices. From there I went with `cvs2git`, which is, confusingly, part of the `cvs2svn` project. Historically that makes sense: early cvs migrations were to subversion because git and other modern version control systems didn’t yet exist.
Once I started, I discovered the other, larger issue: cvs repositories are often corrupted. There are lots of reasons. Cvs had a number of bugs over the years, certain actions in a repository required admins to make changes by hand, and cvs lacked any real way to check the correctness of a repository.
One issue is that cvs is in many ways a wrapper around an older system called rcs, and a rather thin wrapper at that. Source was largely kept in rcs files (`,v` files), while some (but not all) repository actions were tracked in a `history` file. That is usually where problems in migrating from cvs crop up.
This time that part was fine; the issue in this case was old bugs in cvs itself. The problem I hit had been discussed on a mailing list back in 2000. It turns out that reanimating old data often means digging up old bug reports!
> I just glanced through your RCS file and it looks like perhaps some revisions were deleted improperly. There used to be bugs in the ‘cvs admin -o’ command which may have caused something like this, or someone may have attempted to edit the file by hand and caused the trouble.
In this case two files in the `CVSROOT` directory were corrupt: `cvsignore,v` and `modules,v`. These are administrative files and weren’t actually important for migrating the repository from cvs to git.
Still, it helps to understand the issue. Those files looked something like this (skipping past the header info):
```
1.2
log
@foo two
@
text
@moo
foo 2
@


1.1
log
@foo 1
@
text
@d3 1
@
```
This is just an rcs file I made (`foo,v`) and then corrupted myself.
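The deltatext section shown above has a simple shape. Each revision number is followed by a `log` string and a `text` string. Each string is delimited by `@`, and a doubled `@@` inside it stands for a literal `@`. As a rough illustration, here is a minimal Python sketch that reads just that section.

The sketch assumes two things:

- The header (everything up to and including the `desc` string) has already been skipped.
- No other fields sit between `log` and `text`.

`read_string` and `parse_deltatexts` are hypothetical helper names, not part of rcs or cvs2git.

```python
import re

# Hypothetical sketch: parses only the deltatext part of an rcs ,v file.
# A bare token is a revision number or a keyword such as "log" or "text".
_TOKEN = re.compile(r"[^\s@;]+")
_SPACE = re.compile(r"\s*")


def read_string(src: str, pos: int) -> tuple[str, int]:
    """Read an @-delimited rcs string that starts at src[pos].

    Inside the string a doubled '@@' stands for one literal '@'.
    Returns the decoded value and the offset just past the closing '@'.
    """
    if not src.startswith("@", pos):
        raise ValueError(f"expected '@' at offset {pos}")
    parts = []
    i = pos + 1
    while True:
        j = src.find("@", i)
        if j == -1:
            raise ValueError(f"unterminated string starting at offset {pos}")
        parts.append(src[i:j])
        if src.startswith("@@", j):
            parts.append("@")  # escaped '@': keep scanning the same string
            i = j + 2
        else:
            return "".join(parts), j + 1


def parse_deltatexts(src: str) -> dict[str, dict[str, str]]:
    """Map each revision to {"log": ..., "text": ...}.

    Assumes src starts at the first revision number (header already
    skipped) and that nothing else appears between 'log' and 'text'.
    """
    revisions = {}
    pos = _SPACE.match(src).end()
    while pos < len(src):
        m = _TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"expected a revision number at offset {pos}")
        rev = m.group()
        pos = _SPACE.match(src, m.end()).end()
        fields = {}
        for keyword in ("log", "text"):
            m = _TOKEN.match(src, pos)
            if m is None or m.group() != keyword:
                raise ValueError(f"expected {keyword!r} for revision {rev} at offset {pos}")
            pos = _SPACE.match(src, m.end()).end()
            fields[keyword], pos = read_string(src, pos)
            pos = _SPACE.match(src, pos).end()
        revisions[rev] = fields
    return revisions
```

Fed the `foo,v` excerpt above, it maps `1.2` to the full text `moo\nfoo 2\n` and `1.1` to the edit script `d3 1\n`.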
If I try to check out revision 1.1 with `co -r1.1 foo`, it fails with this error:
```
foo,v --> foo
revision 1.1
writable foo exists; remove it? [ny](n): y
co: foo,v:40: edit script refers to line past end of file
co aborted
```
Line 40 in this case is the `d3 1` line. The rcs file format stores the full text of the current revision: here, the `moo` and `foo 2` lines (without the `@` characters) in the revision 1.2 section. Older revisions are then stored as edit commands that change that text. In this case the edit for 1.1 was supposed to delete a single line at line 2. However, I changed it to delete a single line at line 3 (`@d3 1`), which is past the end of the two-line file. Change that back to `@d2 1` and it works fine.
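To make that concrete, here is a minimal Python sketch of applying such an edit script to the head text. It assumes only the two commands that appear in `,v` deltatexts:

- `dL N` deletes N lines starting at line L.
- `aL N` inserts the next N lines of the script after line L.

The commands are given in ascending order, and their line numbers refer to the unmodified text being edited. `apply_rcs_edit` and `EditScriptError` are hypothetical names for illustration; this is not rcs’s own code.

```python
import re

# Hypothetical sketch of rcs-style edit scripts: "dL N" and "aL N" only.
_COMMAND = re.compile(r"([ad])(\d+) (\d+)")


class EditScriptError(ValueError):
    """Raised when an edit script cannot be applied to the given text."""


def apply_rcs_edit(base: str, script: str) -> str:
    """Apply an rcs-style edit script to base and return the new text.

    Commands come in ascending order and every line number refers to the
    original base text, not to the partially edited result:
      dL N  delete N lines starting at line L (1-based)
      aL N  insert the N script lines that follow after line L
    """
    src = base.splitlines(keepends=True)
    lines = script.splitlines(keepends=True)
    out = []
    done = 0  # number of base lines already copied or deleted
    i = 0
    while i < len(lines):
        m = _COMMAND.fullmatch(lines[i].strip())
        if m is None:
            raise EditScriptError(f"bad edit command: {lines[i]!r}")
        op, at, count = m.group(1), int(m.group(2)), int(m.group(3))
        i += 1
        if op == "d":
            if at <= done:
                raise EditScriptError(f"d{at} {count}: command out of order")
            if at + count - 1 > len(src):
                raise EditScriptError(
                    f"d{at} {count}: refers to line past end of file ({len(src)} lines)")
            out.extend(src[done:at - 1])  # keep lines before the deleted block
            done = at - 1 + count         # skip over the deleted block
        else:  # op == "a"
            if at < done:
                raise EditScriptError(f"a{at} {count}: command out of order")
            if at > len(src):
                raise EditScriptError(
                    f"a{at} {count}: refers to line past end of file ({len(src)} lines)")
            if i + count > len(lines):
                raise EditScriptError(f"a{at} {count}: script ends before the added lines")
            out.extend(src[done:at])        # keep lines up to the insertion point
            done = at
            out.extend(lines[i:i + count])  # the inserted text itself
            i += count
    out.extend(src[done:])  # keep everything after the last command
    return "".join(out)
```

With the corrupted script, `apply_rcs_edit("moo\nfoo 2\n", "d3 1\n")` raises `EditScriptError`. That is the same kind of failure `co` reported at line 40 of `foo,v`. With `d2 1` it returns `"moo\n"`, which is revision 1.1. Applied revision by revision, a check like this is one possible way to locate broken scripts in older files.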
Working out how to fix that in nearly 20-year-old files is a bit harder, but that’s effectively the process.
Sometime later I’ll see how well 20-year-old C and C++ code builds on current systems. I’m guessing poorly for the C++ code.