vcs overview
This is a work-in-progress about various VCSs and what they do or do not do. Right now this is just a brain dump of some notes.

http://www.lucas-nussbaum.net/blog/?p=239

http://lambdaman.blogspot.com/2007/06/aptitude-development-moves-to-mercurial.html

http://times.usefulinc.com/2007/06/20-bzr

http://www.advogato.org/person/robertc/diary.html?start=70 (Holy what?)

http://en.wikipedia.org/wiki/Comparison_of_revision_control_software#_note-0 lists lots of features; as far as I can tell from that page, git, monotone, mercurial, darcs and bzr are pretty much interchangeable.

http://www.linuxsymposium.org/2006/viewabstract.php?contentkey=194

http://web.glandium.org/blog/?p=144

http://blog.madduck.net/vcs/2007.07.11_creating-a-git-branch-without-ancestry

http://blog.ianbicking.org/dvcs-mini-roundup.html

http://blog.jozilla.net/archives/2006/3

Some notes from Mozilla people:

  • http://weblogs.mozillazine.org/preed/2007/04/versioncontrolsystemshootou1.html
  • http://wiki.mozilla.org/VersionControlSystem_Requirements
  • http://viper.haque.net/~timeless/vcs.shootout.notes

Questions you can ask of your VCS:

  • Can you have a repository which "includes" another repository?
  • Another (related) question is: if you take the fully-qualified path to a repo, and add the path of a file in that repo, do you get a fully-qualified path to a repo?

    In other words, if your project is at http://svn.example.org/foo and you have a directory called src/, can I do a checkout of http://svn.example.org/foo/src/ ? I think this is true for SVN but it does not appear to be true for darcs.

    In general, distributed version control systems do not appear to want you to be able to check out only part of a repository. There are "partial checkouts", but I think this means checkouts without full history.

    If you don't have this feature, you need to be very careful to align the repository boundaries with the semantic boundaries of each project -- splitting a repo into two repos is generally a pain, so be sure you won't need to. (What happens if part of a project becomes a library? Do you rewrite the history of the project, or does the library lose its history as part of the project?)

  • How does forking from upstream work?

    Let's say you want to maintain a branch which is "upstream + X" where X is some patch (or a series of patches, whatever). Can you pull patches from upstream and have your branch maintain its "upstream + X" state? (Most DVCSs do this.) What if your patch is accepted upstream, or modified and then accepted? Do you get conflicts?

    Note that I don't know how this "ought" to work. But since this is the most central mode of operation for a DVCS, let's find out what the workflow is.

  • Another thing which may alarm you if you come from a centralized-SCM background is that every working copy is also a repository. Doing a darcs get, git clone or hg clone means copying the whole repository. Generally this isn't as bad as it might be for SVN, but for a big project you're still talking about gigabytes of history.

  • Most DVCSes use long checksum numbers as the identifiers for their revisions. This is massively ugly. Most also allow using "only the beginning" of those checksums. Mercurial also allows you to refer to a revision using a repository-local sequence number: 0, 1, 2, etc. It isn't clear what anybody will do if, due to the birthday paradox and enough commits, a collision happens. [SHA1 gives you approx 1e48 different checksums -- how many commits does this mean, on average? Wikipedia says for MD5 that at least 840 billion documents are needed, so this probably isn't a big deal.]

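    See: http://monotone.ca/docs/Hash-Integrity.html . SHA1 checksums are 160 bits, and the birthday bound is roughly the square root of the number of possible checksums, so I guess this means you'd need on the order of 2^80 different commits before you could expect a collision. That's about 1.2e24, which is vastly more than the number of bytes in a typical hard drive (1 TB = 2^40 bytes = 1.1e12 bytes).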
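The collision arithmetic above can be checked with a short Python sketch. It uses the standard birthday approximation, which assumes checksums behave like uniformly random values. The function names are made up for illustration and do not come from any VCS.

```python
import math


def collision_probability(n_commits: float, bits: int) -> float:
    """Approximate chance that n_commits random `bits`-bit checksums collide."""
    space = 2.0 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps very small probabilities from rounding to zero.
    return -math.expm1(-n_commits * (n_commits - 1) / (2.0 * space))


def commits_for_probability(bits: int, p: float = 0.5) -> float:
    """Approximate number of checksums needed to reach collision probability p."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


if __name__ == "__main__":
    # Full SHA1 identifiers are 160 bits.
    print(f"160 bits, 50% collision chance: {commits_for_probability(160):.3g} commits")
    # Assumption for illustration: a 7-hex-digit abbreviation is 28 bits.
    print(f"28 bits, 50% collision chance: {commits_for_probability(28):.3g} commits")
    # Chance of any collision among a billion commits with full identifiers.
    print(f"1e9 commits, 160 bits: p = {collision_probability(1e9, 160):.3g}")
```

The same functions suggest why abbreviated checksums are a different story: a short prefix has a much smaller space, so ambiguity shows up after far fewer commits than it would with the full identifier.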

Let's begin.

The Subversion team apparently has a policy of "no wanking": that is, that Subversion isn't an academic exercise in the latest version control fads. Fitz showed a thorough understanding of those fads but has little patience for them; he described monotone as "alien technology [...] string theory that happens to do version control" to a chorus of laughter. --http://www.al3x.net/entries/741

git

(i.e. there is no git equivalent of svn:externals) --http://wiki.debian.org/XStrikeForce/git-usage

Git was designed for monolithic code bases, not for modular code bases, although work is in progress to allow it to support sub projects (similar to svn:externals). "Such flexibility is an implicit feature of centralized SCMs, but is much more difficult to implement in a distributed system like git. As a result, git currently lacks built-in subproject support, although gitweb does have a notion of subprojects." --http://jaredrobinson.com/blog/?m=200701

hg (Mercurial)

hg is supposed to be very similar to git.

darcs

A darcs get gets you a directory which is both a repository (a database of patches) and a working copy. The working copy represents the result of applying all the patches in the repository, plus any unrecorded changes (initially none). Although the darcs manual says that darcs does not keep a history of changes, each repo keeps track of when a given patch was committed to it, so you can still do things like svn log (darcs changes) and diff from a given date.

  • Pluses: I like Darcs; no complicated setup/maintenance; users (including me) can customize my deployment settings for those projects I deploy in place; users can "fork" at will.

  • Minuses: No subtrees, so I'd have to have a distinct Trac site for each Darcs repository. Maintaining version synch between them might also be annoying. Using darcs on Windows sounds miserable. --http://www.advogato.org/person/titus/diary.html?start=150

Live use: http://koweycode.blogspot.com/2006/10/wxhaskell-on-darcshaskellorg.html

monotone

Monotone separates working copies and repositories. A given working copy is taken from a particular repository. Each repository is a database of patches, as well as some number of "heads", each of which is a version to which nobody has committed anything yet. When you branch, you create a new head.
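The "heads" idea can be sketched in Python as a small graph computation: a head is any revision that is nobody's parent. This only illustrates the description above and is not monotone's actual storage format; the revision names and the `find_heads` helper are hypothetical.

```python
from typing import Dict, List, Set


def find_heads(parents: Dict[str, List[str]]) -> Set[str]:
    """Return the revisions that no other revision lists as a parent."""
    has_child = {p for plist in parents.values() for p in plist}
    return set(parents) - has_child


# Hypothetical history: "b" follows "a", then two commits are made on top of "b".
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}
print(sorted(find_heads(history)))  # two heads: c and d

# A merge revision with both heads as parents leaves a single head again.
history["e"] = ["c", "d"]
print(sorted(find_heads(history)))  # one head: e
```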

Monotone integrates PKI into its permission scheme.

Live use: http://syndie.i2p.net/monotone_howto.html

  • Support for "nested trees" -- Some way to combine several projects that are versioned separately into a single workspace, similar to CVS modules, or Arch "configs". There are a fair number of people interested in this, but no design yet; part of the project would be figuring out what exactly everyone wants from this capability. --http://www.venge.net/mtn-wiki/SummerOfCode2006

bzr

bzr has something like svn:externals called "nested trees". (see http://jelmer.vernstok.nl/blog/archives/164-Bazaar-and-Subversion-nested-tree-support.html)

arch

arch has something called 'multi-tree projects' but the details don't seem quite right to me. See this. Also this thread on darcs-users.

Sizes

  • Installed-size of darcs and dependencies: 5112 kB.
  • bzr: 5792 kB.
  • mercurial: 2507 kB.
  • monotone: 6910 kB.
  • git: 10.6 MB. With cogito: 11.6 MB.

This isn't rigorous; I already have python installed, so that could be considered a dependency of some systems.

Other crap

[2005-06-25T21:26:48Z] <ddaa> the trick is that you can have nested trees
[2005-06-25T21:27:14Z] <ddaa> and version control operations apply independently on each tree
[2005-06-25T21:27:18Z] <alex-i> hence, it's impossible?
[2005-06-25T21:27:26Z] <ddaa> if you commit the outermost tree, that does not affect the inner trees
[2005-06-25T21:27:39Z] <alex-i> why "no" then?
[2005-06-25T21:27:43Z] <ddaa> there is even some support in tla and baz to checkout such compound trees
[2005-06-25T21:28:13Z] <ddaa> Ha... I see
[2005-06-25T21:28:23Z] <ddaa> in Arch you cannot checkout part of a tree

--http://www.scooter.cx/~mozbot/%23revctrl-20050626-070000.xml

Distributed version control systems are like ordinary version control systems, except they are distributed. Distributed does not mean "better". It means "different". It also means "more complicated". It might even mean "wider in scope". Mostly, though, it means "different".

Obviously a centralized system like SVN isn't distributed. So when I see an entry like this one, I wonder, "Why is this person beating up on SVN?"

git-rebase is the mechanism by which you integrate upstream code changes into your branch or fork?

Random nitpicks:

  • http://www.advogato.org/person/robertc/diary.html?start=71

Is it only size-preserving changes? I didn't notice; I only noticed about the mtime.

Interesting:

  • http://www.ligarto.org/rdiaz/VersionControl.html
  • http://benjamin.smedbergs.us/blog/2007-01-26/vcs-migration-headaches/
  • http://www.robf.de/Hacking/bazaar/dvc-rules.html ?
  • http://weblogs.mozillazine.org/preed/2007/01/downplayingthedistributed_do.html
  • http://www.selenic.com/pipermail/mercurial/2005-May/000334.html
  • http://www.jukie.net/~bart/blog/git-vs-hg

Benchmarks

http://sayspy.blogspot.com/2007/07/another-unscientific-comparison-of.html

  • Size/time of commit versus number of files in repo

    • If I have 16000 files in my repo, but only change one of them, how much space does it take to store the commit? How long does it take to perform?

  • Size/time of commit versus number of versions in repo

  • Time it takes to fetch a single file versus number of files in repo and number of commits in repo

  • Time it takes to view all versions of a single file versus number of files in repo and number of commits in repo
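A minimal Python sketch of how the first benchmark (size/time of a one-file commit versus number of files) might be measured. Everything here is an assumption for illustration: the helper names are made up, the VCS is driven through plain command lists, and "space" is taken to mean growth of the metadata directory on disk. It has not been run against any of the systems above.

```python
import os
import subprocess
import tempfile
import time
from pathlib import Path


def make_files(root: Path, count: int) -> None:
    """Create `count` small text files directly under root."""
    for i in range(count):
        (root / f"file{i:05d}.txt").write_text(f"initial contents {i}\n")


def dir_size(root: Path) -> int:
    """Total size in bytes of all files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def run_timed(cmd, cwd: Path) -> float:
    """Run one command in cwd and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def measure_one_file_commit(n_files, init_cmds, commit_cmds, meta_dir):
    """Commit n_files (at least 1), change one, commit again.

    Returns (seconds taken by the second commit, bytes added to meta_dir).
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        make_files(root, n_files)
        for cmd in init_cmds + commit_cmds:
            run_timed(cmd, root)
        before = dir_size(root / meta_dir)
        (root / "file00000.txt").write_text("changed\n")
        elapsed = sum(run_timed(cmd, root) for cmd in commit_cmds)
        return elapsed, dir_size(root / meta_dir) - before


# Command lists for git, as one example; other systems need their own.
GIT_INIT = [["git", "init", "-q"]]
GIT_COMMIT = [
    ["git", "add", "-A"],
    ["git", "-c", "user.name=bench", "-c", "user.email=bench@example.org",
     "commit", "-q", "-m", "bench"],
]
```

With git installed, `measure_one_file_commit(16000, GIT_INIT, GIT_COMMIT, ".git")` would give one data point for the 16000-file question. Other systems would need their own command lists and metadata directory (`.hg` for Mercurial, `_darcs` for darcs). Like the size numbers above, this isn't rigorous: a single run says nothing about caching or variance.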
