VCS comparison: Git / Mercurial / Bazaar

...mainly from the perspective of a Drupal developer.

In November 2008, I was going to compare version control systems, to give input on which system my employer was going to use for its toolchain. We never really reached a proper decision about this, but it led to me settling on one for my personal business. I'll document my reasons for choosing my current one, so I can look at them later and see if anything's changed. (This may grow into a number of evolving blog posts about the same subject.)

...and also, because there are fewer current, proper comparisons of VCS products floating around on the web than I would have expected. ('Current', as in 'less than two years old', is important because some of these products are still developing pretty quickly.)

Preselection

People had already decided it was going to be one of the decentralized VCSes (that were gaining ground so fast), so that developers would have some flexibility w.r.t. their own working model. We did not want to require everyone to check in all their own working branches in a central location.

So I browsed around for two evenings, trying to get a feel for what the big buzz was about those decentralized VCSes, and quickly found that I agreed with that line of thought. Then I tried to find out about the differences between Git and Mercurial, to be able to make sound decisions. Only after two days did I discover there was a serious third contender which I'd never heard of: Bazaar - and I started comparing again, now against the other two. (I saw other projects as well, like Darcs, but they seemed to have lost all momentum.)

Requirements

After browsing around, I got a better feel for which criteria we should base the decision on:

  • The top priority is that developers should not (or as little as possible) be 'crippled' in their current way of working, whatever it is. We want a source repository with revision history and some level of control over the code... but without making developers jump through hoops that they normally wouldn't.
  • Second: the presence of functionality that helps developers do their job (of maintaining their code, branches, whatever) better / more efficiently.
  • Third: the system should fit our working model nicely. Most of the 'real' development was probably going to be Drupal modules, and then one-off patches to existing code. Not a lot of heavy concurrent (by more than two or three coders) development on the same chunk of code was expected.
    The working model would most of the time (seen from 'the company view') be:
    • We have one central repository for Drupal stuff, i.e. one 'mainline branch' that we all have access to and are supposed to commit things to once they're ready (or need to be stored for another reason).
    • Most cycles go like "pull/update from repository, develop code, [insert any local VCS stuff and branching here, which the company repository needs to 'know' only as far as the developers want it to], commit changed/new code to repository". Merging could happen but wasn't expected to be very dramatic, and neither were commit sizes.
  • It should have a good interface to CVS (because Drupal still uses a CVS repo, and we should be able to make commits to our future projects on that).
  • More 'working model' related things: most people use their own hardware/OS for development, and not all users of the repository would be 'heavy coders'. So the presence of good Windows support/(somewhat graphical) integration would be good. Preferably not hard to install.
  • Good documentation is always a plus - especially because not 100% of users would be Certified Geeks.
  • Same goes for integration with other IDEs / packages. Any well-working integration 'plugins' would be a plus. (We hadn't standardised on any IDE yet, and we didn't know if we would. Some were saying Eclipse, some NetBeans, and I was switching between Vim and NetBeans just trying stuff out.)
  • Finally: when in doubt (because systems are very close in terms of the above features), choose the easiest one to learn / use.

Details

CVS access/support was covered pretty well (bug-free) by all three systems. (I don't remember the detailed differences.)

As for the 'developer functionality', there's one 'special thing' I thought of that would be nice to have support for: some kind of support for 'nested branches'. Usually, you want to work on a Drupal module that lives inside a Drupal tree (working directory). This Drupal directory tree would probably have an associated branch/repository somewhere, but the subdirectory you're developing the module in is not necessarily registered in that same VCS branch/repo. And you certainly want to keep the change histories of 'the Drupal tree' and 'the custom developed module' separate.

From the documents, it was hard to see how these things work in practice and whether the things I read were actually equal to what I wanted. Documentation told me that Mercurial had some experimental feature called nested repositories and Git had a stable feature called submodules.
I have not tested either of these in practice, because I have not used either system. While using Bazaar, I later found out that no special functionality is needed. You simply do a 'bzr init' in a subdirectory [1] and you have a separate 'branch/repo'. (Maybe with git/hg it also works like that, and nothing special is needed; I don't know.)

About the 'working model' I mentioned, the idea developing in my head was much clearer. Bazaar wins. We have one repository with 'mainline' branches in a central location, and a couple of developers usually track the changes in that, make changes, and push ('merge', whatever) their changes back to the central location.

From reading the documents, it seems like Mercurial cannot even do that! I don't have the reference any more, but I clearly read that you cannot merge changes (back) into an unchanged branch. So a developer would be unable to 'push' his modifications to the main central branch, unless/until someone modified that branch? WTF!? [2] (That is probably because of a totally different philosophy. While reading, I wasn't even sure if Mercurial has a proper concept of 'pushing' changes, at all.)

Except for this (which pretty much threw Mercurial out of the equation) I know that we can probably get all three systems to match our way of working (or vice versa) without severe problems in the end... because all systems are flexible enough. But I think it really is an advantage if a system seems to have sane default assumptions geared to your way of working. Bazaar just wins here. The concepts in its documentation match 'our way of thinking' much more closely. For instance, it has sane version numbering. Maybe not everyone likes 'sane version numbering', but when you just have one main branch anyway (and X people with their own developer branch tracking it) instead of X developer branches that constantly merge each other's changes in a quasi random way... it can come in pretty darn handy. 'Sane defaults' are good and make you feel more comfortable adopting a new system.

Also... the Bazaar documentation seemed much better suited to give to my boss than Git's. Even though my boss is not the most important person in the equation, he was also going to be a user of the system, so that is a plus.

So this is not going to be a nice schematic with pluses and minuses for every mentioned requirement. These things were the ones that made me lean toward Bazaar, and I wasn't seeing any reason not to select it. (Yeah yeah, big speed differences and frequent compatibility breaks between storage models. I read tons of posts about those - but they seemed to be largely gripes from a past period. It was still slow, but not so slow that it would hurt us.)

There was another 'non-requirement' that made me lean toward Bazaar. A few bigger projects seemed to be switching to it recently. I also read a story about why Mozilla's decision process (illustrated with cheesy pictures ;-) ) eliminated Bazaar, but even that text led me to lean toward Bazaar, because its only disadvantage (slowness) was partly fixed already in the past year and partly unimportant - and for the rest it came out positive.

Plus... I wasn't sure about the future of Drupal itself in terms of version control... there seemed to be mainly Git fans and Bazaar fans hanging around, but browsing the drupal.org archive I had the impression that Bazaar was used slightly more among Drupal developers. (But one year later, in October 2009, I think there still is no clear tendency among Drupal devs toward one or the other.)

Decision:

Bazaar wins, as far as I'm concerned.

...however, I said goodbye to my employer before I got to talking this through with colleagues or writing this down.

In July 2009, I finally set up my own repository for Drupal (including contrib modules and needed patches) using Bazaar.

Present and future

I'm happy enough with it... but not totally convinced I won't ever switch to anything else.

Bazaar is certainly not bug-free yet - and also, the different concepts & working models for branches (some of which were added later) are not as easy to grasp and distinguish as I had hoped. I have set up my branches so that I use Four Kitchens' repository as a parent. (David Strauss wrote a good practical introduction to those different concepts.) But I was going mad before I understood what 'normal standalone branches, stacked branches, repository branches, checkouts and lightweight checkouts' really are.

The situation was worsened by the fact that I hit a few (non-reproducible) bugs while trying to branch Four Kitchens' repository. And there was no good documentation guiding me through that confusion and clearly explaining the different concepts, which is less than I'd hoped for. (So I probably made some wrong assumptions about what are 'bugs' and what is 'expected behaviour', like an upstream repo not telling me what format it is... which then led me to file an unnecessary bug report at Launchpad.)

Bazaar seems stupid about merging in ways I don't understand.
It happens sometimes that I have (a directory with) files that are not under version control (because I was just testing a newly downloaded module) and when these files seem OK, they are added to my 'downloads' repository in another way and eventually get merged back into my working branch.
What happens? The merge of the 'new versioned' directory from my upstream branch creates a conflict with the 'unversioned' files in my working tree (the directory is moved to 'directoryname.moved') even though they have the exact same contents. And I have to resolve those conflicts manually (by just removing the identical and superfluous 'directoryname.moved' and running an explicit 'bzr resolve directoryname'). That just seems stupid.
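
The manual clean-up described above can be sketched as a small shell helper. It assumes a POSIX shell with `diff` and `rm`; the function name `remove_identical_moved` and the module path in the usage comment are made up for illustration. The point of the check is that 'directoryname.moved' only gets deleted when it really is identical to 'directoryname'.

```shell
# Hypothetical helper (not part of bzr): remove "<dir>.moved" only if its
# contents are identical to "<dir>", so nothing unique gets deleted.
remove_identical_moved() {
    dir=$1
    moved="$dir.moved"
    if diff -r "$dir" "$moved" >/dev/null 2>&1; then
        rm -r "$moved"
    else
        echo "$moved is missing or differs from $dir; not removing it" >&2
        return 1
    fi
}

# Usage, with a made-up module path:
#   remove_identical_moved sites/all/modules/somemodule \
#       && bzr resolve sites/all/modules/somemodule
```

Chaining with `&&` means 'bzr resolve' only runs once the superfluous copy is actually gone.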

There are more scenarios where identical files create merge conflicts whose logic I don't see. (I don't have another example right now.) A friend (who's on Cygwin) has had much grief from lots of conflicts that I'd never seen, which created a lot more work for him (to my embarrassment - even though I could never reproduce such a situation).

So it isn't totally bug free yet. But the way I work with it (after I eventually managed to create a repo without errors), it works pretty well. I personally have not run into any showstoppers yet.

Other feature differences I'm collecting while writing this text (and maybe in the future) to see whether I've made the right choice, are:

Bazaar makes it easy to create and serve 'central repositories', with all the different protocols it operates over. That's a plus. Maybe not a big one, because I have my own server anyway... and once you've invested the time for setting up one, it's there. But it will always remain somewhat of an advantage that if you're coding away with a few others (think 'code sprint'), you could even use a random SFTP account as an ad-hoc central repo.

While typing this up, I just read about one fundamental difference between Git and Bazaar: there's a one-to-one relationship between a directory and a 'branch' in Bazaar. Different branches mean different directories. And if you're a web developer, being able to switch branches easily within the same directory may grow to be a big plus, even though I don't care yet.
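
To make the Git side of that difference concrete, here is a minimal sketch, assuming `git` is installed. The branch names, file name and identity are placeholders. Both branches live in one directory, and switching rewrites the working tree in place.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

echo "stable code" > module.txt
git add module.txt
git commit -q -m "Initial module"
git branch -M mainline                     # give the first branch a known name

git checkout -q -b experiment              # new branch, same directory
echo "experimental change" >> module.txt
git commit -q -a -m "Try something out"

git checkout -q mainline                   # working tree is rewritten in place
cat module.txt                             # shows only the mainline content
```

For a web developer this means the web server's document root can keep pointing at the same directory while the checked-out branch changes.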

Further, Git's index (staging area) may be a real nice thing once you're used to it (even though it requires you to adapt to it). As a Drupal developer (usually not working on lots of different features and large sets of files at once), I don't yet know how much of an advantage it would be. Maybe with more development experience will come the urge to switch...
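
A minimal sketch of what the index allows, assuming `git` is installed. File names and identity are placeholders. Two files are modified, but only the one that was explicitly staged ends up in the commit.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

echo "one" > a.txt
echo "two" > b.txt
git add a.txt b.txt
git commit -q -m "Initial files"

echo "finished fix" >> a.txt               # ready to be committed
echo "unfinished idea" >> b.txt            # not ready yet

git add a.txt                              # stage only a.txt in the index
git diff --cached --name-only              # lists what the next commit will contain
git commit -q -m "Fix a.txt"

git diff --name-only                       # b.txt is still modified and uncommitted
```

The adaptation mentioned above is mostly this extra 'git add' step before each commit.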

...on the other hand, it seems that the concept of 'different branches / projects in one central repository', which I am now adopting, would also need to go out the window, because Git always looks at the repository as one thing (important for things like tagging, I guess). I don't know if that's a biggie, or whether a model of 'one developed Drupal module is one separate repo' will be perfectly fine.

We'll see what the future brings. For now... I'm sticking with Bazaar. And I'll be putting up my own repo for public use soon, in case anyone thinks it's useful.

  • 1. one which does not have itself or its contents added to the 'main' branch yet. I don't even want to test what happens if you do that.
  • 2. Either this, or the primary Mercurial documentation is so obfuscated by 'insiders' viewpoints with prior in-depth knowledge' that a new person trying to read about its functionality is unable to grasp what the documentation means. Which, in my book, is also a big disadvantage.