23 May 2008

This week I’ve been at UDS in Prague, and looking at some possible ways to deploy bzr for packaging (which is a hot topic: developers don’t want to change workflows without a concrete benefit, and definitely don’t want to pay a cost for doing so – e.g. having to have all of history locally just to make a trivial change).

One of the discussions inspired a scalability test for bzr – not how we think we’d deploy bzr for Ubuntu developers, just a test to understand how it would scale *if* we did it this way.

Lars Wirzenius has a habit of testing VCS systems’ capabilities in various ways, including importing the Debian/Ubuntu source archive into them. He kindly ran a test using bzr, creating a single shared repository with one branch in it per source package.

This took a few hours to generate (I’m not sure of the exact figure, we forgot to time it, but it was started in the afternoon and finished in the morning). The resulting repository has 21GB in its .bzr/repository/packs directory, and 500MB in its .bzr/repository/indices directory. There are 30 pack files, the largest of which is 16GB, and the smallest a few hundred kB.

In general VCS terms this repository has 16000 heads and 16000 commits (one commit per head, because we didn’t import deep archive history).

But what about performance? It’s currently copying to a machine where I can do some serious benchmarks using this repository. I do have some quick and dirty figures though. To branch a single package (libyanfs-java) from its branch within the repository to a new standalone branch with a cold cache took ~5 seconds. Branching again from the repository, now that the needed data is in the page cache, took 0.6 seconds. Branching from the newly created branch to another new standalone branch took 0.3 seconds.
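For anyone wanting to repeat this sort of quick and dirty measurement, here is a minimal sketch of a wall-clock timer around an external command. It is not the script used for the figures above (those were taken by hand); the helper name `time_command` is made up for illustration, and the demo times a do-nothing Python process so that it runs anywhere.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds.

    Hypothetical helper for illustration; it includes process startup
    cost, which is what you want when comparing whole-command timings.
    """
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


# Stand-in command so the sketch is runnable without bzr installed.
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"elapsed: {elapsed:.3f}s")
```

To time a real branch operation you would pass something like `["bzr", "branch", SOURCE, TARGET]`, with your own paths for the two locations. Run it once with a cold cache and once again straight afterwards to see the page cache effect.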

There is a clear slowdown occurring here. Including startup costs, the time to make a new branch doubles (0.3 seconds to 0.6 seconds) when the source branch lives in the shared repository. However, as the repository is 16000 times the size, the scaling factor (2/16000) is pretty darn good. I’m stoked at this result, as I think it demonstrates just what the underlying pack store is capable of. We are working on streamlining the upper layers of bzr to make better and better use of the underlying store. For instance, John Meinel has just done this for ‘bzr missing’ and ‘bzr uncommit’.
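The arithmetic behind that scaling factor, spelled out as a small sketch. The timings are the warm-cache figures quoted above; treating the repository as exactly 16000 times the size of a single branch is a simplifying assumption (one branch per package, all equal), and the variable names are just for illustration.

```python
# Warm-cache timings from the quick and dirty test, in seconds.
warm_from_repo = 0.6        # branching out of the 16000-branch repository
warm_from_standalone = 0.3  # branching from a standalone branch

# Assumption: the shared repository is 16000 times the size of one branch.
size_ratio = 16000

slowdown = warm_from_repo / warm_from_standalone
scaling_factor = slowdown / size_ratio

print(f"slowdown: {slowdown:.1f}x")
print(f"scaling factor: {scaling_factor:.6f}")
```

In other words, the time doubles while the amount of data in the store grows by a factor of 16000.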

Now I must go, time for breakfast!