[darcs-devel] darcs-2 performs really well for the "darcs get" use case

Folks:

We use darcs to manage our source code in the http://allmydata.org project (it is an open source, secure, decentralized file system). Our trunk repository [1] currently has 2,484 patches in it. The current version of the source code has 269 files, totalling 13 MiB (some of the files are binaries), with around 48,000 lines in the non-binaries. The _darcs/patches directory (which contains all of the patches, gzipped) takes 40 MiB of disk space.

We have automated builds and tests using buildbot, so this was an opportunity for me to benchmark different versions of darcs. The buildslave that I used is a Celeron Coppermine at 564 MHz running Ubuntu Dapper [exhibit 2].

Originally it was running darcs-1.0.5 (that's what comes with Ubuntu Dapper), and a "darcs get --partial" of our source code over HTTP took 286 seconds [exhibit 3].

Next I installed darcs-1.0.9 -- the final release in the darcs-1 line. A "darcs get --partial" took 308 seconds [exhibit 4]. (I didn't run this experiment enough times to determine whether the difference between darcs-1.0.5 and darcs-1.0.9 was merely jitter in the network or the machine load.)

Next I installed darcs-2.0.0. A "darcs get --partial" took 93 seconds [exhibit 5].

Next I configured it to do its darcs get from a hashed-format repository instead of an old darcs-1-format repository, as described in the darcs manual [6]. A "darcs get --partial" took 6.47 seconds [exhibit 7].

Next I configured it to use a "global cache" as described in the darcs manual [8]. The global cache was not populated yet, of course, so the next "darcs get --partial" did not benefit from it, and indeed took 7.19 seconds to run and to populate the global cache [exhibit 9].

Finally, I ran it again with the global cache having been populated by the previous run. This time "darcs get --partial" took 3.85 seconds [exhibit 10].

Morals of the story:

1. Upgrade from darcs-1 to darcs-2.

2. Start using hashed-format repositories.

3. If you don't mind having only a partial copy of history in exchange for a faster "darcs get", then use "darcs get --lazy" (which is the preferred spelling for "darcs get --partial" in darcs-2).

4. Whether or not you are using --lazy, enable a global cache. A global cache can speed up other operations in addition to "get", including working on different branches.

5. If you have a workload that is important to you other than "darcs get", then try an experiment like this one on your workload and report your results to darcs-users at darcs.net. :-)

Regards,

Zooko

[1] http://allmydata.org
[2] http://allmydata.org/buildbot/waterfall?builder=dapper&last_time=1208899391
[3] http://allmydata.org/buildbot/builders/dapper/builds/1464/steps/darcs/logs/stdio
[4] http://allmydata.org/buildbot/builders/dapper/builds/1466/steps/darcs/logs/stdio
[5] http://allmydata.org/buildbot/builders/dapper/builds/1467/steps/darcs/logs/stdio
[6] http://darcs.net/manual/node7.html#SECTION00740000000000000000
[7] http://allmydata.org/buildbot/builders/dapper/builds/1468/steps/darcs/logs/stdio
[8] http://darcs.net/manual/node5.html#SECTION00510000000000000000
[9] http://allmydata.org/buildbot/builders/dapper/builds/1469/steps/darcs/logs/stdio
[10] http://allmydata.org/buildbot/builders/dapper/builds/1470/steps/darcs/logs/stdio
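
To put the timings reported in this message side by side, here is a minimal Python sketch that computes each run's speedup relative to the darcs-1.0.5 baseline. The seconds are the figures quoted from the exhibits; the configuration labels and the `speedups` helper are names made up for this illustration. Each configuration was timed only once, so the ratios should be read as rough.

```python
# Wall-clock seconds for "darcs get --partial", as reported in the message.
# The labels are illustrative names, not anything darcs itself prints.
TIMINGS = {
    "darcs-1.0.5, old format": 286.0,
    "darcs-1.0.9, old format": 308.0,
    "darcs-2.0.0, old format": 93.0,
    "darcs-2.0.0, hashed": 6.47,
    "darcs-2.0.0, hashed, cold global cache": 7.19,
    "darcs-2.0.0, hashed, warm global cache": 3.85,
}

BASELINE = "darcs-1.0.5, old format"


def speedups(timings, baseline):
    """Return baseline_time / run_time for every configuration."""
    base = timings[baseline]
    return {label: base / seconds for label, seconds in timings.items()}


ratios = speedups(TIMINGS, BASELINE)

# Print a small table: configuration, seconds, speedup over the baseline.
for label, seconds in TIMINGS.items():
    print(f"{label:<42} {seconds:>7.2f} s  {ratios[label]:>6.1f}x")
```

By these single-run numbers, the warm-cache get from a hashed repository is roughly 74 times faster than the darcs-1.0.5 baseline, and switching to the hashed format alone accounts for most of that gain.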