[darcs-users] Darcs and the HTTP library

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

So, a while ago I mentioned that it'd be nice to scrap the libwww/curl/other-web-binding bindings, since they made configuration, building, and documentation more complex, led to not a few end-user problems, and left us with no control or influence over them.

The obvious alternative was Haskell's HTTP library, but aside from issues with proxies and SSH, one major objection was performance. As everyone knows, HTTP was [Char]-based and consequently horribly slow and memory-inefficient, and it would've badly hurt Darcs's performance, which really doesn't need any hurting.

So I'm happy to announce that raw performance, at least, no longer seems to be a problem! Just a few days ago HTTP-4000.0.0 was released to Hackage http://hackage.haskell.org/cgi-bin/hackage-scripts/package/HTTP and its summary reads:

"A library for client-side HTTP, version 2. Rewrite of existing HTTP package to allow overloaded representation of HTTP request bodies and responses. Provides three such instances: lazy and strict ByteString, along with the good old String. Inspired in part by Jonas Aadahl et al's work on ByteString'ifying HTTP a couple of years ago. Git repository available at http://code.galois.com/HTTPbis.git"

(It doesn't seem to've been announced, but dons mentioned it to me on Reddit.)

Naturally, one wonders how fast this rewrite is. Fortunately, the homepage http://www.haskell.org/http/ provides an example, 'get.hs'.
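For readers who have not looked at the homepage, a download program built on the ByteString instance might look roughly like the sketch below. This is not the homepage's get.hs, only an illustration of the overloaded request/response representation the summary describes. The names `mkGet` and `fetch` are made up for this sketch, and it assumes an HTTP-4000.x release that exports `Request`, `Response`, and `simpleHTTP` from `Network.HTTP`.

```haskell
import qualified Data.ByteString.Lazy as L
import Network.HTTP (Request (..), RequestMethod (GET), Response (..), simpleHTTP)
import Network.URI (URI, parseURI)
import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- | Build a bodiless GET request. Fixing the body type to a lazy
-- ByteString is what selects the ByteString instance for the response.
-- (Hypothetical helper, not part of the HTTP package.)
mkGet :: URI -> Request L.ByteString
mkGet uri =
  Request { rqURI = uri, rqMethod = GET, rqHeaders = [], rqBody = L.empty }

-- | Fetch a URL and return its body, or a message describing the failure.
-- simpleHTTP does not follow redirects, so any non-2xx status is an error here.
fetch :: String -> IO (Either String L.ByteString)
fetch url = case parseURI url of
  Nothing  -> return (Left ("not a valid absolute URI: " ++ url))
  Just uri -> do
    result <- simpleHTTP (mkGet uri)
    return $ case result of
      Left err   -> Left (show err)
      Right resp -> case rspCode resp of
        (2, _, _) -> Right (rspBody resp)
        code      -> Left ("unexpected status " ++ show code ++ ": " ++ rspReason resp)

-- | Write the body of the URL given on the command line to stdout.
main :: IO ()
main = do
  args <- getArgs
  case args of
    [url] -> fetch url >>= either die' L.putStr
    _     -> die' "usage: get URL"
  where
    die' msg = hPutStrLn stderr msg >> exitFailure
```

Compiled with something like `ghc --make get.hs`, it would be run as `./get URL > file`, writing the body to stdout rather than to a named file.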
I installed the new HTTP, compiled get.hs with it, and ran a bulk download with it:

    gwern at craft:33333~>time wget -q http://www.haskell.org/ghc/dist/current/dist/ghc-6.7.20070401-i386-unknown-linux.tar.bz2 && \
        time ./get http://www.haskell.org/ghc/dist/current/dist/ghc-6.7.20070401-i386-unknown-linux.tar.bz2 > ghc.bz2 && \
        diff ghc-6.7.20070401-i386-unknown-linux.tar.bz2 ghc.bz2 && \
        du -h ghc-6.7.20070401-i386-unknown-linux.tar.bz2 ghc.bz2 && \
        rm ghc-6.7.20070401-i386-unknown-linux.tar.bz2 ghc.bz2
    wget -q          0.06s user 0.43s system  2% cpu 23.032 total
    ./get > ghc.bz2  3.10s user 0.67s system 15% cpu 24.518 total
    22M     ghc-6.7.20070401-i386-unknown-linux.tar.bz2
    22M     ghc.bz2

Note that in this iteration 'get' is only about 1.5 seconds slower (24.5s against 23.0s total). The files are identical, as diff reports, and they aren't empty files either; they are the right size, as du reports.

On some runs get is faster, and on other runs wget is faster, though there seems to be a weak trend of get being a few seconds slower. I will note in its defense that it's not writing to a file like wget, but printing to stdout; this could be slowing it down.

I don't recommend trying to switch to HTTP right now because, as I said, I have no idea whether HTTP can handle Darcs's SSH and proxy needs. But this is worth noting for the future.

- --
gwern
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEAREKAAYFAklN090ACgkQvpDo5Pfl1oIX6QCeK14nKb9yB4Rx2f5xx84JcxIw
YP4An3mi90h/fVizvjDHm00FDBVqj3Dg
=Ram2
-----END PGP SIGNATURE-----