[Haskell-cafe] Parsing cabal files to calculate average number of dependencies

Athas on #haskell wondered how many dependencies the average Haskell package has. I commented that finding out looked like fairly simple scripting, and as these things tend to go, I wound up doing a complete solution myself.

First, we get most/all of Hackage locally to examine, as tarballs:

    for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal fetch $package; done

Then we

    cd .cabal/packages/hackage.haskell.org

Now we can run a command which extracts the .cabal file from each tarball to standard output:

    find . -name "*.tar.gz" -exec tar --wildcards "*.cabal" -Oxf {} \;

We could grep for 'build-depends' or something, but that gives unreliable, dirty results. (>80k items, resulting in a hard-to-believe 87k total deps and an average of 27 deps.)

So instead, we use the Cabal library and write a program to parse Cabal files & spit out the dependencies, and we feed each .cabal into that:

    find . -name "*.tar.gz" -exec sh -c 'tar --wildcards "*.cabal" -Oxf {} | runhaskell ~/deps.hs' \;

And what is deps.hs? It turns out to be surprisingly easy to parse a String, extract the Library and Executable AST, grab the [Dependency] field, and print it out (the code is not particularly clean):

    import Distribution.Package
    import Distribution.PackageDescription
    import Distribution.PackageDescription.Parse

    -- Read one .cabal file from stdin and print one dependency name per line.
    main :: IO ()
    main = do cbl <- getContents
              let desc = parsePackageDescription cbl
              case desc of
                ParseFailed _ -> return ()  -- silently skip unparseable files
                ParseOk _ d   -> putStr $ unlines $ map show $ map (\(Dependency x _) -> x) $ extractDeps d

    -- Dependencies of the library (if any) plus those of every executable.
    extractDeps :: GenericPackageDescription -> [Dependency]
    extractDeps d = ldeps ++ edeps
      where ldeps = case (condLibrary d) of
                      Nothing -> []
                      Just c  -> condTreeConstraints c
            edeps = concat $ map (condTreeConstraints . snd) $ condExecutables d

So what are the results? (The output of one run is attached.) I get 18,134 dependencies, having run on 3,137 files, or 5.8 dependencies per package.
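For anyone wanting to redo the arithmetic on the attached output, here is a minimal sketch and not part of the original script. It assumes the output of deps.hs was saved with one dependency per line and is piped in on standard input; the names `averageDeps` and `tally` are made up for illustration, and the file count of 3,137 is hard-coded from the run described above.

```haskell
import Data.List (group, sort, sortBy)
import Data.Ord (comparing, Down(..))

-- Average dependencies per package: total dependency lines / files parsed.
averageDeps :: Int -> Int -> Double
averageDeps totalDeps packages = fromIntegral totalDeps / fromIntegral packages

-- Count how often each line occurs, most frequent first.
-- sortBy is stable, so ties stay in alphabetical order.
tally :: [String] -> [(String, Int)]
tally xs = sortBy (comparing (Down . snd))
                  [ (x, length g) | g@(x:_) <- group (sort xs) ]

main :: IO ()
main = do
  deps <- fmap lines getContents
  -- The ten most commonly depended-upon packages in the input.
  mapM_ print (take 10 (tally deps))
  -- 3137 is the number of .cabal files in the run above (an assumption
  -- baked in here); change it for your own run.
  print (averageDeps (length deps) 3137)
```

Something like `zcat deps.txt.gz | runhaskell avg.hs` (avg.hs being a hypothetical file name) should then reproduce the average from the raw dependency list.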
--
gwern
http://www.gwern.net

-------------- next part --------------
A non-text attachment was scrubbed...
Name: deps.txt.gz
Type: application/x-gzip
Size: 36515 bytes
Desc: not available
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20110701/c195722d/attachment.bin>