Blog Archive

Thursday, February 12, 2009

Dependencies in Hackage, revisited

In a previous post I described how to scrape the Hackage website to do reverse lookups on package dependency data for packages hosted on Hackage.

With the release of the new HTTP library (version 4000) that code doesn't work anymore. This post presents a different solution to the problem.

Instead of pulling data out of html documents, we instead load and parse the local .tar file that cabal-install uses to do its own dependency chasing.

You'll need tar and utf8-string from Hackage.

First, the necessary imports:

> import Data.Maybe
> import Data.List

> import Codec.Archive.Tar

> import Data.ByteString.Lazy (ByteString)
> import qualified Data.ByteString.Lazy as BS

> import qualified Data.ByteString.Lazy.UTF8 as UTF8

> import System.IO
> import System.Environment

> import Distribution.Text
> import Distribution.Package
> import Distribution.PackageDescription
> import Distribution.PackageDescription.Parse
And now the 'main' method.

The first argument is the name of the package you're checking dependencies for, and the second argument is the path to the Hackage index tar-file (for me this is ~/.cabal/packages/hackage.haskell.org/00-index.tar).

> main :: IO ()
> main = do
> [target,tarball] <- getArgs
> withFile tarball ReadMode $ \h -> do
> contents <- BS.hGetContents h
> let matches = matchesFromIndex contents (== target)
> sequence_ $ map (print . disp) matches

And then we have the function which, given the contents of the tar-file as a ByteString, returns back the list of PackageIds which depend on the indicated package.

> matchesFromIndex :: ByteString -> (String -> Bool) -> [PackageId]
> matchesFromIndex index p =

> let tarchive = readTarArchive index
> cabalFiles = map UTF8.toString $ findCabalEntries tarchive
> parseResults = map parsePackageDescription cabalFiles
> gPckgDiscs = okayOnly parseResults
> matches = filter (match p) gPckgDiscs

> in map packageId matches

> okayOnly :: [ParseResult a] -> [a]
> okayOnly = mapMaybe fromOkay
> where fromOkay (ParseOk _ a) = Just a
> fromOkay _ = Nothing

> -- Does this package have a dependency which matches our
> -- query?
> match :: (String -> Bool) -> GenericPackageDescription -> Bool
> match p pkg = any (matchDep p) (gPckgDeps pkg)

> -- Does this dependency match our query?
> matchDep :: (String -> Bool) -> Dependency -> Bool
> matchDep p (Dependency (PackageName name) _) = p name
There's a bit of black-magic going on here - I don't entirely understand the structure of the new 'library' and 'executable' sections of the .cabal file, but I scrape everything out which has the right type.

> gPckgDeps :: GenericPackageDescription -> [Dependency]
> gPckgDeps pkg = normalDeps ++ libDeps ++ execDeps
> where
> normalDeps = buildDepends $ packageDescription pkg

> libDeps = case condLibrary pkg of
> Nothing -> []
> Just cndTree -> depsFromCndTree exLibDeps cndTree

> execDeps = concatMap (depsFromCndTree exExecDeps . snd)
> (condExecutables pkg)

> exLibDeps = pkgconfigDepends . libBuildInfo
> exExecDeps = pkgconfigDepends . buildInfo

> depsFromCndTree f tree =
> let x = condTreeData tree

> parts = condTreeComponents tree
> mdlTrees = map mdl parts
> thrdTrees = mapMaybe thrd parts

> trees = mdlTrees ++ thrdTrees


> in f x ++
> condTreeConstraints tree ++
> concatMap (depsFromCndTree f) trees

> where mdl (_,x,_) = x
> thrd (_,_,x) = x

And this is the bit which takes the decoded tar-file and returns back the entries which look like they could be .cabal files.

> findCabalEntries :: TarArchive -> [ByteString]
> findCabalEntries TarArchive{archiveEntries = xs} = mapMaybe go xs

> where go :: TarEntry -> Maybe ByteString
> go x | fileType x /= TarNormalFile = Nothing
> | isBoringName (fileName x) = Nothing
> | otherwise = Just $ entryData x

> fileType = tarFileType . entryHeader
> fileName = tarFileName . entryHeader

> isBoringName = not . isSuffixOf ".cabal"


Not too shabby.

No comments:

Listening:

Watching:

  • House
  • Ride Back