FS

Editing XML in Haskell

noreply@blogger.com (Antoine) — Tue, 28 Dec 2010 01:53:00 +0000

edit: Here's an hpaste for the solution I eventually came up with: http://hpaste.org/42628/xml_cursor_monad

I recently found myself wanting to make small tweaks to an XML file not under my control before processing it in my Haskell app.

My end goal would be a way to a) declare what in the XML I want to edit and then b) apply a function to the located element and have it update in place.

The hxt (Haskell XML Toolkit) has a lot of pieces, and includes an XPath parser, so I looked at it first. But like XSLT, hxt seems geared towards XML processing - not editing. It has tools for applying a translation recursively through a tree, but that requires finding an element based on a predicate on the element, not based on its location within the document. hxt does let you extract elements based on location, but I don't see how to put the original document back together again. Maybe I'm missing something.

Then I noticed that the xml package (sometimes referred to as xml-light) has a Cursor written for it! So all I need to do is navigate the cursor down to where I need it, apply the update function and then I'm done. That's declarative enough for me.

There were two problems with this:

A lot of the cursor manipulation functions return Maybe values
I didn't feel comfortable composing functions on cursors - if a sub-function goes off the deep end it could leave the cursor anywhere in the document

So I did what any other Haskell programmer would do - I wrote my own XML editing monad to fix and then encapsulate the problems above.

> data Update a = ...

The Update type has instances for Monad, Functor, and Applicative - and to handle the failing traversal functions it has instances for Alternative and MonadPlus.

It has the primitive operations:


> runUpdate :: Cursor -> Update a -> Maybe (a, Cursor)

> perform :: (Cursor -> Maybe Cursor) -> Update ()

> asks :: (Cursor -> a) -> Update a

These are used to wrap up all of the functions from Text.XML.Light.Cursor in a straightforward way.
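As a rough sketch of how such wrapping could look (this is not the code from the hpaste), here is a version built on a toy cursor. The `Cursor` type below is a hypothetical list zipper standing in for the one from Text.XML.Light.Cursor, and `moveLeft`, `moveRight`, `left`, `right` and `modifyFocus` are names made up for the illustration. Only the shapes of `runUpdate`, `perform` and `asks` come from the signatures above.

```haskell
import Control.Applicative (Alternative (..))

-- Hypothetical stand-in for the xml package's Cursor: a zipper over a list.
data Cursor = Cursor { before :: [Int], focus :: Int, after :: [Int] }
  deriving (Eq, Show)

-- Partial movements, shaped like the Maybe-returning cursor functions.
moveLeft, moveRight :: Cursor -> Maybe Cursor
moveLeft (Cursor (l:ls) x rs)  = Just (Cursor ls l (x:rs))
moveLeft _                     = Nothing
moveRight (Cursor ls x (r:rs)) = Just (Cursor (x:ls) r rs)
moveRight _                    = Nothing

-- One possible representation: a state transformer that may fail.
newtype Update a = Update { unUpdate :: Cursor -> Maybe (a, Cursor) }

instance Functor Update where
  fmap f (Update g) = Update $ \c -> fmap (\(a, c') -> (f a, c')) (g c)

instance Applicative Update where
  pure x = Update $ \c -> Just (x, c)
  Update f <*> Update g = Update $ \c -> do
    (h, c')  <- f c
    (a, c'') <- g c'
    Just (h a, c'')

instance Monad Update where
  Update g >>= k = Update $ \c -> do
    (a, c') <- g c
    unUpdate (k a) c'

-- A failed traversal falls through to the alternative, from the same cursor.
instance Alternative Update where
  empty = Update (const Nothing)
  Update f <|> Update g = Update $ \c -> f c <|> g c

runUpdate :: Cursor -> Update a -> Maybe (a, Cursor)
runUpdate c u = unUpdate u c

perform :: (Cursor -> Maybe Cursor) -> Update ()
perform f = Update $ \c -> fmap (\c' -> ((), c')) (f c)

asks :: (Cursor -> a) -> Update a
asks f = Update $ \c -> Just (f c, c)

-- Wrapped movements and an in-place edit of the focused element.
left, right :: Update ()
left  = perform moveLeft
right = perform moveRight

modifyFocus :: (Int -> Int) -> Update ()
modifyFocus f = perform (\c -> Just c { focus = f (focus c) })
```

With these definitions, `runUpdate (Cursor [] 1 [2,3]) (right >> modifyFocus (*10) >> asks focus)` moves one step right, edits the element there, and hands back both the edited value and the cursor.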

To give me more control over the composition of cursor update there are also the following primitives:


> sandbox :: Update a -> Update a

> run :: Update a -> Update ()

The function 'sandbox' executes the passed-in action, but confines it to the current scope of the cursor - the action may not access the parent or siblings of the current node. In addition, the cursor is returned to its current position regardless of where the passed-in action left it.

The function 'run' is the same as 'sandbox' - except that if the passed action fails we pretend that nothing happened.

So go nuts - you can now declare edits and traversals without worrying about how to fit them into the bigger picture. We have combinators for that.

Am I off the deep end with this? Are there other tools on Hackage I should be using?

Adventures in Parsec

noreply@blogger.com (Antoine) — Mon, 28 Dec 2009 19:56:00 +0000

Part I - An introduction to Parsec

The basic type of a monadic parser is a function from an input string to a parse result, along with the rest of the string to be parsed.

One example is ReadS a = String -> [(a,String)], where returning a null list indicates no parse, and returning multiple values allows a parser to indicate ambiguity.
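To make that concrete, here is a minimal sketch of parsers written directly against `ReadS`. The names `digit`, `both` and `andThen` are made up for this illustration; only `ReadS` itself comes from the Prelude.

```haskell
import Data.Char (isDigit)

-- A ReadS-style parser for one decimal digit.
-- (ReadS a is the Prelude synonym for String -> [(a, String)].)
digit :: ReadS Char
digit (c:cs) | isDigit c = [(c, cs)]
digit _                  = []          -- empty list: no parse

-- Hypothetical helper: run both parsers on the same input and keep every
-- result. Returning more than one result is how ambiguity is represented.
both :: ReadS a -> ReadS a -> ReadS a
both p q s = p s ++ q s

-- Hypothetical helper: run p, then run q on whatever p left over.
andThen :: ReadS a -> ReadS b -> ReadS (a, b)
andThen p q s = [((a, b), rest') | (a, rest) <- p s, (b, rest') <- q rest]
```

Note that `both` has to keep the whole input `s` alive until both alternatives have been explored.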

There are a few deficiencies in this data structure:

The representation of ambiguity can lead to large space leaks, as the traditional combinators allow for unlimited backtracking
It is difficult to wedge good error reporting into this setup.

The parsec parser, instead of returning a list of parses, returns one of four results of the parse:

I errored and did not consume input
I errored and did consume input
I succeeded and did not consume input
I succeeded and did consume input

The idea is that once a parser consumes any input (that is, looks ahead more than one character) we prohibit backtracking. This limitation on backtracking means that we can drop the beginning of the token stream sooner, allowing it to be cleaned up by the garbage collector. Making these distinctions also helps with error reporting.

Partridge & Wright seem to be the first to have introduced this splitting of the return value of a parse, but I can't find a non-paywalled version of their paper "Predictive parser combinators need four values to report errors." It seems like exactly the sort of paper I should be reading, but I'm also not sure on how to get in touch with the authors. Edit: Thanks to Chung-chieh Shan for sending me a copy of Partridge & Wright.

We still have a problem - we still need to hang on to the input string until we know that the parser succeeded or failed. In many cases we know that the parser consumes input long before we know that it was successful.

Parsec combines two data structures to return these four values:


data Consumed a = Consumed a | Empty a

data Reply a = Ok a State Message | Error Message

type Parser a = State -> Consumed (Reply a)

Now our choice combinator can determine if a parser consumes input before we finish the parse - this means that we allow the GC to drop the head of the input as soon as the parser consumes any of it, solving the mentioned memory leak.
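Here is a small sketch of such a choice combinator, in the style of the Parsec technical report. It simplifies the types above: the state is a plain String and error messages are dropped, both of which are assumptions of this sketch. The names `satisfy` and `orElse` are chosen for the illustration.

```haskell
-- Simplified versions of the types above: String input, no messages.
data Consumed a = Consumed a | Empty a
  deriving (Eq, Show)

data Reply a = Ok a String | Error
  deriving (Eq, Show)

type Parser a = String -> Consumed (Reply a)

-- Accept one character if it passes the test. On success we can say
-- "Consumed" straight away, before the reply inside is looked at.
satisfy :: (Char -> Bool) -> Parser Char
satisfy test (c:cs) | test c = Consumed (Ok c cs)
satisfy _    _               = Empty Error

-- Choice: only try q when p failed without consuming input.
orElse :: Parser a -> Parser a -> Parser a
orElse p q input =
  case p input of
    Empty Error -> q input
    Empty ok    -> case q input of
                     Empty _  -> Empty ok
                     consumed -> consumed
    consumed    -> consumed
```

The last case is the important one: as soon as `p` reports `Consumed`, `orElse` returns without keeping `input` around for `q`.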

Part II - My obsession with functionalizing monad transformers

I have a bit of an obsession with the dual nature of data and functions, and converting algebraic data structures to their equivalent function form to see what I can uncover.

For example, in the mtl the ErrorT monad transformer is defined as follows:


newtype ErrorT e m a = ErrorT {
  runErrorT :: m (Either e a)
}

throwError :: e -> ErrorT e m a
throwError = ErrorT . return . Left

When we give this monad semantics, in between each (>>=) we check the value on the LHS, and if it's a Left we short-circuit what's on the RHS so that the error value returns all the way to the end of the computation. This means at each stage of the computation we have to do a case analysis of the inner Either value. Since the inner type is 'm (Either e a)' we also need to perform a (>>=) operation in the inner monad to perform (>>=) in ErrorT.

An equivalent type is:


newtype ErrorT e m a = ErrorT {
  unError :: forall r . (e -> m r)  -- error!
                       -> (a -> m r)  -- success!
                       -> m r
}

The second function is our success continuation - it's passed in to an action, and called if the action successfully returns a value -


return :: a -> ErrorT e m a
return x = ErrorT $ \_ successK ->
   successK x

The first function is the error continuation, and is called if an action needs to terminate abnormally:


throwError :: e -> ErrorT e m a
throwError e = ErrorT $ \errorK _ ->
              errorK e

We then weave the continuations together in the implementation of (>>=) to get the short-circuiting we want:


(>>=) :: ErrorT e m a -> (a -> ErrorT e m b) -> ErrorT e m b
m >>= f = ErrorT $ \topErrorK topSuccessK->
   unError m topErrorK $ \x ->
   unError (f x) topErrorK topSuccessK

The LHS is given our top-level error handler to call if it errors out. If it succeeds, it calls a lambda which evaluates the RHS. The RHS is given the same error handler as the LHS, and the success continuation for the RHS is the success continuation for the expression as a whole. So successes move from left to right, but if there's an error it can only go to one place.
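Putting the pieces together, here is a self-contained sketch that should load in a recent GHC. The Functor and Applicative instances are additions that current versions of base require before a Monad instance is accepted, and `safeDiv` and `example` are hypothetical names used only to exercise the type.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

newtype ErrorT e m a = ErrorT
  { unError :: forall r. (e -> m r)  -- error continuation
                      -> (a -> m r)  -- success continuation
                      -> m r
  }

instance Functor (ErrorT e m) where
  fmap f m = ErrorT $ \errorK successK ->
    unError m errorK (successK . f)

instance Applicative (ErrorT e m) where
  pure x = ErrorT $ \_ successK -> successK x
  mf <*> mx = ErrorT $ \errorK successK ->
    unError mf errorK $ \f ->
    unError mx errorK (successK . f)

-- Note: no constraint on m anywhere in these instances.
instance Monad (ErrorT e m) where
  m >>= f = ErrorT $ \topErrorK topSuccessK ->
    unError m topErrorK $ \x ->
    unError (f x) topErrorK topSuccessK

throwError :: e -> ErrorT e m a
throwError e = ErrorT $ \errorK _ -> errorK e

-- Only when leaving the transformer do we need m to be a monad.
runErrorT :: Monad m => ErrorT e m a -> m (Either e a)
runErrorT m = unError m (return . Left) (return . Right)

-- Hypothetical example action: division that fails on a zero divisor.
safeDiv :: Int -> Int -> ErrorT String m Int
safeDiv _ 0 = throwError "divide by zero"
safeDiv x y = pure (x `div` y)

example :: Int -> Either String Int
example d = runIdentity . runErrorT $ do
  q <- safeDiv 100 d
  return (q + 1)
```

Here `example 5` runs to completion, while `example 0` jumps straight to the error continuation and never evaluates `q + 1`.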

Interesting points to note:

There's no case analysis on data structures - short circuiting works because every action gets passed the same error continuation (unless we implement a 'catch' combinator).
There are no constraints on the nature of the 'm' type variable. ErrorT is a monad independent of whatever it's wrapping.

I haven't benchmarked whether or not this is faster for anything, but I find the whole thing a lot of fun.

Part III - A faster Parsec 3

Parsec version 3 was released on hackage a bit back, and it improved on the prior version in two ways:

1) It was a monad transformer. This is pretty fun - as an exercise I wrote a unification engine in a monad, and then wrapped parsec around it. When it hit a syntax error in equality terms it could print the in-progress results of performing unification up until that point. Very fun.

2) It was parameterized over the input type. The previous version required a list of tokens as input.

The downside is that (when using the non-transformer compatibility layer) parsec-3 is 1.8x slower than parsec-2 in some benchmarks.

But how can it be made faster without losing the new abstractions? The first thing I tried was to have parsec be split into two parsers - one transformer and one not, with two implementations of the core combinators. The core combinators would be moved into a type-class, and the higher-level combinators would then be polymorphic over either of them. Folks writing parsers could write them polymorphic, and then the folks running the parsers only pay for the new abstractions if they need them.

It worked, but the type-class itself didn't make any sense - it was more of a "put what I need over in this bucket" job than a careful consideration of what the core primitives of a monadic parser like parsec truly are. I also seem to remember that it introduced a problem in the compatibility layer, but it's been a while since I did that testing. You can see the haddocks here:

Text.Parsec.Class
Text.Parsec.Core
Text.Parsec.Combinator

This is also a radical restructuring of the parsec API - new constraints and new modules, and changing the meaning of another. Lots of fun.

Another approach which is a much smaller change to the visible part of parsec but is a much more radical change to the inner workings is to do to ParsecT what we did to ErrorT - return via continuations rather than an algebraic data type. Where ErrorT needed two continuations, ParsecT requires four:


newtype ParsecT s u m a
  = ParsecT {unParser :: forall b .
               State s u
            -> (a -> State s u -> ParseError -> m b) -- consumed ok
            -> (ParseError -> m b)                   -- consumed error
            -> (a -> State s u -> ParseError -> m b) -- empty ok
            -> (ParseError -> m b)                   -- empty error
            -> m b
           }

When the parser errors we call one of the error continuations with the information about the error. When the parse succeeds we call one of the success continuations, passing along the parsed value and the new state of the parse.
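To show how the four continuations fit together, here is a much-reduced sketch. It is not the code from the parsec repository: it drops the underlying monad, the user state and the error messages, and parses a plain String. The names `Parser`, `satisfy`, `andThen`, `orElse` and `runParser` are all chosen for this illustration.

```haskell
{-# LANGUAGE RankNTypes #-}

newtype Parser a = Parser
  { unParser :: forall b.
                String
             -> (a -> String -> b)  -- consumed ok
             -> b                   -- consumed error
             -> (a -> String -> b)  -- empty ok
             -> b                   -- empty error
             -> b
  }

-- Accept one character: either "consumed ok" or "empty error".
satisfy :: (Char -> Bool) -> Parser Char
satisfy test = Parser $ \input cok _cerr _eok eerr ->
  case input of
    (c:cs) | test c -> cok c cs
    _               -> eerr

-- Sequencing. If p consumed input, everything the second parser does
-- counts as consumed too, so it gets the consumed continuations twice.
andThen :: Parser a -> (a -> Parser b) -> Parser b
andThen p f = Parser $ \input cok cerr eok eerr ->
  unParser p input
    (\x rest -> unParser (f x) rest cok cerr cok cerr)
    cerr
    (\x rest -> unParser (f x) rest cok cerr eok eerr)
    eerr

-- Choice. q is only reachable through p's "empty error" continuation.
orElse :: Parser a -> Parser a -> Parser a
orElse p q = Parser $ \input cok cerr eok eerr ->
  unParser p input cok cerr eok (unParser q input cok cerr eok eerr)

runParser :: Parser a -> String -> Maybe (a, String)
runParser p input = unParser p input ok Nothing ok Nothing
  where ok x rest = Just (x, rest)
```

For example, `(satisfy (=='a') `andThen` \_ -> satisfy (=='b')) `orElse` satisfy (=='a')` fails on the input "ac": the left branch has already consumed the 'a' when it fails, so the right branch is never tried.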

And this is practically as fast as parsec-2 for folks not using the new abstractions in parsec-3. I believe it's because we no longer pay for the abstraction in the core combinators - none of the primitives or combinators place any constraints on the type of 'm', just as for the continuation-based ErrorT transformer.

But what of Partridge & Wright's space leak? Where is the laziness introduced in the Parsec technical report? I've gone from the nested structure back to a flat structure. How can this be as fast as the Parsec of the report if I've re-introduced the space leak?

It was the case analysis on the not-lazy-enough return value of the parser which introduced the space leak, but we don't have that. We just pass continuations in to the parsers, which may call them as they will. As long as the core primitives aren't capable of calling the "I haven't consumed input" continuations after they have consumed input then we're free to garbage collect those continuations as well as the consumed bits of input. Space leak averted.

The darcs repo for the continuation based parsec is here: http://community.haskell.org/~aslatter/code/parsec/cps/

Appendix A: Further adventures in ErrorT

In case you needed convincing that the above formulation of ErrorT is equivalent to that in the mtl:


catch :: ErrorT e m a -> (e -> ErrorT e m a) ->ErrorT e m a
catch m handler = ErrorT $ \topErrorK topSuccessK ->
 let errorK e = unError (handler e) topErrorK topSuccessK
 in unError m errorK topSuccessK

runErrorT :: Monad m => ErrorT e m a -> m (Either e a)
runErrorT (ErrorT f) = f errorK successK
 where successK = return . Right
       errorK = return . Left

Using Haskeline

noreply@blogger.com (Antoine) — Sat, 18 Apr 2009 19:45:00 +0000

Earlier today I decided to unearth an old project of mine - figuring that the best way to learn two languages was to implement one in the other, I wrote a MUMPS interpreter in Haskell. I was learning MUMPS for work, and Haskell for fun.

Back when I wrote it, I used readline in the REPL part of the interpreter - during the cleanup I wanted to move away from readline as GHC doesn't ship with it any more, and sometimes it can be a pain to install on its own. So I switched to Haskeline. It doesn't ship with GHC either, but it's proven easier for me to install.

Haskeline has got a really friendly API, with all of the functions operating inside the InputT m monad transformer. "Great," I think, "I can just pile this on top of my existing monad transformers stack in the interpreter!"

All was not so simple, as InputT has its own instances of MonadState and MonadReader which allow the user of the library to peer into the guts of the implementation. But I didn't want to monkey with my text entry, I just wanted it to work and get out of my way, and I wanted the rest of my code to use the MonadState instance further down the stack that I had already set up.

So I wrote a small wrapper for Haskeline that's more friendly to mtl-style monad transformer composition. As written, it only composes with MonadIO and MonadState, but it would be straightforward to do more.

My wrapper uses HaskelineT instead of InputT, and exposes the same core functions as Haskeline (except for withInterrupt). It doesn't do anything I couldn't do by peppering lifts all over the place, but this way feels a bit cleaner to me - Haskeline keeps its workings to itself, and I don't have to think about the order of the layered monad transformers.


{-# LANGUAGE FlexibleInstances
           , MultiParamTypeClasses
           , UndecidableInstances
           , GeneralizedNewtypeDeriving
  #-}

import qualified System.Console.Haskeline as H
import System.Console.Haskeline.Completion
import System.Console.Haskeline.MonadException

import Control.Applicative
import Control.Monad.State

newtype HaskelineT m a = HaskelineT {unHaskeline :: H.InputT m a}
 deriving (Monad, Functor, Applicative, MonadIO, MonadException, MonadTrans, MonadHaskeline)

runHaskelineT :: MonadException m => H.Settings m -> HaskelineT m a -> m a
runHaskelineT s m = H.runInputT s (unHaskeline m)

runHaskelineTWithPrefs :: MonadException m => H.Prefs -> H.Settings m -> HaskelineT m a -> m a
runHaskelineTWithPrefs p s m = H.runInputTWithPrefs p s (unHaskeline m)

class MonadException m => MonadHaskeline m where
    getInputLine :: String -> m (Maybe String)
    getInputChar :: String -> m (Maybe Char)
    outputStr :: String -> m ()
    outputStrLn :: String -> m ()


instance MonadException m => MonadHaskeline (H.InputT m) where
    getInputLine = H.getInputLine
    getInputChar = H.getInputChar
    outputStr = H.outputStr
    outputStrLn = H.outputStrLn


instance MonadState s m => MonadState s (HaskelineT m) where
    get = lift get
    put = lift . put

instance MonadHaskeline m => MonadHaskeline (StateT s m) where
    getInputLine = lift . getInputLine
    getInputChar = lift . getInputChar
    outputStr = lift . outputStr
    outputStrLn = lift . outputStrLn

MaybeT - The CPS Version

noreply@blogger.com (Antoine) — Sun, 15 Feb 2009 19:24:00 +0000

> {-# LANGUAGE Rank2Types #-}
> import Control.Monad

I think I finally understand writing code into continuation passing style. I've understood it at an academic level for some time - but that's different from being able to write the code.

This post presents a different implementation of the Maybe monad transformer - usually presented as so:


data MaybeT m a = MaybeT {runMaybeT :: m (Maybe a)}

which can be used to add the notion of short-circuiting failure to any other monad (sort of a simpler version of ErrorT from the MTL).
I first came across MaybeT in a page on the Haskell Wiki.
This presentation of MaybeT uses the Church encoding of the data-type:

> newtype MaybeT m a = MaybeT {unMaybeT :: forall b . m b -> (a -> m b) -> m b}

Note the similarity to the Prelude function maybe. We can unwrap the transformer like so:


> runMaybeT :: Monad m => MaybeT m a -> m (Maybe a)
> runMaybeT m = unMaybeT m (return Nothing) (return . Just)

This runMaybeT should be a drop-in replacement for the old one.
The advantage here is that we can write the Monad and MonadPlus instances without calling bind or return in the underlying monad m, and without doing any case analysis on Just or Nothing values:

> instance Monad (MaybeT m) where
>     return x = MaybeT $ \_ suc -> suc x
> 
>     m >>= k = MaybeT $ \fail suc ->
>               unMaybeT m fail $ \x ->
>               unMaybeT (k x) fail suc
>
>     fail _ = mzero

> instance MonadPlus (MaybeT m) where
>     mzero = MaybeT $ \fail _ -> fail
>
>     m `mplus` n = MaybeT $ \fail suc ->
>                   unMaybeT m (unMaybeT n fail suc) suc

It's just a matter of threading the failure and success continuations to the right place at the right time.
To show that this is equivalent to the old implementation, here's a re-write of the old MaybeT data constructor from above:


> fromMaybe :: Monad m => m (Maybe a) -> MaybeT m a
> fromMaybe m = MaybeT $ \fail suc -> do
>                res <- m
>                case res of
>                  Nothing -> fail
>                  Just x -> suc x

So anything you can do with the other version, you can do with this version. And for most things it should be a drop-in replacement.
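The instances above are written against the 2009 class hierarchy. In current versions of base, Monad needs Functor and Applicative instances and `fail` is no longer a Monad method, so here is a sketch of the same idea adapted to that hierarchy, with Alternative playing the role of MonadPlus. It is kept outside the literate code, and `liftMaybe` and `sumOf` are hypothetical names added only for the usage example.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Applicative (Alternative (..))
import Data.Functor.Identity (Identity (..))

newtype MaybeT m a = MaybeT
  { unMaybeT :: forall b. m b -> (a -> m b) -> m b }

instance Functor (MaybeT m) where
  fmap f m = MaybeT $ \failK sucK -> unMaybeT m failK (sucK . f)

instance Applicative (MaybeT m) where
  pure x = MaybeT $ \_ sucK -> sucK x
  mf <*> mx = MaybeT $ \failK sucK ->
    unMaybeT mf failK $ \f ->
    unMaybeT mx failK (sucK . f)

instance Monad (MaybeT m) where
  m >>= k = MaybeT $ \failK sucK ->
    unMaybeT m failK $ \x ->
    unMaybeT (k x) failK sucK

-- Same threading as mzero/mplus: on failure of m, fall through to n.
instance Alternative (MaybeT m) where
  empty   = MaybeT $ \failK _ -> failK
  m <|> n = MaybeT $ \failK sucK ->
    unMaybeT m (unMaybeT n failK sucK) sucK

runMaybeT :: Monad m => MaybeT m a -> m (Maybe a)
runMaybeT m = unMaybeT m (return Nothing) (return . Just)

-- Hypothetical helper: inject a plain Maybe.
liftMaybe :: Maybe a -> MaybeT m a
liftMaybe = maybe empty pure

-- Look up two keys; the second one defaults to 0 when missing.
sumOf :: [(String, Int)] -> String -> String -> Maybe Int
sumOf env a b = runIdentity . runMaybeT $ do
  x <- liftMaybe (lookup a env)
  y <- liftMaybe (lookup b env) <|> pure 0
  return (x + y)
```

With `env = [("a",1),("b",2)]`, a missing first key short-circuits the whole computation, while a missing second key is recovered by the `<|>`.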

Dependencies in Hackage, revisited

noreply@blogger.com (Antoine) — Fri, 13 Feb 2009 02:00:00 +0000

In a previous post I described how to scrape the Hackage website to do reverse lookups on package dependency data for packages hosted on Hackage.

With the release of the new HTTP library (version 4000) that code doesn't work anymore. This post presents a different solution to the problem.

Instead of pulling data out of html documents, we instead load and parse the local .tar file that cabal-install uses to do its own dependency chasing.

You'll need tar and utf8-string from Hackage.

First, the necessary imports:


> import Data.Maybe
> import Data.List

> import Codec.Archive.Tar

> import Data.ByteString.Lazy (ByteString)
> import qualified Data.ByteString.Lazy as BS

> import qualified Data.ByteString.Lazy.UTF8 as UTF8

> import System.IO
> import System.Environment

> import Distribution.Text
> import Distribution.Package
> import Distribution.PackageDescription
> import Distribution.PackageDescription.Parse

And now the 'main' method.

The first argument is the name of the package you're checking dependencies for, and the second argument is the path to the Hackage index tar-file (for me this is ~/.cabal/packages/hackage.haskell.org/00-index.tar).


> main :: IO ()
> main = do
>  [target,tarball] <- getArgs
>  withFile tarball ReadMode $ \h -> do
>        contents <- BS.hGetContents h
>        let matches = matchesFromIndex contents (== target)
>        sequence_ $ map (print . disp) matches

And then we have the function which, given the contents of the tar-file as a ByteString, returns back the list of PackageIds which depend on the indicated package.


> matchesFromIndex :: ByteString -> (String -> Bool) -> [PackageId]
> matchesFromIndex index p =

>  let tarchive = readTarArchive index
>      cabalFiles = map UTF8.toString $ findCabalEntries tarchive
>      parseResults = map parsePackageDescription cabalFiles
>      gPckgDiscs = okayOnly parseResults
>      matches = filter (match p) gPckgDiscs

>  in map packageId matches

> okayOnly :: [ParseResult a] -> [a]
> okayOnly = mapMaybe fromOkay
>  where fromOkay (ParseOk _ a) = Just a
>        fromOkay _ = Nothing


> -- Does this package have a dependency which matches our
> -- query?
> match ::  (String -> Bool) -> GenericPackageDescription -> Bool
> match p pkg = any (matchDep p) (gPckgDeps pkg)

> -- Does this dependency match our query?
> matchDep :: (String -> Bool) -> Dependency -> Bool
> matchDep p (Dependency (PackageName name) _) = p name

There's a bit of black-magic going on here - I don't entirely understand the structure of the new 'library' and 'executable' sections of the .cabal file, but I scrape everything out which has the right type.


> gPckgDeps :: GenericPackageDescription -> [Dependency]
> gPckgDeps pkg = normalDeps ++ libDeps ++ execDeps
>  where
>   normalDeps = buildDepends $ packageDescription pkg

>   libDeps = case condLibrary pkg of
>               Nothing -> []
>               Just cndTree -> depsFromCndTree exLibDeps cndTree

>   execDeps = concatMap (depsFromCndTree exExecDeps . snd)
>                        (condExecutables pkg)

>   exLibDeps = pkgconfigDepends . libBuildInfo
>   exExecDeps = pkgconfigDepends . buildInfo

> depsFromCndTree f tree =
>   let x = condTreeData tree
    
>       parts = condTreeComponents tree
>       mdlTrees = map mdl parts
>       thrdTrees = mapMaybe thrd parts

>       trees = mdlTrees ++ thrdTrees


>   in f x ++
>      condTreeConstraints tree ++
>      concatMap (depsFromCndTree f) trees

>  where mdl (_,x,_) = x
>        thrd (_,_,x) = x

And this is the bit which takes the decoded tar-file and returns back the entries which look like they could be .cabal files.


> findCabalEntries :: TarArchive -> [ByteString]
> findCabalEntries TarArchive{archiveEntries = xs} = mapMaybe go xs

>  where go :: TarEntry -> Maybe ByteString
>        go x | fileType x /= TarNormalFile = Nothing
>             | isBoringName (fileName x) = Nothing
>             | otherwise = Just $ entryData x

>        fileType = tarFileType . entryHeader
>        fileName = tarFileName . entryHeader

>        isBoringName = not . isSuffixOf ".cabal"

Not too shabby.

Haskell Snippets

noreply@blogger.com (Antoine) — Sat, 21 Jun 2008 06:13:00 +0000

I'm a huge fan of the function mapMaybe, but once I move from the 'Maybe' monad into something more complex (such as ReaderT r Maybe) things become tricky.

First, what is mapMaybe?

Its type is: (a -> Maybe b) -> [a] -> [b]

It maps the input function over the list, and drops any values which evaluate to Nothing. It's like a combination of map and filter, where the input function is given the option to either transform the input or filter it out.

But then I needed more information threaded around in my functions, and the types went from a -> Maybe b to a -> ReaderT r Maybe b.

So I needed:

> mapAlt :: Alternative f => (a -> f b) -> [a] -> f [b]

It's just like mapMaybe, except it works for any Alternative functor.

The output is still in the functor f so I can have it work for effectful monads and such, but it will always return a value (even if it's the empty list).

Here's the implementation:

> mapAlt f xs = go xs
>  where go [] = pure []
>        go (y:ys) = (pure (:) <*> f y <*> go ys)
>                <|> go ys

Hurrah for simple, useful functions.
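As a quick check of the behaviour described above, here is mapAlt again in a self-contained form, used with plain Maybe as the Alternative functor. The function `halveEven` is a made-up example.

```haskell
import Control.Applicative (Alternative (..))

mapAlt :: Alternative f => (a -> f b) -> [a] -> f [b]
mapAlt f = go
  where
    go []     = pure []
    -- Keep y's result if f y succeeds, otherwise skip y.
    go (y:ys) = ((:) <$> f y <*> go ys) <|> go ys

-- Hypothetical example function: succeeds only on even numbers.
halveEven :: Int -> Maybe Int
halveEven n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing
```

Compare `mapAlt halveEven [1,2,3,4]` with `mapM halveEven [1,2,3,4]`: the first drops the odd elements and still returns a list inside Just, the second fails as a whole on the first odd element.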

HTML Scraping with TagSoup

noreply@blogger.com (Antoine) — Sun, 10 Feb 2008 23:04:00 +0000

Earlier today I wanted to know the packages on Hackage which stated a dependency on Parsec, so I wrote a command-line utility to do it. This post presents the utility.

The plan is simple: grab http://hackage.haskell.org/packages/archive/pkg-list.html, extract all of the links which look like links to packages, and then for each of the package-description pages find out if the dependency list includes parsec. If it does, print the package name.

First, a few preliminaries:


> import Data.Maybe
> import Network.HTTP
> import Network.URI
> import System.Environment
> import Text.HTML.TagSoup
> import Text.Regex.Base
> import Text.Regex.Posix.String
> import Text.Regex.Posix.Wrap

You could probably use a different Regex package if you wanted to without too much trouble.

First up, a few strings broken out of the body of the program for convenience should they need changing.


> name = "hackage-dep"
> version = "0.1.0"


> baseURIString = "http://hackage.haskell.org"
> packagesURI =
>    fromJust $ parseURI $ baseURIString ++ "/packages/archive/pkg-list.html"
> basePath = "/cgi-bin/hackage-scripts/package/"

The function parseURI comes from the Network.URI module. It converts a String to the URI datatype used by the Network.* modules.

Next, I need a few functions to fetch an HTML document given a URI:


> mkSimpleGet :: URI -> Request
> mkSimpleGet uri =
>   Request uri GET [Header HdrUserAgent (name ++ " v" ++ version)] []

> simpleGet :: URI -> IO (Result Response)
> simpleGet = simpleHTTP . mkSimpleGet

> body :: Result Response -> Either String String
> body (Right (Response (2,_,_) _ _ str)) = Right str
> body (Right (Response code _ _ _)) = Left $ printCode code
> body (Left e) = Left $ show e

> printCode :: ResponseCode -> String
> printCode (a,b,c) = show a ++ show b ++ show c

> errorString :: String -> String -> String
> errorString uri err =
>   "Error getting " ++ uri ++ "\n" ++ "Error: " ++ err

The two interesting functions here are simpleGet and body: simpleGet performs an HTTP GET request with the passed-in URI, and body extracts the body from the response if it was successful.
Now we can start on the HTML manipulation.


> type HTML = String

> links :: HTML -> [Tag]
> links = filter (~== TagOpen "a" []) . parseTags

links converts an HTML document into a list of link tags, using TagSoup.

And then the function packageInfo extracts the package name from a link to that package.


> type Package = String

> packageInfo :: Tag -> Maybe Package
> packageInfo (TagOpen "a" [])      = Nothing
> packageInfo t@(TagOpen "a" attrs) =
>     case fromAttrib "href" t of
>       [] -> Nothing
>       path -> info path
> packageInfo _ = Nothing

> packageName = "^" ++ basePath ++ "(.+)$"

> info :: String -> Maybe Package
> info str =
>     case str =~ packageName of
>       (_,_,_,[]) -> Nothing
>       (_,_,_,[package]) -> Just package
>       (_::(String,String,String,[String])) -> Nothing

And once I have a list of package names, I'll want to grab the web-page describing the package:


> packageURI :: Package -> URI
> packageURI =
>   fromJust . parseURI . ((baseURIString ++ basePath) ++)

> packageGet :: Package -> IO (Result Response)
> packageGet = simpleGet . packageURI

The idea is that I can call packageGet on an extracted Package, and then I can use the previously defined body function to get the HTML out of the HTTP response.

Now, let's get on with the main function:


> main :: IO ()
> main = do
>   arg <- (do {[arg] <- getArgs; return arg})
>      `catch`
>      (\_ -> error "Requires a single command line argument")
>   res <- simpleGet packagesURI
>   case body res of
>     Left str -> putStrLn $ errorString (show packagesURI) str
>     Right html -> findDeps (=~ arg) $ filterJust $ map packageInfo $ links html

The filterJust $ map packageInfo $ links html bit extracts a list of package names from the HTML list pulled off of hackage. The function findDeps takes this list along with a passed in testing function and prints out which packages depend on the package specified at the command line. The passed-in testing function is just a regex-match based on the single command-line argument.


> filterJust :: [Maybe a] -> [a]
> filterJust xs = [x | Just x <- xs]

> findDeps :: (String -> Bool) -> [Package] -> IO ()
> findDeps p ps = mapM_ (printIfDep p) ps

> printIfDep :: (String -> Bool) -> Package -> IO ()
> printIfDep p pTest = do
>   res <- packageGet pTest
>   case body res of
>     Left e     -> putStrLn $ errorString pTest e
>     Right html ->
>         if hasDep html p
>         then putStrLn pTest
>         else return ()

The function hasDep picks the "Dependencies" field out of the passed-in HTML text, and then returns true if the passed-in test returns true on any bit of string in the dependencies field.


> hasDep :: HTML -> (String -> Bool) -> Bool
> hasDep html p =
>     let tags = parseTags html
>         depTags = takeWhile (~/= (TagClose "tr")) $
>                   drop 1 $
>                   dropWhile (~/= (TagText "Dependencies")) $
>                   tags
>         depText = filterText depTags
>
>         filterText xs = [x | TagText x <- xs] :: [String]
>     in any p depText

After saving and compiling, executing ./Main parsec will (slowly) list all of the packages on Hackage which depend on Parsec. Success!

Exercise for the reader: Implement the above functionality by grabbing the 00-index.tar.gz off of Hackage instead of scraping HTML pages. This file contains all of the .cabal files for every version of every package hosted on Hackage. For bonus points cache the index on disk between calls.

Parsec as a monad transformer

noreply@blogger.com (Antoine) — Tue, 05 Feb 2008 06:18:00 +0000

The proposed Parsec3 package for Haskell has Parsec implemented as a monad transformer, which means I can do things like:


> data MyType
>     = Foo
>     | Baz
>     | Err
>  deriving Show


> parseMyType = (string "Foo" >> return Foo)
>           <|> (string "Baz" >> return Baz)


> parseNoBaz = callCC $ \k -> do
>                result <- parseMyType
>                validateResult k result
>                return result


> validateResult k Baz = k Err
> validateResult k _ = return ()


> manyNoBaz = parseNoBaz `sepBy` space


> test p s = flip runCont id (runPT p () "test" s)

Then, if I execute test manyNoBaz "Foo Foo Baz Foo" I get the result:


[Foo,Foo,Err,Foo]

This is a contrived example, but I think it's pretty neat.

Constraint synonyms in Haskell

noreply@blogger.com (Antoine) — Tue, 01 Jan 2008 19:29:00 +0000

Hello folks, and happy new year!

Earlier today I found myself writing the same sequence of long constraints on my type-signatures over and over again in a Haskell program I was working on. The program is still in flux, so that means the constraints may still change. As all of the functions call each other, they need to have a similar set of constraints on their type signatures.

This means as the program evolves, I'll need to make a lot of similar changes all over the source file. I'm pretty lazy, so that doesn't sound like fun to me. At first I thought I could do something like this with regular type synonyms, but that requires all functions to share their entire type, not just a set of constraints.

There are a few ways I could've solved this problem:

Don't use type signatures
I'm not using any fancy type-level hacks, so the compiler doesn't really need them. But I like having them to prove to myself that I really do know what my code does, and to provide better error messages.
CPP Macros
I haven't tried this one - I just thought of it while writing this
Type Classes
Which is what this post is about

Let's say I have a number of functions whose type signatures are along the lines of:

> myFunc :: (Eq b, Show b, MyClass b, MyOtherClass b) => Int -> String -> b -> b

and I don't like typing the (Eq b, Show b, MyClass b, MyOtherClass b) part over and over again. I can define a typeclass which captures all of those constraints:

> class (Eq b, Show b, MyClass b, MyOtherClass b) => MyConstraints b

along with a rule to populate the class (GHC needs the FlexibleInstances and UndecidableInstances extensions to accept it):

> instance (Eq b, Show b, MyClass b, MyOtherClass b) => MyConstraints b

I can now re-write the type-signature for myFunc as follows:

> myFunc :: MyConstraints b => Int -> String -> b -> b

This works for the following reasons:

Membership in the class "MyConstraints" implies membership in all of the other classes, due to the constraint on the class definition.
Every type which satisfies the constraints is a member of the "MyConstraints" class.

As another check, if you load the module defining myFunc into GHCi and ask for its type at the interactive prompt, it will report it as

myFunc :: (Eq b, Show b, MyClass b, MyOtherClass b) => Int -> String -> b -> b

Which is exactly what I wanted.
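
Here is a minimal, self-contained sketch of the trick. The classes MyClass and MyOtherClass, their methods describe and bump, and the body of myFunc are hypothetical stand-ins invented for illustration. The catch-all instance has a bare type variable as its head and a context no smaller than that head, so GHC needs the FlexibleInstances and UndecidableInstances extensions to accept it.

```haskell
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}

-- Hypothetical stand-ins for MyClass and MyOtherClass.
class MyClass a where
  describe :: a -> String

class MyOtherClass a where
  bump :: a -> a

instance MyClass Int where
  describe n = "Int " ++ show n

instance MyOtherClass Int where
  bump = (+ 1)

-- The bundle: the superclass context gives users of MyConstraints
-- access to the methods of all four classes ...
class (Eq b, Show b, MyClass b, MyOtherClass b) => MyConstraints b

-- ... and this catch-all instance makes every suitable type a member.
instance (Eq b, Show b, MyClass b, MyOtherClass b) => MyConstraints b

-- Only the bundle appears in the signature, yet (==), show, describe
-- and bump are all usable in the body.
myFunc :: MyConstraints b => Int -> String -> b -> b
myFunc n label x
  | n <= 0      = x
  | bump x == x = error (label ++ ": stuck at " ++ describe x ++ " = " ++ show x)
  | otherwise   = myFunc (n - 1) label (bump x)
```

With these definitions, myFunc 3 "demo" (0 :: Int) evaluates to 3. On GHC 7.4 and later, the ConstraintKinds extension offers another route: a plain synonym of the form type MyConstraints b = (Eq b, Show b, MyClass b, MyOtherClass b).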

Backwards State, or: The Power of Laziness

noreply@blogger.com (Antoine) — Sat, 01 Dec 2007 21:43:00 +0000

There's been a discussion of Automatic Differentiation in Haskell recently, which somehow led me to Jerzy Karczmarczuk's paper "Lazy Time Reversal, and Automatic Differentiation," which in turn cites Philip Wadler's "The essence of functional programming" for the introduction of the backwards state monad. I reproduce that monad here because I think it's neat.

I'm going to assume that you're familiar with the Haskell state monad - in summary an action in the state monad is a function of the previous state, and produces a result paired with the next state.

The backwards state monad differs from this in that the flow of the state through the execution is the reverse of the flow of the results - that is, an action in the backwards state monad takes in the final value of the state and produces a result and the initial value.

This post is literate Haskell - you should be able to copy and paste it into a .lhs file and play with it in a Haskell interpreter. I use GHCi.

To that end, here's some of the up-front boilerplate so this all works:


> {-# LANGUAGE FlexibleInstances,
>              MultiParamTypeClasses,
>              RecursiveDo
>   #-}
> import Data.List
> import Control.Monad.Fix
> import Control.Monad.State

An Example

Here's the exercise: Given a tree of items, transform the tree to a tree of Ints such that each element is mapped to an Int, starting at 0. If an element occurs more than once in the tree, it must be mapped to the same Int each time.

The solution given in Control.Monad.State.Lazy does a walk of the tree, and carries around a list of all of the elements seen so far using the state monad. Each node is mapped to its position in this list. That is, the first node seen is mapped to 0, the second to 1, etc.

But what if I wanted to switch that up? What if I wanted the last node seen in the walk mapped to 0, the second to last mapped to 1, and so on? How much would I need to change in the already existing solution given in Control.Monad.State.Lazy?

Not much! I'd just need to use the backwards state monad, where the state flows backwards through the thread of execution.

This is what the modified solution would look like:


> data Tree a = Nil | Node a (Tree a) (Tree a) deriving (Show, Eq)
> type Table a = [a]


> numberTree :: Eq a => Tree a -> StateB (Table a) (Tree Int)
> numberTree Nil = return Nil
> numberTree (Node x t1 t2)
>        =  do num <- atomically $ numberNode x
>              nt1 <- numberTree t1
>              nt2 <- numberTree t2
>              return (Node num nt1 nt2)
>    where
>     numberNode :: Eq a => a -> State (Table a) Int
>     numberNode x
>        = do table <- get
>             (newTable, newPos) <- return (nNode x table)
>             put newTable
>             return newPos

>     nNode::  (Eq a) => a -> Table a -> (Table a, Int)
>     nNode x table
>        = case elemIndex x table of
>          Nothing -> (table ++ [x], length table)
>          Just i  -> (table, i)

And an evaluation function:


> numTree :: (Eq a) => Tree a -> Tree Int
> numTree t = evalStateB (numberTree t) []

Some test data:


> testTree = Node "Zero" (Node "One" (Node "Two" Nil Nil) (Node "One" (Node "Three" Nil Nil) Nil)) Nil

Executing numTree testTree will produce the output:
Node 3 (Node 1 (Node 2 Nil Nil) (Node 1 (Node 0 Nil Nil) Nil)) Nil
Which is exactly what we wanted!

This code is almost exactly the same as the solution given to the in-order problem in the source to Control.Monad.State.Lazy. The only changes are the use of the function evalStateB instead of the familiar evalState, the use of the function atomically, and the StateB monad. The implementation of these will be explained below.

First the API, then the implementation.

The API

We have the new monad StateB s, where s is the type of the stored state.

StateB s is an instance of MonadState s, so get and put are as expected.

There is also:


> -- runStateB :: StateB s a -> s -> (a, s)
> evalStateB :: StateB s a -> s -> a
> execStateB :: StateB s a -> s -> s

which should look familiar. The trick is that the state s passed in to these functions is the final state, and the state returned is the initial state. In the example above, remember that the last element seen in the walk was given the first label, and the first element seen in the walk was given the last.

The default implementation of modify in Control.Monad.State.Class is as follows:


-- modify :: MonadState s m => (s -> s) -> m ()
-- modify f = do
--     s <- get
--     put (f s)

In the StateB monad, this code will bottom out because of the circular data dependency between the two monadic actions. In the backwards state monad, (>>=) passes the result forwards and the state backwards, so get receives its state from put (f s), while put (f s) needs the result s that get produces. In other words, the first line grabs the updated state from the second line and tries to pass it in as an argument to the second line, which is a nice loop.

To make this work, we need a version of modify specific to StateB:


> modifyB :: (s -> s) -> StateB s ()

But if you want to modify the state and return the result, you'll need something more sophisticated:


> atomically :: State s a -> StateB s a

atomically converts an action under the normal state monad to a single action under StateB, allowing you to do complex updates to the state easily without bottoming out (using mdo notation also works).

Implementation

The base of the implementation is taken directly from Wadler's paper.

The StateB monad is almost the same as the State monad - each action with a result of type a is a function of type s -> (a,s). The difference is in the implementation of (>>=).

Let's start with the monad:


> newtype StateB s a = StateB {runStateB :: s -> (a,s)}

> instance Monad (StateB s) where
>     return = StateB . unitS
>     (StateB m) >>= f = StateB $ m `bindS` (runStateB . f)

Because wrapping and unwrapping the newtype annoys me, all of that is confined to the exported functions (like return and (>>=)). The functions that deal directly with the underlying type all have an 'S' suffix.


> m `bindS` k  = \s2 -> let (a, s0) = m s1
>                           (b, s1) = k a s2
>                       in  (b, s0)

> unitS a = \s2 -> (a, s2)

As you can see, the passed in state is acted on by the RHS of bindS, the intermediate state is consumed by the LHS, and the LHS produces the final state, s0. It looks too simple to work, but it does.
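
To see the direction of the state flow in isolation, here is a small sketch on bare state functions, with an ordinary forwards bind next to the backwards one. The names SF, bindF, bindB, unit, getState, push and the three example values are made up for this illustration and are not part of the module above.

```haskell
-- A bare state function, without the newtype wrapper.
type SF s a = s -> (a, s)

-- Ordinary (forwards) bind, for comparison.
bindF :: SF s a -> (a -> SF s b) -> SF s b
m `bindF` k = \s0 -> let (a, s1) = m s0
                     in  k a s1

-- Backwards bind, as in bindS: k runs on the incoming state s2,
-- m runs on k's output state s1, and m's output s0 is returned.
bindB :: SF s a -> (a -> SF s b) -> SF s b
m `bindB` k = \s2 -> let (a, s0) = m s1
                         (b, s1) = k a s2
                     in  (b, s0)

unit :: a -> SF s a
unit a = \s -> (a, s)

getState :: SF s s
getState = \s -> (s, s)

-- Hypothetical action: push a character onto a String state.
push :: Char -> SF String ()
push c = \s -> ((), c : s)

-- The same two pushes, sequenced with each bind.
forwards, backwards :: String
forwards  = snd ((push 'a' `bindF` \_ -> push 'b') "")
backwards = snd ((push 'a' `bindB` \_ -> push 'b') "")

-- getState is written first, but it observes the state left behind
-- by the push that is written after it.
peekFuture :: String
peekFuture = fst ((getState `bindB` \seen -> push 'z' `bindB` \_ -> unit seen) "")
```

Here forwards is "ba" and backwards is "ab", and peekFuture is "z". All of this relies on the let bindings being lazy: neither half of bindB forces the other's state before it is needed.
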
And the other API functions:


> execStateB m = snd . runStateB m

> evalStateB m = fst . runStateB m

> modifyB = StateB . modify'
>  where modify' f = \s -> ((), f s)

> atomically = StateB . runState

Just for funsies:


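> instance Functor (StateB s) where
>     fmap f m = StateB $ mapS f (runStateB m)

> mapS f m = \s -> let (a, s') = m s in (f a, s')

> -- GHC 7.10 and later require an Applicative instance for every Monad.
> instance Applicative (StateB s) where
>     pure = StateB . unitS
>     mf <*> mx = mf >>= \f -> fmap f mx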

> instance MonadState s (StateB s) where
>     get = StateB get'
>      where get' = \s -> (s,s)
>
>     put = StateB . put'
>      where put' s = const ((),s)

> instance MonadFix (StateB s) where
>     mfix = StateB . mfixS . (runStateB .)

> -- Tie the knot on the result only, so that f's state effect runs once.
> mfixS f = \s -> let (a, s') = f a s
>                 in  (a, s')

The transformer

Now a treat for those of you still paying attention. I haven't really tested this, but it looks like it should work and that's good enough for me. A lot of this is in the style of the sources for Control.Monad.State.Lazy.


> newtype StateBT s m a = StateBT {runStateBT :: s -> m (a,s)}

> unitST a = \s -> return (a,s)

> m `bindST` k = \s2 -> mdo ~(a,s0) <- m s1
>                           ~(b,s1) <- k a s2
>                           return (b,s0)

> execStateBT :: Monad m => StateBT s m a -> s -> m s
> execStateBT m s = do ~(_,s') <- runStateBT m s
>                      return s'

> evalStateBT :: Monad m => StateBT s m a -> s -> m a
> evalStateBT m s = do ~(a,_)  <- runStateBT m s
>                      return a

> modifyBT :: Monad m => (s -> s) -> StateBT s m ()
> modifyBT = StateBT . modify'
>  where modify' f = \s -> return ((),f s)

> atomicallyT :: Monad m => State s a -> StateBT s m a
> atomicallyT m = StateBT $ \s-> return $ runState m s

> atomicallyTM :: Monad m => StateT s m a -> StateBT s m a
> atomicallyTM = StateBT . runStateT

> mapST f m = \s -> do ~(a,s') <- m s
>                      return (f a,s')

> liftST m = \s -> do a <- m
>                     return (a,s)

> -- Tie the knot on the result only, so that f's effects run once.
> mfixST f = \s -> mfix (\ ~(a, _) -> f a s)

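> instance Monad m => Functor (StateBT s m) where
>     fmap f m = StateBT $ mapST f (runStateBT m)

> -- GHC 7.10 and later require an Applicative instance for every Monad.
> instance MonadFix m => Applicative (StateBT s m) where
>     pure = StateBT . unitST
>     mf <*> mx = mf >>= \f -> fmap f mx

> -- fail is omitted: since GHC 8.8 it belongs to MonadFail, not Monad.
> instance MonadFix m => Monad (StateBT s m) where
>     return = pure
>     (StateBT m) >>= f = StateBT $ m `bindST` (runStateBT . f)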

> instance MonadTrans (StateBT s) where
>     lift = StateBT . liftST

> instance MonadFix m => MonadState s (StateBT s m) where
>     get = StateBT get'
>      where get' = \s -> return (s,s)
> 
>     put = StateBT . put'
>      where put' s = const $ return ((),s)

> instance MonadFix m => MonadFix (StateBT s m) where
>     mfix = StateBT . mfixST . (runStateBT .)

Another quine

noreply@blogger.com (Antoine) — Wed, 04 Jul 2007 18:54:00 +0000

This quine is a lot like my first Haskell quine, except shorter.
(This one is technically not a quine, due to linebreaks, but it prints a quine when executed)


import System.IO
main=(putStr.map toEnum)p>>(putStr.show)p>>putStr "\n"
p=[105,109,112,111,114,116,32,83,
121,115,116,101,109,46,73,79,
10,109,97,105,110,61,40,112,
117,116,83,116,114,46,109,97,
112,32,116,111,69,110,117,109,
41,112,62,62,40,112,117,116,
83,116,114,46,115,104,111,119,
41,112,62,62,112,117,116,83,
116,114,32,34,92,110,34,10,
112,61]

Another quine, this time using the printf trick

noreply@blogger.com (Antoine) — Wed, 04 Jul 2007 03:22:00 +0000

I'm not sure how to make blogger give me a scrollbar to put code in.
You'll just have to remove linebreaks where needed.


import System.IO
import Text.Printf
main = let s = "import System.IO%cimport Text.Printf%cmain = let s = %c%s%c
in printf s (10 :: Int) (10 :: Int) (34 :: Int) s (34 :: Int) (10 :: Int)%c"
in printf s (10 :: Int) (10 :: Int) (34 :: Int) s (34 :: Int) (10 :: Int)

Hurrah for fixed points

noreply@blogger.com (Antoine) — Wed, 04 Jul 2007 02:09:00 +0000


import System.IO

-- My first haskell quine, revised

main :: IO ()
main = do (putStr . map toEnum) prog

          (putStr . breaker .  show) prog

          putStr "\n"

breaker :: String -> String
breaker = (unwordsBy ',' . f . wordsBy ',') where
  f xs
    | length xs > 8 = (take 8 xs) ++ (f . g) (drop 8 xs)
    | otherwise     = xs

  g  []    = []
  g (x:xs) = ("\n " ++ x):xs

-- Adapted from "lines" in the GHC List module
wordsBy :: Char -> String -> [String]
wordsBy _ "" = []
wordsBy c s  = let (l, s') = break (== c) s in
               l: case s' of
                    []      -> []
                    (_:s'') -> wordsBy c s''

-- Adapted from "unlines" in the GHC List module
unwordsBy :: Char -> [String] -> String
unwordsBy _ [] = ""
unwordsBy c ws = foldr1 (\w s -> w ++ c:s) ws

prog :: [Int]
prog = [105,109,112,111,114,116,32,83,
 121,115,116,101,109,46,73,79,
 10,10,45,45,32,77,121,32,
 102,105,114,115,116,32,104,97,
 115,107,101,108,108,32,113,117,
 105,110,101,44,32,114,101,118,
 105,115,101,100,10,10,109,97,
 105,110,32,58,58,32,73,79,
 32,40,41,10,109,97,105,110,
 32,61,32,100,111,32,40,112,
 117,116,83,116,114,32,46,32,
 109,97,112,32,116,111,69,110,
 117,109,41,32,112,114,111,103,
 10,10,32,32,32,32,32,32,
 32,32,32,32,40,112,117,116,
 83,116,114,32,46,32,98,114,
 101,97,107,101,114,32,46,32,
 32,115,104,111,119,41,32,112,
 114,111,103,10,10,32,32,32,
 32,32,32,32,32,32,32,112,
 117,116,83,116,114,32,34,92,
 110,34,10,10,98,114,101,97,
 107,101,114,32,58,58,32,83,
 116,114,105,110,103,32,45,62,
 32,83,116,114,105,110,103,10,
 98,114,101,97,107,101,114,32,
 61,32,40,117,110,119,111,114,
 100,115,66,121,32,39,44,39,
 32,46,32,102,32,46,32,119,
 111,114,100,115,66,121,32,39,
 44,39,41,32,119,104,101,114,
 101,10,32,32,102,32,120,115,
 10,32,32,32,32,124,32,108,
 101,110,103,116,104,32,120,115,
 32,62,32,56,32,61,32,40,
 116,97,107,101,32,56,32,120,
 115,41,32,43,43,32,40,102,
 32,46,32,103,41,32,40,100,
 114,111,112,32,56,32,120,115,
 41,10,32,32,32,32,124,32,
 111,116,104,101,114,119,105,115,
 101,32,32,32,32,32,61,32,
 120,115,10,10,32,32,103,32,
 32,91,93,32,32,32,32,61,
 32,91,93,10,32,32,103,32,
 40,120,58,120,115,41,32,61,
 32,40,34,92,110,32,34,32,
 43,43,32,120,41,58,120,115,
 32,10,10,45,45,32,65,100,
 97,112,116,101,100,32,102,114,
 111,109,32,34,108,105,110,101,
 115,34,32,105,110,32,116,104,
 101,32,71,72,67,32,76,105,
 115,116,32,109,111,100,117,108,
 101,10,119,111,114,100,115,66,
 121,32,58,58,32,67,104,97,
 114,32,45,62,32,83,116,114,
 105,110,103,32,45,62,32,91,
 83,116,114,105,110,103,93,10,
 119,111,114,100,115,66,121,32,
 95,32,34,34,32,61,32,91,
 93,10,119,111,114,100,115,66,
 121,32,99,32,115,32,32,61,
 32,108,101,116,32,40,108,44,
 32,115,39,41,32,61,32,98,
 114,101,97,107,32,40,61,61,
 32,99,41,32,115,32,105,110,
 10,32,32,32,32,32,32,32,
 32,32,32,32,32,32,32,32,
 108,58,32,99,97,115,101,32,
 115,39,32,111,102,10,32,32,
 32,32,32,32,32,32,32,32,
 32,32,32,32,32,32,32,32,
 32,32,91,93,32,32,32,32,
 32,32,45,62,32,91,93,10,
 32,32,32,32,32,32,32,32,
 32,32,32,32,32,32,32,32,
 32,32,32,32,40,95,58,115,
 39,39,41,32,45,62,32,119,
 111,114,100,115,66,121,32,99,
 32,115,39,39,10,10,45,45,
 32,65,100,97,112,116,101,100,
 32,102,114,111,109,32,34,117,
 110,108,105,110,101,115,34,32,
 105,110,32,116,104,101,32,71,
 72,67,32,76,105,115,116,32,
 109,111,100,117,108,101,10,117,
 110,119,111,114,100,115,66,121,
 32,58,58,32,67,104,97,114,
 32,45,62,32,91,83,116,114,
 105,110,103,93,32,45,62,32,
 83,116,114,105,110,103,10,117,
 110,119,111,114,100,115,66,121,
 32,95,32,91,93,32,61,32,
 34,34,10,117,110,119,111,114,
 100,115,66,121,32,99,32,119,
 115,32,61,32,102,111,108,100,
 114,49,32,40,92,119,32,115,
 32,45,62,32,119,32,43,43,
 32,99,58,115,41,32,119,115,
 10,10,112,114,111,103,32,58,
 58,32,91,73,110,116,93,10,
 112,114,111,103,32,61,32]

Sudoku Solving

noreply@blogger.com (Antoine) — Sat, 28 Apr 2007 20:45:00 +0000

I was inspired to write a Sudoku solver in Haskell to solve the Sudoku-related task on Project Euler.

The concept behind this solver is that the puzzle is represented by a collection of constraints, and the solver transforms these constraints until the puzzle is solved.

This approach works for many, but not all, Sudoku puzzles. At least it's fast!

First off: I'll be needing a few libraries to help out


> import Data.Array
> import Data.List hiding (group)
> import Text.ParserCombinators.Parsec
> import System.IO

Data Types

Let's start with the type I'm using to represent a Sudoku puzzle:


> type Sudoku a = Array (Int,Int) [a]

Each cell in the Sudoku puzzle is represented by a list of allowed values. A puzzle is solved when each cell can only take a single value.

A few functions to operate on the values of a Sudoku cell:

isD returns true if the supplied list is of length 1.


> isD :: [a] -> Bool
> isD (_:[]) = True
> isD  _     = False

stripD takes in a list of length 1, and returns the single value in the list.


> stripD :: [a] -> a
> stripD (a:[]) = a
> stripD _ = error "List supplied was not of unit length"

And now for functions that operate on whole puzzles:

isSolved returns true if the Sudoku puzzle is solved.


> isSolved :: Sudoku a -> Bool
> isSolved = and . map isD . elems

getS returns the contents of the cells specified by the input list, packed with the cell they came from


> getS :: Sudoku a -> [(Int,Int)] -> [((Int,Int),[a])]
> getS s ixs = [(ix, (s ! ix)) | ix <- ixs]

I drag the indices around for ease of reconstruction back into an array.

Solving

A Sudoku puzzle is initially presented with some cells filled in, and some cells empty.

In this solver, each cell of the Sudoku data-type contains a list of values that cell may take. So initially, each cell would either contain a list with a single element, or the list of numbers from 1 to 9.

The solver applies rules to groups of cells to add further constraints to the cells:


> type Rule a = [[a]] -> [[a]]

The solver takes a list of such rules, and applies them until they have no further effect. The rules that come first in the list are tried first. Once a rule succeeds in doing something, the solver then goes back to the beginning of the list.


> solveWithRules :: Eq a => [Rule a] -> Sudoku a -> Sudoku a
> solveWithRules [] s = s
> solveWithRules x s = helper x s where
>     helper [] s = s
>     helper (r:rs) s = if isSolved nextS
>                       then nextS
>                       else if s == nextS
>                            then helper rs nextS
>                            else helper x  nextS
>        where nextS = applyRule r s
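
The control flow of solveWithRules does not depend on Sudoku at all. As a sketch, here is the same loop lifted to an arbitrary state type, with toy rules on Ints so the restarts are easy to trace by hand. The names runRules, halveEven and decrement are invented for illustration.

```haskell
-- Hypothetical generalisation of solveWithRules to any state type.
runRules :: Eq s => (s -> Bool) -> [s -> s] -> s -> s
runRules done rules = go rules
  where
    go []       s = s
    go (r : rs) s
      | done next = next           -- finished: stop immediately
      | next == s = go rs next     -- no progress: try the next rule
      | otherwise = go rules next  -- progress: restart from the first rule
      where next = r s

-- Toy rules: each either shrinks the number or leaves it alone.
halveEven, decrement :: Int -> Int
halveEven n = if even n then n `div` 2 else n
decrement n = if n > 1 then n - 1 else n
```

Starting from 7 with the rules [halveEven, decrement] and the goal (== 1), the state goes 7, 6, 3, 2, 1: decrement only fires when halveEven has nothing to do, and every change sends control back to halveEven. With halveEven as the only rule, 7 is returned unchanged because the rule list runs out without progress.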

applyRule takes a single rule and applies it to a Sudoku puzzle:


> applyRule :: Eq a => Rule a -> Sudoku a -> Sudoku a
> applyRule r = foldr1 (.) $ map (apply r) [group, column, row]

There are three interesting partitions of the cells of a Sudoku grid:

By row
By column
By 3x3 sub-grid

applyRule applies the specified rule, in turn, to each of row, column, and sub-grid.


> column, row, group :: Int -> [(Int,Int)]
> column n = [(x,n) | x <- [1..9]]
> row n    = [(n,x) | x <- [1..9]]
> group n  = [(x,y) | x <- [rowL !! (n-1)..(rowL !! (n-1))+2], 
>                     y <- [columnL !! (n-1) .. (columnL !! (n-1))+2]] where
>            rowL = [1,1,1,4,4,4,7,7,7]
>            columnL = [1, 4, 7, 1, 4, 7, 1, 4, 7]
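
The two lookup tables in group encode a quotient and a remainder. As a sanity check, here is a sketch of the same partition written with divMod, plus a property confirming that the nine sub-grids cover the grid exactly once. boxCells and boxesCoverGrid are hypothetical names, not part of the solver.

```haskell
import Data.List (sort)

-- Hypothetical arithmetic version of group: sub-grid n (1 to 9, counted
-- left to right, then top to bottom) starts at row r0 and column c0.
boxCells :: Int -> [(Int, Int)]
boxCells n = [ (r, c) | r <- [r0 .. r0 + 2], c <- [c0 .. c0 + 2] ]
  where
    (q, m) = (n - 1) `divMod` 3
    r0     = 3 * q + 1
    c0     = 3 * m + 1

-- Every cell of the 9x9 grid should belong to exactly one sub-grid.
boxesCoverGrid :: Bool
boxesCoverGrid =
  sort (concatMap boxCells [1 .. 9]) == [ (r, c) | r <- [1 .. 9], c <- [1 .. 9] ]
```

For example, boxCells 5 is the centre sub-grid, running from (4,4) to (6,6).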

The function apply takes the rule and the partition, and performs the application of the rule:


> apply :: Eq a => Rule a -> (Int -> [(Int,Int)]) -> Sudoku a -> Sudoku a
> apply r partition = \x -> array ((1,1),(9,9)) 
>                           $ concat 
>                           $ map (liftE r)
>                           $ map (getS x)
>                           $ map partition [1..9]

The function liftE is needed to convert a rule (which is of type [[a]] -> [[a]]) to be of type [((Int,Int),[a])] -> [((Int,Int),[a])]


> liftE :: ([a] -> [b]) -> [(i, a)] -> [(i, b)]
> liftE f = \x -> zip (fst (unzip x)) $ f $ snd (unzip x)
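
A small sketch of what liftE does, with reverse standing in for a rule. The definition is repeated so the example stands alone, and the value example is made up for illustration.

```haskell
-- liftE as defined above, repeated so the sketch stands alone.
liftE :: ([a] -> [b]) -> [(i, a)] -> [(i, b)]
liftE f = \x -> zip (fst (unzip x)) $ f $ snd (unzip x)

-- reverse stands in for a rule: the indices stay put while the
-- contents are rearranged.
example :: [((Int, Int), String)]
example = liftE reverse [((1, 1), "a"), ((1, 2), "b"), ((1, 3), "c")]
```

Here example is [((1,1),"c"),((1,2),"b"),((1,3),"a")]. Because the indices are zipped back on by position, this only makes sense for rules that return one entry per input cell, in the same order.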

So I've laid out all of the mechanisms for applying rules to Sudoku puzzles. The next step is to define these rules.

Rules

All of the rules analyze a set of Sudoku cells, and further constrain the values of those cells (or do nothing).

Here's an example, rule a1:


> a1 :: Eq a => Rule a
> a1 x = map helper x where
>            helper (a:[]) = a:[]
>            helper as     = [b | b <- as, not $ elem b definites]
>            definites = [stripD y | y <- x, isD y]

Put simply: If a cell is fully constrained to a value, no other cell is allowed to take that value.
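
A worked example on a made-up group of four cells (a real group has nine). The definitions of isD, stripD and a1 are repeated with only cosmetic changes so the sketch stands alone; before, once and twice are hypothetical names.

```haskell
isD :: [a] -> Bool
isD [_] = True
isD _   = False

stripD :: [a] -> a
stripD [a] = a
stripD _   = error "List supplied was not of unit length"

a1 :: Eq a => [[a]] -> [[a]]
a1 x = map helper x
  where
    helper [a] = [a]
    helper as  = [b | b <- as, b `notElem` definites]
    definites  = [stripD y | y <- x, isD y]

-- Two cells are already decided (4 and 9); the other two are not.
before, once, twice :: [[Int]]
before = [[4], [1, 4, 7], [4, 7, 9], [9]]
once   = a1 before
twice  = a1 once
```

The first pass removes 4 and 9 from the undecided cells, giving [[4],[1,7],[7],[9]]. That pass creates a new definite value, 7, which a1 only uses on the next pass: twice is [[4],[1],[7],[9]]. This is why the solver keeps reapplying rules until nothing changes.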

This reduction rule can be taken further: If two cells are jointly constrained to two values, no other cells may take those values. If three cells are jointly constrained to three values ... and on and on.

In order to apply the generalized form of rule a1, I'll need to figure out every way I can break a list of nine Sudoku cells into two groups:


> combos :: Integral n => [([n],[n])]
> combos = map (\x -> (x,[0..8]\\x))  $ powerset [0..8]

combos is a list of pairs of lists, representing every way to split a list of nine elements into two groups. Using a set of puzzle cells as an accumulator, I can fold across this list (or portions of it).
But first:


> powerset :: [a] -> [[a]]
> powerset [] = [[]]
> powerset (x:xs) = let p = powerset xs in p ++ map (x:) p

I didn't come up with this implementation of powerset, and I don't remember who did. Sorry!

Also, I'm really only interested in a portion of the combos list at any given time:


> subCombos n  = filter ( (==n) . fromIntegral .  length . fst) combos

subCombos n represents all of the ways to divide up a list into two groups such that one of the groups has n elements.
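
To get a feel for the sizes involved, here is a sketch with the index type fixed to Int (an assumption made to keep the example free of defaulting; the definitions above are polymorphic).

```haskell
import Data.List ((\\))

powerset :: [a] -> [[a]]
powerset []       = [[]]
powerset (x : xs) = let p = powerset xs in p ++ map (x :) p

-- Specialised to Int for this sketch.
combos :: [([Int], [Int])]
combos = map (\x -> (x, [0 .. 8] \\ x)) (powerset [0 .. 8])

subCombos :: Int -> [([Int], [Int])]
subCombos n = filter ((== n) . length . fst) combos
```

powerset [1,2,3] is [[],[3],[2],[2,3],[1],[1,3],[1,2],[1,2,3]]. combos has 2^9 = 512 entries, and subCombos 2 has 36 of them, one for each way of picking two cells out of nine. The first entry of subCombos 2 is ([7,8],[0,1,2,3,4,5,6]).
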
And here is the "super" rule:


> a :: (Eq a, Integral n) => n -> Rule a
> a n = \y -> foldr helper y (subCombos n) where
>       helper (a,b) x = if length (valuesIn a x) == length a
>                        then mapIndices (remove (valuesIn a x)) b x
>                        else x

(a 1) should produce the same result as a1. In practice, (a 1) is slower than a1.
I've used a few new functions up there:


> valuesIn :: (Integral n, Eq a) => [n] -> [[a]] -> [a]
> valuesIn a x = nub . concat . map ((x!!) . fromIntegral) $  a

Given a list of indices and a list of constraints, valuesIn returns a flat list of the values specified by the constraints at the indices given.


> remove :: Eq a => [a] -> [a] -> [a]
> remove a bs = [x | x <- bs, not (elem x a)]

remove takes in two lists, and returns a list composed of elements in the second list but not in the first. There's probably something a lot like this in the standard library.


> mapIndices :: Integral n => (a -> a) -> [n] -> [a] -> [a]
> mapIndices f ns as = helper 0 f ns as where
>   helper _ _ [] as = as
>   helper _ _ _  [] = []
>   helper i f ns (a:as) = if elem i ns
>                          then (f a):(helper (i+1) f (delete i ns) as)
>                          else a:(helper (i+1) f ns as)

mapIndices is a lot like map, but it passes through any list elements which are not at the indices specified.
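
Here is a sketch of how the three helpers combine into a single step of the fold inside rule a. Indices are fixed to Int, mapIndices is a shorter zipWith-based variant rather than the implementation above, and step and cells0 are hypothetical names.

```haskell
import Data.List (nub)

valuesIn :: Eq a => [Int] -> [[a]] -> [a]
valuesIn ixs cells = nub (concatMap (cells !!) ixs)

remove :: Eq a => [a] -> [a] -> [a]
remove unwanted = filter (`notElem` unwanted)

-- zipWith-based variant: apply f only at the listed positions.
mapIndices :: (a -> a) -> [Int] -> [a] -> [a]
mapIndices f ixs = zipWith (\i a -> if i `elem` ixs then f a else a) [0 ..]

-- One step of the fold in rule a: if the cells at `inside` share exactly
-- as many values as there are cells, strip those values from `outside`.
step :: Eq a => ([Int], [Int]) -> [[a]] -> [[a]]
step (inside, outside) cells
  | length vals == length inside = mapIndices (remove vals) outside cells
  | otherwise                    = cells
  where vals = valuesIn inside cells

-- A made-up group of four cells; cells 0 and 1 form a pair on {1,2}.
cells0 :: [[Int]]
cells0 = [[1, 2], [1, 2], [1, 2, 3], [2, 4]]
```

step ([0,1],[2,3]) cells0 gives [[1,2],[1,2],[3],[4]], because cells 0 and 1 claim the values 1 and 2 between them. step ([0,2],[1,3]) cells0 leaves the cells unchanged, since cells 0 and 2 span three values.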

Those are the rules!

Parsing

Parsec may be a bit much for this. Ah well.

A Sudoku puzzle consists of a bunch of numbers. Whitespace is ignored. Any non-numeric text preceding the numbers is ignored. The parser also consumes any non-numeric text following the Sudoku puzzle.


> parseSudoku :: (Integral a, Read a) => Parser (Sudoku a)
> parseSudoku = do many (noneOf nums)
>                  x <- many1 (do y <- digit
>                                 many space
>                                 return y)
>                  many (noneOf nums)
>                  return $ sudoku $ map (read . return) x
>     where nums = ['0'..'9']

I use a function sudoku in there. It creates a Sudoku puzzle from a list in a more friendly way:


> sudoku :: Integral a => [a] -> Sudoku a
> sudoku xs | (length xs == 81) = array ((1,1),(9,9))
>                                 $ zip [(x,y)| x <- [1..9], y <- [1..9] ]
>                                 $ map value xs
>           | otherwise         = error "Sudoku must be constructed of 81 entries."

Here I use the value function:


> value :: Integral a => a -> [a]
> value 0 = [1..9]
> value a = a:[]
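
For experimenting without Parsec, the digit-to-constraint step can be approximated with base alone. This is only a sketch: cellsFromText is a hypothetical helper, value is specialised to Int, and unlike parseSudoku it drops every non-digit character, so it cannot tell where one puzzle ends and the next begins, and it does not check for 81 entries.

```haskell
import Data.Char (digitToInt, isDigit)

-- value as above, specialised to Int for this sketch.
value :: Int -> [Int]
value 0 = [1 .. 9]
value a = [a]

-- Hypothetical Parsec-free reader: keep the digits, ignore everything else.
cellsFromText :: String -> [[Int]]
cellsFromText = map (value . digitToInt) . filter isDigit
```

For example, cellsFromText "0 5\n9" is [[1,2,3,4,5,6,7,8,9],[5],[9]]: the 0 becomes a fully open cell and the other digits become fixed ones.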

That covers parsing. It's time to look at the main function:

Main


> main :: IO ()
> main = do x <- hGetContents stdin
>           case (parse (many1 parseSudoku) "" x) of
>             Left err -> do putStr "Error reading input"
>                            print err
>             Right a ->  foldr1 (>>) $ map (printSudoku . solveWithRules rules) a
>   where rules = [a1, a 6, a 2]

Here's the breakdown:

I parse the standard input to a list of Sudoku puzzles
For each Sudoku, I attempt to solve it with a specified set of rules
Then I print them

Simple!

Here's how printing is handled:


> printSudoku :: Show a => Sudoku a -> IO ()
> printSudoku = printEntries . map unValue . elems

> unValue :: Show a => [a] -> Char
> unValue (a:[]) = head (show a)
> unValue []     = '/'
> unValue _      = '_'

> printEntries :: String -> IO ()
> printEntries [] = putStrLn "\n"
> printEntries s = do putStrLn (take 9 s)
>                     printEntries (drop 9 s)

elems takes an array and returns its contents as a list. I then replace the constraint lists with single characters, and then I print them out nine at a time.
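
The same rendering can be sketched as a pure function, which is easier to poke at in GHCi than the IO version. unValue is repeated from above; rows and render are hypothetical names, and the row width is a parameter (assumed positive) so the example can use a tiny grid.

```haskell
unValue :: Show a => [a] -> Char
unValue [a] = head (show a)
unValue []  = '/'
unValue _   = '_'

-- Split a string into rows of the given (positive) width.
rows :: Int -> String -> [String]
rows _     [] = []
rows width s  = take width s : rows width (drop width s)

-- Hypothetical pure counterpart of printSudoku for a list of cells.
render :: Show a => Int -> [[a]] -> String
render width = unlines . rows width . map unValue
```

render 3 [[1],[1,2],[],[4],[5],[6]] produces "1_/\n456\n": a solved cell shows its digit, an undecided cell shows '_', and a cell with no candidates left shows '/', which signals a contradiction.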

Test cases

Feed the remainder of this post on stdin into the compiled version of this post to try out the solver.

Puzzles from websudoku.com
Rating: Hard

000070009
400080001
093000008
040006200
010758040
006300080
700000560
600020004
500090000

+++

002150000
603002000
000003028
001006070
950000061
080400300
210600000
000200607
000039200

+++

200100730
000490020
060008000
006000004
940070012
800000900
000300090
050026000
037009005

+++

003040800
000006130
000350042
300000000
147000365
000000004
890025000
026100000
005080400

+++

040006003
000705800
000800460
020008007
006000200
800400050
039004000
007502000
200100040

+++

000900103
010607000
080000020
036080007
700109008
800070410
040000030
000204090
605001000

+++

090800701
060010000
000900200
038709005
000000000
400502680
004005000
000040060
105008070

+++

200600970
030000050
640007001
000003006
001502700
500400000
900200037
050000090
012009005

+++

080000000
000430090
100052006
761500030
005000100
030001975
200140009
040087000
000000040

+++

090800700
080001060
000050200
020500040
708010603
030006020
005020000
010300080
009007010

+++

070902000
050040107
400100082
086000000
200000006
000000290
830005004
607020010
000403070

+++

010008000
785030001
000001950
000020004
043000210
200050000
094700000
800090427
000400030

+++

003000000
100967002
020300068
609000010
040000050
010000409
450001080
200685001
000000600

+++

063078001
098500000
040000000
000250100
025000680
004061000
000000090
000004720
300680410

+++

000005800
090230070
730001002
100000085
600000001
920000004
200500067
060092010
001300000

+++

003059408
000810009
000000050
009002006
012000570
600300100
070000000
900024000
104980700

+++

580700200
000405080
000000007
079106000
062000150
000203470
600000000
020609000
008002015

Wow!
