tag:blogger.com,1999:blog-1855204231795641244Wed, 16 Sep 2015 18:59:59 +0000haskellICFPcategory theoryCaballatexmumpsparsingsudokuxmlFShttp://panicsonic.blogspot.com/search/label/haskellnoreply@blogger.com (Antoine)Blogger14125tag:blogger.com,1999:blog-1855204231795641244.post-3745702928249274164Tue, 28 Dec 2010 01:53:00 +00002010-12-28T16:26:42.661-06:00haskellxmlEditing XML in Haskelledit: Here's an hpaste for the solution I eventually came up with: <a href="http://hpaste.org/42628/xml_cursor_monad">http://hpaste.org/42628/xml_cursor_monad</a><br /><br />I recently found myself wanting to make small tweaks to an XML file not under my control before processing it in my Haskell app.<br /><br />My end goal would be a way to a) declare what in the XML I want to edit and then b) apply a function to the located element and have it update in place.<br /><br />The <a href="http://hackage.haskell.org/package/hxt">hxt</a> (Haskell Xml Toolkit) has a lot of pieces, and includes and XPath parser, so I looked at it first. But like XSLT, hxt seems geared towards XML processing - not editing. It has tools for applying an translation recursively through a tree, but that requires finding an element based on a predicate on the element, not based on it's location within the document. hxt does let you extract elements based on location, but I don't see how to put the original document back together again. Maybe I'm missing something.<br /><br />Then I noticed that the <a href="http://hackage.haskell.org/package/xml">xml</a> package (sometimes referred to as xml-light) has a <a href="http://hackage.haskell.org/packages/archive/xml/1.3.7/doc/html/Text-XML-Light-Cursor.html">Cursor</a> written for it! So all I need to do is navigate the cursor down to where I need it, apply the update function and then I'm done. That's declarative enough for me.<br /><br />There were two problems with this:<ol><li>A lot of the cursor manipulation functions return maybe types</li><li>I didn't feel comfortable composing functions on cursors - if a sub-function goes off the deep end it could have left the cursor anywhere in the document DOM</li></ol><br />So I did what any other Haskell programmer would do - I wrote my own XML editing monad to fix and then encapsulate the problems above.<br /><pre>> data Update a = ...<br /></pre><br />The <tt>Update</tt> type has instances for Monad, Functor, and Applicative - and to handle the failing traversal functions it has instances for Alternative and MonadPlus.<br /><br />It has the primitive operations:<br /><pre><br />> runUpdate :: Cursor -> Update a -> Maybe (a, Cursor)<br /><br />> perform :: (Cursor -> Maybe Cursor) -> Update ()<br /><br />> asks :: (Cursor -> a) -> Update a<br /></pre> <br />These are used to wrap up all of the functions from <a href="http://hackage.haskell.org/packages/archive/xml/1.3.7/doc/html/Text-XML-Light-Cursor.html">Text.XML.Light.Cursor</a> in a straightforward way.<br /><br />To give me more control over the composition of cursor update there are also the following primitives:<br /><pre><br />> sandbox :: Update a -> Update a<br /><br />> run :: Update a -> Update ()<br /></pre><br />The function 'sandbox' executes the passed in action, but contains it to the current scope of the cursor - the action may not access the parent or siblings of the current node. In addition, the cursor is returned to its current position regardless of where the passed in action left it.<br /><br />The function 'run' is the same as 'sandbox' - except that if the passed action fails we pretend that nothing happened.<br /><br />So go nuts - you can now declare edits and traversals without worrying about how to fit them in to the bigger picture. We have combinators for that.<br /><br />Am I off the deep end with this? Are there other tools on Hackage I should be using?http://panicsonic.blogspot.com/2010/12/editing-xml-in-haskell.htmlnoreply@blogger.com (Antoine)3tag:blogger.com,1999:blog-1855204231795641244.post-2876288379135710012Mon, 28 Dec 2009 19:56:00 +00002009-12-28T21:18:16.289-06:00haskellparsingAdventures in Parsec<span class="Apple-style-span" style="font-size: large;">Part I - An introduction to Parsec<br /></span><br />The basic type of a monadic parser is a function from an input string to a parse result, along with the rest of the string to be parsed.<br /><br />One example is <tt>ReadS a = String -> [(a,String)]</tt>, where returning a null list indicates no parse, and returning multiple values allows a parser to indicate ambiguity.<br /><br />There are a few deficiencies in this data structure:<br /><ul><li>The representation of ambiguity can lead to large space leaks, as the traditional combinators allow for unlimited backtracking</li><li>It is difficult to wedge good error reporting into this setup.</li></ul><br />The parsec parser, instead of returning a list of parses returns one of four results of the parse:<br /><br /><ul><li>I errored and did not consume input</li><li>I errored and did consume input</li><li>I succeded and did not consume input</li><li>I succeded and did consume input</li></ul><br />The idea is that once a parser consumes any input (that is, looks ahead more than one charecter) we prohibit backtracking. This limitation on backtracking means that we can drop the beging of the token-stream sooner, allowing it to be cleaned up by the garbage collector. There are also advantages to error reporting to making these distinctions.<br /><br />Partridge & Wright seem to be the first to have introduced this splitting of the return value of a parse, but I can't find a non-paywalled version of their paper "Predictive parser combinators need four values to report errors." It seems like exactly the sort of paper I should be reading, but I'm also not sure on how to get in touch with the authors. <em>Edit: Thanks to Chung-chieh Shan for sending me a copy of Partridge & Wright.</em><br /><br />We still have a problem - we still need to hang on to the input string until we know that the parser succeeded or failed. In many cases we know that the parser consumes input long before we know that it was successful.<br /><br />Parsec combines two data structures to return these four values:<pre><br />data Consumed a = Consumed a | Empty a<br /><br />data Reply a = Ok a State Message | Error Message<br /><br />type Parser a = State -> Cosumed (Reply a)</pre><br />Now our choice combinator can determine if a parser consumes input before we finish the parse - this means that we allow the GC to drop the head of the input as soon as the parser consumes any of it, solving the mentioned memory leak.<br /><br /><span class="Apple-style-span" style="font-size: large;">Part II - My obsession with functionalizing monad transformers</span><br /><br />I have a bit of an obsession with the dual nature of data and functions, and converting algebraic data structures to their equivalent function form to see what I can uncover.<br /><br />For example, in the mtl the ErrorT monad transformer is defined as follows:<pre><br />newtype ErrorT e m a = ErrorT {<br /> runErrorT :: m (Either e a)<br />}<br /><br />throwError :: e -> ErrorT e m a<br />throwError = ErrorT . return . Left</pre><br />When we give this monad semantics, in between each <tt>(>>=)</tt> we check the value on the LHS, ad if it's in Left we short circuit what's on the RHS so that the error value returns all the way to the end of the computation. This means at each stage of the computation we have to do a case analysis of the inner Eitehr value. Since the inner type is 'm (Either e a)' we also need to perform a <tt>(>>=)</tt> operation in the inner monad to perform <tt>(>>=)</tt> in ErrorT.<br /><br />An equivalent type is:<pre><br />newtype ErrorT e m a = ErrorT {<br /> unError :: forall r . (e -> m r) -- error!<br /> -> (a -> m r) -- success!<br /> -> m r<br />}</pre><br /><br />The second function is our success continuation - it's passed in to an action, and called if the action successfully returns a value -<pre><br />return :: a -> ErrorT e m a<br />return x = ErrorT $ \_ successK -><br /> successK x</pre><br />The first function is the error continuation, and is called if an action needs to terminate abnormally:<pre><br />throwError :: e -> ErrorT e m a<br />throwError e = ErrorT $ \errorK _ -><br /> errorK e<br /></pre><br />We then weave the continuations together in the implementation of (>>=) to get the<br />short-curcuiting we want:<pre><br />(>>=) :: ErrorT e m a -> (a -> ErrorT e m b) -> ErrorT e m b<br />m >>= f = ErrorT $ \topErrorK topSuccessK-><br /> unError m topErrorK $ \x -><br /> unError (f x) topErrorK topSuccessK<br /></pre><br />The LHS is given our top-level erorr handler to call if it errors out. If it succeds, it calls a lambda which evaluates the RHS. The RHS is given the same error handler as the LHS, and the success continuation for the RHS is the success continuation for the expression as a whole. So successes move from left to right, but if there's an error it can only go to one place.<br /><br />Interesting points to note:<br /><ul><li>There's no case analysis on data structures - short circuiting works because every action gets passed the same error continuation (unless we implement a 'catch' combinator).</li><li>There are no constraints on the nature of the 'm' type variable. ErrorT is a monad independent of whatever it's wrapping.</li></ul><br />I haven't benchmarked whether or not this is faster for anything, but I find the whole thing a lot of fun.<br /><br /><span class="Apple-style-span" style="font-size: large;">Part III - A faster Parsec 3<br /></span><br />Parsec version 3 was released on <a href="http://hackage.haskell.org">hackage</a> a bit back, and it improved on the prior version in two ways:<br /><br />1) It was a monad transformer. This is pretty fun - as an exercise I wrote a unification engine in a monad, and then wrapped parsec around it. When it hit a syntax error in equality terms it could print the in-progress results of performing unification up until that point. Very fun.<br /><br />2) It was parameterized over the input type. The previous version required a list of tokens as input.<br /><br />The downside is that (when using the non-transformer compatibility layer) parsec-3 is 1.8x slower in some benchmarks as parsec-2.<br /><br />But how can it be made faster without losing the new abstractions? The first thing I tried was to have parsec be split into two parsers - one transformer and one not, with two implementations of the core combinators. The core combinators would be moved into a type-class, and the higher-level combinators would then be polymorphic over either of them. Foks writing parsers could write them polymorphic, and then the folks running the parsers only pay for the new abstractions if they need it.<br /><br />It worked, but the type-class itself didn't make any sense - it was more of a "put what I need over in this bucket" job than a careful consideration of what the core primitives of a monadic parser like parsec truly are. I also seem to remember that in introduced a problem in the compatibility layer, but it's been a while since I did that testing. You can see the haddocks here:<br /><br /><a href="http://community.haskell.org/~aslatter/parsec/Text-Parsec-Class.html">Text.Parsec.Class</a><br /><a href="http://community.haskell.org/~aslatter/parsec/Text-Parsec-Core.html">Text.Parsec.Core</a><br /><a href="http://community.haskell.org/~aslatter/parsec/Text-Parsec-Combinator.html">Text.Parsec.Combinator</a><br /><br />This is also a radical restructuring of the parsec API - new constraints and new modules, and changing the meaning of another. Lots of fun.<br /><br />Another approach which is a much smaller change to the visible part of parsec but is a much more radical change to the inner workings is to do to ParsecT what we did to ErrorT - return via continuations rather than an algebraic data type. Where ErrorT needed two continuations, ParsecT requires four:<pre><br />newtype ParsecT s u m a<br /> = ParsecT {unParser :: forall b .<br /> State s u<br /> -> (a -> State s u -> ParseError -> m b) -- consumed ok<br /> -> (ParseError -> m b) -- consumed error<br /> -> (a -> State s u -> ParseError -> m b) -- empty ok<br /> -> (ParseError -> m b) -- empty error<br /> -> m b<br /> }<br /></pre><br />When the parser errors we call one of the error continuations with the information about the error. When the parse succeeds we call one of the success continuations, passing along the parsed value and the new state of the parse.<br /><br />And this is practically as fast as parsec-2 for folks not using the new abstractions in parsec-3. I believe it's because we no longer pay for the abstraction in the core cobinators - none of the primitives or combinators place any constraints on the type of 'm', just as for the continuation-based ErrorT transformer.<br /><br />But what of Patridge & Wright's space leak? Where is the laziness introduced in the Parsec technical report? I've gone from the nested structure back to a flat structure. How can this be as fast as the Parsec of the report if I've re-introduced the space leak?<br /><br />It was the case analisys on the not-lazy-enough return value of the parser which introduced the space leak, but we don't have have that. We just pass continuations in to the parsers, which may call them as they will. As long as the core primitves aren't capable of calling the "I haven't consummed input" continuations after they have consumed input then we're free to garbage collect those continuations as well as the consummed bits of input. Space leak averted.<br /><br />The darcs repo for the continuation based parsec is here: <a href="http://community.haskell.org/~aslatter/code/parsec/cps/">http://community.haskell.org/~aslatter/code/parsec/cps/</a><br /><br /><span class="Apple-style-span" style="font-size:large;">Appendix A: Further adventures in ErrorT</span><br /><br />In case you needed convincing that the above formulation of ErrorT is equivalent to that in the mtl:<pre><br />catch :: ErrorT e m a -> (e -> ErrorT e m a) ->ErrorT e m a<br />catch m handler = ErrorT $ \topErrorK topSuccessK -><br /> let errorK e = unError (handler e) topErrorK topSuccessK<br /> in unError m errorK topSuccessK<br /><br />runErrorT :: Monad m => ErrorT e m a -> m (Either e a)<br />runErrorT (ErrorT f) = f errorK successK<br /> where successK = return . Right<br /> errorK = return . Left</pre>http://panicsonic.blogspot.com/2009/12/adventures-in-parsec.htmlnoreply@blogger.com (Antoine)4tag:blogger.com,1999:blog-1855204231795641244.post-7634849216422982440Sat, 18 Apr 2009 19:45:00 +00002009-04-18T16:37:14.183-05:00haskellmumpsUsing HaskelineEarlier today I decided to unearth an old project of mine - figuring that the best way to learn two languages was to implement one in the other, I wrote a <a href="http://en.wikipedia.org/wiki/MUMPS">MUMPS</a> interpreter in <a href="http://haskell.org/">Haskell</a>. I was learning MUMPS for work, and Haskell for fun.<br /><br />Back when I wrote it, I used <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/readline">readline</a> in the REPL part of the interpreter - during the cleanup I wanted to move away from readline as <a href="http://haskell.org/ghc/">GHC</a> doesn't ship with it any more, and sometimes it can be a pain to install on its own. So I switched to <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/haskeline">Haskeline</a>. It doesn't ship with GHC either, but it's proven easier for me to install.<br /><br />Haskeline has got a really friendly API, with all of the functions operating inside the <tt>InputT m</tt> monad transformer. "Great," I think, "I can just pile this on top of my existing monad transformers stack in the interpreter!"<br /><br />All was not so simple, as <tt>InputT</tt> has it's own instance of MonadState and MonadReader which allows the user of the library to peer into the guts of the implementation. But I didn't want to monkey with my text entry, I just wanted it to work and get out of my way, and I wanted the rest of my code to use the MonadState instance further down the stack that I had already set up.<br /><br />So I wrote a small wrapper for Haskeline that's more friendly to mtl-style monad transformer composition. As written, it only composes with MonadIO and MonadState, but it would be straightforward to do more.<br /><br />My wrapper uses <tt>HaskelineT</tt> instead of <tt>InputT</tt>, and exposes the same core functions as Haskeline (except for <tt>withInterrupt</tt>). It doesn't do anything I couldn't do by peppering <tt>lift</tt>s all over the place, but this way feels a but cleaner to me - Haskeline keeps its workings to itself, and I don't have to think about the order of the layered monad transformers.<br /><hr/><pre><br />{-# LANGUAGE FlexibleInstances<br /> , MultiParamTypeClasses<br /> , UndecidableInstances<br /> , GeneralizedNewtypeDeriving<br /> #-}<br /><br />import qualified System.Console.Haskeline as H<br />import System.Console.Haskeline.Completion<br />import System.Console.Haskeline.MonadException<br /><br />import Control.Applicative<br />import Control.Monad.State<br /><br />newtype HaskelineT m a = HaskelineT {unHaskeline :: H.InputT m a}<br /> deriving (Monad, Functor, Applicative, MonadIO, MonadException, MonadTrans, MonadHaskeline)<br /><br />runHaskelineT :: MonadException m => H.Settings m -> HaskelineT m a -> m a<br />runHaskelineT s m = H.runInputT s (unHaskeline m)<br /><br />runHaskelineTWithPrefs :: MonadException m => H.Prefs -> H.Settings m -> HaskelineT m a -> m a<br />runHaskelineTWithPrefs p s m = H.runInputTWithPrefs p s (unHaskeline m)<br /><br />class MonadException m => MonadHaskeline m where<br /> getInputLine :: String -> m (Maybe String)<br /> getInputChar :: String -> m (Maybe Char)<br /> outputStr :: String -> m ()<br /> outputStrLn :: String -> m ()<br /><br /><br />instance MonadException m => MonadHaskeline (H.InputT m) where<br /> getInputLine = H.getInputLine<br /> getInputChar = H.getInputChar<br /> outputStr = H.outputStr<br /> outputStrLn = H.outputStrLn<br /><br /><br />instance MonadState s m => MonadState s (HaskelineT m) where<br /> get = lift get<br /> put = lift . put<br /><br />instance MonadHaskeline m => MonadHaskeline (StateT s m) where<br /> getInputLine = lift . getInputLine<br /> getInputChar = lift . getInputChar<br /> outputStr = lift . outputStr<br /> outputStrLn = lift . outputStrLn<br /></pre>http://panicsonic.blogspot.com/2009/04/using-haskeline.htmlnoreply@blogger.com (Antoine)0tag:blogger.com,1999:blog-1855204231795641244.post-5154509585929169654Sun, 15 Feb 2009 19:24:00 +00002009-02-15T18:37:30.921-06:00haskellMaybeT - The CPS Version<pre>> {-# LANGUAGE Rank2Types #-}<br />> import Control.Monad</pre>I think I finally understand writing code into continuation passing style. I've understood it at an academic level for some time - but that's different from being able to write the code.<br /><br />This post presents a different implementation of the Maybe monad transformer - usually presented as so:<pre><br />data MaybeT m a = MaybeT {runMaybeT :: m (Maybe a)}<br /></pre>which can be used to add the notion of short-circuiting failure to any other monad (sortof a simpler version of <tt><a href="http://hackage.haskell.org/packages/archive/mtl/1.1.0.2/doc/html/Control-Monad-Error.html#v:ErrorT">ErrorT</a></tt> from the <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/mtl">MTL</a>).<br />I first came across <tt>MaybeT</tt> in <a href="http://www.haskell.org/haskellwiki/New_monads/MaybeT">a page</a> on the <a href="http://www.haskell.org/haskellwiki/Haskell">Haskell Wiki</a>.<br />This presentation of MaybeT uses the Church encoding of the data-type:<pre>> newtype MaybeT m a = MaybeT {unMaybeT :: forall b . m b -> (a -> m b) -> m b}</pre>Note the similarity to the Prelude function <tt><a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v:maybe">maybe</a></tt>. We can unwrap the transformer like so:<pre><br />> runMaybeT :: Monad m => MaybeT m a -> m (Maybe a)<br />> runMaybeT m = unMaybeT m (return Nothing) (return . Just)<br /></pre>This <tt>runMaybeT</tt> should be a drop-in replacement for the old one.<br />The advantage here is that we can write the Monad and MonadPlus instances without calling <tt>bind</tt> or <tt>return</tt> in the underlying monad <tt>m</tt>, and without doing any case analysis on <tt>Just</tt> or <tt>Nothing</tt> values:<pre>> instance Monad (MaybeT m) where<br />> return x = MaybeT $ \_ suc -> suc a<br />> <br />> m >>= k = MaybeT $ \fail suc -><br />> unMaybeT m fail $ \x -><br />> unMaybeT (k x) fail suc<br />><br />> fail _ = mzero<br /><br />> instance MonadPlus (MaybeT m) where<br />> mzero = MaybeT $ \fail _ -> fail<br />><br />> m `mplus` n = MaybeT $ \fail suc -><br />> unMaybeT m (unMaybeT n fail suc) suc<br /></pre>It's just a matter of threading the failure and success continuations to the right place at the right time.<br />To show that this is equivalent to the old implementation, here's a re-write of the old <tt>MaybeT</tt> data constructor from above:<pre><br />> fromMaybe :: Monad m => m (Maybe a) -> MaybeT m a<br />> fromMaybe m = MaybeT $ \fail suc -> do<br />> res <- m<br />> case res of<br />> Nothing -> fail<br />> Just x -> suc x</pre><br />So anything you can do with the other version, you can do with this version. And for most things it should be a drop-in replacement.http://panicsonic.blogspot.com/2009/02/maybet-cps-version.htmlnoreply@blogger.com (Antoine)1tag:blogger.com,1999:blog-1855204231795641244.post-2987005566991346430Fri, 13 Feb 2009 02:00:00 +00002009-02-12T20:02:01.545-06:00CabalhaskellDependencies in Hackage, revisitedIn a <a href="http://panicsonic.blogspot.com/2008/02/html-scraping-with-tagsoup.html">previous post</a> I described how to scrape the <a href="http://hackage.haskell.org/packages/hackage.html">Hackage</a> website to do reverse lookups on package dependency data for packages hosted on Hackage.<br /><br />With the release of the new HTTP library (version 4000) that code doesn't work anymore. This post presents a different solution to the problem.<br /><br />Instead of pulling data out of html documents, we instead load and parse the local .tar file that <a href="http://hackage.haskell.org/trac/hackage/wiki/CabalInstall">cabal-install</a> uses to do its own dependency chasing.<br /><br />You'll need <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/tar">tar</a> and <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string">utf8-string</a> from Hackage.<br /><br />First, the necessary imports:<pre><br />> import Data.Maybe<br />> import Data.List<br /><br />> import Codec.Archive.Tar<br /><br />> import Data.ByteString.Lazy (ByteString)<br />> import qualified Data.ByteString.Lazy as BS<br /><br />> import qualified Data.ByteString.Lazy.UTF8 as UTF8<br /><br />> import System.IO<br />> import System.Environment<br /><br />> import Distribution.Text<br />> import Distribution.Package<br />> import Distribution.PackageDescription<br />> import Distribution.PackageDescription.Parse<br /></pre>And now the 'main' method.<br /><br />The first argument is the name of the package you're checking dependencies for, and the second argument is the path to the Hackage index tar-file (for me this is <tt>~/.cabal/packages/hackage.haskell.org/00-index.tar</tt>).<pre><br />> main :: IO ()<br />> main = do<br />> [target,tarball] <- getArgs<br />> withFile tarball ReadMode $ \h -> do<br />> contents <- BS.hGetContents h<br />> let matches = matchesFromIndex contents (== target)<br />> sequence_ $ map (print . disp) matches </pre><pre><br /></pre>And then we have the function which, given the contents of the tar-file as a ByteString, returns back the list of PackageIds which depend on the indicated package.<pre><br />> matchesFromIndex :: ByteString -> (String -> Bool) -> [PackageId]<br />> matchesFromIndex index p =<br /><br />> let tarchive = readTarArchive index<br />> cabalFiles = map UTF8.toString $ findCabalEntries tarchive<br />> parseResults = map parsePackageDescription cabalFiles<br />> gPckgDiscs = okayOnly parseResults<br />> matches = filter (match p) gPckgDiscs<br /><br />> in map packageId matches<br /><br />> okayOnly :: [ParseResult a] -> [a]<br />> okayOnly = mapMaybe fromOkay<br />> where fromOkay (ParseOk _ a) = Just a<br />> fromOkay _ = Nothing<br /></pre><pre><br />> -- Does this package have a dependency which matches our<br />> -- query?<br />> match :: (String -> Bool) -> GenericPackageDescription -> Bool<br />> match p pkg = any (matchDep p) (gPckgDeps pkg)<br /><br />> -- Does this dependency match our query?<br />> matchDep :: (String -> Bool) -> Dependency -> Bool<br />> matchDep p (Dependency (PackageName name) _) = p name<br /></pre>There's a bit of black-magic going on here - I don't entirely understand the structure of the new 'library' and 'executable' sections of the <tt>.cabal</tt> file, but I scrape everything out which has the right type.<pre><br />> gPckgDeps :: GenericPackageDescription -> [Dependency]<br />> gPckgDeps pkg = normalDeps ++ libDeps ++ execDeps<br />> where<br />> normalDeps = buildDepends $ packageDescription pkg<br /><br />> libDeps = case condLibrary pkg of<br />> Nothing -> []<br />> Just cndTree -> depsFromCndTree exLibDeps cndTree<br /><br />> execDeps = concatMap (depsFromCndTree exExecDeps . snd)<br />> (condExecutables pkg)<br /><br />> exLibDeps = pkgconfigDepends . libBuildInfo<br />> exExecDeps = pkgconfigDepends . buildInfo<br /><br />> depsFromCndTree f tree =<br />> let x = condTreeData tree<br /> <br />> parts = condTreeComponents tree<br />> mdlTrees = map mdl parts<br />> thrdTrees = mapMaybe thrd parts<br /><br />> trees = mdlTrees ++ thrdTrees<br /><br /><br />> in f x ++<br />> condTreeConstraints tree ++<br />> concatMap (depsFromCndTree f) trees<br /><br />> where mdl (_,x,_) = x<br />> thrd (_,_,x) = x<br /><br /></pre>And this is the bit which takes the decoded tar-file and returns back the entries which look like they could be <tt>.cabal</tt> files.<pre><br />> findCabalEntries :: TarArchive -> [ByteString]<br />> findCabalEntries TarArchive{archiveEntries = xs} = mapMaybe go xs<br /><br />> where go :: TarEntry -> Maybe ByteString<br />> go x | fileType x /= TarNormalFile = Nothing<br />> | isBoringName (fileName x) = Nothing<br />> | otherwise = Just $ entryData x<br /><br />> fileType = tarFileType . entryHeader<br />> fileName = tarFileName . entryHeader<br /><br />> isBoringName = not . isSuffixOf ".cabal"<br /></pre><br /><br />Not too shabby.http://panicsonic.blogspot.com/2009/02/dependencies-in-hackage-revisited.htmlnoreply@blogger.com (Antoine)0tag:blogger.com,1999:blog-1855204231795641244.post-7688931170449915603Sat, 21 Jun 2008 06:13:00 +00002008-06-21T01:35:04.482-05:00haskellHaskell SnippetsI'm a huge fan of the function <tt style="color: rgb(51, 204, 0);">mapMaybe</tt>, but once I move from the 'Maybe' monad into something more complex (such as <tt style="color: rgb(51, 204, 0);">ReaderT r Maybe</tt>) things become tricky.<br /><br />First, what is <tt style="color: rgb(51, 204, 0);">mapMaybe</tt>?<br /><br />Its type is: <tt style="color: rgb(51, 204, 0);">(a -> Maybe b) -> [a] -> [b]</tt><br /><br />It maps the input function over the list, and drops any values which evaluate to nothing. It's like a combination of <tt style="color: rgb(51, 204, 0);">map</tt> and <tt style="color: rgb(51, 204, 0);">filter</tt>, where the input function is given the option to either transform the input or filter it out.<br /><br />But then I needed more information threaded around in my functions, and the types went from<span style="font-family:monospace;"> </span><tt style="color: rgb(51, 204, 0);">a -> Maybe b</tt> to <tt style="color: rgb(51, 204, 0);">a -> ReaderT r Maybe b</tt>.<br /><br />So I needed:<br /><br /><pre>> mapAlt :: Alternative f => (a -> f b) -> [a] -> f [b]</pre><br /><br />It's just like <tt style="color: rgb(51, 204, 0);">mapMaybe</tt>, except it works for any Alternative functor.<br /><br />The output is still in the functor <tt style="color: rgb(51, 204, 0);">f</tt> so I can have it work for effectful monads and such, but it will always return a value (even if it's the empty list).<br /><br />Here's the implementation:<br /><br /><pre>> mapAlt f xs = go xs<br />> where go [] = pure []<br />> go (y:ys) = (pure (:) <*> f y <*> go ys)<br />> <|> go ys</pre>Links:<br /><ul><li><a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Data-Maybe.html#v%3AmapMaybe"><tt>mapMaybe</tt></a></li><li><a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Control-Applicative.html#t%3AAlternative"><tt>Alternative</tt></a></li></ul>Hurrah for simple, useful functions.http://panicsonic.blogspot.com/2008/06/haskell-snippets.htmlnoreply@blogger.com (Antoine)1tag:blogger.com,1999:blog-1855204231795641244.post-6207998630007609259Sun, 10 Feb 2008 23:04:00 +00002008-02-10T18:14:17.719-06:00haskellHTML Scraping with TagSoupEarlier today I wanted to know the packages on <a href="http://hackage.haskell.org">Hackage</a> which stated a dependency on <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/parsec">Parsec</a>, so I wrote a command-line utility to do it. This post presents the utility.<br /><br />The plan is simple: grab <tt>http://hackage.haskell.org/packages/archive/pkg-list.html</tt>, extract all of the links which look like links to packages, and then for each of the package-description pages find out if the dependency list includes parsec. If it does, print the package name.<br /><br />First, a few preliminaries:<pre><br />> import Data.Maybe<br />> import Network.HTTP<br />> import Network.URI<br />> import System.Environment<br />> import Text.HTML.TagSoup<br />> import Text.Regex.Base<br />> import Text.Regex.Posix.String<br />> import Text.Regex.Posix.Wrap</pre><br />You could probably use a different Regex package if you wanted to without too much trouble.<br /><br />First up, a few strings broken out of the body of the program for convenience should they need changing.<pre><br />> name = "hackage-dep"<br />> version = "0.1.0"</pre><pre><br />> baseURIString = "http://hackage.haskell.org"<br />> packagesURI =<br />> fromJust $ parseURI $ baseURIString ++ "/packages/archive/pkg-list.html"<br />> basePath = "/cgi-bin/hackage-scripts/package/"</pre>The function <tt>parseURI</tt> comes from the <em>Network.URI</em> module. It converts a <tt>String</tt> to the <tt>URI</tt> datatype used by the <em>Network.*</em> modules.<br /><br />Next, I need a few functions to fetch an HTML document given a URI:<br /><pre><br />> mkSimpleGet :: URI -> Request<br />> mkSimpleGet uri =<br />> Request uri GET [Header HdrUserAgent (name ++ " v" ++ version)] []<br /><br />> simpleGet :: URI -> IO (Result Response)<br />> simpleGet = simpleHTTP . mkSimpleGet<br /><br />> body :: Result Response -> Either String String<br />> body (Right (Response (2,_,_) _ _ str)) = Right str<br />> body (Right (Response code _ _ _)) = Left $ printCode code<br />> body (Left e) = Left $ show e<br /><br />> printCode :: ResponseCode -> String<br />> printCode (a,b,c) = show a ++ show b ++ show c<br /><br />> errorString :: String -> String -> String<br />> errorString uri err =<br />> "Error getting " ++ uri ++ "\n" ++ "Error: " ++ err</pre><br />The two interesting functions here are <tt>simpleGet</tt> and <tt>body</tt>: <tt>simpleGet</tt> performs an HTTP GET request with the passed-in URI, and <tt>body</tt> extracts the body from the response if it was successful.<br />Now we can start on the HTML manipulation.<br /><pre><br />> type HTML = String<br /><br />> links :: HTML -> [Tag]<br />> links = filter (~== TagOpen "a" []) . parseTags</pre><tt>links</tt> converts an HTML document into a list of link tags, using TagSoup.<br /><br />And then the function <tt>packageInfo</tt> extracts the package name from a link to that package.<pre><br />> type Package = String<br /><br />> packageInfo :: Tag -> Maybe Package<br />> packageInfo (TagOpen "a" []) = Nothing<br />> packageInfo t@(TagOpen "a" attrs) =<br />> case fromAttrib "href" t of<br />> [] -> Nothing<br />> path -> info path<br />> packageInfo _ = Nothing<br /><br />> packageName = "^" ++ basePath ++ "(.+)$"<br /><br />> info :: String -> Maybe Package<br />> info str =<br />> case str =~ packageName of<br />> (_,_,_,[]) -> Nothing<br />> (_,_,_,[package]) -> Just package<br />> (_::(String,String,String,[String])) -> Nothing<br /></pre>And once I have a list of package names, I'll want to grab the web-page describing the package:<pre><br />> packageURI :: Package -> URI<br />> packageURI =<br />> fromJust . parseURI . ((baseURIString ++ basePath) ++)<br /><br />> packageGet :: Package -> IO (Result Response)<br />> packageGet = simpleGet . packageURI<br /></pre>The idea is that I can call <tt>packageGet</tt> on an extracted <tt>Package</tt>, and then I can use the previously defined <tt>body</tt> function to get the HTML out of the HTTP response.<br /><br />Now, let's get on with the <tt>main</tt> function:<pre><br />> main :: IO ()<br />> main = do<br />> arg <- (do {[arg] <- getArgs; return arg})<br />> `catch`<br />> (\_ -> error "Requires a single command line argument")<br />> res <- simpleGet packagesURI<br />> case body res of<br />> Left str -> putStrLn $ errorString (show packagesURI) str<br />> Right html -> findDeps (=~ arg) $ filterJust $ map packageInfo $ links html</pre><br />The <tt>filterJust $ map packageInfo $ links html</tt> bit extracts a list of package names from the HTML list pulled off of hackage. The function <tt>findDeps</tt> takes this list along with a passed in testing function and prints out which packages depend on the package specified at the command line. The passed-in testing function is just a regex-match based on the single command-line argument.<br /><pre><br />> filterJust :: [Maybe a] -> [a]<br />> filterJust xs = [x | Just x <- xs]<br /><br />> findDeps :: (String -> Bool) -> [Package] -> IO ()<br />> findDeps p ps = mapM_ (printIfDep p) ps<br /><br />> printIfDep :: (String -> Bool) -> Package -> IO ()<br />> printIfDep p pTest = do<br />> res <- packageGet pTest<br />> case body res of<br />> Left e -> putStrLn $ errorString pTest e<br />> Right html -><br />> if hasDep html p<br />> then putStrLn pTest<br />> else return ()</pre><br />The function <tt>hasDep</tt> picks the "Dependencies" field out of the passed-in HTML text, and then returns true if the passed-in test returns true on any bit of string in the dependencies field.<br /><pre><br />> hasDep :: HTML -> (String -> Bool) -> Bool<br />> hasDep html p =<br />> let tags = parseTags html<br />> depTags = takeWhile (~/= (TagClose "tr")) $<br />> drop 1 $<br />> dropWhile (~/= (TagText "Dependencies")) $<br />> tags<br />> depText = filterText depTags<br />><br />> filterText xs = [x | TagText x <- xs] :: [String]<br />> in any p depText</pre><br /><br />After saving and compiling, executing <tt>./Main parsec</tt> will (slowly) list all of the packages on Hackage which depend on Parsec. Success!<br /><br />Exercise for the reader: Implement the above functionality by grabbing the <tt>00-index.tar.gz</tt> off of Hackage instead of scraping HTML pages. This file contains all of the .cabal files for every version of every package hosted on Hackage. For bonus points cache the index on disk between calls.http://panicsonic.blogspot.com/2008/02/html-scraping-with-tagsoup.htmlnoreply@blogger.com (Antoine)1tag:blogger.com,1999:blog-1855204231795641244.post-2234558530996441398Tue, 05 Feb 2008 06:18:00 +00002008-02-10T18:15:22.165-06:00haskellParsec as a monad transformerThe proposed Parsec3 package for Haskell has Parsec implemented as a monad transformer, which means I can do things like:<pre><br />> data MyType<br />> = Foo<br />> | Baz<br />> | Err<br />> deriving Show<br /></pre><pre><br />> parseMyType = (string "Foo" >> return Foo)<br />> <|> (string "Baz" >> return Baz)<br /></pre><pre><br />> parseNoBaz = callCC $ \k -> do<br />> result <- parseMyType<br />> validateResult k result<br />> return result<br /></pre><pre><br />> validateResult k Baz = k Err<br />> validateResult k _ = return ()<br /></pre><pre><br />> manyNoBaz = parseNoBaz `sepBy` space<br /></pre><pre><br />> test p s = flip runCont id (runPT p () "test" s)<br /></pre><br />Then, if I execute <tt>test manyNoBaz "Foo Foo Baz Foo"</tt> I get the result:<pre><br />[Foo,Foo,Err,Foo]</pre><br />This is a contrived example, but I think it's pretty neat.http://panicsonic.blogspot.com/2008/02/parsec-as-monad-transformer.htmlnoreply@blogger.com (Antoine)0tag:blogger.com,1999:blog-1855204231795641244.post-7380884476404159976Tue, 01 Jan 2008 19:29:00 +00002008-02-10T18:14:58.744-06:00haskellConstraint synonyms in HaskellHello folks, and happy new year!<br /><br />Earlier today I found myself writing the same sequence of long constraints on my type-signatures over and over again in a Haskell program I was working on. The program is still in flux, so that means the constraints may still change. As all of the functions call each-other, they need to have a similar set of constraints on their type signatures.<br /><br />This means as the program evolves, I'll need to make a lot of similar changes all over the source file. I'm pretty lazy, so that doesn't sound like fun to me. At first I thought I could do something like this with regular type synonyms, but that requires all functions to share their entire type, not just a set of constraints.<br /><br />There are a few ways I could've solved this problem:<ul><li><h5>Don't use type signatures</h5> I'm not using any fancy type-level hacks, so the compiler doesn't really need them. But I like having them to prove to myself that I really do know what my code does, and to provide better error messages.</li><li><h5>CPP Macros</h5> I haven't tried this one - I just thought of it while writing this</li><li><h5>Type Classes</h5> Which is what this post is about</li></ul><br /><br />Let's say I have a number of functions whose type signatures are along the lines of:<pre>> myFunc :: (Eq b, Show b, MyClass b, MyOtherClass b) => Int -> String -> b -> b</pre><br />and I don't like typing the <tt>(Eq b, Show b, MyClass b, MyOtherClass b)</tt> part over and over again. I can define a typeclass which captures all of those constraints:<pre>> class (Eq b, Show b, MyClass b, MyOtherClass b) => MyConstraints b</pre><br />along with a rule to populate the class:<pre>> instance (Eq b, Show b, MyClass b, MyOtherClass b) => MyContraints b</pre><br />I can now re-write the type-signature for <tt>myFunc</tt> as follows:<pre>> myFunc :: MyConstraints b => Int -> String -> b -> b</pre><br /><br />This works for the following reasons:<ul><li>Memebership in the class "MyConstraints" implies membership in all of the other classes, due to the constraint on the class defintion.</li><li>Every type which satisfies the constraints is a member of the "MyConstraints" class.</li></ul><br /><br />As another check, if you load the module defining <tt>myFunc</tt> into GHCi and ask for its type at the interactive prompt, it will report it as <pre>myFunc :: (Eq b, Show b, MyClass b, MyOtherClass b) => Int -> String -> b -> b</pre><br />Which is exactly what I wanted.http://panicsonic.blogspot.com/2008/01/constraint-synonyms-in-haskell.htmlnoreply@blogger.com (Antoine)0tag:blogger.com,1999:blog-1855204231795641244.post-5408203968651496184Sat, 01 Dec 2007 21:43:00 +00002007-12-02T12:39:34.480-06:00haskellBackwards State, or: The Power of LazinessThere's been a recent discussion of Automatic Differentiation in Haskell recently, which somehow found me reading Jerzy Karczmarczuk's paper <a href="http://users.info.unicaen.fr/~karczma/arpap/revpearl.pdf">"Lazy Time Reversal, and Automatic Differentiation,"</a> which then cited Philip Wadler's <a href="http://citeseer.ist.psu.edu/wadler92essence.html">"The essence of functional programming"</a> for the introduction of the <em>backwards</em> state monad, which I reproduce here because I think it's neat.<br /><br />I'm going to assume that you're familiar with the Haskell <a href="http://haskell.org/ghc/docs/latest/html/libraries/mtl-1.1.0.0/Control-Monad-State-Lazy.html">state monad</a> - in summary an action in the state monad is a function of the previous state, and produces a result paired with the next state.<br /><br />The backwards state monad differs from this in that the flow of the state through the execution is revere to the flow of the results - that is, an action in the backwards state monad takes in the final value of the state and produces a result and the initial value.<br /><br />This post is literate Haskell post - you should be able to copy and past it into a <tt>.lhs</tt> file and play with it in a Haskell interpreter. I use GHCi.<br /><br />To that end, here's some of the up-front boilerplate so this all works:<pre><br />> {-# LANGUAGE FlexibleInstances,<br />> MultiParamTypeClasses,<br />> RecursiveDo<br />> #-}<br />> import Data.List<br />> import Control.Monad.State<br /></pre><h2>An Example</h2><br />Here's the exercise: Given a tree of items, transform the tree to a tree of <tt>Ints</tt> such that each element is mapped to an <tt>Int</tt>, starting at <tt>0</tt>. If an element occurs more than once in the tree, it must be mapped to the same <tt>Int</tt> each time.<br /><br />The solution given in <em>Control.Monad.State.Lazy</em> does a walk of the tree, and carries around a list of all of the elements seen so far using the state monad. Each node is mapped to its position in this list. That is, the first node seen is mapped to <tt>0</tt>, the second to <tt>1</tt>, etc..<br /><br />But what if I wanted to switch that up? What if wanted the <em>last</em> node seen in the walk mapped to <tt>0</tt>, the second to last mapped to <tt>1</tt>, and so on? How much would I need to change in the already existing solution given in <em>Control.Monad.State.Lazy</em>?<br /><br />Not much! I'd just need to use the <em>backwards</em> state monad, where the state flows backwards through the thread of execution.<br /><br />This is what the modified solution would look like:<pre><br />> data Tree a = Nil | Node a (Tree a) (Tree a) deriving (Show, Eq)<br />> type Table a = [a]<br /></pre><pre><br />> numberTree :: Eq a => Tree a -> StateB (Table a) (Tree Int)<br />> numberTree Nil = return Nil<br />> numberTree (Node x t1 t2)<br />> = do num <- atomically $ numberNode x<br />> nt1 <- numberTree t1<br />> nt2 <- numberTree t2<br />> return (Node num nt1 nt2)<br />> where<br />> numberNode :: Eq a => a -> State (Table a) Int<br />> numberNode x<br />> = do table <- get<br />> (newTable, newPos) <- return (nNode x table)<br />> put newTable<br />> return newPos<br /><br />> nNode:: (Eq a) => a -> Table a -> (Table a, Int)<br />> nNode x table<br />> = case elemIndex x table of<br />> Nothing -> (table ++ [x], length table)<br />> Just i -> (table, i)<br /></pre>And an evaluation function:<pre><br />> numTree :: (Eq a) => Tree a -> Tree Int<br />> numTree t = evalStateB (numberTree t) []<br /></pre>Some test data:<pre><br />> testTree = Node "Zero" (Node "One" (Node "Two" Nil Nil) (Node "One" (Node "Three" Nil Nil) Nil)) Nil<br /></pre>Executing <code>numTree testTree</code> will produce the output:<br /><code>Node 3 (Node 1 (Node 2 Nil Nil) (Node 1 (Node 0 Nil Nil) Nil)) Nil</code><br />Which is exactly what we wanted!<br /><br />This code is almost exactly the same as the solution given to the in-order problem in the source to <em>Control.State.Lazy</em>, the only changes are the use of the function <code>evalStateB</code> instead of the familiar <code>evalState</code>, and the use of the function <code>atomically</code>, and the <tt>StateB</tt> monad. The implementation of these will be explained bellow.<br /><br />First the API, then the implementation.<br /><h2>The API</h2><br />We have the new monad <tt>StateB s</tt>, where <tt>s</tt> is the type of the stored state.<br /><br /><tt>StateB s</tt> is an instance of <tt>MonadState s</tt>, so <tt>get</tt> and <tt>put</tt> are as expected.<br /><br />There is also:<pre><br />> -- runStateB :: StateB s a -> s -> (a, s)<br />> evalStateB :: StateB s a -> s -> a<br />> execStateB :: StateB s a -> s -> s<br /></pre><br />which should look familiar. The trick is that the state <tt>s</tt> passed in to these functions is the final state, and the state returned is the initial state. In the example above, remember that the last element seen in the walk was given the first label, and the first element seen in the walk was given the last.<br /><br />The default implementation of <tt>modify</tt> in Control.Monad.State.Class is implemented as follows:<br /><pre><br />-- modify :: MonadState s m => (s -> s) -> m ()<br />-- modify f = do<br />-- s <- get<br />-- put (f s)<br /></pre>In the <tt>StateB</tt> monad, this code will bottom-out, because of the circular data dependency of the two monadic actions - in the backwards state monad, <tt>(>>=)</tt> passes the result forward and the state backwards, which means that the above code has a nice loop where the first line grabs the updated state from the second line and tries to pass it in as an argument to the second line.<br /><br />To make this work, we need a version of modify specific to <tt>StateB</tt>:<pre><br />> modifyB :: (s -> s) -> StateB s ()<br /></pre>But if you want to modify the state and return the result, you'll need something more sophisticated:<pre><br />> atomically :: State s a -> StateB s a<br /></pre><tt>atomically</tt> converts an action under the normal state monad to a single action under <tt>StateB</tt>, allowing you do do complex updates to the state easily without bottoming out (using <tt>mdo</tt> notation also works).<br /><h2>Implementation</h2><br />The base of the implementation is taken directly from Wadler's paper.<br /><br />The <tt>StateB</tt> monad is almost the same as the <tt>State</tt> monad - each action of type <tt>a</tt> is a function of type <tt>\s -> (a,s)</tt>. The difference is in the implementation of <tt>(>>=)</tt>.<br /><br />Let's start with the monad:<pre><br />> newtype StateB s a = StateB {runStateB :: s -> (a,s)}<br /><br />> instance Monad (StateB s) where<br />> return = StateB . unitS<br />> (StateB m) >>= f = StateB $ m `bindS` (runStateB . f)<br /></pre>Because wrapping and unwrapping the newtype annoys me, all of that is confined to the exported functions (like <tt>return</tt> and <tt>(>>=)</tt>). The functions that deal directly with the underlying type all have an 'S' suffix.<pre><br />> m `bindS` k = \s2 -> let (a, s0) = m s1<br />> (b, s1) = k a s2<br />> in (b, s0)<br /><br />> unitS a = \s2 -> (a, s2)<br /></pre>As you can see, the passed in state is acted on by the RHS of <tt>bindS</tt>, the intermediate state is consumed by the LHS, and the LHS produces the final state, <tt>s0</tt>. It looks too simple to work, but it does.<br />And the other API functions:<pre><br />> execStateB m = snd . runStateB m<br /><br />> evalStateB m = fst . runStateB m<br /><br />> modifyB = StateB . modify'<br />> where modify' f = \s -> ((), f s)<br /><br />> atomically = StateB . runState<br /></pre>Just for funsies:<pre><br />> instance Functor (StateB s) where<br />> fmap f m = StateB $ mapS f (runStateB m)<br /><br />> mapS f m = \s -> let (a, s') = m s in (f a, s')<br /><br />> instance MonadState s (StateB s) where<br />> get = StateB get'<br />> where get' = \s -> (s,s)<br />><br />> put = StateB . put'<br />> where put' s = const ((),s)<br /><br />> instance MonadFix (StateB s) where<br />> mfix = StateB . mfixS . (runStateB .)<br /><br />> mfixS f = \s2 -> let (a,s0) = (f b) s1<br />> (b,s1) = (f a) s2<br />> in (b,s0)<br /></pre><h2>The transformer</h2><br />Now a treat for those of you still paying attention. I haven't really tested this, but it looks like it should work and that's good enough for me. A lot of this is in the style of the sources for <em>Control.Monad.State.Lazy</em>.<pre><br />> newtype StateBT s m a = StateBT {runStateBT :: s -> m (a,s)}<br /><br />> unitST a = \s -> return (a,s)<br /><br />> m `bindST` k = \s2 -> mdo ~(a,s0) <- m s1<br />> ~(b,s1) <- k a s2<br />> return (b,s0)<br /><br />> execStateBT :: Monad m => StateBT s m a -> s -> m s<br />> execStateBT m s = do ~(_,s') <- runStateBT m s<br />> return s'<br /><br />> evalStateBT :: Monad m => StateBT s m a -> s -> m a<br />> evalStateBT m s = do ~(a,_) <- runStateBT m s<br />> return a<br /><br />> modifyBT :: Monad m => (s -> s) -> StateBT s m ()<br />> modifyBT = StateBT . modify'<br />> where modify' f = \s -> return ((),f s)<br /><br />> atomicallyT :: Monad m => State s a -> StateBT s m a<br />> atomicallyT m = StateBT $ \s-> return $ runState m s<br /><br />> atomicallyTM :: Monad m => StateT s m a -> StateBT s m a<br />> atomicallyTM = StateBT . runStateT<br /><br />> mapST f m = \s -> do ~(a,s') <- m s<br />> return (f a,s')<br /><br />> liftST m = \s -> do a <- m<br />> return (a,s)<br /><br />> mfixST f = \s2 -> mdo ~(a,s0) <- (f b) s1<br />> ~(b,s1) <- (f a) s2<br />> return (b,s0)<br /><br />> instance Monad m => Functor (StateBT s m) where<br />> fmap f m = StateBT $ mapST f (runStateBT m)<br /><br />> instance MonadFix m => Monad (StateBT s m) where<br />> return = StateBT . unitST<br />> (StateBT m) >>= f = StateBT $ m `bindST` (runStateBT . f) <br />> fail = StateBT . const . fail<br /><br />> instance MonadTrans (StateBT s) where<br />> lift = StateBT . liftST<br /><br />> instance MonadFix m => MonadState s (StateBT s m) where<br />> get = StateBT get'<br />> where get' = \s -> return (s,s)<br />> <br />> put = StateBT . put'<br />> where put' s = const $ return ((),s)<br /><br />> instance MonadFix m => MonadFix (StateBT s m) where<br />> mfix = StateBT . mfixST . (runStateBT .)<br /></pre>http://panicsonic.blogspot.com/2007/12/backwards-state-or-power-of-laziness.htmlnoreply@blogger.com (Antoine)0tag:blogger.com,1999:blog-1855204231795641244.post-641554883730184347Wed, 04 Jul 2007 18:54:00 +00002007-07-08T03:17:43.088-05:00haskellAnother quineThis quine is a lot like my <a href="http://panicsonic.blogspot.com/2007/07/hurrah-for-fixed-points.html">first</a> Haskell quine, except shorter.<br />(This one is technically not a quine, due to linebreaks, but it prints a quine when executed)<pre><br />import System.IO<br />main=(putStr.map toEnum)p>>(putStr.show)p>>putStr "\n"<br />p=[105,109,112,111,114,116,32,83,<br />121,115,116,101,109,46,73,79,<br />10,109,97,105,110,61,40,112,<br />117,116,83,116,114,46,109,97,<br />112,32,116,111,69,110,117,109,<br />41,112,62,62,40,112,117,116,<br />83,116,114,46,115,104,111,119,<br />41,112,62,62,112,117,116,83,<br />116,114,32,34,92,110,34,10,<br />112,61]<br /></pre>http://panicsonic.blogspot.com/2007/07/another-quine.htmlnoreply@blogger.com (Antoine)0tag:blogger.com,1999:blog-1855204231795641244.post-8553048342040561175Wed, 04 Jul 2007 03:22:00 +00002007-07-08T03:18:27.835-05:00haskellAnother quine, this time using the printf trickI'm not sure how to make blogger give me a scrollbar to put code in.<br />You'll just have to remove linebreaks where needed.<br /><br /><pre><br />import System.IO<br />import Text.Printf<br />main = let s = "import System.IO%cimport Text.Printf%cmain = let s = %c%s%c<br />in printf s (10 :: Int) (10 :: Int) (34 :: Int) s (34 :: Int) (10 :: Int)%c"<br />in printf s (10 :: Int) (10 :: Int) (34 :: Int) s (34 :: Int) (10 :: Int)<br /></pre>http://panicsonic.blogspot.com/2007/07/another-quine-this-time-using-printf.htmlnoreply@blogger.com (Antoine)0tag:blogger.com,1999:blog-1855204231795641244.post-2817863058167391296Wed, 04 Jul 2007 02:09:00 +00002007-07-04T12:25:36.013-05:00haskellHurrah for fixed points<pre><br />import System.IO<br /><br />-- My first haskell quine, revised<br /><br />main :: IO ()<br />main = do (putStr . map toEnum) prog<br /><br /> (putStr . breaker . show) prog<br /><br /> putStr "\n"<br /><br />breaker :: String -> String<br />breaker = (unwordsBy ',' . f . wordsBy ',') where<br /> f xs<br /> | length xs > 8 = (take 8 xs) ++ (f . g) (drop 8 xs)<br /> | otherwise = xs<br /><br /> g [] = []<br /> g (x:xs) = ("\n " ++ x):xs<br /><br />-- Adapted from "lines" in the GHC List module<br />wordsBy :: Char -> String -> [String]<br />wordsBy _ "" = []<br />wordsBy c s = let (l, s') = break (== c) s in<br /> l: case s' of<br /> [] -> []<br /> (_:s'') -> wordsBy c s''<br /><br />-- Adapted from "unlines" in the GHC List module<br />unwordsBy :: Char -> [String] -> String<br />unwordsBy _ [] = ""<br />unwordsBy c ws = foldr1 (\w s -> w ++ c:s) ws<br /><br />prog :: [Int]<br />prog = [105,109,112,111,114,116,32,83,<br /> 121,115,116,101,109,46,73,79,<br /> 10,10,45,45,32,77,121,32,<br /> 102,105,114,115,116,32,104,97,<br /> 115,107,101,108,108,32,113,117,<br /> 105,110,101,44,32,114,101,118,<br /> 105,115,101,100,10,10,109,97,<br /> 105,110,32,58,58,32,73,79,<br /> 32,40,41,10,109,97,105,110,<br /> 32,61,32,100,111,32,40,112,<br /> 117,116,83,116,114,32,46,32,<br /> 109,97,112,32,116,111,69,110,<br /> 117,109,41,32,112,114,111,103,<br /> 10,10,32,32,32,32,32,32,<br /> 32,32,32,32,40,112,117,116,<br /> 83,116,114,32,46,32,98,114,<br /> 101,97,107,101,114,32,46,32,<br /> 32,115,104,111,119,41,32,112,<br /> 114,111,103,10,10,32,32,32,<br /> 32,32,32,32,32,32,32,112,<br /> 117,116,83,116,114,32,34,92,<br /> 110,34,10,10,98,114,101,97,<br /> 107,101,114,32,58,58,32,83,<br /> 116,114,105,110,103,32,45,62,<br /> 32,83,116,114,105,110,103,10,<br /> 98,114,101,97,107,101,114,32,<br /> 61,32,40,117,110,119,111,114,<br /> 100,115,66,121,32,39,44,39,<br /> 32,46,32,102,32,46,32,119,<br /> 111,114,100,115,66,121,32,39,<br /> 44,39,41,32,119,104,101,114,<br /> 101,10,32,32,102,32,120,115,<br /> 10,32,32,32,32,124,32,108,<br /> 101,110,103,116,104,32,120,115,<br /> 32,62,32,56,32,61,32,40,<br /> 116,97,107,101,32,56,32,120,<br /> 115,41,32,43,43,32,40,102,<br /> 32,46,32,103,41,32,40,100,<br /> 114,111,112,32,56,32,120,115,<br /> 41,10,32,32,32,32,124,32,<br /> 111,116,104,101,114,119,105,115,<br /> 101,32,32,32,32,32,61,32,<br /> 120,115,10,10,32,32,103,32,<br /> 32,91,93,32,32,32,32,61,<br /> 32,91,93,10,32,32,103,32,<br /> 40,120,58,120,115,41,32,61,<br /> 32,40,34,92,110,32,34,32,<br /> 43,43,32,120,41,58,120,115,<br /> 32,10,10,45,45,32,65,100,<br /> 97,112,116,101,100,32,102,114,<br /> 111,109,32,34,108,105,110,101,<br /> 115,34,32,105,110,32,116,104,<br /> 101,32,71,72,67,32,76,105,<br /> 115,116,32,109,111,100,117,108,<br /> 101,10,119,111,114,100,115,66,<br /> 121,32,58,58,32,67,104,97,<br /> 114,32,45,62,32,83,116,114,<br /> 105,110,103,32,45,62,32,91,<br /> 83,116,114,105,110,103,93,10,<br /> 119,111,114,100,115,66,121,32,<br /> 95,32,34,34,32,61,32,91,<br /> 93,10,119,111,114,100,115,66,<br /> 121,32,99,32,115,32,32,61,<br /> 32,108,101,116,32,40,108,44,<br /> 32,115,39,41,32,61,32,98,<br /> 114,101,97,107,32,40,61,61,<br /> 32,99,41,32,115,32,105,110,<br /> 10,32,32,32,32,32,32,32,<br /> 32,32,32,32,32,32,32,32,<br /> 108,58,32,99,97,115,101,32,<br /> 115,39,32,111,102,10,32,32,<br /> 32,32,32,32,32,32,32,32,<br /> 32,32,32,32,32,32,32,32,<br /> 32,32,91,93,32,32,32,32,<br /> 32,32,45,62,32,91,93,10,<br /> 32,32,32,32,32,32,32,32,<br /> 32,32,32,32,32,32,32,32,<br /> 32,32,32,32,40,95,58,115,<br /> 39,39,41,32,45,62,32,119,<br /> 111,114,100,115,66,121,32,99,<br /> 32,115,39,39,10,10,45,45,<br /> 32,65,100,97,112,116,101,100,<br /> 32,102,114,111,109,32,34,117,<br /> 110,108,105,110,101,115,34,32,<br /> 105,110,32,116,104,101,32,71,<br /> 72,67,32,76,105,115,116,32,<br /> 109,111,100,117,108,101,10,117,<br /> 110,119,111,114,100,115,66,121,<br /> 32,58,58,32,67,104,97,114,<br /> 32,45,62,32,91,83,116,114,<br /> 105,110,103,93,32,45,62,32,<br /> 83,116,114,105,110,103,10,117,<br /> 110,119,111,114,100,115,66,121,<br /> 32,95,32,91,93,32,61,32,<br /> 34,34,10,117,110,119,111,114,<br /> 100,115,66,121,32,99,32,119,<br /> 115,32,61,32,102,111,108,100,<br /> 114,49,32,40,92,119,32,115,<br /> 32,45,62,32,119,32,43,43,<br /> 32,99,58,115,41,32,119,115,<br /> 10,10,112,114,111,103,32,58,<br /> 58,32,91,73,110,116,93,10,<br /> 112,114,111,103,32,61,32]<br /></pre>http://panicsonic.blogspot.com/2007/07/hurrah-for-fixed-points.htmlnoreply@blogger.com (Antoine)0tag:blogger.com,1999:blog-1855204231795641244.post-3044746823768573384Sat, 28 Apr 2007 20:45:00 +00002007-07-03T21:12:25.529-05:00haskellsudokuSudoku SolvingI was inspired to write a Sudoku solver in Haskell to solve the Sudoku-related task on <a href="http://projecteuler.net/">Project Euler</a>.<br /><br />The concept behind this solver is that the puzzle is represented by a collection of constraints, and the solver transforms these constraints until the puzzle is solved.<br /><br />This approach works for many, but not all, Sudoku puzzles. At least it's fast!<br /><br />First off: I'll be needing a few libraries to help out<br /><pre><br />> import Data.Array<br />> import Data.List hiding (group)<br />> import Text.ParserCombinators.Parsec<br />> import System.IO<br /></pre><br /><h3>Data Types</h3><br />Let's start with the type I'm using to represent a Sudoku puzzle:<br /><pre><br />> type Sudoku a = Array (Int,Int) [a]<br /></pre><br />Each cell in the Sudoku puzzle is represented by a list of allowed values. A puzzle is solved when each cell can only take a single value.<br /><br />A few functions to operate on the values of a Sudoku cell:<br /><br /><tt>isD</tt> returns true if the supplied list is of length 1.<br /><pre><br />> isD :: [a] -> Bool<br />> isD (_:[]) = True<br />> isD _ = False<br /></pre><br /><tt>stripD</tt> takes in a list of length 1, and returns the single value in the list.<br /><pre><br />> stripD :: [a] -> a<br />> stripD (a:[]) = a<br />> stripD _ = error "List supplied was not of unit length"<br /></pre><br />And now for functions that operate on whole puzzles:<br /><br /><tt>isSolved</tt> returns true if the Sudoku puzzle is solved.<br /><pre><br />> isSolved :: Sudoku a -> Bool<br />> isSolved = and . map isD . elems<br /></pre><br /><br /><tt>getS</tt> returns the contents of the cells specified by the input list, packed with the cell they came from<br /><pre><br />> getS :: Sudoku a -> [(Int,Int)] -> [((Int,Int),[a])]<br />> getS s ixs = [(ix, (s ! ix)) | ix <- ixs]<br /></pre><br />I drag the indices around for ease of reconstruction back into an array.<br /><h3>Solving</h3><br />A Sudoku puzzle is initially presented with some cells filled in, and some cells empty.<br /><br />In this solver, each cell of the Sudoku data-type contains a list of values that cell may take. So initially, each cell would either contain a list with a single element, or the list of numbers from 1 to 9.<br /><br />The solver applies rules to groups of cells to add further constraints to the cells:<br /><pre><br />> type Rule a = [[a]] -> [[a]]<br /></pre><br />The solver takes a list of such rules, and applies them until they have no further effect. The rules that come first in the list are tried first. Once a rule succeeds in doing something, the solver then goes back to the beginning of the list.<br /><pre><br />> solveWithRules :: Eq a => [Rule a] -> Sudoku a -> Sudoku a<br />> solveWithRules [] s = s<br />> solveWithRules x s = helper x s where<br />> helper [] s = s<br />> helper (r:rs) s = if isSolved nextS<br />> then nextS<br />> else if s == nextS<br />> then helper rs nextS<br />> else helper x nextS<br />> where nextS = applyRule r s<br /></pre><br /><tt>applyRule</tt> takes a single rule and applies it to a Sudoku puzzle:<br /><pre><br />> applyRule :: Eq a => Rule a -> Sudoku a -> Sudoku a<br />> applyRule r = foldr1 (.) $ map (apply r) [group, column, row]<br /></pre><br />There are three interesting partitions of the cells of a Sudoku grid:<br /><ul><li>By row</li><li>By column</li><li>By 3x3 sub-grid</li></ul><tt>applyRule</tt> applies the specified rule, in turn, to each of row, column, and sub-grid.<br /><pre><br />> column, row, group :: Int -> [(Int,Int)]<br />> column n = [(x,n) | x <- [1..9]]<br />> row n = [(n,x) | x <- [1..9]]<br />> group n = [(x,y) | x <- [rowL !! (n-1)..(rowL !! (n-1))+2], <br />> y <- [columnL !! (n-1) .. (columnL !! (n-1))+2]] where<br />> rowL = [1,1,1,4,4,4,7,7,7]<br />> columnL = [1, 4, 7, 1, 4, 7, 1, 4, 7]<br /></pre><br />The function <tt>apply</tt> takes the rule and the partition, and performs the application of the rule:<br /><pre><br />> apply :: Eq a => Rule a -> (Int -> [(Int,Int)]) -> Sudoku a -> Sudoku a<br />> apply r partition = \x -> array ((1,1),(9,9)) <br />> $ concat <br />> $ map (liftE r)<br />> $ map (getS x)<br />> $ map partition [1..9]<br /></pre><br />The function <tt>liftE</tt> is needed to convert a rule (which is of type <tt>[[a]] -> [[a]]</tt>) to be of type <tt>[((Int,Int),[a])] -> [((Int,Int),[a])]</tt><br /><pre><br />> liftE :: ([a] -> [b]) -> [(i, a)] -> [(i, b)]<br />> liftE f = \x -> zip (fst (unzip x)) $ f $ snd (unzip x)<br /></pre><br />So I've laid out all of the mechanisms for applying rules to Sudoku puzzles. the next step is to define these rules.<br /><h3>Rules</h3><br /><br />All of the rules analyze a set of Sudoku cells, and further constrain the values of those cells (or do nothing).<br /><br />Here's an example, rule <tt>a1</tt>:<br /><pre><br />> a1 :: Eq a => Rule a<br />> a1 x = map helper x where<br />> helper (a:[]) = a:[]<br />> helper as = [b | b <- as, not $ elem b definites]<br />> definites = [stripD y | y <- x, isD y]<br /></pre><br />Put simply: If a cell is fully constrained to a value, no other cell is allowd to take that value.<br /><br />This reduction rule can be taken further: If two cells are jointly constrained to two values, no other cells may take those values, if three cells are jointly constrained to three values ... and on and on.<br /><br />In order to apply the generalized form of rule <tt>a1</tt>, I'll need to figure out every way I can break a list of nine Sudoku cells into two groups:<br /><pre><br />> combos :: Integral n => [([n],[n])]<br />> combos = map (\x -> (x,[0..8]\\x)) $ powerset [0..8]<br /></pre><br /><tt>combos</tt> is a list of pairs of lists, representing every way to split a list of nine elements into two groups. Using a set of puzzle cells as an accumulator, I can fold across this list (or portions of it).<br />But first:<br /><pre><br />> powerset :: [a] -> [[a]]<br />> powerset [] = [[]]<br />> powerset (x:xs) = let p = powerset xs in p ++ map (x:) p<br /></pre><br />I didn't come up with this implementation of powerset, and I don't remember who did. Sorry!<br /><br />Also, I'm really only interested in a portion of the <tt>combos</tt> list at any given time:<br /><pre><br />> subCombos n = filter ( (==n) . fromIntegral . length . fst) combos<br /></pre><br /><tt>subCombos n</tt> represents all of the ways to divide up a list into two groups such that one of the groups has <tt>n</tt> elements.<br />And here is the "super" rule:<br /><pre><br />> a :: (Eq a, Integral n) => n -> Rule a<br />> a n = \y -> foldr helper y (subCombos n) where<br />> helper (a,b) x = if length (valuesIn a x) == length a<br />> then mapIndices (remove (valuesIn a x)) b x<br />> else x<br /></pre><br /><tt>(a 1)</tt> should produce the same result as <tt>a1</tt>. In practice, <tt>(a 1)</tt> is slower than <tt>a1</tt>.<br />I've used a few new functions up there:<br /><pre><br />> valuesIn :: (Integral n, Eq a) => [n] -> [[a]] -> [a]<br />> valuesIn a x = nub . concat . map ((x!!) . fromIntegral) $ a<br /></pre><br />Given a list of indices and a list of constraints, <tt>valuesIn</tt> returns a flat list of the values specified by the constraints at the indices given.<br /><pre><br />> remove :: Eq a => [a] -> [a] -> [a]<br />> remove a bs = [x | x <- bs, not (elem x a)]<br /></pre><br /><tt>remove</tt> takes in two lists, and returns a list composed of elements in the second list but not in the first. There's probably something a lot like this in the standard library.<br /><pre><br />> mapIndices :: Integral n => (a -> a) -> [n] -> [a] -> [a]<br />> mapIndices f ns as = helper 0 f ns as where<br />> helper _ _ [] as = as<br />> helper _ _ _ [] = []<br />> helper i f ns (a:as) = if elem i ns<br />> then (f a):(helper (i+1) f (delete i ns) as)<br />> else a:(helper (i+1) f ns as)<br /></pre><br /><tt>mapIndices</tt> is a lot like map, but it passes through any list elements which are not at the indices specified.<br /><br />Those are the rules!<br /><h3>Parsing</h3><br />Parsec may be a bit much for this. Ah well.<br /><br />A Sudoku puzzle consists of a bunch of numbers. Whitespace is ignored. Any non-numeric text preceding the numbers is ignored. The parser also consumes any non-numeric text following the Sudoku puzzle.<br /><pre><br />> parseSudoku :: (Integral a, Read a) => Parser (Sudoku a)<br />> parseSudoku = do many (noneOf nums)<br />> x <- many1 (do y <- digit<br />> many space<br />> return y)<br />> many (noneOf nums)<br />> return $ sudoku $ map (read . return) x<br />> where nums = ['0'..'9']<br /></pre><br />I use a function <tt>sudoku</tt> in there. It creates a Sudoku puzzle from a list a more-friendly way:<br /><pre><br />> sudoku :: Integral a => [a] -> Sudoku a<br />> sudoku xs | (length xs == 81) = array ((1,1),(9,9))<br />> $ zip [(x,y)| x <- [1..9], y <- [1..9] ]<br />> $ map value xs<br />> | otherwise = error "Suduko must be constructed of 81 entries."<br /></pre><br />Here I use the <tt>value</tt> function:<br /><pre><br />> value :: Integral a => a -> [a]<br />> value 0 = [1..9]<br />> value a = a:[]<br /></pre>That covers parsing. It's time to look at the <tt>main</tt> function:<br /><br /><h3>Main</h3><br /><pre><br />> main :: IO ()<br />> main = do x <- hGetContents stdin<br />> case (parse (many1 parseSudoku) "" x) of<br />> Left err -> do putStr "Error reading input"<br />> print err<br />> Right a -> foldr1 (>>) $ map (printSudoku . solveWithRules rules) a<br />> where rules = [a1, a 6, a 2]<br /></pre><br />Here's the breakdown:<br /><ol><li>I parse the standard input to a list of Sudoku puzzles</li><li>For each Sudoku, I attempt to solve them with a specified set of rules</li><li>Then I print them</li></ol>Simple!<br /><br />Here's how printing is handled:<br /><pre><br />> printSudoku :: Show a => Sudoku a -> IO ()<br />> printSudoku = printEntries . map unValue . elems<br /><br />> unValue :: Show a => [a] -> Char<br />> unValue (a:[]) = head (show a)<br />> unValue [] = '/'<br />> unValue _ = '_'<br /><br />> printEntries :: String -> IO ()<br />> printEntries [] = putStrLn "\n"<br />> printEntries s = do putStrLn (take 9 s)<br />> printEntries (drop 9 s)<br /></pre><br /><br /><tt>elems</tt> takes an array and returns it's contents as a list. I then replace the contraint lists with single characters, and then I print them out nine at a time.<br /><h3>Test cases</h3><br />Feed the remainder of this post on <tt>stdin</tt> into the compiled version of this post to try out the solver.<br /><pre>Puzzles from websudoku.com<br />Rating: Hard<br /><br />000070009<br />400080001<br />093000008<br />040006200<br />010758040<br />006300080<br />700000560<br />600020004<br />500090000<br /><br />+++<br /><br />002150000<br />603002000<br />000003028<br />001006070<br />950000061<br />080400300<br />210600000<br />000200607<br />000039200<br /><br />+++<br /><br />200100730<br />000490020<br />060008000<br />006000004<br />940070012<br />800000900<br />000300090<br />050026000<br />037009005<br /><br />+++<br /><br />003040800<br />000006130<br />000350042<br />300000000<br />147000365<br />000000004<br />890025000<br />026100000<br />005080400<br /><br />+++<br /><br />040006003<br />000705800<br />000800460<br />020008007<br />006000200<br />800400050<br />039004000<br />007502000<br />200100040<br /><br />+++<br /><br />000900103<br />010607000<br />080000020<br />036080007<br />700109008<br />800070410<br />040000030<br />000204090<br />605001000<br /><br />+++<br /><br />090800701<br />060010000<br />000900200<br />038709005<br />000000000<br />400502680<br />004005000<br />000040060<br />105008070<br /><br />+++<br /><br />200600970<br />030000050<br />640007001<br />000003006<br />001502700<br />500400000<br />900200037<br />050000090<br />012009005<br /><br />+++<br /><br />080000000<br />000430090<br />100052006<br />761500030<br />005000100<br />030001975<br />200140009<br />040087000<br />000000040<br /><br />+++<br /><br />090800700<br />080001060<br />000050200<br />020500040<br />708010603<br />030006020<br />005020000<br />010300080<br />009007010<br /><br />+++<br /><br />070902000<br />050040107<br />400100082<br />086000000<br />200000006<br />000000290<br />830005004<br />607020010<br />000403070<br /><br />+++<br /><br />010008000<br />785030001<br />000001950<br />000020004<br />043000210<br />200050000<br />094700000<br />800090427<br />000400030<br /><br />+++<br /><br />003000000<br />100967002<br />020300068<br />609000010<br />040000050<br />010000409<br />450001080<br />200685001<br />000000600<br /><br />+++<br /><br />063078001<br />098500000<br />040000000<br />000250100<br />025000680<br />004061000<br />000000090<br />000004720<br />300680410<br /><br />+++<br /><br />000005800<br />090230070<br />730001002<br />100000085<br />600000001<br />920000004<br />200500067<br />060092010<br />001300000<br /><br />+++<br /><br />003059408<br />000810009<br />000000050<br />009002006<br />012000570<br />600300100<br />070000000<br />900024000<br />104980700<br /><br />+++<br /><br />580700200<br />000405080<br />000000007<br />079106000<br />062000150<br />000203470<br />600000000<br />020609000<br />008002015<br /><br />Wow!</pre>http://panicsonic.blogspot.com/2007/04/sudoku-solving.htmlnoreply@blogger.com (Antoine)0