Monday, December 27, 2010

Editing XML in Haskell

edit: Here's an hpaste for the solution I eventually came up with:

I recently found myself wanting to make small tweaks to an XML file not under my control before processing it in my Haskell app.

My end goal was a way to (a) declare which part of the XML I want to edit and then (b) apply a function to the located element and have it updated in place.

The hxt package (Haskell XML Toolbox) has a lot of pieces, and includes an XPath parser, so I looked at it first. But like XSLT, hxt seems geared towards XML processing - not editing. It has tools for applying a transformation recursively through a tree, but that requires finding an element based on a predicate on the element, not based on its location within the document. hxt does let you extract elements based on location, but I don't see how to put the original document back together again. Maybe I'm missing something.

Then I noticed that the xml package (sometimes referred to as xml-light) has a Cursor written for it! So all I need to do is navigate the cursor down to where I need it, apply the update function and then I'm done. That's declarative enough for me.

There were two problems with this:
  1. A lot of the cursor manipulation functions return Maybe types
  2. I didn't feel comfortable composing functions on cursors - if a sub-function goes off the deep end it could leave the cursor anywhere in the document
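
To make both problems concrete, here is a minimal sketch of a raw cursor edit using Text.XML.Light.Cursor from the xml package. The helper names (rename, renameSecondChild, example) and the sample document are made up for illustration; only the imported functions come from the library.

```haskell
import Text.XML.Light (Content (..), Element (..), QName (..), elChildren, parseXMLDoc)
import Text.XML.Light.Cursor (Cursor, firstChild, fromElement, modifyContent, right, toTree)

-- Hypothetical helper: rename the element under the cursor.
-- Text and CDATA nodes are left untouched.
rename :: String -> Content -> Content
rename new (Elem e) = Elem (e { elName = (elName e) { qName = new } })
rename _   other    = other

-- Hypothetical helper: walk to the second child of the current node and
-- rename it. firstChild and right both return Maybe Cursor, so every
-- navigation step has to be threaded through Maybe.
renameSecondChild :: String -> Cursor -> Maybe Cursor
renameSecondChild new c = do
  c1 <- firstChild c
  c2 <- right c1
  -- The returned cursor still points at the renamed child, not at the
  -- node we started from; callers have to know that.
  return (modifyContent (rename new) c2)

-- Run the edit on a tiny document and list the names of the root's children.
example :: Maybe [String]
example = do
  el <- parseXMLDoc "<a><b/><c/></a>"
  c  <- renameSecondChild "d" (fromElement el)
  case toTree c of
    Elem e -> Just (map (qName . elName) (elChildren e))
    _      -> Nothing
```

Every step lives in Maybe, and nothing in the type of renameSecondChild says where the cursor ends up, which is exactly what makes these functions awkward to compose.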

So I did what any other Haskell programmer would do - I wrote my own XML editing monad to fix and then encapsulate the problems above.
> data Update a = ...

The Update type has instances for Monad, Functor, and Applicative - and to handle the failing traversal functions it has instances for Alternative and MonadPlus.

It has the primitive operations:

> runUpdate :: Cursor -> Update a -> Maybe (a, Cursor)

> perform :: (Cursor -> Maybe Cursor) -> Update ()

> asks :: (Cursor -> a) -> Update a

These are used to wrap up all of the functions from Text.XML.Light.Cursor in a straightforward way.
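
The post elides the definition of Update, so the following is only one plausible shape for it, not the hpaste code: a state-passing function over Maybe, which is enough to support the instances and primitives listed above. The representation, the wrappers (down, up, next) and the helpers currentName and example are assumptions for illustration.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus, ap, liftM)
import Text.XML.Light (Content (..), Element (..), QName (..), parseXMLDoc)
import Text.XML.Light.Cursor (Cursor, current, firstChild, fromElement, parent, right)

-- Assumed representation: thread a Cursor through a computation that may fail.
newtype Update a = Update { unUpdate :: Cursor -> Maybe (a, Cursor) }

instance Functor Update where
  fmap = liftM

instance Applicative Update where
  pure a = Update $ \c -> Just (a, c)
  (<*>)  = ap

instance Monad Update where
  Update m >>= k = Update $ \c -> case m c of
    Nothing      -> Nothing
    Just (a, c') -> unUpdate (k a) c'

-- On failure, the right-hand alternative restarts from the original cursor.
instance Alternative Update where
  empty = Update (const Nothing)
  Update l <|> Update r = Update $ \c -> case l c of
    Nothing -> r c
    ok      -> ok

instance MonadPlus Update

runUpdate :: Cursor -> Update a -> Maybe (a, Cursor)
runUpdate c u = unUpdate u c

-- Lift a cursor movement or edit that may fail.
perform :: (Cursor -> Maybe Cursor) -> Update ()
perform f = Update $ \c -> fmap (\c' -> ((), c')) (f c)

-- Read something off the cursor without moving it.
asks :: (Cursor -> a) -> Update a
asks f = Update $ \c -> Just (f c, c)

-- Hypothetical wrappers around Text.XML.Light.Cursor functions.
down, up, next :: Update ()
down = perform firstChild
up   = perform parent
next = perform right

-- Hypothetical helper: the name of the element under the cursor, if any.
currentName :: Cursor -> Maybe String
currentName c = case current c of
  Elem e -> Just (qName (elName e))
  _      -> Nothing

-- The first walk runs off the end of the sibling list, so the second is used.
example :: Maybe (Maybe String)
example = do
  el <- parseXMLDoc "<a><b/><c/></a>"
  let walk = (down >> next >> next) <|> (down >> next)
  fst <$> runUpdate (fromElement el) (walk >> asks currentName)
```

With this shape, a failed traversal simply becomes Nothing, and the Alternative instance gives backtracking to the cursor position at which the choice was made.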

To give me more control over the composition of cursor updates there are also the following primitives:

> sandbox :: Update a -> Update a

> run :: Update a -> Update ()

The function 'sandbox' executes the passed-in action, but confines it to the current scope of the cursor - the action may not access the parent or siblings of the current node. In addition, the cursor is returned to its original position regardless of where the passed-in action left it.

The function 'run' is the same as 'sandbox' - except that if the passed action fails we pretend that nothing happened.
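
One way these two combinators could be written is sketched below. It is a guess at the behaviour described, not the author's code. For brevity it models Update as StateT Cursor Maybe from the transformers package (equivalent to a function Cursor -> Maybe (a, Cursor)); the helpers rename, edit, spine and example are hypothetical. The trick is to run the action on a fresh cursor rooted at the current node, which has no parent or siblings to escape to, and then write the edited subtree back at the original position.

```haskell
import Control.Applicative ((<|>))
import Control.Monad (void)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT (..), get, put)
import Text.XML.Light (Content (..), Element (..), QName (..), parseXMLDoc)
import Text.XML.Light.Cursor
  (Cursor, current, firstChild, fromContent, fromElement, modifyContent, parent, setContent, toTree)

-- Assumed representation, chosen to keep the sketch short.
type Update = StateT Cursor Maybe

runUpdate :: Cursor -> Update a -> Maybe (a, Cursor)
runUpdate c u = runStateT u c

perform :: (Cursor -> Maybe Cursor) -> Update ()
perform f = get >>= lift . f >>= put

sandbox :: Update a -> Update a
sandbox act = do
  outer <- get
  -- A cursor built from just the current node has no parents or siblings,
  -- so the action cannot leave this subtree.
  (a, inner) <- lift (runStateT act (fromContent (current outer)))
  -- Put the edited subtree back; the outer cursor itself never moved.
  put (setContent (toTree inner) outer)
  return a

-- Like sandbox, but a failing action is treated as a no-op.
run :: Update a -> Update ()
run act = void (sandbox act) <|> return ()

-- Hypothetical helper: rename the element under the cursor.
rename :: String -> Content -> Content
rename new (Elem e) = Elem (e { elName = (elName e) { qName = new } })
rename _   other    = other

edit :: Update ()
edit = do
  perform firstChild        -- move to <b>
  run (perform parent)      -- fails inside the sandbox, so it is ignored
  sandbox $ do
    perform firstChild      -- move to <c>, still inside the sandbox
    perform (Just . modifyContent (rename "d"))
  perform parent            -- the cursor is back on <b>, so this reaches <a>

-- Element names along the first-child spine of a tree.
spine :: Content -> [String]
spine (Elem e) = qName (elName e) : concatMap spine (take 1 (elContent e))
spine _        = []

example :: Maybe [String]
example = do
  el     <- parseXMLDoc "<a><b><c/></b></a>"
  (_, c) <- runUpdate (fromElement el) edit
  return (spine (toTree c))
```

Because the inner action only ever sees the detached subtree, the "no access to parent or siblings" rule is enforced by construction rather than by checking each move.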

So go nuts - you can now declare edits and traversals without worrying about how to fit them into the bigger picture. We have combinators for that.

Am I off the deep end with this? Are there other tools on Hackage I should be using?


lpsmith said...

Sounds neat! However, is this show and tell, or just tell?

Antoine said...

Hah hah ... I can show too. Let me get a link to hpaste in the article.

mightybyte said...

FYI: The Hexpat XML library also has a cursor interface. It was actually ported from the xml package.


