The latest breakages in both Urban Dictionary and Fail++ were caused by changes to the HTML markup on the respective sites.

I kept putting off adding the ability to push parsing logic from my server, because I was lazy and kept thinking “I will do it next release”. Well, the changes caught me with my pants down. Since I couldn't update the parsing “on the fly”, I had to rush out a new version of the app and wait a few days for the marketplace to publish it. A great option for getting data out of HTML is RegEx: the Windows Phone platform supports it, and it works pretty well… if you are smart enough or patient enough to use it on HTML. I am neither.

So instead, I decided to add XPath support to the great Html Agility Pack parser on CodePlex. I understand XPath, and I generally don’t feel like my brain is exploding when I write it.

Here’s the repackaged code for your consumption. There are a few caveats:

- It does not support all axis types. Specifically, it does not support “previous node”.
- It does not support all XPath functions. You can see the full list in the FunctionNode.cs file.
- It does support a few functions that are not in the XPath spec, including some string-parsing functions, RegEx support, and .OuterHtml/.InnerHtml support.
- It probably won’t pass any XPath conformance tests, but then again, it works on HTML rather than XML, so I am not too worried about it.

I am going to upload this to CodePlex as soon as it’s back online.

The sample app inside the solution has a few simple examples, but overall, usage is what you would expect.

Oh, and I didn’t write many comments. Sorry about that.

This project includes the source code of the XPathParser project from CodePlex and references version 1.4 of HAP. The HAP binary is included in the package, and I have not tested it with more recent versions.
