Over time, I’ve heard multiple people in the Rust ecosystem ask for a library like Nokogiri in Ruby or lxml in Python, for parsing and serializing HTML and XML documents, traversing, manipulating, and querying the tree, etc. This can be useful for scraping web pages, testing web apps, pre-processing documents, …

This library does not exist. Yet.

The name

Nokogiri (鋸) is the Japanese word for saw, the wood-cutting tool.

It may be premature since there is no code to attach to it, but I’ve had some fun looking in my Japanese dictionary. I’d like this Rust library to be called Kuchiki (朽木). It is the Japanese word for decayed tree or rotted tree, which seems appropriate for a Rust library for tree manipulation.

According to our rustacean friend Tetsuharu / @saneyuki, the word is much more interesting than I initially thought!

朽木 has been used as the family name, and the word has some “nihilistic”, “seasoned” nuances in Japanese.

The word is felt as a mossy garden like Koke-dera and others http://en.wikipedia.org/wiki/Saihō-ji_(Kyoto) … http://tabijikan.jp/2014/06/12/2579/ …

I don’t think it’s incongruity/taboo word in Japanese, and feel it is compatible with the style of “Rust”

Big pieces

Kuchiki does not exist yet, but some of the more important components do.

html5ever is an HTML parser written in Rust. We use it in Servo and we maintain it, but it was designed as an independent library that other projects can use. It’s good.

Some people might be interested in XML in addition to HTML. I’m not familiar with either of them, but both xml-rs and RustyXML seem to be active. xml-rs is on crates.io.

We’ve just extracted Servo’s CSS Selector matching code into an independent library, rust-selectors, that I will be maintaining. The API needs a lot of work to be made nicer (it was not designed for use outside of Servo), but it exists.

What’s left to do

Improve the rust-selectors API. I will work on this… at some point in time.

Define a good tree representation. html5ever has two of them in its src/sink directory which can be used as a starting point, but Kuchiki will probably want many more convenience methods for traversing and modifying the tree.

Glue everything together

A million other things I’m not thinking of right now

Lead and maintain the project
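
To make the tree representation item above a bit more concrete, here is a minimal sketch of one possible shape for it. Everything in it is hypothetical: the names `Tree`, `NodeId`, `NodeData`, `append`, `detach`, `descendants`, `text_content` and `elements_named` are made up for illustration and are not taken from html5ever or rust-selectors. It assumes nodes are stored in a flat vector and refer to each other by index, uses only the standard library, and ignores attributes, namespaces, comments and doctypes.

```rust
// Hypothetical sketch only: none of these names come from html5ever
// or rust-selectors. Nodes live in one Vec and point at each other by index.

/// Index of a node inside `Tree::nodes`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

#[derive(Debug)]
pub enum NodeData {
    Element { name: String },
    Text(String),
}

#[derive(Debug)]
struct Node {
    data: NodeData,
    parent: Option<NodeId>,
    children: Vec<NodeId>,
}

#[derive(Debug, Default)]
pub struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    pub fn new() -> Self {
        Tree { nodes: Vec::new() }
    }

    /// Creates a detached node and returns its id.
    pub fn create(&mut self, data: NodeData) -> NodeId {
        self.nodes.push(Node { data, parent: None, children: Vec::new() });
        NodeId(self.nodes.len() - 1)
    }

    /// Removes `id` from its parent's child list, if it has a parent.
    /// The node and its subtree stay in the arena and can be re-attached.
    pub fn detach(&mut self, id: NodeId) {
        let old_parent = self.nodes[id.0].parent.take();
        if let Some(p) = old_parent {
            self.nodes[p.0].children.retain(|&c| c != id);
        }
    }

    /// Makes `child` the last child of `parent`, detaching it first.
    /// Note: this sketch does not guard against creating cycles.
    pub fn append(&mut self, parent: NodeId, child: NodeId) {
        self.detach(child);
        self.nodes[child.0].parent = Some(parent);
        self.nodes[parent.0].children.push(child);
    }

    pub fn parent(&self, id: NodeId) -> Option<NodeId> {
        self.nodes[id.0].parent
    }

    /// `root` followed by all of its descendants, in document (pre-order) order.
    pub fn descendants(&self, root: NodeId) -> Vec<NodeId> {
        let mut out = Vec::new();
        let mut stack = vec![root];
        while let Some(id) = stack.pop() {
            out.push(id);
            // Push children in reverse so the first child is popped first.
            stack.extend(self.nodes[id.0].children.iter().rev().copied());
        }
        out
    }

    /// Concatenation of all text nodes under `root`, in document order.
    pub fn text_content(&self, root: NodeId) -> String {
        self.descendants(root)
            .into_iter()
            .filter_map(|id| match &self.nodes[id.0].data {
                NodeData::Text(t) => Some(t.as_str()),
                NodeData::Element { .. } => None,
            })
            .collect()
    }

    /// Elements under `root` (including `root` itself) with the given tag name.
    pub fn elements_named(&self, root: NodeId, name: &str) -> Vec<NodeId> {
        self.descendants(root)
            .into_iter()
            .filter(|id| {
                matches!(&self.nodes[id.0].data,
                         NodeData::Element { name: n } if n.as_str() == name)
            })
            .collect()
    }
}
```

Storing indices rather than reference-counted pointers keeps parent links simple and avoids reference cycles, at the cost of never freeing detached nodes in this sketch. A real design might well choose differently, and matching by tag name here only stands in for proper selector matching.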

Who’s in?

I can help in some specific areas, but unfortunately Servo does not leave me enough time to lead this project. Who’s interested?