Oxidizing sourmash: WebAssembly

27 Aug 2018

sourmash calculates MinHash signatures for genomic datasets, meaning we are reducing the data (via subsampling) to a small representative subset (a signature) capable of answering one question: how similar is this dataset to another one? The key here is that a 10-100 GB dataset will be reduced to something in the megabytes range, and there are two approaches for doing that:

The user installs our software on their computer. This is not so bad anymore (yay bioconda!), but it still requires knowledge of command line interfaces and how to install all this stuff. The user's data never leaves their computer, and they can share the signatures later if they want to.

Provide a web service to calculate signatures. In this case no software needs to be installed, but it's up to someone (me?) to maintain a server running an API and a frontend to interact with the users. On top of requiring more maintenance, another drawback is that the user needs to send me the data, which is very inefficient network-wise and leads to questions about what I can do with their raw data (and I'm not into surveillance capitalism, TYVM).
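To make the "signature" idea concrete before moving on, here is a minimal sketch of bottom-k MinHash in Rust. It is not sourmash's actual implementation: the names `fnv1a`, `sketch` and `similarity` are made up for illustration, FNV-1a is only a stand-in hash function, and the sketch skips details a real genomic tool needs (such as treating a k-mer and its reverse complement as the same thing).

```rust
use std::collections::BTreeSet;

/// FNV-1a 64-bit hash. Used here only as a simple, deterministic stand-in;
/// it is an assumption of this sketch, not the hash sourmash uses.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hash every k-mer of `seq` and keep only the `num` smallest hash values.
/// That small set is the "signature" of the sequence.
fn sketch(seq: &[u8], k: usize, num: usize) -> BTreeSet<u64> {
    let mut mins = BTreeSet::new();
    if k == 0 || seq.len() < k {
        return mins;
    }
    for kmer in seq.windows(k) {
        mins.insert(fnv1a(kmer));
        if mins.len() > num {
            // Too many values: drop the largest one.
            let largest = *mins.iter().next_back().unwrap();
            mins.remove(&largest);
        }
    }
    mins
}

/// Estimate the Jaccard similarity of two datasets from their signatures:
/// take the `num` smallest hashes of the union and count how many of them
/// appear in both signatures.
fn similarity(a: &BTreeSet<u64>, b: &BTreeSet<u64>, num: usize) -> f64 {
    // BTreeSet::union yields values in ascending order.
    let merged: Vec<u64> = a.union(b).copied().take(num).collect();
    if merged.is_empty() {
        return 0.0;
    }
    let shared = merged
        .iter()
        .filter(|h| a.contains(*h) && b.contains(*h))
        .count();
    shared as f64 / merged.len() as f64
}
```

Comparing two datasets then only needs the two small sets, something like `similarity(&sketch(a, 21, 500), &sketch(b, 21, 500), 500)`, and never the raw sequences. That is why a signature is cheap to share.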

But... what if there is a third way?

What if we could keep the frontend code from the web service (very user-friendly) but do all the calculations client-side (and avoid the network bottleneck)? The main hurdle here is that our software is implemented in Python (and C++), which are not supported in browsers. My first solution was to write the core features of sourmash in JavaScript, but that quickly ran into annoying things like JavaScript not supporting 64-bit integers. There is also the issue of having another codebase to maintain and keep in sync with the original sourmash, which would be a significant burden for us. I gave a lab meeting about this approach, using a drag-and-drop UI as a proof of concept. It did work, but it was finicky (dealing with the 64-bit integer hashes is not fun). The good thing is that at least I had a working UI for further testing.
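The 64-bit problem comes from JavaScript's `Number` type being an IEEE 754 double, which can only represent integers exactly up to 2^53 - 1, while the hashes are full 64-bit values. The sketch below shows the precision loss from the Rust side, where `u64` is a native type. The function names are made up for illustration, and splitting a hash into two 32-bit halves is just one common workaround, not necessarily what the JavaScript proof of concept did.

```rust
/// Largest integer a JavaScript `Number` (an IEEE 754 double) can hold
/// without losing precision: 2^53 - 1.
const MAX_SAFE_INTEGER: u64 = (1 << 53) - 1;

/// Returns true if `hash` is unchanged after being stored in an f64,
/// which is what happens to it inside a JavaScript `Number`.
/// The comparison is done in u128 because converting an f64 back to u64
/// saturates at u64::MAX and would hide the rounding for large values.
fn survives_f64_round_trip(hash: u64) -> bool {
    (hash as f64) as u128 == u128::from(hash)
}

/// One possible workaround: carry the hash as two 32-bit halves,
/// each of which fits in a double exactly.
fn split_u64(hash: u64) -> (u32, u32) {
    ((hash >> 32) as u32, hash as u32)
}

/// Rebuild the 64-bit hash from its high and low halves.
fn join_u64(hi: u32, lo: u32) -> u64 {
    (u64::from(hi) << 32) | u64::from(lo)
}
```

Anything up to `MAX_SAFE_INTEGER` round-trips fine, but most 64-bit hash values are far larger than that and get silently rounded, which is what made the pure JavaScript version so finicky.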

In "Oxidizing sourmash: Python and FFI" I described my road to learning Rust, but something I omitted was that around the same time the WebAssembly support in Rust started to look better and better, and it was a huge influence on my decision to learn Rust. Reimplementing the sourmash C++ extension in Rust and using the same codebase in the browser sounded very attractive, and now that it was working I started looking into how to use the WebAssembly target in Rust.

WebAssembly?

From the official site,

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

You can write WebAssembly by hand, but the goal is to have it as a lower-level target for other languages. For me the obvious benefit is being able to use something that is not JavaScript in the browser, even though the goal is not to replace JS completely but to complement it in a big pain point: performance. This also frees JavaScript from being the target language for other toolchains, allowing it to grow in other important areas (like language ergonomics).

Rust is not the only language targeting WebAssembly: Go 1.11 includes experimental support for WebAssembly, and there are even projects bringing the scientific Python stack to the web using WebAssembly.

But does it work?

With the Rust implementation in place and all the sourmash tests passing, I added the finishing touches using wasm-bindgen and built an NPM package using wasm-pack: sourmash is a Rust codebase compiled to WebAssembly and ready to use in JavaScript projects.

(Many thanks to Madicken Munk, who also presented at SciPy on how they used Rust and WebAssembly to do interactive visualization in Jupyter, and who helped with a good example of how to do this properly =] )

Since I already had the working UI from the previous PoC, I refactored the code to use the new WebAssembly module and voilà! It works! But that was the demo from a year ago with updated code, and I've gotten a bit better at frontend development since then, so here is the new demo:

[Interactive demo: sourmash + Wasm. Drag and drop a FASTA or FASTQ file to calculate its sourmash signature. Options: k-mer size, scaled, number of hashes, input type (DNA/RNA or protein), track abundance. The signature can then be downloaded.]

For the source code for this demo, check the sourmash-wasm directory.

Next steps

The proof of concept works, but it is pretty useless right now. I'm thinking about building it as a Web Component and making it really easy to add to any webpage.

Another interesting feature would be supporting more input formats (the GMOD project implemented a lot of those!), but more features are probably better after something simple but functional is released =P

Next time!

Where will we go next? Maybe explore some decentralized web technologies like IPFS and dat, hmm? =]
