At PDFTron, we seek to provide customers the fastest way to view, annotate, form-fill, and edit documents across every platform. Thus when we move to a new technology, we must find out how well it performs -- and make it as fast as possible. Our focus on performance, accuracy, and reliability has earned us significant clients, including Fortune 500 companies, and helped us to secure US$71 million in growth funding earlier this year.

In 2017, we ported our entire native PDF SDK to JavaScript using WebAssembly for Edge, Firefox, iOS, and Android Chrome -- just a few months after WebAssembly (a.k.a. Wasm) was released. We combined it with PNaCl and Asm.js, both part of our solution since 2015, to guarantee the best pure client-side web viewing and editing experience across every browser.

Now that Google Chrome is the first to support WebAssembly multithreading, we were curious about its potential to further enhance our UX. Therefore, we benchmarked WebAssembly threads to see how their rendering speed compares with our current PNaCl implementation. We needed to find out: Could threaded WebAssembly perform faster than its Chrome precursor PNaCl? And could it handle the memory requirements of our customers’ most demanding documents?

Try the Benchmark Tool!

You can use the benchmark tool to test the performance of Wasm threading vs. PNaCl with your own documents. The app runs entirely in the browser, so you will need to use the latest version of Chrome.

(Note: Tiered WebAssembly compilation can sometimes interfere with the results. Since we wanted to compare top rendering speed, not first load time, you may notice a small delay before the benchmark runs.)
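One common way to keep tiered compilation out of a measurement is to run the workload a few times untimed before recording anything. The sketch below illustrates that idea only and should not be read as the benchmark tool's actual code: `benchmark`, `median`, the run counts, and the injectable clock are hypothetical names chosen for the example.

```typescript
// Hypothetical harness: none of these names come from the benchmark tool.
type Clock = () => number;

// Median is less sensitive to a single slow run than the mean.
function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Runs `task` untimed `warmupRuns` times, then times it `timedRuns` times
// and returns the median duration in the clock's units.
function benchmark(
  task: () => void,
  warmupRuns: number,
  timedRuns: number,
  now: Clock = Date.now
): number {
  if (timedRuns < 1) {
    throw new RangeError("timedRuns must be at least 1");
  }
  // Warm-up: gives the engine a chance to finish optimizing hot code.
  for (let i = 0; i < warmupRuns; i++) {
    task();
  }
  const samples: number[] = [];
  for (let i = 0; i < timedRuns; i++) {
    const start = now();
    task();
    samples.push(now() - start);
  }
  return median(samples);
}
```

Passing the clock in as a parameter keeps the sketch testable; in a browser a higher-resolution timer would likely be a better choice than the `Date.now` default assumed here.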

Background

The core purpose of Wasm and its technological precursors PNaCl and Asm.js is to allow execution of native code on the web. The eventual goal is to have performance very close to that of a native app -- running on websites.

When first introduced in 2017, WebAssembly (a.k.a. Wasm) didn’t support multithreading -- and the feature would have been hugely beneficial to our customers. Some of their documents are massive, including reports over a thousand pages long and files exceeding 1GB.

Since WebAssembly could not initially deliver the capabilities needed to ensure smooth performance with all documents, we continued to use PNaCl for Chrome. This proved technically superior as PNaCl (Google’s Portable Native Client) supported multithreading. Additionally, PNaCl supported powerful caching and memory management capabilities -- allowing us to stream linearized PDF content into Chrome and have users open documents as large as 2GB in seven seconds.

Browser vendors had several understandable reservations about PNaCl, however. Crucially, PNaCl was not well specified, as it was based on LLVM bitcode. It was also seen as proprietary technology. Thus the writing has been on the wall for some time -- and as of last month, PNaCl was switched off in Chrome 76 in favor of WebAssembly.

A number of applications still use PNaCl on Chrome -- among them PDFTron’s client-only WebViewer solution, where PNaCl is used when available.

Where Threaded Wasm is Faster

For us to justify switching to WebAssembly on Chrome to our customers, threaded WebAssembly would have to be at least as fast as PNaCl.

Wasm loads and initializes our viewer library a bit faster the first time a user views a document due to what is referred to as Tiered Compilation. Afterwards, the viewer can be cached client side for when a user loads their second or third document.

Additionally, threaded Wasm seems to perform basic math operations such as addition, multiplication, and division a little faster than PNaCl.

Where Wasm is Slower

While adding threads to Wasm is a significant step forward performance-wise, the rendering speed of threaded WebAssembly still cannot match PNaCl for many real-world use cases.

Run the benchmark and see the difference firsthand using our test documents or by uploading your own. For our simple and moderately complex test documents, PNaCl renders faster (62% and 23% faster respectively). Measured in fractions of a second, this difference is small in practical terms and thus likely to go unnoticed.

The large and complex test documents, however, showed a significant difference. PNaCl rendered at more than double the speed (122% faster) -- a gap on the order of a full second or more.

This performance difference is a major concern for us, as switching to threaded WebAssembly will result in noticeably slower speeds for our Chrome users with their most demanding documents.
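To make the percentages above concrete, here is a small sketch of the arithmetic, assuming "X% faster" means the ratio of the two render times minus one. That definition and the function names are assumptions made for illustration, and the sketch contains no figures from the benchmark itself.

```typescript
// Assumption: "PNaCl is X% faster" means (slowerTime / fasterTime - 1) * 100.
function percentFaster(slowerMs: number, fasterMs: number): number {
  if (slowerMs <= 0 || fasterMs <= 0) {
    throw new RangeError("render times must be positive");
  }
  return (slowerMs / fasterMs - 1) * 100;
}

// Inverse view: how many times longer the slower renderer takes
// for a given "percent faster" figure.
function slowdownFactor(percent: number): number {
  return 1 + percent / 100;
}
```

Under this reading, 122% faster corresponds to the slower renderer taking roughly 2.2 times as long, 62% to about 1.6 times, and 23% to about 1.2 times.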

Other Major Issues

When evaluating threaded WebAssembly on Chrome, we also encountered three other major issues that can significantly impact our customer UX:

Chrome does not support growth of memory for Threaded WebAssembly

Due to an outstanding issue, threaded WebAssembly on Chrome cannot yet grow memory. This is a huge issue for us. Total memory usage can’t be known ahead of time as it depends heavily on what documents are loaded as well as the kinds of processes users perform.

Furthermore, allocating a large memory chunk up front when it may not be needed will often lead to resource allocation issues elsewhere. Using a gigabyte of memory, for example, when you may only need 50 megabytes, would unnecessarily impact the user.
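WebAssembly memory is sized in 64 KiB pages, and a shared memory must declare its maximum when it is created. The sketch below shows how the page counts differ depending on whether growth is available. The helper names and example sizes are hypothetical; the resulting object has the same fields as the descriptor accepted by the `WebAssembly.Memory` constructor, though the sketch does not call it.

```typescript
const WASM_PAGE_BYTES = 64 * 1024; // page size fixed by the WebAssembly specification

// Same fields as the descriptor passed to `new WebAssembly.Memory(...)`.
interface MemoryPlan {
  initial: number; // pages reserved at start-up
  maximum: number; // upper bound in pages; required for shared memory
  shared: boolean;
}

// Rounds a byte count up to whole WebAssembly pages.
function pagesFor(bytes: number): number {
  return Math.ceil(bytes / WASM_PAGE_BYTES);
}

// Hypothetical helper: picks page counts for threaded (shared) memory.
// Without growth, everything must be reserved up front.
function planSharedMemory(
  typicalBytes: number,
  worstCaseBytes: number,
  canGrow: boolean
): MemoryPlan {
  if (typicalBytes > worstCaseBytes) {
    throw new RangeError("typicalBytes must not exceed worstCaseBytes");
  }
  const maximum = pagesFor(worstCaseBytes);
  const initial = canGrow ? pagesFor(typicalBytes) : maximum;
  return { initial, maximum, shared: true };
}

// Example sizes only: 50 MiB typical use, 1 GiB worst case.
const MIB = 1024 * 1024;
const withGrowth = planSharedMemory(50 * MIB, 1024 * MIB, true);
const withoutGrowth = planSharedMemory(50 * MIB, 1024 * MIB, false);
```

With these example sizes, the growable plan starts at 800 pages (50 MiB), while the plan without growth has to reserve all 16384 pages (1 GiB) immediately.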

Threaded WebAssembly lacks key memory management capabilities

A second major issue is that threaded Wasm currently lacks many of PNaCl’s key memory management capabilities.

Like native applications, PNaCl runs in a separate process from Chrome, so it controls its own memory and is not subject to Chrome’s memory allocation limits. In contrast, threaded WebAssembly must store all memory in a large JavaScript memory buffer (SharedArrayBuffer). As a result, any memory we allocate to WebAssembly threads subtracts from the total memory that could be used elsewhere in the website.

These memory limitations can be worked around (by storing the memory in storage like IndexedDB) and mitigated by adjusting the code to load from a separate buffer. But all of these options add further performance overhead and complexity to the program.

Proper Caching of Modules

A final issue relates to module caching. Threaded WebAssembly has not yet implemented the proper caching of modules present in PNaCl, and there is at present no workaround.

Therefore, every time you need to reload a module with threaded WebAssembly, such as when a page is loaded a second time, users face longer delays and heavier CPU usage. Additionally, tiered compilation kicks in to slow down initial performance -- when this could be completely avoided with proper module caching.

Conclusion

It’s still early days for threaded WebAssembly, and thus we are still very optimistic that performance will improve in the months ahead. Already, the Chromium team are on top of issues related to module caching as well as growth of native memory.

We want to thank browser vendors and the W3C WebAssembly working group for all their hard work on advancing Wasm and implementing WebAssembly threads. We also want to make clear that we are always open to collaborating with you to make the web platform and our products faster.

How To Set Up the PNaCl Origin Trial

If you still wish to use PNaCl on your site, this is possible using the PNaCl origin trial, which was extended to March 10, 2020. To set this up, see the developer guide. As the guide explains, you need to obtain an origin trial token and then include it either as a meta tag in the parent HTML page or in the HTTP response headers provided by your server. These instructions are relevant for any application making use of PNaCl -- including any application based on the client-only WebViewer.
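As a rough illustration of the two delivery options, the sketch below builds the meta tag and the response header for a token. The helper names and the placeholder token are hypothetical; a real token comes from registering your origin for the trial, and the developer guide remains the authority on the exact steps.

```typescript
// Origin trial tokens are base64 text; reject anything else so the
// value is safe to embed in markup or a header.
const BASE64_TOKEN = /^[A-Za-z0-9+/]+={0,2}$/;

function assertToken(token: string): void {
  if (!BASE64_TOKEN.test(token)) {
    throw new Error("origin trial token must be a base64 string");
  }
}

// Option 1: a meta tag placed in the <head> of the parent HTML page.
function originTrialMetaTag(token: string): string {
  assertToken(token);
  return `<meta http-equiv="origin-trial" content="${token}">`;
}

// Option 2: an HTTP response header sent by your server,
// returned here as a [name, value] pair.
function originTrialHeader(token: string): [string, string] {
  assertToken(token);
  return ["Origin-Trial", token];
}

// Placeholder only, not a real token.
const PLACEHOLDER_TOKEN = "UExBQ0VIT0xERVI=";
```

Either option alone should be enough; the header form is convenient when you cannot edit the page markup directly.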

Whether you’re a browser vendor or a customer, don’t hesitate to get in touch with me directly at david@pdftron.com or on Twitter (@DavidTippett12). We also have our official Twitter account (@PDFTron). We’re always looking for feedback about our articles or our PDF SDK, which we constantly strive to make more reliable, accurate, and fast across every platform.