
PDF files have long been an awkward fit with the Web, but a new project from the developers of Firefox shows how online PDFs are changing for the better.

For years, the only way to view them was with viewer software from Adobe Systems, which created the Portable Document Format in the 1990s. Clicking a link to a PDF often meant a wait as the software loaded, followed by an alien interface, framed within the browser window, that meant actions like searching and printing were different. It's faster today, but PDFs still don't feel like native Web documents.

But PDF has become an international standard, and now PDFs are becoming less obstreperous. Google started indexing PDF content and showing PDFs in search results years ago, helping to ensure their utility on the Internet. And browsers have begun handling them better, too.

Google's Chrome, for example, added a PDF reader directly into the browser so that Adobe Reader, Mac OS X's Preview, or other third-party applications aren't required. (Well, except in cases where Chrome's plug-in isn't up to snuff; happily, it now sometimes warns you when a PDF has elements it can't handle.) Chrome is tackling the performance issue, too, making a PDF reader plug-in that uses the Native Client software technology.

Now Mozilla has begun a project of its own called pdf.js: a PDF reader that uses Web technology, not native software, to render PDFs in the browser. Eventually it will be built directly into Firefox, said programmer Andreas Gal in a blog post last week.

Thus, while Google is working on native-code PDF abilities--software tailored for a specific processor--Mozilla is working on an approach that uses the browser's engine instead.

"We intend to use pdf.js to render PDFs 'natively,' within Firefox itself. Our most immediate goal is to implement the most commonly used PDF features so we can render a large majority of the PDFs found on the web. We believe we can reach that point in less than 3 months (the entire code so far is less than one month old, and it already renders a large set of PDF features). Initially we will make a Firefox extension available to interested users that enables inline PDF rendering using pdf.js, but our ultimate goal is of course shipping pdf.js with Firefox. This will result in a substantial usability but also security improvement for our users. pdf.js uses only safe Web languages and doesn't contain any native code pieces attackers could exploit."

Indeed, security has been a problem for PDF reading on the Web. Adobe's widely used free Reader software needs regular attention as new security vulnerabilities are uncovered, some of them zero-day problems that emerge before a patch is ready. Browser technology is by no means immune to security problems, but Web applications don't get the same privileges granted to native software, which makes attacks harder.

The project uses JavaScript, the programming language of Web pages and Web applications, to interpret the PDF coding. It should be noted that Gal has been involved for years in improving Firefox's JavaScript execution speed. Another Web standard in use is the HTML5 Canvas technology for two-dimensional drawing.
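To make that idea concrete, here is a minimal TypeScript sketch of how a script might interpret a few PDF path-drawing operators and turn them into Canvas-style calls. It is not code from pdf.js. PDF page content writes operands before the operator (`10 20 m` means "move to 10, 20"), so a small stack does most of the work. The names `PathSurface`, `makeRecorder`, and `runContentStream` are invented for illustration, and the sketch handles only a handful of operators; it ignores text, fonts, colors, and the fact that PDF's y axis points up while Canvas's points down.

```typescript
// Subset of the Canvas 2D context's methods that this sketch relies on.
interface PathSurface {
  lineWidth: number;
  save(): void;
  restore(): void;
  beginPath(): void;
  moveTo(x: number, y: number): void;
  lineTo(x: number, y: number): void;
  rect(x: number, y: number, w: number, h: number): void;
  closePath(): void;
  stroke(): void;
  fill(): void;
}

// Hypothetical stand-in for a real canvas: records calls instead of drawing,
// so the sketch can run outside a browser.
function makeRecorder(): { surface: PathSurface; calls: string[] } {
  const calls: string[] = [];
  const log = (name: string) => (...args: number[]): void => {
    calls.push(`${name}(${args.join(",")})`);
  };
  const surface: PathSurface = {
    lineWidth: 1,
    save: log("save"),
    restore: log("restore"),
    beginPath: log("beginPath"),
    moveTo: log("moveTo"),
    lineTo: log("lineTo"),
    rect: log("rect"),
    closePath: log("closePath"),
    stroke: log("stroke"),
    fill: log("fill"),
  };
  return { surface, calls };
}

// Interprets a whitespace-separated stream of numbers and path operators.
function runContentStream(stream: string, ctx: PathSurface): void {
  const operands: number[] = [];
  let pathOpen = false;

  // Removes and returns the last n operands, in the order they were written.
  const take = (n: number, op: string): number[] => {
    if (operands.length < n) throw new Error(`"${op}" needs ${n} operand(s)`);
    return operands.splice(operands.length - n, n);
  };
  // Canvas needs an explicit beginPath(); PDF starts a path implicitly.
  const ensurePath = (): void => {
    if (!pathOpen) { ctx.beginPath(); pathOpen = true; }
  };

  for (const token of stream.split(/\s+/).filter((t) => t.length > 0)) {
    const value = Number(token);
    if (!Number.isNaN(value)) { operands.push(value); continue; }
    switch (token) {
      case "q": ctx.save(); break;                       // save graphics state
      case "Q": ctx.restore(); break;                    // restore graphics state
      case "w": { const [width] = take(1, token); ctx.lineWidth = width; break; }
      case "m": { const [x, y] = take(2, token); ensurePath(); ctx.moveTo(x, y); break; }
      case "l": { const [x, y] = take(2, token); ensurePath(); ctx.lineTo(x, y); break; }
      case "re": { const [x, y, w, h] = take(4, token); ensurePath(); ctx.rect(x, y, w, h); break; }
      case "h": ctx.closePath(); break;                  // close current subpath
      case "S": ctx.stroke(); pathOpen = false; break;   // stroke and end path
      case "f": ctx.fill(); pathOpen = false; break;     // fill and end path
      default: throw new Error(`unsupported operator: ${token}`);
    }
  }
}

// Draw one stroked line, 2 units wide, from (10, 20) to (30, 40).
const demo = makeRecorder();
runContentStream("q 2 w 10 20 m 30 40 l S Q", demo.surface);
console.log(demo.calls.join(" "));
```

In a browser, the object returned by a canvas element's `getContext("2d")` offers the same method names, so it could take the recorder's place.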

For a look at how well the project compares to other PDF rendering software, check out the screenshots below.

Canvas is fast, something Mozilla likes given the sour sentiments that often arise at the prospect of loading a PDF. But it's got drawbacks, too, said Chris Jones in a blog post. For one thing, it's a low-level interface that doesn't easily let people select text. For another, high-quality printing is hard.

To get around those drawbacks, Mozilla also might use a PDF renderer using another Web technology, Scalable Vector Graphics (SVG). The idea is to render a quick version using Canvas, then swap in a more elaborate SVG-based version after it's been created, Jones said, mentioning that other approaches are possible, too.
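The quick-then-detailed idea Jones describes can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not Mozilla's design: the `Layer` type, the `renderProgressively` function, and the stand-in renderers are all hypothetical, and a layer is represented by a simple label rather than a real canvas or SVG element.

```typescript
// A layer is just a label here; in a browser it would be a <canvas> or <svg>
// element. Both the type and the function names are illustrative assumptions.
type Layer = { kind: "canvas" | "svg"; page: number };

function renderProgressively(
  page: number,
  drawQuick: (page: number) => Layer,
  buildDetailed: (page: number, done: (layer: Layer) => void) => void,
  show: (layer: Layer) => void,
): () => void {
  let cancelled = false;

  // Step 1: put the fast rendering on screen right away.
  show(drawQuick(page));

  // Step 2: build the detailed rendering and swap it in when it is ready,
  // unless the caller has cancelled (for example, the reader left the page).
  buildDetailed(page, (layer) => {
    if (!cancelled) show(layer);
  });

  // The returned function lets the caller suppress a late swap.
  return () => { cancelled = true; };
}

// Example wiring with stand-in renderers. Here the detailed builder finishes
// immediately; a real SVG builder would call done() some time later.
renderProgressively(
  1,
  (p) => ({ kind: "canvas", page: p }),
  (p, done) => { done({ kind: "svg", page: p }); },
  (layer) => console.log(`showing ${layer.kind} for page ${layer.page}`),
);
```

The cancel function is there because the detailed version may arrive after the reader has moved on, in which case swapping it in would replace whatever is now on screen.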

To gauge progress, people can open a Web-based version of pdf.js showing a 2009 research paper about JavaScript that Gal and others wrote. Ordinarily I'd include a parenthetical warning to readers that the link leads to a PDF, but in this case, it leads to an ordinary Web page that shows a PDF.

Mozilla hopes pdf.js will improve people's experience with PDFs but also, ultimately, help phase out the format.

"It's important to note that we're not trying to promote PDF to a first-class web citizen like HTML5 is," Gal said. "Instead we hope that a browser-native PDF renderer written on the Web platform allows Web technologies to subsume PDF."

Perhaps the work will make PDF fade into the background. But people use PDF for its advantages in formatting flexibility, archiving information in a standard file format, and sharing documents across a variety of operating systems and programs.

It seems possible to me, therefore, that Mozilla's work to make PDFs easier and safer to use on the Web might actually strengthen the technology's position.

screenshot by Stephen Shankland/CNET
