Under ministerial memo no. 2581/2014, all Italian schools are required to choose only textbooks that either are available as ebooks or offer supplementary digital content (also known by the extremely technical term “mixed textbooks”). Now, given that the average Italian secondary school student spends about 300€ a year on their (still largely paper-based) textbooks, you would expect the digital side of the school publishing industry to be much more developed than it currently is.

It turns out that digital textbooks aren’t a bad idea: a quick search shows that buying only the digital edition of my math textbook would have saved me more than 40€ over the course of 5 years. Considering I attend 11 different courses, that adds up to about 440€, without taking into account that most of those courses require more than one textbook per year. This alone would mean skipping about 1.5 years of expenses in a 5-year school cycle. Add to that not having to actually carry those books around, and it’s easy to see why it would be tempting to ditch print books altogether, were it not for the fact that the technologies publishers have developed for their digital offerings more than make up for the reduced hassle.

More often than not, a digital textbook is actually more of a suite of content that nobody asked for, malfunctioning and ghost-town empty “digital classrooms” and, eventually, the actual textbook, all delivered to you by a proprietary, closed-source reader that provides basically the same features as all other proprietary readers: a PDF viewer, a video player, some kind of highlighting/note-taking feature. And obviously, an attempt at implementing some DRM protection: no native way to back up the book you bought or to use your favourite document reader. Hell, some of those readers don’t even allow you to copy and paste.

So, at the start of this school year, I decided to take the ebooks that came with my mixed print books for a ride and see how far I could get in getting a standard PDF file out of them.

Introducing the opponents

While there are many publishers on the Italian textbook market and each of them implements their own (or third-party) flavour of DRM abandonware, most of my books were from three companies:

Zanichelli, which relies on the BooktabZ platform. It is probably the best one of all these and it’s almost usable. The viewer usually works fine, but you have to choose between seeing two pages side by side zoomed out in orbit by default or seeing a single page while inspecting each of the font’s pixels. It provides some useless mindmap tools and a digital whiteboard, with its ironic Export to PDF button. A web-based version is also available.

All in all it wouldn’t be too bad to use, were it not for the fact that it only accounts for about 33% of my textbooks.

Loescher, which gave birth to the myLIM reader, an improved (?) version of their less internationally named miaLIM. It implements sane defaults for the zoom levels of both its single-page and side-by-side views, but provides useless note-taking tools which require you to click on a note to see it, despite the only reason for taking notes on a book being the ability to have additional information available at a glance. Its highlight feature is only useful if you’re a teacher in front of a classroom using a LIM (an interactive whiteboard), which is too bad since a major chunk of its audience is actually students. I wasn’t able to find a web-based version, despite the application being clearly Electron-based.

Hoepli, with the Scuolabook platform, which is more of a third-party service to which they offloaded their digital distribution and that actually sells ebooks from many Italian publishers (including Zanichelli and Loescher, according to their website). It too is guilty of providing useless (and sometimes malfunctioning) drawing tools (its highlighting is on point though) and a hilarious web-based notepad with no markup, but the main fault of an otherwise acceptable reader is disallowing copy-pasting in its web-based version, I guess because the server does not trust the client enough to provide it with the complete document and only sends an image instead.

Surprisingly, all three platforms support Linux at this point in time, which is probably a by-product of employing cross-platform technologies as a silver bullet to cut down development costs.

From a legal standpoint, all this should be covered by the private use exception from art. 5.2(b) of the 2001/29/EC European copyright directive. I was not able to find specific terms of use for BooktabZ and myLIM (I guess I had to accept them at some point, but I can only wonder at their enforceability), while Scuolabook explicitly provides for one readable backup copy at art. 22.2 of their ToS. Still, this post is for research purposes, do no harm, be excellent to each other, etc.

It’s myBook, myLIM

myLIM was actually the last one I managed to crack, yet probably the easiest. When I started looking into it, it was still called miaLIM, used Ionic and actually seemed to be implementing some kind of encryption handled by a native .dll which I didn’t spend too much time trying to reverse engineer. At the time it did not even support Linux, so for the time being I stuck with using Wine on my laptop.

The tables turned a few months later, when the platform switched to what seems to be a completely Electron-based application, which basically meant that the source code was now free real estate. And in fact, here are some comments that look like they came straight out of a tutorial:

    <!-- cordova script (this will be a 404 during development) -->
    <script src="cordova.js"></script>
    <!-- vendor's js -->
    <script src="bundle/vendor/vendor.bundle.js"></script>
    <script src="bundle/vendor/angular.bundle.js"></script>
    <script src="bundle/vendor/ionic.bundle.js"></script>
    <script src="bundle/vendor/react.bundle.js"></script>
    <script src="bundle/vendor/pdfjs.bundle.js"></script>
    <!-- your app's js -->
    <script src="bundle/app.bundle.js"></script>
    </head>

You’ve got to admire someone who bundles Angular, Ionic and React together. This time there seemed to be no obvious attempt at encryption, so it was time to see where this app was storing its files. A quick ls -l on the process’s file descriptors led me to the ~/.config/myLIM folder:

    $ ls -l /proc/9306/fd
    100 -> ~/.config/myLIM/Cookies
    93 -> ~/.config/myLIM/GPUCache/index
    99 -> ~/.config/myLIM/Local Storage/file__0.localstorage
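The same lookup can be scripted. Below is a minimal Python sketch, assuming a Linux system with /proc mounted and permission to inspect the target process; the function names open_files and files_under are made up for illustration.

```python
import os


def open_files(pid):
    """Return {fd_number: target_path} for a process, like `ls -l /proc/<pid>/fd`."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def files_under(pid, prefix):
    """Keep only the descriptors whose target path starts with `prefix`."""
    return {fd: path for fd, path in open_files(pid).items()
            if path.startswith(prefix)}
```

Calling files_under(9306, os.path.expanduser("~/.config")) would narrow the listing down to the configuration folder, which is usually where the interesting files live.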

The folder seemed to be a standard Chromium scratchpad, but also contained the (nonstandard) directory localData, which is used by the app as permanent storage, including the book files. The only form of encryption used seems to be changing the PDF files’ extension to .lil and using the book’s ISBN as filename.

    $ file 9788853801000/9788853801000.lil
    9788853801000/9788853801000.lil: PDF document, version 1.4

So that was pretty disappointing.
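Undoing a rename is about as simple as it sounds. Here is a minimal Python sketch, assuming the books sit under localData as <ISBN>/<ISBN>.lil (as in the listing above); export_lil_books is a hypothetical name, and the check on the %PDF- header is there only to skip anything that is not a plain PDF.

```python
import shutil
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # every PDF file starts with these bytes


def export_lil_books(local_data, out_dir):
    """Copy every *.lil file that is really a PDF into out_dir as <name>.pdf."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    exported = []
    for lil in sorted(Path(local_data).rglob("*.lil")):
        with lil.open("rb") as handle:
            header = handle.read(len(PDF_MAGIC))
        if header != PDF_MAGIC:
            continue  # not a plain PDF after all, leave it alone
        target = out_dir / (lil.stem + ".pdf")
        shutil.copyfile(lil, target)
        exported.append(target)
    return exported
```

The originals are copied rather than renamed, so the reader keeps working on its own files.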

Puzzled by BooktabZ

Despite what the name might suggest, BooktabZ is probably the most actively maintained, updated and all-around bearable of all these readers (it even uses Qt!). Since it looks like there’s actually someone caring for it, I was prepared for more of a fight than myLIM would eventually put up.

And in fact, it did look like there had been a serious attempt at DRM, since the usual ls on the file descriptors of the btb process led to the storage folder at .local/share/duDat/BooktabZ/<hash>/, yet most of the contents I was interested in were garbled nonsense. There were also a SQLite database used to store metadata (and my plaintext account password, go figure), another SQLite database used to collect an event queue for analytics (which I guess I cannot opt out of) and a debug log, but nothing that could actually help me get to the book I had bought.

Just as I was preparing myself to start disassembling the reader, I noticed that when I opened a chapter from one of my ebooks in the viewer, two interesting lines would pop up in the file descriptor listing.

    $ ls -l /proc/15473/fd
    23 -> ~/.local/share/duDat/BooktabZ/<hash>/tmp/DOCT5472 (deleted)
    24 -> ~/.local/share/duDat/BooktabZ/<hash>/tmp/DOCT5472 (deleted)

A few minutes later I confirmed that the reader was, in fact, decrypting single book chapters, writing them to disk as PDF files, (I guess) reading from them and then deleting them. It was then trivial to write a script that would watch that directory for changes and cp out whatever ended up in it, which meant all I had to do was open the chapters one by one with the script running and then piece them together using pdfunite.
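A watcher of this kind can be sketched in a few lines of Python. This is an illustration under assumptions, not a reconstruction of the original script: it polls the directory instead of subscribing to filesystem events, the names copy_new_files and watch are made up, and a file that is still being written when a pass runs may be copied incomplete (it is picked up again once its size or timestamp changes).

```python
import shutil
import time
from pathlib import Path


def copy_new_files(tmp_dir, out_dir, seen):
    """One polling pass: copy files in tmp_dir that were not seen before."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(tmp_dir).iterdir()):
        try:
            info = path.stat()
            # Name alone is not enough: a temp name may be reused or rewritten.
            key = (path.name, info.st_mtime_ns, info.st_size)
            if key in seen or not path.is_file():
                continue
            target = out_dir / f"{len(seen):03d}_{path.name}.pdf"
            shutil.copyfile(path, target)
        except FileNotFoundError:
            continue  # the reader deleted the file before we got to it
        seen.add(key)
        copied.append(target)
    return copied


def watch(tmp_dir, out_dir, interval=0.05):
    """Poll forever; stop with Ctrl+C once every chapter has been opened."""
    seen = set()
    while True:
        for target in copy_new_files(tmp_dir, out_dir, seen):
            print("saved", target)
        time.sleep(interval)
```

The numeric prefix keeps the saved chapters in the order they were opened, which is convenient when handing them to pdfunite (input files first, output file last).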

Scuolabook and xref tables

After my experience with BooktabZ, the first thing I did with Scuolabook was to immediately check its file descriptors with a book open in the reader, only to find out that it kept open a mysterious .pdp file which, despite being recognized as a PDF file by file, couldn’t be opened in a normal viewer.

The specific error Evince was returning was “Failed to read the document catalog”, which meant something was up with the cross-reference (xref) table of the file. The xref table is basically a section of a PDF document that maps the objects it is composed of to their location (byte offset) within the file and to their generation (revision) number. It looks something like this:

    xref                  # A xref table begins here
    0 3                   # The first object has index 0, this table contains 3 records
    0000000000 12312 n    # At byte index 0 we have record #0, revision 12312
    0000000012 00000 n    # At byte index 12 we have record #1, revision 0
    0000000123 00000 n    # At byte index 123 we have record #2, revision 0
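To make the layout concrete, here is a minimal Python sketch that parses a classic xref section like the one above into a dictionary. The name parse_xref_section is made up for illustration, and the sketch only understands the plain-text table, not the compressed cross-reference streams that newer PDF files may use.

```python
import re

# One entry: 10-digit byte offset, 5-digit generation, "n" (in use) or "f" (free).
ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])")


def parse_xref_section(data):
    """Parse an xref section into {object_number: (offset, generation, in_use)}."""
    lines = [line.strip() for line in data.splitlines() if line.strip()]
    if not lines or lines[0] != b"xref":
        raise ValueError("not an xref section")
    table = {}
    i = 1
    while i < len(lines):
        header = lines[i].split()
        if len(header) != 2 or not all(part.isdigit() for part in header):
            break  # reached the trailer (or something unexpected)
        first, count = int(header[0]), int(header[1])
        for n, raw in enumerate(lines[i + 1:i + 1 + count]):
            match = ENTRY.fullmatch(raw)
            if match is None:
                raise ValueError(f"malformed entry: {raw!r}")
            offset, generation, kind = match.groups()
            table[first + n] = (int(offset), int(generation), kind == b"n")
        i += 1 + count
    return table


sample = b"""xref
0 3
0000000000 12312 n
0000000012 00000 n
0000000123 00000 n
trailer
"""
print(parse_xref_section(sample))
```

A viewer does essentially this lookup before it can read anything else, which is why a tampered table is enough to make the whole file unreadable.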

Now, messing with the xref table is a fairly common way of restricting the distribution of PDF files, since a tampered index is hard to reconstruct without knowing how it was altered. Which is why, before starting to try and figure it out, I indulged in a Google search and found that someone else had already written a DRM remover for Scuolabook.

The source of that software reveals that Scuolabook uses a magic number as a sort of secret key, XORs it with the offsets in the xref tables and then also messes with the objects’ own indices, so that object #0 might read as #1. Knowing this, it is easy to first recover the original offsets and then fix the object indices.
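The reason this is easy to undo is that XOR is its own inverse: applying the same key twice gives back the original value. The Python sketch below shows only that principle. The key and the index shift are placeholders invented for the example, not the values Scuolabook uses, and modelling the index tampering as a constant shift is an assumption based on the “#0 might read as #1” description.

```python
HYPOTHETICAL_KEY = 0x1234     # placeholder, NOT Scuolabook's magic number
HYPOTHETICAL_INDEX_SHIFT = 1  # placeholder: object #0 stored as #1


def obfuscate(offset, key=HYPOTHETICAL_KEY):
    """What the publisher's side would do to a byte offset."""
    return offset ^ key


def recover_offset(stored, key=HYPOTHETICAL_KEY):
    """XOR with the same key again to get the original offset back."""
    return stored ^ key


def fix_table(stored, key=HYPOTHETICAL_KEY, shift=HYPOTHETICAL_INDEX_SHIFT):
    """Map {stored_index: stored_offset} back to {real_index: real_offset}."""
    return {index - shift: recover_offset(offset, key)
            for index, offset in stored.items()}
```

With the real key and index rule in hand, the repaired offsets can be written back into the xref table and the file opens in any viewer.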

Bonus: Pearson

I still had one last book left to extract, and it was from Pearson, a publisher whose name does not carry much of a reputation, and as such does not trust its customers with an offline reader and instead serves its books from a webserver as unencrypted .swf files.

This means that extracting an image of each page and applying OCR only involved jumping through a few hoops: automating the process of paging through the book and saving the browsing session as a HAR file.
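A HAR file is just JSON, so pulling the recorded pages back out of it takes little code. Here is a minimal Python sketch, assuming the browser stored the response bodies in the HAR and that the server labels the pages as application/x-shockwave-flash; extract_bodies is a hypothetical name, and turning the resulting .swf files into images and running OCR on them is left out.

```python
import base64
import json
from pathlib import Path

SWF_MIME = "application/x-shockwave-flash"  # assumed MIME type of the pages


def extract_bodies(har_path, out_dir, mime_type=SWF_MIME):
    """Write every response body of the given MIME type to out_dir, in order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(har_path, encoding="utf-8") as handle:
        entries = json.load(handle)["log"]["entries"]
    written = []
    for entry in entries:
        content = entry["response"]["content"]
        if not content.get("mimeType", "").startswith(mime_type):
            continue
        text = content.get("text")
        if text is None:
            continue  # the browser did not record this body
        if content.get("encoding") == "base64":
            body = base64.b64decode(text)  # binary bodies are base64-encoded
        else:
            body = text.encode("utf-8")
        target = out_dir / f"page_{len(written):04d}.swf"
        target.write_bytes(body)
        written.append(target)
    return written
```

The files are numbered in the order the requests were recorded, so paging through the book from start to finish keeps them in reading order.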

Conclusion

I rate these DRM protections 6.85/10 – could do better, but fortunately they don’t.