This edition contains the following feature content:

This week's edition also includes these inner pages:

Brief items: Brief news items from throughout the community.

Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (2 posted)

Device trees have become, in a relatively short time, the preferred way to inform the kernel of the available hardware on systems where that hardware is not discoverable — most ARM systems, among others. In short, a device tree is a textual description of a system's hardware that is compiled to a simple binary format and passed to the kernel by the bootloader. The source format for device trees has been established for a long time — longer than Linux has been using it. Perhaps it's time for a change, but a proposal for a new device-tree source format has generated a fair amount of controversy in the small corner of the community that concerns itself with such things.

It may be hard to believe that, as recently as 2011, the use of device trees for the ARM architecture was controversial. Over time, though, device-tree proponents won out; ARM board files are mostly gone and unlamented, and the kernel source tree contains 2400 or so device-tree files. The source format for device trees, though, was never really discussed; the kernel community simply went with the format that was laid out in OpenFirmware many years ago. It was well defined, and there didn't seem to be any real reason for anybody to seek to change it.

As a simple example, consider this fragment of the Beaglebone Black device tree:

    leds {
        pinctrl-names = "default";
        pinctrl-0 = <&user_leds_s0>;
        compatible = "gpio-leds";

        led2 {
            label = "beaglebone:green:heartbeat";
            gpios = <&gpio1 21 GPIO_ACTIVE_HIGH>;
            linux,default-trigger = "heartbeat";
            default-state = "off";
        };
    };

This fragment describes the green "heartbeat" LED; it indicates which GPIO line it is connected to and more. A device-tree source file for a complex system-on-chip contains a great deal of text like the above. This device-tree source (DTS) file will be fed to the compiler ( dtc ) to generate a binary "device-tree blob" (or DTB) file. That file, in turn, will be handed to the kernel when the system boots.

Pantelis Antoniou recently showed up on the device-tree list with a proposal for a new device-tree compiler called yamldt . It produces the same DTB files as dtc , but the source format is different. Rather than go with the classic DTS format, yamldt deals with device trees expressed in the YAML format. So, the above fragment, in the new scheme, would look like this:

    leds:
      pinctrl-names: "default"
      pinctrl-0: *user_leds_s0
      compatible: "gpio-leds"
      led2:
        label: "beaglebone:green:heartbeat"
        gpios: [ *gpio1, 21, GPIO_ACTIVE_HIGH ]
        linux,default-trigger: "heartbeat"
        default-state: "off"

Antoniou would appear to have a number of reasons for wanting to make this change. The DTS language is used only for device trees; few developers are familiar with it. YAML, instead, is ubiquitous and well understood. DTS requires its own parser, while YAML parsers exist for almost every language one can imagine. Text editors tend to have built-in modes for editing YAML files. Simplified YAML can even be easily understood by low-level code like bootloaders; that, in turn, makes it possible (and relatively easy) for the bootloader to edit the device tree on the fly at system boot time. This editing is needed to support device-tree overlays for dynamic hardware. "I feel that the reliance on DTS has been holding progress back in expressing modern hardware", he said; switching to a different source language is his way of addressing that problem.

Another important issue, though, is validation of device trees. There are, in a sense, three components to any device-tree implementation. The device-tree "bindings" are a specification of how device trees are to be expressed; there are over 1000 binding descriptions in the kernel tree. For example, Documentation/devicetree/bindings/leds/common.txt specifies how all LED devices should be described in a DTS file. The real arbiter of what is correct, though, is the code in drivers that interprets device trees; what happens there may or may not match what the bindings say. Then there are the actual DTS files which may not properly match either the bindings or the driver code. There would be, as Tom Rini pointed out, real value in a tool that could validate a DTS file against the relevant bindings and, maybe someday, also help with the implementation of device-tree consumer code in the kernel.

Using YAML for both bindings and device-tree source files holds out the possibility of that kind of validation. Each type of device-tree node can be described in terms of the fields it may (and must) have and the data types for each; the compiler (or one of the existing YAML schema-checking tools) can then ensure that any source file follows the rules. This kind of validation has not yet been implemented, and it will not be an easy job since the bindings files now are not in any sort of machine-readable format. But the possibility is there.
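To make that concrete, here is a toy sketch (in Python, and in no way part of yamldt) of what checking a parsed node against a machine-readable binding could look like; the schema below is invented purely for illustration:

```python
# A hypothetical machine-readable "binding" for LED nodes; the real
# kernel bindings are free-form text and would need conversion first.
LED_BINDING = {
    "required": {"label", "gpios"},
    "optional": {"linux,default-trigger", "default-state"},
}

def validate_led(node):
    # Return a list of problems found in a single LED node.
    problems = []
    keys = set(node)
    for prop in sorted(LED_BINDING["required"] - keys):
        problems.append("missing required property: " + prop)
    allowed = LED_BINDING["required"] | LED_BINDING["optional"]
    for prop in sorted(keys - allowed):
        problems.append("unknown property: " + prop)
    return problems

# The led2 node from the YAML fragment above, as a parsed mapping:
led2 = {
    "label": "beaglebone:green:heartbeat",
    "gpios": ["*gpio1", 21, "GPIO_ACTIVE_HIGH"],
    "linux,default-trigger": "heartbeat",
    "default-state": "off",
}
print(validate_led(led2))   # prints [] -- the node checks out
```

A real validator would also check property types (that gpios is a phandle-plus-cells reference, for example), but even this much structure is more than the current free-form bindings allow.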

Even so, Antoniou's work would appear to be facing some fairly strong headwinds. One of the kernel's device-tree maintainers, Frank Rowand, replied:

Keep in mind one of the reasons Linus says he is very direct is to avoid leading a developer on, so that they don't waste a lot of time trying to resolve the maintainer's issues instead of realizing that the maintainer is saying "no". Please read my current answer as being "no, not likely to ever be accepted", not "no, not in the current form".

Rowand softened his position a bit after learning more about the work, but he was not alone in expressing skepticism. Perhaps that is not surprising; kernel developers who come in proposing major changes often get a conservative response at the beginning.

Some of the developers involved appear to fear the prospect of converting vast numbers of DTS files into a new format. That task is not hard — Antoniou has tested his tools by converting the entire set of in-kernel DTS files to YAML, then verifying that they compile to the same DTB files — but it does involve a lot of churn. Antoniou is not actually proposing such a change, though; he seems to see YAML as an alternative format, rather than a replacement for DTS. But, as Rob Herring (another kernel device-tree maintainer) put it: "If YAML solves a bunch of problems, then of course we'd want to convert DTS files at some point".

David Gibson, the dtc maintainer, had a number of concerns, though he did concede that, had YAML been well established when dtc was written, it might have been chosen rather than DTS. But he pointed out that YAML is a more expressive language than DTS; in particular, it can express things that cannot be rendered into DTB in any straightforward way. That might, he suggests, encourage developers to write device trees that cannot be properly compiled. He also, like some others, suggested that the YAML source format does not help much with validation. Bindings could be put into YAML and used to validate device trees in the current DTS format. Antoniou naturally disagreed, saying that the ability to use and track type information in YAML device-tree files will be an important part of a future validation mechanism.

As the conversation wound down, Grant Likely said that the important contribution in this work may be the definition of a data model for device-tree data. That model needs to exist before any sort of formal validation can be done. In the end, that may be the part of this proposal that has the most influence going forward, though it is hard to tell for sure, since this conversation didn't reach any firm conclusions. Like the ARM device-tree conversion itself, this looks like a topic that will need several iterations on the mailing lists before some sort of consensus emerges.

Comments (26 posted)

When a small business contemplates getting away from a proprietary accounting tool like QuickBooks in favor of free software like GnuCash, the first order of business is usually finding a way to liberate that business's accounting data for input into a new system. Strangely enough, Intuit, the creator of QuickBooks, never quite got around to making that easy to do. But it turns out that, with a bit of effort, this move can be made. Getting there involves wandering through an undocumented wilderness; this article is an attempt to make things easier for the next people to come along.

This article is not a review of GnuCash as a business accounting application — that will come later. GnuCash did, however, seem like a reasonable place to start. But pity the poor explorer who goes searching for information on how to move from QuickBooks to GnuCash and stumbles into this text in the GnuCash FAQ:

At this time there is no way to import from Quickbooks, and there are no plans to add that functionality. The Quickbooks QBW data format is a proprietary, non-documented file format. So until someone documents the file format or donates a QBW file parser your best bet for importing your QB data into GnuCash would be to output your data in a CSV format and either import the CSV data directly or convert the CSV to QIF and use the QIF importer.

Such discouragement from the authors of the application itself is certainly enough to send the hunt for a new accounting package elsewhere. Interestingly, it seems that most people who have looked at this problem have concluded that the proprietary QuickBooks format makes the whole task impossible. But it is possible to get most of a business's data out of QuickBooks in a relatively useful machine-readable form, especially if your use of QuickBooks is relatively simple. Here is how I did it, starting with QuickBooks Pro 2015.

Liberating the data

It would be nice if one could select an option to extract the entire contents of a QuickBooks company file in some sort of open format. Failing that, one has to do it in two steps, and the result is not quite as complete as one would like.

QuickBooks stores much of its information in "lists", and those can be extracted in the IIF format. IIF is also said to be proprietary, but it's text-based and relatively easy to parse. Even so, some time spent searching for an available Python module for reading IIF was in vain, strangely; there must certainly be many of them out there. Oh well, when in doubt, reinvent a new wheel and go forward.

Anyway, when faced with the QuickBooks main screen, one can follow the menus through File → Utilities → Export → Lists to IIF Files and get a dialog with the set of lists that can be written out. The lists of the highest likely interest (and the ones that my importer tool can deal with) are the lists of customers and vendors, and the chart of accounts. There are a number of others available, including the "other names" list; the purpose of that list seems to be to provide a place to put a name when QuickBooks is absolutely determined to add it to a list and no other list is applicable. The employee list will be of interest to some, undoubtedly, and perhaps the payment terms list.

The other piece of the puzzle is the set of transactions stored in the general ledger — the data that one uses an accounting system to track. That can be had by pulling up the Edit → Find dialog, selecting "Advanced", and entering the date range of interest; the result of the search can be exported to a CSV file with the transaction data. Be careful, though: QuickBooks 2015, at least, will silently cap the number of lines written to this file at 32768, so if you have a lot of data to export, it will need to be done in multiple steps.

That gets most of what most people are likely to want, with a couple of exceptions. One is the initial balances for the accounts. The chart of accounts is exported with balance information, but those are the final balances and unlikely to be of any real use. There does not appear to be an easy way to avoid copying down and entering initial balances by hand. The other missing piece is bills and invoices. The relevant entries in accounts payable and accounts receivable will be there, but the surrounding metadata will not. If that data is needed, it will need to be moved by hand — a somewhat painful prospect if one has a lot of outstanding items.

Feeding it to GnuCash

This data, once extracted, can eventually be fed to just about any sort of accounting system, but the job is likely to be different for each. GnuCash, happily, offers a Python-based interface (Python 2 only) that can be used to manipulate its database. Unhappily, one of the most accurate and complete parts of the documentation on the Python bindings can be found on the GnuCash wiki: "Python bindings have been recently added to gnucash. There is still very little documentation and probably few people would know how to use it." Developers wanting to use the Python bindings are left scrounging the web looking for examples to crib from.

It is worth noting that there is another GnuCash interface called piecash. It has the advantages of being better documented and being ported to Python 3. On the other hand, piecash only works with GnuCash files stored in a relational database — a GnuCash mode that appears to be poorly developed and maintained and, as a result, generally not enabled by distributors. The stock GnuCash bindings, instead, will work with both database-backed and native XML files.

What follows is an overview of two programs I wrote using the GnuCash Python bindings. qb_iif_to_gc is a simple importer for the list data extracted above; it needs to be run first to set up a new GnuCash file. Then qb_trans_to_gc can be run to import the transaction data. All of this code can be pulled from the repository listed at the end of this article.

One starts, of course, by installing the Python bindings, which are usually packaged separately from the application. The gnucash module provides access to basic functionality, while gnucash.gnucash_business has some of the more business-oriented features. Beyond that, it's not unusual to need to go beyond what has been provided in those files; in that case, gnucash.gnucash_core_c has a much more extensive — but lower-level — interface generated directly with the SWIG tool.

The first order of business is to open a GnuCash session to work with a file:

session = gnucash.Session(file, ignore_lock=False, is_new=False, force_new=False)

The file parameter is, of course, the name of the GnuCash file to work with. It uses the URI notation, so the way to specify a local XML file would be with something like xml://path/to/file.gnucash . If the file exists, only the file name needs to be provided and things will work. When creating a new, empty file (by passing is_new=True ), though, the full URI must be provided. The ignore_lock parameter can be used to open a file that is locked elsewhere — most helpful when debugging a script that may not have gotten around to properly closing the file the last time it exited.

When the job is done, changes to the session should be written out and the session closed with calls like:

    session.save()
    session.end()

Beyond that, the only real use for the session within a script is to extract session.book , which is the "book" containing the actual accounting data. One other bit of useful data to grab at the outset is the default currency to use with accounts; my scripts do:

    book = session.book
    ctable = book.get_table()
    dollars = ctable.lookup('CURRENCY', 'USD')

LWN is based in the US, so we can happily pretend that everybody just uses dollars and be done with it. If your company manages accounts in more than one currency, a little more attention will need to be paid here.

Accounts

Before much of anything else, it is necessary to set up a chart of accounts. Depending on how one is doing the migration, one might wish to create a new chart by hand or to simply import the chart from QuickBooks. One other small twist is that the GnuCash way of organizing accounts groups them by type in a hierarchy; all expense accounts start with Expenses/ , for example. QuickBooks does not do things that way. There appears to be nothing in GnuCash that requires this organization, so following it is a matter of choice. The qb_iif_to_gc tool will, if run with -r , reparent accounts to fit them within the default GnuCash model.

One creates an account in GnuCash by instantiating an Account object, setting its parameters, and slotting it into the hierarchy. The root of the hierarchy can be had by calling the book's get_root_account() method. So a simple example of creating a new income account would look something like this:

    acct = gnucash.Account(book)
    acct.BeginEdit()
    acct.SetName('Advertising Income')
    acct.SetType(gnucash.ACCT_TYPE_INCOME)
    acct.SetCommodity(dollars)
    book.get_root_account().append_child(acct)
    acct.CommitEdit()

The BeginEdit() and CommitEdit() methods are common to most GnuCash objects and, as one might expect, they are used to bracket a set of changes to that object. There is also a RollbackEdit() method should one change one's mind. One thing that jumps out from the example code found around the net is that almost nobody bothers with these calls except when dealing with Transaction objects. The code works just fine without them, but I feel that they have been provided for a reason and it is probably safer to call them.

The IIF file exported by QuickBooks provides the name and account type, so filling in this information is relatively simple. But the above example creates a top-level account. When creating a hierarchy, there's a bit more to do. LWN's chart of accounts includes one that is exported by QuickBooks as " Professional Fees:Freelance Authors "; in GnuCash that would need to be stored as Professional Fees/Freelance Authors , or even Expenses/Professional Fees/Freelance Authors . Creating that requires walking the hierarchy. The necessary code looks something like this:

    def find_parent(name, root):
        sname = name.split(':')
        parent = root
        for acct in sname[:-1]:
            parent = parent.lookup_by_name(acct)
            if not parent:
                print 'Failed to find container account', acct
                return root, sname[-1]
        return parent, sname[-1]

The parent account returned by this function is the one on which append_child() should be called to add the new account to the hierarchy. Note that this function assumes that the upper-level accounts already exist. When importing a chart of accounts exported by QuickBooks without changes, that will always be true. If account names are being remapped (a feature built into qb_iif_to_gc ), though, that condition may not hold and intermediate accounts may need to be created while walking the tree.
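When remapping does open such gaps, the tree walk can create the missing intermediate accounts as it goes. Here is a sketch of that logic with the account tree modeled as nested dictionaries; real code would instantiate Account objects and attach them to the parent rather than inserting dicts:

```python
def find_or_make_parent(name, root):
    # Walk a QuickBooks-style "A:B:C" account name, creating any
    # missing intermediate accounts along the way, and return the
    # final parent along with the leaf account's name.
    parts = name.split(':')
    parent = root
    for acct in parts[:-1]:
        if acct not in parent:
            parent[acct] = {}    # real code: create an Account here
        parent = parent[acct]
    return parent, parts[-1]

tree = {}
parent, leaf = find_or_make_parent('Expenses:Professional Fees:Freelance Authors', tree)
parent[leaf] = {}
print(tree)
```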

Vendors and customers

An established business is likely to have a long list of vendors and customers; fortunately, those can be imported automatically. Doing the import requires dealing with the GnuCash business interface, which looks a little different from what we have seen so far. A vendor with a given name would be created with a call like:

    vendor = gnucash_business.Vendor(book = book, id = book.VendorNextID(),
                                     currency = dollars, name = name)

The explicit call to book.VendorNextID() is needed to get an ID number to associate with the vendor; the rest should be relatively self-explanatory. Setting the vendor's contact information requires a few extra calls, though, to set up an address object:

    addr = vendor.GetAddr()
    addr.BeginEdit()
    addr.SetName(ventry['PRINTAS'] or name)
    addr.SetAddr1(ventry['ADDR1'])
    addr.SetAddr2(ventry['ADDR2'])
    addr.SetAddr3(ventry['ADDR3'])
    addr.SetPhone(ventry['PHONE1'])
    addr.CommitEdit()

Here, ventry is the vendor entry from the QuickBooks IIF file; most of the information copies over in a fairly straightforward way. There is one little glitch, though: sometimes vendors have a tax ID number that must be used for sending (for example) 1099 forms or reports to the government. GnuCash doesn't seem to have a way of storing that number directly, which is a bit of a shortcoming, unfortunately. There is, though, a way of storing notes with a vendor, so I chose to stash the tax ID there. The high-level interface provides no access to the notes, though, so one has to resort to the low-level interface:

    if ventry['TAXID']:
        inst = vendor.get_instance()
        gnucash_core_c.gncVendorSetNotes(inst, ventry['TAXID'])

Let's just say that a certain amount of digging was required to figure that out.

The interface for customer data is almost identical:

    cust = gnucash_business.Customer(book = book, id = book.CustomerNextID(),
                                     currency = dollars, name = name)

Once again, a new ID number must be explicitly generated. Contact information for the customer is set by calling its GetAddr() method as is done for vendors above.

And that is the core of the qb_iif_to_gc program; interested readers can read the full script or grab the repository. There is more information that can be exported and imported in this manner; in particular, some will certainly have a need to import employee information. We don't store that information in QuickBooks, though, so I haven't implemented an import function for that data.

Transactions

Once the chart of accounts is in place, it's time to populate the ledger with transaction data; that is the task of the qb_trans_to_gc tool. Importing this data is actually fairly straightforward, but a couple of terms need to be defined first. A "transaction" describes a complete financial operation, such as LWN buying a case of beer using a debit card. A transaction is made up of two or more "splits", each of which describes an entry in a single ledger. In this case, one split is entered into the ledger for the bank account, decreasing its balance by the cost of the beer. The other split increases the balance of the appropriate expense account ("Office Supplies" in this case). The amounts in the splits differ only in sign in this case — the amount paid for the beer equals (the absolute value of) the amount taken from the bank account.

A more complicated transaction can have more splits. Paying an employee may involve a big debit from a bank account, with corresponding splits for the money paid to the employee, taxes paid to the government, withholdings for employee benefits, etc. An LWN subscription purchased with PayPal generates three splits: the funds come in via "Subscription Income", then out to the PayPal "bank" account and the expense account for PayPal fees. Regardless of the number of splits, the amounts involved must add up to zero in the end.
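That invariant is easy to see if the splits are written out as data; the figures below are made up for illustration, with amounts held as integer cents in the same spirit as GnuCash's scaled-integer values:

```python
# Three splits for a hypothetical $7.00 PayPal subscription purchase.
splits = [
    ('Subscription Income', -700),  # the subscription revenue
    ('PayPal Account',       667),  # funds landing in the PayPal "bank"
    ('PayPal Fees',           33),  # PayPal's cut, as an expense
]
total = sum(amount for account, amount in splits)
print(total)   # prints 0: a transaction's splits must balance
```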

QuickBooks dumps each split into the CSV file as a separate line, with nothing to mark when one transaction stops and the next begins. It does, however, include the in-transaction balance in each line, so a balance of zero is a reasonably good end-of-transaction indicator. It is not foolproof; one could easily construct a transaction with a balance of zero partway through but, in the real world, that tends not to happen. So, by keying on that zero balance, it's possible to assemble a transaction from the splits provided by QuickBooks.
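A sketch of that grouping logic, assuming the exported CSV has an "Amount" and a "Balance" column (the exact column names will depend on the QuickBooks version):

```python
def group_transactions(rows):
    # Collect consecutive CSV rows (splits) into transactions,
    # closing out a transaction whenever the running balance
    # returns to zero.
    transactions = []
    current = []
    for row in rows:
        current.append(row)
        if float(row['Balance']) == 0.0:
            transactions.append(current)
            current = []
    return transactions

# Two transactions' worth of made-up splits:
rows = [
    {'Amount': '-50.00', 'Balance': '-50.00'},  # bank account split
    {'Amount':  '50.00', 'Balance':   '0.00'},  # expense split
    {'Amount': '-20.00', 'Balance': '-20.00'},
    {'Amount':  '15.00', 'Balance':  '-5.00'},
    {'Amount':   '5.00', 'Balance':   '0.00'},
]
print(len(group_transactions(rows)))   # prints 2
```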

A transaction is, unsurprisingly, represented by a Transaction object, created and initialized with code like the following (where entry is the first line from the CSV file describing the transaction):

    trans = gnucash.Transaction(book)
    trans.BeginEdit()
    trans.SetCurrency(dollars)
    trans.SetDescription(entry['Name'])
    if entry['Num']:
        trans.SetNum(entry['Num'])
    trans.SetDate(day, month, year)

The description field for a transaction is often the counterparty to the transaction — the beer store, for example. Those of us old enough to remember writing checks will also remember that they have a number associated with them; if a number is present, SetNum() can be used to add it here.

Note that nothing done so far describes any money changing hands; that's all done in the splits. Before getting there, though, we should look at how GnuCash represents numbers like currency amounts. These amounts are often given as decimal values, but anybody who has taken a numerical analysis class understands the danger of storing them as floating-point numbers. GnuCash doesn't do that; instead, these numbers are stored as scaled integers in the GncNumeric class. Here is the utility function I bashed together to create such values from the (text) floating-point values output by QuickBooks:

    SCALE = 1000

    def GCVal(value):
        dollars, cents = map(int, value.split('.'))
        ival = dollars*SCALE
        if value[0] != '-':
            ival += cents*(SCALE/100)
        else:
            ival -= cents*(SCALE/100)
        return gnucash.GncNumeric(ival, SCALE)

With that in place, creating a split is a matter of instantiating a Split object, filling in the relevant information, and attaching it to the transaction:

    split = gnucash.Split(book)
    split.SetValue(GCVal(entry['Amount']))
    split.SetAccount(LookupAccount(entry['Account']))
    split.SetMemo(entry['Memo'])
    split.SetParent(trans)

Note that, unusually, splits do not have BeginEdit() and CommitEdit() methods.

One final complication is that, in an existing company file, most of the transactions are likely to be marked "reconciled"; that is information that should not be lost. GnuCash stores this information in the splits rather than in the transaction object. Unfortunately, this functionality isn't available in the high-level interface, so setting the flag (to either " y " or " n ") requires going low:

gnucash_core_c.xaccSplitSetReconcile(split.get_instance(), reconciled)

Once all of the splits are in place for a transaction, a call to CommitEdit() completes the job.

Next steps

These two scripts will yield a mostly complete GnuCash file, with the omissions described above, of course. A good confidence-building measure at this point is to generate some reports in GnuCash and verify that they match what QuickBooks says. In my case, an obvious next step is to toss the scripts used with the painful QuickBooks data import process and write new ones to push company data directly into the GnuCash file. With the tools described above, that should not be that hard to do.

These scripts are available under the GPL; they can be cloned from the repository at git://git.lwn.net/qb-escape.git . As the process of investigating accounting systems continues, this repository will likely accumulate more import scripts. Stay tuned.

Comments (13 posted)

It is well understood that old and unmaintained software tends to be a breeding ground for security problems. These problems are never welcome, but they are particularly worrying when the software in question is a net-facing tool like a web browser. Standalone browsers are (hopefully) reasonably well maintained, but those are not the only web browsers out there; they can also be embedded into applications. The effort to do away with one unmaintained embedded browser is finally approaching its conclusion, but the change appears to have caught some projects unaware.

In early 2016, Michael Catanzaro sounded the alarm about security issues with the widely used WebKitGTK+ browser engine. At the time, security issues were turning up in WebKitGTK+ with great regularity, but nobody was calling them out as such; as a result, they were not getting CVE numbers and distributors were not bothering to ship updates. That created a situation where Linux desktop systems were routinely running software that was known to have security issues that, in many cases, could be exploited via a hostile web page or HTML email attachment.

Eighteen months later, Linux users can count themselves as the beneficiaries of a great deal of focused work on the part of Catanzaro and his colleagues. WebKit vulnerabilities and, in particular, vulnerabilities that show up in WebKitGTK+, are now regularly fixed by most (but not all) mainstream Linux distributions. Even the abandoned QtWebKit engine has picked up a new maintainer and is now "only" a year and a half behind the current WebKit release which, Catanzaro notes, "is an awful lot better than being four years behind". Progress is being made.

Progress in one part of the software ecosystem has a way of highlighting the relative lack of progress elsewhere, though. As Catanzaro explains in detail in his 2016 post, there are multiple versions of the WebKitGTK+ API, one of which ("WebKit1") was last supported in WebKitGTK+ 2.4, which was released in early 2014 and has not seen any security fixes for a long time. To stay current, applications need to move forward to the current WebKitGTK+ API, a process which can involve significant amounts of pain depending on how involved the application is in the rendering process. For applications that just need the engine to render some HTML, the API change is evidently not that hard to deal with. Or, at least, that is the case if moving to a current WebKitGTK+ release doesn't present other sorts of API issues. That is where the rub turns out to be.

GNOME is based on the GTK+ toolkit; as one would expect from the name, WebKitGTK+ also uses GTK+. But WebKitGTK+ versions after 2.4 only use GTK+ version 3.x, having left support for GTK+ 2 behind. GTK+ 3.0 was released in early 2011, meaning that application developers have had over six years to make the change. But, should there happen to be any laggard applications out there that have not moved forward to current GTK+, they will also be unable to move to anything resembling a current WebKitGTK+ browser engine. This could be a problem, as distributions are starting to simply remove their WebKitGTK+ 2.4 packages, breaking any applications that depend on it in the process.

Which applications might those be? On a Fedora 26 system, a query turns up these packages:

The Banshee music player. The 2.6.2 release shipped by Fedora came out in 2014; the project's web page mentions a 2.9 development release that also came out in 2014.

The claws-mail email client and, in particular, the "fancy" plugin used to render HTML mail. Claws seems to have a reasonably active development community, but it would appear to be more focused on producing minor releases with obnoxious changes to default behavioral settings than moving to GTK+ 3. A request for GTK+ 3 support filed in 2011 drew the response: " Good luck to whoever tries to work on that task ". More recently, developers have started to work on the problem more seriously, but it's not clear when the work will be done.

gmusicbrowser, another music player. Its last release was in 2015; the repository shows no commits for over a year.

The GnuCash financial application. GnuCash, too, seems to have an active (if small) development community. The 2015 bug report also shows a slow start to the problem, but GnuCash is nearing a 2.8 release that should include the required updates. The project is looking at moving away from WebKitGTK+ altogether, since it's overkill for the task of drawing a few charts.

The kazehakase web browser. The latest release shown on the web site is from 2009.

The Lekhonee WordPress publishing tool, which does not appear to have a current web page anywhere.

mono-tools. The last commit was a license change in 2016.

SparkleShare, a file-synchronization application, last released in 2015.

Techne, a simulation package last released in 2011.

Tech Talk PSE, billed as " superior technical demonstration software ". One commit in 2015, otherwise nothing since 2012.

A quick check on an openSUSE Tumbleweed system turns up others, including the Midori browser (last release in 2015).

One suspects that many of the above packages could simply vanish with few users even noticing. But some of them, including claws-mail and GnuCash, have significant user communities. They have, at this point, been fairly publicly caught out and revealed as failing to keep up with changes in the environment in which they run. The results go beyond shipping an application dependent on an outmoded toolkit; an email client should not be feeding arbitrary attachments to a browser engine with known security problems, for example.

Users of some of the faster-moving distributions are already seeing the effects of this move. Arch has already dropped WebKitGTK+ 2.4, and Fedora will do the same in its next release. Expect some scrambling as the developers of affected applications (those that still have developers, obviously) hurry to do forward-porting work that probably should have been completed several years ago, while completely unmaintained applications simply disappear. Development communities can be just like the rest of us: happy to procrastinate until the deadline looms. Arguably, distributors should make a point of imposing such deadlines more often.

Comments (66 posted)

The kernel is a huge program; among other things, that means that many problems encountered by a kernel developer have already been solved somewhere else in the tree. But those solutions are not always well known or documented. Recently, a seasoned developer confessed to having never encountered the "genpool" memory allocator. This little subsystem does not appear in the kernel documentation, and is likely to be unknown to others as well. In the interest of fixing both of those problems, here is an overview of genpool (or "genalloc") and what it does.

There are a number of memory-allocation subsystems in the kernel, each aimed at a specific need. Sometimes, however, a kernel developer needs to implement a new allocator for a specific range of special-purpose memory; often that memory is located on a device somewhere. The author of the driver for that device can certainly write a little allocator to get the job done, but that is the way to fill the kernel with dozens of poorly tested allocators. Back in 2005, Jes Sorensen lifted one of those allocators from the sym53c8xx_2 driver and posted it as a generic module for the creation of ad hoc memory allocators. This code was merged for the 2.6.13 release; it has been modified considerably since then.

The action begins with the creation of a pool using one of:

    #include <linux/genalloc.h>

    struct gen_pool *gen_pool_create(int min_alloc_order, int nid);
    struct gen_pool *devm_gen_pool_create(struct device *dev,
                                          int min_alloc_order, int nid,
                                          const char *name);

A call to gen_pool_create() will create a pool. The granularity of allocations is set with min_alloc_order ; it is a log-base-2 number like those used by the page allocator, but it refers to bytes rather than pages. So, if min_alloc_order is passed as 3 , then all allocations will be a multiple of eight bytes. Increasing min_alloc_order decreases the memory required to track the memory in the pool. The nid parameter specifies which NUMA node should be used for the allocation of the housekeeping structures; it can be -1 if the caller doesn't care.

The "managed" interface devm_gen_pool_create() ties the pool to a specific device. Among other things, it will automatically clean up the pool when the given device is destroyed.

A pool is shut down with:

void gen_pool_destroy(struct gen_pool *pool);

It's worth noting that, if there are still allocations outstanding from the given pool , this function will take the rather extreme step of invoking BUG() , crashing the entire system. You have been warned.

A freshly created pool has no memory to allocate. It is fairly useless in that state, so one of the first orders of business is usually to add memory to the pool. That can be done with one of:

    int gen_pool_add(struct gen_pool *pool, unsigned long addr,
                     size_t size, int nid);
    int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt,
                          phys_addr_t phys, size_t size, int nid);

A call to gen_pool_add() will place the size bytes of memory starting at addr (in the kernel's virtual address space) into the given pool , once again using nid as the node ID for ancillary memory allocations. The gen_pool_add_virt() variant associates an explicit physical address with the memory; this is only necessary if the pool will be used for DMA allocations.

The functions for allocating memory from the pool (and putting it back) are:

    unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size);
    void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
                             dma_addr_t *dma);
    void gen_pool_free(struct gen_pool *pool, unsigned long addr,
                       size_t size);

As one would expect, gen_pool_alloc() will allocate size bytes from the given pool . The gen_pool_dma_alloc() variant allocates memory for use with DMA operations, returning the associated physical address in the space pointed to by dma . This will only work if the memory was added with gen_pool_add_virt() . Note that this function departs from the usual genpool pattern of using unsigned long values to represent kernel addresses; it returns a void * instead.

That all seems relatively simple; indeed, some developers clearly found it to be too simple. After all, the interface above provides no control over how the allocation functions choose which specific piece of memory to return. If that sort of control is needed, the following functions will be of interest:

    unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
                                      genpool_algo_t algo, void *data);
    void gen_pool_set_algo(struct gen_pool *pool, genpool_algo_t algo,
                           void *data);

Allocations with gen_pool_alloc_algo() specify an algorithm to be used to choose the memory to be allocated; the default algorithm can be set with gen_pool_set_algo() . The data value is passed to the algorithm; most ignore it, but it is occasionally needed. One can, naturally, write a special-purpose algorithm, but there is a fair set already available:

gen_pool_first_fit is a simple first-fit allocator; this is the default algorithm if none other has been specified.

gen_pool_first_fit_align forces the allocation to have a specific alignment (passed via data in a genpool_data_align structure).

gen_pool_first_fit_order_align aligns the allocation to the order of the size. A 60-byte allocation will thus be 64-byte aligned, for example.

gen_pool_best_fit, as one would expect, is a simple best-fit allocator.

gen_pool_fixed_alloc allocates at a specific offset (passed in a genpool_data_fixed structure via the data parameter) within the pool. If the indicated memory is not available, the allocation fails.

There is a handful of other functions, mostly for purposes like querying the space available in the pool or iterating through chunks of memory. Most users, however, should not need much beyond what has been described above. With luck, wider awareness of this module will help to prevent the writing of special-purpose memory allocators in the future.

Comments (4 posted)

Nonvolatile memory offers the promise of fast, byte-addressable storage that persists over power cycles. Taking advantage of that promise requires the imposition of some sort of directory structure so that the persistent data can be found. There are a few approaches to the implementation of such structures, but the usual answer is to employ a filesystem, since managing access to persistent data is what filesystems were created to do. But traditional filesystems are not a perfect match to nonvolatile memory, so there is a natural interest in new filesystems that were designed for this medium from the beginning. The recently posted NOVA filesystem is a new entry in this race.

The filesystems that are currently in use were designed with a specific set of assumptions in mind. Storage is slow, so it is worth expending a considerable amount of CPU power and memory to minimize accesses to the underlying device. Rotational storage imposes a huge performance penalty on non-sequential operations, so there is great value in laying out data consecutively. Sector I/O is atomic; either an entire sector will be written, or it will be unchanged. All of these assumptions (and more) are wired deeply into most filesystems, but they are all incorrect for nonvolatile memory devices. As a result, while filesystems like XFS or ext4 can be sped up considerably on such devices, the chances are good that a filesystem designed from the beginning with nonvolatile memory in mind will perform better and be more resistant to data corruption.

NOVA is intended to be such a filesystem. Not only is it unsuited to regular block devices, it cannot use them at all, since it does not use the kernel's block layer. Instead, it works directly with storage mapped into the kernel's address space. A filesystem implementation gives up a lot if it avoids the block layer: request coalescing, queue management, prioritization of requests, and more. On the other hand, it saves the overhead imposed by the block layer and, when it comes to nonvolatile memory performance, cutting down on CPU overhead is a key part of performing well.

NOVA filesystem structure

Like most filesystems, NOVA starts with a superblock — the top-level data structure that describes the filesystem and provides the locations of the other data structures. One of those is the inode table, an inode being the internal representation of a file (or directory) within the filesystem. The NOVA inode table is set up as a set of per-CPU arrays, allowing any CPU to allocate new inodes without having to take cross-processor locks.

Free space is also split across a system's CPUs; it is managed in a red-black tree on each processor to facilitate coalescing of free regions. Unlike the inode tables, the free lists are maintained in normal RAM, not nonvolatile memory. They are written back when the filesystem is unmounted; if the filesystem is not unmounted properly, the free list will be rebuilt with a scan of the filesystem as a whole.

Perhaps the most interesting aspect of NOVA is how the inodes are stored. On a filesystem like ext4, the on-disk inode is a well-defined structure containing much of a file's metadata. To make things fast, NOVA took a different approach based on log-structured filesystems. The result is that an inode in the table is little more than a pair of pointers into a per-file log.

Each active inode consists of a log describing the changes that have been made to the file; all that is found in the inode structure itself is a pair of pointers indicating the first and last valid log entries. Those entries are stored (in nonvolatile memory) in chunks of at least 4KB, organized as a linked list. Each log entry will indicate an event like:

The attributes of the file have been changed — a change in the permission bits, for example.

An entry has been added to a directory (for directory inodes, obviously).

A link to the file was added.

Data has been written to the file.

The case of writing data is worth looking at a bit more closely. If a process writes to an empty file, there will be no data pages already allocated. The NOVA implementation will allocate the needed memory from the per-CPU free list and copy the data into that space. It will then append an entry to the inode log indicating the new length of the file and pointing to the location in the array where the data was written. Finally, an atomic update of the inode's tail pointer will complete the operation and make it visible globally.

If, instead, a write operation overwrites existing data, things are done a little differently. NOVA is a copy-on-write (COW) filesystem, so the first step is, once again, to allocate new (nonvolatile) memory for the new data. Data is copied from the old pages into the new if necessary, then the new data is added. A new entry is added to the log pointing to the new pages, the tail pointer is updated, and the old log entry for those pages is invalidated. At that point, the operation is complete and the old data pages can be freed for reuse.

Thus, the "on-disk" inode in NOVA isn't really a straightforward description of the file it represents. It is perhaps better thought of as a set of instructions that, when followed in order (and skipping the invalidated ones) will yield a complete description of the file and its layout in memory. This structure has the advantage of being quite fast to update when the file changes, with minimal locking required. It will obviously be a bit slower when it comes to accessing an existing file. NOVA addresses that by assembling a compact description of the file in RAM when the file is opened. Even that act of assembly should not be all that slow. Remember that the whole linked-list structure is directly addressable by the CPU. Storing this type of structure on a rotating disk, or even on a solid-state disk accessed as a normal block device, would be prohibitively slow, but direct addressability changes things.

There is another interesting feature enabled by this log structure. Each entry in the log contains an "epoch number" that is set when the entry is created. That makes it possible to create snapshots by incrementing the global epoch number, and associating the previous number with a pointer to the snapshot. When the snapshot is mounted, any log entries with an epoch number greater than the snapshot's number can be simply ignored to give a view of the file as it existed when the snapshot was taken. There are some details to manage, of course: entries associated with snapshots cannot be invalidated, and those entries have to be passed over when the snapshot is not in use. But it is still an elegant solution to the problem.

DAX and beyond

Readers may be wondering about how NOVA interacts with the kernel's DAX interface, which exists to allow applications to directly map files in nonvolatile memory into their address space, bypassing the kernel entirely for future accesses. It can be hard to make direct mapping work well with a COW-based write mechanism. In this 2016 paper describing NOVA [PDF], the authors say they don't even try. Rather than support DAX, NOVA supports an alternative mechanism called "atomic mmap" which copies data into "replica pages" and maps those instead. In a sense, atomic mmap reimplements a part of the page cache.

One can imagine that this approach was seen as being suboptimal; direct access to nonvolatile memory is one of that technology's most compelling features. Happily, the posted patch set does claim to support DAX. As far as your editor can tell from the documentation and the code, NOVA disables COW for the portions of a file that have been mapped into a process's address space, so changes are made in place. One significant shortcoming is that pages that have been mapped into a process's address space cannot be written to with write() . There is some relatively complex logic (described in this other paper [PDF]) to ensure that the filesystem does the right thing when taking a snapshot of a file that is currently directly mapped into some process's address space.

There are a number of self-protection measures built into NOVA, including checksumming for data and metadata. One of the more interesting mechanisms seems likely to prove controversial, though. One possible hazard of having your entire storage array mapped into the kernel's address space is that writing to a stray pointer can directly corrupt persistent data. That would not be a concern in a bug-free kernel but, well, that is not the world we live in. In an attempt to prevent inadvertent overwriting of data, NOVA can keep the entire array mapped read-only. When a change must be made, the processor's write-protect bit is temporarily cleared, allowing the kernel to bypass the memory permissions. Disabling write protection has been deemed too dangerous in the past; it seems unlikely that the idea will get a better reception now. Protection against stray writes is a valuable feature, though, so hopefully another way to implement it can be found.

There are a few other things that will need to be fixed before NOVA can be seriously considered for merging upstream. For example, it only works on the x86-64 architecture and, due to the per-CPU inode table structure, it is impossible to move a NOVA filesystem from one system to another if the two machines do not have the same number of CPUs. NOVA doesn't support access control lists or disk quotas. There is no filesystem checker tool. And so on. The developers are aware of these issues and expect to deal with them.

The fact that the developers do want to take care of those details and get the filesystem upstream is generally encouraging, but it is especially so given that NOVA comes from the academic world (from the University of California at San Diego in particular). Academic work has a discouraging tendency to stop when the papers are published and the grant money runs out, so the free-software world in general gets far less code from universities than one might expect. With luck, NOVA will be one development that escapes academia and becomes useful to the wider world.

There are, of course, many other aspects of this filesystem that cannot be covered in such a short article. See the two papers referenced above and the documentation in the patch itself for more information. This appears to be a project to keep an eye on; if all goes well, it will show the way forward for getting full performance out of the huge, fast nonvolatile memory arrays that, we're told, we'll all be able to get sometime soon.

Comments (48 posted)