This week's edition also includes these inner pages:

Brief items: Brief news items from throughout the community.

Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Deficiencies in the startup time for Python, along with the collections.namedtuple() data structure being identified as part of the problem, led Guido van Rossum to decree that named tuples should be optimized. That immediately set off a mini-storm of thoughts about the data structure and how it might be redesigned in the original python-dev thread, but Van Rossum directed participants over to python-ideas, where a number of alternatives were discussed. They ranged from straightforward tweaks to address the most pressing performance problems to elevating named tuples to be a new top-level data structure—joining regular tuples, lists, sets, dictionaries, and so on.

A named tuple simply adds field names for the entries in a tuple so that they can be accessed by index or name. For example:

    >>> from collections import namedtuple
    >>> Point = namedtuple('Point', ['x', 'y'])
    >>> p = Point(1, 2)
    >>> p.y
    2
    >>> p[1]
    2

The existing implementation builds a Python class implementing the named tuple; it is that building process that is the worst offender in terms of startup performance. A bug was filed in November 2016; more recently the bug was revived and various benchmarks of the performance of named tuples were added to it. By the looks of it, there is room for a good bit of optimization, but the fastest implementation may not be the winner—at least for now.

To some extent, the current named tuple implementation has been a victim of its own success. It is now routinely used in the standard library and in other popular modules, such that its performance has substantially contributed to Python's slow startup time. The existing implementation creates a _source attribute containing the pure-Python code for the class, which is then passed to exec() to build it. That attribute remains available afterward, for use in programs or to create the named tuple class directly by incorporating the source code. The pull request currently under consideration effectively routes around most of the use of exec(), though it is still used to add the __new__() function that creates new instances of the named tuple class.
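The exec()-based approach can be sketched in a few lines of pure Python. This is a simplified, hypothetical rendition of the technique (the real implementation generates the entire class body, along with _make(), _replace(), and friends), not the stdlib code itself:

```python
from operator import itemgetter

def make_namedtuple(typename, field_names):
    """Hypothetical sketch of an exec()-based named-tuple factory."""
    args = ', '.join(field_names)
    # Generate Python source for __new__ and compile it with exec(),
    # as the stdlib factory does for the whole class.
    source = (
        f"def __new__(cls, {args}):\n"
        f"    return tuple.__new__(cls, ({args},))\n"
    )
    namespace = {}
    exec(source, namespace)
    # Each field becomes a property backed by tuple indexing, so
    # instances stay plain tuples with no per-instance state.
    attrs = {'__slots__': (), '__new__': namespace['__new__']}
    for index, name in enumerate(field_names):
        attrs[name] = property(itemgetter(index))
    return type(typename, (tuple,), attrs)

Point = make_namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
```

It is this compile-and-exec step, repeated for every namedtuple defined at import time, that shows up in startup profiles.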

After Van Rossum's decree, Raymond Hettinger reopened the bug with an explicit set of goals. His plan was to extend the patch set from the pull request so that it was fully compatible with the existing implementation and to measure the impact of it, including for alternative Python implementations (e.g. PyPy, Jython). But patch author Jelle Zijlstra wondered if it made sense to investigate a C-based implementation of named tuples created by Joe Jevnik.

Benchmarks were posted. Jevnik summarized his findings about the C version as follows: "type creation is much faster; instance creation and named attribute access are a bit faster". Zijlstra's benchmarks of his own version showed a 4x speedup for creating the class (versus the existing CPython implementation) and roughly the same performance as CPython for instantiation and attribute access. Those numbers caused Zijlstra to suggest using the C version:

Joe's cnamedtuple is about 40x faster for class creation than the current implementation, and my PR only speeds class creation up by 4x. That difference is big enough that I think we should seriously consider using the C implementation.

There are some downsides to a C implementation, however. As the original bug reporter, Naoki Inada, pointed out, maintenance is more difficult for C-based code. In addition, only CPython can directly benefit from it; alternative language implementations will either need to reimplement it or forgo it.

Class-creation performance is only one area that could use improvement, however. Victor Stinner noted that accessing tuple values by name was nearly twice as slow as with the somewhat similar, internal PyStructSequence that is used for things like sys.version_info. It would be desirable for any named tuple upgrade to find a way to reduce the access-by-name overhead, several said. In fact, Giampaolo Rodolà pointed out that the asyncio module could serve nearly twice as many requests per second if the performance of PyStructSequence could be attained.
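A quick micro-benchmark along these lines can illustrate the comparison; this is a hypothetical sketch, and the absolute numbers (and the gap) depend on the interpreter version and machine:

```python
import sys
import timeit
from collections import namedtuple

# sys.version_info is a PyStructSequence; build a namedtuple holding
# the same values so the attribute access being timed is comparable.
Version = namedtuple('Version',
                     ['major', 'minor', 'micro', 'releaselevel', 'serial'])
nt_version = Version(*sys.version_info)

n = 1_000_000
t_namedtuple = timeit.timeit('v.minor', globals={'v': nt_version}, number=n)
t_structseq = timeit.timeit('v.minor', globals={'v': sys.version_info}, number=n)
print(f"namedtuple: {t_namedtuple:.3f}s  structseq: {t_structseq:.3f}s")
```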

But Rodolà would like to go even further than that. He proposed new syntax that would allow the creation of named tuples on the fly. He gave two possibilities for how that might look:

    >>> ntuple(x=1, y=0)
    (x=1, y=0)
    >>> (x=1, y=0)
    (x=1, y=0)

Either way (or both) would be implemented in C for speed. It would allow named tuples to be created without having to describe them up front, as is done now. But it would also remove one of the principles that guided the design of named tuples, as Tim Peters said:

How do you propose that the resulting object T know that T.x is 1, T.y is 0, and T.z doesn't make sense? Declaring a namedtuple up front allows the _class_ to know that all of its instances map attribute "x" to index 0 and attribute "y" to index 1. The instances know nothing about that on their own, and consume no more memory than a plain tuple. If your `ntuple()` returns an object implementing its own mapping, it loses a primary advantage (0 memory overhead) of namedtuples.
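Peters's zero-overhead point is easy to check interactively; in CPython, a namedtuple instance carries no per-object state beyond the tuple itself, because the field-name mapping lives on the class:

```python
import sys
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
t = (1, 2)

# The namedtuple class sets __slots__ = (), so its instances have
# exactly the same memory layout as a plain two-element tuple.
print(sys.getsizeof(p), sys.getsizeof(t))
assert sys.getsizeof(p) == sys.getsizeof(t)
```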

Post-decree, Ethan Furman moved the discussion to python-ideas and suggested looking at his aenum module as a possible source for a new named tuple. But that implementation uses metaclasses, which could lead to problems when subclassing, as Van Rossum pointed out.

Jim Jewett suggested making named tuples simply be a view into a dictionary: Python dictionaries are now ordered by default and are optimized for speed, so they might be a reasonable choice, he said. That idea ran aground on too many incompatibilities with the existing implementation, though. As Greg Ewing and others noted, it would lose many of the attributes that are valued in named tuples, including low memory overhead, access by index, and being a subclass of tuple.

Rodolà revived his proposal for named tuples without a declaration, but there are a number of problems with that approach. One of the main stumbling blocks is the type of these on-the-fly named tuples—effectively each one created would have its own type even if it had the same names in the same order. That is wasteful of memory, as is having each instance know about the mapping from indexes to names; the current implementation puts that in the class, which can be reused. There might be ways to cache these on-the-fly named tuple types to avoid some of the wasted memory, however. Those problems, and concern that it would be abused, led Van Rossum to declare the "bare" syntax (e.g. (x=1, y=0)) proposal dead.
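The caching idea can be sketched on top of the existing factory. This ntuple() is a hypothetical helper, not any of the proposed implementations: it reuses one generated class per distinct sequence of field names, so repeated calls avoid both the class-creation cost and the per-type memory:

```python
from collections import namedtuple

_type_cache = {}

def ntuple(**fields):
    """Hypothetical on-the-fly named tuple with a type cache."""
    # Keyword arguments preserve call order (Python 3.7+), so the
    # field names themselves form the cache key.
    key = tuple(fields)
    cls = _type_cache.get(key)
    if cls is None:
        cls = _type_cache[key] = namedtuple('ntuple', key)
    return cls(**fields)

a = ntuple(x=1, y=0)
b = ntuple(x=2, y=3)
# Same field names in the same order -> same cached type.
assert type(a) is type(b)
```

Note that the cache only papers over the memory problem; the keyword-ordering ambiguity remains.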

But the discussion of ntuple(x=1, y=0) continued for a while before seemingly running aground as well. Part of the problem is that it combines two things in an unexpected way: declaring the order of the fields in the named tuple and using keyword arguments where order should not matter. For the x and y case, it is fairly clear, but named tuples could be used for types where the order is not so clear. As Steven D'Aprano put it:

We have two *incompatible* requirements:

we want to define the order of the fields according to the order we give keyword arguments;

we want to give keyword arguments in any order without caring about the field order.

We can't have both, and we can't give up either without being a surprising source of annoyance and bugs. I don't see any way that this proposal can be anything but a subtle source of bugs. As far as I am concerned, this kills the proposal for me. If you care about field order, then use namedtuple and explicitly define a class with the field order you want. If you don't care about field order, use SimpleNamespace.

He elaborated on the ordering problem by giving an example of a named tuple that stored the attributes of elementary particles (e.g. flavor, spin, charge) which do not have an automatic ordering. That argument seemed to resonate with several thread participants.

So it would seem that a major overhaul of the interface for building named tuples is not likely anytime soon—if ever. The C reimplementation has some major performance benefits (and could presumably pick up the PyStructSequence performance for access by name), but it would seem that the first step will be to merge Zijlstra's Python-based implementation. That will allow for a fallback with better performance for alternative implementations, while still leaving open the possibility of replacing it with an even faster C version later.

Comments (33 posted)

The first stop in the search for a free accounting system that can replace QuickBooks is a familiar waypoint: the GnuCash application. GnuCash has been around for many years and is known primarily as a personal-finance tool, but it has acquired some business features as well. The question is: are those business features solid enough to allow the program to serve as a replacement for QuickBooks?

The first order of business is importing existing data into the system. That is not a straightforward task, but it can be done; see this article for the gory details. The result was a 1.8MB XML file containing the company's accounting data since the beginning of 2016. Starting GnuCash with that file takes about 20 seconds on a reasonably modern laptop. It's amazing how long 20 seconds can seem sometimes.

GnuCash has had, for some years, the ability to store its data in a relational database manager rather than an XML file. Attempts to use this feature with PostgreSQL proved fruitless; saving a file to a database yielded the helpful information that the server "experienced an error or encountered bad or corrupt data". Saving to an SQLite file worked, at the cost of expanding the storage used to 17MB and with no noticeable improvement in startup or save speed. The pre-2.8 development version of GnuCash crashed when asked to read an SQLite file. But, then, that version proved crash-prone in general.

Overall, the database back-end mechanism feels too unstable to trust in a role like this. Even if PostgreSQL worked, GnuCash still doesn't support concurrent access to the accounting data, wiping out what would otherwise be one of the big advantages of storing it on a central server.

Accounting functionality

GnuCash is a solid tool for basic accounting; it implements a standard double-entry bookkeeping scheme and most basic tasks are relatively straightforward and quick. The application wants to work in an "everything in one window" mode, meaning that one ends up juggling a number of tabs in normal usage. It is possible to break out a tab into a separate window, but that's not the default behavior. The register window mostly works as one would expect, though the column-resizing behavior is so strange that there is an FAQ entry dedicated to it. For the most part, though, this is the core part of GnuCash, and it has worked reasonably well for a long time.

Running GnuCash in a business mode brings a few new features to bear. It adds "accounts payable" and "accounts receivable" accounts for the management of bills and invoices, for example. With regard to these accounts, it's worth noting that GnuCash assumes that accrual accounting is in use; it has no option to do accounting on a cash basis. That can result in some confusing differences in reports if data for a cash-basis company is brought over.

GnuCash has basic support for tracking customers and vendors. On the customer side, along with shipping information and such, it can track sales-tax (or VAT) information and payment terms. Similar information is tracked on the vendor side. One significant shortcoming is the inability to perform tax reporting for vendors — to generate 1099 forms for consultants in the US, for example. There is support for managing bills from vendors and invoices to customers, including the usual set of reports to see which vendors need to be paid and which customers are running late.

There is also basic support for the tracking of employees, but no support for activities like payroll processing, which is fine; that work is best outsourced anyway. It is possible to manage and pay expense reports, which are essentially treated the same as invoices. The interface for this activity is not fully intuitive; some information is filled in via a pop-up dialog, followed by a full-window screen for the rest. In general, the business support is functional, but much of it feels clunky and bolted on. The business screens clearly have not received the same level of user-interface work that has gone into the core accounting functions; their operation is not always clear, and they are visually less attractive.

In the "visually attractive" area, report generation in GnuCash works reasonably well. One can get the usual array of tabular reports, bar charts, pie charts, and more. The "custom report" mechanism allows a number of different elements to be combined in a single page if desired.

Getting data into the system is clearly an important part of the accounting task. This data comes from banks, payroll processors, credit-card processors, and more; in LWN's case, it also comes from the web site itself. GnuCash comes off reasonably well in this regard. Data files in the QFX or CSV formats can be imported by the application. The import process is rather tedious and click-heavy at the outset, but the transaction-classifying machinery learns quickly and does a pretty good job — most of the time. There is no support for the IIF format provided by some companies, unfortunately. There is claimed support for direct online access to banks that play along, but I was unable to get it to work, despite using a bank that is said to work.

If all else fails, of course, the Python bindings can be used to feed data directly into a GnuCash file. The biggest problem here is the single-user nature of those files; a user wanting to import some data with an external script while GnuCash is running will be disappointed.

Most other functionality that one would expect is there. Check printing works and has a reasonably flexible mechanism for describing specific check formats. Relative to QuickBooks, which can queue up a series of checks to print and automatically assign check numbers to transactions, the interface is more work to use. Happily, the need to print checks is falling even in the US, but one does still need to do it on occasion. Beyond that, GnuCash can do reconciliation, scheduled transactions, budgeting, and so on. It also has a mechanism for assigning tax categories to income and expense accounts — a useful tool for dealing with accountants, in the US at least.

Development community

An important thing to keep in mind when one is considering relying on a free-software project is the health of that project's development community. It would not do to end up stuck with an accounting system that is no longer developed or maintained. A number of free accounting systems seem to be maintained as platforms for add-on consulting or services businesses, a situation that, while not without its own hazards, at least provides a way for developers to be paid for their work. GnuCash, which has its roots more in the personal-finance realm, does not appear to have that sort of commercial ecosystem around it. Thus it relies entirely on volunteer developers.

The result is a relatively small community and relatively slow development — but GnuCash is nonetheless a project that appears to have some staying power. The last major GnuCash release was 2.6.0 at the very end of 2013. There has been a steady stream of maintenance releases since, up to 2.6.17 on July 2. There is a 2.8 release that is said to be nearing completion, though it felt pretty unstable when tested for this article. GnuCash 2.8 does not appear to bring a lot in the way of new features, but it does bring the rather overdue GTK+ 3 transition.

Since 2.6.0, the project has merged just short of 2,600 changesets from 96 developers. Three of those developers (John Ralls, Geert Janssens, and Robert Fewell) have accounted for 75% of those changes. The 2.6.13 release came out in September 2016; since then, there have been 1,590 changes from 52 developers. The same three developers accounted for 84% of the changes during that time. So GnuCash is dependent on a small set of dedicated developers with a few dozen folks putting in an occasional fix. It is not a single-person project, but it is not a huge community either; the loss of one of those top-three developers would hurt.

In conclusion

The bottom line of all this is clear enough: GnuCash is indeed a viable tool for a small company's accounting needs. The basic accounting features are there, and it is relatively easy to integrate into a company's operation if one isn't afraid of doing a little scripting around the edges. The core functionality is reasonably easy to use. It is fully free software, with no company trying to sell the proprietary modules needed to obtain its full functionality.

This survey of accounting systems will not stop here, though, and that is a good thing. While GnuCash can do the job, it is not a perfect fit. The business functionality often feels like an afterthought, and the relatively small size of the development community is a bit worrisome. The facts that GnuCash is only now pulling together a release to move to GTK+ 3 and that the 2.8.0 release, after more than three years, brings almost no new features suggest a lack of development momentum. If GnuCash is the best there is, that will be good enough, but it's hard not to hope that there is something better out there.

Comments (14 posted)

At DebConf17, John Sullivan, the executive director of the FSF, gave a talk on the supposed decline of the use of copyleft licenses in free-software projects. In his presentation, Sullivan questioned the notion that permissive licenses, like the BSD or MIT licenses, are gaining ground at the expense of the traditionally dominant copyleft licenses from the FSF. While there does seem to be a rise in the use of permissive licenses, in general, there are several possible explanations for the phenomenon.

When the rumor mill starts

Sullivan gave a recent example of the claim of the decline of copyleft in an article on Opensource.com by Jono Bacon from February 2017 that showed a histogram of license usage between 2010 and 2017 (seen below).

In Black Duck's sample, the most popular variant of the GPL – version 2 – is less than half as popular as it was (46% to 19%). Over the same span, the permissive MIT has gone from 8% share to 29%, while its permissive cousin the Apache License 2.0 jumped from 5% to 15%.

From that, Bacon elaborates possible reasons for the apparent decline of the GPL. The graphic used in the article was actually generated by Stephen O'Grady in a January article, The State Of Open Source Licensing.

Sullivan, however, argued that the methodology used to create both articles was problematic. Neither contains original research: the graphs actually come from the Black Duck Software "KnowledgeBase" data, which was partly created from the old Ohloh web site now known as Open Hub.

To show one problem with the data, Sullivan mentioned two free-software projects, GNU Bash and GNU Emacs, that had been showcased on the front page of Ohloh.net in 2012. On the site, Bash was (and still is) listed as GPLv2+, whereas it changed to GPLv3 in 2011. He also claimed that "Emacs was listed as licensed under GPLv3-only, which is a license Emacs has never had in its history", although I wasn't able to verify that information from the Internet archive. Basically, according to Sullivan, "the two projects featured on the front page of a site that was using [the Black Duck] data set were wrong". This, in turn, seriously brings into question the quality of the data:

I reported this problem and we'll continue to do that but when someone is not sharing the data set that they're using for other people to evaluate it and we see glimpses of it which are incorrect, that should give us a lot of hesitation about accepting any conclusion that comes out of it.

Reproducible observations are necessary to the establishment of solid theories in science. Sullivan didn't try to contact Black Duck to get access to the database, because he assumed (rightly, as it turned out) that he would need to "pay for the data under terms that forbid you to share that information with anybody else". So I wrote Black Duck myself to confirm this information. In an email interview, Patrick Carey from Black Duck confirmed its data set is proprietary. He believes, however, that through a "combination of human and automated techniques", Black Duck is "highly confident at the accuracy and completeness of the data in the KnowledgeBase". He did point out, however, that "the way we track the data may not necessarily be optimal for answering the question on license use trend" as "that would entail examination of new open source projects coming into existence each year and the licenses used by them".

In other words, even according to Black Duck, its database may not be useful to establish the conclusions drawn by those articles. Carey did agree with those conclusions intuitively, however, saying that "there seems to be a shift toward Apache and MIT licenses in new projects, though I don't have data to back that up". He suggested that "an effective way to answer the trend question would be to analyze the new projects on GitHub over the last 5-10 years." Carey also suggested that "GitHub has become so dominant over the recent years that just looking at projects on GitHub would give you a reasonable sampling from which to draw conclusions".

Indeed, GitHub published a report in 2015 that also seems to confirm MIT's popularity (45%), surpassing copyleft licenses (24%). The data is, however, not without its own limitations. For example, in the above graph going back to the inception of GitHub in 2008, we see a rather abnormal spike in 2013, which seems to correlate with the launch of the choosealicense.com site, described by GitHub as "our first pass at making open source licensing on GitHub easier".

In his talk, Sullivan was critical of the initial version of the site, which he described as biased toward permissive licenses. Because the GitHub project-creation page links to the site, Sullivan explained, the site's bias could have actually influenced GitHub users' license choices. Following a talk from Sullivan at FOSDEM 2016, GitHub addressed the problem later that year by rewording parts of the front page to be more accurate, but any change in license choice obviously doesn't show in the report produced in 2015 and won't affect choices users have already made. There is, therefore, reasonable doubt about whether GitHub's subset of software projects is representative of the larger free-software community.

In search of solid evidence

So it seems we are missing good, reproducible results to confirm or dispel these claims. Sullivan explained that it is a difficult problem, if only in the way you select which projects to analyze: the impact of a MIT-licensed personal wiki will obviously be vastly different from, say, a GPL-licensed C compiler or kernel. We may want to distinguish between active and inactive projects. Then there is the problem of code duplication, both across publication platforms (a project may be published on GitHub and SourceForge for example) but also across projects (code may be copy-pasted between projects). We should think about how to evaluate the license of a given project: different files in the same code base regularly have different licenses—often none at all. This is why having a clear, documented and publicly available data set and methodology is critical. Without this, the assumptions made are not clear and it is unreasonable to draw certain conclusions from the results.

It turns out that some researchers did that kind of open research in 2016 in a paper called "The Debsources Dataset: Two Decades of Free and Open Source Software" [PDF] by Matthieu Caneill, Daniel M. Germán, and Stefano Zacchiroli. The Debsources data set is the complete Debian source code that covers a large history of the Debian project and therefore includes thousands of free-software projects of different origins. According to the paper:

The long history of Debian creates a perfect subject to evaluate how FOSS licenses use has evolved over time, and the popularity of licenses currently in use.

Sullivan argued that the Debsources data set is interesting because of its quality: every package in Debian has been reviewed by multiple humans, including the original packager, but also by the FTP masters to ensure that the distribution can legally redistribute the software. The existence of a package in Debian provides a minimal "proof of use": unmaintained packages get removed from Debian on a regular basis and the mere fact that a piece of software gets packaged in Debian means at least some users found it important enough to work on packaging it. Debian packagers make specific efforts to avoid code duplication between packages in order to ease security maintenance. The data set covers a period longer than Black Duck's or GitHub's, as it goes all the way back to the Hamm 2.0 release in 1998. The data and how to reproduce it are freely available under a CC BY-SA 4.0 license.

Sullivan presented the above graph from the research paper that showed the evolution of software license use in the Debian archive. Whereas previous graphs showed statistics in percentages, this one showed actual absolute numbers, where we can't actually distinguish a decline in copyleft licenses. To quote the paper again:

The top license is, once again, GPL-2.0+, followed by: Artistic-1.0/GPL dual-licensing (the licensing choice of Perl and most Perl libraries), GPL-3.0+, and Apache-2.0.

Indeed, looking at the graph, at most we see a rise of the Apache and MIT licenses and no decline of the GPL per se, although its adoption does seem to have slowed in recent years. We should also mention the possibility that Debian's data set has the opposite bias: toward GPL software. The Debian project is culturally quite different from the GitHub community and even the larger free-software ecosystem, naturally, which could explain the disparity in the results. We can only hope a similar analysis can be performed on the much larger Software Heritage data set eventually, which may give more representative results. The paper acknowledges this problem:

Debian is likely representative of enterprise use of FOSS as a base operating system, where stable, long-term and seldomly updated software products are desirable. Conversely Debian is unlikely representative of more dynamic FOSS environments (e.g., modern Web-development with micro libraries) where users, who are usually developers themselves, expect to receive library updates on a daily basis.

The Debsources research also shares methodology limitations with Black Duck: while Debian packages are reviewed before uploading and we can rely on the copyright information provided by Debian maintainers, the research also relies on automated tools (specifically FOSSology) to retrieve license information.

Sullivan also warned against "ascribing reason to numbers": people may have different reasons for choosing a particular license. Developers may choose the MIT license because it has fewer words, for compatibility reasons, or simply because "their lawyers told them to". It may not imply an actual deliberate philosophical or ideological choice.

Finally, he brought up the theory that the rise of non-copyleft licenses isn't necessarily to the detriment of the GPL. He explained that, even if there is an actual decline, it may not be much of a problem if there is an overall growth of free software at the expense of proprietary software. He reminded the audience that non-copyleft licenses are still free software, according to the FSF and the Debian Free Software Guidelines, so their rise is still a positive outcome. Even if the GPL is a better tool to accomplish the goal of a free-software world, we can all acknowledge that the conversion of proprietary software to more permissive—and certainly simpler—licenses is definitely heading in the right direction.

[I would like to thank the DebConf organizers for providing meals for me during the conference.]

Comments (20 posted)

Power-efficient workqueues were first introduced in the 3.11 kernel release; since then, fifty or so subsystems and drivers have been updated to use them. These workqueues can be especially useful on handheld devices (like tablets and smartphones), where power is at a premium. ARM platforms with power-efficient workqueues enabled on Ubuntu and Android have shown significant improvements in energy consumption (up to 15% for some use cases).

Workqueues (wq) are the most common deferred-execution mechanism used in the Linux kernel for cases where an asynchronous execution context is required. That context is provided by the worker kernel threads, which are woken whenever a work item is queued for them. A workqueue is represented by the workqueue_struct structure, and work items are represented by struct work_struct. The latter includes a pointer to a function which is called by the worker (in process context) to execute the work. Once the worker has finished processing all the work items queued on the workqueue, it becomes idle.

The most common APIs used to queue work are:

    bool queue_work(struct workqueue_struct *wq, struct work_struct *work);
    bool queue_work_on(int cpu, struct workqueue_struct *wq,
                       struct work_struct *work);
    bool queue_delayed_work(struct workqueue_struct *wq,
                            struct delayed_work *dwork, unsigned long delay);
    bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
                               struct delayed_work *work, unsigned long delay);

The first two functions queue the work for immediate execution, while the other two queue it to run after delay jiffies have passed. The work queued by queue_work_on() (and queue_delayed_work_on()) is executed by the worker thread running on the designated cpu. The work queued by queue_work() (and queue_delayed_work()), instead, can be run by any CPU in the system (though it doesn't really happen that way, as will be described later).

The workqueue pinning problem

A fairly common use case for workqueues in the kernel is to repeatedly run and requeue the work from the work function itself, when some task needs to be done periodically. For example:

    static void foo_handler(struct work_struct *work)
    {
        struct delayed_work *dwork = to_delayed_work(work);

        /* Do some work here */

        queue_delayed_work(system_wq, dwork, 10);
    }

    void foo_init(void)
    {
        struct delayed_work *dwork = kmalloc(sizeof(*dwork), GFP_KERNEL);

        INIT_DEFERRABLE_WORK(dwork, foo_handler);
        queue_delayed_work(system_wq, dwork, 10);
    }

foo_init() allocates the delayed work structure and queues it with a ten-jiffy delay. The work handler (foo_handler()) performs the periodic work and queues itself again.

One might think that the work will be executed on any CPU (whichever the kernel finds to be most appropriate). But that's not really true. The workqueue core will most likely queue it on the local CPU (the CPU where queue_delayed_work() was called), unless the local CPU isn't part of the global wq_unbound_cpumask. On an eight-core platform, for example, the work function shown above will be executed on the same CPU every time, even if that CPU is idle and some of the other seven are not.

The wq_unbound_cpumask is the mask of CPUs that are allowed to execute work which isn't queued to a particular CPU (i.e. work queued with queue_work() and queue_delayed_work()). It can be found in sysfs as devices/virtual/workqueue/cpumask . This mask is used to keep such work items confined to a specific group of CPUs and can be useful in cases like heterogeneous CPU architectures, where we want to execute such work items on low-power CPUs only, or with CPU isolation, where we don't want such work items to execute on CPUs doing important, performance-sensitive work. This mask can't be used to get rid of the pinning problem described above, though; if the local CPU is part of the wq_unbound_cpumask , then queue_work() will keep queuing the work there.
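The mask can be inspected, and as root modified, through that sysfs file. A quick check, assuming a kernel that exposes the workqueue attributes:

```shell
# Read the mask of CPUs allowed to run unbound work items;
# writing a new hex mask to the same file (as root) restricts them.
mask_file=/sys/devices/virtual/workqueue/cpumask
if [ -r "$mask_file" ]; then
    cat "$mask_file"
else
    echo "workqueue cpumask not exposed on this kernel"
fi
```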

It is probably fine (from a power-efficiency point of view) if a CPU is interrupted to run a work item while it is doing other work. But if the CPU is brought out of the idle state solely to service the timer and queue the work, more power than necessary will be consumed. Pinning can also hurt performance in some cases, since the selected CPU may not be the best available CPU to run the work function; the scheduler cannot load-balance this work across CPUs, and the response time of the work function may increase if the target CPU is currently busy.

Power-efficient workqueues

The power-efficient workqueue infrastructure is disabled by default, since the same work items may need to be either power- or performance-oriented depending on the current system configuration. These workqueues can be enabled either by passing workqueue.power_efficient=true on the kernel command line or by enabling the CONFIG_WQ_POWER_EFFICIENT configuration option. The command line can also be used to disable the feature (if it was enabled in the kernel configuration) by setting workqueue.power_efficient=false .

Once the power-efficient workqueue functionality is enabled, a workqueue can be made to run in the power-efficient mode by passing the WQ_POWER_EFFICIENT flag to alloc_workqueue() when creating the workqueue. There are two system-level workqueues that run in this mode as well: system_power_efficient_wq and system_freezable_power_efficient_wq ; they can be used when a private workqueue is not needed.
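As a sketch only (kernel code, not buildable outside a kernel tree; the names foo_wq and foo_setup() are made up for illustration), a driver could opt in like this:

```c
/* Sketch: assumes the standard kernel workqueue API. */
#include <linux/workqueue.h>

static struct workqueue_struct *foo_wq;

static int foo_setup(void)
{
	/* With the power-efficient infrastructure enabled, work queued
	   here runs on a scheduler-chosen CPU instead of being pinned. */
	foo_wq = alloc_workqueue("foo_wq", WQ_POWER_EFFICIENT, 0);
	if (!foo_wq)
		return -ENOMEM;
	return 0;
}
```

When no private workqueue is needed, queuing to system_power_efficient_wq achieves the same behavior without the allocation.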

Instead of running work on the local CPU, the workqueue core asks the scheduler to provide the target CPU for the work queued on unbound workqueues (which includes those marked as power-efficient). So they will not get pinned on a single CPU as can happen with regular workqueues.

Unfortunately, that does not mean that the scheduler always picks the optimal CPU to run a workqueue task. The algorithm responsible for picking the CPU for a task is complex but, more likely than not, the scheduler will pick the least busy CPU among those sharing the same last-level cache. For a multi-cluster platform, it will most likely pick a CPU from the same cluster. But if the work handler doesn't finish quickly, load balancing will happen and that may move the task to another, possibly idle, CPU.

Thus, with the current design of the Linux kernel's scheduler, we may not get the best results (though they should still be good enough) with power-efficient workqueues. There is ongoing work (strongly pushed by the ARM community) to make the scheduler more power-aware and power-efficient in general; this work will also benefit power-efficient workqueues. Currently, they are a bit more useful (from a power-efficiency point of view) with the Android kernel, which carries some scheduler modifications to make it more energy-aware.

It is natural to wonder whether all workqueues should run in the power-efficient mode. But power-efficient workqueues have one disadvantage: they may end up executing the work item on a different CPU every time, incurring lots of cache misses, depending on how much data the work handler accesses. This can significantly hurt the performance of the system when workqueue tasks run often and need their caches to be hot. On the other hand, this can be good, performance wise, in some cases where cache misses are not a big issue, as the scheduler can do load balancing and the response time for the work items may improve. So one needs to evaluate the users of the workqueues carefully and see which configuration (power-efficient or not) they fit best with.

Power numbers

I ran some benchmarks on a 32-bit ARM big.LITTLE platform with four Cortex-A7 cores and four Cortex-A15 cores. Audio was played in the background using aplay while the rest of the system was fairly idle. Linaro's ubuntu-devel distribution was used, and the kernel also had some out-of-tree scheduler changes. The results, averaged across multiple test iterations, showed a 15.7% improvement in energy consumption with power-efficient workqueues enabled. The numbers shown here are in joules.

                     Vanilla kernel +      Vanilla kernel +
                     scheduler patches     scheduler patches +
                                           power-efficient wq
    A15 cluster      0.322866              0.2289042
    A7 cluster       2.619137              2.2514632
    Total            2.942003              2.4803674

Power-efficient workqueues already give better results with the mainline kernel, since the scheduler picks a better target CPU than the default pinning behavior; the results will improve further as the scheduler becomes more energy-aware.

Comments (7 posted)

The persistent-memory arrays we're told we'll all be able to get someday promise high-speed, byte-addressable storage in massive quantities. The Linux kernel community has been working to support this technology fully for a few years now, but there is one problem lacking a proper solution: allowing direct writes to persistent memory that is managed by a filesystem. None of the proposed solutions have yet made it into the mainline, but that hasn't stopped developers from trying; now two new patch sets addressing this issue are under consideration.

Normally, filesystems are in control of all I/O to the underlying storage media; they use that control to ensure that the filesystem structure is consistent at all times. Even when a file on a traditional storage device is mapped into a process's virtual address space, the filesystem manages the writeback of modified pages from the page cache to persistent storage. Directly mapped persistent memory bypasses the filesystem, though, leading to a number of potential problems including inconsistent metadata or data corruption and loss if the filesystem relocates the file being modified. Solving this problem requires getting the filesystem back into the loop just far enough to avoid confusion while keeping the performance enabled by direct access to the storage media.

Proposed solutions have included a special "I know what I'm doing" flag and, more recently, a new system call named daxctl() to freeze the state of a file's metadata so that the data could be safely modified in place. None of them have proved fully satisfactory, though, sending developers back to their keyboards to come up with a new approach.

Synchronous page faults

One new contender is the synchronous page faults patch set from Jan Kara. It follows the lead of some of the previous attempts by ensuring that any needed filesystem metadata writes are completed before a process is allowed to modify directly mapped data. A new flag, MAP_SYNC , is added to the mmap() system call to request the synchronous behavior; that means, in particular:

The guarantee provided by this flag is: While a block is writeably mapped into page tables of this mapping, it is guaranteed to be visible in the file at that offset also after a crash.

In other words, the filesystem will not silently relocate the block, and it will ensure that the file's metadata is in a consistent state so that the blocks in question will be present after a crash. This is done by ensuring that any needed metadata writes have been done before the process is allowed to write pages affected by that metadata.

When a persistent-memory region is mapped using MAP_SYNC , the memory-management code will check to see whether there are metadata writes pending for the affected file. It will not actually flush those writes out, though. Instead, the pages are mapped read-only with a special flag, forcing a page fault when the process first attempts to perform a write to one of those pages. The fault handler will then flush out any dirty metadata synchronously, set the page permissions to allow the write, and return. At that point, the process can write the page safely, since all the necessary metadata changes have already made it to persistent storage.

The result is a relatively simple mechanism that will perform far better than the currently available alternative — manually calling fsync() before each write to persistent memory. The potential downside is that any write operation can now create a flurry of I/O as the filesystem flushes out dirty metadata. That can cause the process to block in what was supposed to be a simple memory write, introducing latency that may be unexpected and unwanted. Fear of that latency has helped to drive the quest for alternatives.

MAP_DIRECT

One such alternative is the MAP_DIRECT patch set from Dan Williams. It can be thought of as the current form of the daxctl() patch mentioned above, though that new system call is no longer a part of the proposal. Instead, we have, once again, a new mmap() flag, but the proposed semantics are rather different. This flag eliminates the potential for write-fault latency by "sealing" the state of the file at the time it is mapped.

When a filesystem sees a map request with MAP_DIRECT , it should ensure that all metadata related to the area being mapped is consistent on the storage media before continuing. Once the mapping has been made, the filesystem must reject any operation that would force a metadata write affecting the portion of the file that has been mapped. Blocks cannot be moved, for example, unless the filesystem can magically perform the move in an atomic manner that does not risk data loss for a concurrent process writing to that block. Operations like truncating the file, breaking the sharing of extents in the file, or allocating blocks will fail. This extends to allocating blocks for the region that has been mapped; the application must thus ensure that all of the relevant blocks are allocated before creating the mapping.

An important aspect of this "sealing" operation is that it is a part of the filesystem's runtime state; it is not stored on the media itself. So, if the system crashes, the file will not be sealed after the reboot. The seal is only there to support a specific mapping and will go away when the mapping itself is taken down. It's also worth noting that the filesystem implementation may choose to only seal the portion of the file that has been mapped, or it may seal the entire file.

An application that uses MAP_DIRECT will want a clear indication from the kernel that the file has indeed been sealed. Unfortunately, mmap() is one of those system calls that does not check for unknown flags; one can pass MAP_DIRECT on any existing kernel and not get an error back. To get around this problem, the patch set adds a new mmap3() variant that does fail on unknown flags. Internally, the patch set adds a mmap_supported_mask field to the file_operations structure so that each low-level implementation can specify which flags it is able to handle. Requiring applications to use a new version of mmap() is not pretty, but there is no other way to solve the problem without an ABI change.

Use of MAP_DIRECT requires the CAP_LINUX_IMMUTABLE capability; without that restriction, it was feared, it might be possible to carry out a denial-of-service attack by sealing a file that some other process needs to be able to change. As a result, this feature is not available to most users, which rather limits its usefulness. In an attempt to improve the situation, the patch set also adds a new fcntl() operation called F_MAP_DIRECT . This operation, which is also subject to the capability check, sets a flag on an open file that causes subsequent mmap() operations to act as if MAP_DIRECT had been specified, but without the capability check. The idea is that a privileged process could open a file and set this flag, then pass the file descriptor to an unprivileged process that does the actual work with that file.

One advantage of MAP_DIRECT is that it has applications beyond just allowing high-performance applications to write directly to storage. The sealing mechanism is close to what the kernel needs anyway for files used for swapping, so some improvements may be possible there. It also makes it possible to set up DMA I/O operations from user-space drivers, a feature that is attractive in the RDMA realm, at least.

Comments on both patch sets have been relatively muted after the most recent posting. Each is probably getting close to a point where it could be considered for inclusion. What has not happened, though, is any sort of discussion on which of the two is the better approach, or whether they should be combined somehow. So, while the community may be getting closer to a solution for direct writes to persistent memory, it will probably be a little while yet before any solution makes it upstream.

Comments (5 posted)

TeX has been the tool of choice for the preparation of papers and documents for mathematicians, physicists, and other authors of technical material for many years. Although it takes some effort to learn how to use this venerable work of free software, its devotees become addicted to its ability to produce publication-quality manuscripts from a plain-text, version-control-friendly format.

Most TeX users use LaTeX, which is a set of commands and macros built on top of TeX that allow automated cross-referencing, indexing, creation of a table of contents, and automatic formatting of many types of documents. TeX, LaTeX, a host of associated utilities, fonts, and related programs are assembled into a large package called TeX Live. It's available through the package managers of many Linux distributions, but to get an up-to-date version, one often needs to download it from its maintainers directly.

The 2017 version of TeX Live was recently released. As usual, this new release contains no big surprises and should create no compatibility issues. All of those who use TeX in their daily work will eventually get around to starting the big download before bedtime to reap the benefits of dozens of incremental improvements. Enthusiastic readers of release notes may encounter one notable tidbit, however: the version of LuaTeX, a key component of TeX Live, is now numbered 1.0.4.

LuaTeX is a project with several components and goals, all of which take TeX in new directions. This project modernizes font handling, using Unicode for input and output, including for math. It allows you to use any OpenType or TrueType font on your system, and to select typefaces, styles, variants, and font features easily and flexibly. LuaTeX embeds the Lua scripting language into TeX, allowing authors a new level of power and control. Previously, authors who needed to bend TeX to do non-standard or unusual typesetting tasks were obliged to program in the TeX language itself, which is an arcane and specialized skill. With LuaTeX, authors can accomplish these tasks by writing scripts in a language with a more familiar syntax.
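A minimal sketch of that font handling, assuming the fontspec package and fonts that ship with TeX Live:

```latex
\documentclass{article}
\usepackage{fontspec}
% Any installed OpenType/TrueType font can be selected by name;
% the TeX Gyre fonts ship with TeX Live.
\setmainfont{TeX Gyre Pagella}
\newfontfamily\displayface{TeX Gyre Heros}[Scale=0.95]
\begin{document}
Body text in Pagella; {\displayface a Heros display face}.
\end{document}
```

Processing this with lualatex produces a PDF with both fonts, with no font-installation step beyond having the fonts on the system.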

As recently as just a few years ago, however, documentation writers were warning against using the still-evolving LuaTeX in critical work. For example, there is a cautionary note in "A guide to LuaLaTeX" [PDF], which is one of the few available introductory documents about the project.

LuaTeX is now out of beta, however, which means that the time for worrying about this has now passed. According to the official roadmap, LuaTeX has become "functionally complete"; most of the functionality and interfaces are considered to be stable. You can now undertake large LuaTeX projects without worrying about your document breaking with the next upgrade. A "large LuaTeX project" in this context means not just a long document that you happen to process using LuaTeX, but one that makes extensive use of the Lua scripting that the project makes possible.

In fact, LuaTeX has been the generally preferred TeX implementation for some time. It has become an inseparable part of ConTeXt, which is a large project that is a popular alternative to LaTeX for publishing all kinds of documents, especially books and pamphlets. LuaTeX passing version 1 means that its official status has caught up to its de facto status as the center of development in the TeX world.

Terminology

As terms in the TeX world can become confusing, this brief aside may be called for. The engine is the command that you type to compile your document. The name of the engine determines, among other things, what format will be used, and what kind of output file TeX will create. There are two choices of output file: the original DVI file and the generally more popular and useful PDF. There are a handful of formats, such as Plain TeX and LaTeX, that are large sets of macros that alter the behavior of the TeX typesetting program that underlies everything, and that expose various settings and commands to the author.

For example, the engine pdftex will use the Plain TeX format to create a PDF; the latex command will use the LaTeX format to create a DVI; and lualatex will use the LaTeX format and create a PDF by running a version of the pdftex program that has been rewritten in C and can interoperate with the Lua scripting language.

Most of the terms with weird mixtures of case refer to formats (LaTeX) or projects such as LuaTeX.

Why LuaTeX?

LuaTeX provides many advantages over the more traditional versions of TeX, and a couple of possible disadvantages. Some, usually older, LaTeX packages are incompatible with LuaTeX; if you depend on one of these, you may want to stick with XeLaTeX (LaTeX running on the Unicode-aware XeTeX engine) or, if necessary, the older pdfLaTeX. You may, however, want to consider either making the jump to ConTeXt, which can duplicate the capabilities of many old LaTeX packages, or trying to reproduce the effects you want with some Lua scripting.

Another disadvantage is that, for some documents, luatex can be slower than the other engines. According to the Introduction of the LuaTeX Reference [230-page PDF], when using Plain TeX, for example, pdftex will almost always be faster than luatex , but complex documents, especially using the recent incarnation of ConTeXt, often run faster with luatex . In any case, the difference is typically about a factor of two or so.

One of the delightful advantages of LuaTeX is its simple and powerful font handling, which I demonstrated in a previous article about recent TeX developments. In addition to the features introduced in that article, I should mention that LuaTeX has highly developed support for directional typesetting, which means it can handle languages that go from right to left or vertically. Another advantage, for some, is easier access to MetaPost, bringing an efficient graphical subsystem into the TeX engine.

LuaTeX has a handful of commands to suppress different kinds of error messages. Some of these don't merely turn off the messages, but cause things to be permitted that were previously forbidden. For example, you can tell TeX that it's OK to have a paragraph break inside an equation, which frees you up to format your math input more flexibly.

There are several improvements in math typesetting related to the possibility of using wide glyphs in equations. The distance between equations and their numbers can now be controlled. There is a new type of leader, whose alignment is based on the largest enclosing box (rather than the smallest). You can set the minimum word length in which hyphenation will be allowed.

Aside from font handling, these are all minor enhancements; there are dozens more listed in the reference manual linked above. The headline feature of LuaTeX is its embedding of the Lua scripting language, and everything that this enables.

Lua

Lua is a deceptively simple, but sophisticated, modern scripting language. Some of Lua's unusual, or unusually nice, features are: a single data structure (the "table"); functions that can return multiple results; proper tail calls; lexical scoping (closures, etc.); built-in coroutines; and "metatables" and metamethods.
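A few of those features in a short Lua sketch:

```lua
-- The table is Lua's single data structure: array, record, or both.
local point = { x = 1, y = 2 }

-- Functions can return multiple results.
local function minmax(a, b)
   if a < b then return a, b else return b, a end
end
local lo, hi = minmax(10, 3)    -- lo is 3, hi is 10

-- Lexical scoping: the returned closure keeps 'count' alive.
local function counter()
   local count = 0
   return function() count = count + 1; return count end
end
local next_id = counter()

-- Coroutines are built in.
local co = coroutine.create(function() coroutine.yield("first") end)
print(coroutine.resume(co))
```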

The features that make Lua so popular as an embedded scripting language, however, are its small footprint (10,000 lines of C and a compiled, static library of 500K) and its design from the ground up as a language for embedding in C and C++ programs. Its fast interpreter and incremental garbage collection make for good performance in both time and space (a separate project, LuaJIT, adds a just-in-time compiler). The tradeoff is its relatively spare standard library, but this is supplemented by a healthy ecosystem of official and contributed packages.

Lua is widely used, for example, by game developers, who can program the core game behavior in C while defining the gameplay logic in Lua. You can get the latest version of the language up and running on any system with an ANSI C compiler; there are no extra dependencies. On many distributions, a recent version is also available through the package manager, and binaries are provided for a variety of operating systems. Lua has an interactive mode that you can start by typing lua in the terminal. This helps in learning the language through experimentation, but, unlike Python and Lisp, typing an expression does not print its result; you need to call the print function.

Fortunately, the application of Lua within TeX documents rarely requires anything beyond basic knowledge of the language. Those who want a systematic introduction may want to study the book Programming in Lua; all of the editions of the book, covering recent versions of the language, can be found here, including an early edition available free of charge.

Lua in LaTeX

By simply including the line \usepackage{luacode} in your LaTeX document's preamble, and processing it with the lualatex command, you can mix Lua in with your LaTeX. There are two major ways of incorporating Lua into your document; the first is to insert the results of a Lua calculation into the TeX token stream, by using tex.print (or tex.sprint , which inserts the output inline, avoiding a line break) instead of the normal Lua print function. This article shows how to do that by using Lua to compute a numerical table within a LaTeX document that formats the table, and another one that graphs it.
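As a minimal sketch of this first approach, the document below (processed with lualatex) computes a value in Lua and hands it back to TeX:

```latex
\documentclass{article}
\usepackage{luacode}
\begin{document}
% \luadirect runs its argument as Lua; tex.sprint() pushes the
% result back into the token stream with no trailing line break.
The square root of two is roughly
\luadirect{tex.sprint(string.format("%.6f", math.sqrt(2)))}.
\end{document}
```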

Below is another example that uses Lua to calculate a fancy paragraph shape and construct the TeX command for defining it. The \parshape command in LaTeX or Plain TeX lets you define any shape for the current paragraph, but it's verbose and cumbersome, because you need to type in a list of line lengths and indents, following the number of lines that you want it to apply to. Wouldn't it be nice to be able to simply say that you would like a paragraph to have the shape of a particular mathematical expression? Here is a little LaTeX document that typesets the beginning of a great American novel in the wavy shape of the cos^2 function:

    \documentclass{article}
    \usepackage{luacode}
    \usepackage{fontspec}
    \begin{document}
    \setlength{\parindent}{0pt}
    \pagestyle{empty}
    \begin{luacode*}
    function mpshape()
       shape = ""
       n = 23
       bl = 10
       for i = 1, n do
          indent = string.format("%4f", 2*math.cos(i*2*math.pi/n)^2)
          length = string.format("%4f", bl - 2*indent)
          shape = shape.." "..indent.."cm "..length.."cm"
       end
       tex.sprint("\\parshape= "..n.." "..shape)
    end
    \end{luacode*}
    \luadirect{mpshape()}
    Call me Ishmael. Some years ago—never mind how long
    precisely—having [text deleted]
    \end{document}

Here is the output:

A few things here need a little explanation. We need to include the fontspec package to enable Unicode input: this should be a standard part of your preamble if you are using LuaTeX, in which case you probably have a handful of additional fontspec commands to set up your fonts (the em-dashes in the input are the only Unicode here). The luacode* environment passes its contents directly to Lua, and frees you from worrying about escaping special TeX characters such as backslashes. This is explained in the documentation for the luacode LaTeX package. Everything between the lines \begin{luacode*} and \end{luacode*} is not TeX, but pure Lua. The Lua code here defines the function mpshape, which takes no arguments. The function uses a numeric ("arithmetic for") loop with a familiar syntax. Note that all blocks are terminated with end, and that white space is insignificant (except in comments). The ".." operator, used several times, is string concatenation; Lua converts numbers to strings as needed.

We've used string formatting, which works in Lua just as in C and many other languages. The dimensions are in cm, and bl is the unaltered line length. This document follows a common pattern, which is to define some functions in a luacode* environment, and invoke them in the document as needed. You can also put the function definitions in an external file.

When you want to actually execute some Lua code, call it as the argument of a luadirect command. Here you must use caution if you are inserting Lua code directly rather than merely calling a function, as we do here: the one place where line breaks are significant in Lua is in comments, which begin with "--" and extend to the end of the line. But TeX changes line breaks to spaces, which means that a comment in your code will comment out everything that comes after it. Here the luadirect command runs our mpshape function, which in turn inserts the parshape command into the TeX stream. I've kept things simple for illustration, but you can easily see how this could be generalized. Since Lua can compile strings into code at run time with its load() function, you could pass a mathematical expression into mpshape as an argument, along with values for bl and n (the number of lines to process, here hard-coded as 23, which I found by trial and error).
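For illustration, here is a sketch (with a hypothetical helper name, indent_for) of how a generalized version could accept the expression as a string, using Lua's load() to compile it at run time:

```lua
-- load() turns a string into a callable chunk; here the string is
-- a formula in 'i' and 'n', as a generalized mpshape might accept.
local expr = "return 2 * math.cos(i * 2 * math.pi / n)^2"

local function indent_for(i, n)
   -- Lua 5.2+: pass the values in through an environment table.
   local env = { i = i, n = n, math = math }
   local f = load(expr, "expr", "t", env)
   return f()
end

print(indent_for(1, 23))
```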

Altering typesetting with callbacks

There is a whole different level of Lua-TeX integration afforded by callbacks from TeX's processing stages. TeX processes documents in a series of steps, such as reading input, hyphenating, inserting glue, ligaturing, breaking lines, and many more (see the LuaTeX reference manual linked to above). At each of these stages, TeX is dealing with a linked list of nodes, which are elements such as glyphs, kerns, lines, etc., with associated collections of properties. In order to modify one of TeX's processing stages, you write a Lua function that manipulates the relevant node list, and register this function as a callback attached to the stage. When the TeX engine reaches the stage in question, it will call the registered function, and continue as normal after it returns, but with a modified node list.

Here is an example of a LaTeX document that uses this technique to gradually decrease the grey value of each line in a paragraph, creating a fade-out effect. It would be impossible to create this effect by, for example, inserting color commands in the input file, because one does not know where TeX will break lines until the document is actually run through it, and the line breaks depend on the line width. You need to insert the color commands as part of the line-breaking process, and this is precisely what LuaTeX callbacks enable.

Here is a complete document that you can process with the lualatex command to get the output in the figure. It will make more sense than it does at first glance after the explanation that follows.

    \documentclass{article}
    \usepackage{luacode}
    \usepackage{fontspec}
    \usepackage[total={10cm,25cm},centering]{geometry}
    \begin{document}
    \setlength{\parindent}{0pt}
    \pagestyle{empty}
    \begin{luacode*}
    function fadelines(head)
       WHAT = node.id("whatsit")
       COL = node.subtype("pdf_colorstack")
       colorize = node.new(WHAT, COL)
       gvalue = 0
       for line in node.traverse_id(0, head) do
          colorize.data = gvalue.." g"
          node.insert_before(head, line, node.copy(colorize))
          gvalue = math.min(gvalue + 0.06, 1)
       end
       return head
    end
    luatexbase.add_to_callback("post_linebreak_filter", fadelines, "fadelines")
    \end{luacode*}
    Call me Ishmael. [text deleted]
    \end{document}

Here is the result:

The luacode section starts with another function definition, this time taking an argument that is the head of a list of TeX nodes; which list of nodes it gets passed will be determined later by how the function is registered as a callback. The first few lines of the function define colorize to be a "whatsit" node, which is a node type used for such things as PDF output. The for block loops over nodes of type 0, which are "hlist" nodes, beginning at head . For each line node, it builds a string of the form "<gvalue> g", where the number gvalue starts at 0 (full black) and increases by 0.06 at each iteration. The string is used as a new value for the data field of the colorize node, which is then inserted before the current node. The node.insert_before function takes care of keeping the link structure correct as the new node is inserted into the linked list. When TeX constructs the output PDF, the colorize nodes become constructs that set the color.

After the function definition, the luatexbase.add_to_callback call registers the function as a callback attached to the post-linebreak phase. Our function fadelines will be called immediately after TeX finishes breaking the text into lines. The node list traversed by the function will be the final, typeset list of lines. The string in the final argument can be any label, and is used for a subsequent unregister command if desired.

If all this seems a bit arcane, that's because it is. There is little in the way of gentle tutorial material to teach one how to do this kind of work. But, after studying some example code [52-page amusing PDF], carefully reading a few sections of the reference manual, and some experimentation, I was pleasantly surprised at how quickly I could go from an idea to a working implementation. Although anything that you can do with these techniques you could also do, in theory, by programming purely in TeX, using Lua and the interfaces defined in the LuaTeX project is far simpler. If you've had the pleasure of trying to read and understand a LaTeX style file, for example, the code here will seem far less arcane by comparison.

A simple modification of our fadelines function will change the color of the output text letter-by-letter. Here is the new function and its output:

    function fadelines(head)
       GLYPH = node.id("glyph")
       WHAT = node.id("whatsit")
       COL = node.subtype("pdf_colorstack")
       colorize = node.new(WHAT, COL)
       cvalue = 0
       for glyph in node.traverse_id(GLYPH, head) do
          colorize.data = cvalue.." "..1 - cvalue.." .5".." rg"
          node.insert_before(head, glyph, node.copy(colorize))
          cvalue = math.min(cvalue + .0008, 1)
       end
       return head
    end
    luatexbase.add_to_callback("pre_linebreak_filter", fadelines, "fadelines")

At the beginning of the function the GLYPH variable is set to the node representing a single printed character. The constructed string used for the data field of the coloring node now has the form " R G B rg ", and is made to change gradually as the loop traverses the GLYPH nodes. Finally, we register the function to the pre_linebreak_filter callback, to get access to the list of glyph nodes.

Parting words

Since any LuaTeX document may contain any code whatsoever in the form of embedded Lua scripts, you must use caution in processing documents from untrusted sources. "Normal" TeX has traditionally refused to run operating system commands (to run external programs, for example) unless you specifically enabled them, but that safety check is absent from LuaTeX. You can, however, invoke lualatex with the --safer flag, which disables several features that could cause mischief, including spawning processes and creating files. See pp. 44-45 of the above-linked reference manual for details.

If you are undertaking a large project using LaTeX, such as writing a textbook, or a series of smaller projects where you may want to get TeX to perform typesetting tricks that are not covered by a LaTeX package, I believe it is well worth it to become acquainted with the techniques described here. Although TeX is a Turing-complete language, actually writing TeX code to do anything non-trivial is dark magic. In contrast, after very little study you can do things with LuaTeX that would be practically impossible without it. Lua is pleasant to program in. The ability to insert the results of Lua computations into the TeX document is already immensely useful; the next level, direct access to TeX internals, gives you powers that used to be the exclusive possession of the most advanced TeX wizards.

Comments (21 posted)