Welcome to the first LWN Weekly Edition for 2015. We hope that the holiday season was good to all of you, and that you are rested and ready for another year of free-software development. It is a longstanding tradition to start off the year with a set of ill-informed predictions, so, without further ado, here's what our notoriously unreliable crystal ball has to offer for this year.

We will hear a lot about the "Internet of things" of course. For larger "things" like cars and major appliances, Linux is the obvious system to use. For tiny things with limited resources, the picture is not so clear. If the work to shrink the Linux kernel is not sufficiently successful in 2015, we may see the emergence of a disruptive competitor in that space. We may feel that no other kernel can catch up to Linux in terms of features, hardware support, and development community size, but we could be surprised if we fail to serve an important segment of the industry.

We'll hear a lot about "the cloud" too, and we'll be awfully tired of it by the end of the year. Some of the hype over projects like OpenStack will fade as the project deals with its growing pains. With some luck, we'll see more attention to projects that allow users to own and run their own clouds rather than depending on one of the large providers — but your editor has often been overly optimistic about such things.

While we're being optimistic: the systemd wars will wind down as users realize that their systems still work and that Linux as a whole has not been taken over by some sort of alien menace. There will still be fights — we, as a community, do seem to like fighting about such things — but most of us will increasingly choose to simply ignore them.

There is a wider issue here, though: we are breaking new ground in systems design, and that will necessarily involve doing things differently than they have been done in the past. There will certainly be differences of opinion on the directions our systems should take; if there aren't, we are doing something wrong. There is a whole crowd of energetic developers out there looking to do interesting things with the free software resources we have created. Not all of their ideas will be good ones, but it is going to be fun to watch what they come up with.

There will be more Heartbleed-level security incidents in 2015. There are a lot of dark, unmaintained corners in our software ecosystem, many of which undoubtedly contain ancient holes that, if we are lucky, nobody has yet discovered. But they will be discovered, and we'll not be getting off the urgent-update treadmill this year.

Investments in security will grow considerably as a consequence of 2014's high-profile vulnerabilities, high-profile intrusions at major companies, and ongoing spying revelations. How much good that investment will do remains to be seen; much will be swallowed up by expensive security companies that have little interest in doing the hard work required to actually make our systems more secure.

Investments in other important development areas will grow more slowly despite the great need in many areas. We all depend on code which is minimally maintained, if at all, and there are many unsolved problems out there that nobody seems willing to pick up. The Linux Foundation's Critical Infrastructure Initiative is a good start, but it cannot come close to addressing the whole problem.

Speaking of important development areas, serious progress will be made on the year-2038 problem in 2015. The pace picked up in 2014, but developers worked mostly on the easy part of the problem — internal kernel interfaces. But a real solution will involve user-space changes, and the sooner those are made, the better. The relevant developers understand the need; by the end of this year we'll know at least what the shape of the solution will be.

Some long-awaited projects will gain some traction this year. The worst Btrfs problems are being addressed thanks to stress testing at Facebook and real-world deployment in distributions like openSUSE. Wayland is reaching a point of usability for brave early adopters. Even Python 3, which has been ready for a while, will see increasing use. We'll have programs like X.org and Python 2 around for a long time, but the world does eventually move on.

There has been some talk of a decline in the number of active Linux distributions. If that is indeed the case, any decline in the number of distributions will be short-lived. We may not see a whole lot more general-purpose desktop or server distributions; that ground has been pretty well explored by now, and, with the possible exception of the systemd-avoidance crowd, there does not appear to be a whole lot to be done in that area. But we will see more and more distributions that are specialized for particular applications, be it network-attached storage, routing, or driving small gadgets. The flexibility of Linux in this area is one of its greatest strengths.

Civility within our community will continue to be a hot-button issue in 2015. Undoubtedly somebody will say something offensive and set off a firestorm somewhere. But, perhaps, we will see wider recognition of the fact that the situation has improved considerably over the years. With luck, we'll be able to have a (civil!) conversation on how to improve the environment we live in without painting the community as a whole in an overly bad light. We should acknowledge and address our failures, but we should recognize our successes as well.

Finally, an easy prediction is that, on January 22, LWN will finish its 17th year of publication. We could never have predicted that we would be doing this for so long, but it has been a great ride and we have no intention of slowing down anytime soon. 2015 will certainly be an interesting year for those of us working in the free software community, with the usual array of ups, downs, and surprises. We're looking forward to being a part of it with all of you.


The Dark Mail Alliance has published the first description of the architecture that enables its secure-and-private alternative to the existing Internet email system. Called the Dark Internet Mail Environment (DIME), the system involves a new email message format and new protocols for email exchange and identity authentication. Nevertheless, DIME also makes an effort to be backward-compatible with existing email deployments. DIME includes several interesting ideas, but its main selling point remains security: it not only offers end-to-end encryption, but it also encrypts much of the message metadata that other systems leave in cleartext, and it resists attacks that target the servers between the sender and the recipient.

The Alliance

Dark Mail was started in 2013, led by Ladar Levison of the privacy-centric email service Lavabit and by PGP creator Phil Zimmermann of Silent Circle. Both of those companies abruptly shut down their email offerings in August 2013 in reaction to a US government request for access to Edward Snowden's Lavabit account—including a copy of the Lavabit SSL keys, which would have enabled the government to decrypt all of the traffic between Lavabit and its customers. Subsequently, Levison and Zimmermann announced that they would be developing an "email 3.0" system through Dark Mail, with the goal of preventing just the sort of attacks that occurred in the Snowden case.

One key problem that the Snowden incident revealed was that, even if two users employ strong encryption on their email messages (such as with OpenPGP or S/MIME), the metadata in those messages remains unencrypted. And that metadata can contain vital information: the sender and receiver addresses, the subject line, various mail headers, and even the trail of mail servers that relayed the message from sender to destination. Changing that would necessitate a new email message format, new protocols for email transfer and retrieval, and some sort of new infrastructure to let users authenticate each other. A new authentication framework is needed to avoid revealing key owners' email addresses, as currently happens with public PGP keyservers—and to avoid the well-documented vulnerabilities of the certificate authority (CA) system used for SSL/TLS.

DIME is designed to be that replacement email system. It describes a message format that encrypts every part of the message separately, using separate encryption keys for the different parts. Thus, mail transfer agents (MTAs) along the way can decrypt the portions of the message they need to deliver the message—but nothing else—and mail delivery agents (MDAs) can deliver messages to the correct user's inbox without learning anything about their content or about the sender. DIME also describes a transport protocol for sending such encrypted messages—one in which the multiple key retrieval and authentication steps are handled automatically—and a framework for how the authentication tokens required by the system should be published and formatted.

The DIME package

The DIME system is detailed in a 108-page PDF specification—although it should be noted that several sections in the specification are empty, either blank or labeled "TBD." The most significant of these is DIME's IMAP replacement, DMAP, about which the document says: "This protocol specification will not be released as part of the initial publication of this document," followed by an assurance that a later release with more details will follow.

There is also source code for a suite of DIME-related libraries available through the Lavabit GitHub account. So far, none of those GitHub repositories indicates what software license the code is under; Mozilla's Hubert Figuiere filed an issue requesting a license, but it does not yet seem to have been addressed. At this point, however, digesting and understanding the architecture and formats described in the DIME specification is probably the more important concern.

A bird's-eye view of the system starts with the message format. A DIME message object contains three separate sections: the Next-Hop section (which is unencrypted and holds the routing information needed for the current transport method), the Envelope section (which includes two "chunks" for the origin and destination information, each encrypted separately), and the Content section (which contains the email message headers and body, with each header and each body part encrypted separately).

Within the Envelope and Content sections, it is critical that each chunk is encrypted separately, and with a variety of keys. That allows an application to decrypt only those parts of a message that are of immediate importance (a mobile client, for example, might decrypt just the Subject and Sender of new messages for a summary screen, rather than downloading and decrypting everything). It also allows the software to control which parties can decrypt which sections, by using different keys for each.

Encrypting things like attachments and headers separately brings a clear security and privacy improvement—consider, for example, that mailing-list thread information and return paths could allow an attacker to collect a significant amount of information about a conversation even without seeing the message body. Still, it may come as a surprise to some that DIME also encrypts the sender and recipient email addresses and names. The names of the sender and recipient are optional, of course, but encrypting the addresses might seem to make mail routing and delivery impossible.

Authenticating identities

DIME's solution to this problem is to adopt a domain-based authentication scheme that the origin and destination mail servers can use to validate each other's identities. Each mail server is also responsible for authenticating the user on its end, but the user-to-server authentication is logically separate from the server-to-server authentication.

In other words, the scheme looks like this:

1. The sender authenticates with the origin mail server, and sends it the encrypted message.
2. The origin mail server can see only the fully qualified domain name of the destination server (not the recipient's account name), so it authenticates the destination server's identity (as described below) and forwards the message to it.
3. The destination server can see only the fully qualified domain name of the origin server (not the sender's account name), so it can also authenticate the origin server's identity. After it accepts the incoming message, the destination server decrypts the recipient's username and puts the message in the appropriate mailbox.
4. The recipient authenticates with the destination mail server, downloads the new message, then decrypts and reads the content.

For each step (sender-to-origin, origin-to-destination, destination-to-recipient), the necessary information to complete the next step is encrypted separately, so that only the need-to-know parties for that step have access to the information. The various fields in the message are each encrypted with an ephemeral session key, and a separate copy of that session key is included in the message for each party trusted to access that field—with each copy encrypted using a known public key for the appropriate party.

So there are three copies of the session key that protects the recipient's email address: one encrypted with the sending user's public key, one encrypted with the destination server's public key, and one encrypted with the recipient user's public key. There are also three copies of the (different) session key that protects the sender's address: one for the sender, one for the recipient, and one for the origin server. All of the keys in question are intended to be generated automatically: users may naturally wish to have control over their personal public/private key pairs (which will require software support), but the session-key generation and retrieval of remote keys is designed to be handled without explicitly involving the user.
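That key-wrapping arrangement can be sketched structurally. The sketch below is not real cryptography: wrap() is a stand-in (XOR against a shared key) for encryption to a party's public key, and all of the names and key sizes are invented for illustration. What it does show is how one ephemeral session key per field ends up wrapped once per authorized party, so that the origin server never receives a key for the recipient's address.

```python
import os

def wrap(data: bytes, key: bytes) -> bytes:
    # Stand-in for real encryption: XOR against the key. In DIME, each
    # session-key copy would be encrypted to one party's public key.
    return bytes(a ^ b for a, b in zip(data, key))

unwrap = wrap  # XOR is its own inverse; real crypto would use a private key

# Invented parties with stand-in 32-byte keys.
parties = {name: os.urandom(32) for name in
           ("sender", "recipient", "origin_server", "destination_server")}

def protect_field(value: bytes, readers):
    """Encrypt one message field under a fresh ephemeral session key,
    then wrap one copy of that key for each authorized party."""
    session_key = os.urandom(len(value))
    return {
        "ciphertext": wrap(value, session_key),
        "wrapped_keys": {p: wrap(session_key, parties[p]) for p in readers},
    }

def read_field(field, party: str) -> bytes:
    session_key = unwrap(field["wrapped_keys"][party], parties[party])
    return unwrap(field["ciphertext"], session_key)

# Per the DIME design, the recipient's address is readable by the sender,
# the destination server, and the recipient -- but not the origin server.
address = protect_field(b"recipient@example.com",
                        ["sender", "destination_server", "recipient"])
print(read_field(address, "destination_server"))
print("origin_server" in address["wrapped_keys"])
```

The origin server simply finds no wrapped copy of the session key addressed to it, so the field is opaque to it by construction rather than by policy.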

The last piece in the puzzle is the actual transport method used to send the message from the origin server to the destination server. Here, DIME allows for several options: TLS, DIME's own SMTP replacement DMTP, or even connecting over a Tor circuit.

Left up to the implementer are details such as exactly how the users authenticate to their servers. There is a "paranoid" mode in which the servers have no access to the user's key material and a full key-exchange process is required for every connection, as well as a "cautious" mode in which the server can store encrypted copies of the user's keys to simplify the process somewhat, and a "trustful" mode in which the server has full access to the user's secret keys.

The server-to-server authentication, however, is more precisely specified. There are two authentication methods, both of which ought to be used to protect against a well-funded adversary. The first is a dedicated keyserver system akin to the OpenPGP keyserver network. The other is based on DNS: each server publishes its DIME public key in a new DNS resource record type, which (for security reasons) ought to be looked up using DNSSEC. Thus, each server can look up the public key of its peer in multiple ways, and verify that it generates an encrypted session key matching the one included in the message before agreeing to the message exchange.

So far, we have been using the term "public key" to describe the DIME keys published for both mail servers and users, but DIME's actual identity system is a bit more complicated than that. The credentials used are called signets, and they include not just a public key, but also a series of signatures and a set of fields describing the DIME options, ciphers, and other settings supported by that user or server. Since DIME's functionality places a great deal of trust in domain-wide identity, each user signet has to be signed by the key for the controlling organization.

What next

DIME is, by any measure, a complex system. Interested users are encouraged to read the full specification, which (naturally) goes into considerably more detail than is feasible here. Looking at DIME's constituent parts separately, though, makes the overall design easier to follow. The relevant fields of each message are encrypted separately, and a copy of the decryption key for each field is transmitted for each party that must decrypt the field for processing. The per-party keys are published in a federated manner: each mail domain is responsible for maintaining its own DIME DNS records and keyserver, which places ultimate control of the authentication scheme in the hands of the mail-server administrators, rather than in a CA that can be compromised.

It is also noteworthy that the project seems to be taking pains to consider how email providers and users might transition to DIME—even if it is a wild success, there will necessarily be a need for DIME users to interoperate with traditional email for many years still to come. The new DNS records and the signet data format include information that can be used to fall back to the most secure alternative available, and several pieces of the overall architecture are optional. Webmail providers, for example, could employ either the "cautious" or "trustful" user-authentication models—the users would have to decide if they indeed trust the provider enough to use the service.

The DIME specification also examines a number of possible attack scenarios against the new system, and shows how DIME is designed to cope with such attacks. Public scrutiny will, of course, be required before most potential adopters consider implementing the architecture. For now, even Lavabit and Silent Circle have not yet announced any intention to deploy DIME-based mail services. When they do so, no doubt the offerings will attract a great many users interested in testing the system.

The other major dimension to any widespread roll-out scenario is acceptance of the DIME architecture by some appropriate standards body. Levison told Ars Technica that he intends to pursue eventual IETF approval via a set of RFCs. That will be a slow process, though, starting when he begins "circulating the project’s specifications document among members of the IETF at the group’s meeting this March."

That said, there is clearly considerable interest within the technology community for the additional protections that DIME offers beyond existing email encryption systems. The government surveillance revealed in the Snowden case alarmed many a software developer (and regular citizen), but the law-enforcement chase that followed it—particularly where it affected Lavabit and Silent Circle—was, in many ways, an even bigger call to arms for privacy advocates.


Gnuplot is a program for creating plots, charts, and graphs that runs on Linux as well as on a wide variety of free and proprietary operating systems. The purpose of a plot, in general, is to help to understand data or functional relationships by representing them visually. Some plotting programs, including gnuplot, may perform calculations and massage data, which can also be convenient.

Some data-plotting tools are complete solutions, standalone programs that can be controlled through a command line, a GUI, or both. Others exist as subsystems of various tools, or as libraries available for a specific programming language. This article will introduce a prominent example of the first type.

Gnuplot is one of the earliest open-source programs in wide use. It's free enough to be packaged with Debian, for example, but has an idiosyncratic license, with unusual restrictions on how modifications to the source code may be distributed. The name is not derived from the GNU project, with which it has no particular relationship, but came about when the original authors, who had decided on the name "newplot", discovered that this name was already in use.

You may already be using gnuplot without knowing it. The plotting facilities of Maxima, Octave, gretl, the Emacs graphing calculator, and statist, for example, all use gnuplot.

Most of gnuplot is written in C and is quite fast and memory-efficient. Its output is highly customizable, and can be seen in a multitude of scientific and technical publications. It's also a popular choice with system administrators who want to generate graphs of server performance, as it can be run from a script on a remote machine and forward its graphs over X11, without having to transfer the usually voluminous data sets. The same arrangement makes gnuplot useful for monitoring the progress of simulations running on remote machines or clusters.

Gnuplot has an interactive command-line prompt, can run script files stored on disk, can be controlled through a socket connection from any language, and has interfaces in everything from Fortran to Clojure. There are also several GUI interfaces for gnuplot, including an Emacs mode, that are not too widely used, since much of gnuplot's power arises from its scriptability.

Installation

Gnuplot is actively developed, with desirable new features added regularly. If you have Octave or Maxima installed, then you already have gnuplot somewhere, although you might not have a recent version. Binaries are probably available from your distribution's package management system, but they are likely to lag approximately one major version behind the shiniest.

The solution is to follow the Download link from gnuplot headquarters to get the source tarball of the latest stable release (or a pre-release version if you can't live without some feature in development). A simple ./configure and make will get you a working gnuplot, but you probably want to check for some dependencies first.

Having the right packages installed before compiling gnuplot will ensure that the resulting binary supports the "terminals" that you want to use. In gnuplot land, a terminal is the form taken by the output: either a file on disk or a (possibly interactive) display on the screen. Gnuplot is famous for the long list of output formats that it supports. You can create graphs using ASCII art on the console, in a canvas on a web page, in various ways for LaTeX and ConTeXt, as a rotatable, zoomable object in an X window, for Tektronix terminals, for pen plotters, and much else, including Postscript, EPS, PNG, SVG, and PDF.

Support for most of this will happen without any special action on your part. But you will want to make sure that you have compiled in the highest-quality, anti-aliased graphics formats, using the Cairo libraries; this makes a noticeable difference in the quality of the results. You will need to have the development libraries for Cairo and Pango installed. On my Ubuntu laptop, installing the libcairo2-dev and libpango1.0-dev packages is sufficient for the latest stable gnuplot version (v. 4.6.6). Pick up libwxgtk2.8-dev while you're at it: it will add support for a wxWidgets interactive terminal that is a higher-quality alternative to the venerable X11 display. Finally, if you envision using gnuplot with LaTeX, you might want the Lua development package, which enables gnuplot's tikz terminal.

Using gnuplot

Gnuplot comes with extensive help. For extra information about any of the commands used below, try typing "help command" at the gnuplot interactive prompt. For more, try the official documentation [PDF], the many examples on the web, or the two books about gnuplot: one by Philipp K. Janert and one by me. The command stanzas here can be entered as shown at the gnuplot prompt, or saved in a file and executed with "gnuplot file".

Here is how to plot a pair of curves:

set title 'Bessel Functions of the First and Second Kinds'
set samp 1000
set xrange [-.05:20]
set y2tics nomirror
set ytics nomirror
set ylabel 'Y0'
set y2label 'J0'
set grid
plot besy0(x) axes x1y1 lw 2 title 'Y0', besj0(x) axes x1y2 lw 2 title 'J0'

The set ytics and set y2tics commands create independent sets of tics and labels on the two vertical axes. The final line illustrates the usual form of gnuplot's 2D plot command, along with some of the program's support for special functions. The axes parameters tell gnuplot which axis to associate with each curve, lw is an abbreviation for "linewidth" (gnuplot's default is pretty thin), and each curve has an individual title assigned, which is used in the automatically generated legend. The sequence of colors used to distinguish the curves is chosen automatically, but can, of course, be specified manually as well.

Gnuplot also excels at all kinds of 3D plots. Here is a surface plot with contours projected on the x-y plane. There is a vector field embedded in the surface as well.

set samp 200
set iso 100
set xrange [-4:4]
set yrange [-4:4]
set hidd front
set view 45, 75
set ztics .5
set key off
set contour base
set style arrow 1 filled lw 3 lc 'black'
f(x,y) = x**2+y**2 < 2.0 ? x**2+y**2 > 0.5 ? besj0(x**2+y**2) : NaN : NaN
splot besj0(x**2+y**2), '++' using 1:2:(f($1,$2)):\
    (-.5*sin(atan2($2,$1))):(.5*cos(atan2($2,$1))):(0)\
    every 4:2 w vec as 1

The set hidd front command has the effect of making the surface opaque to itself but transparent to the other elements in the plot. The set style command is an example of gnuplot's commands for defining detailed styles for lines, arrows, and anything else that can be made into a plot element. After this command is entered, arrowstyle 1 (or as 1) can be referred to wherever we want a black arrow with a filled arrowhead.

This script defines a function, f(x,y), using gnuplot's ternary notation (with an embedded ternary form to implement two conditions) in concert with NaNs, to skip a range of coordinates when plotting. The function is used on the following line to plot the vector field over only part of the surface.

Two additional details may be worth noting in this example. First, in gnuplot, NaN (for "not a number") is a special value that you can use in conditional statements where you want to disable plotting, as we did here. You can also use "1/0" and some other undefined values, but using NaN makes the code easier to understand. Second, gnuplot's ternary notation is borrowed from C. In the statement

A ? B : C

B will be executed if A is true, otherwise C will be executed. In order to have two conditions, as we have here, B needs to be replaced by another ternary statement.

The splot command is the 3D version of plot. The part before the comma plots our Bessel function again, this time as a surface depending on x and y. The rest of it plots the vector field of a circular flow as an array of arrows originating on the surface. Vector plotting uses gnuplot's data-graphing syntax, which refers to columns of data ($1 and $2 instead of x and y). There are six components per vector: three spatial coordinates for the arrow's origin and three for its extent. Finally, the every clause skips some grid points to avoid crowding, and we invoke our defined arrow style at the end.

LaTeX support

Gnuplot can integrate with the LaTeX document processing system in several ways. Most of these allow gnuplot to calculate and draw the graphic elements while handing off the typesetting of any text within the plot (including, of course, mathematical expressions) to LaTeX. This is desirable because, first, TeX's typesetting algorithms produce superior results, and, second, the labels that are typeset as part of the graph will harmonize with the text of the paper in which it is embedded. The results look like the figure here, which is a brief excerpt from an imaginary math textbook.

Notice that the fonts used in the figure labels and the text in the paragraph are the same — everything is typeset by LaTeX (even the numbers on the axes).

There is a two-step procedure to produce this result. First, we create the figure in gnuplot, using the cairolatex terminal:

set term cairolatex pdf
set out 'fig3.tex'
set samp 1000
set xrange [-4:4]
set key off
set label 1 '\huge$\frac{1}{\sqrt{2\pi}\sigma}\,e^{-\frac{x^2}{2\sigma^2}}$' at -3.5,.34
set label 2 '\Large$\sigma = 1$' at 0.95,.3
set label 3 '\Large$\sigma = 2$' at 2.7,.1
plot for [s=1:2] exp(-x**2/(2*s**2))/(s*sqrt(2*pi)) lw 3
set out

We've used LaTeX syntax for the labels. Running this through gnuplot creates a file called fig3.tex, which we include in the LaTeX document, listed in the Appendix.

The final step is to process the document with pdflatex. This is just one of several workflows for integrating gnuplot with LaTeX. If you use tikz to draw diagrams in your LaTeX documents, for example, you can extend it with calls to gnuplot from within the tikz commands.

Gnuplot and LaTeX share a family resemblance. They are both early open-source programs that demand a certain amount of effort on the part of the user to achieve the desired results, but that repay that effort handsomely. They're both popular with scientists and other authors of technical publications. Both programs are unusually extensively documented by both their creators and a cadre of third parties. And both systems, originating in an era of more anemic hardware, do a great deal with a modest amount of machine memory. Gnuplot has a good reputation for the ability to plot large data files that cause most other plotting programs to crash or exhaust the available RAM.

Analysis

Gnuplot can do more than just plot data and functions. It can perform several types of data analysis and smoothing — nothing like a specialized statistics platform, but enough to fit functions or plot a smoothed curve through noisy data. To illustrate, we first need to create some noisy data. The Appendix contains a little Python program that will write the coordinates of a Gaussian curve to a file, called rn.dat , with some pseudorandom noise added to the ordinates.
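The Appendix itself is not reproduced in this excerpt, but a generator along those lines might look like the following sketch. The rn.dat file name comes from the text; the sample count, x range, noise amplitude, and function parameters are assumptions made for illustration:

```python
import math
import random

def write_noisy_gaussian(path="rn.dat", n=100, a=1.0, b=0.5,
                         noise=0.05, seed=42):
    """Write n whitespace-separated x/y pairs of the curve a*exp(-b*x**2)
    over [-3, 3], with uniform pseudorandom noise added to the ordinates."""
    rng = random.Random(seed)
    with open(path, "w") as f:
        for i in range(n):
            x = -3.0 + 6.0 * i / (n - 1)
            y = a * math.exp(-b * x * x) + rng.uniform(-noise, noise)
            f.write(f"{x} {y}\n")

write_noisy_gaussian()
```

The two-column whitespace-separated format is one that gnuplot's plot and fit commands read without any extra options.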

Suppose we are presented with this data and we want to fit a function to it. Since it looks bell-shaped to us, we'll attempt to fit a Gaussian. That kind of curve has two parameters, its amplitude and its width, or standard deviation. We could write a program to search the parameter space of these two numbers to optimize the fit of the curve to the data, or we could ask gnuplot to do it for us. Gnuplot's built-in fitting routine is invoked like this:

fit a*exp(-b*x**2) 'rn.dat' via a,b

After you type that command at gnuplot's interactive prompt, gnuplot will return its best guess for the free parameters a and b, as well as its confidence in those estimates. It also remembers the estimated values, so we can plot the fitted function on top of the data:

plot 'rn.dat' pointtype 7, a*exp(-b*x**2) lw 5 lc 'black'

gets us this plot:

The pointtype specifier selects the style of marker used in the scatterplot of the data. There is a different list for every terminal type, which you can see by typing test at the gnuplot prompt. We've selected a thick line width (lw 5) and a black line color (lc 'black').
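Under the hood, gnuplot's fit command implements the nonlinear least-squares Marquardt-Levenberg algorithm. As a rough illustration of what such a fit is doing, here is a crude pure-Python grid search for the same a*exp(-b*x**2) model; the noisy data is synthesized inline rather than read from rn.dat, and the noise level and search ranges are invented for this sketch:

```python
import math
import random

# Synthetic noisy samples of a Gaussian with true a=1.0, b=0.5; in the
# article's workflow these points would come from the rn.dat file.
rng = random.Random(0)
data = []
for i in range(100):
    x = -3.0 + 6.0 * i / 99
    data.append((x, math.exp(-0.5 * x * x) + rng.uniform(-0.05, 0.05)))

def sse(a, b):
    """Sum of squared errors of the model a*exp(-b*x**2) over the data."""
    return sum((y - a * math.exp(-b * x * x)) ** 2 for x, y in data)

# Crude grid search over the two parameters, 0.01 resolution; gnuplot's
# fit uses the far more efficient Marquardt-Levenberg algorithm instead.
best = min(((a / 100.0, b / 100.0)
            for a in range(50, 151) for b in range(10, 101)),
           key=lambda p: sse(*p))
print(best)
```

A grid search scales badly with the number of parameters and with the desired precision, which is exactly why iterative least-squares methods like Marquardt-Levenberg are used in practice.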

Gnuplot is endowed with some simple language constructs providing blocks, loops, and conditional execution. This is enough to do significant calculation without having to resort to external programs. Using looping, you can create animations on the screen. Try the following gnuplot script to get a rotating surface plot:

set term wxt persist
set yr [-pi:pi]
set xr [-pi:pi]
end = 200.0
do for [a=1:end] {set view 70, 90*(a/end); splot cos(x)+sin(y); pause 0.1}

The first line tells gnuplot not to delete the window after the script completes, which it would otherwise do when these commands are not run interactively. The last line contains the loop that creates the animation; the pause command adds a tenth-of-a-second delay between frames.

Conclusion

Gnuplot in the wild is not a rare encounter. Its output can be found in many of the math and science entries on Wikipedia; my article about calculating Fibonacci numbers; the book Mechanics by Somnath Datta, an example of a complex text with closely integrated intricate plots, using LaTeX and gnuplot; the book Modeling with Data: Tools and Techniques for Scientific Computing by Ben Klemens, using gnuplot’s latex terminals; and the free online text Computational Physics by Konstantinos Anagnostopoulos, just to give a few examples. In the system administrator field, check out the articles on benchmarking Apache, graphing performance statistics on Solaris, and using gnuplot with Dstat.

Gnuplot is a good choice if you have large data sets, if you prefer a language-agnostic solution, if you need to automate your graphing, and especially if you use LaTeX.
