The conventional viewpoint among open-source projects is that drive-through contributors—meaning people who make one pull request, patch, or other contribution and then are never seen again—are problematic. At best, one would prefer to lure the contributor back, eventually cultivating them into a regular project participant. At worst, they can be seen as a disruption, taking up developers' time for work that may, ultimately, lead nowhere. At OSCON 2016 in Austin, Texas, however, Vicky "VM" Brasseur from HP Enterprise presented an alternative viewpoint. Drive-through contributors are a good sign of a healthy project, she said, and optimizing the project to meet drive-through contributors' needs benefits contributors of every stripe.

Brasseur noted at the outset that the desire to capture drive-through contributors and convert them into regulars was a good instinct, but said it was simply out-of-scope for her presentation. Instead, she wanted to explore what motivates drive-through contributors and see how projects can best make use of them. For the purposes of the talk, she explained, a "contribution" included anything that could be kept in version control, be it code, documentation, artwork, or anything else.

There are four major categories of drive-through contributor, she said: self-service contributors, work-related contributors, special-project contributors, and documentation fixers. Self-service contributors are those needing to get the project working for some other, larger purpose; perhaps they fix a bug that affects them, so they submit a patch for that fix and never return. Work-related contributors are similar, except that they are working with the open-source project at their day job, and have no other attachment to it: they will submit patches needed to get the job done, but do not get invested otherwise. Special projects include people enabling obscure hardware ports or supporting peculiar configurations; they care enough to get the project code working for that new scenario, but that is all. And documentation fixers are rather self-evident: they see a missing section or typo, send a fix, and consider the job done.

For projects, the interesting questions are why the contributor shows up at all and why they disappear. Many drive-through contributors tend to show up in the first place just to scratch their own itch (the self-service and special-projects contributor categories in particular). But there are other possibilities, she said: the contributor may have no choice but to use the project, even though they do not care for it, for instance. An example would be a developer who prefers PostgreSQL, but who is working on a MySQL project at their day job. When the job is complete, they will likely return to PostgreSQL. And, in some cases, drive-through contributors submit a patch simply because they like you: they notice something wrong and send a fix because they care about open source.

On the flip side, these contributors depart for a handful of reasons worth reflecting on, too. When the itch is scratched or the work project is finished, they may simply move on. But it is also common for contributors to start with good intentions and simply run out of time; "life happens," she reminded the audience. Drive-through contributors also sometimes move on because they find another project that suits them better. That, too, comes with the territory. Yet there are a few reasons for a drive-through contributor's departure that should worry the project: when contributors find it too difficult to work with the project, when they feel like their contribution was not appreciated, and when some project member treated them poorly.

The latter case is the "asshole problem," she said; it is a common criticism of open-source projects, but it is even more pointed when drive-through contributors are involved. Too many developers treat drive-through contributors with hostility, calling them a "waste of time" or words to that effect. These developers tend to claim that drive-through contributors are time sinks and that they are not "part of the community and never will be."

But that represents a misunderstanding, Brasseur said, of how "community" and "contributors" relate. Often, projects talk about community as the base of a pyramid, with contributors a smaller level above that and "core" developers at the top. In reality, she said, the contributor base (including drive-through contributors) is larger than the project community; it is the foundation of the project pyramid—as illustrated in the diagram to the left. "You can't build a community with no contributors—they come first."

Consequently, she said, an increasing number of drive-through contributors should be seen by the project as a positive sign. Conversely, a project with few drive-through contributions may not be as healthy as it thinks it is. Lots of drive-throughs means that more people are seeing the project, that more people are using the software, and that the process for making a contribution is easy and working well. Therefore, more people will make contributions (of every type) and the community can develop. Increasing the number of drive-through contributions means more bugs are located and fixed, more documentation is written, test coverage is expanded, and releases can be faster. Furthermore, the project's reputation is likely to improve as well, with it being seen as friendlier and more accessible. Thus, while there are a lot of ways to measure "project health," she said, growing the number of drive-through contributions improves almost every metric.

She then turned to providing some advice on how to make drive-through contributions easier. About half of the methods revolved around documentation, she said. Better documentation cuts down on questions by providing potential contributors with the necessary information up front, and it standardizes processes across the project.

In particular, Brasseur recommended writing a "quick start" guide that offers a high-level summary of the contribution process, plus an in-depth "how to contribute" document that addresses the steps in detail: how to format patches, how to submit any contributor agreements required, how merges are approved, and so on. These documents minimize the number of "how do I start" and "what do I do next" questions that project members will have to field. She also recommended documenting the project's communication routes (i.e., who to contact on various topics) and a code of conduct. "Writing one won't kill you," she said about the latter. "It just shows people that you give a damn about them."

A few other documents worth creating include a "who's who" that explains leadership roles, subject-matter experts, and any sub-teams within the project, a "rules and processes" document, a jargon file, and a project roadmap. The "rules and processes" document should explain how someone becomes a core contributor, which can be quite inspirational for new contributors to see, as well as various bylaws and governance structures. The project roadmap helps new contributors by explaining the release schedule, the planned features, and what it will take to get a patch into a particular release.

Beyond documentation, she outlined several other methods for improving the contribution experience. Project members can mentor new contributors by doing code reviews, holding IRC "office hours" to answer questions, and by holding hackfests open to the public (perhaps even hackfests specifically geared toward new contributors). Projects can do things to improve their processes as well, she said. Suggestions include tagging "starter" bugs, providing pre-built containers or virtual-machine images of the development environment, and having a public "service level agreement" (SLA) for contributions. The SLA, she explained, means making a pledge that (for example) "we will look at each patch and respond to it within five days." That encourages newcomers by telling them that their effort will not be overlooked, and it sets expectations.

On a larger scale, she said, projects would be wise to cultivate a culture that values contributions and contributors. They can make sure that all contributions are credited in the release notes and Contributors file, they can "default to assistance" when they encounter a new contributor, and they can place a high value on documentation. "It is much easier to document as-you-go than it is to tackle a long list of documentation all at once."

Projects also need to create and enforce a "no assholes" rule, she said. "There's talk in the world about the unicorn 10X developer," she said. "But I don't care how many X's they have; if they act like an asshole they're bringing everybody down." Fortunately, she said, the majority of the time, people who treat others poorly and with hostility are not doing so intentionally—they only know that the rest of the project lets them continue acting the way they do. Most of the time, telling them why there is a problem and what they are expected to do next time is sufficient.

Finally, Brasseur advised projects to engage in outreach to contributors. They should express gratitude for contributions (including drive-through contributions), recognize each contributor, and follow up after the fact. Follow-up may include asking the contributor how their experience was, if there is anything about the process that could be improved, and (in the case of drive-through contributors) why they left. To be certain, not every drive-through contributor will—or even can—be cultivated into a regular project member. But, Brasseur said in closing, "all of the steps you take to maximize drive-through contributions also lead to a healthier project overall."

The session ended with a few questions from the audience; one person asked for examples of large projects that do a good job at the sort of documentation discussed in the talk. Brasseur replied that OpenStack does well in this regard, as do many Apache projects and the Django project. Another audience member asked how to encourage the drive-through contributors who leave for lack of time to reconsider. Brasseur echoed what someone else in the audience offered as a reply: perhaps the best thing a project can do is feel grateful that the drive-through contributor, even when busy, took some of their time to stop and make a contribution.


At OSCON 2016 in Austin, Texas, Karen Sandler of the Software Freedom Conservancy (SFC) spoke about an issue that impacts an ever-growing number of free-software developers: employment agreements. As the number of paid contributors to free-software projects grows, so do the complications: copyright assignment, licensing, patents, and many other issues may be codified in an employment agreement, and a developer who fails to consider the implications of an agreement's conditions may be in for an unpleasant surprise years down the road.

Sandler kicked off the session by acknowledging that reading through agreements and contracts is boring stuff. "It's such a drag, I know. Legal stuff is boring, and this is boring even for me. But we have to do it." You only get one chance to sign your employment agreement, she said, but even if you only plan to stay for a year, the terms and conditions included can affect you and your ability to work on free software for many years to come. That is because an employment agreement not only establishes the relationship between the employee and the employer, but it establishes how the employee will make their contributions to free-software projects. For developers who care about their contribution to the free-software community, the details of the agreement can be significant.

We live in an age where we are constantly having to agree to more and more terms-and-conditions documents, Sandler said, to the point where no one has sufficient time to read them all. Employment agreements are different, though: you are not a consumer when you sign one, you are an employee. And your agreement is unique to you; it is not a blanket set of terms-and-conditions. Even if you sign a boilerplate agreement identical to everyone else's in the company, that class of "employees" is much smaller than the class of "consumers" or "customers" addressed by a public click-through license.

You therefore have—and should use—the power to negotiate with the company for what matters to you. A power shift occurs immediately after signing, so the end of the hiring and interview process is when the job candidate is in the best possible position to ask for changes to the agreement. Far too many developers never ask for any changes to their agreement, Sandler said, either assuming them to be non-negotiable documents or presuming that everyone in the company has the same agreement. Neither assumption is true; companies "ask for the world" up front, because it is in their best interest, but the clauses and conditions in an employment agreement are almost always malleable and should be just as much a part of the negotiation process as compensation.

Thus, she said, you are not "being paranoid" to read through a potential agreement. The best move is to have a lawyer review the agreement, but at the very least, educating yourself about the potential issues can enable you to spot areas of concern.

The first step is to evaluate your priorities, Sandler said. For many free-software developers, those priorities may be the licensing and patenting of code developed as part of the job, but the goal is for the potential hire to determine what is crucial and what is negotiable, then to examine the agreement. It is also vital, she said, to review all documents related to the potential job, because they may interact with each other. Other documents like contributor license agreements (CLAs) or copyright assignment documents may not be part of the employment agreement itself, but can make a big impact.

Provisions

Sandler then walked the audience through a list of key provisions to look out for. First and foremost is probably the licensing of the software created on the job. The assumption for the OSCON crowd is that the employee will be working on free software and, therefore, the agreement will state that the employee's work will be released under a free-software license, but Sandler reminded the audience that this needs to be in writing to be enforceable. A "general understanding" that the job will entail working only on free software is insufficient—what happens, she asked, if a new manager comes in, the original project is canceled, or if the whole company is acquired?

Furthermore, many free-software developers hired on at companies are hired to work on existing projects (perhaps even projects that they already contribute to). So it is important to verify that the licenses described meet the project community's expectations. Companies new to free software may have a misunderstanding about what licenses are acceptable. Some agreements might be drafted by someone who does not even realize that the employee is coming on board to work on free software, and default to an "everything is proprietary" clause.

A related concern is who owns the copyright on the employee's code. In proprietary software shops, this is not an issue, but more and more free-software developers are demanding that they personally retain their copyrights, she said. Contributions to free-software projects are important to building one's reputation and to accumulating a body of work to show other employers. In an era when many people change employers several times while working on the same code base, she said, it can impede a developer's career for an employer to be the copyright holder for some (or all) of the developer's software.

The scope of employment is another major provision to look out for, Sandler said. She referred the audience to a plot line in HBO's sitcom Silicon Valley that hinged on a company's draconian employment agreements claiming the rights to everything the employees create. In many jurisdictions, such clauses are regarded as unenforceable, but free-software developers may have a harder time establishing which jurisdiction they work in. They may work remotely, or even move around regularly—in which case the local laws, in addition to the jurisdiction expressed in the agreement, can come into play.

It is also vital to many free-software developers to establish that they can contribute to outside projects in their free time. "Exclusivity" clauses are therefore problematic. Furthermore, many employees may accept a given salary with the expectation that they will be allowed to consult on the side or do other freelance work; if the agreement claims exclusivity, this could be disallowed.

Along those lines, employment agreements should also be clear on the status of pre-existing code. If an exclusivity clause makes it impossible for the employee to fix bugs on software they have already written, it could be detrimental.

Free-software developers will also want the nature of public communications to be clearly defined. Because contributions are expected to be made in the open, the agreement should clarify when developers speak or post as themselves and when their communication is deemed to speak for the company. If the employer requires that blog posts, tweets, or emails must be approved by the company in advance, Sandler said, it is a good idea to ask what that process is like.

Patents are another serious topic to consider, since so many in the free-software community have an ideological objection to software patents. Sometimes there are hard-to-miss red flags, such as a "patent wall" at the entrance to the building, but the issue can be subtler. It is vital for concerned developers to ensure that the employment agreement is clear about whether they would be required to file for patents on inventions, or whether they would be encouraged to do so through bonuses and promotions. If the company has a patent portfolio, the employee may also want to ask about its patent-licensing policy. Participation in something like the Open Invention Network (OIN) may be a positive sign—though, she cautioned, OIN's patent pool does not cover every patent, and it does not preclude other negative outcomes like patents being sold to third parties like patent trolls.

Last on the list, Sandler advised looking out for "non-compete" clauses, particularly those that bar "conflicting employment" and could prohibit the employee from working at the job that they consider their key skill set. Employees should push back, noting that an agreement that bars them from working as a software developer later could be ruinous to their career. Here again, companies almost uniformly ask for expansive terms in the agreement, but there is almost always room for bargaining.

Asking questions

It never hurts to ask questions or to ask for changes in an employment agreement, Sandler said. Typically there is room for compromise; she told the audience that she has never worked on an employment agreement negotiation where no changes were made. An audience member asked about non-compete clauses, which are often vaguely worded and broad. Sandler replied that it is worth asking "what is it that you really want to prevent?" It may be that the company is only worried about losing employees to some specific competitor, in which case it might be worth renegotiating the clause to be more specific.

Just as there is no harm in asking the company "am I understanding this clause correctly?," Sandler said, there is no harm in asking for enough time to review the agreement in detail. Even if a company asks for an answer the same day, it will likely provide a few days if the developer indicates a desire to read over the provisions in detail. After all, once a job offer is made, the company has decided that it wants to hire the candidate.

She also told the audience to be sure that they keep track of employment agreements after signing them—even after they have moved on to another job, some provisions could still come into play. There have been developers, she said, who raised questions about copyright assignment long after the work was done, and not being able to produce a copy of the employment agreement as evidence can be a serious problem.

Although it is best to renegotiate the clauses of interest (or to get "riders" attached to the employment agreement), she said, sometimes it might not be possible to have changes made, and the informal agreement between the new hire and a manager about the potential job may be all that there is. In that case, she said, if the manager in question actually has the authority to make decisions about copyright assignment, licensing, and so forth, the best thing for the employee to do is to write down the understandings agreed upon and get a confirmation, in writing or email, from the manager that the two parties are on the same page. It may not have the same weight as a formal contract, but it is better than memory alone.

Moving forward

Employment agreements are still a rarely discussed topic in the free-software world but, as Sandler pointed out, the growth of commercial investments in free-software projects is making them more and more important. She ended the session with two tidbits about where matters may head in the future.

First, SFC is working on creating a set of resources for developers and companies to use when crafting employment agreements. Though not ready for publication yet, the concept is to publish a suite of "standard" clauses covering provisions of interest to the free-software world. They would make it easier for developers to propose the changes that they want in agreements, and could even be useful for companies in the long term. If the standard clauses became popular references, a company could specify that it offers "clauses two, five, six, and nine" unambiguously (a bit like the way Creative Commons licenses have standardized certain copyright clauses).

Finally, Sandler told the attendees that "we can create a culture shift" by actively pushing for the provisions that matter in our employment agreements. If even ten percent of developers asked to retain the copyrights on the software they create, Silicon Valley would take notice, she said. Allowing developers to hold their own copyrights might not become the default position in employment agreements, but companies will recognize that it has value and will begin offering it as a benefit. Software is a competitive business, she said; free-software developers have the ability to influence it by raising the ideological issues they care about when negotiating with new employers.


Preserving files for the long term isn't as easy as just putting them on a drive. As xkcd points out, in its subtle way, some other issues are involved. Will the software of the future be able to read the files of today without losing information? If it can, will people be able to tell what those files contain and where they came from?

Digital archives and libraries store files for future generations, just as physical ones store books, photographs, and art; the digital institutions have a similar responsibility for the preservation of electronic documents. In a way, digital data is more problematic, since file formats change more quickly than human languages. On the other hand, effective use of metadata lets a file carry its history with it.

For these reasons, detailed characterization of files is important. The file command just isn't enough, so developers have created a variety of open-source tools to check the quality of documents going into archives. These tools analyze files, reporting those that are outright broken or might cause problems, and showing how forthcoming or reticent the files are about describing themselves. We can break the concerns down into several issues:

Exact format identification: Knowing the MIME type isn't enough. The version can make a difference in software compatibility, and formats come in different "profiles," or restrictions of the format for special purposes. For instance, PDF/A is a profile of PDF that requires a file to have certain structural features but no external dependencies. PDF/A is better for archiving (which is what the "A" stands for) than most other PDF files.

Format durability: Software that can read any given format fades into obsolescence if there isn't enough interest to keep it updated. Which formats will fare best is a guessing game, but open and widely known formats are a safer bet than proprietary or obscure ones.

Strict validation: Many software projects follow Postel's Law: "Be liberal in what you accept and conservative in what you send." Archiving software, though, stands on both sides of the fence. It accepts files in order to give them to an audience that doesn't even exist yet. This means it should be conservative in what it accepts.

Metadata extraction: A file with a lot of identifying metadata, such as XMP or Exif, is a better candidate for an archive than one with very little. An archive adds a lot of value if it makes rich, searchable metadata available.

A number of open-source applications address these concerns, some of which we will look at below. Most of them come from software developers in the library and preservation communities. Some focus on a small number of formats in intense detail; others cover lots of formats but generally don't go as deep. Some just identify files, while others pull out metadata.

JHOVE

JHOVE (JSTOR-Harvard Object Validation Environment) is the most demanding and obsessive of the lot. It covers a small number of formats in a nitpicking way, which is useful for making sure that software in the future won't have problems. It examines files exhaustively, analyzing them for validity, identifying versions and profiles, and pulling out lots of metadata. I worked on it for a decade, joining the project at the Harvard University Libraries in 2003, writing the bulk of the code, and continuing to support it after I left Harvard. It's now in the hands of the Open Preservation Foundation, which has just released version 1.14.

JHOVE is written in Java and is available under the GNU LGPL license (v2.1 or later). It includes modules for fifteen formats, including image, audio, text-based, and PDF formats. New in version 1.14 (and not yet listed in the documentation) are PNG, GZIP, and WARC.

Each module does extensive analysis on files, looking for any violations of the specification. A file that conforms to the syntactic requirements is considered "well-formed." If it also meets the semantic requirements, it's "valid." For instance, an XML file is well-formed when its tags are all properly matched and nested, etc., and it's valid when it matches its schema, if any.
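The well-formed/valid split for XML can be shown with a short Python sketch (this is stdlib code, not JHOVE itself). Note that the standard library can only check well-formedness; validating against a schema would require an external library such as lxml, so only the syntactic half is demonstrated here.

```python
# Illustrating JHOVE's "well-formed" distinction using XML, the example
# from the text: well-formed means the tags are properly matched and
# nested. Schema validation (the "valid" half) is out of scope for the
# standard library and is not shown.
import xml.etree.ElementTree as ET

def is_well_formed(data: str) -> bool:
    """Syntactic check only: do the tags match and nest properly?"""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b>text</b></a>"))   # True
print(is_well_formed("<a><b>text</a></b>"))   # False: cross-nested tags
```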

The fallback format is "Bytestream," which is just a stream of bytes, in other words, any file. In the default configuration, JHOVE applies all of its modules against a file and reports the first one to declare it well-formed and valid. If no other module matches, it reports that the file is a Bytestream. It's also possible to run JHOVE to apply just a single module, for the format that a file is supposed to be. This is useful with defective files, since it will report how they aren't well-formed or valid. That's more helpful than simply declaring them Bytestreams.
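The dispatch behavior described above can be modeled as a toy sketch. The "modules" below are simplified signature checks invented for illustration, not real JHOVE modules; the point is the first-match loop with Bytestream as the catch-all.

```python
# Toy model of JHOVE's default dispatch: try each format module in
# turn and report the first one that accepts the file, falling back to
# "Bytestream", which accepts any stream of bytes. The checks here are
# stand-in signature tests, far weaker than JHOVE's real modules.
MODULES = [
    ("PNG", lambda d: d.startswith(b"\x89PNG\r\n\x1a\n")),
    ("PDF", lambda d: d.startswith(b"%PDF-")),
    ("Bytestream", lambda d: True),   # fallback: matches everything
]

def identify(data: bytes) -> str:
    for name, matches in MODULES:
        if matches(data):
            return name
    return "Bytestream"

print(identify(b"%PDF-1.4 ..."))   # PDF
print(identify(b"random junk"))    # Bytestream
```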

If a file is valid, JHOVE will report the version of the format, any profiles that it satisfies, and lots of file metadata. The output can be in plain text or XML. The GUI version shows its output as an expandable outline and can save it as text or XML.

To examine a known TIFF file and get output in XML, the command might be:

jhove -m TIFF-hul -h xml example.tif

Other Java applications can call JHOVE through its API.

JHOVE is strict, but it isn't designed to examine the data streams in a file, only the file's structure. For instance, in an LZW-compressed TIFF file, it will check that all the tags are well-formed, including StripOffsets and StripByteCounts, but it won't check that the actual strips (i.e., the compressed pixel data) are well-formed LZW data. Thus, JHOVE will catch subtle errors, but it won't find all defects.
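A minimal sketch of that kind of structural check: parse a TIFF header and walk the tag entries of the first IFD without ever reading the image strips. This is a rough illustration of the structure-versus-data asymmetry, not JHOVE's actual TIFF module, which checks far more (tag types, counts, and offsets).

```python
# Parse the TIFF header and the first IFD's tag IDs. The compressed
# strip data is never touched -- the same asymmetry described in the
# text: structure is inspected, pixel data is not.
import struct

def tiff_tags(data: bytes) -> list[int]:
    """Return the tag IDs in the first IFD, or raise ValueError."""
    if data[:2] == b"II":
        fmt = "<"          # little-endian byte order
    elif data[:2] == b"MM":
        fmt = ">"          # big-endian byte order
    else:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd_off = struct.unpack(fmt + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF: magic number is not 42")
    (count,) = struct.unpack(fmt + "H", data[ifd_off:ifd_off + 2])
    tags = []
    for i in range(count):
        entry = data[ifd_off + 2 + 12 * i : ifd_off + 14 + 12 * i]
        (tag,) = struct.unpack(fmt + "H", entry[:2])
        tags.append(tag)   # e.g. 273 = StripOffsets, 279 = StripByteCounts
    return tags

# A tiny hand-built little-endian TIFF with a single IFD entry:
sample = struct.pack("<2sHI", b"II", 42, 8)    # header, first IFD at offset 8
sample += struct.pack("<H", 1)                 # one IFD entry follows
sample += struct.pack("<HHII", 273, 4, 1, 0)   # a StripOffsets (273) entry
print(tiff_tags(sample))   # [273]
```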

DROID and PRONOM

Archivists often have large batches of files to process and need a big picture of what they have: how many in each format, how many risky files, changes in format usage by year or month, how much older format versions are being used, and so on. This is where DROID shows its strength. It's available from the UK National Archive under the three-clause BSD license. Its main purpose is to screen and identify files as they're being ingested into an archive. It works with the National Archive's PRONOM database of formats, identifying files on the basis of their signature or "magic number."

In this regard, it's similar to file, but it performs finer-grained distinctions among formats. For example, within the TIFF format, PRONOM distinguishes the Digital Negative or DNG, which is a universal raw camera format based on TIFF, TIFF-FX for fax images, and Exif files, which are TIFF metadata without an image.

DROID is good at processing large batches of files. Analyzing them involves two steps. First the user "profiles" a set of files, collecting information on them into a single document. From the command line, the user can specify filters telling DROID which files to profile. Unfortunately, the filter language is difficult to figure out, and the documentation isn't as helpful as it might be, but fortunately there's a Google group where people can answer questions. The second step is to generate a report. One command can do both of these. Here's a relatively simple example with a filter that accepts only PDF files and generates a report as a CSV file.

droid.sh -p "result1.droid" -e "result1.csv" -F "file_ext contains 'pdf'"

Running DROID as a GUI application is easier. In this case, profile creation and report generation are separate steps.

DROID doesn't do much validation or metadata extraction, but it's strong on identifying the format of a file by looking at its signature. This is valuable when processing a large number of files for an archive and weeding out the files that aren't in suitable formats.

ExifTool

Phil Harvey's ExifTool has a different focus. Its specialty is fiddling with metadata and, in spite of its name, it knows about lots of metadata types, not just Exif. It can modify as well as view files, and it's adept at tricks like assigning an author to a group of files or fixing a timestamp that's in the wrong time zone. Its main interest for archivists is its ability to grab and report the metadata in files.

It's aware mostly, but not exclusively, of audio, image, and video formats. It does simple signature-based format identification, along with just enough validation to identify the metadata in a file. ExifTool is available under the Perl license.

It's a versatile piece of software with extensive scripting capabilities. Perl applications can use it through Image::ExifTool. Other code can use its command-line interface as an API, using the -@ and -stay_open options to feed it commands through standard input or an argument file. In addition, a library wraps the command-line interface for use in C++ programs.
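In -stay_open mode, the caller writes each argument on its own line, then a lone -execute line to make the persistent ExifTool process run the accumulated command. The Python fragment below only assembles that command stream; actually using it would mean feeding the text to a running process started with something like "exiftool -stay_open True -@ -", so treat this as a sketch of the protocol rather than a working wrapper.

```python
# Sketch of the command stream a caller feeds ExifTool in -stay_open mode:
# one argument per line, terminated by a lone -execute line. Building the
# stream is all this does; no exiftool process is launched here.
def stay_open_command(args):
    """Format one batched command for ExifTool's -stay_open protocol."""
    return "\n".join(list(args) + ["-execute"]) + "\n"

batch = stay_open_command(["-filetype", "-mimetype", "sample.png"])
print(batch, end="")
```

The payoff of this arrangement is that ExifTool's (substantial) startup cost is paid once, rather than once per file.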

ExifTool treats all file properties and metadata as "tags." A command can request specific tags or tag groups. The following command will return a file's type, MIME type, and usual format extension:

exiftool -filetype -mimetype -filetypeextension sample.png

The output for this would be as follows, assuming it's really a PNG file:

File Type                       : PNG
MIME Type                       : image/png
File Type Extension             : png
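Output in this label-and-value form is easy enough to scrape, as the illustrative Python parser below shows; for programmatic use, though, ExifTool's JSON output (the -j option) is usually a better choice than parsing the human-readable listing.

```python
def parse_exiftool(text):
    """Parse ExifTool's default "Tag Name : value" lines into a dict."""
    tags = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            tags[name.strip()] = value.strip()
    return tags

# The sample output from the command above.
OUTPUT = """\
File Type                       : PNG
MIME Type                       : image/png
File Type Extension             : png
"""
print(parse_exiftool(OUTPUT))
```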

A variety of export options are available, including HTML, RDF XML, JSON, and plain text. Output can be sorted, and some tags have formatting options.

Putting it all together: FITS

What if you want a second opinion on a file? Maybe even a third or fourth?

There are lots of free-software tools for file identification and metadata extraction, and space doesn't allow discussing all of them here. Others include MediaInfo, which extracts metadata from audio and video files, the National Library of New Zealand (NLNZ) Metadata Extraction tool, which specializes in a few archive-friendly formats, and Apache Tika, which extracts metadata from over a thousand formats.

All of these applications report different information, and they don't always agree with each other. Some produce more fine-grained identification than others, and some are fussier than others about whether a file is valid. It's desirable to use more than one tool, in case one of them doesn't handle certain cases well. The Harvard Library's File Information Tool Set (FITS) allows using a dozen different tools together.

FITS originally served as a gatekeeper for Harvard's Digital Repository Service (DRS), and it still does. Other institutions now use it too. I worked only briefly on FITS, but my efforts played a significant role in moving it from a Harvard-only tool to one with a larger user and support community. It is available under the LGPLv3.

DROID, ExifTool, and JHOVE are all parts of the repertoire of FITS. So are Tika, file, MediaInfo, the NLNZ Metadata Extractor, an unsupported but still sometimes useful tool called ffident, and several in-house tools.

For all its complexity, running FITS is fairly simple. Here's the simplest useful command, which simply processes the given file with all of the different modules:

fits -i sample.png

Combining all the tools is tricky for several reasons. They're written in different languages; FITS is in Java, and it invokes non-Java software such as ExifTool through the command-line interface. Their output is in a variety of formats and each tool uses its own terminology.

Where the component tools can produce XML, FITS uses XSLT to convert it to "FITS XML," and then consolidates the outputs into a single XML file. Optionally, it will convert FITS XML to metadata schemas that archives and libraries commonly use, such as MIX, TextMD, and AES Audio Object.

Often the tools won't completely agree about the file, and FITS tries to do conflict resolution. The identification section of the FITS XML output lists the tools that identified the file; if they disagree, it will have the attribute status=CONFLICT. Those who just want one answer can select an ordering preference for the tools and set the conflict reporting configuration element to false. The first tool to give an answer wins.
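The two reporting modes can be illustrated with a toy resolver. The tool names and format strings below are invented for the example; real FITS merges structured XML, not plain strings.

```python
# Toy version of the two modes described above: either flag a conflict when
# tools disagree, or honor a preference order and let the first tool that
# gave an answer win. Inputs here are invented, not real FITS data.
def resolve(ids, prefer=None):
    """ids: {tool_name: format}; prefer: ordered tool list for first-wins."""
    if prefer is not None:
        for tool in prefer:
            if tool in ids:
                return ids[tool]
        return None
    if len(set(ids.values())) == 1:
        return next(iter(ids.values()))
    return "CONFLICT"

print(resolve({"droid": "PNG", "exiftool": "PNG"}))
print(resolve({"droid": "PNG", "file": "data"}, prefer=["file", "droid"]))
```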

Because FITS incorporates so many tools into a single application, each with its own development cycle, it's a complicated piece of software to manage. Sometimes it has to stay with older versions of a tool until the developers can fix FITS to work with the latest release.

Final thoughts

Identifying formats and characterizing files is a tricky business. Specifications are sometimes ambiguous. Practices that differ from the letter of the spec may become common; for instance, TIFF's requirement for even-byte alignment is deemed archaic. People have different views on how much error, if any, is acceptable. Being too fussy can ban perfectly usable files from archives.

Specialists are passionate about the answers, and there often isn't one clearly correct answer. It's not surprising that different tools with different philosophies compete, and that the best approach can be to combine and compare their outputs.


Monday, May 30, is the Memorial Day holiday in the US. Here at LWN, we'll be taking the day off to tune up our gas grills, mow the lawn, drink beer, or whatever else it is that we do when we're not trying to keep up with what the community is doing. As a result, next week's edition will be published one day later than usual, on June 3. We'll be back to the usual schedule the following week.
