Effective scientific electronic publishing

Markus G. Kuhn, Computer Laboratory, University of Cambridge

This is a brief list of recommendations for authors of scientific papers who make their work available online. It focuses in particular on producing high-quality PDF files with LaTeX and covers some other technical and typographic pitfalls.

Be consistent with how you write your name

Choose an exact spelling of your name at the start of your scientific career and use that and only that on all your publications. Do not change any part of your name. If you have a middle initial in your name, then either use it always (preferred) or use it never, but avoid switching between the two possibilities. Otherwise, bibliographic databases (Science Citation Index, etc.) will file you under several different entries, such as J DOE and JA DOE, which makes your work harder to locate.

Use the LaTeX styles suggested by the conference organizers

If the conference proceedings will be published by Springer as Lecture Notes in Computer Science, then use the latest LLNCS LaTeX2e macro package provided by Springer to format your camera-ready copy. Read the authors’ instructions carefully.

Make sure your online version has page numbers and reference information

Camera-ready submission formats required by publishers often lack page numbers or an indication of where this paper was published, because the publishers want to add this information themselves. If you put this camera-ready version on the Web, then people will print it out and forget where they downloaded it. If they then can’t find the reference information on the paper, they will not be able to quote your paper properly.

Therefore, your own online version should differ from the submitted camera-ready copy in these two aspects. The page numbers should be switched on and the precise bibliographic reference of your paper should be included. Preferably put the reference information at the bottom of the first page, in a way that does not change the page breaks compared to the submitted camera-ready copy.

Update your online copy once you receive all the precise metadata (page numbers, ISBN, publication date, etc.) of the final published paper version.

Users of Springer’s LLNCS style can use the package butterma.sty to add the bibliographic reference to the first page of the online version. To make this work, the file should start like

\documentclass[runningheads]{llncs}
\usepackage{butterma}
\idline{J.~Doe and E.~Muster (Eds.): Perfect Publishing, LNCS 9999}
\setcounter{page}{101}
%\renewcommand{\year}{1999} % just if you don't want the current year
...
\maketitle
\thispagestyle{electronic}

where of course the text after \idline has to be replaced with your reference and the page number after \setcounter{page} has to be adjusted to your first page number. The line \thispagestyle{electronic}, which follows \maketitle, suppresses the page number on the first page, which was activated by the [runningheads] in the first line. If your title is too long to fit into the running heads, then you should provide a shorter one using \titlerunning and \authorrunning.

Special thanks to Antje Endemann from Springer for this description.

My variant prebutterma.sty, which only adds the \idline value at the bottom, can be used to annotate preprints that are not yet published in LNCS and therefore do not yet need a Springer copyright notice and a page-number range.

If you make a paper that you submitted to a publisher available online, then read the publisher’s copyright conditions carefully. Most scientific publishers now allow you to have your paper on your Web page, but some require you to add a special copyright notice.

The first printed page should be page number 1

International Standard ISO 7144 (“Documentation — Presentation of theses and similar documents”):

“The numbering of pages shall run consecutively, including blank pages, also if a thesis is published in several volumes, in arabic numerals, beginning on the recto of the first printed leaf. The title-leaves are counted but not numbered.”

If your document comes with a table of contents or index and the printed version is usually bound separately (e.g., a thesis, technical report, manual, book), then it is very convenient if the page numbers printed in the document match exactly the page numbers displayed by an electronic document viewer, such as Adobe Reader or ghostview. This is most easily achieved if, starting from the title page (the front of the first page that comes out of the printer), all pages are numbered consecutively in Arabic numerals (1, 2, 3, ...). The LaTeX “article” and “report” styles do this. Avoid separate Roman numerals for front matter (the LaTeX “book” style uses these by default). Should the thesis presentation regulations of your institution disagree, you may want to make those who wrote them aware of ISO 7144.
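As a minimal sketch of this advice for the LaTeX “book” class (the thesis title below is a placeholder): in the standard class the Roman front-matter numerals are switched on by the \frontmatter command, and \mainmatter later restarts the Arabic numbering at 1. One simple option, assuming you do not need those commands for anything else, is to leave both out.

```latex
\documentclass{book}
\begin{document}
% Deliberately no \frontmatter here: in the standard book class it
% switches the page numbers to Roman numerals, and \mainmatter would
% later restart the Arabic numbering at 1.
\begin{titlepage}
  \centering
  {\Huge A Hypothetical Thesis Title\par}  % placeholder title
\end{titlepage}
\tableofcontents
\chapter{Introduction}
Text of the first chapter.
\end{document}
```

The title page is then counted but not numbered, and all following pages continue in Arabic numerals. Front-matter chapters that should not receive a chapter number can be written with \chapter*.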

Use PDF as the distribution format for your online version

Adobe’s Portable Document Format has long been the preferred format for publishing formatted documents. PDF has several advantages over the more traditional Adobe PostScript format:

PDF was specifically designed as a document distribution and archive format, while PostScript is just a printer control language. PDF is more portable.

PDF viewers and printing tools such as the Adobe Reader, Ghostscript, xpdf, and kpdf are freely available. They are much more widely deployed and user-friendly than PostScript printers and viewers (especially on Microsoft platforms).

The Shrink to fit function of Adobe Reader provides an elegant workaround for the non-ISO paper size problem with which North Americans continue to plague the world.

PDF has built-in per-page compression, therefore it is not necessary to use extra compression and packaging tools such as PKZIP.

PDF allows authors to include URL hyperlinks (accessible from TeX via HyperTeX and the hyperref LaTeX macros).

PDF encodes photos very compactly using the JPEG algorithm, whereas they increase the size of PostScript Level 1 files tremendously.

Some PDF Web-browser plugins support fast direct download of individual pages in documents.

PostScript can be converted to PDF with the “ps2pdf” tool included with Ghostscript, or with Adobe’s Acrobat Distiller product.

Please do not package PDF files into ZIP files. They are already compressed. Put them directly on your web server. Applying PKZIP in addition will not reduce the size significantly, but it will render the convenient PDF plug-ins of web browsers useless. Make sure that your web server serves PDF files with the line “ Content-Type: application/pdf ” in the HTTP header.

PdfTeX is a version of TeX that can produce both DVI and PDF files as output. It knows a number of additional commands to control the PDF output (adding URLs, embedding graphics, etc.). Unlike dvips, pdfTeX cannot embed EPS files directly, but EPS files can be converted into PDF using Ghostscript and the epstopdf script that comes with teTeX. One good way of including diagrams in TeX documents is to use xfig or, for more complicated cases, MetaPost to generate an embedded PostScript file. Both these tools allow you to include mathematical formulae in diagrams that will be typeset by LaTeX (in xfig, export to "Combined PS/LaTeX (both parts)" to get a pair of pstex/pstex_t files).

I’ll discuss below some of the more important issues of generating PDF with TeX.

Use Type1 vector fonts for generating PDF files with TeX

TeX (and LaTeX) traditionally used raster-graphic fonts produced by Metafont for a specific device resolution. Dvips originally produced PostScript files containing 300 or 600 dpi raster fonts, and so did the PDF files converted from them by ps2pdf or Acrobat Distiller. PDF viewers usually do a rather poor job of displaying device-dependent “Type3” raster fonts. Text in raster fonts is displayed slowly on the screen and with no or suboptimal anti-alias filtering. Also, the “Type3” raster fonts inserted by dvips lack information about which character each glyph represents, which interferes badly with full-text search and copy and paste.

You can check whether the output of dvips contains any Type3 raster/bitmap fonts under Unix with the command

grep '%DVIPSBitmapFont:' file.ps

This should produce no output if there are no bitmap fonts.

Instead of Type3 (raster graphics) fonts, make sure any PostScript file that you produce for conversion into PDF uses only resolution-independent Type1 vector fonts.

Fortunately, a consortium of AMS, SIAM, IBM, Springer, Elsevier, BlueSky Research, and Y&Y Inc. arranged to make commercial high-quality PostScript Type1 versions of both the Computer Modern fonts and AMS fonts for TeX freely available under the copyright of AMS.

Dvips has used these resolution-independent TeX fonts by default for a few years now. If you still use some pre-2005 version of dvips, you may have to use special command-line options to get the desired Type1 fonts, such as

dvips -Ppdf -G0 ...

Better still, upgrade to a more recent TeX distribution. [The -G0 was a workaround for an old bug in dvips that caused ligatures to disappear in some fonts, which has also been fixed in more recent versions.]

Make sure you configure the distiller to the “Subset fonts below 100%” option. This will ensure that only fonts for which 100% of all characters are used in the document are included completely and the distiller will remove font data for all unused characters from your PDF file. This will keep your PDF files small.

If you want to convert historic PostScript files that were produced with Computer Modern bitmap fonts into PDF, then try the pkfix tool to replace these fonts in the PostScript with their Type1 equivalents.

Set the information fields of the PDF file

In PDF files, you can store the title, authors, and keywords of a paper in special information fields. This information can help search engines to locate and present your paper more accurately. There are several ways to set this information:

You can manually add it using the Adobe Acrobat product.

You can add via dvips a special PostScript command that will instruct the distiller or ps2pdf tool to set these fields correctly. The direct way to do this is by adding somewhere near the beginning of a TeX or LaTeX document code like

\special{! /pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse
  [ /Author (Markus G. Kuhn, Ross J. Anderson)
    /Title (Soft Tempest: hidden data transmission using electromagnetic emanations)
    /Keywords (compromising emanations, data security, eavesdropping)
    /DOCINFO pdfmark}

to store the author, title, and keywords in the PDF file automatically.

Under LaTeX, the hyperref macros (see the hyperref manual) allow you to do the same in a slightly more convenient form via package options, as in

\usepackage[pdftitle={Soft Tempest: Hidden Data Transmission Using Electromagnetic Emanations},
  pdfauthor={Markus G. Kuhn, Ross J. Anderson},
  pdfkeywords={compromising emanations, data security, eavesdropping}]{hyperref}

Warning: LLNCS style file version 2.14 (2004-08-14) contained a bug that changes the figure/table caption font size slightly when hyperref is loaded; this is fixed in more recent LLNCS versions.

With pdftex, you can also use

\pdfinfo{/Author (Markus G. Kuhn, Ross J. Anderson)
  /Title (Soft Tempest: Hidden Data Transmission Using Electromagnetic Emanations)
  /Keywords (compromising emanations, data security, eavesdropping)}
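For the hyperref route, the same keys can also be set with \hypersetup after the package has been loaded, which keeps the \usepackage line short. The following is only a sketch of that variant, reusing the field values from the examples above in an otherwise empty “article” document.

```latex
\documentclass{article}
\usepackage{hyperref}
% Same keys as the hyperref package options, set after loading.
\hypersetup{
  pdftitle    = {Soft Tempest: Hidden Data Transmission Using Electromagnetic Emanations},
  pdfauthor   = {Markus G. Kuhn, Ross J. Anderson},
  pdfkeywords = {compromising emanations, data security, eavesdropping}
}
\begin{document}
Body text.
\end{document}
```

Whichever method you choose, it is worth opening the document properties in your PDF viewer afterwards to confirm that the fields were actually written.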

Use the paper format of the printed version in the PDF file

PDF files look best on the screen if the specified paper size matches the one for which the layout was designed. Therefore, use the actual physical paper size of the published document in the PDF file. When a PDF file is printed, the page will always be centered, and if the “Shrink oversized pages to paper size” or “Expand small pages to paper size” function is used, it is also guaranteed to fit the output paper size.

For users of the LNCS style: the paper size is 152 mm × 235 mm and correct alignment of the output relative to the upper left corner can be achieved by instructing distiller or ps2pdf to use the CropBox parameters [92 112 523 778].

British Standard BS 1413 defines a book page size of 156 mm × 234 mm called “Metric royal octavo”. This is the best clue I have found so far on where the LNCS format might have come from. Springer actually use in their own LNCS online PDFs the paper format 155 mm × 235 mm (439.37 pt × 666.142 pt), which seems the same size, just rounded differently.

There are several ways to achieve this:

You can directly insert a special command into the DVI file right after \begin{document}:

\special{! /pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse
  [ /CropBox [92 112 523 778] /PAGES pdfmark} % LNCS page: 152x235 mm

This line causes dvips to insert a pdfmark instruction into the PostScript output, which will set the page size correctly in the distiller.

If you use the hyperref macros in LaTeX, you can achieve the same more elegantly via the package options

\usepackage[pdfpagescrop={92 112 523 778},a4paper=false]{hyperref}

Warning: LLNCS style file version 2.14 (2004-08-14) contained a bug that changes the figure/table caption font size slightly when hyperref is loaded; this is fixed in more recent LLNCS versions.

In pdftex, you can also use the command

\pdfpagesattr{/CropBox [92 112 523 778]} % LNCS page: 152x235 mm
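As a quick sanity check on the CropBox values used above, assuming the usual PDF conventions that a rectangle is given as [left bottom right top] and that one unit of default user space is 1/72 inch (25.4/72 mm), the box dimensions work out as follows:

```latex
\begin{align*}
\text{width}  &= (523 - 92)\,\text{pt}  \times \frac{25.4\,\text{mm}}{72\,\text{pt}}
               = 431 \times 0.3528\,\text{mm} \approx 152.05\,\text{mm},\\
\text{height} &= (778 - 112)\,\text{pt} \times \frac{25.4\,\text{mm}}{72\,\text{pt}}
               = 666 \times 0.3528\,\text{mm} \approx 234.95\,\text{mm}.
\end{align*}
```

Both values agree with the stated LNCS page size of 152 mm × 235 mm to within the rounding of the box coordinates to whole points. The same calculation lets you derive a CropBox for any other target page size.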

Use appropriate graphics formats for figures

Choose appropriate formats for included graphics. In particular:

Whenever possible, use resolution-independent vector graphic formats (e.g., EPS, WMF) for diagrams and line drawings such as

block diagrams;

flow charts;

function plots;

circuit diagrams;

diagrams that include text labels.

Use pixel-based raster-graphic formats (PNG, JPEG, TIFF, GIF) only for figures in which the original source information has a fixed resolution, such as

screen shots;

photos;

scanned paper documents;

computer-generated raster graphics.

Before considering lossy compression formats, such as JPEG and GIF, for presenting scientific data, make sure you understand exactly what information their encoders throw away.

In particular:

JPEG was only designed for storing photos that will be viewed unmodified by humans.

Avoid (DCT or wavelet) transform-based photo-compression algorithms, such as JPEG, for storing non-photographic or computer-generated material (e.g., screen shots). These formats are not suitable for storing images that contain only a small number of distinct colours, large areas of identical colour, and distinct edges.

Avoid lossy compression formats such as JPEG if your image represents authentic scientific data that readers may want to inspect closely with image-processing functions (zooming in, contrast enhancement, etc.).

Avoid GIF, unless you know that the image contains fewer than 256 different colors.

Warning: Normally, distiller and ps2pdf will apply the DCT-JPEG compression to any colour and grayscale raster image that they encounter in the input PostScript file. In many scientific publications, especially those related to image processing and compression, this JPEG compression can introduce unacceptable artifacts that distort the meaning of the image. You can avoid this by processing the output of pnmtops with my sed script nojpeg.sed, which adds a setdistillerparams command to the generated EPS file that deactivates JPEG compression in the distiller for this image only.

To use nojpeg.sed in the Makefile described in the next section, simply use the replacement macro

PNMTOPS=pnmtops -rle -noturn -nosetpage | sed -f nojpeg.sed

Use good software engineering for the document sources

Ensure that your document preparation becomes a traceable and repeatable process, just as you should have learned to do with software (think ISO 9000).

Use a revision control system like Subversion or RCS in order to keep track of old revisions and prepublications. Make sure that you can always regenerate the source of any version of your paper that you have ever given away.

Document-preparation systems that use plain-text document formats, such as LaTeX, are particularly easy to use with version-control software. You may find that the diff and merge functions work best if you start a new line after each sentence.

If you work on a paper with coauthors, then cooperate via the revision control system and not by sending around new revisions via email. This avoids confusion and makes sure no changes are accidentally lost.

If you use other TeX macro packages such as llncs.cls in your document, always add the version number and release date as a comment to where you include the macros. Keep all revisions of macro packages that you use in a revision control database as well.

To include the file name and the RCS revision of your current draft on every printout, use the rcs.sty or rcsinfo.sty package, which extracts the relevant information from the $Id: ...$ strings that version-control tools can update automatically.

Write a Makefile. If you used other software to generate embedded PostScript files that you use in your TeX file (for instance fig2dev, gnuplot, etc.), then add rules for how this software is called to your Makefile. Your Makefile should be able to regenerate automatically all intermediate files from the earliest processing step involved. A typical Makefile for a paper with various embedded images will contain production rules such as:

.DELETE_ON_ERROR:

%.pdf %.aux %.idx: %.tex
	pdflatex $<
	while grep 'Rerun to get ' $*.log ; do pdflatex $< ; done
%.ind: %.idx
	makeindex $*
%.bbl: %.aux
	bibtex $*
%.pdftex %.pdftex_t: %.fig
	fig2dev -L pdftex_t -p $*.pdftex $< $*.pdftex_t
	fig2dev -L pdftex $< $*.pdftex

all:

clean:
	rm -f *~ *.log *.bak *.aux *.toc *.blg *.bbl *.pdftex *.pdftex_t

If you include bitmap images, then use the JPEG format only for photos from a camera or scanner. For line drawings or computer-generated plots never use JPEG. For the latter, preferably use a vector format such as PDF or EPS, or where this is not practical (e.g., computer-made screenshots), use the lossless PNG format instead.

Archive the source plus PDF version of your paper in a long-term format. Standards such as tar, gzip, ISO 9660, PDF, TeX, and CD-R are today well documented and so widely deployed that people will most likely still be able to read them without major problems 100 years from now. Eventually write your archive onto a CD-R, preferably one of the silver- or gold-looking ones using the phthalocyanine dye, which should keep the data intact for many decades, if not centuries. Remember that magnetic media hardly last longer than 5–10 years. Ensure that your institution has a long-term archive concept for the source and final formatted version of all publications.
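The advice above on recording macro-package versions and on starting a new line after each sentence might look as follows in a LaTeX source file. This is only a sketch: the \listfiles command is standard LaTeX but is an addition not mentioned in the list above, and the llncs version in the comment is simply the one quoted elsewhere in this text, so replace it with whatever version you actually use.

```latex
\listfiles  % write name, date and version of every loaded file to the .log
\documentclass[runningheads]{llncs} % llncs.cls 2.14 (2004-08-14), kept under revision control
\usepackage{graphicx}               % copy the version from the "*File List*" in the .log
\begin{document}
One sentence per line keeps diffs small.
A later change to this sentence touches only this one line.
\end{document}
```

After a run, the “*File List*” section near the end of the .log file shows exactly which versions were used, which makes it easy to fill in the comments.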

Use filenames that are meaningful in a broader context

It is a good idea to include an indication of where the paper is published (abbreviation for the conference or journal) and a most significant title word in the filename. For instance, ih98-tempest.tex is much more useful than just paper.tex. Plan your filenames such that you and all your local colleagues can have them nicely together in a single public directory. No filename should be longer than 25 characters; preferably keep them at less than 15 characters. Use only lowercase US-ASCII letters, digits, hyphens, and a dot (only for the extension).

If you publish an HTML version of your paper, then please check not only whether it displays nicely with your current browser, but also send it through an SGML parser that grammatically validates your HTML syntax against the HTML 4.01 document type definition. A validation service is available for instance from W3C, or you can easily install your own using nsgmls. Also perform a link check from time to time, as URLs are unfortunately not very stable.

Typographic conventions

Professional typesetting works slightly differently from using typewriters or ASCII email. Make sure that you are well familiar with these conventions. Lamport’s LaTeX User’s Guide provides a very brief introduction in section 2.2.1. In particular, make sure you are aware of

how to use directional quotation marks;

the differences in shape and use of hyphen (-), minus (−), en-dash (–) and em-dash (—);

the fact that the default font used by TeX’s math mode was designed only for use with single-letter variables and must not be used for writing words or multi-letter abbreviations;

When using BibTeX, understand that it tries to change the capitalization of titles to lowercase unless a word is protected by surrounding {}. Therefore, protect all proper nouns (names) and abbreviations in this way in your BibTeX file.
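The points in the list above can be illustrated in one small LaTeX file. This is a sketch under stated assumptions: the file name example-refs.bib and the bibliography entry are hypothetical (the author names are borrowed from the placeholder editors in the LLNCS example earlier), and the “plain” bibliography style stands in for whatever style your publisher requires.

```latex
% Hypothetical bibliography file, written out by LaTeX for this demo.
\begin{filecontents}{example-refs.bib}
@misc{doe99,
  author = {J. Doe and E. Muster},
  title  = {Perfect Publishing with {LaTeX}: Lessons from {Cambridge}},
  year   = {1999}
}
\end{filecontents}
\documentclass{article}
\begin{document}
% Directional quotation marks: two backquotes open, two apostrophes close.
``Directional'' quotation marks, not "typewriter" ones.

% Hyphen, en-dash, em-dash, and minus are four different characters.
A well-known result (hyphen), pages 101--120 (en-dash),
an aside---like this one---(em-dash), and $x - 1$ (minus, in math mode).

% Multi-letter abbreviations in math mode need an upright text font.
Write $\mathrm{SNR} = 20$, not $SNR = 20$,
which reads as the product $S \cdot N \cdot R$.

% The plain style lowercases "Publishing", but the words protected
% by {} keep their capitals.
Protected capitals survive in the reference list~\cite{doe99}.
\bibliographystyle{plain}
\bibliography{example-refs}
\end{document}
```

Run latex, bibtex, and latex twice more to see the effect in the reference list.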

Here are some more typographic conventions that you may want to consider:

Capitalization of headlines. In the United States, it is a common practice to capitalize the first letter of more words in headlines and titles than in normal sentences. The style guides and author’s instructions of U.S. publishers, such as the IEEE, require this. On the other hand, in Britain, in many other English-speaking countries, and in many international organizations, professional typographers use in headlines exactly the same capitalization rules as in normal sentences, namely only the first word and proper nouns or abbreviations are capitalized. I personally strongly prefer the British convention. It preserves more information (which word is a name) and causes far fewer problems with bibliographic databases (like BibTeX), where the unnecessary U.S.-style capitalization has to be removed for most bibliographic-reference styles. My advice is to follow the requirements of publishers, but if there are none, do not perform any unnecessary capitalization in titles. In any case, always remain consistent within a single document.

Quantities and units. There are well-established rules for typesetting units of measurement, which are described, for example, in NIST SP 811 and ISO 31-0. In particular:

numbers and unit symbols are separated by a no-break space;

unit symbols are never written in italics, to distinguish them from variables for physical quantities, which are in italics;

indices of variables are only written in italics if they represent another variable, but not if they are just an abbreviation of a word;

symbols for SI units have well-defined capitalization rules (prefix symbols are uppercase from mega upwards and lowercase from kilo downwards, unit symbols start with an uppercase letter only if the unit was named after a person).

Bad example: v_max=120 Kph

Good example: v_max = 120 km/h
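In LaTeX source, the rules for quantities and units might be applied as in the following sketch, which uses only standard LaTeX commands. The sample sentence about frequencies is invented purely to show the prefix and unit capitalization.

```latex
\documentclass{article}
\begin{document}
% Bad: italic index for a word, breakable space, wrong unit symbol.
Bad: $v_{max} = 120$ Kph

% Good: upright index for the abbreviation "max", no-break space (~)
% between number and unit, upright unit symbol.
Good: $v_{\mathrm{max}} = 120$~km/h

% The index i is itself a variable, so it stays in math italics.
Sample $x_i$ was recorded at 5~kHz (kilo: lowercase k; hertz: named
after a person, so uppercase H) with a 3~MHz clock (mega: uppercase M).
\end{document}
```

Typing the unit outside math mode is the simplest way to keep it upright; inside a formula, wrap it in \mathrm{...} instead.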

Special thanks to Robin Fairbairns and Lars Engebretsen for useful suggestions.

Further suggestions for this text are very welcome! Just mail me.



This work is licensed under a Creative Commons Attribution 4.0 International License.

Markus Kuhn