Answering my own question after much research, experimentation, and testing. stevem's pointer to the Mac OS X TeX Toolbox approach (store the TeX snippet in a box and write its height, width, and depth to a file) was the crucial key to the puzzle. I followed that approach, made some adjustments and additions, and came up with a solution that is not only pixel-perfect but also subpixel-perfect, and that holds up under magnification.

First, a screenshot demonstrating the results, before I discuss the technique. The following paragraph is, by design, a very ugly mess. However, the baselines of all the rendered TeX snippets line up properly, which is the goal:

I don’t actually use Times Roman in my HTML pages—I use HTML/CSS versions of TeX’s Latin Modern fonts—but I wanted to make the transitions between paragraph text and embedded TeX snippets here visually obvious.

Technique

Doing this correctly is not easy. There are many places where subtle errors can be introduced—especially if shortcuts are taken. Correct vertical alignment cannot be a single-step process. To achieve proper baseline alignment, it is necessary to measure, pad, re-measure, crop, re-measure, re-crop, re-measure, and finally re-pad the image before downsampling.

Here are the fundamental steps:

1. Write the (La)TeX snippet to a file, wrapped in a specially designed preamble and postamble that cause TeX to write the width, height, and depth (in TeX points) of the snippet to a file. The preamble uses the geometry package to set an exact page size with enough margin padding (I use 4pt) to avoid clipping glyphs whose physical size exceeds their virtual size (a very common occurrence).
2. Invoke pdflatex to compile the TeX file to PDF.
3. Invoke gs (Ghostscript) to convert the PDF to a PNM image. Specify 4-bit anti-aliasing and a DPI representing 16x oversampling. The exact DPI is not obvious: it works out to 1850.112 dpi. (Ghostscript does accept fractional DPI values on the command line.) I'll explain the derivation of this number below.
4. Read the width and height from the PNM image, and determine the actual snippet depth in pixels from the image height and the dimensions written by TeX.
5. Crop whitespace from only the bottom of the image and re-measure its height. The difference between the two heights is crucial later for calculating the exact value of the vertical-align property of the <img> tag.
6. Crop whitespace from the top and sides of the image.
7. Prepare to pad the image with whitespace again. This time, however, we won't add 4pt of padding (which would be a lot), but only just enough to round the size up to a multiple of 16 pixels, so that downsampling is exact. First compute the distance from the baseline to the bottom of the cropped image (the snippet depth plus the page margin, minus the amount cropped from the bottom), round it up to the next multiple of 16, and use the difference as the bottom padding. This does not necessarily make the overall image height a multiple of 16; only the distance from the baseline to the bottom will be. So we still need to pad the top: after accounting for the bottom padding, set the top padding so that the overall image height is a multiple of 16.
8. Padding the left and right is easier: round the image width up to the next multiple of 16 and split the difference between the two sides.
9. Pad the image with whitespace using the precise values calculated in the previous two steps. The resulting image is a multiple of 16 pixels in both dimensions, and it has the additional property that the text baseline is an exact multiple of 16 pixels from the bottom of the image. All the hard work is now done; the rest is easy.
10. Downsample the PNM file by a factor of 4 (not 16!).
11. Darken the result slightly with a gamma curve adjustment.
12. Convert the result to PNG.
13. Read the PNG image and encode it as base-64 data inside an <img> tag for direct embedding in the HTML file.
14. Set the height= and width= attributes to 1/4 of the size of the PNG image (which is 1/16 of the original image). The web browser then scales the image down on the fly to 1/4 of its actual size, yet the user can still magnify the font size of the page and have the TeX snippets look great.
15. Set the vertical-align: property of the style= attribute to the negation of the padded snippet depth divided by 16. This raises or (more commonly) lowers the image relative to the text baseline when the paragraph is rendered on the HTML page.
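The padding arithmetic described above is where subtle off-by-one errors creep in, so here is a minimal Python sketch of just that calculation. It is my own illustration, not part of the Perl program at the end of this answer, although it mirrors that program's formulas. The names `round_up`, `plan_padding`, and `Padding` are hypothetical, and the sketch assumes the crop measurements (`bottomcrop` and the cropped width and height, in rendering pixels) have already been obtained, for example with pnmcrop. The crop sizes in the usage lines are made-up sample values.

```python
from typing import NamedTuple

TEX_PT_PER_INCH = 72.27


class Padding(NamedTuple):
    top: int
    bottom: int
    left: int
    right: int
    vertical_align_px: int  # CSS vertical-align value, in HTML pixels


def round_up(num: int, mod: int) -> int:
    """Round num up to the next multiple of mod (unchanged if already one)."""
    return num + (0 if num % mod == 0 else mod - num % mod)


def plan_padding(snippet_depth_pt: float, page_margin_pt: float,
                 render_dpi: float, bottomcrop: int,
                 cropped_w: int, cropped_h: int,
                 oversample: int = 16) -> Padding:
    """Compute whitespace padding for an image cropped on all four sides."""
    px_per_pt = render_dpi / TEX_PT_PER_INCH  # TeX points -> rendering pixels
    # Distance from the baseline to the bottom edge after the bottom crop.
    depth_px = int((snippet_depth_pt + page_margin_pt) * px_per_pt + 0.5) - bottomcrop
    # Bottom padding makes the baseline-to-bottom distance a multiple of oversample.
    padded_depth = round_up(depth_px, oversample)
    bottom = padded_depth - depth_px
    # Top padding then makes the total height a multiple of oversample.
    top = round_up(cropped_h + bottom, oversample) - (cropped_h + bottom)
    # Horizontal padding is split as evenly as possible between the two sides.
    extra_w = round_up(cropped_w, oversample) - cropped_w
    left = extra_w // 2
    right = extra_w - left
    # padded_depth is an exact multiple of oversample, so this division is exact.
    return Padding(top, bottom, left, right, -(padded_depth // oversample))


if __name__ == "__main__":
    # Depth and margin as in the sample .dimensions file; crop sizes are made up.
    p = plan_padding(6.50009, 4.0, 1850.112,
                     bottomcrop=100, cropped_w=3000, cropped_h=300)
    print(p)  # Padding(top=13, bottom=7, left=4, right=4, vertical_align_px=-11)
```

Note that the top padding can only be computed after the bottom padding is known, which is why the two are not simply rounded independently.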

Those are the fundamental steps. The details are a bit more subtle, so at the bottom of this answer I've included a Perl program that converts arbitrary blocks of text containing $-delimited TeX snippets into HTML.

Why 1850.112 dpi?

The number 1850.112 is (96 × 12 ÷ 10) × (72.27 ÷ 72) × (4 × 4).

96 is the screen dpi that modern web browsers assume.

12÷10 is the ratio 12pt/10pt. The typical default font size in modern web browsers is 12pt, versus the TeX default of 10pt (which is what the snippet template uses).

72.27÷72 is the ratio of TeX points per inch to HTML points per inch. This ratio is very close to 1, but without it the images would be off by 0.375%, or about 1 pixel in every 267 pixels.

4×4 is the oversampling factor. The first 4 is for oversampling at the rendering step (PDF-to-PNG) and the second 4 is for oversampling at the display step (on-the-fly image scaling in the browser).

You could probably get away with omitting the 72.27/72 factor without anyone noticing (this would give 1843.2 dpi instead of 1850.112 dpi), but the important thing is not to settle for some arbitrarily chosen dpi like 1200 or 600. Good results depend on integer-multiple downsampling, and that means telling Ghostscript whatever odd dpi value is needed to make that happen.
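To check the arithmetic, here is a small Python sketch (my own, not part of the program below) that computes the rendering DPI exactly with `fractions.Fraction`, with and without the 72.27/72 correction. The function name `render_dpi` and its parameter names are illustrative; the default values are the ones discussed above.

```python
from fractions import Fraction


def render_dpi(css_px_per_inch: int = 96, browser_pt: int = 12, tex_pt: int = 10,
               tex_point_correction: bool = True,
               render_oversample: int = 4, display_oversample: int = 4) -> Fraction:
    """Exact rendering DPI that makes every downsampling step an integer factor."""
    dpi = css_px_per_inch * Fraction(browser_pt, tex_pt)  # 96 x 12/10
    if tex_point_correction:
        dpi *= Fraction(7227, 7200)  # 72.27 TeX pt/in divided by 72 HTML pt/in
    return dpi * render_oversample * display_oversample


if __name__ == "__main__":
    print(float(render_dpi()))                            # 1850.112
    print(float(render_dpi(tex_point_correction=False)))  # 1843.2
    print(float(render_dpi() / 16))                       # 115.632
    # Relative size of the 72.27/72 correction: about 1 pixel in every 267.
    print(float(1 / (Fraction(7227, 7200) - 1)))
```

Using exact fractions makes it easy to confirm that 1850.112 divided by 16 lands exactly on 115.632 rather than on a rounded approximation.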

Wait, really?

Yup. And in fact, the 96 × 12 ÷ 10 portion is really 96 × ((16 ÷ 96 × 72) ÷ 10), because the browser's default font size of 16 px is 16 ÷ 96 × 72 = 12 HTML points. Here is the full derivation, with units:

96 Hpx/in × (((16 Hpx ÷ 96 Hpx/in) × 72 Hpt/in) ÷ 10 Tpt) × (72.27 Tpt/in ÷ 72 Hpt/in) × (4 Ppx/Hpx × 4 Rpx/Ppx)

where Tpt is TeX points (1/72.27 in), Hpt is HTML points (1/72 in), Hpx is HTML pixels (1/96 in), Ppx is PNG pixels, and Rpx is rendering pixels.

This reduces to:

96 Hpx/in × (16 ÷ 96 × 72 ÷ 10 Hpt/Tpt) × (72.27 ÷ 72 Tpt/Hpt) × (4 × 4 Rpx/Hpx)

or:

96 × 16 ÷ 96 × 72 ÷ 10 × 72.27 ÷ 72 × 4 × 4 Hpx/in Hpt/Tpt Tpt/Hpt Rpx/Hpx

Cancelling out terms and units gives:

1850.112 Rpx/in

or in other words 1850.112 dpi. Note that this is 115.632 dpi with 16x oversampling.

Stepping through font sizes from smallest to largest

Here is the same page as above, shown at different font sizes. This is Safari on Mac OS X: the page was loaded with default settings, then Command-minus and Command-plus were used to shrink and enlarge the text. The baseline alignment is correct at all sizes.

Program for automation of this technique

Below is a Perl program that converts an input paragraph of text containing $-delimited TeX snippets to an HTML page with embedded PNG images. It assumes that you have pdflatex, Ghostscript, and the PNM tools (Netpbm) installed.

```perl
#!/usr/bin/perl -w

#==============================================================================
#
#   CONVERT SIMPLE PLAIN TEXT TO HTML WITH TEX MATH SNIPPETS
#
#   This program takes on standard input a simple text file containing
#   arbitrary TeX math snippets (delimited by '$'s) and produces on standard
#   output an HTML document with PNG images embedded in <IMG> tags.
#
#   This program demonstrates conversion techniques and is not intended for
#   production use.
#
#   Todd S. Lehman
#   February 2012
#

use strict;

#------------------------------------------------------------------------------
#
#   RUN EXTERNAL COMMAND VIA BOURNE SHELL
#

sub run_command (@)
{
  my $origcmdline = join(" ", grep {defined} @_);
  return if $origcmdline eq "";

  my $cmdline = $origcmdline;
  $cmdline =~ s/(["\\])/\\$1/g;
  $cmdline = qq{/bin/sh -c "($cmdline) 2>&1"};

  my $output = `$cmdline`;
  my ($exit_value, $signal_num, $dumped_core) = ($?>>8, $?&127, $?&128);

  # Check the whole status word, so that a command killed by a signal
  # (exit value 0, nonzero signal number) is also treated as a failure.
  $? == 0
    or die "FAILED: $origcmdline\n" .
           "  \$! = $!\n" .
           "  \$@ = $@\n" .
           "  EXIT_VALUE = $exit_value\n" .
           "  SIGNAL_NUM = $signal_num\n" .
           "  DUMPED_CORE = $dumped_core\n" .
           "  OUTPUT = $output\n";

  return $output;
}

#------------------------------------------------------------------------------
#
#   ROUND NUMBER UP TO THE NEXT HIGHER MULTIPLE
#

sub round_up ($$)
{
  my ($num, $mod) = @_;
  return $num + ($num % $mod == 0? 0 : ($mod - ($num % $mod)));
}

#------------------------------------------------------------------------------
#
#   FETCH WIDTH AND HEIGHT FROM PNM FILE
#

sub pnm_width_height ($)
{
  my ($filename) = @_;
  $filename =~ m/\.pnm$/ or die "$filename: not .pnm";

  open(PNM, '<', $filename) or die "$filename: can't read";
  my $line = <PNM>;                             # Skip first line.
  do { $line = <PNM> } while $line =~ m/^#/;    # Read next line, skipping comments.
  close(PNM);

  my ($width, $height) = ($line =~ m/^(\d+)\s+(\d+)$/);
  defined($width) && defined($height)
    or die "$filename: Couldn't read image size";

  return ($width, $height);
}

#------------------------------------------------------------------------------
#
#   COMPILE LATEX SNIPPET INTO HTML
#
#   This routine caches results in the /tmp directory.  Snippets are named and
#   indexed by their SHA-1 hash.
#

sub tex_to_html ($$)
{
  my ($tex_template, $tex_snippet) = @_;

  my $render_antialias_bits = 4;
  my $render_oversample = 4;
  my $display_oversample = 4;
  my $oversample = $render_oversample * $display_oversample;
  my $render_dpi = 96*1.2 * 72.27/72 * $oversample;   # This is 1850.112 dpi.

  # --- Generate SHA-1 hash of TeX input for caching.

  (my $tex_input = $tex_template) =~ s{<SNIPPET>}{$tex_snippet};
  my $hash = do { use Digest::SHA; uc(Digest::SHA::sha1_hex($tex_input)); };
  my $file = "/tmp/tex-$hash";

  # --- If the image has already been compiled, then simply return the
  #     cached result.  Otherwise, continue and create the image.

  if (open(HTML, '<', "$file.html")) {
    my $html = do { local $/; <HTML> };
    close(HTML);
    return $html;
  }

  # --- Write TeX source and compile to PDF.

  open(TEX, '>', "$file.tex") and print TEX $tex_input and close(TEX)
    or die "$file.tex: can't write";

  run_command(
    "pdflatex",
    "-halt-on-error",
    "-output-directory=/tmp",
    "-output-format=pdf",
    "$file.tex",
    ">$file.err 2>&1"
  );

  # --- Convert PDF to PNM using Ghostscript.

  run_command(
    "gs",
    "-q -dNOPAUSE -dBATCH",
    "-dTextAlphaBits=$render_antialias_bits",
    "-dGraphicsAlphaBits=$render_antialias_bits",
    "-r$render_dpi",
    "-sDEVICE=pnmraw",
    "-sOutputFile=$file.pnm",
    "$file.pdf"
  );

  my ($img_width, $img_height) = pnm_width_height("$file.pnm");
  #print "# img_width=$img_width\n";
  #print "# img_height=$img_height\n";
  #print "#\n";

  # --- Read dimensions file written by TeX during processing.
  #
  #     Example of file contents:
  #       snippetdepth = 6.50009pt
  #       snippetheight = 13.53899pt
  #       snippetwidth = 145.4777pt
  #       pagewidth = 153.4777pt
  #       pageheight = 28.03908pt
  #       pagemargin = 4.0pt

  my $dimensions = {};
  do {
    open(DIMENSIONS, '<', "$file.dimensions")
      or die "$file.dimensions: can't read";
    while (<DIMENSIONS>) {
      if (m/^(\S+)\s+=\s+(-?[0-9\.]+)pt$/) {
        my ($value, $length) = ($1, $2);
        $length = $length / 72.27 * $render_dpi;   # TeX points -> pixels
        $dimensions->{$value} = $length;
      } else {
        die "$file.dimensions: invalid line: $_";
      }
    }
    close(DIMENSIONS);
  };
  #foreach (keys %$dimensions) { print "# $_=$dimensions->{$_}px\n"; }
  #print "#\n";

  # --- Crop bottom, then measure how much was cropped.

  run_command("pnmcrop -white -bottom $file.pnm >$file.bottomcrop.pnm");
  my ($img_width_bottomcrop, $img_height_bottomcrop) =
    pnm_width_height("$file.bottomcrop.pnm");

  my $bottomcrop = $img_height - $img_height_bottomcrop;
  #printf "# Cropping bottom:  %d pixels - %d pixels = %d pixels cropped\n",
  #  $img_height, $img_height_bottomcrop, $bottomcrop;

  # --- Crop top and sides, then measure how much was cropped from the top.

  run_command("pnmcrop -white $file.bottomcrop.pnm >$file.crop.pnm");
  my ($cropped_img_width, $cropped_img_height) =
    pnm_width_height("$file.crop.pnm");

  my $topcrop = $img_height_bottomcrop - $cropped_img_height;
  #printf "# Cropping top:  %d pixels - %d pixels = %d pixels cropped\n",
  #  $img_height_bottomcrop, $cropped_img_height, $topcrop;

  # --- Pad image with specific values on all four sides, in preparation for
  #     downsampling.

  # Calculate bottom padding.

  my $snippet_depth =
    int($dimensions->{snippetdepth} + $dimensions->{pagemargin} + .5)
      - $bottomcrop;
  my $padded_snippet_depth = round_up($snippet_depth, $oversample);
  my $increase_snippet_depth = $padded_snippet_depth - $snippet_depth;
  my $bottom_padding = $increase_snippet_depth;
  #printf "# Padding snippet depth:  %d pixels + %d pixels = %d pixels\n",
  #  $snippet_depth, $increase_snippet_depth, $padded_snippet_depth;

  # --- Next calculate top padding, which depends on bottom padding.

  my $padded_img_height = round_up(
    $cropped_img_height + $bottom_padding,
    $oversample);
  my $top_padding =
    $padded_img_height - ($cropped_img_height + $bottom_padding);
  #printf "# Padding top:  %d pixels + %d pixels = %d pixels\n",
  #  $cropped_img_height, $top_padding, $padded_img_height;

  # --- Calculate left and right side padding.  Distribute padding evenly.

  my $padded_img_width = round_up($cropped_img_width, $oversample);
  my $left_padding = int(($padded_img_width - $cropped_img_width) / 2);
  my $right_padding = ($padded_img_width - $cropped_img_width) - $left_padding;
  #printf "# Padding left = $left_padding pixels\n";
  #printf "# Padding right = $right_padding pixels\n";

  # --- Pad the final image.

  run_command(
    "pnmpad",
    "-white",
    "-bottom=$bottom_padding",
    "-top=$top_padding",
    "-left=$left_padding",
    "-right=$right_padding",
    "$file.crop.pnm",
    ">$file.pad.pnm"
  );

  # --- Sanity check of final size.

  my ($final_pnm_width, $final_pnm_height) = pnm_width_height("$file.pad.pnm");
  $final_pnm_width % $oversample == 0
    or die "$final_pnm_width is not a multiple of $oversample";
  $final_pnm_height % $oversample == 0
    or die "$final_pnm_height is not a multiple of $oversample";

  # --- Convert PNM to PNG.

  my $final_png_width = $final_pnm_width / $render_oversample;
  my $final_png_height = $final_pnm_height / $render_oversample;

  run_command(
    "cat $file.pad.pnm",
    "| ppmtopgm",
    "| pamscale -reduce $render_oversample",
    "| pnmgamma .3",
    "| pnmtopng -compression=9",
    "> $file.png"
  );

  # --- Convert PNG to HTML.

  my $html_img_width = $final_png_width / $display_oversample;
  my $html_img_height = $final_png_height / $display_oversample;

  my $html_img_vertical_align =
    sprintf("%.0f", -$padded_snippet_depth / $oversample);

  (my $html_img_title = $tex_snippet) =~
    s{([&<>'"])}{sprintf("&#%d;",ord($1))}eg;

  my $png_data_base64 = do {
    open(PNG, '<', "$file.png") or die "$file.png: can't open";
    binmode PNG;
    my $png_data = do { local $/; <PNG> };
    close(PNG);
    use MIME::Base64;
    MIME::Base64::encode_base64($png_data);
  };
  #$png_data_base64 =~ s/\s+//g;

  # Attribute values are quoted so that the output is well-formed XHTML.
  my $html =
    qq{<img\n} .
    qq{ width="$html_img_width"} .
    qq{ height="$html_img_height"} .
    qq{ style="vertical-align:${html_img_vertical_align}px;"} .
    qq{ title="$html_img_title"} .
    qq{ src="data:image/png;base64,\n$png_data_base64" />};

  open(HTML, '>', "$file.html") and print HTML $html and close(HTML)
    or die "$file.html: can't write";

  # --- Clean up and return result to caller.
  #     (The brace expansion below requires a /bin/sh that supports it,
  #     e.g. bash; the cached .html file is deliberately kept.)

  run_command(
    "rm -f",
    "${file}{.*,}.{tex,aux,dvi,err,log,dimensions,pdf,pnm,png}"
  );

  return $html;
}

#------------------------------------------------------------------------------
#
#   MAIN CONTROL
#

binmode(STDIN, ":utf8");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");

my $tex_template = do { local $/; <DATA> };

my $input = do { local $/; <STDIN> };

(my $html = $input) =~ s{\$(.*?)\$}{tex_to_html($tex_template,$1)}seg;

$html =~ s{([^\s<>]*<img.*?>[^\s<>]*)}
          {<span style="white-space:nowrap;">$1</span>}sg;

print <<EOT;
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title></title>
</head>
<body>
<p>
$html
</p>
</body>
</html>
EOT

exit(0);

#------------------------------------------------------------------------------
#
#   LATEX TEMPLATE
#

__DATA__
\documentclass[10pt]{article}
\pagestyle{empty}
\setlength{\topskip}{0pt}
\setlength{\parindent}{0pt}
\setlength{\abovedisplayskip}{0pt}
\setlength{\belowdisplayskip}{0pt}

\usepackage{geometry}

\usepackage{amsmath}

\newsavebox{\snippetbox}
\newlength{\snippetwidth}
\newlength{\snippetheight}
\newlength{\snippetdepth}
\newlength{\pagewidth}
\newlength{\pageheight}
\newlength{\pagemargin}

\begin{lrbox}{\snippetbox}%
  $<SNIPPET>$%
\end{lrbox}

\settowidth{\snippetwidth}{\usebox{\snippetbox}}
\settoheight{\snippetheight}{\usebox{\snippetbox}}
\settodepth{\snippetdepth}{\usebox{\snippetbox}}

\setlength\pagemargin{4pt}

\setlength\pagewidth\snippetwidth
\addtolength\pagewidth\pagemargin
\addtolength\pagewidth\pagemargin

\setlength\pageheight\snippetheight
\addtolength{\pageheight}{\snippetdepth}
\addtolength\pageheight\pagemargin
\addtolength\pageheight\pagemargin

\newwrite\foo
\immediate\openout\foo=\jobname.dimensions
  \immediate\write\foo{snippetdepth = \the\snippetdepth}
  \immediate\write\foo{snippetheight = \the\snippetheight}
  \immediate\write\foo{snippetwidth = \the\snippetwidth}
  \immediate\write\foo{pagewidth = \the\pagewidth}
  \immediate\write\foo{pageheight = \the\pageheight}
  \immediate\write\foo{pagemargin = \the\pagemargin}
\closeout\foo

\geometry{paperwidth=\pagewidth,paperheight=\pageheight,margin=\pagemargin}

\begin{document}%
\usebox{\snippetbox}%
\end{document}
```
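For readers who would rather not use Perl, here is a rough Python sketch of only the final HTML-emission step of the program above: base64-encode the PNG, divide its size by the display oversampling factor, and set vertical-align from the padded depth. The function `img_tag` and its parameters are hypothetical names of my own. It uses the standard library's `html.escape` for the title attribute instead of the numeric character references the Perl code produces, and it quotes all attribute values. The byte string in the usage lines is a placeholder, not a real PNG.

```python
import base64
import html


def img_tag(png_bytes: bytes, png_width: int, png_height: int,
            padded_depth_px: int, tex_snippet: str,
            display_oversample: int = 4, oversample: int = 16) -> str:
    """Build an <img> tag with an embedded PNG, shifted onto the text baseline.

    png_width/png_height: size of the PNG (already reduced 4x from the render).
    padded_depth_px: padded baseline-to-bottom distance, in rendering pixels.
    """
    # The browser performs the remaining 4x reduction on the fly.
    width = png_width // display_oversample
    height = png_height // display_oversample
    # Negative values lower the image below the text baseline.
    valign = -(padded_depth_px // oversample)
    data = base64.b64encode(png_bytes).decode("ascii")
    title = html.escape(tex_snippet, quote=True)
    return (f'<img width="{width}" height="{height}"'
            f' style="vertical-align:{valign}px;"'
            f' title="{title}"'
            f' src="data:image/png;base64,{data}" />')


if __name__ == "__main__":
    # Placeholder bytes stand in for the contents of a real PNG file.
    print(img_tag(b"not really a png", 752, 432, 176, r"\frac{a}{b} < c"))
```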