Highlight Perl source code using PPI::HTML

A. Sinan Unur
March 27, 2015

On my blog, I use the excellent highlight.js library to apply syntax highlighting to source code in the browser. This lets me copy and paste source code directly into the post (enclosed in a [% FILTER html %] block) instead of having to transform it somehow. It also keeps the number of tag-enclosed pieces of text to a minimum, keeping the original DOM simple, which, one hopes, means faster downloads and faster initial rendering.

A recent Stack Overflow question introduced me to the PPI::HTML module, which uses the amazing PPI module to parse Perl source code and associate CSS classes with the various elements.

If you ask the module to produce a complete HTML page, it will also embed the relevant CSS in the page, and will produce a pretty, colorful document. By default, the class names are rather verbose, and the module offers limited flexibility, but PPI::HTML::CodeFolder provides some enhancements that may be useful.

What if you wanted to produce a self-contained chunk of syntax-highlighted Perl without depending on external CSS or JavaScript? In that case, you can resort to a somewhat grungy technique I use when I am generating HTML email: post-process the HTML to replace the class attributes on elements with style attributes.

Here is an example Perl script which generates a syntax highlighted version of its own source code:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use PPI;
    use PPI::HTML;
    use HTML::TokeParser::Simple;

    # Map the class names emitted by PPI::HTML to colors
    my %colors = (
        cast            => '#339999',
        comment         => '#008080',
        core            => '#FF0000',
        double          => '#999999',
        heredoc_content => '#FF0000',
        interpolate     => '#883333',
        keyword         => '#0000FF',
        line_number     => '#666666',
        literal         => '#999999',
        magic           => '#0099FF',
        match           => '#9900FF',
        number          => '#990000',
        operator        => '#DD7700',
        pod             => '#008080',
        pragma          => '#990000',
        regex           => '#9900FF',
        single          => '#664444',
        substitute      => '#9900FF',
        transliterate   => '#9900FF',
        word            => '#40c080',
    );

    my $highlighter = PPI::HTML->new(line_numbers => 0);

    # "open 0; <0>" slurps this script's own source (the file named in $0)
    my $html = $highlighter->html(\ do { local $/; open 0; <0> });

    print
        qq{<pre style="background-color:#fff;color:#000">},
        map_class_to_style($html, \%colors),
        qq{</pre>\n\n}
    ;

    sub map_class_to_style {
        my $html   = shift;
        my $colors = shift;

        my $parser = HTML::TokeParser::Simple->new(string => $html);
        my $out;

        while (my $token = $parser->get_token) {
            # the output goes inside <pre>, so <br> tags are not needed
            next if $token->is_tag('br');

            my $class = $token->get_attr('class');
            if ($class) {
                $token->delete_attr('class');
                if (defined(my $color = $colors->{$class})) {
                    # shave off some characters if possible
                    $color =~ s{
                        \A \#
                        ([[:xdigit:]])\1
                        ([[:xdigit:]])\2
                        ([[:xdigit:]])\3
                        \z
                    }{#$1$2$3}x;
                    $token->set_attr(style => "color:$color");
                }
            }
            $out .= $token->as_is;
        }

        $out;
    }
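The color-shortening substitution in the script is easy to try in isolation. The sketch below uses only core Perl. The helper name shorten_hex_color is made up for this illustration and is not part of the script above; the regular expression is the same one.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper (not in the original script): collapse a six-digit
# hex color to three digits when every channel is a repeated digit.
sub shorten_hex_color {
    my ($color) = @_;
    $color =~ s{
        \A \#
        ([[:xdigit:]])\1
        ([[:xdigit:]])\2
        ([[:xdigit:]])\3
        \z
    }{#$1$2$3}x;
    return $color;
}

for my $color ('#FF0000', '#339999', '#008080', '#40c080') {
    printf "%s => %s\n", $color, shorten_hex_color($color);
}
```

Only colors whose three channels each consist of a doubled digit are shortened: #FF0000 becomes #F00 and #339999 becomes #399, while #008080 and #40c080 are left alone.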

And the output, in a rather distasteful color scheme, I admit:

The original script is 1,690 bytes. The syntax highlighted chunk above, on the other hand, is 8,599 bytes, which is roughly five times the size (an increase of about 409%).
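As a quick check of that comparison, here is the arithmetic in a few lines of core Perl. The two byte counts are the ones quoted above; the variable names are only for illustration.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Byte counts as quoted in the post
my ($original, $highlighted) = (1_690, 8_599);

my $ratio    = $highlighted / $original;   # how many times larger
my $increase = ($ratio - 1) * 100;         # percentage increase

printf "ratio: %.2f, increase: %.1f%%\n", $ratio, $increase;
```

This prints a ratio of 5.09 and an increase of 408.8%.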

PS: You can discuss this post on /r/perl.