Add syntax highlighting to your static blog with Python and Pygments

I recently converted my blog from WordPress to a static website generated with Jekyll. My only problem was with the way Jekyll handles syntax highlighting through Pygments, specifically with changing the background of more than one line of code. Doing this for a single line works perfectly, but I wasn’t able to find a way to change the background of a group of lines, for example the second and the fourth line of a snippet.

Changing the background of a specific line of code is the equivalent of using a marker on a printout of the code. This kind of fine control over the details is especially important when you write about programming.

In the past I used WordPress with a JavaScript syntax highlighter plugin to render my code snippets, so I was accustomed to marking code this way:

[code lang="cpp" highlight="1 9 15"] ... [/code]

With these markers, code like this:

    //Unicode strings in C++11
    #include <iostream>
    #include <vector>
    #include <string>

    using namespace std;

    int main(){
        string aux = u8"Hello, World";
        string aux2 = "Hello, World";
        cout << aux << endl;

        ...

        string tst=u8"您好世界";
        cout << tst << endl;

        return 0;
    }

can be rendered as follows, with lines 1, 9 and 15 getting a highlighted background:

     1 //Unicode strings in C++11
     2 #include <iostream>
     3 #include <vector>
     4 #include <string>
     5
     6 using namespace std;
     7
     8 int main(){
     9     string aux = u8"Hello, World";
    10     string aux2 = "Hello, World";
    11     cout << aux << endl;
    12
    13     ...
    14
    15     string tst=u8"您好世界";
    16     cout << tst << endl;
    17
    18     return 0;
    19 }

Obviously, a JavaScript solution like the one above can be used with Jekyll; you just need to include the relevant JS files in your pages. There is a problem with this approach, however: it looks really bad on mobile devices, and it won’t work if the reader has JS disabled. Code highlighted with Pygments, on the other hand, looks the same on a desktop and on an iPad, for example.

You can use Pygments directly as a standalone program or as a module in Python code. I wanted to automatically highlight snippets of code, in different programming languages, in a bunch of Markdown or HTML files, using a simple syntax like the one I was accustomed to.
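
As a rough illustration of the module route, the sketch below highlights a short snippet and marks two of its lines. The sample snippet and the helper name render_snippet are made up for this example. highlight, get_lexer_by_name and HtmlFormatter (with its linenos, hl_lines and cssclass options) come from Pygments itself.

```python
from pygments import highlight
from pygments.lexers import get_lexer_by_name
from pygments.formatters import HtmlFormatter


def render_snippet(code, lang, marked_lines):
    """Return HTML for `code` with the 1-based lines in `marked_lines` marked.

    render_snippet is a hypothetical helper, not part of Pygments.
    """
    lexer = get_lexer_by_name(lang)
    formatter = HtmlFormatter(
        linenos=True,            # add a line-number column
        hl_lines=marked_lines,   # these lines get wrapped in <span class="hll">
        cssclass="highlight",    # CSS class of the wrapping <div>
    )
    return highlight(code, lexer, formatter)


# Made-up sample input: mark the second and the fourth line.
sample = "#include <iostream>\nint main() {\n    return 0;\n}\n"
html = render_snippet(sample, "cpp", [2, 4])
```

The returned string contains only HTML markup; the colors come from a separate stylesheet.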

To do this, I wrote a simple Python script that parses all my posts and uses Pygments to highlight any code delimited by [code lang="___"] … [/code].
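
Reading the options out of such a marker can be sketched with the standard re module. This is only an illustration under one assumption: every option is written as name="value" with straight double quotes. The names OPTION and parse_marker are invented for the example.

```python
import re

# One name="value" pair. Values may contain letters, digits and spaces,
# which is enough for lang="cpp" and highlight="1 9 15".
OPTION = re.compile(r'(\w+)\s*=\s*"([a-zA-Z0-9\s]+)"')


def parse_marker(line):
    """Return the options of a [code ...] marker as a dict of strings."""
    return dict(OPTION.findall(line))


opts = parse_marker('[code lang="cpp" highlight="1 9 15"]')
# Turn the space-separated highlight value into a list of line numbers.
marked = [int(n) for n in opts.get("highlight", "").split()]
print(opts["lang"], marked)  # prints: cpp [1, 9, 15]
```

Note that a value such as "c++" would not match this pattern, because the character class only allows letters, digits and whitespace.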

This script adds only HTML tags to your files. You will also need to generate a CSS stylesheet from Pygments if you want colorized syntax. This can be achieved with:

    pygmentize -f html -S colorful -O style=borland -a .highlight > syntax.css

You can pick from more than a dozen styles already defined in Pygments (e.g. borland, vim, emacs …) or you can define your own if you want. For a complete list of programming languages and styles supported, see the Pygments documentation.
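
The same information is also reachable from Python, which can be handy if the stylesheet should be produced by a script. The sketch below assumes Pygments is installed. get_all_styles and HtmlFormatter.get_style_defs are Pygments calls, while style_names and stylesheet are hypothetical helper names.

```python
from pygments.formatters import HtmlFormatter
from pygments.styles import get_all_styles


def style_names():
    """Return the names of the styles known to the installed Pygments."""
    return sorted(get_all_styles())


def stylesheet(style="borland", selector=".highlight"):
    """Return the CSS rules of `style`, prefixed with `selector`."""
    return HtmlFormatter(style=style).get_style_defs(selector)


css = stylesheet()
print(len(style_names()), "styles available")
```

Writing css to a file such as syntax.css gives a stylesheet for the chosen style.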

Here is the complete Python code used to highlight the syntax for this blog:

    import re
    from os import listdir
    from pygments import highlight
    from pygments.lexers import get_lexer_by_name
    from pygments.formatters import HtmlFormatter

    def reset_defaults():
        linenos = True
        lang = "text"
        hl_lines = "0"
        style = "borland"
        return linenos, lang, hl_lines, style


    def proc_file(fname, orig, dest):
        linenos, lang, hl_lines, style = reset_defaults()
        start_code = re.compile(r"\[\s*code\s.*\]")
        end_code = re.compile(r"\[\s*\/\s*code\s*\]")
        expr = re.compile(r"\s*\w+\s*=\s*\"[a-zA-Z0-9\s]+\"")

        fd = open("./" + orig + "/" + fname, 'r')
        fout = open("./" + dest + "/" + fname, 'w')
        line = fd.readline()
        fout.write(line)
        while len(line) > 0:
            line = fd.readline()
            if start_code.match(line):
                #extract the language and options
                opt = expr.findall(line)
                for s in opt:
                    aux = s.split("=")
                    if aux[0].strip() == "lang":
                        tmp = aux[1].strip()
                        lang = tmp[1:len(tmp) - 1]
                    if aux[0].strip() == "highlight":
                        tmp = aux[1].strip()
                        hl_lines = tmp[1:len(tmp) - 1]

                lexer = get_lexer_by_name(lang)
                formatter = HtmlFormatter(linenos=linenos, hl_lines=hl_lines, style=style, cssclass="highlight")

                text = ""
                while True:
                    line = fd.readline()
                    #stop at the closing marker, or at end of file if it is missing
                    if len(line) == 0 or end_code.match(line):
                        break
                    text = text + line
                #process the code
                result = highlight(text, lexer, formatter)
                #save the processed code
                fout.write(result)
                linenos, lang, hl_lines, style = reset_defaults()
            else:
                fout.write(line)
        fd.close()
        fout.close()


    #Hard coded directory names, these should be taken as command line arguments
    orig = "./_orig"
    dest = "./_posts"

    file_list = listdir("./" + orig)

    for f in file_list:
        #ignore hidden system files
        if f[0] != '.':
            proc_file(f, orig, dest)

The code above parses every file in the origin folder (subfolders are not traversed, since listdir only lists a single directory) and uses Pygments to highlight every piece of code it finds. The modified files are saved in the destination folder. You will need to change the names of the origin and destination folders to match what you have on your computer. In my case these are named _orig and _posts.
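
Instead of editing the folder names in the script, they could be read from the command line. Here is a minimal sketch using the standard argparse module. The argument names, their defaults and the helpers parse_args and visible_files are assumptions made for this example, not part of the script.

```python
import argparse
import os


def parse_args(argv=None):
    """Read the origin and destination folders, defaulting to _orig and _posts."""
    parser = argparse.ArgumentParser(
        description="Highlight [code] blocks in every post of a folder.")
    parser.add_argument("orig", nargs="?", default="_orig",
                        help="folder with the unprocessed posts")
    parser.add_argument("dest", nargs="?", default="_posts",
                        help="folder that receives the highlighted posts")
    return parser.parse_args(argv)


def visible_files(folder):
    """Return the names of regular, non-hidden files directly inside `folder`."""
    return sorted(
        name for name in os.listdir(folder)
        if not name.startswith(".")
        and os.path.isfile(os.path.join(folder, name))
    )


# Example call with made-up folder names.
args = parse_args(["drafts", "site/_posts"])
print(args.orig, args.dest)  # prints: drafts site/_posts
```

Unlike the original loop, visible_files also skips subfolders, so that only regular files are handed to the highlighter.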

If you need more control you can easily modify the above code to let you pass more Pygments options.
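
One possible way to do that, sketched under assumptions: the marker could accept extra attributes that are translated into HtmlFormatter keyword arguments. The attributes linenos="0" and start="10" and the helper formatter_kwargs are hypothetical. linenos, linenostart, hl_lines and cssclass are existing HtmlFormatter options.

```python
def formatter_kwargs(options):
    """Map marker options (a dict of strings) onto HtmlFormatter keyword arguments."""
    kwargs = {"linenos": True, "hl_lines": [], "cssclass": "highlight"}
    if "highlight" in options:
        # "1 9 15" -> [1, 9, 15]
        kwargs["hl_lines"] = [int(n) for n in options["highlight"].split()]
    if "linenos" in options:
        # Hypothetical marker attribute: linenos="0" turns line numbers off.
        kwargs["linenos"] = options["linenos"].strip() != "0"
    if "start" in options:
        # Hypothetical marker attribute: number of the first displayed line.
        kwargs["linenostart"] = int(options["start"])
    return kwargs


kwargs = formatter_kwargs(
    {"lang": "cpp", "highlight": "1 9", "linenos": "0", "start": "10"})
```

The resulting dictionary can then be passed on as HtmlFormatter(**kwargs).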

The script fits neatly into my blogging workflow:

1. Use your preferred text editor to write a post in HTML or Markdown.

2. Run the above syntax highlighter Python code.

3. Run Jekyll to generate your static blog.

4. Minify your site.

5. Deploy your changes.

If you are a novice Python programmer, or if you want to learn more about Python, you can read the official tutorial from Python’s website.

If you prefer to read a book, a good introduction is Mark Lutz’s Learning Python.