August 24, 2008 at 19:58. Tags: Blogging, Software & Tools

This post documents my transition from Textile to reStructuredText, with Pygments for source code highlighting.

Leaving Textile

When I got tired of banging in HTML code for my blog posts, I found Textile as a friendlier solution. However, I'm finally fed up with Textile, for several reasons:

- No implementation does exactly what I want, so tweaking is essential. But Textile implementations were not designed for tweaking, so making them fit your needs is a painful experience.
- Since I've lately been into Python, I've recently been using pytextile, which turned out to be a particularly bad implementation.
- The source code formatting (in <pre> blocks) of the Textile processors kept clashing with WordPress.

Looking for a better solution, I ran into reStructuredText, which is part of the docutils package.

reStructuredText (reST)

reStructuredText has a few immediate benefits over Textile:

- It is being developed very actively. A few busy mailing lists are always a good sign of healthy development activity.
- The main implementation is in Python.
- reStructuredText is considered a quasi-standard tool in the Python world, and is used to format docstrings and even PEPs.
- Its architecture is designed to be hackable and extensible from the ground up, and the documentation is very extensive and detailed.
- reStructuredText is suitable for more complex tasks than simple formatting. It can be used to format whole documents, with hyper-linked sections and a table of contents.
- They certainly "eat their own dog food" - the whole stack of documentation (and there's a lot of it) is formatted with reStructuredText.

Installing reST

Installation was a snap. I downloaded docutils, followed the installation instructions and was up and running in 2 minutes. docutils installs a few useful scripts into the scripts installation directory of Python, and these can be used to turn text into various formats - HTML, XML, LaTeX, etc.

In principle, reST is similar to Textile, and learning it was very easy. It took me less than an hour to whip up a sample document for myself that contains all the types of formatting I ever use for my blog posts.

From a cursory glance, reST seems to be more powerful than Textile in several ways, providing more options. It is a tad less lightweight, but I think that serves a good purpose - Textile's lightness is the cause of the poor quality of the parsers written for it.

The only problem I had with reST is its construct for formatting source code. It's quite easy to use (end a paragraph with "::" and indent the block of text that follows, and it will be placed in <pre> tags), but it wouldn't be easy to connect it with the wp-syntax WordPress plugin I'm using to highlight code in my blog. So I've decided to give Pygments a try.
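To make the literal-block construct concrete, here is a minimal sketch that feeds a tiny reST document to docutils from Python and looks at the HTML that comes back. The sample text and the variable names are made up for illustration; only publish_parts comes from docutils itself.

```python
from docutils.core import publish_parts

# A tiny reST document (hypothetical sample text). The paragraph ends
# with "::", so the indented lines after it form a literal block.
SOURCE = """\
A short example::

    def greet(name):
        return "Hello, " + name
"""

# publish_parts returns a dict of document fragments; 'body' holds the
# rendered content without the surrounding HTML header and footer.
parts = publish_parts(source=SOURCE, writer_name="html")
body = parts["body"]
print(body)
```

The printed body should contain the function inside a <pre> element, with no syntax highlighting applied, which is the gap Pygments fills.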

Pygments

Pygments is a Python library for source code highlighting. It is widely used and respected and, best of all, connects easily to reST.

After installing Pygments (just downloading it from its website and following the instructions), I modified the supplied external/rst-directive.py script for my needs and created a generic "runner script". It is called with a text file as an argument and creates an HTML file from it, formatted with reST, with Pygments syntax highlighting hooked to the sourcecode directive.

Here's the code of the runner script, together with my custom style class for Pygments:

    # A 'runner' for HTML output
    # Accepts the input file name as a command line argument; the
    # output file name is derived from it. Loads docutils and pygments
    # and runs the formatter.
    #
    # Based on:
    #   rst2html - from the docutils distribution
    #   external/rst-directive.py - from the pygments distribution
    #
    # This code is in the public domain
    # Eli Bendersky
    #
    try:
        import locale
        locale.setlocale(locale.LC_ALL, '')
    except:
        pass

    ##
    ## Configuring Pygments
    ##
    from pygments.formatters import HtmlFormatter
    from pygments import highlight
    from pygments.lexers import get_lexer_by_name, TextLexer
    from pygments.style import Style
    from pygments.token import Keyword, Name, Comment, String, Error, \
         Number, Operator, Generic, Whitespace, Text

    class SciteStyle(Style):
        default_style = ""

        styles = {
            Whitespace:             '#bbbbbb',
            Text:                   '#000000',
            Comment:                '#007f00',
            Keyword:                'bold #00007f',
            Operator.Word:          '#0000aa',
            Name.Builtin:           '#00007f',
            Name.Function:          '#00007f',
            Name.Class:             '#00007f',
            Name.Namespace:         '#00007f',
            String:                 '#7f007f',
            Number:                 '#007f7f',
            Generic:                '#000000',
            Generic.Heading:        'bold #000080',
            Generic.Subheading:     'bold #800080',
            Generic.Deleted:        '#aa0000',
            Generic.Inserted:       '#00aa00',
            Generic.Error:          '#aa0000',
            Generic.Emph:           'italic',
            Generic.Strong:         'bold',
            Generic.Prompt:         '#555555',
            Generic.Output:         '#888888',
            Generic.Traceback:      '#aa0000',
            Error:                  '#F00 bg:#FAA'
        }

    # Set to True if you want inline CSS styles instead of classes
    inlinestyles = True

    # The default formatter
    DEFAULT = HtmlFormatter(noclasses=inlinestyles, linenos=False,
                            style=SciteStyle)

    # Add name -> formatter pairs for every variant you want to use
    VARIANTS = {
        'linenos': HtmlFormatter(noclasses=inlinestyles, linenos=True,
                                 style=SciteStyle)
    }

    def pygments_directive(name, arguments, options, content, lineno,
                           content_offset, block_text, state, state_machine):
        """ Will process the highlighted source-code directive.
        """
        try:
            lexer = get_lexer_by_name(arguments[0])
        except ValueError:
            # no lexer found - use the text one instead of an exception
            lexer = TextLexer()

        # take an arbitrary option if more than one is given
        formatter = options and VARIANTS[list(options)[0]] or DEFAULT
        parsed = highlight(u'\n'.join(content), lexer, formatter)
        return [nodes.raw('', parsed, format='html')]

    ##
    ## Loading docutils and registering the new directive
    ##
    from docutils import nodes, io
    from docutils.parsers.rst import directives
    import docutils.core

    # (required arguments, optional arguments, final argument whitespace)
    pygments_directive.arguments = (1, 0, 1)
    pygments_directive.content = 1
    pygments_directive.options = dict([(key, directives.flag)
                                       for key in VARIANTS])

    directives.register_directive('sourcecode', pygments_directive)

    ##
    ## Execution
    ##
    import os, sys

    infile = sys.argv[1]
    outfile = os.path.splitext(infile)[0] + ".html"

    print("Running HTML writer:\n-> %s" % outfile)

    # Running publish_parts to get at the document body, without
    # header, style specifications and footer
    #
    parts = docutils.core.publish_parts(
                source=open(infile, 'r'),
                source_class=io.FileInput,
                settings_overrides={
                    'doctitle_xform': 0,
                    'initial_header_level': 3},
                writer_name='html')

    open(outfile, 'w').write(parts['body'])
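The runner above registers the directive through docutils' older function-style interface. As a rough sketch of the same idea using the class-based Directive API, the snippet below registers a sourcecode directive and renders a one-line sample. The class name SourceCode, the sample document and the use of Pygments' default style in place of SciteStyle are assumptions made for illustration, not part of the original script.

```python
from docutils import nodes
from docutils.core import publish_parts
from docutils.parsers.rst import Directive, directives
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import TextLexer, get_lexer_by_name

# Formatters with inline CSS (noclasses=True), using the default style
# here rather than the custom SciteStyle from the runner script.
DEFAULT = HtmlFormatter(noclasses=True, linenos=False)
VARIANTS = {"linenos": HtmlFormatter(noclasses=True, linenos=True)}


class SourceCode(Directive):
    """Hypothetical class-based take on the sourcecode directive."""

    required_arguments = 1          # the language name
    optional_arguments = 0
    final_argument_whitespace = True
    option_spec = {key: directives.flag for key in VARIANTS}
    has_content = True

    def run(self):
        self.assert_has_content()
        try:
            lexer = get_lexer_by_name(self.arguments[0])
        except ValueError:
            # Unknown language: fall back to plain text.
            lexer = TextLexer()
        # Take an arbitrary option if more than one is given.
        if self.options:
            formatter = VARIANTS[next(iter(self.options))]
        else:
            formatter = DEFAULT
        parsed = highlight("\n".join(self.content), lexer, formatter)
        return [nodes.raw("", parsed, format="html")]


directives.register_directive("sourcecode", SourceCode)

# A one-line sample document (made up for this sketch).
SAMPLE = """\
.. sourcecode:: python

    print("hello")
"""

html = publish_parts(source=SAMPLE, writer_name="html")["body"]
print(html)
```

As far as I know, recent docutils releases also ship a built-in code directive that answers to the name sourcecode. A directive registered explicitly, as here, should take precedence over it, which is why the output is wrapped in Pygments' own highlight div.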