I recently saw a neat trick on Twitter from Stephen Hay, who writes:

Few things clean up CMS-input HTML better than running it through Pandoc to convert to Markdown and then back to HTML again. 1 sec, big win.

pandoc is a Swiss Army knife for converting between all sorts of markup formats. You can find installation instructions on the pandoc site.

Suppose we have a tea-dance.html file that contains crufty markup, because it was generated by a WYSIWYG editor. We could clean it up by running this at the command line:

cat tea-dance.html | pandoc --from=html --to=markdown | pandoc --from=markdown --to=html

This emits a cleaned-up version of tea-dance.html on standard output.

We’re using pandoc as a filter: that is, a program “that accepts text at standard input, changes it in some way, and sends it to standard output”.

Running text from a Vim buffer through an external filter

Suppose that we open the tea-dance.html file in Vim. We can use the bang Ex command to filter the contents of the current buffer through our pandoc pipeline:

:%!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html

Vim will take the output from that pipeline and use it to overwrite the original text from the buffer.
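The % in the command above is a range that covers every line in the buffer, and any other range works the same way. Here is a sketch of a few variations (the line numbers 10 and 20 are arbitrary placeholders):

```vim
" Filter only lines 10 to 20 (arbitrary example range) through the pipeline
:10,20!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html

" Filter just the current line
:.!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html

" Filter the last visual selection; pressing : in visual mode inserts '<,'> for you
:'<,'>!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html
```

In normal mode, typing !! prefills :.! on the command line, which is a quick way to start the current-line version. Each filter counts as a single change, so pressing u undoes it in one step if the result isn’t what you wanted.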

In a follow-up tweet, Stephen suggests mapping this Ex command to a key so we can run it more easily. For example, you could add a mapping for normal mode and another for visual mode:

" <Bar> stands in for | so that the pipe isn't read as the end of the mapping
nnoremap <leader>gq :%!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>
vnoremap <leader>gq :!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>

That’ll work, but I want to suggest a way of doing it without leader mappings.

Set up formatprg to filter selection through pandoc

In episode 18 of Vimcasts, I demonstrated how the external par command could be used to format plain text files with hard wrapping. As long as we’re using Vim 8.0.0179 or newer, we can use a similar technique here.

When formatprg is set (and formatexpr is empty), the gq operator runs the selected lines through that external program. This autocommand sets formatprg for HTML files to use our pandoc pipeline:

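if has("autocmd")
  " Assigning via let avoids having to backslash-escape the spaces and the |
  let pandoc_pipeline  = "pandoc --from=html --to=markdown"
  let pandoc_pipeline .= " | pandoc --from=markdown --to=html"
  " &l: sets the buffer-local value of formatprg
  autocmd FileType html let &l:formatprg = pandoc_pipeline
endif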

That means we can filter the current line through pandoc by pressing gqq. Or we can filter the entire buffer by pressing gg, then gqG. Or we can switch to visual mode, select some lines, and press gq to filter only the selected lines.
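To try the pipeline in a single buffer before committing it to your vimrc, you could set the option by hand. With :setlocal, each space and the | in the value must be escaped with a backslash, which is one reason the let &l:formatprg form in the autocommand above is easier to read. This sketch also clears formatexpr in case a plugin has set it, because gq only falls back to formatprg when formatexpr is empty:

```vim
" One-off, buffer-local equivalent of the autocommand (needs Vim 8.0.0179 or newer).
" Spaces and | must be backslash-escaped inside a :set value.
:setlocal formatprg=pandoc\ --from=html\ --to=markdown\ \|\ pandoc\ --from=markdown\ --to=html

" gq consults formatexpr first, so make sure it is empty in this buffer
:setlocal formatexpr=

" Show the value that gq will now use
:setlocal formatprg?
```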

Update: When I originally published this episode, I assumed that the formatprg option could be set for each buffer independently. That wasn’t true at the time, but it has been possible since a patch by Sung Pae was accepted into Vim core.
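If you share one vimrc across machines running older versions of Vim, a guard can skip the setting where formatprg isn’t buffer-local. This sketch assumes, in line with the version note earlier in this post, that 8.0.0179 is the release that made formatprg buffer-local. The augroup name CleanHTML is a hypothetical choice; wrapping the autocommand in a group means re-sourcing the vimrc won’t stack up duplicates:

```vim
" Only set the buffer-local formatprg where Vim supports it (assumed: 8.0.0179 or newer).
" has('patch-8.0.0179') checks both the version and the patch number.
if has('autocmd') && has('patch-8.0.0179')
  let g:pandoc_pipeline  = 'pandoc --from=html --to=markdown'
  let g:pandoc_pipeline .= ' | pandoc --from=markdown --to=html'
  " Hypothetical group name; autocmd! clears the group before re-adding the autocommand
  augroup CleanHTML
    autocmd!
    autocmd FileType html let &l:formatprg = g:pandoc_pipeline
  augroup END
endif
```

On an older Vim the whole block is skipped, so gq keeps its default behaviour there instead of changing formatprg globally.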

Further reading