Convert Microsoft Word to Plain Text

This is a repost of an entry from 2004. This Word-cleaning functionality is showing up in more and more web editors, but people might still find this useful.

Most of the time when I’m writing content for the web (for this blog, or a forum comment, or whatever), I’ll write in Microsoft Word for the spell check and other features that aren’t in a standard textarea widget, and then I’ll cut and paste into the form on the site.

The problem is that this carries all of the high (non-ASCII) characters that MS Word inserts (“smart quotes” and the like) straight through to the site, and most sites aren’t set up to handle them. They expect plain (“Latin”) text.

A solution: this script converts text copied from MS Word into plain text. Paste your input into the top box, press Clean, and the scrubbed text will appear in the lower box.
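For anyone who wants to do the same clean-up offline, the substitution described above can be sketched in a few lines of Python. This is only an illustration, not the script behind this page: the mapping table, the `clean_word_text` name, and the choice of ASCII stand-ins (for example `--` for an em-dash) are all assumptions.

```python
# Hypothetical mapping of common Word "smart" characters to ASCII stand-ins.
# The table used by the script on this page is not shown, so this is a guess.
WORD_TO_PLAIN = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u2026": "...",  # horizontal ellipsis
    "\u2014": "--",   # em dash
    "\u2013": "-",    # en dash
    "\u00a0": " ",    # non-breaking space
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(WORD_TO_PLAIN)


def clean_word_text(text: str) -> str:
    """Replace the mapped Word characters; leave everything else untouched."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "\u201cDouble Quotes\u201d, \u2018Single quotes\u2019, Ellipsis \u2026"
    print(clean_word_text(sample))
```

Characters that are not in the table pass through unchanged, so the mapping would need to be extended for anything else Word produces (bullets, fractions, and so on).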

(If you want to clean up Word HTML, rather than just create plain text, I suggest that you use HTML Tidy with the “clean” and “Word 2000” boxes checked.)

• “Double Quotes”
• ‘Single quotes’
• Ellipsis …
• em-dash —
