I have had repetitive stress injury (RSI) flare-ups on and off for many years. I’ve kept them under control mostly through use of a Kinesis Advantage ergonomic keyboard that I’ve had for fifteen years. However, several months ago, I was diagnosed with carpal tunnel syndrome and instructed to avoid typing if at all possible.

I make my living programming and writing. My hobbies are programming and writing.

The first thing I had to do was completely scrap two book projects. Then I spent some time researching voice coding programs. I clearly remember attending a talk on voice coding by Tavis Rudd at PyCon 2013, and I wondered how the voice coding space had changed since then. It turns out it has come a long way, and in 2020, programming with your voice is entirely doable.

It’s not too finicky to set up, and while there’s a bit of a learning curve, I was productive within the first week. Six months later, I am quite comfortable with voice coding. I am not quite as efficient as I am while typing, and my brain is more tired at the end of the day, but at least I don’t lose my voice anymore! Surprisingly, dictating free text is a lot harder than actually writing code. Dictating this article is my first attempt at doing any major work in English. Hopefully it will go well.

If you want to give it a shot, you’ll have to spend some money, and jump through a few hoops. But if it’s the difference between programming and not programming, it may be worth it.

The Money

Theoretically, there are several speech recognition APIs that can be used for voice programming. However, the only one that is currently really worth using is Dragon NaturallySpeaking. And the Dragon Home edition is not sufficient, so you’ll need to get Dragon Professional, which will set you back a couple hundred dollars.

On top of that, Dragon currently only supports the Windows operating system. I was lucky here, since I had recently switched to Windows after getting a Microsoft Surface Book. However, most coders I know use either Macs or Linux machines. If you need to buy a Windows 10 license, it’s going to add to your costs.

Further, speech recognition is a fairly CPU intensive task, so you may even need to upgrade your machine.

Finally, the difference between being thoroughly frustrated and being functional when dictating is a very good microphone. That will probably set you back another few hundred dollars.

All in all, I’ve spent about a thousand Canadian dollars.

The Microphone

I got by with the microphone in my Sennheiser headphones for a couple months, but it drove me crazy. I finally broke down and bought myself a USB TableMike. I haven’t regretted it. Since switching, I am able to code for longer periods of time, without feeling too tired. I have to make far fewer corrections, which means that I can speak longer before pausing to make sure that the commands are showing up correctly. The delay between speech and seeing it on the screen is much shorter with the TableMike, which just makes everything go more smoothly.

Note that not all high-end mics are actually useful for speech recognition. You will probably not be able to repurpose your band’s professional recording studio microphone. I did quite a bit of research before settling on the TableMike, but there are a few other options you can look at. KnowBrainer has a good collection of microphones that have been specifically vetted for dictation software.

If I were to choose again, I would probably get a SpeechWare MultiAdapter and TwistMike. The latter is longer and can optionally be wrapped around the neck, in addition to being used as a table microphone. This would make it better for adjusting my posture throughout the day. However, the combination of the two is quite a bit more expensive than the basic TableMike.

The Software

The primary tool you need to set up a voice programming environment is Caster. It is an open-source program written in Python and built on top of the dragonfly project. It connects to Dragon NaturallySpeaking with the NatLink tool. (I swear I haven’t seen a project on SourceForge for at least ten years!)

Caster’s installation instructions look a bit intimidating, but it’s actually not that bad. Here are some tips:

Install Dragon NaturallySpeaking first, and make sure that it’s working with your microphone. Go through the tutorial but don’t pay too much attention to it. I use Caster’s commands almost exclusively instead of the ones provided by Dragon. You will be quite frustrated at first, but don’t worry, it does get better. If you started out with a decent microphone, hopefully you won’t be quite as frustrated as I was.

Make sure that you install the 32-bit version of Python 2. I know Python 2 is extremely near end of life. As the person who wrote one of the very first books on Python 3, I find it very frustrating to have such an outdated version of the interpreter on my system. Hopefully it will be ported soon, maybe by you, once you get comfortable with voice coding.

Install Microsoft Visual C++ 2010 Service Pack 1. Step through the wizard on Microsoft’s site and download the file vcredist_x86.exe when prompted to do so. On Windows 10, NatLink will not work correctly without this file.

Then install NatLink and Caster as described in the installation instructions. I recommend checking out Caster from a fork of the repo in your own GitHub account, as you will certainly want to make command customizations.

It will be a huge relief when you can say arch brov char delta and see abcd show up in your text.

Print The Cheat Sheet

Keep the Cheat Sheet under your pillow for the first month or two. You’ll be looking up commands all the time. Also keep a pen nearby so you can annotate it.

You will probably also want to make a cheat sheet for your preferred application commands and programming languages. I’m not sure if the one I made for Rust and Visual Studio Code is useful, but it’s there if you want it.

Now might also be a good time to introduce yourself in the Caster Gitter channel. These folks are some of the kindest and most helpful people in the open source community. Even if you don’t need help, they are just good folks to get to know! But, frankly, you will need help. ;-)

Visual Studio Code

I recommend Visual Studio Code as a text editor in general, but it’s actually particularly nice for voice coding, especially if you are not using Windows. I have a Linux machine where I do all my development, and I log into it using the Remote Development extension, which has always worked flawlessly for me. If you don’t have a spare UNIX machine lying around, you can also set up the Windows Subsystem for Linux. Even if you’re a Mac or Linux user, it should give you an environment that you are somewhat comfortable in.

The other reason that I recommend VS Code is that the Caster developers seem to use it themselves. The Caster application commands for VS Code are particularly extensive.

Dragon NaturallySpeaking Options

It is easy to tell that Dragon is a fairly old code base that is mostly developed in maintenance mode. They seem to improve the artificial intelligence with each release, but the user experience is quite nineties! As such, I recommend disabling most of their features, and relying on Caster commands instead. Specifically I did the following in the Dragon options menus:

Disable all the checkboxes in the “Correction” menu, except for the “Automatically add words to the active vocabulary” option.

Under the “Commands” menu make sure to check the “Require click to select…” checkboxes. Otherwise you will find yourself accidentally clicking buttons or menu items instead of inserting text into your editor. I’ve disabled the other checkboxes in that menu as well.

Check the “Do not show tips” button under the Appearance menu. You may want to wait a while for this, but I find that the tips are just distracting because Caster commands work better.

If you have a good quality microphone as I advised, you may want to set the “speed versus accuracy” slider in the “Miscellaneous” menu to a fairly high value.

Uncheck the “Use the dictation box for unsupported applications” checkbox. Use Caster text manipulation instead.

Manage Dragon Vocabulary

Dragon is designed for English-language input, not the kinds of names that you typically apply to your variables, classes, and functions. As such, you will probably have to add words and phrases to the vocabulary on a regular basis. This is easily done from the Vocabulary Editor.

Most often, you will find it convenient to add entire phrases, as opposed to individual words. Your variable names are probably phrases made up of words that are common in English but are not often used together. For example, in my work I use the variable name beanstalk_miner. The word beanstalk is part of the English vocabulary, as is the word miner. But they are not normally used together. I tended to get that phrase recognized as “be in stock minor” or “being stuck minor”. Using the Vocabulary Editor, I added the phrase beanstalk miner.

Notice that I didn’t include an underscore in the phrase I added in the Vocabulary Editor, even though it is included in my variable name. The formatting of variable names is, as you should know if you have read the Caster cheat sheet, not part of the vocabulary; it is handled by special Caster commands.

I recommend being very aggressive in managing your vocabulary. If there is a phrase that is causing you problems, just delete that phrase from the vocabulary. For example, when I get confused mid-sentence, I often make an inadvertent hissing sound. This invariably gets recognized as 's. So I entered the Vocabulary Editor, searched for 's, and removed the matching symbols. It can also be handy to remove words that insert symbols that are better inserted as Caster commands. I have removed the point command, which maps to a period. I don’t want periods showing up in my text unless I use the Caster dot command. Unfortunately, Dragon won’t let me remove the period command from my vocabulary.

I have even considered going through the vocabulary and deleting many phrases and names that I don’t need. That might speed up recognition, but the Dragon vocabulary is huge and it’s probably not worth it!

I would also like to experiment with having custom vocabularies based only on my project files and their dependencies. However, writing the software to do such analysis would actually be a pretty big project in itself. It would probably also not be terribly helpful to disable the default Dragon vocabulary in favour of a project-specific vocabulary since I tend to write extensive English-language doc strings.
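The core of that analysis, pulling identifiers out of source files and turning them into speakable phrases, can at least be sketched in a few lines of Python. This is only a starting point under simple assumptions: identifiers are snake_case or camelCase, and a phrase is worth adding once it shows up a couple of times. The names identifier_to_phrase and candidate_phrases are made up for illustration, and nothing here talks to Dragon; the resulting phrases would still have to be added through the Vocabulary Editor.

```python
import re
from collections import Counter

# Identifier-like tokens: letters, digits and underscores, not starting with a digit.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Pieces of a camelCase or PascalCase name: acronyms, capitalized words, digit runs.
CAMEL_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def identifier_to_phrase(identifier):
    """Turn beanstalk_miner or BeanstalkMiner into the phrase 'beanstalk miner'."""
    words = []
    for chunk in identifier.split("_"):
        words.extend(part.lower() for part in CAMEL_PART.findall(chunk))
    return " ".join(words)


def candidate_phrases(source_text, min_count=2):
    """Return multi-word phrases seen at least min_count times, most frequent first."""
    counts = Counter()
    for identifier in IDENTIFIER.findall(source_text):
        phrase = identifier_to_phrase(identifier)
        if " " in phrase:  # single words are probably in the stock vocabulary already
            counts[phrase] += 1
    return [phrase for phrase, count in counts.most_common() if count >= min_count]


if __name__ == "__main__":
    sample = "let beanstalk_miner = BeanstalkMiner::new(); beanstalk_miner.run();"
    for phrase in candidate_phrases(sample):
        print(phrase)
```

Running it over real project files would just mean reading each file and passing its text to candidate_phrases. Filtering out language keywords and dependency noise is where the real work would be.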

One last thing to mention about vocabulary is that Dragon does a pretty good job of adapting to your speaking habits. Use it for a couple of weeks and you will find that it is making fewer errors as you go. Don’t forget to launch accuracy tuning from the Audio menu every few months, as well.

Renaming Caster Commands

As you start to use the commands in the Caster cheat sheet, you will undoubtedly find that some of them get confused on a regular basis. For example, I found that the word ace, which is supposed to insert a space character, would often be confused with lease, which is supposed to move the cursor to the left. This got annoying really quickly.

Luckily it is really easy to rename words using Caster transformers. Let’s try doing this using voice commands:

First say enable bring me to enable the bring me command, which allows you to open favourite programs, files and websites. Caster ships with some default bring me commands for managing itself. Say bring me caster bring me file to see what they are. (Don’t try to edit this file directly while Dragon is running, as it gets overwritten when Caster shuts down. Use the commands outlined in the documentation to edit the file instead.)

Say bring me caster transformers file to edit a configuration file in whichever editor is configured to open it. (Make sure that you have set it appropriately in Windows Explorer.)

Set TextReplacerTransformer = true and save the file.

Say bring me caster transformer file (there’s no s after transformer this time) to open words.txt.

Edit the file as described in the documentation. Incidentally, the documentation is being rewritten to incorporate some exciting changes that happened recently. If you want to read it, just say bring me caster documentation!

Say reboot caster to reload the entire app. Caster is capable of reloading rules on the fly when you save a grammar file, but I’m not certain if this works with words.txt.

This is how my words.txt currently looks:

    squat -> bend
    spark -> poke
    ticky -> dunk
    chicky -> applet
    chocky -> chocolate
    deckle -> cota
    lease -> leap
    <<<NOT_SPECS>>>
    comma -> hang

I can’t remember exactly why I made all those changes. I know that deckle was often confused with echo, which tended to switch up a colon for the letter e. The three commands that end in ky are often inserted when I use the word key, which I use a lot since I work in crypto! I don’t use and will never use Java, so I knew that the word applet would be safe. The word dunk is actually named after the donk sounds that my childhood Speak N Spell used to emit whenever it was inserting an apostrophe! The command mapping comma to hang was added so that I didn’t confuse the Caster command for comma with the one provided by Dragon.
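If your own words.txt grows long, it can help to load it into a script and look it over. The following is a minimal sketch of reading the format shown above, assuming that each entry is one "original -> replacement" line and that a line like <<<NOT_SPECS>>> starts a new section. It is not Caster's own parser, and parse_words and the "unmarked" label are hypothetical names used only for this example.

```python
import re

# A section marker is a line such as <<<NOT_SPECS>>>.
SECTION_MARKER = re.compile(r"^<<<(\w+)>>>$")


def parse_words(text):
    """Group 'original -> replacement' lines by the section marker preceding them."""
    sections = {}
    current = "unmarked"  # hypothetical label for entries before any marker
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        marker = SECTION_MARKER.match(line)
        if marker:
            current = marker.group(1)
            continue
        original, arrow, replacement = line.partition("->")
        if not arrow:
            raise ValueError("expected 'original -> replacement', got: %r" % line)
        sections.setdefault(current, {})[original.strip()] = replacement.strip()
    return sections


if __name__ == "__main__":
    sample = "squat -> bend\nlease -> leap\n<<<NOT_SPECS>>>\ncomma -> hang\n"
    for section, renames in parse_words(sample).items():
        for original, replacement in renames.items():
            print("%s: say %r instead of %r" % (section, replacement, original))
```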

You undoubtedly have your own hangups that you need to change, but these are the ones I use.

Caster Rules

The Caster documentation for writing your own grammar rules is a bit hard to follow. I am under the impression that it is being rewritten, but for me it was much easier to read the source code from some existing simple rules in the Caster repository. The Visual Studio Code commands are fairly straightforward. The fact that there are two files reflects the difference between CCR and non-CCR commands. I’ll leave you to peruse the Caster documentation to figure out what the difference is.

I keep a local fork of this repository and maintain a branch on top of the latest master. My customizations are on this branch if you want to use them for reference. I have customizations on the Visual Studio Code file because I don’t use the full key bindings in a couple of cases, and a few customizations on the Rust grammar to match my workflow. These customizations aren’t as big as they used to be because the team has accepted a couple of pull requests for the ones that are more generally useful.

A Custom Rule to Fix Mixed-Up Words

Even if you carefully manage your vocabularies, there will undoubtedly be words that are confused on a regular basis. I keep a custom Caster rule to help me distinguish between these words. You can put your custom rules in the Caster rules folder (bring me caster rules). I have one named tricky.py (short for tricky words) that looks like this:

    from dragonfly import MappingRule, Choice

    from castervoice.lib.const import CCRType
    from castervoice.lib.actions import Text, Key
    from castervoice.lib.ctrl.mgr.rule_details import RuleDetails
    from castervoice.lib.merge.state.short import R
    from castervoice.lib.merge.mergerule import MergeRule


    class TrickyWords(MappingRule):
        pronunciation = "tricky words"

        # "fix <word>" types the replacement looked up in the Choice below.
        mapping = {
            "fix <word>": R(Text("%(word)s")),
        }

        # What Dragon tends to hear -> the word that should be inserted.
        extras = [
            Choice("word", {
                "at": "add",
                "rest": "rust",
                "note": "node",
                "bite": "byte",
                "bites": "bytes",
                "right": "write",
                "sink": "sync",
            })
        ]


    def get_rule():
        return TrickyWords, RuleDetails(name="tricky words")

Even if you don’t know Python it should be fairly easy to figure out how to add your own words. Note that the rule is not enabled by default. You’ll need to say enable tricky words to activate it. Then to insert a word that is often interpreted incorrectly, say fix bites and it will insert the word bytes.

I have one other custom rule that I use to execute common commands in the VS Code terminal:

    from dragonfly import MappingRule, Choice

    from castervoice.lib.const import CCRType
    from castervoice.lib.actions import Text, Key
    from castervoice.lib.ctrl.mgr.rule_details import RuleDetails
    from castervoice.lib.merge.state.short import R
    from castervoice.lib.merge.mergerule import MergeRule


    class CLI(MergeRule):
        pronunciation = "CLI"

        mapping = {
            # Shell commands, each typed with a trailing space ready for arguments.
            "giddy": R(Text("git ")),
            "CD": R(Text("cd ")),
            "parent": R(Text("../")),
            "list files": R(Text("ls ")),
            "move file": R(Text("mv ")),
            "copy file": R(Text("cp ")),
            "remove file": R(Text("rm ")),
            "NPM": R(Text("npm ")),
            # Build commands specific to my workflow.
            "neon build release": R(Text("neon build --release")),
            "cargo build release": R(Text("cargo build --release")),
            "cargo test release": R(Text("cargo test --release")),
            "cargo run release": R(Text("cargo run --release")),
            # Control-key shortcuts sent to the terminal.
            "previous command": R(Key("c-p")),
            "exit shall": R(Key("c-d")),
            "interrupt": R(Key("c-c")),
        }


    def get_rule():
        # Only active while the Visual Studio Code window has focus.
        return CLI, RuleDetails(
            executable="code",
            title="Visual Studio Code",
            ccrtype=CCRType.APP)

Some of these are generically useful, while others are specific to my workflow. It may not be obvious that you can string these commands together with other commands, as in cd parent snake beanstalk miner slash laws tests, which inserts cd ../beanstalk_miner/tests. This is because it is a merge rule from the continuous command recognition module.
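To make the chaining concrete, here is a toy model in plain Python of how that utterance breaks down. It is only an illustration and has nothing to do with how Caster or dragonfly actually parse speech. The tables TEXT_COMMANDS and FORMATTERS and the function interpret are hypothetical, and the sketch assumes that snake joins the following words with underscores and that laws lowercases them.

```python
# Hypothetical stand-ins for a few spoken commands; not Caster's API.
TEXT_COMMANDS = {
    "cd": "cd ",
    "parent": "../",
    "slash": "/",
}

# Formatters apply to the dictated words that follow them (assumed behaviour).
FORMATTERS = {
    "snake": lambda words: "_".join(word.lower() for word in words),
    "laws": lambda words: " ".join(word.lower() for word in words),
}


def is_command(token):
    return token in TEXT_COMMANDS or token in FORMATTERS


def interpret(utterance):
    """Translate a chained utterance into the text it would insert."""
    tokens = utterance.split()
    output = []
    index = 0
    while index < len(tokens):
        token = tokens[index]
        if token in TEXT_COMMANDS:
            output.append(TEXT_COMMANDS[token])
            index += 1
        elif token in FORMATTERS:
            # A formatter consumes dictated words up to the next command word.
            end = index + 1
            while end < len(tokens) and not is_command(tokens[end]):
                end += 1
            output.append(FORMATTERS[token](tokens[index + 1:end]))
            index = end
        else:
            raise ValueError("unrecognized word: %r" % token)
    return "".join(output)


if __name__ == "__main__":
    print(interpret("cd parent snake beanstalk miner slash laws tests"))
```

Each command contributes its own piece of text, and the pieces are simply concatenated in the order they were spoken.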

The only thing that might be a little bit off here is the previous command, which maps to a control-P keypress. I have that keypress bound with bindkey '^P' history-beginning-search-backward in my ~/.zshrc. There is a similar command for bash but I can’t remember what it is off the top of my head.

In the get_rule function I have this limited to only execute these commands when Visual Studio Code is the active application. If you tend to work from the Windows command prompt or some other terminal emulator you will probably want to make it apply to those programs instead, or make it generically applicable as I did with the tricky words.

Text Manipulation

By far the most powerful and useful feature of Caster is the Text Manipulation And Navigation grammar.

As just one example, when Dragon misrecognizes caster as castor I just have to say replace leap oscar with echo (bearing in mind that I have remapped lease to leap for left navigation in my words.txt).

The documentation is fairly straightforward, but it takes a little bit of practice to get used to working with it. Without these rules, I would be unable to write and edit this article!

NOTE: You will need to issue the enable text manipulation command before you are able to use these rules.

I am proud to say that I added the capitalization commands and had my patch accepted. You can thank dusty capital leap dusty. ;-)

Snippets

I type fast enough that I have never had much use for snippets in my text editor. Perhaps if I’d used them more effectively, I would not have developed the RSI symptoms that I have! However, with voice editing, snippets are invaluable. Here is just one example that I use all the time (Visual Studio Code syntax):

    "function": {
        "prefix": "function",
        "body": [
            "fn $1(${2:&self, }) ${3:-> }{",
            "    $4",
            "}"
        ],
        "description": "Insert a rust function"
    }

This inserts a Rust function with positions that I can tab between to insert the name, arguments, return value, and body. I have default values for the arguments and the return value arrow which makes it easy to confirm or overwrite them.
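To see what those defaults buy you, this small Python sketch renders the text a snippet body leaves behind if you tab through every stop without saying anything. It is an approximation for illustration only: it handles plain $1 tab stops and simple ${2:default} placeholders, not the rest of the snippet syntax, and render_with_defaults is a made-up helper rather than anything from VS Code.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\d+):([^}]*)\}")  # ${2:default text}
TAB_STOP = re.compile(r"\$(\d+)")                  # $1

# The body of the Rust function snippet shown above.
RUST_FUNCTION_BODY = [
    "fn $1(${2:&self, }) ${3:-> }{",
    "    $4",
    "}",
]


def render_with_defaults(body_lines):
    """Return the text inserted when every tab stop is accepted unchanged."""
    rendered = []
    for line in body_lines:
        # Keep the default text of each placeholder, then drop the empty stops.
        line = PLACEHOLDER.sub(lambda match: match.group(2), line)
        line = TAB_STOP.sub("", line)
        rendered.append(line)
    return "\n".join(rendered)


if __name__ == "__main__":
    print(render_with_defaults(RUST_FUNCTION_BODY))
```

The printed skeleton shows the &self argument and the return arrow already in place, so that only the name, the return type and the body are left to dictate.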

Write lots of snippets! The more you have the more you will enjoy voice coding.

Learning Curve

Even when everything is working correctly, voice coding is hard. For example, I just had to replace leap wainscotting with voice coding in the previous sentence. I’m sure you’ve seen many more voice-o’s throughout the article. I had to correct something in just about every sentence, although that’s not as bad as it sounds using the text replacement commands I described above. The replacement is not necessarily because Dragon got it wrong, but because I changed my mind about what I wanted to say mid-sentence. This is one of the reasons that I find it easier to code than to write English text. The commands are shorter and I don’t have to complete a thought before I start saying it.

It took me longer to become comfortable with voice coding than it did to learn Dvorak two decades ago, and that’s not just because my brain moves slower these days! It probably took me about two months to stop looking at my cheat sheet every few minutes. I didn’t stop swearing at my computer (which inserts some weird strings into your document, since Dragon doesn’t seem to know those words) until I got the new microphone. Even then it took me quite a while to become comfortable with stringing together long commands without waiting to see if they worked out all right, and I still have to correct them regularly.

Back when my hands were working, I was a really fast typist; probably around a hundred words per minute. In comparison, voice coding feels really slow. About once a week I give in and start typing again and it feels like a real luxury… until the pain starts to set in. I certainly hope that I can get back to typing sometime in the future, but I am told that nerves take a long time to heal and I don’t want to take any chances.

RSI Alternatives

The most unfortunate thing about dictation is that it takes a fair bit of money upfront before you can find out whether it will work for you. Using a microphone you already have or the free dictation software does not give you any indication of how powerful voice coding can be. But it would really suck to spend the upfront money only to discover that it doesn’t work as well for you as you had hoped.

I have been on the edge of carpal tunnel syndrome for a long portion of my coding career. I have tried almost everything to avoid it. Here are some suggestions if you are not ready to take the leap into voice coding: