Way back in February, a man by the name of Ben Randall demoed an amazing voice control app called "utter!" that he had started developing. The initial video (a whopping 22 minutes long) demonstrated some amazing capabilities - take a look for yourself:

But that was over 9 months ago, and aside from the initial release of the (very limited) alpha, we haven't heard much about the app, though Mr. Randall has kept interested parties updated via his very active XDA thread. In those 9 months, he's made very steady progress, and today he has released the first beta build.

Keep in mind, this isn't like any other voice control app we've seen. Most try to imitate personal assistants - handling basic tasks like using the calendar, phone, email, or messaging. Even services like Google Now try to provide info like driving times and sports scores - not so much a tool as a cool parlor trick. Utter!, though, is like your own mechanic for your phone.

Check out the new video, this time of the beta build in action:

Upon install, Utter! uses Google Voice Search for speech recognition, so it's compatible with a variety of languages. Plus, there's a selection of other recognition providers available. There are quite a few default commands, too:

Location functions (define locations, ask where you are, have it remember where you parked your car)

But things get really interesting when you crack open the customization menu. You can create your own phrases, commands, nicknames, and so on - and even better, you can get new commands from the community.

We've spent some time with Utter! and I have to say, it has gobs of potential even without considering what kind of goodies may come from the community. It offers a whole host of capabilities that aren't found in any other voice control app I've ever seen. That said, it's in beta for a reason: the voice recognition isn't perfect and when using it, you get the impression that you're dealing with a robot with pre-defined inputs and outputs rather than a genuine AI.

Ben also took the time to answer some questions about his competition, what makes Utter! unique, his plans for the future of the app, and to tell us a little bit about himself:

Q) What do you think about all of the emerging competition?

A) It made me focus on core functionality, rather than the ‘virtual intelligence’ of the app. Some people love the idea of talking to an inanimate object, some people think it’s stupid – Some people will think it’s great for a couple of days, and then think it’s stupid.

I’m building a ‘community phrase database’ which users will be able to ‘opt-in’ to if they’d like to get random answers to questions that other users have entered. You’ll be able to submit these phrases directly from the device, so I hope the database will grow very quickly and be pretty quirky and fun too. I’ve just got to decide how it’s possible to moderate the content – It might not be!

In addition, users will be able to bolt on a ‘bot’ to utter!, which they can give a personality to and teach from scratch over time or start out with 40,000+ phrases and conversations.

For the above I need server space and hosting on a large scale. The implementation is pretty basic. It will appeal to some, but not all.

Some of the competition have greatly improved the standard of natural voice recognition and interpretation. However, if you glance at the comments in the Play Store, you’ll still very often get ‘Didn’t recognise anything, crap, uninstalled’.

The only way to resolve this is to allow users to configure the commands using the words that are returned from the recognition provider, not what they actually said! This is the route I chose to take – letting the user improve the recognition for themselves, rather than trying the impossible (when working alone) of understanding everything.

There’s a ‘replace words’ feature in utter!, so if for some reason the result comes back as ‘where’s the line’ every time a user asks what the time is, they can enter ‘where’s the line’ with the replacement ‘what’s the time’, and utter! converts it before analysing the content for commands.

This can be done for any commands – If you prefer to say ‘whack on bluetooth’ then you can add that phrase and link it to ‘turn on bluetooth’. However you naturally request something, you can link it to the base command.
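To make the two ideas above concrete, here is a minimal sketch of how a "replace words" pass followed by a phrase-to-command lookup might work. It is not utter!'s actual code: the `REPLACEMENTS` and `ALIASES` tables and the `normalise` function are hypothetical names invented for illustration, and the real app may store and match phrases quite differently.

```python
# Hypothetical sketch only: these tables and this function are illustrative
# assumptions, not taken from utter! itself.

# What the recogniser tends to return -> what the user actually said.
REPLACEMENTS = {"where's the line": "what's the time"}

# The user's preferred phrasing -> the base command it is linked to.
ALIASES = {"whack on bluetooth": "turn on bluetooth"}


def normalise(recognised: str) -> str:
    """Turn raw recogniser output into a base command string."""
    text = recognised.strip().lower()
    # Step 1: swap known misrecognitions before looking for commands.
    for wrong, right in REPLACEMENTS.items():
        text = text.replace(wrong, right)
    # Step 2: map a personal phrase onto its base command, if one is linked.
    return ALIASES.get(text, text)


print(normalise("Where's the line"))
print(normalise("whack on bluetooth"))
```

The point of the design is that the user corrects the recogniser's output rather than the recogniser itself, so each correction is just one more entry in a lookup table.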

The same applies for any language supported by Google Voice Search. There is a widget that allows the user to start the recognition in their native language and the language of the voice engine too. They can customise the ‘How can I help you’ intro to their own language and then link any commands or phrases in their own language to the English command.

It will take some configuration from the user, but after gradually tweaking, there’s no reason why they shouldn’t end up with a flawless recognition experience, in any language. A couple of tweaks a day, in a month it will be a personalised beast!

Language plugins will become available, but English is hardcoded into the algorithms and will take some time and assistance to complete. The users translating the app themselves in the meantime isn’t ideal, but possible – and with a bit of work can be usable in all 42 languages! I think that’s how many are supported?

Moving on….

The newly emerged voice assistants seem to continue to try and emulate Siri – Quite why they do this instead of focussing on what the Android Platform can do, is a little beyond me!

Q) What makes utter! different?

A) The customisation is the biggest difference. You can create custom questions, answers and conversions. The content can contain any sound effect by typing se~burp (or whatever it is called) and utter! will check the external storage for that sound. It will also dynamically populate the current content of Tasker Variables when %VARNAME is used.
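As a rough illustration of the content markers described above, the sketch below pulls `se~name` sound effects out of a response and fills in `%VARNAME` placeholders. Everything here is an assumption made for the example: the `expand` function, the regular expressions, and the choice to leave unknown variables untouched are hypothetical, and utter! itself reads the values from Tasker and the sounds from external storage.

```python
import re

# Assumed token shapes, based only on the "se~burp" and "%VARNAME" examples.
SOUND_TOKEN = re.compile(r"se~(\w+)")
VAR_TOKEN = re.compile(r"%([A-Z][A-Z0-9_]*)")


def expand(content: str, variables: dict) -> tuple:
    """Return (text_to_speak, sound_names) for a custom response.

    `variables` stands in for the current Tasker variable values.
    """
    # Collect the sound effect names, then remove the markers from the text.
    sounds = SOUND_TOKEN.findall(content)
    text = SOUND_TOKEN.sub("", content)
    # Replace each %VARNAME with its value; unknown names are left as written.
    text = VAR_TOKEN.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), text
    )
    # Collapse any double spaces left behind by removed markers.
    return " ".join(text.split()), sounds


spoken, sounds = expand("Battery is at %BATT percent se~burp", {"BATT": 80})
print(spoken, sounds)
```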

You can create commands using just a single word of your choice to launch any device activity or Tasker task for those who want a little more complexity.

There’s a Tasker Plugin (which I’m still working on), which passes any variable data to utter! to either store, notify or announce.

Commands are all editable and can be created in a text file and imported for mass production! Cloud storage too for transferring between devices.

Q) Anything else?!

A) utter! can be remotely controlled by text message from another device – It will speak or perform any actions requested in the message. If it’s lost or stolen it will send back its location by reply or to a separate email if requested. The control is restricted by a user configured password that must appear at the start of all commands.
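The password rule is simple enough to sketch. The example below assumes the password is the first word of the message and is separated from the command by a space; that layout, and the `parse_remote_command` name, are guesses for illustration rather than anything confirmed about utter!.

```python
import hmac


def parse_remote_command(message: str, password: str):
    """Return the command text if the message starts with the password.

    Returns None when the password is missing or wrong, or when nothing
    follows it. The "password, space, command" layout is an assumption.
    """
    prefix, _, command = message.strip().partition(" ")
    # compare_digest avoids leaking how much of the password matched.
    if not password or not hmac.compare_digest(
        prefix.encode(), password.encode()
    ):
        return None
    command = command.strip()
    return command or None


print(parse_remote_command("s3cret where are you", "s3cret"))
print(parse_remote_command("wrong where are you", "s3cret"))
```

Rejecting a message with no command after the password keeps an accidental text containing only the password from triggering anything.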

In addition, if the device is rooted, the user can execute any shell command remotely and receive the output in the response – Full SU shell control is probably what you want if your device has been stolen! I think remote shell control might be a first too? Not seen it anywhere else!?

Full/partial data wipe and complete ‘nuke’ is of course possible with SU permissions.

Voice recognition

Google Voice Search just doesn’t perform well for some users and sends back rubbish. Plugins will be available for iSpeech, Dragon Nuance and AT&T Watson, so the user can switch to these providers if they want, to give them the best chance of being understood. They are available to test in the recognition tab. AT&T performs very well.

Application Integration

There aren’t enough apps out there that share their core features… In the linked applications tab you’ll see the apps that are compatible or will be very shortly. I’m really hoping that the beta will get some attention and app devs will start coming forward and offering integration with their apps. It’s a core ‘philosophy’ of the way utter! will work – providing results in the user’s preferred applications.

I demonstrated AccuWeather and eBay in the video – Neither of these work without a ‘hack’ and the inclusion would be so simple for them. With a few more downloads, maybe they’ll sit up and take notice?

Pricing

Companies like Wolfram Alpha want a $ per install – Nuance, iSpeech and AT&T want per-recognition payment. Bing want a per character translation charge.

Not everyone will want all of the above, some might want none, so I’ll be making premium plugins available for each of them individually. That way, users can spend as much or as little as they like upgrading utter!

Once the application emerges from beta, the base application will be no more than $3 at the very most. I’ll wait until I get a feeling from the feedback before deciding on a price. Anything more than that is just classed as ‘expensive’ in the trend of the Play Store (unfortunately).

A free, almost fully functional version will be available, but with the usual restrictions of a limited number of custom commands and no auto sending of texts, emails and tweets etc.

Plans for the app

In the short term I’m hoping for plenty of bug reports from users that install the beta. I’ve spent such a long time on the framework that I’m able to bolt-on any additional functions without touching the core of the app – So, before I start going crazy bolting stuff on, I’d really love to catch every bug out there in the framework. It’s hard to get users to actually report it, instead of leaving a 1* ‘Doesn’t work for me’ comment… damn them!

By the end of the beta:

More app integration: weather providers, radio streams, file explorers, eBay, Foursquare, Catch Notes, Springpad, 2do, tasks, etc.

Location aware profiles – Do this when I get home/work etc

Localised searches – where’s the nearest – find me a etc etc…

A new app icon! If you’re interested in running a competition of sorts and think it might be good for readers to ‘poll their favourite’, you are more than welcome to run with it! It desperately needs one.

In the long term, I have such a huge list of things to do. Some highly functional, some quirky, some very technical. It would be great if a funder emerged so that I could get some technical assistance to help with the complexities that I’ll struggle to implement if I continue to work alone.

Finally

utter! should work out of the box for everyone who follows the structured commands. It can be tweaked if not and tweaked some more regardless! If users spend some time on it, then I hope they will end up with their perfect ‘voice assistant’, if they can be bothered…

With the bot add on and community phrases, it can be personalised and the content will be fresh and ever increasing. With the Tasker integration and shell access, there’s something in there for the more technical users too.

I really hope with the algorithms I’ve constructed and the framework I’ve designed, users will actually find it quicker to do things by voice and will start reaching for the recognition button instead of navigating between menus and then into apps and settings etc… I know the speed of the app could be improved again with some more technical assistance – I need the Java regex master to step forward and lend a hand!

Looking forward to seeing what the masses make of it! I hope….!

About me

I’ve always been a developer and ‘hacker’ of sorts since I was a kid, but life took me in another direction for my career path. When the first video got plenty of attention, I figured it was time to do what I’ve always wanted to and I’ve spent every spare moment I’ve had on it since. I’ve had utter! in my mind for years, since I hacked apart the voice command app on an old Windows Mobile device and changed some registry settings and the actions it performed.

I really hope I get some commercial funding so I can continue to develop the application. There are some amazing start-up companies out there for voice recognition and artificial intelligence and it would be fantastic to have the resource and the time to approach them. It would also be great to be approached.

If utter! isn’t successful, then I hope I’ll be offered work in a similar field, as it’s what I enjoy and what I want to do.

It’s been great to develop the application entirely on what would be useful for the user, not what would bring in a return for an investor, but now I hope the application is at a stage where it could achieve both.