Limitations

Amazon Polly has one major limitation: the maximum number of input characters. Input is limited to 1500 characters per audio stream, so rendering a complete blog post with all its paragraphs as audio isn't possible in a single API call at the moment. I hope this is something AWS works on in the near future. The maximum audio length is also currently set to 5 minutes; everything after that is cut off.

However, it is possible to make multiple API calls and concatenate the resulting audio streams into a single file, e.g. an mp3. We just haven't had the time to get this up and running in our module yet (we will pick this up in the future).
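To give an idea of what such a workaround could look like, here is a minimal sketch in Python. It is not what our module does. The helper names split_text and synthesize_long_text are made up for illustration, the voice "Joanna" is just an example, and the client is assumed to be a boto3 Polly client. It also assumes that simply appending the mp3 byte streams gives a playable file, which usually works for mp3 but is worth verifying with your own player.

```python
MAX_CHARS = 1500  # per-request limit quoted in this post; check the current AWS docs


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, breaking on whitespace."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit has to be cut in the middle.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(client, text, voice_id="Joanna", limit=MAX_CHARS):
    """Call Polly once per chunk and append the mp3 bytes (hypothetical helper).

    `client` is assumed to be a boto3 Polly client, e.g. boto3.client("polly").
    """
    audio = b""
    for chunk in split_text(text, limit):
        response = client.synthesize_speech(
            Text=chunk, OutputFormat="mp3", VoiceId=voice_id
        )
        audio += response["AudioStream"].read()
    return audio
```

With boto3 installed and AWS credentials configured, you would pass boto3.client("polly") as the client and write the returned bytes to a file opened in "wb" mode. Splitting on whitespace keeps words intact; splitting on sentence boundaries would probably sound more natural at the joins.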

Speech Synthesis Markup Language



Polly gladly also supports SSML, which makes it possible to fine-tune your audio stream. You have probably come across text-to-speech engines where you fill in some text and the voice output sounds like someone spitting a rap: if you listen to the pronunciation and the pauses, it just doesn't make sense.

With SSML you can, for example:

Add breaks (a fixed time such as 1s, or a strength such as medium or x-strong)

Use substitutes (e.g. have an abbreviation such as "WWW" spoken as "World Wide Web")

Add audio effects like whispering (imagine Polly speaking to children before bedtime)
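The three options above can be sketched in Python as follows. The helper functions (pause, substitute, whisper, speak) are hypothetical names made up for this example; only the SSML tags themselves (break, sub and Polly's amazon:effect with name="whispered") come from the SSML that Polly accepts. Whether the whisper effect is available depends on the voice you pick, so treat this as a starting point.

```python
from xml.sax.saxutils import escape, quoteattr


def pause(time="1s"):
    """A pause of fixed length, e.g. "1s" or "500ms"."""
    return f"<break time={quoteattr(time)}/>"


def substitute(text, alias):
    """Keep `text` in the markup, but have `alias` spoken instead."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


def whisper(text):
    """Wrap text in Polly's whispered audio effect."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def speak(*parts):
    """Wrap already-built SSML fragments in the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("Welcome to the "),
    substitute("WWW", "World Wide Web"),
    ".",
    pause("1s"),
    whisper("Good night."),
)
print(ssml)
```

To have Polly read this, pass the string as Text together with TextType="ssml" in the synthesize_speech call. Escaping the plain text first matters, because characters like & or < in a blog post would otherwise produce invalid SSML.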

Conclusion

The service really fills the gap between easy API integration and decent audio quality, especially when it comes to pronunciation. Picking the right voice is important, as we noticed quite some difference in quality between voices. Polly makes you think about improving your accessibility, especially for long content like blog posts. If a website offers me the option to listen to a blog post instead of scrolling down and reading from my screen, I'll definitely use it.

I actually don't get why large publishing platforms like newspapers don't already use this kind of service on a large scale. It would definitely improve accessibility and give a better user experience.

There is one thing you should take care of: making your data available in a structured way. So try to get away from storing rendered HTML content (the output of most WYSIWYG editors) in your database.

Wagtail is a platform that keeps these things in mind (it is moving towards a headless CMS), which is why the wagtail-speech integration could be built in a very short period of time: the structured data was already present. This is a really big plus compared with other, more traditional CMS platforms.