This is a concept and demonstration of duotone screen reading technology. It uses the SpeechSynthesis API to generate human speech. The project is at a research stage; the goal is to finalize the solution and refine the API design. Hopefully, publishing at this early stage will help improve the idea enough for it to become part of the Web Standards.

If you are not familiar with screen readers: they read web documents element by element, announcing meaningful types and attributes before the element's content. For example, the HTML `<h1>Hello</h1>` will be read as "Header. Hello", and the checked checkbox `<input type="checkbox" checked /> Agree` as "Checkbox, checked. Agree". Screen readers also sometimes wrap content with constructions like "element-name … end element-name". For example, a link with the HTML `<a href="http://github.com">This is a link</a>` will be read as "Link. This is a link. End link".
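The announcement patterns above can be sketched as a small mapping from element type to spoken text. This is only an illustration of the idea, not real screen-reader code; the wording of the announcements follows the examples in this document.

```javascript
// Hypothetical mapping from element type to a screen-reader-style
// announcement. The rules mirror the examples above.
const ANNOUNCE = {
  h1: (text) => `Header. ${text}`,
  a: (text) => `Link. ${text}. End link`,
  checkbox: (text, checked) =>
    `Checkbox, ${checked ? "checked" : "not checked"}. ${text}`,
};

function announce(type, text, checked) {
  return ANNOUNCE[type](text, checked);
}

console.log(announce("a", "This is a link"));
```

Note that in such a flat string there is nothing left to distinguish the metadata ("Link.", "End link") from the content.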

And screen readers use the same voice for everything. In the examples above that is not a problem, but imagine a big modern web page with dozens of elements and text blocks: there is a lot of meta information that has to be pronounced. Here is a more complete block of HTML:

```html
<h1>Hello</h1>
<p>This is a textblock. <a href="http://github.com">This is a link</a></p>
<label>
  <input type="checkbox" checked />
  Ok
</label>
```

A screen reader would read it as "Header. Hello [pause] This is a textblock [pause] Link. This is a link. End link [pause] Checkbox, checked. Ok". The only separation between content and metadata is a pause. This could be improved if screen readers separated metadata from content and read them with different voices.
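The duotone idea can be sketched in two steps: split each announcement into (kind, text) segments, then queue one SpeechSynthesisUtterance per segment with a voice chosen by kind. This is a minimal sketch, assuming a simple segment format; the function names and structure are illustrative, not the project's actual API.

```javascript
// Split an element's announcement into segments; each segment is
// [kind, text], where kind ("meta" or "content") selects the voice.
function duotoneSegments(type, text, checked) {
  switch (type) {
    case "h1":
      return [["meta", "Header."], ["content", text]];
    case "a":
      return [["meta", "Link."], ["content", text], ["meta", "End link."]];
    case "checkbox":
      return [
        ["meta", `Checkbox, ${checked ? "checked" : "not checked"}.`],
        ["content", text],
      ];
    default:
      return [["content", text]];
  }
}

// Speak the segments with two different voices. Guarded because the
// SpeechSynthesis API exists only in browsers.
function speakDuotone(segments, voices /* e.g. { meta, content } */) {
  if (typeof speechSynthesis === "undefined") return;
  for (const [kind, text] of segments) {
    const utterance = new SpeechSynthesisUtterance(text);
    utterance.voice = voices[kind];
    speechSynthesis.speak(utterance); // utterances queue up in order
  }
}

console.log(duotoneSegments("a", "This is a link"));
```

With this separation the listener hears "Link." and "End link." in one voice and the link text itself in another, so no explicit pause is needed to tell them apart.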

Example

⚠️ Note: this code was tested and works in Chrome and Firefox on macOS. It doesn't work in Safari yet. If you run into issues on other platforms, please file an issue in the repository.

Settings

- Content voice:
- Metadata voice:
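The two settings above could be resolved to actual voices via `speechSynthesis.getVoices()`. The helper below is a hypothetical sketch of that lookup, not the demo's real code; note that in some browsers `getVoices()` returns an empty list until the `voiceschanged` event has fired.

```javascript
// Hypothetical helper: resolve a voice name from the settings UI to a
// SpeechSynthesisVoice object, falling back to the first available voice.
function pickVoice(voices, name) {
  return voices.find((v) => v.name === name) || voices[0] || null;
}

// Guarded: speechSynthesis exists only in browsers, and getVoices() may
// be empty until the "voiceschanged" event fires.
const available =
  typeof speechSynthesis !== "undefined" ? speechSynthesis.getVoices() : [];
console.log(available.map((v) => v.name));
```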