Last week we brought out an early, experimental version of audioClip for everyone to try. Using this namespace developers can make Sonos speakers play short clips of sound that won’t end whatever was playing on that speaker at the time. We’ve heard a lot of folks asking for this capability over the past few years and we’re glad to finally be bringing it to you. Some small details may change and there are still some things that we need to implement, but this is a great preview of what the namespace is capable of.

Of course, what good is a new API namespace if we don’t build fun things with it? Personally I’ve always wanted the ability to have my Sonos speak to me. You can imagine the uses: tell your kids on the third floor to come down for dinner. Announce the score of the football game. Say who that latest email is from. Really, the sky’s the limit. So let’s use Google Translate’s free Text-to-Speech (TTS) API, along with this new audioClip namespace, and build ourselves a browser-based Sonos TTS experience.

Getting Started

We’ll need a few things to get started:

A set-up and configured Sonos system, obviously. Make sure you can play some content on it.

The username and password associated with that Sonos system.

A Control Integration, complete with a Client Key and Secret, with the redirect URI set to ‘http://localhost:3001/redirect’.

A machine capable of running a node version that supports async/await (part of ES2017). I’m using node version 8.6.0 on my Mac.

Some topical music. Let’s do the obvious thing and put on some Talking Heads.

We’ll also be using a few external npm packages to help us with some app infrastructure. These are things that aren’t important to learn about for this blog post, but that are still required for our final app to run:

google-tts-api: This puts a nice, neat, promise-ready wrapper around the Google Text-to-Speech API.

simple-oauth2: A handy package to simplify the process of getting and refreshing access tokens.

node-persist: Mimics the HTML5 localStorage API used in browsers, so it’s pretty easily understood.

These packages are all installed automatically when we execute our npm install command in the next step.

Preparing the App

In a directory of your choice, clone the GitHub repo and cd into the newly-created directory. Type npm install and wait for it to run through the install process. Next, copy the .env.sample file into a new file called simply .env . Edit this new file and fill in your Sonos client id and client secret, obtained from the developer portal. After this you should have everything set up.
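If you want the app to complain early about an incomplete .env file, a check like the one below is one way to do it. This is only a sketch and is not part of the repo: the helper name missingEnvKeys is made up for illustration, and it assumes the .env values have already been loaded into process.env. The two variable names are the ones the auth code later in this post reads.

```javascript
// The two variables the back end reads when it configures simple-oauth2.
const REQUIRED_KEYS = ['SONOS_CLIENT_ID', 'SONOS_CLIENT_SECRET'];

// Hypothetical helper: return the required keys that are missing or blank
// in the given environment object.
function missingEnvKeys(env) {
  return REQUIRED_KEYS.filter((key) => !env[key] || String(env[key]).trim() === '');
}

// Warn at startup instead of failing midway through the auth flow.
const missing = missingEnvKeys(process.env);
if (missing.length > 0) {
  console.warn(`Missing values in .env: ${missing.join(', ')}`);
}
```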

I built this app using React to drive the front-end. React is pretty new to me and I’ve really enjoyed learning about how to put such an app together. I had coincidentally just read this blog post by Phil Nash over at the Twilio blog. (As an aside, you should put that blog on your RSS reader of choice. Consistently great content.) The app structure Phil lays out here seemed to meet all of the needs I anticipated for my app. I cloned his repo and used that as a base for Sonos TTS.

App Architecture

The app we’re building here today consists of a front-end App, built in React, and a back end server. The React app makes calls to the back-end server to get data needed for the front-end UX. It also sends text to the back-end to speak. The back end interfaces with the Sonos auth and API servers. It keeps track of the access tokens that get generated during auth. The front-end app has no real idea that it’s working with Sonos. All the “smarts” are consolidated in the back end.

We should note that the back-end server we’re building here is completely unsecured. It shouldn’t be run anywhere except on your local machine. It’ll have access to your Sonos household and will store OAuth 2.0 access and refresh tokens.

Running the App

As Phil notes in his blog post you can choose to run this app as separate back end server and front-end processes. This is useful in the case where you do plan to run the server and front-end on different machines or instances. We’re going to run everything locally, so we’ll take advantage of the script Phil made to run both server and front-end simultaneously. Type npm run dev and wait for things to spin up. Your browser should automatically be brought to the foreground and the app will start up.

If this is your first time running the app you’ll immediately be redirected to the Sonos auth servers to log in to your Sonos account. Once you’ve done so you’ll be sent back to the main app screen.

In the screen above you can see we’re presented with a list of speakers in our household and a box in which to type the phrase we want the Sonos to say. Go ahead and pick some speakers, type something (might I suggest “Sonos speakers sound great!”?) and see what happens. Hopefully, your Sonos just talked to you.

There are a few things to note here:

If you’ve got multiple households associated with your Sonos account, you’ll have an extra select list so you can choose which household to target. You can have multiple households if, for example, you’ve got Sonos set up at both your primary residence and vacation home.

Remember above where I said that the audioClip namespace is still experimental? Well, one of the things that isn’t fully baked yet is a capability flag, called AUDIO_CLIP . Using this flag a player indicates its ability to actually play audio clips. Until that flag is available this app will just list all speakers. If the user selects a speaker that can’t play audio clips, the app will return an error. At the time of this writing only the Sonos One and Beam support audio clips.

Now that we’ve built and run the app let’s dig into the details to see how we did it.

Authorizing the App with Sonos

There are a few interesting parts of the code to look at. First let’s examine how we set up simple-oauth2 to work with Sonos. There are two main things we need to configure: the various auth endpoints and API keys and secrets, and the redirect handler. (For a quick refresher on authenticating against Sonos’ OAuth2.0 server, see our docs.)

Luckily for us, simple-oauth2 makes this all, well, simple. They provide a nice set of convenience methods for defining the OAuth2.0 parameters and for providing the authorization URLs.

    const oauth2 = simpleOauthModule.create({
      client: {
        id: process.env.SONOS_CLIENT_ID,
        secret: process.env.SONOS_CLIENT_SECRET,
      },
      auth: {
        tokenHost: 'https://api.sonos.com',
        tokenPath: '/login/v3/oauth/access',
        authorizePath: '/login/v3/oauth',
      },
    });

    // Authorization uri definition
    const authorizationUri = oauth2.authorizationCode.authorizeURL({
      redirect_uri: 'http://localhost:3001/redirect',
      scope: 'playback-control-all',
      state: 'blahblah',
    });

That second constant, the authorizationUri , is built by the authorizeURL method. It’s really handy because it encapsulates everything that’s important in the initial call to the authorization code endpoint. So a simple redirect to authorizationUri is all that’s needed to kick off the auth flow.
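To get a feel for what that redirect target looks like, here’s a rough sketch that assembles a standard OAuth 2.0 authorization-code request by hand, using the host and path from the config above. The function name buildAuthorizeUrl is made up, and I haven’t checked that simple-oauth2 emits exactly these parameters in this order, so treat it as an approximation rather than a copy of the library’s output.

```javascript
// Hypothetical helper: build an authorization-code request URL by hand.
// The parameter names are the standard OAuth 2.0 ones; the exact set and
// order that simple-oauth2 produces is an assumption.
function buildAuthorizeUrl({ clientId, redirectUri, scope, state }) {
  const url = new URL('/login/v3/oauth', 'https://api.sonos.com');
  url.search = new URLSearchParams({
    response_type: 'code',
    client_id: clientId,
    redirect_uri: redirectUri,
    scope,
    state,
  }).toString();
  return url.toString();
}

// Example with placeholder values.
const exampleAuthorizeUrl = buildAuthorizeUrl({
  clientId: 'my-client-id',
  redirectUri: 'http://localhost:3001/redirect',
  scope: 'playback-control-all',
  state: 'some-unguessable-value',
});
console.log(exampleAuthorizeUrl);
```

In anything beyond a local demo you’d want state to be a fresh, unguessable value per login attempt, checked again in the redirect handler.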

At this point the user is sent to the Sonos authorization site. They’ll log in to their account and read about the permissions your app is asking for. After having granted those permissions, they’re sent back to the redirect URI. That URI was specified when the Control Integration was built on the dev portal, and is handled by our app. The handler for that URI takes the authorization code and exchanges it for an access token via the Sonos auth endpoints.

    // redirect service parsing the authorization code and asking for the access token
    app.get('/redirect', async (req, res) => {
      const code = req.query.code;
      const redirect_uri = 'http://localhost:3001/redirect';

      const options = {
        code,redirect_uri,
      };

      try {
        const result = await oauth2.authorizationCode.getToken(options);
        console.log('The resulting token: ', result);
        token = oauth2.accessToken.create(result); // Save the token for use in Sonos API calls
        await storage.setItem('token',token); // And save it to local storage for use the next time we start the app
        authRequired = false; // And we're all good now. Don't need auth any more
        res.redirect('http://localhost:3000'); // Head back to the main app
      } catch(error) {
        console.error('Access Token Error', error.message);
        return res.status(500).json('Authentication failed');
      }
    });

You can see above the simple-oauth2 method getToken which takes care of all the behind-the-scenes stuff for us. Everything is nice and straightforward since we configured simple-oauth2 at the beginning to plug directly in to Sonos’ auth server. We get the token back and save it, using node-persist, to local storage. That way we don’t have to ask the user to log in every time we restart the app. Now obviously local storage is not how you’d want to persist access tokens in a production app, but this simple method works for our purposes.
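Since the saved token gets reloaded on every restart, it’s worth checking whether it’s still good before using it. Here’s a minimal sketch of such a check. Note that the expiresAt field and the needsRefresh name are assumptions for illustration: I’m not claiming this is the shape node-persist hands back or how the repo does it. The actual refresh would then go through simple-oauth2.

```javascript
// Hypothetical helper: decide whether a saved token should be refreshed.
// `expiresAt` is an assumed field (a Date or ISO string recording when the
// access token expires); adapt it to whatever your stored token contains.
function needsRefresh(savedToken, now = new Date(), skewSeconds = 60) {
  if (!savedToken || !savedToken.expiresAt) return true; // nothing usable saved
  const expiresAtMs = new Date(savedToken.expiresAt).getTime();
  if (Number.isNaN(expiresAtMs)) return true; // unparseable timestamp
  // Refresh a little early so a token doesn't expire mid-request.
  return expiresAtMs - now.getTime() <= skewSeconds * 1000;
}
```

The skewSeconds margin is there so that a token that’s about to expire gets treated as already expired.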

Talk To Me

Ok, we’ve got our access token and can now make calls to the Sonos Control API. You’ll note that in the code above, once a token is successfully fetched and saved, we send the user back to localhost:3000 which is the URL for our main app. They’ll see the main app screen, shown above. Behind the scenes the app has called our /households endpoint which gets a list of Sonos households associated with the authorized account:

    try {
      hhResult = await fetch(`https://api.ws.sonos.com/control/api/v1/households`, {
        method: 'GET',
        headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${token.token.access_token}` },
      });
    }
    catch (err) {
      res.send(JSON.stringify({'success':false,error: err.stack}));
      return;
    }

You can see in the code above that we’ve inserted an Authorization header with our recently-fetched access token.
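Every Sonos call in this app carries the same base URL and the same two headers, so one option is to build them in a single place. The sketch below shows one way that could look. The names sonosRequest and SONOS_API_BASE are made up for this example and don’t come from the repo.

```javascript
// Base URL used by the Sonos Control API calls in this post.
const SONOS_API_BASE = 'https://api.ws.sonos.com/control/api/v1';

// Hypothetical helper: build the url and fetch options for a Sonos call.
// With no body it produces a GET; with a body it produces a JSON POST.
function sonosRequest(path, accessToken, body) {
  const options = {
    method: body === undefined ? 'GET' : 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${accessToken}`,
    },
  };
  if (body !== undefined) {
    options.body = JSON.stringify(body);
  }
  return { url: `${SONOS_API_BASE}${path}`, options };
}

// Usage, inside an async handler:
//   const { url, options } = sonosRequest('/households', token.token.access_token);
//   const hhResult = await fetch(url, options);
```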

It’s important to note that the /households endpoint the front-end app calls is not the Sonos Control API command of the same name. It’s an endpoint on the back-end server we’ve built. Remember, the front-end app doesn’t know anything about Sonos. The back end is taking care of all the calls to Sonos as well as handling all authentication.

I’ve built a little bit of UX goodness in to the app. If there’s only one household available for the account, the household select list is not displayed. This is the case for the vast majority of accounts out there. Once the user picks a household (or the single household has been automatically selected), the app calls our /clipCapableSpeakers endpoint. Again, /clipCapableSpeakers is a custom endpoint we’ve built on our back-end server.

    try {
      groupsResult = await fetch(`https://api.ws.sonos.com/control/api/v1/households/${household}/groups`, {
        method: 'GET',
        headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${token.token.access_token}` },
      });
    }
    catch (err) {
      res.send(JSON.stringify({'success':false,error: err.stack}));
      return;
    }

    const groupsResultText = await groupsResult.text();

    let groups;
    try {
      groups = JSON.parse(groupsResultText);
      if (groups.groups === undefined) { // If there isn't a groups object, the fetch didn't work, and we'll let the caller know
        res.send(JSON.stringify({'success': false, 'error':groups.error}));
        return;
      }
    }
    catch (err){
      res.send(JSON.stringify({'success':false, 'error': groupsResultText}));
      return;
    }

    const players = groups.players; // Let's get all the clip capable players
    const clipCapablePlayers = [];
    for (let player of players) {
      if (!player.capabilities.includes('AUDIO_CLIP')) { // Remember when I said above that AUDIO_CLIP capability isn't implemented? So here we'll look for the "!" boolean
        clipCapablePlayers.push({'id':player.id,'name':player.name});
      }
    }
    res.send(JSON.stringify({'success':true, 'players': clipCapablePlayers}));

After making a GET /groups call to the Sonos Control API, our back-end sorts through the resulting list of players provided in the response. Normally we’d only select those players that have the AUDIO_CLIP capability flag. However, at the time of this writing, that flag has not been implemented. We’ll return all the players and let the user decide which will work with audio clips.
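Once the flag does ship, the filter will need to flip. Here’s a small sketch of how the selection could be written so that it’s a one-argument change. The function name selectClipPlayers and the flagAvailable switch are made up for this example. The player fields (id, name, capabilities) are the ones used in the handler above.

```javascript
// Hypothetical helper: pick the players to offer in the UI.
// While the AUDIO_CLIP flag isn't reported (flagAvailable = false), every
// player is returned; once it is, only players advertising it are kept.
function selectClipPlayers(players, flagAvailable) {
  return players
    .filter((player) =>
      !flagAvailable ||
      (Array.isArray(player.capabilities) && player.capabilities.includes('AUDIO_CLIP')))
    .map((player) => ({ id: player.id, name: player.name }));
}

// Example with made-up players.
const samplePlayers = [
  { id: 'P1', name: 'Kitchen', capabilities: ['PLAYBACK', 'AUDIO_CLIP'] },
  { id: 'P2', name: 'Den', capabilities: ['PLAYBACK'] },
];
console.log(selectClipPlayers(samplePlayers, false).length); // 2
console.log(selectClipPlayers(samplePlayers, true).length);  // 1
```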

Now the user can select a speaker and, in the text box below, type something for the speaker to say. After hitting submit, we finally call our custom /speakText endpoint on our back end. The handler for this endpoint receives the text to speak and the selected speaker id. The first thing it does is call the Google TTS service to turn that text into a URL that will play the spoken text:

    try { // Let's make a call to the google tts api and get the url for our TTS file
      speechUrl = await googleTTS(text, 'en-US', 1);
    }
    catch (err) { // Same error handling pattern as the other handlers in this post
      res.send(JSON.stringify({'success':false,error: err.stack}));
      return;
    }
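One thing to watch for: as far as I know the TTS wrapper rejects long input (I believe the limit is around 200 characters, but treat that number as an assumption and check the package’s docs). If you want to speak longer text, one approach is to split it at word boundaries first. Here’s a sketch. The name splitForTts is made up, and each chunk would need its own TTS URL and its own audioClip call.

```javascript
// Hypothetical helper: split text into chunks of at most maxLen characters,
// breaking at whitespace. Words longer than maxLen are hard-split.
function splitForTts(text, maxLen = 200) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const chunks = [];
  let current = '';
  for (const word of words) {
    if (word.length > maxLen) {
      // Flush what we have, then cut the oversized word into pieces.
      if (current) {
        chunks.push(current);
        current = '';
      }
      for (let i = 0; i < word.length; i += maxLen) {
        chunks.push(word.slice(i, i + maxLen));
      }
      continue;
    }
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length <= maxLen) {
      current = candidate;
    } else {
      chunks.push(current);
      current = word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

How several clips sent back to back actually behave on the speaker is something I haven’t covered here, so test that before relying on it.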

We take the returned URL, add it to our request body, and make a POST to /audioClip on the Sonos Control API.

    const body = { streamUrl: speechUrl, name: 'Sonos TTS', appId: 'com.me.sonosspeech' };

    let audioClipRes;

    try { // And call the audioclip API, with the playerId in the url path, and the clip URL in the JSON body
      audioClipRes = await fetch(`https://api.ws.sonos.com/control/api/v1/players/${playerId}/audioClip`, {
        method: 'POST',
        body: JSON.stringify(body),
        headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${token.token.access_token}` },
      });
    }
    catch (err) { // Same error handling pattern as the other handlers in this post
      res.send(JSON.stringify({'success':false,error: err.stack}));
      return;
    }
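The excerpt stops at the fetch, so here’s a sketch of one way the handler could turn the Sonos response into the same success/error shape the other endpoints send back to the front end. The function name toClientResult is made up, and I’m deliberately not assuming anything about the format of the error body Sonos returns. It’s simply passed through as text.

```javascript
// Hypothetical helper: map an HTTP status and raw body text to the
// { success, error } shape used by the other back-end endpoints.
function toClientResult(status, bodyText) {
  if (status >= 200 && status < 300) {
    return { success: true };
  }
  // Pass the raw body through; fall back to the status code if it's empty.
  return { success: false, error: bodyText || `HTTP ${status}` };
}

// Usage, inside the async handler after the fetch above:
//   const text = await audioClipRes.text();
//   res.send(JSON.stringify(toClientResult(audioClipRes.status, text)));
```

This is also where a speaker that can’t play audio clips would surface as an error to the user.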

If everything went well here, the user’s speaker just spoke their typed text. Imagine the possibilities!

Wrap Up

We did a few cool things here: we built a simple React app, we implemented authorization against the Sonos servers, and we made a few calls to the Sonos Control API. The end result of all this work is that now our speakers can talk to us.

A really neat next step here would be to secure the back-end server side of this code, put it in the cloud, and have your own private Sonos TTS service. You could hook up any kind of front end you want to it. Maybe some IFTTT Webhooks? That’d go a long way towards implementing the “announce the football score” scenario I noted at the beginning of this post.

Again, you’ll find everything you need at the GitHub repo. Head over and check it out.

Thanks for reading this post, and building a basic Sonos TTS app with us. We’re really excited about all the things developers and partners will do with audioClips.

– Matt Welch – Principal Developer Advocate

Currently listening to C’est La Vie No.2 by Phosphorescent