‘Current circumstances’ have led to a lot of previously live events reconfiguring themselves into online versions – comedy nights, pub quizzes, even sick pervert stuff like Dungeons & Dragons and improv.

I help to run Maraoke, a comedy karaoke night where the songs are all rewritten to be about having sex with Pokemon video games. Having had every live gig pulled for the foreseeable future we decided to look at the possibility of an online equivalent.

Online karaoke presents 2 key problems:

The first is about timing. In case for some reason you don’t know what karaoke is: you are provided with the backing track for a song, accompanied by a video indicating what the words are and when to sing them. You sing them?

Introducing the almost inevitable lag of an online connection ruins this – it would have to be vanishingly small (and reliably so) for the singer not to be noticeably behind the music. But even if you could fix this, you’ve still got a logistics problem:

At a live karaoke night, all the participants have to do is give the organisers their name and the song they want to sing. Then sing it.

What’s the online equivalent for this? You could just set up a Zoom call and let people take turns to sing – but how do you get them the backing track and lyrics? If you stream them to the singer, there’s lag. If they’re at the singer’s end, they need to share them back ‘down the line’, which necessitates walking everyone through a fiddly process which will probably vary depending on what version of what software on what operating system on what device they’re using, because computers. (Sorry, you cannot Maraoke because you have installed Skype for Business, please install Skype for Pleasure, etc.)

We experimented with various bits of existing software and there didn’t seem to be a particularly happy solution other than building something ourselves, which was probably not realistic.

Or was it? I did some reading and it turns out the modern web browser has quite a lot of interesting stuff built into it: it is relatively easy to get a browser window on my computer to send video and audio to, and receive them from, a browser window on your computer, without installing anything special at either end.

WebRTC, as it’s called, just requires some stuff on the server to help the browsers at either end do some initial negotiation (usually called ‘signalling’) to set up a more direct connection, and we handily already have a lot of what’s needed built into the browser-based software we use to run live nights. (This post has some information about how we use this tech, Rails’ Action Cable, as a sort of ‘remote control’.)
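To give a sense of how little the server has to do: each browser produces an SDP offer or answer (via `RTCPeerConnection.createOffer()` / `createAnswer()`) plus a trickle of ICE candidates, and the server just passes them to the other side. This is not the Action Cable code the post refers to; it is a minimal in-memory relay in TypeScript, assuming a made-up message shape and peers grouped into named rooms (a room name like ‘snake’ standing in for a keyword link).

```typescript
// Hypothetical signalling messages. The relay never inspects the SDP or
// ICE candidate strings; it only forwards them.
type SignalMessage =
  | { type: "offer"; sdp: string }
  | { type: "answer"; sdp: string }
  | { type: "candidate"; candidate: string };

// Delivers a message to one connected peer. In a real app this would write to
// that peer's WebSocket / Action Cable subscription (an assumption here).
type Deliver = (from: string, msg: SignalMessage) => void;

class SignallingRelay {
  private rooms = new Map<string, Map<string, Deliver>>();

  join(room: string, peerId: string, deliver: Deliver): void {
    let peers = this.rooms.get(room);
    if (!peers) {
      peers = new Map<string, Deliver>();
      this.rooms.set(room, peers);
    }
    peers.set(peerId, deliver);
  }

  leave(room: string, peerId: string): void {
    const peers = this.rooms.get(room);
    if (!peers) return;
    peers.delete(peerId);
    if (peers.size === 0) this.rooms.delete(room);
  }

  // Forward a message to every other peer in the room (never back to the
  // sender); returns how many peers received it.
  relay(room: string, from: string, msg: SignalMessage): number {
    const peers = this.rooms.get(room);
    if (!peers) return 0;
    let delivered = 0;
    peers.forEach((deliver, peerId) => {
      if (peerId !== from) {
        deliver(from, msg);
        delivered++;
      }
    });
    return delivered;
  }
}
```

Once each end has applied what it received (`setRemoteDescription()`, `addIceCandidate()`), the audio and video flow directly between the browsers, or via a TURN server if no direct route can be found, rather than through the relay.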

I hastily threw together a prototype for us to try on what would have been our April live date.

The singer saw something like this:

Ideally the host and singer are two different people.

A green-backgrounded video of one of our songs*, the output of their camera in green and the output of the host’s camera in blue.

The host saw something like this:

TFW you’ve lost all sense of perspective

With the singer’s video and the song video merged into one video at the top, and the output of their own web camera at the bottom. There’s no particular need for the singer to see the host, but I was basing this off an example I’d found that was for two-way communication and wasn’t confident enough to start pulling bits out of it.

The host can then capture the singer video in a piece of kit called OBS, which can put the lyrics on top of the video with some Lord of the Rings-style HOLLYWOOD MAGIC and push the result out over our streaming service of choice, Twitch, where it looks something like this:

Yeah Yeah

Here’s a rough diagram of what the technology is doing to make this whole process happen:

What actually happens at the host end is a lot more complicated than this and on the night involves a hell of a lot of coordination between Ste (who does the on-screen hosting) and James (who organises the singers while simultaneously switching between different video and audio feeds as needed), while I can largely just drink beer and loftily claim that I’ll “fix it tomorrow”. This would, ironically, be much easier if we could all be in the same place at once rather than miles apart from each other.

The trial run in April was in some ways a success – we managed to get some singers singing songs and stream that LIVE to the internet – but it was very far from perfect.

Problem 1: Sometimes we couldn’t hear people. At all. We were asking the browser for permission to use their microphone, but not asking WHICH microphone – a computer can have several audio input devices and it turned out the browser was not necessarily using the most obvious one. So I added a ‘soundcheck’ stage so people can sort that out at their end and check that the browser can ‘hear’ them before they sign up to do a song.
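Picking the microphone explicitly comes down to two browser calls: `navigator.mediaDevices.enumerateDevices()` to list the inputs, and `getUserMedia()` with a `deviceId` constraint to open the chosen one. The sketch below keeps the browser-independent parts as plain functions so they can be tested anywhere; `DeviceInfo` mirrors the relevant fields of `MediaDeviceInfo`, and the function names and the 0.01 ‘can we hear you’ threshold are illustrative assumptions rather than what the Maraoke soundcheck actually does.

```typescript
// The fields of the browser's MediaDeviceInfo that this sketch needs.
interface DeviceInfo {
  deviceId: string;
  kind: string; // "audioinput", "audiooutput" or "videoinput"
  label: string;
}

// Microphones only. Browsers return empty labels until the user has granted
// permission, so fall back to a numbered placeholder name.
function listMicrophones(devices: DeviceInfo[]): { id: string; name: string }[] {
  return devices
    .filter((d) => d.kind === "audioinput")
    .map((d, i) => ({ id: d.deviceId, name: d.label || `Microphone ${i + 1}` }));
}

// Constraints for navigator.mediaDevices.getUserMedia(). `exact` makes the call
// fail if that device is unavailable, rather than silently using another one.
function mediaConstraints(micId: string, withCamera: boolean) {
  return { audio: { deviceId: { exact: micId } }, video: withCamera };
}

// Root-mean-square level of one buffer of samples in [-1, 1], e.g. as filled
// by AnalyserNode.getFloatTimeDomainData() while the soundcheck is running.
function rmsLevel(samples: ArrayLike<number>): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Hypothetical threshold: anything quieter counts as "we can't hear you".
const CAN_HEAR_THRESHOLD = 0.01;

function canHear(samples: ArrayLike<number>): boolean {
  return rmsLevel(samples) > CAN_HEAR_THRESHOLD;
}
```

In the browser, the stream from `getUserMedia(mediaConstraints(chosenId, true))` would be fed through an `AnalyserNode`, with `canHear()` checked on each buffer to drive a ‘we can hear you’ indicator before the singer is allowed to pick a song.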

Problem 2: Getting the overlaid lyrics from a video over the WebRTC connection doesn’t result in very good or consistent presentation – the video is more likely to ‘stutter’ than the audio and on a bad connection the quality will drop, making them look rubbish.

To solve this I added a copy of the video to the host end – we send timing data from the original video across the connection to sync them up, then I do the Lord of the Rings-style HOLLYWOOD MAGIC to put them on top of the singer in the browser, meaning we can have a bigger video window to capture and send to the stream.
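The post doesn't say exactly how the timing data is sent or applied, so the following is only a sketch of one way to do it: the singer's page periodically reports its song video's `currentTime` and paused state, and the host's copy seeks only when it has drifted past a tolerance, because frequent tiny seeks would make the lyrics visibly jump. The message shape, the 0.1 s latency allowance and the 0.25 s tolerance are all assumptions.

```typescript
// Hypothetical timing message sent from the singer's copy of the song video.
interface TimingMessage {
  currentTime: number; // seconds into the song at the singer's end
  paused: boolean;
}

interface SyncDecision {
  seekTo: number | null; // null means "close enough, leave playback alone"
  shouldPlay: boolean;
}

// Decide how the host's local copy of the song video should react to one
// timing message. `latency` (assumed, in seconds) allows for the singer's
// video having kept playing while the message was in transit; `tolerance`
// (also assumed) avoids constant small seeks.
function syncDecision(
  localTime: number,
  msg: TimingMessage,
  latency = 0.1,
  tolerance = 0.25,
): SyncDecision {
  const target = msg.paused ? msg.currentTime : msg.currentTime + latency;
  const drift = localTime - target;
  return {
    seekTo: Math.abs(drift) > tolerance ? target : null,
    shouldPlay: !msg.paused,
  };
}
```

On the host page the decision would be applied to the local `<video>` element: set `video.currentTime` to `seekTo` when it isn't null, then call `play()` or `pause()` to match `shouldPlay`. The messages could travel over an `RTCDataChannel` on the same peer connection or over the existing server channel; the post doesn't say which.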

I also did a bit of tidying up, removing the video of the host (while allowing an optional audio connection so that Ste can talk to the singer at the start/end of a song if he wants) and adding a bit more feedback about what’s going on and whether things are working.

What the singer sees while waiting to be connected

What the host sees when a singer is connected and a song is playing.

In terms of what’s going on with the tech this makes the process a little more complicated:

But in practical terms once you’ve got the WebRTC stuff working, changing what you’re sending across it is relatively simple.
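Much of that flexibility comes from `RTCRtpSender.replaceTrack()`: once a connection is up, the track a sender transmits (a different camera, say, or the song video instead of the webcam) can be swapped without renegotiating, provided the new track is the same kind. A minimal sketch, with hypothetical `TrackLike`/`SenderLike` interfaces standing in for `MediaStreamTrack` and `RTCRtpSender` so it can run outside a browser; `swapTrack` is an illustrative name, not something from the post.

```typescript
// Only the members of MediaStreamTrack / RTCRtpSender that this sketch uses.
interface TrackLike {
  kind: string; // "audio" or "video"
}

interface SenderLike<T extends TrackLike> {
  track: T | null;
  replaceTrack(track: T | null): Promise<void>;
}

// Hypothetical helper: replace what the first sender of the same kind is
// transmitting with `newTrack`. Returns null if no such sender exists, in
// which case you would need addTrack() and a fresh offer/answer instead.
function swapTrack<T extends TrackLike>(
  senders: SenderLike<T>[],
  newTrack: T,
): Promise<void> | null {
  for (const sender of senders) {
    if (sender.track !== null && sender.track.kind === newTrack.kind) {
      return sender.replaceTrack(newTrack);
    }
  }
  return null;
}
```

In a browser this would be called as `swapTrack(pc.getSenders(), newStream.getVideoTracks()[0])`, where `pc` is the existing `RTCPeerConnection`.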

Problem 3: Wrangling the singers…

For the first attempt our method of getting the right singer connected at the right moment was to have pairs of singer/host links with a matching (randomly generated) keyword parameter, e.g. ‘snake’. This made it a massive faff to keep track of who was ready to sing and then get them the right link. So for ‘version 2’ I simplified things (by making them much more complicated) – as well as the video window, the host now has a second window to control it: they can see who’s online and ready to sing, then connect to them when ready. The singer just has to complete the ‘soundcheck’ and pick a song.
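Behind the host's control window is essentially a small queue. The post doesn't describe how it's implemented, so this is a hypothetical sketch of the bookkeeping involved: singers arrive, pass the soundcheck and pick a song, and the host connects to whoever has been ready longest, one singer at a time. The class, the state names and the first-come-first-served rule are all assumptions.

```typescript
type SingerState = "soundcheck" | "ready" | "connected" | "finished";

interface Singer {
  id: string;
  name: string;
  song: string | null;
  state: SingerState;
  readyAt: number; // timestamp of passing the soundcheck; 0 until then
}

// Hypothetical registry behind the host's control window.
class SingerQueue {
  private singers = new Map<string, Singer>();
  private onStage: string | null = null; // id of the connected singer, if any

  // A singer has opened their link and is doing the soundcheck.
  arrive(id: string, name: string): void {
    this.singers.set(id, { id, name, song: null, state: "soundcheck", readyAt: 0 });
  }

  // Soundcheck passed and a song picked: the singer joins the queue.
  markReady(id: string, song: string, now: number): void {
    const s = this.singers.get(id);
    if (!s || s.state !== "soundcheck") return;
    s.song = song;
    s.state = "ready";
    s.readyAt = now;
  }

  // What the host sees: ready singers, longest-waiting first.
  ready(): Singer[] {
    const list: Singer[] = [];
    this.singers.forEach((s) => {
      if (s.state === "ready") list.push(s);
    });
    return list.sort((a, b) => a.readyAt - b.readyAt);
  }

  // Connect the next singer, unless someone is already on stage.
  connectNext(): Singer | null {
    if (this.onStage !== null) return null;
    const next = this.ready()[0];
    if (!next) return null;
    next.state = "connected";
    this.onStage = next.id;
    return next;
  }

  // Song over (or the call dropped): free the stage for the next singer.
  finish(id: string): void {
    const s = this.singers.get(id);
    if (s && s.state === "connected") s.state = "finished";
    if (this.onStage === id) this.onStage = null;
  }

  // Singer closed their page entirely.
  leave(id: string): void {
    this.finish(id);
    this.singers.delete(id);
  }
}
```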

The first official Maraoke Lockdown took place last Friday (May 1st), and was a massive improvement on our earlier attempt – we lost the recording due to some slightly confusing Twitch settings, but we think something like 27 songs were sung over the course of the evening, not counting some inevitable false starts. Some of them were from other countries! It turns out they have the internet there too!

The main technical issue** was that occasionally the backing track from the singer’s end wasn’t reaching the host end – so a few people were giving unexpectedly a cappella performances until we reconnected and started again. Not entirely sure why yet – either we need to mix the two audio tracks into one at the singer end, or something is causing the stream of the video’s audio not to be picked up at all, but it feels pretty fixable, especially as the actual connections seem to be pretty reliable now.
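The first of those options, mixing at the singer end, can be done with the Web Audio API: route the microphone and the song video's audio into a single `MediaStreamAudioDestinationNode` and send its one audio track. This is a sketch of that idea, not a confirmed fix for the dropouts; the `...Like` interfaces and `mixForSending` are hypothetical structural stand-ins for `AudioContext` and its nodes, so the wiring can be checked without a browser.

```typescript
// Structural stand-ins for the parts of the Web Audio API used here.
interface AudioNodeLike {
  connect(destination: AudioNodeLike): unknown;
}

interface StreamDestinationLike<S> extends AudioNodeLike {
  stream: S;
}

interface AudioContextLike<S, E> {
  destination: AudioNodeLike; // the local speakers / headphones
  createMediaStreamSource(stream: S): AudioNodeLike;
  createMediaElementSource(element: E): AudioNodeLike;
  createMediaStreamDestination(): StreamDestinationLike<S>;
}

// Hypothetical helper: mix the singer's microphone and the backing track into
// one outgoing stream. Call once per <video> element, since an element can
// only be wrapped by one MediaElementAudioSourceNode.
function mixForSending<S, E>(ctx: AudioContextLike<S, E>, mic: S, songVideo: E): S {
  const out = ctx.createMediaStreamDestination();
  const micNode = ctx.createMediaStreamSource(mic);
  const songNode = ctx.createMediaElementSource(songVideo);

  micNode.connect(out);
  songNode.connect(out);

  // Wrapping the element diverts its audio into the graph, so route the
  // backing track back to the local output or the singer won't hear it.
  // The mic is deliberately not sent to the local output (it would echo).
  songNode.connect(ctx.destination);

  return out.stream;
}
```

In a browser this would be called with a real `AudioContext`, the microphone stream from `getUserMedia()` and the song's `<video>` element, adding the resulting stream's single audio track to the `RTCPeerConnection` in place of two separate audio tracks. Browsers may create an `AudioContext` in a suspended state until the user interacts with the page, so it may need a `resume()` call after a click.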

The other issues are largely around making it easier for the host to manage the queue of singers, and improving communication with people signing up to sing – the whole process needs to be a bit clearer in terms of what to expect, what they have to do (e.g. WEAR HEADPHONES, TURN YOUR RADIO OFF, ETC) and when, but at least we’re getting clearer about what we need to communicate even if we haven’t managed to do it entirely successfully yet.

And given that 3 weeks ago we had absolutely nothing (well, 500 comedy karaoke songs), I reckon that’s pretty good. We may not have developed the cure for coronavirus, but last night we livestreamed a cat singing a version of 99 Luftballons rewritten, in German, to be about an aeroplane shooting game (I think? I don’t speak German). And isn’t that the real victory?

Turns out if you let people pick which video input they want to send down the stream, this happens.

Is any of the stuff I’ve figured out doing this useful for doing something that isn’t slightly weird video game-based comedy karaoke? Who knows, but happy to talk to anyone who thinks it might (e dot jefferson at gmail dot com/@edjeff). Oh, and you can follow us on Twitch now!

—

* The visual elements of the songs are normally animated on the fly in the browser but there’s no simple way of pushing that animation into a WebRTC connection. There IS a slightly awkward way of capturing tabs as video built into modern browsers and you can save the result off to a file rather than piping it down the internet, so I was able to knock up an automated ‘video recorder’ to save us manually screen recording 500 songs.

** Well, also none of this works in anything other than Chrome yet, but Firefox should support all the tech with a few adjustments and apparently Microsoft Edge is basically Chrome now???? Apologies to IE6 users at this difficult time.