How I failed to replicate an $86 million project in 1 line of code

When an experiment with existing open source technology just cherry-picks results to make it look good

The Medium article “How I replicated an $86 million project in 57 lines of code” has been doing the rounds over the last few days. It argues that an automated license plate recognition (ALPR) system being developed for Victoria Police in Australia could just use the open-source ALPR system OpenALPR instead. This is basically the article-length version of the ubiquitous outraged “Why does this need $X? I could code that up in a weekend!” comments made on any sufficiently mundane (or complicated!) tech rollout.

However, since OpenALPR is free* and open-source, we can test just how plausible this claim is.

Ignoring the boring stuff like getting OpenALPR working on your local computer, let’s jump straight to trying to automatically pull license plates out of a dashcam video. For my test video I picked “Drive around Bendigo”, a thrilling 27-minute YouTube video of someone “Driving around Bendigo, Victoria, Australia,” which I felt would be a somewhat representative test: it’s quite clear 1080p car footage from around the area where the system will be deployed. After downloading it with youtube-dl, I fed it to OpenALPR with time alpr --clock -n 1 'Drive around Bendigo-hrD75ebjCms.mp4' > bendigo.txt and let it churn for…

Hm. Well, that’s a problem. Processing my 27-minute video took just over 3.5 hours on my 3.5GHz Core i7. Not exactly real-time. Pencil in a few quid for “optimization” and a few more for “ultra-beefy computer hardware in every patrol vehicle”, I guess.

OpenALPR processing time. Yikes.
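To put a number on “not exactly real-time”, here is a quick back-of-the-envelope sketch. It rounds “just over 3.5 hours” down to 210 minutes, so the real figure is a little worse; the slowdown helper is just a name made up for this illustration.

```shell
# slowdown PROCESSING_MINUTES VIDEO_MINUTES
# Prints how many minutes of processing each minute of video needed.
# (Hypothetical helper for illustration, not part of OpenALPR.)
slowdown() {
  awk -v processing="$1" -v video="$2" \
    'BEGIN { printf "%.1f\n", processing / video }'
}

# 3.5 hours = 210 minutes of processing for a 27-minute video.
slowdown 210 27   # prints 7.8
```

In other words, this setup would have to get roughly eight times faster just to keep up with the footage as it comes in.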

Anyway, on to the results! Gotta spend CPU cycles to make…catching thieves easier, as the saying goes. Let’s filter the results down to just the potential license plates with fgrep confidence bendigo.txt. Pipe that into wc -l and it looks like we’ve got 6,137 potential plates (or 1,653 if we filter those down to unique plate numbers). Not bad! Wait, that seems like a lot. Let’s take a closer look:

fgrep confidence bendigo.txt | cut -d' ' -f 6 | sort -u | shuf | head

SURANT

1IR9DT

111DIDI

ARR0W1

I311

SGRD

D1R91T

0DI10D

II000

1ID1

Some of these seem…bad. OK, no big deal, let’s deploy some of the “very straight forward code-first fixes” proposed in the article like adopting “a threshold […] that only accepts a confidence of greater than 90% before going on to validate the registration number.”

Running fgrep 'confidence: 9' bendigo.txt | cut -d' ' -f 6 | sort -u to cut it down to just the 90%+ confidence plate numbers and filter them to only the unique ones, what do we get?

0G700 HERE M5ER TUG700 WKX2D2

0NRED HM5ER MP356 TUG70Q X036

1HM5ER IUG700 R1GHT TUG70U XP036

1IR9IT JG700 R1LV TUG7Q0 XS036

1ZZ735 KEEP SLV522 TZ2735 XSP036

DH0SAHUT KX212 SLV52Z TZZ735 XSP036E

ERGE KX2D2 SLV5Z2 UG700 XSPQ36

ERQGHT KXZ12 T0G700 UG70U YLJ641

G700 LANE T0G70U VKX212 YLJ64D

GR1L LJ641 T2Z735 WKX212 YLJ64I

GRILL LV522 TDG700
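One caveat about the filter that produced this list: fgrep 'confidence: 9' is a string match, so it would also let through a single-digit score such as 9.5 and would skip an exact 100, if either turns up. A more careful version compares the number itself. The sketch below assumes each candidate line in bendigo.txt looks like “    - PLATE<TAB> confidence: 91.2345” (which is consistent with the cut -d' ' -f 6 trick used above); the high_confidence_plates name and its arguments are made up for this example.

```shell
# high_confidence_plates FILE [MIN_CONFIDENCE]
# Prints the unique plate strings whose numeric confidence is >= MIN_CONFIDENCE
# (default 90). Hypothetical helper; the line format is an assumption:
#     - ABC123<TAB> confidence: 91.2345
high_confidence_plates() {
  awk -v min="${2:-90}" '
    /confidence:/ {
      # Find the "confidence:" token; the plate is the field before it
      # and the score is the field after it.
      for (i = 2; i < NF; i++) {
        if ($i == "confidence:" && $(i + 1) + 0 >= min) {
          print $(i - 1)
        }
      }
    }
  ' "$1" | sort -u
}

# Example: high_confidence_plates bendigo.txt 90
```

Comparing numerically also makes it easy to try other cut-offs (say 95) without having to invent a new string pattern each time.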

OK, so we still have some apparent duplicates and recognition errors, and presumably the registration validation will sort these out. Checking these with the VicRoads site, we wind up with a grand total of seven automatically-recognized “valid” plates for 27 minutes of video.
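For a sense of scale, here is the same funnel as rough arithmetic. The 1,653 unique strings and the seven valid plates are the figures quoted above; the 53 is simply a count of the strings in the 90%+ list, and pct is a throwaway helper invented for this sketch.

```shell
# pct PART WHOLE: print PART as a percentage of WHOLE, to two decimals.
# (Hypothetical helper for illustration.)
pct() {
  awk -v part="$1" -v whole="$2" \
    'BEGIN { printf "%.2f%%\n", 100 * part / whole }'
}

pct 53 1653   # share of unique strings that pass the 90%+ filter
pct 7 53      # share of 90%+ strings that turn out to be valid plates
pct 7 1653    # share of all unique strings that are valid plates
```

So only about 13% of the “confident” reads survive validation, and well under half a percent of everything OpenALPR thought might be a plate.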