Troy Hunt is the founder of haveibeenpwned.com. This post originally appeared on his website and is reprinted with his permission.

I suspect we’re all getting a little bit too conditioned to data breaches lately. They’re in the mainstream news on what seems like a daily basis to the point where this is the new normal. Certainly the Ashley Madison debacle took that to a whole new level, but when it comes to our identities being leaked all over the place, it’s just another day on the Web.

When it’s hundreds of thousands of children including their names, genders and birthdates, that’s off the charts. When it includes their parents as well—along with their home addresses—and you can link the two and emphatically say “Here is 9-year-old Mary, I know where she lives and I have other personally identifiable information about her parents (including their password and security question),” I start to run out of superlatives to even describe how bad that is.

This is the background on how this little device and other online assets created by VTech requested deeply personal info from parents about their families, which VTech then lost in a massive data breach.

Breach source, verification, and (attempted) disclosure

Let me set some context first because this is clearly a very serious incident, and it all began when I was contacted by Lorenzo Bicchierai earlier this week. Lorenzo writes for Motherboard and has often approached me for comments on security incidents in the past. This time, he wanted some help verifying a data breach that had allegedly come from VTech and contained millions of customer records. Someone had gotten in touch with him (I assume as they thought it might make a good story) and he was doing his journo due diligence thing.

Lorenzo passed on the data and I checked it out. I found 4.8 million unique customer e-mail addresses in one of the files and it “smelled” good; that is, it didn’t have the typical hallmarks that often accompany a fabricated breach. However, it wasn’t quite clear where the data had come from. It’s not like you can just go to vtech.com and find a login box that tells you whether or not the account exists (incidentally, I did later discover an API that confirms the presence of an e-mail address at login time). I needed further verification, so I invoked the help of some Have I been pwned? (HIBP) subscribers.

In order to verify the data via HIBP, I had to call on some supporters. One of the features I added to HIBP very early on was the ability to subscribe to notifications:

This is a free service that sends you an e-mail if your account pops up in a data breach. To date, 290,000 people have signed up and verified their e-mail addresses (they need to receive an e-mail at that address and click a unique link). Now I’d always intended for this to be a feature that simply notifies people of breaches as appropriate, but I’ve realized lately that it also means I have an excellent source of individuals supporting the project who can help me verify future data breaches as well.

I took the e-mail addresses from the alleged VTech breach and found 18 recent HIBP subscribers who had a comprehensive set of data in the dump. I e-mailed them asking for support, essentially saying that I’d been passed a data breach that included their details and if they were willing to assist, I’d send them some non-sensitive data attributes to verify. This was usually their month of birth, the city they live in and the name of their ISP based on their IP address. All of these attributes were in the data breach and if the HIBP subscriber could confirm them and acknowledge they had a VTech account, I’d be confident it was legitimate.

I received six responses within 24 hours, every one of them confirming their data:

Yes. That's accurate. I did register at vtech so I could download addons for a toy laptop.

Yes. That's my data. no doubt about that, I registered a vtech account within the last few months.

Yes, That looks like legitimate information. The service would be VTech’s Learning Lodge.

Yes, that looks like me. I lived near [redacted city] at the time and my daughter had one of their pads. I believe we logged in so that we could download apps from their app store and possibly for firmware updates etc.

Yes that is correct. It's an old address, I was with [redacted ISP] at the time so can verify this info ! I would have used the VTECH website for my daughter around that time too !

Yes I did access the VTech learning lodge in 2014 after purchasing a "Cora Cub" for my child. In order to personalize it's voice activated feature, you had to join the learning lodge. I was with the broadband provider [redacted ISP] at that time. I have since changed services, unfortunately to TALKTALK!

Can’t help but feel sorry for the last person!

This was more than enough to now have complete confidence in the legitimacy of the data. But before loading it into HIBP, it was essential that VTech be aware of the incident, too, so I pushed Lorenzo on what steps he’d taken. He’s detailed his attempts to get in touch with VTech in the article he published Friday titled "One of the Largest Hacks Yet Exposes Data on Hundreds of Thousands of Kids."

For many days, he simply couldn’t get anyone to talk with him despite the fact they did actually respond (and redirect him) multiple times. As we discussed this incident in the days following his initial contact, at multiple points we talked about means of getting in touch with them and he reached out via various channels time and time again.


It was reminiscent of my trials with 000webhost last month and frankly, I’m both staggered and appalled by the negligence these organizations are showing. Data breaches like this can be enormously damaging for both the customers and the online business alike, but while I’m enormously sympathetic to the former, when the latter actively ignores multiple attempts at private disclosure even when they know it relates to a serious security incident, it’s hard to feel too sorry for them.

But to their credit, VTech did eventually respond to Lorenzo and acknowledged that prior to his contact they were not aware of a data breach but have since identified an incident on November 14. This roughly corresponds with the dates in the files, although as I’ll show shortly, there are records allegedly created many days after this. In their response, VTech explains the following:

Upon discovering the breach, we immediately made modifications to the security settings on the site to defend against any further attacks.

Unfortunately, this is insufficient and I’ll explain why shortly. They go on to reassure Lorenzo that financial data is just fine:

It is important to note that no payment card or banking information was obtained. Our database does not contain any credit card information and VTech does not process or store any customer credit card data on the Learning Lodge website. To complete the payment or check-out process of any downloads made on the Learning Lodge website, our customers are directed to a secure, third party payment gateway.

Frankly, I couldn’t care less about credit cards, and as I’ve explained before, these statements are designed to appease the likes of PCI and are of little consequence to consumers when genuinely sensitive things—irreplaceable things—are lost by a company that suffers a data breach. Let’s take a look at just what they lost.

Understanding the data breach

Here’s what was originally provided to Lorenzo:

The file that immediately jumps out is the big guy at the top—parent.csv. This file has 4,862,625 rows and column headings as follows:

id

email

encrypted_password

first_name

last_name

password_hint

secret_question

secret_answer

email_promotion

active

first_login

last_login

login_count

free_order_count

pay_order_count

client_ip

client_location

registration_url

country

address

city

state

zip

updated_datetime

One of the first things I look at in a breach like this is how many unique e-mail addresses there are, as it helps establish whether there are duplicate records. In this case there were 4,833,678 occurrences matching the pattern I use to extract e-mail addresses, slightly fewer than the total number of rows, which is normal due to duplicates, missing addresses, or strings that don’t conform to what an e-mail address looks like (the regular expression I use is pretty liberal).
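As a rough illustration of that kind of check, here is a minimal Python sketch. The regular expression, the summarize_emails helper and the sample rows are all assumptions made up for the example, not the actual pattern or tooling used on the VTech data. It counts total rows, rows containing something that looks like an e-mail address, and distinct addresses after lowercasing.

```python
import csv
import io
import re

# A deliberately liberal pattern (an assumption for this sketch, not the
# author's actual regex): something@something.something, with no
# whitespace or obvious delimiter characters inside.
EMAIL_RE = re.compile(r"[^\s@,;<>]+@[^\s@,;<>]+\.[^\s@,;<>]+")


def summarize_emails(csv_text, column="email"):
    """Return (total rows, rows with an address-like value, unique addresses)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = 0
    matches = 0
    unique = set()
    for row in reader:
        rows += 1
        found = EMAIL_RE.search(row.get(column) or "")
        if found:
            matches += 1
            # Lowercase so that case-only variants count as one address.
            unique.add(found.group(0).lower())
    return rows, matches, len(unique)


# Hypothetical sample data: one duplicate, one blank, one malformed value.
sample = """id,email,first_name
1,mary@example.com,Mary
2,MARY@example.com,Mary
3,,Bob
4,not-an-address,Eve
5,sam@example.org,Sam
"""

rows, matches, unique = summarize_emails(sample)
print(rows, matches, unique)  # prints: 5 3 2
```

The gap between the first and last numbers is the interesting part: a small gap is what you would expect from a real customer table, while a large one suggests heavy duplication or padding.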

The next thing I checked was the passwords, and, while the column heading implies they’re encrypted, they’re not. The easiest way to check what’s going on with password storage is just to Google a few of the values stored in the database. For example, let’s take the very first one in the dump: 835af17f41292ba8ea3270f6859757ab

And here it is:

Their password is “welcome81.” It’s that simple. It’s just a straight MD5 hash, not even an attempt at salting or using a decent hashing algorithm. The vast majority of these passwords would be cracked in next to no time; it’s about the next worst thing you can do next to no cryptographic protection at all. Speaking of which…

All secret questions and answers are in plain text. The questions are typical (albeit poor) examples such as your favorite color, where you were born, and your first school. In fact, you can see them in context in this screen from a video I’ll show a bit later:

This aligns with the columns from the parent.csv file I referenced earlier and gives me a high degree of confidence it’s at least one of the locations where parents would have entered data.
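To make the point about the unsalted MD5 passwords concrete, here is a minimal Python sketch of why that storage scheme falls over so quickly. The word list and helper names are hypothetical and invented for the example. Because the same password always produces the same MD5 value, an attacker can hash a list of common passwords once and then look up every leaked hash in it, which is effectively what a search engine does when you paste a hash into it. The last function shows one commonly recommended alternative (a per-user random salt plus a deliberately slow key derivation function, here PBKDF2 from the standard library) purely for contrast; it is an assumption about what "decent" might look like, not something VTech used.

```python
import hashlib
import os


def md5_hex(password):
    """Unsalted MD5, the storage scheme described above."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def build_lookup(wordlist):
    """Hash every candidate password once: hash -> plaintext."""
    return {md5_hex(word): word for word in wordlist}


def crack(leaked_hashes, lookup):
    """Recover every leaked hash that appears in the precomputed table."""
    return {h: lookup[h] for h in leaked_hashes if h in lookup}


def slow_salted_hash(password, salt=None, iterations=200_000):
    """Contrast case: random per-user salt plus a slow KDF (PBKDF2-SHA256)."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


# Hypothetical attacker word list and "leaked" hashes for the demo.
wordlist = ["password", "123456", "welcome81", "letmein"]
lookup = build_lookup(wordlist)
leaked = [md5_hex("welcome81"), md5_hex("correct horse battery staple")]

recovered = crack(leaked, lookup)
print(list(recovered.values()))  # prints: ['welcome81']

# With a random salt, two users with the same password get different digests,
# so a single precomputed table no longer works.
salt_a, digest_a = slow_salted_hash("welcome81")
salt_b, digest_b = slow_salted_hash("welcome81")
print(digest_a != digest_b)
```

The lookup costs one dictionary access per leaked hash, which is why a weak password stored this way offers essentially no resistance once the database is out.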

Normally this would be the end of the story when it comes to processing a data breach. I’d make the data searchable on HIBP, notify impacted subscribers, and that would be it. But it’s a different story this time and it’s because of those member CSV files. Let’s take a closer look.