Transport for London will roll out default wi-fi device tracking on the London Underground this summer, following a trial back in 2016.

In a press release announcing the move, TfL writes that “secure, privacy-protected data collection will begin on July 8” — while touting additional services, such as improved alerts about delays and congestion, which it frames as “customer benefits”, as expected to launch “later in the year”.

As well as offering additional alerts-based services to passengers via its own website/apps, TfL says it could incorporate crowding data into its free open-data API — to allow app developers, academics and businesses to expand the utility of the data by baking it into their own products and services.

It’s not all just added utility though; TfL says it will also use the information to enhance its in-station marketing analytics — and, it hopes, top up its revenues — by tracking footfall around ad units and billboards.

Commuters using the UK capital’s publicly funded transport network who do not want their movements being tracked will have to switch off their wi-fi, or else put their phone in airplane mode when using the network.

To deliver data of the required detail, TfL says detailed digital mapping of all London Underground stations was undertaken to identify where wi-fi routers are located so it can understand how commuters move across the network and through stations.

It says it will erect signs at stations informing passengers that using the wi-fi will result in connection data being collected “to better understand journey patterns and improve our services” — and explaining that to opt out they have to switch off their device’s wi-fi.

Attempts in recent years by smartphone OSes to use MAC address randomization to try to defeat persistent device tracking have been shown to be vulnerable to reverse engineering via flaws in wi-fi set-up protocols. So, er, switch off to be sure.

We covered TfL’s wi-fi tracking beta back in 2017, when we reported that despite claiming the harvested wi-fi data was “de-personalised”, and claiming individuals using the Tube network could not be identified, TfL nonetheless declined to release the “anonymized” data-set after a Freedom of Information request — saying there remains a risk of individuals being re-identified.

As has been shown many times before, reversing ‘anonymization’ of personal data can be frighteningly easy.

It’s not immediately clear from the press release or TfL’s website exactly how it will be encrypting the location data gathered from devices that authenticate to use the free wi-fi at the circa 260 wi-fi enabled London Underground stations.

Its explainer about the data collection does not go into any real detail about the encryption and security being used. (We’ve asked for more technical details.)

“If the device has been signed up for free Wi-Fi on the London Underground network, the device will disclose its genuine MAC address. This is known as an authenticated device,” TfL writes generally of how the tracking will work. (Ergo, this is another instance where ‘free’ wi-fi isn’t actually free — as one security expert we spoke to pointed out.)

“We process authenticated device MAC address connections (along with the date and time the device authenticated with the Wi-Fi network and the location of each router the device connected to). This helps us to better understand how customers move through and between stations — we look at how long it took for a device to travel between stations, the routes the device took and waiting times at busy periods.”

“We do not collect any other data generated by your device. This includes web browsing data and data from website cookies,” TfL adds, saying also that “individual customer data will never be shared and customers will not be personally identified from the data collected by TfL”.

In a section entitled “keeping information secure” it further writes: “Each MAC address is automatically depersonalised (pseudonymised) and encrypted to prevent the identification of the original MAC address and associated device. The data is stored in a restricted area of a secure location and it will not be linked to any other data at a device level. At no time does TfL store a device’s original MAC address.”

Privacy and security concerns were raised about the location tracking around the time of the 2016 trial — such as why TfL had used a monthly salt key to encrypt the data rather than daily salts, which would have decreased the risk of data being re-identifiable should it leak out.

Such concerns persist — and security experts are now calling for full technical details to be released, given TfL is going full steam ahead with a rollout.

London tube is rolling out wifi tracking of commuter smartphones, full scale. The announcement is full in "this is depersonalised data". But we need technical details. https://t.co/GLtgQTPWfR — Lukasz Olejnik (@lukOlejnik) May 22, 2019

A report in Wired suggests TfL has switched from hashing to a system of tokenisation – “fully replacing the MAC address with an identifier that cannot be tied back to any personal information”, which TfL billed as as a “more sophisticated mechanism” than it had used before. We’ll update as and when we get more from TfL.

The pseudonymisation almost at source may work, but the description of it as irreversible doesn't preclude linking attacks. Only "authenticated devices" included: those that use "free" wifi. Not clear from descriptions that wifi registration and authentication doesn't record MAC. — Eerke Boiten (@EerkeBoiten) May 22, 2019

Another question over the deployment at the time of the trial was what legal basis it would use for pervasively collecting people’s location data — since the system requires an active opt-out by commuters a consent-based legal basis would not be appropriate.

In a section on the legal basis for processing the Wi-Fi connection data, TfL writes now that its ‘legal ground’ is two-fold:

Our statutory and public functions

to undertake activities to promote and encourage safe, integrated, efficient and economic transport facilities and services, and to deliver the Mayor’s Transport Strategy

So, presumably, you can file ‘increasing revenue around adverts in stations by being able to track nearby footfall’ under ‘helping to deliver (read: fund) the mayor’s transport strategy’.

(Or as TfL puts it: “[T]he data will also allow TfL to better understand customer flows throughout stations, highlighting the effectiveness and accountability of its advertising estate based on actual customer volumes. Being able to reliably demonstrate this should improve commercial revenue, which can then be reinvested back into the transport network.”)

On data retention it specifies that it will hold “depersonalised Wi-Fi connection data” for two years — after which it will aggregate the data and retain those non-individual insights (presumably indefinitely, or per its standard data retention policies).

“The exact parameters of the aggregation are still to be confirmed, but will result in the individual Wi-Fi connection data being removed. Instead, we will retain counts of activities grouped into specific time periods and locations,” it writes on that.

It further notes that aggregated data “developed by combining depersonalised data from many devices” may also be shared with other TfL departments and external bodies. So that processed data could certainly travel.

Of the “individual depersonalised device Wi-Fi connection data”, TfL claims it is accessible only to “a controlled group of TfL employees” — without specifying how large this group of staff is; and what sort of controls and processes will be in place to prevent the risk of A) data being hacked and/or leaking out or B) data being re-identified by a staff member.

A TfL employee with intimate knowledge of a partner’s daily travel routine might, for example, have access to enough information via the system to be able to reverse the depersonalization.

Without more technical details we just don’t know. Though TfL says it worked with the UK’s data protection watchdog in designing the data collection with privacy front of mind.

“We take the privacy of our customers very seriously. A range of policies, processes and technical measures are in place to control and safeguard access to, and use of, Wi-Fi connection data. Anyone with access to this data must complete TfL’s privacy and data protection training every year,” it also notes elsewhere.

We need more details how tokenization is deployed. What risk scenarios were considered? Why do they call it "depersonalised" if TfL has access to data? The opt-out-from-tracking that Transports for London is suggesting is still the old one: turn off smartphone. #GDPR #ePrivacy — Lukasz Olejnik (@lukOlejnik) May 22, 2019

Despite holding individual level location data for two years, TfL is also claiming that it will not respond to requests from individuals to delete or rectify any personal location data it holds, i.e. if people seek to exercise their information rights under EU law.

“We use a one-way pseudonymisation process to depersonalise the data immediately after it is collected. This means we will not be able to single out a specific person’s device, or identify you and the data generated by your device,” it claims.

“This means that we are unable to respond to any requests to access the Wi-Fi data generated by your device, or for data to be deleted, rectified or restricted from further processing.”

Again, the distinctions it is making there are raising some eyebrows.

Some details in this wifi tube tracking data protection impact assessment. TfL will refuse subject access request saying it's not personal. But they have the mapping, and some devices can be "registered". So why not give information? #GDPR #ePrivacy https://t.co/zJciobgPsU pic.twitter.com/YJbrtxD1Fc — Lukasz Olejnik (@lukOlejnik) May 22, 2019

What’s amply clear is that the volume of data that will be generated as a result of a full rollout of wi-fi tracking across the lion’s share of the London Underground will be staggeringly massive.

More than 509 million “depersonalised” pieces of data, were collected from 5.6 million mobile devices during the four-week 2016 trial alone — comprising some 42 million journeys. And that was a very brief trial which covered a much smaller sub-set of the network.

As big data giants go, TfL is clearly gunning to be right up there.