People in the Portland metro area with smartphones may not realize it, but they have digital clones.

After months of preparation, the initial phase of Portland’s project employing city mobility software from Sidewalk Labs, the controversial startup owned by Google parent Alphabet, is underway.

If all goes as planned, Portland will launch a year-long pilot of the Replica software, costing nearly $500,000 in total. In exchange, Portland gets access to a massive dataset that mirrors how people actually move throughout the city and its surroundings.

The goal is to query the data regularly for timely insights into, for example, what worker commutes entail, how Uber and Lyft affect traffic congestion, and how many cyclists use protected bike lanes in high-traffic areas such as Governor Tom McCall Waterfront Park.

“We’ll be looking to Replica to explore a number of questions about major issues in our region like equity, safety, and congestion,” said Eliot Rose, technology strategist at Portland’s Metro, one of three agencies paying for the $457,000 project along with TriMet and the Portland Bureau of Transportation.

One might expect such an effort to involve surveillance cameras, sensors, or other physical tracking devices, but that’s not what this is about. Instead, the Replica software uses deidentified mobile location data showing how people actually move throughout the city to generate a modeled, synthetic population engaged in simulated travel activities.

The Replica system associates nuanced trip patterns with each persona engaged in home, work, shopping, eating or recreational activities. Sidewalk Labs said its data comes from mobile app publishers, mobile location data aggregators and telcos. Both Sidewalk Labs and Portland Metro confirmed that no sensors, beacons, cameras, IoT devices, WiFi hotspots or WiFi kiosks will be used or installed as part of the firm’s partnership in Portland.

Still, there are questions about the specific origins of the data used in Replica.

“If a city is going to use a system, it has a responsibility to have full transparency about where all of the data is coming from, how it is being deidentified and to what level, and if that data is reused again or stored by Replica or Sidewalk or passed to its parent company,” said Pam Dixon, executive director of Oregon-based nonprofit World Privacy Forum. “There’s too much that we don’t know.”

To assess how transportation options affect low-income worker commutes, Replica may include demographic information such as deidentified credit data and Census Bureau data to estimate age range, race, gender, household size and income level. Portland’s own vehicle, bicycle and pedestrian counts, TriMet transit ridership data, and estimates from Metro’s own travel model can also be added.

Here’s how Replica explains its data collection procedure:

De-identified mobile location data: We use de-identified mobile phone location data to generate travel behavior models — basically, a set of rules that represent how a person makes choices on where, when, why, and how to travel.

Synthetic population generation: Separately, we use aggregate demographic information to create what planners call a “synthetic population”. This is a virtual population that is statistically representative of the real population.

Computer simulation: We then give each person in the virtual population a travel behavior model and use computer simulation to generate a week of activities–helping us confidently replicate trip patterns across a city or metro area.
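To make those three steps concrete, here is a toy sketch in Python. It is not Sidewalk Labs’ code and makes no claim about how Replica is actually built. The age bands, mode-choice probabilities, activity list and two-trips-per-day assumption are all invented for illustration, and the hypothetical “travel behavior model” is a hand-written probability table rather than one learned from de-identified location data. It shows only the general pattern: draw a synthetic population that matches aggregate demographics, give each synthetic person behavior rules, simulate a week, and count trips.

```python
import random
from collections import Counter
from dataclasses import dataclass

# Hypothetical aggregate demographics: share of residents in each age band.
# A real system would derive these from census-style tables.
AGE_SHARES = {"18-34": 0.35, "35-64": 0.45, "65+": 0.20}

# Hypothetical travel behavior model: probability of each travel mode by age band.
# In the approach Replica describes, such rules come from location data; here they are made up.
MODE_CHOICE = {
    "18-34": {"car": 0.5, "transit": 0.3, "bike": 0.2},
    "35-64": {"car": 0.7, "transit": 0.2, "bike": 0.1},
    "65+": {"car": 0.6, "transit": 0.35, "bike": 0.05},
}

ACTIVITIES = ["work", "shopping", "eating", "recreation"]


@dataclass
class Person:
    person_id: int
    age_band: str


def synthesize_population(n, rng):
    """Draw n synthetic people whose age mix matches AGE_SHARES in expectation."""
    bands = list(AGE_SHARES)
    weights = [AGE_SHARES[b] for b in bands]
    return [Person(i, rng.choices(bands, weights)[0]) for i in range(n)]


def simulate_week(population, rng, trips_per_day=2):
    """Simulate seven days of trips per person; return counts keyed by (activity, mode)."""
    counts = Counter()
    for person in population:
        modes = MODE_CHOICE[person.age_band]
        for _day in range(7):
            for _trip in range(trips_per_day):
                activity = rng.choice(ACTIVITIES)
                mode = rng.choices(list(modes), list(modes.values()))[0]
                counts[(activity, mode)] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    people = synthesize_population(1000, rng)
    trips = simulate_week(people, rng)
    bike_trips = sum(c for (_activity, mode), c in trips.items() if mode == "bike")
    print(f"{sum(trips.values())} simulated trips, {bike_trips} by bike")
```

A planner querying such a model asks questions of the simulated trips (for instance, how many trips are made by bike) rather than of any individual’s raw location history, which is the privacy argument behind the synthetic-population approach.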

In an effort to understand how people travel to and from Portland throughout the region, the Replica system will encompass Multnomah, Clackamas, Washington, and Clark Counties.

Other cities including Chicago and Kansas City will also test Replica.

Pros and cons of a synthetic city population

Proponents favor this synthetic-population approach because it provides a reflection of what’s likely happening on city streets, arguably without the privacy invasions associated with surveillance and identifiable information. It’s especially attractive compared with how governments have historically gathered commuter information: through studies that generate static data with a limited shelf life.

Now, the argument goes, planners can skip the study and query the replicated database to gauge the impact of a new light rail line or a street closure.

“This is all information that we would otherwise have to collect by sending people out into the field to do counts and surveys, which is very expensive and time-consuming,” said Rose.

Importantly, because the location data comes from mobile devices in people’s pockets, planners can see the movement patterns of walkers, bikers, skaters and others not traveling in more easily tracked vehicles. “We haven’t seen any tools that have offered the kind of information Sidewalk Labs is offering regarding pedestrian and cyclist mobility,” said Rose.

The city will not receive any actual mobile location data through the pilot; rather, its staff access an online platform to query and filter data about the simulated population.

Despite these privacy assurances, data privacy and security experts question whether mobile location data, even when deidentified, is safe from re-identification if it is leaked, hacked or obtained by law enforcement. Others worry about the consequences of predictive modeling or machine learning techniques that determine with high accuracy whether a simulated person might visit a particular location at a particular time.

“People who are moving within cities need to be able to trust that their data is truly de-identified and will never come back at them — for example, through a judicial process or law enforcement process — and would not be used against them in a discriminatory or unfair manner,” said Dixon.

According to a Portland Bureau of Transportation spokesperson, the contract with Sidewalk Labs prevents law enforcement or any entities other than Metro, PBOT or TriMet from accessing Replica software or data. But many data-related questions remain: the rules governing access to the information used to build the synthetic models, and to the data reflecting the simulated population’s behavior, are unclear.

Portland City Council voted to use the Replica software in December. The city now awaits an initial dataset from Sidewalk Labs, possibly by July, which it will test against criteria the three agencies have devised. If the data passes muster, the city will begin paying 12 cents per resident to access the software for a year, receiving refreshed data each quarter.

Rose said the city is testing whether Replica can be used to identify people. “If we find that it can, we intend to terminate the agreement,” he said.

The Portland City Council had planned to vote on Wednesday on a privacy resolution requiring the city to ensure transparency and accountability in its data use and collection, but the vote was postponed.

How Replica got its start

Research conducted at UC Berkeley, partially funded by AT&T and the California Department of Transportation, and presented in a 2017 paper offers a glimpse into the origins of the Replica system. “One of our goals is to enable activity based travel demand models that use cellular data to create synthetic agent travel patterns without compromising the privacy of cell phone users,” the paper states.

One of the researchers involved in the study, Alexei Pozdnoukhov, currently serves as director of research at Sidewalk Labs. Another researcher on the project mentioned his internships at Sidewalk Labs and AT&T in his dissertation.

Not only was the research funded by AT&T, it employed data from the company, as confirmed by a Sidewalk Labs spokesperson. The paper notes the research employed anonymized and aggregated Call Detail Record logs “collected in Summer 2015 by a major mobile carrier in the US, serving millions of customers in the San Francisco Bay Area.” CDR logs feature various details of mobile calls including time, duration, completion status, source number, and destination number. AT&T did not respond to requests to comment for this story.

Several companies pool mobile location data over time to build models for things like ad targeting, marketing insights, or to help municipalities plan development. Those data providers rarely reveal where the information comes from.

Sidewalk Labs is not alone in its apparent use of raw cellphone data, which is made available to businesses directly from telcos as well as through third party data firms.

A New York Times article this past December revealed how dozens of companies sell anonymous smartphone location data to advertisers, retailers, and even hedge funds, without the knowledge of individual users.

Sidewalk Labs launched in 2015 but has already drawn criticism for its heavily scrutinized partnership with a Toronto revitalization group aiming to turn the Quayside waterfront neighborhood into a “global hub” of “urban innovation.”

Despite introducing prototypes of heated hexagonal paving systems and “building raincoats” that protect sidewalks from the elements, the plans for Toronto mostly remain a mystery. Local activists there express concerns about Sidewalk’s lack of meaningful community engagement, as well as its proposal to fund Toronto land development and garner tax revenue through a controversial bond scheme.


In April, the Canadian Civil Liberties Association filed a lawsuit against Waterfront Toronto and Canadian government entities, arguing that “the process that resulted in the Quayside agreements was not transparent, reasonable or accountable.” Sidewalk Labs has also spurred criticism for its efforts to guide government policy for city data management.

Questions around city data use, ownership and security are immensely important. But the Sidewalk Labs partnership in Toronto also demonstrates the need to ensure that citizens can participate in decisions regarding technology use that ultimately affects government policy.

“There’s a different level of responsibility for municipalities,” said Dixon. “All of the stakeholders have an interest in this data so it needs to be managed with the input of all of the stakeholders.”

It is unclear whether Portland residents are aware of or understand the Replica project. When asked about government outreach involving Replica, PBOT pointed to the City Council’s public hearing on the issue and a Civic Data Forum hosted in January by Smart City PDX, the city’s group overseeing emerging tech projects in Portland. That event was intended to “foster engagement with and outreach to diverse stakeholders in the community impacted by data collection,” according to the Smart City PDX website.

Though it does not oversee the Replica project, Smart City PDX was created to guide Portland’s data and technology investments. Smart City PDX Manager Kevin Martin admitted the city could do a better job of communicating the value proposition for tech projects such as Replica to residents.

“I think that’s on us to really figure out how we better communicate,” he said. “How innovations around data and technology can make your life better and how you need to work with us to make sure that things that happen with data and technology don’t make your life worse.”