Yes, websites track your behavior online. But some go much further than what you'd reasonably expect, using so-called session replays to create a detailed log of everything you do and type on a site. And new research shows that in some cases these movie-like recordings are even storing your passwords.

Bulk data collection is always a privacy red flag. But the Princeton research group that first published findings about session replay scripts has uncovered a troubling series of situations where seemingly well-intentioned safeguards fail, leading to an unacceptable level of exposure.

The investigation started with Mixpanel, a product analytics company that offers a comprehensive user data collection service known as Autotrack. The company admitted in an email to its customers at the beginning of February that the feature had been unintentionally collecting password data, even though Autotrack includes heuristics meant to prevent that very thing. Autotrack isn't a session replay script, but it collects whole-hog user interaction data so that Mixpanel's clients can query later for any information about their users. Mixpanel corrected the password flaw and issued an SDK update, but the Princeton researchers—Steven Englehardt, Gunes Acar, and Arvind Narayanan—say they realized that these types of password redaction failures were probably a larger problem.

"It kind of snowballed and I think it’s likely that there are other design patterns out there that are also weakened," says Englehardt, a web privacy PhD candidate. "We’ve highlighted some, but we could continue to go down this road and find other things again and again just because of the way that these scripts are designed."

You Shall Not Password

Even after Mixpanel issued fixes for the password retention issue, the Princeton researchers still found situations in which Autotrack recorded passwords. The feature tries to avoid retaining passwords by automatically redacting input fields that have a name or ID that includes the term "pass." The limitations are obvious: A password field might, say, be named "pwd," or a site might use a language other than English.
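To make those limitations concrete, here is a minimal sketch of a name-based redaction heuristic of the kind described above. It is not Mixpanel's code: the `InputField` class, the `looks_like_password` and `collect` functions, the case-insensitive matching, and the sample field names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class InputField:
    """Hypothetical stand-in for a form input element on a page."""
    name: str
    element_id: str
    value: str


def looks_like_password(field: InputField) -> bool:
    # Heuristic described in the article: the name or ID contains "pass".
    # Case-insensitive matching is an assumption of this sketch.
    needle = "pass"
    return needle in field.name.lower() or needle in field.element_id.lower()


def collect(fields):
    # Return what a tracker relying only on this heuristic would retain.
    return {
        f.name: "[redacted]" if looks_like_password(f) else f.value
        for f in fields
    }


fields = [
    InputField("password", "login-password", "hunter2"),  # caught
    InputField("pwd", "pwd", "hunter2"),                   # missed: no "pass"
    InputField("contraseña", "clave", "hunter2"),          # missed: not English
]
collected = collect(fields)
print(collected)
```

Only the first field is redacted. The other two values pass straight through, which is the gap the researchers point to.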

'These leaks will happen no matter what unless they stop collecting all inputs from fields.' Gunes Acar, Princeton

One prevalent example the group found centers on "Show Password" features, tools offered by many sites and browser extensions that let users see the password they're entering in plaintext so they can catch typos. The researchers discovered that on certain Mixpanel client sites, like testbook.com, the feature confused the password redaction protections. If a user clicked Show Password and then took any other action, like re-obscuring the password or editing it in the text field, Autotrack recorded the password, even if the user decided not to log in and never submitted it. This happens when the Show Password feature stores the password in a second, invisible field: Autotrack collects the value from that second field because it doesn't know to classify it as sensitive. The researchers found that the same problem came up when users added Show Password browser extensions to the mix, which alter website behavior in ways neither the site nor its third-party services control.

"The structure of the rendered webpage is being modified, changing the type of input field from a password field to a regular text field. When this happens, Autotrack loses the ability to identify whether or not a field is being used to enter a password," Mixpanel said in a statement. "Per our documentation, if a customer is collecting sensitive information in non-password fields, they should explicitly blacklist it for collection."
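The failure Mixpanel describes can be illustrated with a short sketch. It assumes a hypothetical tracker that decides what to redact from a field's current type each time the field changes, plus an optional explicit blacklist of field names. The `InputField` class, `record_change`, `toggle_show_password`, and the field name `credential` are invented for this example and do not reflect Mixpanel's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class InputField:
    """Hypothetical stand-in for an <input> element."""
    name: str
    input_type: str  # "password" or "text"
    value: str


def record_change(field: InputField, blacklist=frozenset()) -> str:
    # Redact when the field is currently a password field, or when the
    # site owner has explicitly blacklisted it by name (assumed mechanism).
    if field.input_type == "password" or field.name in blacklist:
        return "[redacted]"
    return field.value


def toggle_show_password(field: InputField) -> None:
    # A Show Password control that flips the field between the
    # "password" and "text" types.
    field.input_type = "text" if field.input_type == "password" else "password"


field = InputField(name="credential", input_type="password", value="hunter2")

before = record_change(field)       # still a password field: redacted
toggle_show_password(field)         # user clicks Show Password
after = record_change(field)        # now a text field: value is recorded
with_blacklist = record_change(field, blacklist=frozenset({"credential"}))

print(before, after, with_blacklist)
```

Once the type flips, the type check alone no longer protects the value. In this sketch only the explicit blacklist still redacts it, which mirrors the burden Mixpanel's statement places on its customers.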

Mixpanel has also put its entire Autotrack feature "on hold" in recent weeks, making the tool inaccessible to new users while the company "evaluate[s] how to provide seamless, easy integration of Mixpanel in a way that’s transparent and predictable to our customers." A spokesperson said that the company has realized that some of its customers didn't understand how much data Autotrack collected, and wanted more control over what information the tool retained. Mixpanel also says it is developing mechanisms to make it easier for customers to review the totality of the data the feature collects, so they can more quickly spot things that don't belong.