There’s a lot going on in the background when you visit most websites. Scripts may be providing analytical data about what you click on. Trackers may link your activity back to your social media accounts. In fact, one type of script allows whoever owns the website you’re perusing to literally watch whatever you’re doing. Called “session replay” scripts, these services record everything you type, where you move your mouse, and more. This isn’t anonymized data collection–it’s very personal. It’s “as if someone is looking over your shoulder,” write the Princeton computer science researchers Steven Englehardt, Gunes Acar, and Arvind Narayanan.

Englehardt, Acar, and Narayanan, who are part of Princeton’s Center for Information Technology Policy, are studying these session replay scripts. These tools are supposed to help web developers and companies understand how users are interacting with their sites, so they can boost engagement and redo “broken or confusing pages.” In short, they’re like a little window into a user’s experience with your site–what one web design firm describes as creepy but useful. While the companies that provide this service claim to give website owners the option to hide their users’ personal information, the three researchers have found that in most cases, the scripts capture it anyway.

“Improving user experience is a critical task for publishers,” the trio writes. “However it shouldn’t come at the expense of user privacy.”

The researchers looked at seven popular companies that offer session replay, including Yandex, FullStory, Hotjar, and UserReplay, and found signs of scripts from one of these companies on 482 of the 50,000 largest websites. They found session replay evidence on the websites for HP, Comcast, Intel, Lenovo, Gap, Costco, Autodesk, Microsoft Windows, T-Mobile, Adobe, Nintendo, Crunchbase, Nest, Walgreens, and more (the researchers have published the full list). Chances are, you’ve been on one of these sites at some point, and maybe even typed in your credit card information to buy something.

This isn’t the same thing as general analytics tracking, which is aggregated and anonymous. The research shows that highly personal data like credit card numbers, health information, and addresses is likely sitting on third-party servers, where it could even be tied directly to your identity.

While session replay companies typically provide tools to redact this kind of personal information from their recordings, the researchers found that these tools don’t work very well. They set up test websites to observe how each script functioned and learned that companies vary greatly in what information they redact and how they do it. Some redact your credit card information but still record your date of birth and Social Security number; some protect your password and nothing else. Others conceal any personal data you enter, which is a better solution, but still reveal the length of your name and password.
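To make the difference between those approaches concrete, here is a minimal sketch in Python. It is not taken from any of the companies’ products: the field names, the `SENSITIVE_FIELDS` set, and all three functions are hypothetical, chosen only to illustrate why masking each character still leaks a value’s length while a fixed placeholder does not.

```python
from typing import Callable, Dict

# Hypothetical list of fields a site owner has marked as sensitive.
# Anything missing from this set is recorded as typed.
SENSITIVE_FIELDS = {"password", "card_number", "ssn", "date_of_birth"}


def mask_same_length(value: str) -> str:
    """Replace every character with '*'. The content is hidden,
    but the length of the original value is still visible."""
    return "*" * len(value)


def mask_fixed(value: str) -> str:
    """Replace the whole value with a constant placeholder,
    so neither the content nor its length is recorded."""
    return "[redacted]"


def redact(form: Dict[str, str], mask: Callable[[str], str]) -> Dict[str, str]:
    """Apply the chosen mask to sensitive fields only."""
    return {
        name: mask(value) if name in SENSITIVE_FIELDS else value
        for name, value in form.items()
    }


if __name__ == "__main__":
    typed = {"name": "Ada", "password": "hunter2", "ssn": "078-05-1120"}
    print(redact(typed, mask_same_length))
    print(redact(typed, mask_fixed))
```

In this sketch the name is recorded in full under either mask, because the hypothetical site owner never added it to the list. That is the same kind of gap the researchers describe: whatever the redaction rules do not explicitly cover ends up in the recording.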

The researchers point out that these practices put much of the burden on website creators, who must painstakingly go through the site by hand to ensure that every type of identifying information is redacted from recordings. That redaction has to be constantly monitored and updated in the website’s back end because the site’s code will change over time, which is expensive and error-prone. Even a slight modification to the site’s design would require an audit of the entire redaction system.