California Consumer Privacy Act, PII and NLP

NLP for CCPA and Privacy

The California Consumer Privacy Act (CCPA), also known as AB 375, came into effect on 1 Jan 2020. It is a State of California law that, among other things, protects the personal information of California residents. Personally identifiable information (PII) is any data that can be used to identify a user or customer. The General Data Protection Regulation (GDPR) is an EU law that took effect in 2018 and gives similar rights to EU residents. With businesses collecting user data at ever-increasing scale all over the world, I think more such laws will be passed by governments elsewhere.

CCPA

CCPA gives the following rights to California residents (taken as-is from Section 2 of AB 375):

(1) The right of Californians to know what personal information is being collected about them.
(2) The right of Californians to know whether their personal information is sold or disclosed and to whom.
(3) The right of Californians to say no to the sale of personal information.
(4) The right of Californians to access their personal information.
(5) The right of Californians to equal service and price, even if they exercise their privacy rights.

Businesses also have to pay fines in the case of data breaches and other violations.

PII

One of the goals of CCPA is to protect consumers' PII, which means companies need to develop tools to understand and store PII safely. Natural language processing (NLP) and natural language understanding (NLU) technologies are going to play a key role, as a significant percentage of the data collected is in text form: names, emails, addresses, Social Security numbers, health information, etc. The above graphic from Imperva shows various examples of PII.
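To make the idea of finding PII in text concrete, here is a minimal sketch in Python. It is plain pattern matching rather than NLP proper, and it only covers two of the PII types mentioned above (email addresses and US Social Security numbers). The names `PII_PATTERNS`, `find_pii`, and `redact` are made up for this illustration, and the patterns are deliberately simple assumptions, not production-grade validators.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real-world
# email and SSN formats have many more edge cases than these cover.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return (label, matched_text, start, end) tuples sorted by position."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(), match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[2])


def redact(text):
    """Replace each detected span with a [LABEL] placeholder."""
    pieces = []
    last = 0
    for label, _, start, end in find_pii(text):
        if start < last:
            # Skip a span that overlaps one already redacted.
            continue
        pieces.append(text[last:start])
        pieces.append(f"[{label}]")
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(redact(sample))  # Contact [EMAIL], SSN [SSN].
```

Patterns like these only work for PII with a rigid format. Names, street addresses, and health information generally need a trained named-entity recognition model, which is where NLP and NLU come in.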