The Latest Data Risk Landscape

Enterprise data continues to change rapidly in form, size, use, and residence. It rarely stays in silos anymore, confined to particular business units or isolated from the outside world. Data now freely crosses the boundaries that once limited its business potential: it lives in the cloud, moves between business units, and flows nearly everywhere.

For all the change and opportunity that data represents, once it is created or collected it is exposed to attack and misuse. With the number of reported data breaches doubling over the last ten years and half a billion records exposed last year, weak security increasingly threatens the information we rely on.

With the exposure of personal data at industrial scale, the growth of data privacy legislation was inevitable. Companies and government agencies that collect and handle personally identifiable information (PII) must now comply with laws such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States and the General Data Protection Regulation (GDPR) in Europe, industry mandates such as the Payment Card Industry Data Security Standard (PCI DSS), and many international and local follow-on laws like POPI in South Africa, KVKK in Turkey, and the California Consumer Privacy Act (CCPA).

Data breaches also carry explicit costs. The Ponemon Institute Cost of a Data Breach Study found that the average cost per compromised record in 2019 was roughly $150. The study puts the likelihood of a breach involving 10,000 lost or stolen records above 26%, so you have just over a one-in-four chance of losing 10,000 records. Would you take that risk if you could use technology to prevent it?
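The arithmetic behind that question can be made explicit. This is a minimal sketch that multiplies the per-record cost by the breach size and weights the result by the stated likelihood. Treating "above 26%" as a flat 0.26 and framing the risk as a simple expected loss are simplifying assumptions made here for illustration, not the study's methodology.

```python
# Rough breach-exposure arithmetic using the Ponemon 2019 figures cited above.
COST_PER_RECORD_USD = 150   # average cost per compromised record (2019)
RECORDS_AT_RISK = 10_000    # breach size the likelihood figure refers to
BREACH_LIKELIHOOD = 0.26    # study says "above 26%", so this is a lower bound


def breach_cost(records: int, cost_per_record: float) -> float:
    """Direct cost if a breach of the given size occurs."""
    return records * cost_per_record


def expected_loss(records: int, cost_per_record: float, likelihood: float) -> float:
    """Probability-weighted cost: likelihood times the cost of the breach."""
    return likelihood * breach_cost(records, cost_per_record)


if __name__ == "__main__":
    cost = breach_cost(RECORDS_AT_RISK, COST_PER_RECORD_USD)
    risk = expected_loss(RECORDS_AT_RISK, COST_PER_RECORD_USD, BREACH_LIKELIHOOD)
    print(f"Cost if breached: ${cost:,.0f}")  # $1,500,000
    print(f"Expected loss:    ${risk:,.0f}")  # $390,000
```

In other words, a single 10,000-record breach at $150 per record costs about $1.5 million, and even weighted by the one-in-four likelihood the exposure is roughly $390,000.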

Organizations stuck in old operational models and mindsets fail to recognize the importance of company-wide security protocols. To improve, they must address their need for what Gartner calls Data Security Governance, protecting information through structured, coordinated processes rather than as an afterthought or as remediation after a breach.

What is Data Security Governance?

Gartner defines data security governance (DSG) as “a subset of information governance that deals specifically with protecting corporate data (in both structured database and unstructured file-based forms) through defined data policies and processes.”

You define the policies. You define the processes. There is no one-size-fits-all solution to DSG, nor is there a single product that meets all of its needs. You must examine your data and decide which areas are most at risk and most important to your company. Take data governance into your own hands to avert disaster, and remember that your information is your responsibility.

While there are multiple pathways to safeguarding data — logical, physical, and human — three primary software methods that IRI customers successfully employ are the classification, discovery, and de-identification (masking) of PII and other data considered sensitive.

Data Classification

To find and protect specific data at risk, you must first define it in named categories or groups. Data so classified can be cataloged not only by name and attributes (e.g., US SSN, 9 digits) but also by computational validation (to distinguish it from other 9-digit strings) and by sensitivity level (secret, sensitive, etc.).
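As a minimal sketch, a data class can be modeled as a record that bundles a name, a pattern for the attributes, a validation function, and a sensitivity label. The `DataClass` structure and its field names are hypothetical, not IRI's metadata format. The SSN validator uses the Social Security Administration's never-issued ranges (area 000, 666, or 900 through 999; group 00; serial 0000) as the "computational validation" step.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DataClass:
    """A named category of sensitive data (hypothetical structure, for illustration)."""
    name: str
    pattern: re.Pattern                # syntactic shape, e.g. 9 digits
    validator: Callable[[str], bool]   # computational check beyond the shape
    sensitivity: str                   # e.g. "secret", "sensitive"

    def matches(self, value: str) -> bool:
        # Shape check first; the validator only runs on correctly shaped values.
        return bool(self.pattern.fullmatch(value)) and self.validator(value)


def is_plausible_ssn(value: str) -> bool:
    """Reject numbers the SSA never issues: area 000, 666, or 900-999; group 00; serial 0000."""
    digits = value.replace("-", "")
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


US_SSN = DataClass(
    name="US_SSN",
    pattern=re.compile(r"\d{3}-\d{2}-\d{4}|\d{9}"),
    validator=is_plausible_ssn,
    sensitivity="secret",
)

print(US_SSN.matches("123-45-6789"))  # True: right shape, valid ranges
print(US_SSN.matches("987-65-4321"))  # False: 9xx area numbers are never issued
```

Separating the pattern from the validator mirrors the distinction above: the pattern says what an SSN looks like, while the validator rules out 9-digit strings that cannot be real SSNs.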

In addition to those assignments, data classes or class groups can be characterized by where they are located and/or by how they should be found (search methods) if their locations are unknown. A remediation (masking) function can also be assigned globally to a class, so that every member of the class is de-identified consistently wherever it resides, preserving referential integrity.
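The following sketch shows why a class-wide, deterministic masking rule preserves referential integrity. It assumes keyed HMAC-SHA256 pseudonymization as the masking function; that choice, the `CLASS_RULES` mapping, the hard-coded key, and the two sample tables are all hypothetical stand-ins, not IRI's implementation.

```python
import hashlib
import hmac
from typing import Callable, Dict, List

# Hypothetical key for illustration; a real deployment would fetch it from a key store.
MASKING_KEY = b"example-key-not-for-production"


def pseudonymize(value: str, key: bytes = MASKING_KEY) -> str:
    """Keyed, deterministic pseudonym: the same input always yields the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "ID-" + digest[:16]  # 64-bit prefix; collisions are unlikely at modest volumes


# One rule per data class, applied wherever members of that class are found.
CLASS_RULES: Dict[str, Callable[[str], str]] = {"US_SSN": pseudonymize}


def mask_column(rows: List[dict], column: str, data_class: str) -> List[dict]:
    """Apply the class-wide masking rule to one column, leaving other fields untouched."""
    rule = CLASS_RULES[data_class]
    return [{**row, column: rule(row[column])} for row in rows]


# Two hypothetical tables holding the same SSN under different column names.
customers = [{"ssn": "123-45-6789", "name": "A. Smith"}]
claims = [{"member_ssn": "123-45-6789", "amount": 420.00}]

masked_customers = mask_column(customers, "ssn", "US_SSN")
masked_claims = mask_column(claims, "member_ssn", "US_SSN")

# The join key still lines up after masking because both columns used the same rule.
print(masked_customers[0]["ssn"] == masked_claims[0]["member_ssn"])  # True
```

Because the rule is bound to the class rather than to a particular column, any newly discovered SSN column gets the same treatment, and masked tables can still be joined.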

Data Discovery

To find sensitive data, you can run search functions, whether or not they are associated with data classes. Examples of discovery techniques include RegEx or Perl Compatible Regular Expression (PCRE) searches through databases or files, fuzzy (soundalike) matching algorithms, special path or column filtering logic, named entity recognition (NER), facial recognition, etc.
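Two of the simpler techniques listed above, a regex pass over free-form text and column-name filtering for structured sources, can be sketched as follows. The patterns, hint keywords, and function names are illustrative assumptions. Python's `re` module stands in for a PCRE engine (the two are close but not identical), and fuzzy matching, NER, and facial recognition are beyond this sketch.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple


class Finding(NamedTuple):
    line_no: int
    data_class: str
    text: str


# Hypothetical search patterns keyed by data class name.
SEARCH_PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

# Column-name filter: flag columns whose names suggest sensitive content.
COLUMN_HINTS = re.compile(r"ssn|social|e_?mail|dob|birth", re.IGNORECASE)


def scan_lines(lines: Iterable[str]) -> Iterator[Finding]:
    """Regex search through free-form text such as files or logs, line by line."""
    for line_no, line in enumerate(lines, start=1):
        for data_class, pattern in SEARCH_PATTERNS.items():
            for match in pattern.finditer(line):
                yield Finding(line_no, data_class, match.group())


def suspicious_columns(column_names: Iterable[str]) -> List[str]:
    """Column filtering for structured sources such as database tables."""
    return [name for name in column_names if COLUMN_HINTS.search(name)]


log = ["user=jdoe ssn=123-45-6789 ok", "no pii here", "mail jane.doe@example.com"]
for finding in scan_lines(log):
    print(finding)
print(suspicious_columns(["id", "SSN", "Email_Address", "amount", "date_of_birth"]))
```

In practice the findings would be checked against each class's validator before being reported, to cut down on false positives.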

It is also possible to leverage machine learning in the recognition process. For example, IRI's DarkShield product (described below) supports semi-supervised machine learning for building NER models.

Data De-Identification

One way to reduce, or even nullify, the risk from data breaches is to mask data at rest or in motion with field-level functions that protect it while keeping it usable to some extent.
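To make "protected but still usable" concrete, here is a minimal sketch of two common field-level functions: partial redaction that keeps a card number's last four digits and its layout, and generalization that coarsens a birth date to its year. The function names and the ISO-8601 date assumption are illustrative only.

```python
def redact_keep_last(value: str, keep: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `keep` digits, leaving separators so the format survives."""
    to_mask = max(sum(ch.isdigit() for ch in value) - keep, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def generalize_date(iso_date: str) -> str:
    """Coarsen a YYYY-MM-DD date to its year: less identifying, still usable for age bands."""
    return iso_date[:4]


print(redact_keep_last("4111-1111-1111-1234"))  # ****-****-****-1234
print(generalize_date("1980-07-14"))            # 1980
```

A support agent can still confirm a card by its last four digits, and an analyst can still group customers by age, yet neither field reveals the original value.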

According to Gartner analyst Marc Meunier in “How Data Masking Is Evolving to Protect Data from Insiders and Outsiders”:

Adopting data masking helps organizations raise the level of security and privacy assurance for their sensitive data — be it protected health information (PHI), personally identifiable information (PII) or intellectual property (IP). At the same time, data masking helps meet compliance requirements with security and privacy standards and regulations.

Most enterprises — either by virtue of internal rules or data privacy laws — have been, are now, or will soon be, making data masking a core element of their overall security strategy.

Proven Software Solutions

IRI provides static and dynamic data masking for databases, flat files, proprietary mainframe and legacy application sources, and big data platforms (Hadoop, NoSQL, Amazon, etc.) through its FieldShield product and Voracity platform, and for data at risk in Excel through CellShield.

For data in semi-structured and unstructured sources like NoSQL DBs, free-form text files and application logs, MS Office and PDF documents, plus image files (even faces), you can use DarkShield to classify, discover, and de-identify it.

In these ‘shield’ products, you can use functions like blurring, deletion, encryption, redaction, pseudonymization, hashing, and tokenization, with or without the ability to reverse those functions. Voracity — which includes those products — also folds data masking into data integration and migration operations, as well as data federation (virtualization), reporting and data wrangling for analytic operations.
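To illustrate the difference between reversible and irreversible protection, this minimal sketch contrasts tokenization (originals kept in a lookup vault, so authorized code can reverse a token) with salted one-way hashing (no mapping back is stored). The `TokenVault` class and `one_way_hash` function are hypothetical, and the in-memory dictionaries stand in for what would need to be a secured, access-controlled store; this is not how IRI's products implement these functions.

```python
import hashlib
import secrets
from typing import Dict


class TokenVault:
    """Reversible tokenization: random tokens, with originals held in a lookup table.

    The in-memory dicts stand in for what would need to be a secured, access-controlled store.
    """

    def __init__(self) -> None:
        self._token_to_value: Dict[str, str] = {}
        self._value_to_token: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return the existing token for a value, or mint a new random one."""
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            while token in self._token_to_value:  # guard against a (very unlikely) clash
                token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token: str) -> str:
        """Reverse a token; only callers with vault access can do this."""
        return self._token_to_value[token]


def one_way_hash(value: str, salt: bytes) -> str:
    """Irreversible masking: SHA-256 over salt + value, with no stored mapping back.

    Note: low-entropy inputs such as SSNs can still be brute-forced if the salt leaks.
    """
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


vault = TokenVault()
token = vault.tokenize("123-45-6789")
print(token, "->", vault.detokenize(token))
print(one_way_hash("123-45-6789", salt=b"per-dataset-salt"))
```

Random tokens carry no information about the original, so their security rests entirely on protecting the vault. Hashing needs no vault, but it can never be undone, even by authorized users.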

Built-in data discovery, classification, metadata management, and audit logging features facilitate both automatic and manual assessments of the re-identifiability of affected records. See www.iri.com/solutions/data-masking and www.iri.com/solutions/data-governance for more information, and contact your IRI representative if you need help creating or enforcing your DSG framework through a data-centric (‘startpoint’) security approach.
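As an example of what an automatic re-identifiability assessment can measure, the sketch below computes k-anonymity, a widely used metric: the size of the smallest group of records that share the same quasi-identifier values (such as ZIP prefix, birth year, and sex). Using k-anonymity here is an illustrative choice, not a description of IRI's assessment method, and the sample extract and column names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def class_sizes(rows: Sequence[Row], quasi_ids: Sequence[str]) -> Counter:
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_ids) for row in rows)


def k_anonymity(rows: Sequence[Row], quasi_ids: Sequence[str]) -> int:
    """Smallest group size: each record is indistinguishable from at least k-1 others."""
    sizes = class_sizes(rows, quasi_ids)
    return min(sizes.values()) if sizes else 0


def groups_below_k(rows: Sequence[Row], quasi_ids: Sequence[str], k: int) -> List[Tuple[str, ...]]:
    """Quasi-identifier combinations shared by fewer than k records: candidates for more masking."""
    return [combo for combo, n in class_sizes(rows, quasi_ids).items() if n < k]


# Hypothetical masked extract: direct identifiers removed, quasi-identifiers coarsened.
masked = [
    {"zip3": "941", "birth_year": "1980", "sex": "F"},
    {"zip3": "941", "birth_year": "1980", "sex": "F"},
    {"zip3": "100", "birth_year": "1975", "sex": "M"},
]
qids = ["zip3", "birth_year", "sex"]
print(k_anonymity(masked, qids))        # 1
print(groups_below_k(masked, qids, 2))  # [('100', '1975', 'M')]
```

A result of k = 1 means at least one record is unique on its quasi-identifiers and may be re-identifiable by linking it with outside data, even though names and SSNs have been masked.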