Many organizations are deploying Hadoop to help launch their big data projects. Unfortunately, Hadoop runs in non-secure mode by default, which leaves sensitive data exposed to both internal and external threats. Given the value of data in Hadoop, it’s critical that organizations secure their big data deployments before moving them into production.

As part of that effort, organizations should strictly control user access to nodes in a Hadoop cluster. At the same time, however, they also need a way to centrally manage access rights to avoid additional operational overhead and the risk of manual error. Leveraging Active Directory for big data identity and access management can solve both issues.

The Pitfalls of Using Kerberos to Secure Hadoop

Developers are aware of the implications of deploying Hadoop in non-secure mode, and many have enabled secure mode, which uses Kerberos to authenticate users and services from one node to the next. While this is a step in the right direction, it still presents a variety of challenges for enterprise IT organizations.

Even with Kerberos support available, Hadoop continues to run in non-secure mode by default. To use Kerberos, organizations must go through the time-consuming, error-prone, multi-step process of setting up an MIT Kerberos environment. Once that environment is in place, they still need a way to centrally manage user access. Without central management, user accounts must be set up on each of the hundreds or thousands of nodes inside the organization’s multiple Hadoop clusters.
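
To make the default concrete: Hadoop's authentication mode is controlled by the hadoop.security.authentication property in core-site.xml, which defaults to "simple" (no authentication) and must be set to "kerberos" for secure mode. The following is a minimal Python sketch of a check an operations team might run against a node's configuration. The function names and the sample XML strings are illustrative assumptions, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Value Hadoop assumes when the property is absent from core-site.xml.
DEFAULT_AUTHENTICATION = "simple"


def read_properties(core_site_xml: str) -> dict:
    """Parse the text of a core-site.xml file into a {name: value} dict."""
    root = ET.fromstring(core_site_xml.strip())
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def uses_kerberos(core_site_xml: str) -> bool:
    """Return True if the configuration selects Kerberos authentication."""
    props = read_properties(core_site_xml)
    mode = props.get("hadoop.security.authentication", DEFAULT_AUTHENTICATION)
    return mode.lower() == "kerberos"


# Hypothetical sample configurations, for illustration only.
DEFAULT_CONFIG = "<configuration></configuration>"
SECURE_CONFIG = """
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for label, xml_text in (("default", DEFAULT_CONFIG), ("secure", SECURE_CONFIG)):
        print(label, "uses Kerberos:", uses_kerberos(xml_text))
```

A check like this only confirms the authentication setting on one node. It says nothing about whether the Kerberos environment behind it, or the user accounts on that node, are managed consistently across the cluster.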

And it doesn’t end there. The Hadoop ecosystem is highly dynamic, as are regulatory compliance requirements. Each time the environment or a regulatory requirement changes, user access rights must change with it, which makes the job of managing access increasingly complex.

Finally, setting up a Kerberos environment also means creating a parallel identity infrastructure that’s redundant to most organizations’ Active Directory environments. This means any changes to a user’s role and responsibilities must be applied to two identity management environments.

Leveraging Existing Identity Infrastructure for Big Data Security Authentication

A better approach to securing Hadoop production deployments is to use a solution that takes advantage of an existing Active Directory infrastructure, which already provides Kerberos authentication capabilities. Building on this centralized, cross-platform identity management infrastructure allows IT organizations to grant access to Hadoop clusters using existing identities and group memberships, rather than creating new identities for users across every Hadoop cluster.

This approach also allows organizations to leverage existing skill sets and management processes to set up user accounts and access to big data nodes, and helps reduce costs and the risk of error, in turn improving security. Using existing Active Directory accounts to log in also secures Hadoop environments while helping to prove compliance in a repeatable, scalable, and sustainable manner.

How It Works

Active Directory deployments are often complex, but a unified identity management solution can simplify and streamline connecting and managing non-Windows servers in those environments. Through Hadoop integration, an identity management solution can connect Hadoop clusters to the existing Active Directory infrastructure. Once cluster nodes are integrated, automated authentication from one node to the next requires only the addition of new service accounts.

A unified identity management solution can also automate Hadoop service account management. The power of Active Directory’s Kerberos and LDAP capabilities is extended to Hadoop clusters, in turn delivering authentication for both Hadoop administrators and end users. If privileges are already defined and associated with Active Directory users, they can be reused in the Hadoop environment. When users log in to Hadoop through Active Directory, they receive the same privileges and restrictions they are assigned outside the Hadoop ecosystem. This single sign-on capability helps increase user productivity as well as overall security.
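
The idea of reusing existing group memberships can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any product's implementation: the group names, the privilege strings, and the DirectoryUser type are all hypothetical, and a real deployment would obtain group membership from Active Directory rather than from a hard-coded value.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from Active Directory group names to Hadoop privileges.
GROUP_PRIVILEGES = {
    "hadoop-admins": {"cluster:admin", "hdfs:read", "hdfs:write"},
    "data-analysts": {"hdfs:read", "hive:query"},
}


@dataclass(frozen=True)
class DirectoryUser:
    """A user as it might look after a directory lookup (illustrative only)."""
    account: str
    groups: frozenset = field(default_factory=frozenset)


def effective_privileges(user: DirectoryUser) -> set:
    """Union the privileges of every group the user belongs to.

    Groups with no entry in the mapping grant nothing, so an unknown
    group never widens access.
    """
    granted = set()
    for group in user.groups:
        granted |= GROUP_PRIVILEGES.get(group, set())
    return granted


def is_allowed(user: DirectoryUser, privilege: str) -> bool:
    """Check a single privilege against the user's group-derived set."""
    return privilege in effective_privileges(user)


if __name__ == "__main__":
    alice = DirectoryUser("alice", frozenset({"data-analysts", "finance"}))
    print(sorted(effective_privileges(alice)))
    print(is_allowed(alice, "hdfs:write"))
```

Because privileges hang off groups rather than individual accounts, a role change is made once in the directory and takes effect on every cluster that consults it.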

In addition to access management across Windows, Linux, and Unix servers, a unified identity management solution provides privilege management and auditing capabilities that can be extended across the entire organization, including the Hadoop environment. The solution can control access, manage privileges, audit activity, and associate everything back to an individual Active Directory account. It also generates reports indicating who has access and who did what across Hadoop clusters, nodes, and services to help address compliance and audit requirements.
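
As a rough illustration of what "associating everything back to an individual account" means for reporting, the Python sketch below groups audit events by account. The AuditEvent fields and the sample events are hypothetical; an actual solution would define its own event schema and collect events from the cluster.

```python
from collections import defaultdict
from typing import NamedTuple


class AuditEvent(NamedTuple):
    """One audited action (hypothetical schema, for illustration)."""
    account: str   # the individual Active Directory account
    node: str      # the cluster node where the action took place
    service: str   # the Hadoop service involved
    action: str    # what was done


def report_by_account(events):
    """Group events by account, preserving the order they were recorded in."""
    report = defaultdict(list)
    for event in events:
        report[event.account].append((event.node, event.service, event.action))
    return dict(report)


if __name__ == "__main__":
    sample_events = [
        AuditEvent("alice", "node-01", "hdfs", "read /data/sales"),
        AuditEvent("bob", "node-02", "yarn", "submit job"),
        AuditEvent("alice", "node-03", "hive", "run query"),
    ]
    for account, actions in report_by_account(sample_events).items():
        print(account, actions)
```

The point of the design is that every action is keyed to a named person rather than to a shared or local account, which is what makes "who did what" answerable at audit time.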

The Bottom Line

Security is crucial for big data deployments, but it must be applied in a way that is both efficient and reliable. Implementing an identity management solution that integrates with an organization’s existing Active Directory infrastructure meets both requirements. Using this approach, organizations avoid setting up a siloed identity infrastructure just for Hadoop, and instead leverage a trusted solution that delivers group-based access controls for Hadoop cluster access management. This provides tightly enforced access controls and centrally managed least-privilege security policies for their Hadoop environment.

To learn more about how Centrify’s unified identity management solution can help you secure your big data deployments, download the white paper How Identity Management Solves Five Hadoop Security Risks.