Tom Phelan, Chief Architect at BlueData, talks about when it is appropriate to virtualize Hadoop, whether in containers or in virtual machines. In evaluating these situations, he explains which questions you should ask and which you should not.

Summary

What BlueData has learned about running Hadoop jobs

Under what situations should one virtualize a Hadoop cluster

The shape and components of a physical cluster: both master and worker nodes contain disks, a server, a NameNode, and a ResourceManager
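The daemon placement sketched in the talk can be modeled roughly as below. This is a minimal sketch assuming the conventional Hadoop 2.x role split (HDFS NameNode/DataNode and YARN ResourceManager/NodeManager); the talk's exact layout may differ, and the dictionary structure is purely illustrative.

```python
# Illustrative model of a physical Hadoop cluster's shape, assuming the
# conventional Hadoop 2.x role split (not taken verbatim from the talk).
CLUSTER = {
    "master": {
        # HDFS metadata service and YARN scheduler live on the master.
        "daemons": ["NameNode", "ResourceManager"],
        "hardware": ["disks", "server"],
    },
    "worker": {
        # Block storage and per-node task execution live on each worker.
        "daemons": ["DataNode", "NodeManager"],
        "hardware": ["disks", "server"],
    },
}

def daemons_on(role):
    """Return the daemons expected on a node with the given role."""
    return CLUSTER[role]["daemons"]

print(daemons_on("worker"))  # -> ['DataNode', 'NodeManager']
```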

Types of virtualization: public cloud; private cloud / hypervisor (strong fault isolation); private cloud / containers (weak fault isolation); paravirtualization

The appropriate infrastructure for various situations: questions NOT to ask, and questions to ask

Performance and Data Locality
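Data locality is Hadoop's core performance lever: schedule a task on a node that already holds the data block, failing that on a node in the same rack, and only then anywhere else. The sketch below illustrates that preference order; the node names, rack map, and function names are assumptions for illustration, not APIs from Hadoop or the talk.

```python
# A minimal sketch of Hadoop-style data locality: prefer node-local
# placement, then rack-local, then off-rack. All names are illustrative.

LOCALITY_ORDER = ("node-local", "rack-local", "off-rack")

def locality(replica_nodes, candidate, rack_of):
    """Classify how close a candidate node is to a block's replicas."""
    if candidate in replica_nodes:
        return "node-local"
    if rack_of[candidate] in {rack_of[r] for r in replica_nodes}:
        return "rack-local"
    return "off-rack"

def best_node(replica_nodes, candidates, rack_of):
    """Pick the candidate with the best (lowest-index) locality level."""
    return min(
        candidates,
        key=lambda n: LOCALITY_ORDER.index(locality(replica_nodes, n, rack_of)),
    )

rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
# The block's replica lives on n1; n2 shares its rack, n3 does not.
print(best_node({"n1"}, ["n2", "n3"], rack_of))  # -> n2 (rack-local beats off-rack)
```

Virtualization complicates exactly this picture: once compute runs in VMs or containers, "node-local" may no longer mean "disk-local," which is why the talk treats locality as a key question when deciding whether to virtualize.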

Five use-cases and their infrastructure needs

Pt 2: Orchestration with Docker

Joel Baxter, also at BlueData, leads a breakout session on what an ideal orchestration manager for Hadoop clusters and their associated data would look like. The state of the art is evolving, but it is not there yet.

Summary