INTRODUCTION

Apache Spark is an open-source tool for data analysis and transformation. Its core abstraction is the RDD (Resilient Distributed Dataset): a read-only dataset distributed over a cluster of machines and maintained in a fault-tolerant way. Spark achieves high performance for both batch and streaming data. It uses:

- a DAG scheduler
- a query optimizer
- a physical execution engine

DOCKER-SPARK

Docker is a tool for creating isolated systems (containers) on a host computer. I use Docker every day to create and simulate different scenarios. Today I will explain how to create Apache Spark master and worker nodes. docker-compose is a tool for creating an environment consisting of a set of containers. The images come from big-data-europe's GitHub repository. We can create an environment using the following docker-compose file:

    version: "3.3"
    services:
      spark-master:
        image: bde2020/spark-master:2.4.0-hadoop2.7
        container_name: spark-master
        ports:
          - "8080:8080"
          - "7077:7077"
        environment:
          - INIT_DAEMON_STEP…
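Since the post is about running a master together with a worker, a worker service can be added alongside the master. The following is a sketch under the assumption that the companion bde2020/spark-worker image from the same repository is used; the service name, the mapped port, and the SPARK_MASTER variable follow that image's conventions:

```yaml
  # Hypothetical worker service; assumes the bde2020/spark-worker
  # companion image and its SPARK_MASTER environment convention.
  spark-worker-1:
    image: bde2020/spark-worker:2.4.0-hadoop2.7
    container_name: spark-worker-1
    depends_on:
      - spark-master          # start only after the master container
    ports:
      - "8081:8081"           # worker web UI
    environment:
      # point the worker at the master's cluster port (7077)
      - "SPARK_MASTER=spark://spark-master:7077"
```

With both services defined, `docker-compose up -d` starts the cluster; the master web UI is then typically reachable on port 8080 and the worker UI on port 8081.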