Apache Spark is a fast, general-purpose, open-source engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. The project claims it can run analytic applications up to 100 times faster than Hadoop MapReduce for in-memory workloads. You can use Spark from Python through PySpark, the Spark Python API, which exposes the Spark programming model to Python.

The cheat sheet below was produced by DataCamp. The original version is available in PDF format.

You can find many more cheat sheets, covering all data science topics, by clicking here.
