PySpark code should generally be organized as single-purpose DataFrame transformations that can be chained together for production analyses (e.g., generating a datamart).
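
As a minimal sketch of what "single-purpose" means here, each transformation is a plain function that takes a DataFrame and returns a new DataFrame; the function and column names below (`with_greeting`, `with_farewell`, `greeting`, `farewell`) are illustrative placeholders, not names from this post:

```python
from pyspark.sql import DataFrame
from pyspark.sql.functions import lit

def with_greeting(df: DataFrame) -> DataFrame:
    # Single-purpose transformation: add a constant greeting column.
    return df.withColumn("greeting", lit("hello"))

def with_farewell(df: DataFrame) -> DataFrame:
    # Single-purpose transformation: add a constant farewell column.
    return df.withColumn("farewell", lit("goodbye"))
```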

This blog post demonstrates how to monkey patch the DataFrame class with a transform method, how to define custom DataFrame transformations, and how to chain the function calls.
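
Here is a rough sketch of that pattern, reusing the hypothetical transformations defined above. The `transform` method simply applies a function to the DataFrame it is called on, which lets custom transformations be chained with ordinary method-call syntax:

```python
from pyspark.sql import DataFrame, SparkSession

# Monkey patch the DataFrame class with a transform method so that
# custom transformations can be chained like built-in methods.
def transform(self, f):
    return f(self)

DataFrame.transform = transform

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("alice",), ("bob",)], ["name"])

# Chain the single-purpose transformations.
result = df.transform(with_greeting).transform(with_farewell)
result.show()
```

Note that PySpark 3.0 and later ship a built-in `DataFrame.transform` with the same calling convention, so the monkey patch is only needed on older releases.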

We’ll also demonstrate how to run multiple custom transformations with function composition using the cytoolz library.
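
A small sketch of that approach, again using the placeholder transformations from above: `cytoolz.compose` builds a single function from several, applying them right to left, and the composed pipeline can then be applied to a DataFrame directly or passed to `transform`:

```python
from cytoolz import compose

# compose applies right to left: with_greeting runs first, then with_farewell.
pipeline = compose(with_farewell, with_greeting)

result = pipeline(df)          # or: df.transform(pipeline)
result.show()
```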

This blog post has been migrated here.