Interesting Talks from PyData Amsterdam 2017

I’ve always been fascinated by the openness and collaborative spirit that exist in the wider Python community. This is particularly embodied in the fact that most Python conferences record all of their talks, and even tutorials, and make them available on YouTube.

This year, the PyData Amsterdam conference was held from the 7th to the 9th of April in Booking.com’s offices. The recordings of the talks were posted several days ago, so I decided to make the time to watch all of them and do a short write-up with links to the ones I found the most interesting.

To begin, Lucas Bernardi, a Senior Data Scientist with Booking.com, gave a fantastic talk on diagnosing and rectifying certain “pathologies” that arise in statistical and Machine Learning models.

Rafael Schultze-Kraft from WattX gave an interesting talk about a project he’s working on related to smart offices, giving a good overview of IoT, sensor data, and how they use Spark and AWS for analysis and modelling. With sensors being pretty much everywhere around us, learning how to wrangle, analyze and model time series data is quickly becoming a must.

Holden Karau from IBM’s Spark Technology Center is a regular at PyData conferences, and gave a very engaging and fun exposé on understanding the often very cryptic error messages that PySpark can produce. She’s also almost done writing a book on best practices for using and tuning Spark for high performance, which should be published in May.

Katharine Jarmul’s keynote was about a topic that IMHO is not discussed often enough in the Data Science community (and I’m guilty of not giving enough thought to this as well): the ethics of modelling and the social and human impact of the various Machine Learning models that are being used. She gave a good overview of several pieces of research, and offered suggestions on how to combat “injustices” caused by models.

Her talk reminded me of a very interesting website, Algorithm Tips, that catalogues the various algorithms and models used by the U.S. Government that impact people’s daily lives.

Giovanni Lanzani, Chief Scientist of GoDataDriven (who hosted the Tutorial Day), talked about the disconnect that exists between Data Science as it is taught in books and MOOCs, and Data Science as it is done in a real startup environment, where the prototyped models need to be put into production. He spoke about the growing demand for (yet short supply of) Type B Data Scientists, and the need to supplement the typical Machine Learning skillset with software engineering skills, something that Trey Causey blogged about some time ago.

Giovanni’s colleague, Niels Zeilemaker, presented the infrastructure that GoDataDriven have built to make it easier for their Data Scientists to productionize models, while observing engineering best practices. They make use of a very interesting workflow, with their ML production pipeline consisting of GitLab, Jenkins, Docker, Kubernetes, and the ELK stack.

In a very informative and application-oriented talk, Tristan Boudreault from Shopify showed the audience how to use survival methods to analyze and model conversion rates.

In another talk at the intersection of Data Science and Data Engineering, Stephen Helms from Optiver gave a very good introduction to Bayesian statistics and its application to finding outliers in financial time series, and presented the analytics infrastructure they’ve built.

About a quarter of the talks at the conference were about Deep Learning, and a very interesting one that presented business applications was given by Emrah Tasli and Stas Girkin, Data Scientists from the conference host, Booking.com.

Finally, there were five tutorials given at Tutorial Day, but unfortunately they weren’t recorded. I really wanted to see Stephen Simmons’ tutorial titled Pandas from the Inside / “Big Pandas”, and thankfully a recording of a similar tutorial from PyData DC 2016 is available.

That’s it for PyData Amsterdam 2017. We have PyData London coming up on May 5th, and hopefully I’ll be able to make time to do a similar write-up after that conference. Enjoy!