Summary

Analyzing and interpreting data is a large portion of the work involved in scientific research. Getting to that point can be a lot of effort on its own because of all the steps required to download, clean, and organize the data prior to analysis. This week Henry Senyondo talks about the work he is doing with Data Retriever to make data preparation as easy as retriever install.

Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, NodeBalancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $60 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long-time supporter of Podcast.__init__!



Preface

Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.

I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.

When you’re ready to launch your next project, you’ll need somewhere to deploy it. Check out Linode at www.podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.

Visit the site to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.

To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.

Your host, as usual, is Tobias Macey, and today I’m interviewing Henry Senyondo about Data Retriever, the package manager for public data sets.

Interview

Introductions

How did you get introduced to Python?

Can you explain what Data Retriever is and the problem that it was built to solve?

Are there limitations as to the types of data that can be managed by Data Retriever?

What kinds of data sets are currently available and who are the target users?

What is involved in preparing a new data set to be available for installation?

How much of the logic for installing the data is shared between the R and Python implementations of Data Retriever and how do you ensure that the two packages evolve in parallel?

How is the project designed and what are some of the most difficult technical aspects of building it?

What is in store for the future of Data Retriever?

Keep In Touch

GitHub

@henrykironde on Twitter

Picks

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA