Summary

HDF5 is a file format that supports fast and space-efficient analysis of large datasets. PyTables is a project that wraps and expands on the capabilities of HDF5 to make it easy to integrate with the larger Python data ecosystem. Francesc Alted explains how the project got started, how it works, and how it can be used for creating shareable and archivable datasets.

Do you want to try out some of the tools and applications that you heard about on Podcast.__init__? Do you have a side project that you want to share with the world? With Linode’s managed Kubernetes platform it’s now even easier to get started with the latest in cloud technologies. With the combined power of the leading container orchestrator and the speed and reliability of Linode’s object storage, node balancers, block storage, and dedicated CPU or GPU instances, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode today and get a $60 credit to launch a new cluster, run a server, upload some data, or… And don’t forget to thank them for being a long time supporter of Podcast.__init__!



Preface

Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.

I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.

When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. Linode has announced new plans, including a 1GB plan for $5, high memory plans starting at 16GB for $60/mo, and an upgrade in storage from 24GB to 30GB on their 2GB for $10 plan.

Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.

To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers.

Your host as usual is Tobias Macey, and today I’m interviewing Francesc Alted about PyTables.

Interview

Introductions

How did you get introduced to Python?

To start with, what is HDF5 and what was the problem that motivated you to wrap Python around it to create PyTables?

Who are the most significant contributors to PyTables, and how have you interacted with them?

How is the project architected and what are some of the design decisions that you are most proud of?

What are some of the typical use cases for PyTables and how does it tie into the broader Python data ecosystem?

How common is it to use an HDF5 file as a data interchange format to be shared between researchers or between languages?

Given the ability to create custom node types, does that inhibit the ability to interact with the stored data using other libraries?

What are some of the capabilities of HDF5 and PyTables that can’t be reasonably replicated in other data storage systems?

One of the more intriguing capabilities that I noticed while reading the documentation is the ability to perform undo and redo operations on the data. How might that be leveraged in a real-world use case?

What are some of the most interesting or unexpected uses of PyTables that you are aware of?

Keep In Touch

@FrancescAlted on Twitter

FrancescAlted on GitHub

Picks

Tobias

The Accountant

Francesc

Blosc, a high-speed compressor especially meant for binary data

The Lego Batman Movie



Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA