Top features of Pandas 1.0

New improvements you can start using today

Note: Pandas 1.0.0rc0 was released on January 9th. The previous version was 0.25.

The first new major release of Pandas contains lots of great features, including better auto-summaries of data frames, more output formats, new data types, and even a new documentation site.

The full release notes are available on the new documentation site, but I thought a less-technical overview would be helpful too.

To use the new version, upgrade Pandas with pip. At the time of writing, Pandas 1.0 is still a release candidate, so you need to specify the version explicitly when installing it.

pip install --upgrade pandas==1.0.0rc0

Of course, upgrading might break some of your code because this is a major version release, so you should be careful!

Pandas 1.0 does not support Python 2 (support was removed in 0.25) and requires Python 3.6 or later, so make sure your pip and python point to the correct versions.

$ pip --version
pip 19.3.1 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)
$ python --version
Python 3.7.5

You can confirm that everything is working correctly and Pandas is using the right version.

>>> import pandas as pd
>>> pd.__version__
'1.0.0rc0'
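If you'd rather run the same check from a script than from the prompt, something like the following works as a rough sketch. The helper name `meets_minimum` is my own invention for illustration (it is not part of Pandas), and the (3, 6) floor is simply the Python requirement mentioned above.

```python
import sys


def meets_minimum(version_info, minimum=(3, 6)):
    """Return True if a version tuple is at least `minimum`.

    Hypothetical helper for illustration; not part of Pandas.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum(sys.version_info):
    print("This Python is too old for Pandas 1.0")

try:
    import pandas as pd
except ImportError:
    print("Pandas is not installed in this environment")
else:
    # The major version is the part before the first dot, e.g. "1" in "1.0.0rc0".
    major = int(pd.__version__.split(".")[0])
    label = "1.0 or later" if major >= 1 else "pre-1.0"
    print("Pandas", pd.__version__, "(" + label + ")")
```

Only the first component of the version string is parsed, which keeps the check working for release-candidate strings like 1.0.0rc0.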

Better auto-summary with DataFrame.info

My favorite new feature is the improved DataFrame.info method. It now uses a much more readable format, making your data-exploration process easier.

>>> df = pd.DataFrame({
...     'A': [1, 2, 3],
...     'B': ["goodbye", "cruel", "world"],
...     'C': [False, True, False],
... })
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       3 non-null      int64
 1   B       3 non-null      object
 2   C       3 non-null      bool
dtypes: bool(1), int64(1), object(1)
memory usage: 179.0+ bytes
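The summary doesn't have to go to the screen. As a small sketch, assuming you want to keep the text around (for a log or a report, say), you can pass a buffer through the `buf` parameter of `DataFrame.info`. The variable names here are just for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["goodbye", "cruel", "world"],
    "C": [False, True, False],
})

# info() prints to stdout by default; passing a buffer captures the text instead.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

# The summary is now an ordinary string that can be logged or written to a file.
print(summary)
```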

Output formats for markdown tables

My next favorite feature is the ability to export data frames to markdown tables with the new DataFrame.to_markdown method.

>>> print(df.to_markdown())
|    |   A | B       | C     |
|---:|----:|:--------|:------|
|  0 |   1 | goodbye | False |
|  1 |   2 | cruel   | True  |
|  2 |   3 | world   | False |

That makes it easier to display tables in places like Medium via GitHub gists.
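One thing to be aware of: `to_markdown` returns the table as a string and relies on the separate `tabulate` package, which is not installed with Pandas by default. Here is a minimal sketch of a guarded call, assuming you would rather print a hint than crash when `tabulate` is missing.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["goodbye", "cruel", "world"]})

try:
    # to_markdown() needs the optional `tabulate` package to build the table.
    table = df.to_markdown()
except ImportError:
    table = None
    print("Install the tabulate package to use to_markdown()")

if table is not None:
    # The result is plain text, ready to paste into a gist or a README.
    print(table)
```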

New data types for booleans and strings

Pandas 1.0 also introduces experimental data types for booleans and strings.

Because these types are experimental, their API might still change, so use them with some caution. Even so, Pandas recommends using them wherever it makes sense, and future versions are expected to improve the performance of type-specific operations like regex matching.

By default, Pandas won’t automatically coerce your data into these types (yet). But you can still use them if you explicitly tell Pandas to do so.

>>> B = pd.Series(["goodbye", "cruel", "world"], dtype="string")
>>> C = pd.Series([False, True, False], dtype="boolean")
>>> df["B"] = B
>>> df["C"] = C
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       3 non-null      int64
 1   B       3 non-null      string
 2   C       3 non-null      boolean
dtypes: boolean(1), int64(1), string(1)

memory usage: 200.0+ bytes

Notice how the Dtype column now reflects the new types.
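If the columns already exist, you don't have to rebuild each Series by hand. As a sketch, assuming a frame whose columns contain a few missing values, `astype` with a mapping converts several columns in one call. The column names are the same illustrative ones used above.

```python
import pandas as pd

df = pd.DataFrame({
    "B": ["goodbye", None, "world"],
    "C": [False, None, True],
})

# Convert both columns to the new experimental dtypes in one call.
converted = df.astype({"B": "string", "C": "boolean"})

# The new dtypes keep track of missing entries instead of dropping them.
print(converted.dtypes)
print(converted.isna().sum())
```

The missing entries survive the conversion, so the non-null counts reported by `info` still reflect what is actually in the data.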

The most useful benefit of the new string dtype is that you can now select just the string columns from a DataFrame. That makes it faster to build analyses of just the text components of your dataset.

>>> df.select_dtypes("string")

Previously, text lived in generic object columns, so selecting by dtype could also pick up non-text data, and the only reliable way to get the text columns was to name them explicitly.
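To make the difference concrete, here is a small sketch with one text column, one catch-all object column, and one numeric column. The column names (`name`, `misc`, `count`) are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": pd.Series(["goodbye", "cruel", "world"], dtype="string"),
    # A mixed bag of values has to live in a generic object column.
    "misc": pd.Series([1, "two", 3.0], dtype=object),
    "count": [1, 2, 3],
})

# Select by the dedicated string dtype versus the generic object dtype.
text_only = df.select_dtypes("string")
generic = df.select_dtypes("object")

print(list(text_only.columns))
print(list(generic.columns))
```

Selecting on "string" leaves the mixed `misc` column out, which is the whole point of giving text its own dtype.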

More documentation for the new types is available here.