Let’s review the many ways to do the most common operations over dataframe columns using pandas.

import pandas as pd

Adding columns to a dataframe

The three most popular ways to add a new column are: indexing, loc and assign :

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]})



df["C"] = [1,2,3]

df.loc[:, "D"] = [1,2,3]

df = df.assign(E=[1,2,3])

df

Indexing is usually the simplest method for adding new columns, but it gets trickier to use together with chained indexing. It may add the column to a copy of the dataframe instead of adding it to the original. When this happens pandas will show a warning:

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]}) df[df["A"] < 3]["C"] = 100

df

SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame.

Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

after removing the cwd from sys.path.

To avoid those cases, it’s better to use loc :

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]}) df.loc[df["A"] < 3, "C"] = 100

df

loc has two limitations: it mutates the dataframe in-place and it can't be used with method chaining. If that's a problem for you, use assign :

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]}) df = df.assign(C=[1,2,3]).assign(D=4)

df

assign is particularly useful when you want to create a new column based on a column from an intermediate dataframe. You can pass a lambda to assign to get the intermediate dataframe:

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]}) df = df.assign(C=[1,2,3]).assign(D=lambda idf: idf["C"] * 2)

df

In the previous examples the column name is fixed, but you can also use variable column names:

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]}) my_column_name = "C"

another_name = "D"

df = df.assign(**{my_column_name: [1,2,3], another_name: 100})

df

Another option, when you need to insert a column in a specific location, you can use insert :

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]}) df.insert(loc=1, column="C", value=[1,2,3])

df

Finally, you can also use concat to add a new column:

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]}) new_column = pd.Series([1,2,3])

df = pd.concat([df, new_column.rename("C")], axis=1)

df

No matter what method you use, a common mistake is adding a column with a different index:

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]},

index=[0,1,2]) new_column = pd.Series([1,2,3],index=[2,3,4])

df["C"] = new_column

df

If you don’t care about the indexes and just want to add the column using the current items order, you can use the values array:

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]},

index=[0,1,2]) new_column = pd.Series([1,2,3],index=[2,3,4])

df["C"] = new_column.values

df

Renaming columns

The easiest way to rename columns is:

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]})



df = df.rename(columns={"A":"X", "B":"Y"})

df

If you need to do something more complex with the name you can pass a lambda to rename :

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]}) df = df.rename(columns=lambda cname: cname + "_" + cname)

df

You can also manipulate columns directly:

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8]}) df.columns = "column-" + df.columns.str.lower()

df

Changing columns order

You can change the order of columns by explicitly listing each column:

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8],

"C": [5,5,5]})



df = df[["A", "C", "B"]]

df

For larger dataframes is easier to use list operations to reorder the columns:

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8],

"C": [5,5,5]}) cols = df.columns.tolist()

column_to_move = "C"

new_position = 1 cols.insert(new_position, cols.pop(cols.index(column_to_move)))

df = df[cols]

df

Deleting columns

You can use dict operations, like del and pop , to remove columns:

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8],

"C": [5,5,5]})

del df["B"]

C = df.pop("C")



df

For multiple columns (or for keeping the original dataframe intact) you can use drop :

df = pd.DataFrame({"A": [1,2,3],

"B": [2,4,8],

"C": [5,5,5]}) df = df.drop(["B", "C"], axis=1)

df

You can also use columns to select the columns to drop:

df = pd.DataFrame({"A1": [1,2,3],

"B2": [2,4,8],

"C2": [5,5,5]}) cols_to_drop = [cname for cname in df.columns if cname.endswith("2")]

df = df.drop(cols_to_drop, axis=1)

df

Sometimes is easier to select the columns you want to keep: