If you are just getting started and would like to learn about working with data in Python, take DataCamp's interactive course, Importing Data in Python to work with CSV and Excel files in Python.

Using Python And Excel For Data Science

Excel is a spreadsheet application that was developed by Microsoft in the Year 1987. It is officially supported by almost all of the operating systems like Windows, Macintosh, Android, etc. It comes pre-installed with the Windows OS and can be easily integrated with other OS platforms. Microsoft Excel is the best and the most accessible tool when it comes to working with structured data.

It organizes, analyzes, and stores your data in tabular row-column fashion. You can perform calculations and create pivot tables, graphs, and a lot more! Since its release, this software gained popularity and is widely used in many different application fields and all sorts of domains across the world.

Since the day internet was created, it has grown exponentially, and so has the amount of data. The growth in data has pushed the need for people to understand how to analyze it. Corporations and governments were collecting big data. Hence, the term data science was coined.

When working with data, you'll need to deal with spreadsheets at some point; however, working directly with spreadsheets can get annoying at times, especially when you are a developer. To get rid of this problem, Python developers came up with ways of reading, writing, analyzing all kinds of file formats, including spreadsheets.

Today’s tutorial will be mostly on how you can use the Python programming language and work with Excel without directly using the Microsoft Excel application. It will provide you hands-on experience with the packages that you can use to load, read, write, and analyze these spreadsheets with the help of Python. You will be dealing with packages like pandas, openpyxl, xlrd, xlutils, and pyexcel.

The Data is the Oil

When you start any project that directly or indirectly deals with data, the first and foremost thing you would do is search for a dataset. Now gathering data could be done in various ways, either using web scraping, a private dataset from a client, or a public dataset downloaded from sources like GitHub , universities , kaggle , quandl , etc.

This data might be in an Excel file or saved with .csv , .txt , JSON , etc. file extension. The data could be qualitative or quantitative. The data type could vary depending on the kind of problem you plan to solve. Hence, as a first step, you should figure out whether you're working with qualitative or quantitative data.

The data could be:

Continuous

Discrete

Categorical - Binary, Unordered, Ordered

Pixels of images, etc.

Best Practices For Spreadsheet Data

Before you start with loading, reading, and analyzing your excel data in Python, it is a good practice to view the sample data and understand whether the following points are in line with the file you plan to work with:

The first row of the spreadsheet is usually reserved for the header, which describes what each column's data represents unless the data in the spreadsheet is pixels of images.

Avoid names or values field header with blank spaces or names comprising of multiple words having gaps or spaces between them. Consider using Python's standard PEP-8 format like: Underscores, Dashes, Camel case, where the first letter of each section of text is capitalized, or



Prefer using short names instead of long names or sentences,

Try to avoid using names that contain special characters such as ? , $ , % , ^ , etc. since special characters do not tell anything about the data.

Your data might have missing values in some columns. Make sure to fill those with NA or fill them with the mean or median of the complete column.

While working with Microsoft Excel, you’ll find a considerable amount of options to save your file. Besides the default extension .xls or .xlsx , you can go to the “File” tab, click on “Save As” and select one of the extensions that are listed as the “Save as Type” file extension options. The most commonly used extensions to save datasets for data science are .csv and .txt (as tab-delimited text file) and even .xml . Depending on the saving option that you choose, your data set’s fields are separated by tabs or commas, which will make up the “field separator characters” of your data set.

Knowing the extension of your file is important since when you load the data stored in excel, your Python library would need to explicitly know whether it is a comma-separated or tab-separated file.

Below are all of the file extensions that MS excel supports:

Preparing Your Workspace (Optional)

Preparing your workspace is good to have a step in your pipeline, but it is not a mandatory step and can be skipped. But having this as a first step in your pipeline makes life simpler and ensures that you start well.

Verify that your working directory is the same as your Python codebase directory.

When you're working in the terminal, you might first navigate to the directory that your file is located in and then start up Python. That also means you have to make sure your file is located in the directory that you want to work from!

But someone who's a beginner and has already started your Python session and you’ve got no clue of the directory that you’re working in; you should consider executing the following commands:

# Import `os` import os # Retrieve current working directory (`cwd`) cwd = os.getcwd() cwd # Change directory os.chdir("/path/to/your/folder") # List all files and directories in current directory os.listdir('.')

Another way could be if you keep track of where your dataset file is kept. You also could just give the absolute path of that folder in your code instead of changing the directory for where you plan to write the Python code. The absolute path will ensure that no matter where you write the Python code, it will be able to fetch the data for you!

You'll see that these commands are pretty vital, not only for loading your data but also for further analysis. For now, let's just continue. You have gone through all the checkups, you have saved your data, and prepared your workspace.

Before you finally start reading the data in Python, there is one more thing left: installing the necessary packages for reading and writing excel files.

Install Packages to Read and Write Excel Files

Make sure you have pip and setuptools installed on your system. Don't use Python 2 as it has been discontinued and make sure you have Python 3 >=3.4 installed, you won’t need to worry because then you’ll normally already have it ready. If you already have Python3, just make sure you have upgraded to the latest version.

If you do not have Python installed on your system, then feel free to check out this tutorial.

Check whether your pip or pip3 command is symbolically linked to Python3, use the one which is linked to the current version of Python (>=3.4) you plan to use in this tutorial. Also, check by typing Python in the terminal what version it shows is it >=2.7 or >=3.4, if it is 2.7, then check by typing Python3 , if this works, then it means that you have two different Python version installed on your system.

To do this, run the following command in your terminal:

# For Linux/OS X pip install -U pip setuptools or pip3 install -U pip3 setuptools # For Windows python -m pip install -U pip setuptools or python3 -m pip install -U pip setuptools

In case you haven’t installed pip yet, run the python get-pip.py script that you can find here. You can also follow the installation instructions on the page if you need more help to get everything running properly. You can even check this link to install pip in case the first link doesn't work for you.

Installing Anaconda is the way to go!

Anaconda Python Distribution is probably what you should be looking for because it comes bundled with almost everything that you would need to start your data science journey. Starting from Python, Pip, Pandas, Numpy, Matplotlib, etc. everything will be installed inside it. This would provide you an easy and quick way to get started with doing data science because you won’t need to worry about separately installing the packages that you need for doing data science. However, there would still be a lot of packages that might not be covered by Anaconda, which you could install manually via Pip or build from source.

Anaconda is useful for not just beginners but also useful and heavily considered by seasoned developers. It's a way to quickly test out some proof-of-concept without having to install each package separately, which saves a lot of time.

Anaconda includes 100 of the most popular Python, R, and Scala packages for data science and several open source development environments such as Jupyter Lab/Notebook and Spyder IDE. If you’d like to start working with Jupyter Notebook after this tutorial, go to this page.

You can go here to install Anaconda. To learn how to install Anaconda, check out the documentation. Follow the instructions to install, and you’re ready to start!

Congratulations, your environment has been setup!

You are all set to start loading your files and analyzing them.

Load Excel Files As Pandas DataFrames

Pandas package is one of the best ways that you could often use to import your dataset and represent it in a tabular row-column format. The Pandas library is built on top of Numerical Python popularly known as NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language. Pandas have built-in functions that could be used to analyze and plot your data and make sense of it!

Because of the power and flexibility this library provides, it has become the first choice of every data scientist. Of course, there are some disadvantages of this library; especially when dealing with big datasets, it can be slower in loading, reading, and analyzing big datasets with millions of records.

If you already have Pandas available through Anaconda, you can just load your excel file in Pandas DataFrames with pd.Excelfile() function as shown below:

Just create a dummy example.xlsx file and fill in some arbitrary values in rows and columns and save it in a .xlsx format.

# Import pandas import pandas as pd # Assign spreadsheet filename to `file` file = 'example.xlsx' # Load spreadsheet xl = pd.ExcelFile(file) # Print the sheet names print(xl.sheet_names) # Load a sheet into a DataFrame by name: df1 df1 = xl.parse('Sheet1')

If you didn’t install Anaconda, you might get a no module error. Just execute pip install pandas on a terminal or !pip install pandas on jupyter notebook cell to install the Pandas package in your environment and then execute the commands included in the code chunk above.

It's so simple, right?

To read in .csv files, you have a similar function to load the data in a DataFrame: read_csv() . Here’s an example of how you can use this function:

# Import pandas import pandas as pd # Load csv df = pd.read_csv("example.csv")

The pd.read_csv() function has a sep argument which acts as a delimiter that this function will take into account is a comma or a tab, by default it is set to a comma, but you can specify an alternative delimiter if you want to. Go to the documentation to find out which other arguments you can specify to import your file and load it successfully.

How To Write Pandas DataFrames to Excel Files

Since you load and read the files with .csv or .xlsx file format in Pandas, similarly, you can save the pandas data frames either as an excel file with a .xlsx extension or as a .csv file. Let’s say that after data analysis and machine learning predictions, you want to write the updated data or result back to a new file. You can achieve that using the pandas to_excel() function.

But, before you use this function, make sure that you have the XlsxWriter installed if you want to write your data to multiple worksheets in a .xlsx file, as shown below:

# Install `XlsxWriter` !pip install XlsxWriter import pandas as pd # Specify a writer writer = pd.ExcelWriter('example.xlsx', engine='xlsxwriter') # Write your DataFrame to a file # yourData is a dataframe that you are interested in writing as an excel file yourData.to_excel(writer, 'Sheet1') # Save the result writer.save()

Let's breakdown the above code chunk and understand it step by step:

You first define the writer in which the data frame output will be saved using an ExcelWriter object to output the DataFrame.

The pd.ExcelWriter function takes two arguments, the filename and the engine being the xlsxwriter .

Next, you pass in the writer variable to the to_excel() function, and you also specify the sheet name. This way, you add a sheet with the data to an existing workbook, which could have many worksheets in a workbook: you can use the ExcelWriter to save multiple, different DataFrames to one workbook having multiple sheets.

A much better and a simple option is to write data in .csv extension. As you saw above how you can read the .csv file using read_csv , you can also write the data frame results back to a comma-separated file using the pandas to_csv() method as shown below:

# Write the DataFrame to csv df.to_csv("example.csv")

If you want to save the output in a tab-separated fashion, all you need to do is pass a \t to the sep argument. Note that there are various other functions and ways that you can use to write your files. You can even pass the header and index argument to the to_csv function. You can find all of them here.

Using Conda Environment

The general advice for installing these packages is to do it in a Python or Anaconda virtualenv without system packages. The benefit of installing packages inside a virtual environment is that it doesn't upgrade or downgrade base system packages, and you could have different conda environments for different projects.

To start working with virtualenv, you first need to install it. Installing a virtual environment is very simple, especially with Anaconda. On your base, anaconda simply creates the virtual environment with a name and the python version you want it to use. Just activate it, install whatever packages you need, and go to your project folder.

Tip: don’t forget to deactivate the environment when you’re done!

# Install virtualenv $ conda create --name excel python=3.5 # Activate `excel` $ conda activate excel # Go to the folder of your project $ cd my_folder # Deactivate `excel` $ conda deactivate

Having virtual environments makes life very simple. Imagine as a developer, you will be working on multiple different projects, and each project might need a different package with different versions. When your projects have conflicting requirements, then the virtual environment will come in handy!

Otherwise, you would keep circling on installing one package, then upgrading it for a project and downgrading it for another. A much better idea would be to have different environments for every project.

Now you can finally start installing and importing the packages that you have read about to load in your spreadsheet data.

How To Read and Write Excel Files with Openpyxl

Openpyxl package is recommended if you want to read and write .xlsx , xlsm , xltx , and xltm file formats.

You can install openpyxl using pip but inside the excel conda environment, as shown in the below code cell.

You can read more about openpyxl here.

# Activate virtualenv $ conda activate excel # Install `openpyxl` in `excel` $ pip install openpyxl

Now that you have installed openpyxl , you can start loading in the data.

But before you load in the data, you would need to create it.

To create the data, you can follow the below workbook, which has three sheets that you will be loading into Python:

The load_workbook() function takes the filename as an argument and returns a workbook object wb , which represents the file. You can check the type of wb by running type(wb) .

# Import `load_workbook` module from `openpyxl` from openpyxl import load_workbook # Load in the workbook wb = load_workbook('test.xlsx') # Get sheet names print(wb.sheetnames)

['Sheet1', 'Sheet2', 'Sheet3']

You see that the code chunk above returns the sheet names of the workbook that you loaded in Python. Next, you can use this information also to retrieve separate sheets of the workbook.

You can also check which sheet is currently active with wb.active . As you can see in the code below, you can also load another sheet from your workbook:

# Get a sheet by name sheet = wb['Sheet1'] # Print the sheet title print('Sheet Title:',sheet.title) # Get currently active sheet anotherSheet = wb.active # Check `anotherSheet` print(anotherSheet)

Sheet Title: Sheet1 <Worksheet "Sheet1">

Though you'll think that these Worksheet objects are of no use at first, you can do a lot of stuff with these. Like you can retrieve values from specific cells in your workbook's sheet by using square brackets [] , and in these square brackets, you would pass the exact cell from which you want to retrieve the value.

This way of extracting values from a cell is quite similar in spirit to selecting and extracting values from NumPy arrays and Pandas data frame via index positions. But with Openpyxl, you need to specify the attribute .value apart from just specifying the index from where you want to extract the value as shown below:

# Retrieve the value of a certain cell print(sheet['A1'].value) # Select element 'B3' of your sheet c = sheet['B3'] # Retrieve the row number of your element print('Row No.:', c.row) # Retrieve the column number of your element print('Column Letter:', c.column) # Retrieve the coordinates of the cell print('Coordinates of cell:', c.coordinate)

ID Row No.: 3 Column Letter: 2 Coordinates of cell: B3

As you can see, besides the value attribute, there are other attributes that you can use to inspect your cell like row , column , and coordinate .

From the above code cell output when selecting the B3 element from sheet1:

The row attribute gives 3 ;

attribute gives ; Adding the column attribute gives 2 , and

attribute gives , and The coordinate of the cell gives B3 .

This was information about the cells, now what if you want to retrieve the cell values?

You can retrieve cell values by using the sheet.cell() function. All you need to do is pass the row and the column arguments and add the attribute .value in the end, as shown below:

# Retrieve cell value print(sheet.cell(row=1, column=2).value)

AGE

To extract values continuously instead of manually selecting the row and column index, you can use the for loop along with the help of the range() function.

This would give you a lot of flexibility in terms of extracting the cell values without much hardcoding. Let's print out the values of the rows that have values in column 2. If those particular cells are empty, you’ll just get back None .

If you want to know more about for loops, consider taking our Intermediate Python for Data Science course.

# Print out values in column 2 for i in range(1, 4): print(i, sheet.cell(row=i, column=2).value)

1 AGE 2 22 3 15

The openpyxl has a utility class that has two methods get_column_letter and column_index_from_string . As the name suggests, the former returns the letter given the number/integer and the latter returns the number provided a letter as a string.

You can see how it works below:

# Import relevant modules from `openpyxl.utils` from openpyxl.utils import get_column_letter, column_index_from_string # Return 'A' print('Column Letter:', get_column_letter(1)) # Return '1' print('Column Index:', column_index_from_string('A'))

Column Letter: A Column Index: 1

You have already retrieved values for rows with values in a particular column, but what do you need to do if you want to print out the rows of your file without just focusing on one column?

You use another for loop, of course!

You say, for example, that you want to focus on the area between A1 and C3 , where the first specifies the left upper corner and the second in the right bottom corner of the area on which you want to focus.

This area will be the so-called cellObj that you see in the first line of code below. You then say for each cell that lies in that area; you print the coordinate and the value that is contained within that cell. After the end of each row, you'll print a message that signals the row of cellObj area has been printed.

Note again how the selection of the area is very similar to selecting, getting and indexing list, and NumPy array elements, where you also use square brackets and a colon : to indicate the area you want to get the values. In addition, the above loop also makes good use of the cell attributes!

To make the above explanation and code visual, you might want to check out the result that you’ll get back once the loop has finished:

# Print row per row for cellObj in sheet['A1':'C3']: for cell in cellObj: print(cell.coordinate, cell.value) print('--- END ---')

A1 ID B1 AGE C1 SCORE --- END --- A2 1 B2 22 C2 5 --- END --- A3 2 B3 15 C3 6 --- END ---

Lastly, there are some attributes that you can use to check up on the result of your import, namely max_row and max_column . These attributes are, of course, general ways of making sure that you loaded in the data correctly, but nonetheless, they can and will be useful.

# Retrieve the maximum amount of rows print('Max Rows:', sheet.max_row) # Retrieve the maximum amount of columns print('Max Columns:', sheet.max_column)

Max Rows: 4 Max Columns: 3

Great, so until now, you saw how you could read data and retrieve it using openpyxl in Python. A lot of you might have felt that this is an awfully hard way to work with these files and when you haven't even looked at how you can manipulate the data, which definitely could be even more convoluted.

But don't worry, there is a much easier way!

DataFrames to the Rescue.

You can use the DataFrame() function from the Pandas package to put the values of a sheet into a DataFrame and then use all the data frame functions to analyze and manipulate your data:

# Import `pandas` import pandas as pd # Convert Sheet to DataFrame df = pd.DataFrame(sheet.values)

If you want to specify headers and index, you can pass a header argument with a list of headers and index as True, however, since the sheet you have converted to data frame already has headers you do not need to add headers:

from itertools import islice # Put the sheet values in `data` data = sheet.values # Indicate the columns in the sheet values cols = next(data)[1:] # Convert your data to a list data = list(data) # Read in the data at index 0 for the indices idx = [r[0] for r in data] # Slice the data at index 1 data = (islice(r, 1, None) for r in data) # Make your DataFrame df = pd.DataFrame(data, index=idx, columns=cols) print(df)

AGE SCORE 1 22 5 2 15 6 3 28 9

You can even append or write the values back to the Excel files as shown below with the help of dataframe_to_rows method you pass the data frame df you had created above along with the index and header:

# Import `dataframe_to_rows` from openpyxl.utils.dataframe import dataframe_to_rows from openpyxl import * # Initialize a workbook wb = Workbook() # Get the worksheet in the active workbook ws = wb.active # Append the rows of the DataFrame to your worksheet for r in dataframe_to_rows(df, index=True, header=True): ws.append(r)

The openpyxl package offers you high flexibility on how you want to write your data back to Excel files. It allows you to change cell styles and whatnot, which makes it one of those packages that you need to know while working with spreadsheets.

Note: To know more about openpyxl like how you can change cell styles or how the package works with NumPy and Pandas check this out.

Another such package which is useful when dealing with spreadsheets is xlrd . Let's check that out!

Reading And Formatting Excel Files: xlrd

This package is ideal if you want to read and manipulate the data from files with the .xls or .xlsx extension.

# Import `xlrd` import xlrd # Open a workbook workbook = xlrd.open_workbook('test.xlsx') # Loads only current sheets to memory workbook = xlrd.open_workbook('test.xlsx', on_demand = True)

xlrd provides you functions that you can use to retrieve or filter only a specific sheet and not the whole workbook. It provides you functions such as sheet_by_name() or sheet_by_index() to retrieve the sheets that you want to use in your analysis and filter the rest.

# Load a specific sheet by name worksheet = workbook.sheet_by_name('Sheet1') # Load a specific sheet by index worksheet = workbook.sheet_by_index(0) # Retrieve the value from cell at indices (0,0) sheet.cell(1, 1).value

'ID'

Writing Your Data to Excel Files with xlwt

Like other Excel Python packages, you can use xlwt to create spreadsheets that have your data in them even manually. You can also use the xlwt package, apart from the XlsxWriter package. xlwt is ideal for writing data and format information to files with older extensions like .xls .

At first go, you will hardly find much difference in how it is better than the previous Excel packages that you learned about, but it is more to do with how much comfortable you feel while working with this package compared to the others.

Let's understand it with the help of an example wherein you will manually create a workbook using Python code and write data to it:

# Import `xlwt` import xlwt # Initialize a workbook book = xlwt.Workbook(encoding="utf-8") # Add a sheet to the workbook sheet1 = book.add_sheet("Python Sheet 1") # Write to the sheet of the workbook sheet1.write(0, 0, "This is the First Cell of the First Sheet") # Save the workbook book.save("spreadsheet.xls")

Automatizing the Data Writing Process

Automatizing the Data Writing Process in an excel file is essential, especially when you want to write data to the file but at the same time don't want to spend time manually entering the data to the file. In such scenarios, you could automate the whole pipeline using very simple techniques like a for loop.

Let's understand how it can be achieved:

# Initialize a workbook book = xlwt.Workbook() # Add a sheet to the workbook sheet1 = book.add_sheet("Sheet1") # The data cols = ["A", "B", "C", "D", "E"] txt = [0,1,2,3,4] # Loop over the rows and columns and fill in the values for num in range(5): row = sheet1.row(num) for index, col in enumerate(cols): value = txt[index] + num row.write(index, value) # Save the result book.save("test.xls")

Let's break the above code and understand it step-by-step:

You first initialize a workbook using xlwt.workbook() ;

Then you add a sheet to the workbook with a name Sheet1 ;

Then you define the data, i.e., the header (cols) and the rows (txt);

Next, you have a for loop that will iterate over the data and fill all the values into the file: For every element ranging from 0 to 4, you are going to fill in the values row by row. You specify a row element that goes to the next row at every loop increment. Next, you have another for loop that for each row goes over all the columns of your sheet. You’ll fill in a value for every column in that row.



When you have filled all the columns of each row with values, you’ll go to the next row until you have zero rows left.

The output of the above code is shown below:

Using pyexcel To Read .xls or .xlsx Files

pyexcel is a Python Wrapper that provides a single API interface for reading, manipulating, and writing data in .csv , .ods , .xls , .xlsx , and .xlsm files. With pyexcel , the data in excel files can be turned into an array or dict format with minimal code.

Below is an example of how you can convert your excel data into an array format using get_array() that is a function within the pyexcel package:

# Import `pyexcel` import pyexcel # Get an array from the data my_array = pyexcel.get_array(file_name="test.xls")

Let's find out how you can convert your excel data into an ordered dictionary of lists. To achieve this you can use the get_dict() function, and it also comes within the pyexcel package:

# Import `OrderedDict` module from pyexcel._compact import OrderedDict # Get your data in an ordered dictionary of lists my_dict = pyexcel.get_dict(file_name="test.xls", name_columns_by_row=0)

You can also get a dictionary of two-dimensional arrays. To put it simply, you can extract all the workbook sheets in a single dictionary with the help of the get_book_dict() function.

# Get your data in a dictionary of 2D arrays book_dict = pyexcel.get_book_dict(file_name="test.xls")

To make your life easier, remember that the above two outputs, my_dict , and book_dict , can be converted to a DataFrame using pd.DataFrame() . This will make it easier for you to handle your data!

To learn how you can efficiently work with Python lists, check out DataCamp's 18 Most Common Python List Questions.

Writing Files With pyexcel

Just like it’s easy to load your data into arrays with this package, you can also easily export your arrays back to a spreadsheet. You can obtain this using the save_as() function and pass the array and the destination file name to the dest_file_name argument as shown below:

# Get the data data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] # Save the array to a file pyexcel.save_as(array=data, dest_file_name="array_data.xls")

Note that if you want to specify a delimiter, you can add the dest_delimiter argument and pass the symbol that you want to use as a delimiter in between like \t , , , "" .

If, however, you have a dictionary, you’ll need to use the save_book_as() function. Pass the two-dimensional dictionary to bookdict and specify the file name and you’re good to go:

# The data 2d_array_dictionary = {'Sheet 1': [ ['ID', 'AGE', 'SCORE'] [1, 22, 5], [2, 15, 6], [3, 28, 9] ], 'Sheet 2': [ ['X', 'Y', 'Z'], [1, 2, 3], [4, 5, 6] [7, 8, 9] ], 'Sheet 3': [ ['M', 'N', 'O', 'P'], [10, 11, 12, 13], [14, 15, 16, 17] [18, 19, 20, 21] ]} # Save the data to a file pyexcel.save_book_as(bookdict=2d_array_dictionary, dest_file_name="2d_array_data.xls")

Note that the order of your data in the dictionary will not be kept in the above code. If you don’t want this, you will need to make a small modification. You can read all about it here.

Reading and Writing .csv files

Python has an enormous amount of packages for achieving similar tasks with a different set of libraries. Hence, if you’re still looking for packages that allow you to load, read and write data to .csv files besides Excel packages and Pandas, you can consider using the CSV package as shown in the below code cell:

# import `csv` import csv # Read in csv file for row in csv.reader(open('data.csv'), delimiter=','): print(row) # Write csv file data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] outfile = open('data.csv', 'w') writer = csv.writer(outfile, delimiter=';', quotechar='"') writer.writerows(data) outfile.close()

Final Check of the Data

When you have the data available, it is generally recommended to check whether the data has been loaded correctly. If you have put your data in a DataFrame, you can easily and quickly check whether the data has been loaded as expected by running head() and tail() functions. The head() will output the first few rows of the data frame while the tail() will output the last few rows of the data frame.

# Check the first entries of the DataFrame df1.head() # Check the last entries of the DataFrame df1.tail()

Tip: make use of DataCamp’s Pandas Cheat Sheet this would come in handy when you are loading files as Pandas DataFrames. For more guidance on how to manipulate Python DataFrames, take our Pandas Tutorial: DataFrames in Python.

Let's check the shape, dimensions, and the data type of the data frame data :

# Inspect the shape data.shape # Inspect the number of dimensions data.ndim # Inspect the data type data.dtype

If you want to know more about how you can leverage NumPy arrays for data analysis, consider going through our NumPy tutorial, and don’t forget to use our Golden NumPy cheat sheet!

Conclusion

Congratulations on finishing this tutorial!

You have successfully gone through our tutorial that taught you all about how to read Excel files in Python.

But importing data is just the start of your data science workflow. Once you have the data from your spreadsheets in your environment, you can focus on what matters: analyzing your data. If you have loaded your data in DataFrames, consider taking our Pandas Foundations course or Manipulating DataFrames with Pandas courses.

However, if you want to continue working on this topic, consider checking out PyXll, which enables you to write functions in Python and call them in Excel.