Intro to Reading and Writing Spreadsheets with Python

Who is this post for?

Beginners and Mac people (Windows people needs to have Python installed and set up first)

What do you need?

Any Mac, a text editor (I recommend Atom) and persistence

What are you going to get out of this?

You will learn how to read, transform and output a spreadsheet using Python. For the transform, we will be looking at obtaining Customers who we spoke to before a certain date.

Who the f*** am I?

I am the founder of Love Spreadsheets. We aim to bridge the gap between Excel users and coders.

Let’s get started!

I. Verify and Install Libraries

First we are going to check if Python is installed and install another library that will help us deal with spreadsheets.

A library is a collection of code that has implemented (usually) hard things to do in a simpler way.

  1. We need to first open up the Terminal which will let us interact with our system. Go to the search on your Mac and find and open up the Terminal
Image for post
Image for post
Search and Open the Terminal Application

2. Open up your Terminal, type in python and press Enter

$ python

Note: the dollar sign specifies the beginning of the line of the terminal, there might be a lot of different text at the beginning.

You should see a console appear with the Python version running. I’m running Python 2.7 but this tutorial and code will work up to Python 3.7. Let me know if you encounter any problems.

Image for post
Image for post
Type in python and click enter to see the console

If there is an error here, then Google the error and see what you need to do to fix it.

3. If python appeared correctly, first we will need to exit out of python by pressing

Ctrl + d

This will bring us back to the command terminal with the $

Now we are going to install the library that will help us read and handle spreadsheets. It’s called pandas and we are going to install it with the following command. Type the following and press Enter:

$ pip install pandas

We are also going to install another 2 libraries xlrd, openpyxl that help with Excel reading and writing. You won’t be dealing with these libraries but they will need to be installed.

$ pip install xlrd openpyxl

4. If the pandas library is successfully installed, you will see a bunch of text with the last line stating that it was successful. If you encounter an error here, you will need to fix that before proceeding.

Image for post
Image for post
There will be a bunch of text and it will not match mine but as long as the Successfully installed statement appears with no errors, you are good

Similarly if the xlrd and openpyxl libraries are successfully installed you would get a similar “Successfully installed” message as pandas

II. Organize Files

Now we are going to save our Sample Spreadsheet and create a Python file in which we will write our code in.

5. I have created a sample data file called CustomerSample.xlsx. You can download it by clicking here.

The file has two sheets: Customers and Prospects. The file has 8 columns of dummy data.

6. Now we are going to create a Python code file. Open up Atom or any text editor. No need to write anything right now, and save it as

Image for post
Image for post

7. There are a couple of very important things to remember:

i. Both the CustomerSample.xlsx and the file should be placed in the same folder called PythonSpreadsheetTutorial on the Desktop for all the code here to work

Image for post
Image for post
These aren’t rules but if you are new it’s easier to follow my naming first before branching out

ii. The names of the files and folder should be the ones above for the code to work

III. Read Excel File

If you have made it this far, congrats! Most people give up while setting up their environment and getting everything ready.

Now we can finally get to the fun part. The final code is only 4 lines and is as follows. We will break it down in the post

The full file

8. Open up the file that you created and get ready to write some code

The whole code for reading in the file consists of only 2 lines

import pandas as pdcustomer_sample_file = pd.read_excel("CustomerSample.xlsx", sheet_name="Prospects", parse_dates=[0])

9. The first line tells python we are going to be using the pandas library in our code and we are going to be calling it pd

10. The second line is how we read Spreadsheets with Pandas. We make a variable customer_sample_file and store the results from calling a function pd.read_excel (read_excel from the Pandas library)

11. The pd.read_excel function takes 1 mandatory parameter, the name and location of the Spreadsheet (this is why it’s important to have the naming and location be correct, but if you decided to change it, you need to change it to the full path here).

We provide two more optional parameters, sheet_name and parse_dates. The sheet_name tells the function which sheet to read and parse_dates takes a list of columns that should be read as dates. The 0 specifies the 1st column and so on.

This function has a lot of parameters that you can specify. I encourage you to read about them all here. You can experiment with the different parameters to see what they can do.

12. The variable customer_sample_file now contains all the rows and columns of the spreadsheet in a data frame and since our first row were column names they become the column names of this data frame.

If you printed out customer_sample_file then this what you see it contains

Image for post
Image for post

IV. Transform File

For our transformation, we are going to get only the records that were recorded in 2017 or earlier.

The code for this part is only 1 line.

customers_2017_or_earlier = customer_sample_file[customer_sample_file["DateTime Recorded"] < "2018-01-01"]

13. Now we create another variable customers_2017_or_earlier and we assign it to customers who were recorded before January 1st 2018.

14. This is done in two steps, the inside code

customer_sample_file["DateTime Recorded"] < "2018-01-01"

assigns True or False values to each record where the column values of DateTime Recorded is less than January 1st 2018, tean the outer part of the code

customer_sample_file[customer_sample_file["DateTime Recorded"] < "2018-01-01"]

selects those records from the Data Frame

V. Output Records

Great, so now we have the records from before and we can output them in their own little Excel file.

The code for this is also 1 line


15. We take the variable customers_2017_or_earlier and run a function to_excel. This function takes an argument which is the name of the new file you want to create

16. Now that the file is complete we need to run the script. To do that we need to pen up Terminal, and navigate to the folder where the file is saved, the PythonSpreadsheetTutorial on the Desktop.

Run the following command in my terminal and press Enter

$ cd ~/Desktop/PythonSpreadsheetTutorial

If you saved your file in a different folder, then you will need to specify your path. The squiggle signifies you are in the main directory for your user and then you build up the rest of your path to the folder of the script. If it works correctly then you will see the name before the $ sign change to the name of the folder

Image for post
Image for post
There should be a clear change when you have entered the correct folder

17. Now is time for the the final command. In our folder from the Terminal we run the following command and press Enter.

$ python
Image for post
Image for post
Just press Enter and that’s it

If there is no error, then nothing should happen and you should see a new file appear in your folder with all the records from 2017 or before!

I hope this helps and do let me know in the comments if you have any questions!

Products and resources that magically simplify hard spreadsheet tasks. Check us out at

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store