Who is this post for?
Beginners and Mac people (Windows people needs to have Python installed and set up first)
What do you need?
Any Mac, a text editor (I recommend Atom) and persistence
What are you going to get out of this?
You will learn how to read, transform and output a spreadsheet using Python. For the transform, we will be looking at obtaining Customers who we spoke to before a certain date.
Who the f*** am I?
I am the founder of Love Spreadsheets. We aim to bridge the gap between Excel users and coders.
Let’s get started!
I. Verify and Install Libraries
First we are going to check if Python is installed and install another library that will help us deal with spreadsheets.
A library is a collection of code that has implemented (usually) hard things to do in a simpler way.
- We need to first open up the Terminal which will let us interact with our system. Go to the search on your Mac and find and open up the Terminal
2. Open up your Terminal, type in python and press Enter
Note: the dollar sign specifies the beginning of the line of the terminal, there might be a lot of different text at the beginning.
You should see a console appear with the Python version running. I’m running Python 2.7 but this tutorial and code will work up to Python 3.7. Let me know if you encounter any problems.
If there is an error here, then Google the error and see what you need to do to fix it.
3. If python appeared correctly, first we will need to exit out of python by pressing
Ctrl + d
This will bring us back to the command terminal with the $
Now we are going to install the library that will help us read and handle spreadsheets. It’s called pandas and we are going to install it with the following command. Type the following and press Enter:
$ pip install pandas
We are also going to install another 2 libraries xlrd, openpyxl that help with Excel reading and writing. You won’t be dealing with these libraries but they will need to be installed.
$ pip install xlrd openpyxl
4. If the pandas library is successfully installed, you will see a bunch of text with the last line stating that it was successful. If you encounter an error here, you will need to fix that before proceeding.
Similarly if the xlrd and openpyxl libraries are successfully installed you would get a similar “Successfully installed” message as pandas
II. Organize Files
Now we are going to save our Sample Spreadsheet and create a Python file in which we will write our code in.
5. I have created a sample data file called CustomerSample.xlsx. You can download it by clicking here.
The file has two sheets: Customers and Prospects. The file has 8 columns of dummy data.
6. Now we are going to create a Python code file. Open up Atom or any text editor. No need to write anything right now, and save it as excel.py.
7. There are a couple of very important things to remember:
i. Both the CustomerSample.xlsx and the Excel.py file should be placed in the same folder called PythonSpreadsheetTutorial on the Desktop for all the code here to work
ii. The names of the files and folder should be the ones above for the code to work
III. Read Excel File
If you have made it this far, congrats! Most people give up while setting up their environment and getting everything ready.
Now we can finally get to the fun part. The final code is only 4 lines and is as follows. We will break it down in the post
8. Open up the excel.py file that you created and get ready to write some code
The whole code for reading in the file consists of only 2 lines
import pandas as pdcustomer_sample_file = pd.read_excel("CustomerSample.xlsx", sheet_name="Prospects", parse_dates=)
9. The first line tells python we are going to be using the pandas library in our code and we are going to be calling it pd
10. The second line is how we read Spreadsheets with Pandas. We make a variable customer_sample_file and store the results from calling a function pd.read_excel (read_excel from the Pandas library)
11. The pd.read_excel function takes 1 mandatory parameter, the name and location of the Spreadsheet (this is why it’s important to have the naming and location be correct, but if you decided to change it, you need to change it to the full path here).
We provide two more optional parameters, sheet_name and parse_dates. The sheet_name tells the function which sheet to read and parse_dates takes a list of columns that should be read as dates. The 0 specifies the 1st column and so on.
This function has a lot of parameters that you can specify. I encourage you to read about them all here. You can experiment with the different parameters to see what they can do.
12. The variable customer_sample_file now contains all the rows and columns of the spreadsheet in a data frame and since our first row were column names they become the column names of this data frame.
If you printed out customer_sample_file then this what you see it contains
IV. Transform File
For our transformation, we are going to get only the records that were recorded in 2017 or earlier.
The code for this part is only 1 line.
customers_2017_or_earlier = customer_sample_file[customer_sample_file["DateTime Recorded"] < "2018-01-01"]
13. Now we create another variable customers_2017_or_earlier and we assign it to customers who were recorded before January 1st 2018.
14. This is done in two steps, the inside code
customer_sample_file["DateTime Recorded"] < "2018-01-01"
assigns True or False values to each record where the column values of DateTime Recorded is less than January 1st 2018, tean the outer part of the code
customer_sample_file[customer_sample_file["DateTime Recorded"] < "2018-01-01"]
selects those records from the Data Frame
V. Output Records
Great, so now we have the records from before and we can output them in their own little Excel file.
The code for this is also 1 line
15. We take the variable customers_2017_or_earlier and run a function to_excel. This function takes an argument which is the name of the new file you want to create
16. Now that the file is complete we need to run the script. To do that we need to pen up Terminal, and navigate to the folder where the excel.py file is saved, the PythonSpreadsheetTutorial on the Desktop.
Run the following command in my terminal and press Enter
$ cd ~/Desktop/PythonSpreadsheetTutorial
If you saved your excel.py file in a different folder, then you will need to specify your path. The squiggle signifies you are in the main directory for your user and then you build up the rest of your path to the folder of the script. If it works correctly then you will see the name before the $ sign change to the name of the folder
17. Now is time for the the final command. In our folder from the Terminal we run the following command and press Enter.
$ python excel.py
If there is no error, then nothing should happen and you should see a new file appear in your folder with all the records from 2017 or before!
I hope this helps and do let me know in the comments if you have any questions!