A Practical High Level No Nonsense Guide to Learning Data Science (Part I)

Love Spreadsheets
4 min read · Jan 31, 2020

The term Data Science has been muddled more than any other tech term over the past few years (maybe except for blockchain).

Different people define the activities that comprise Data Science differently.

But all of them would agree that the 4 steps we are about to lay out form the foundation of any Data Science project.

Photo by Markus Spiske on Unsplash

We know getting started is the hardest part, and that’s what this guide is for.

It is not meant to be in-depth, nor is it meant to court everyone’s opinion or touch upon adjacent fields and activities (like Data Engineering, Data Analysis and Business Requirements).

This is our opinionated view on how to get started learning the technical components of Data Science. It takes work but, like anything else, it is very doable with practice.

These are the 4 major steps of any Data Science project:

  1. Source
  2. Clean
  3. Model
  4. Report

We will cover each step in a separate blog post. This is Part I and will cover Source.

Source

Source refers to getting the data you need to do the project.

The method of getting the data varies widely depending on where the data lives.

However, you can safely say that Databases, APIs and Files dominate the source arena.

Databases

From Wikipedia itself:

A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.

Databases come in different types, with the most common being relational databases such as Postgres, MySQL and SQLite.

Even though there are various implementations of relational databases, you can interact with any of them using SQL.

SQL stands for Structured Query Language and is pronounced “Ess-cue-el” or “Sequel”. Depending on which database you are using, some syntax may need to change, but worry not: the differences are usually minor and not worth fretting over.

To learn SQL, you need to practice on a real database. The easiest way to get one is to set up a small SQLite database and run SQL queries on it.
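As a rough starting point, here is a minimal sketch that uses Python's built-in sqlite3 module to create a throwaway database and run a query against it. The `sales` table and its rows are made up for practice; nothing about them comes from a real dataset.

```python
import sqlite3

# An in-memory database disappears when the connection closes.
# Pass a file path such as "practice.db" (hypothetical) to keep it on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for practice.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 50.0)],
)
conn.commit()

# Ask a question in SQL: what is the total amount per region?
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

for region, total in totals:
    print(region, total)

conn.close()
```

Once this runs, try changing the question: add a WHERE clause, sort by the total, or add a second table and join the two.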

Files

Even though everyone complains about a lot of data being stored in files, files are still among the most widely used data stores.

Spreadsheets, CSV files and text files are the most common.

In order to do anything meaningful with files, you need to learn a language. The most common Data Science languages are Python & R.

Eventually you should pick up both, but in the beginning we recommend starting with Python because of its versatility.

Pandas is a Python library that makes a lot of data processing tasks easier, especially reading and writing files.

The quickest way to get started with files is to install and run Python and start reading, manipulating and writing files.
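To make "reading, manipulating and writing" concrete, here is a small sketch that uses only Python's standard csv module, so it runs without installing anything. The file names and the sample rows are hypothetical. Pandas covers the same ground at a higher level with `read_csv` and `DataFrame.to_csv`.

```python
import csv
import os
import tempfile

# Work in a temporary folder so the sketch leaves your own files alone.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "people.csv")   # hypothetical file name
out_path = os.path.join(workdir, "adults.csv")  # hypothetical file name

# Write a small sample file so there is something to read.
with open(in_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "age"])
    writer.writerows([["Ana", 34], ["Ben", 15], ["Cy", 52]])

# Read: each row becomes a dict keyed by the header row.
with open(in_path, newline="") as f:
    rows = list(csv.DictReader(f))

# Manipulate: convert age from text to int and keep adults only.
adults = [
    {"name": r["name"], "age": int(r["age"])}
    for r in rows
    if int(r["age"]) >= 18
]

# Write the filtered rows to a new CSV file.
with open(out_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "age"])
    writer.writeheader()
    writer.writerows(adults)
```

Note that CSV values always come back as text, which is why the age is converted with `int` before comparing.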

You can even connect to your database from Python and run SQL within Python itself. That way you have all your data sources connected to a single script and can work with them together.
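One way this can look, sketched under assumptions: the script below loads rows from a CSV source into SQLite next to a table that already exists, then answers a question with a single SQL query across both. The `orders` and `customers` tables, and the CSV text standing in for a file on disk, are invented for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for a CSV file on disk; the contents are made up.
csv_text = "customer_id,city\n1,Austin\n2,Boston\n"

conn = sqlite3.connect(":memory:")

# Data that already lives in the database (hypothetical orders table).
conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5)],
)

# Load the file's rows into a second table so SQL can see both sources.
conn.execute("CREATE TABLE customers (customer_id INTEGER, city TEXT)")
reader = csv.DictReader(io.StringIO(csv_text))
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(int(r["customer_id"]), r["city"]) for r in reader],
)

# One query now spans the file data and the database data.
spend_by_city = conn.execute(
    """
    SELECT c.city, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.city
    ORDER BY c.city
    """
).fetchall()

print(spend_by_city)
conn.close()
```

With a real file you would swap `io.StringIO(csv_text)` for an opened file, and `":memory:"` for the path to your SQLite database.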

APIs

API stands for Application Programming Interface. Applications publish a set of rules and procedures that other programs follow to interact with their data.

Some of your favorite applications, such as Twitter, Facebook, Google and Yelp, all have APIs to get the data you need from them.

You interact with an API in a programming language and if you followed the above suggestion of using Python, then you can get started right away.

The key to getting better at sourcing from APIs is to do something practical with the data. Read the API documentation to learn what is available, and contact the API developers for any clarifications.
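As an illustration only, here is a sketch of the usual shape of an API call in Python using the standard library: request a URL, decode the JSON response, and pull out the fields you care about. The URL, the `results` key and the `name` field are all hypothetical; every real API documents its own endpoints, response shape and authentication, and many require an API key.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON body of the response."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_names(payload):
    """Pull the 'name' field out of each record in a hypothetical payload."""
    return [item["name"] for item in payload.get("results", [])]


# Example response shape, invented for illustration.
sample_payload = json.loads(
    '{"results": [{"name": "Cafe Uno"}, {"name": "Taco Dos"}]}'
)
names = extract_names(sample_payload)
print(names)

# Against a real service you would call something like:
# data = fetch_json("https://api.example.com/v1/places?city=Austin")  # hypothetical URL
# names = extract_names(data)
```

Keeping the fetching and the parsing in separate functions lets you practice the parsing on a saved response without hitting the API every time.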

Other Sources

Data Sources can be super diverse, especially in today’s landscape where almost everything creates data. However, the three sources above will cover a huge chunk of any data you would need to access.

Practice

While going through the above, the key thing is to practice, practice, practice. Here are some things you can do to practice:

  • Store data in a SQLite database
  • Phrase questions you need answering from the database in SQL queries
  • Query, query, query
  • Read spreadsheets
  • Query data in Python using SQL and output to a CSV file
  • Read data from an API and build a small practical project
  • Read data from an API and create a table in the SQLite database, all in Python
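The last few exercises can be chained into one short script. The sketch below is one possible arrangement, not the only one: a JSON string stands in for an API response, its records go into a SQLite table, a SQL query picks out some rows, and the result is written out as CSV. The `posts` table and its records are invented for the exercise.

```python
import csv
import io
import json
import sqlite3

# Stand-in for an API response body; the records are invented.
api_body = (
    '[{"id": 1, "title": "alpha", "score": 7},'
    ' {"id": 2, "title": "beta", "score": 3}]'
)
records = json.loads(api_body)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, score INTEGER)"
)
# Named placeholders map each dict's keys onto the table's columns.
conn.executemany("INSERT INTO posts VALUES (:id, :title, :score)", records)

# Query with SQL, keeping the column names for the CSV header.
cursor = conn.execute(
    "SELECT id, title, score FROM posts WHERE score >= 5 ORDER BY id"
)
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()
conn.close()

# Write to an in-memory buffer; open a real file instead to save to disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_output = buffer.getvalue()
print(csv_output)
```

To turn this into a small practical project, replace `api_body` with a response from an API you actually use and change the query to answer a question you care about.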

Let us know if you have any questions!

Coming Up Next…

Part II: Clean


Written by Love Spreadsheets

An AI powered Data & Analytics company. On a mission to make everyone love spreadsheets!
