A Quick Guide on Cleaning People’s Names from Your Data

Love Spreadsheets
7 min read · Nov 3, 2020

People or customer names are frequently used data types. So you would think they would be straightforward to work with, right? Right?

You may find yourself having a spreadsheet with a bunch of different customer names and you’re stuck figuring out how to clean them.

If you want a quick way to do it, check out Clean Spreadsheets or Love Spreadsheets!

However, if you are trying to clean them yourself, in code or by hand, here is a general overview of what you need to keep in mind.

First… what is a Name?

This may seem like a question that doesn’t need to be answered. Everyone and everything has a name. Animals, places, people, things are all defined and known by a word or combination of words that becomes their identity.

My name is Astha, I’m in New Jersey and I’m using a MacBook Pro.

In this article we will be focusing on names of people and the different types of naming conventions there are around the world.

So… what does a name consist of?

We will dive into different regions of the world and how the specific naming conventions work in those parts.

Common legal names include a person’s first name or forename, which identifies the person, and a surname or last name that is commonly shared with other family members.

Keep in mind that in this article we will be looking at names from a Western perspective. We will look broadly at 5 different regions around the world and their naming conventions.

Western Countries

In Western culture naming conventions are most commonly personal name, middle name, and family name or surname, such as John Adam Harris.

The name is written in that order, and the middle name is most often included when the first name and surname alone are too common to identify the person. In alphabetical lists the name will be noted as Harris, John A.
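If you want to produce that list form in code, here is a minimal sketch. It assumes the name is already in Western order (first name, optional middle names, surname) and that the surname is a single word; the function name to_sort_form is made up for this example.

```python
def to_sort_form(full_name: str) -> str:
    """Turn 'John Adam Harris' into 'Harris, John A.'.

    Assumes Western order: first name, optional middle names, surname.
    """
    parts = full_name.split()
    if len(parts) < 2:
        # A single word (or an empty cell) has nothing to reorder.
        return full_name.strip()
    first, *middle, last = parts
    # Abbreviate every middle name to an initial.
    initials = " ".join(f"{m[0].upper()}." for m in middle)
    return f"{last}, {first} {initials}".strip()


print(to_sort_form("John Adam Harris"))  # Harris, John A.
print(to_sort_form("John Harris"))       # Harris, John
```

Names that do not follow this order, like many of the ones covered in the rest of this guide, need their own handling before a step like this.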

Now let’s walk through the most common exceptions.

France

French names rarely include a middle name, but first names can be hyphenated, such as Jean-Luc. People can also have two first names but usually go by the first one.

Some surnames include the particle De (of), Du (of the), or the feminine equivalent De La. These particles were once used for nobility but are still seen today.

This adds an extra word to the surname and must be accounted for when cleaning French names.
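One way to account for the extra word is to attach any particle to the surname when splitting. The sketch below assumes the surname is the last word plus any particles directly before it. The particle set is a short illustrative list, and the sample names are made up.

```python
from typing import Tuple

# Assumption: a small illustrative set of particles, not an exhaustive list.
PARTICLES = {"de", "du", "la"}


def split_french_name(full_name: str) -> Tuple[str, str]:
    """Return (given names, surname), keeping particles with the surname."""
    tokens = full_name.split()
    if len(tokens) < 2:
        return full_name.strip(), ""
    start = len(tokens) - 1  # the surname starts at the last word...
    # ...and grows leftwards over particles, but never swallows the first word.
    while start > 1 and tokens[start - 1].lower() in PARTICLES:
        start -= 1
    return " ".join(tokens[:start]), " ".join(tokens[start:])


print(split_french_name("Jean-Luc De La Fontaine"))  # ('Jean-Luc', 'De La Fontaine')
print(split_french_name("Marie Dupont"))             # ('Marie', 'Dupont')
```

Because the split is on spaces, a hyphenated first name like Jean-Luc stays in one piece.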

Spanish Speaking Countries

According to Spanish customs, a child will get their first surname from their father and their second surname from their mother’s first surname.

The Philippines follow the same naming conventions because of the older Spanish system that was in place.

Confusing, I know, but let’s say the father’s name is Richard Garcia Rodriguez and the mother’s name is Maria Lopez Hernández. The child would then be Marvin Garcia Lopez.

Furthermore, in Hispanic cultures the first surname is primary and would be written as Garcia, Marvin Lopez for example.

The first name can also be made up of 2 names, as in Juan Esteban Aristizábal Vásquez, where Juan Esteban is the first name.
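Here is a rough sketch of how the Spanish convention could be handled in code. It assumes every name ends with exactly two single-word surnames (paternal, then maternal) and that everything before them is the first name. The function names are made up for this example.

```python
def split_spanish_name(full_name: str) -> dict:
    """Split a name written as: given name(s), paternal surname, maternal surname.

    Assumes single-word surnames and that both surnames are present.
    """
    tokens = full_name.split()
    if len(tokens) < 3:
        raise ValueError("expected given name(s) plus two surnames")
    return {
        "given": " ".join(tokens[:-2]),
        "paternal": tokens[-2],
        "maternal": tokens[-1],
    }


def child_surnames(father: str, mother: str) -> str:
    """First surname of the father, then first surname of the mother."""
    return f"{split_spanish_name(father)['paternal']} {split_spanish_name(mother)['paternal']}"


print(child_surnames("Richard Garcia Rodriguez", "Maria Lopez Hernández"))  # Garcia Lopez
print(split_spanish_name("Juan Esteban Aristizábal Vásquez")["given"])      # Juan Esteban
```

Real data will contain people who list only one surname, so a production version would need a fallback instead of raising an error.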

Portuguese Speaking Countries

Portuguese-speaking countries follow the same convention, except that the mother’s surname comes first in the child’s official name.

Middle East

Arabic names can consist of 5 parts: the ism, kunya, nasab, laqab and nisba in no particular order.

To break it up, the ism is the given or birth name and usually means something.

The kunya is an honorific name and not part of the person’s formal name. An example can be using Umm or Abu to mean “mother of” or “father of”.

Next, the nasab is patronymic and refers to the person’s heritage using bin or ibn, meaning “son of”, or ibnat or bint, meaning “daughter of”.

The laqab is usually a religious epithet, such as al-Rashid, meaning “the rightly guided”, and will usually come after the ism.

Now the nisba can be similar to a surname, but it’s not used in Egypt or Lebanon. It can also stand for an occupation, geographic location, or a tribe or family. For example, al-Attar means “the spice vendor” and al-Makki is “of Mecca”.

This may still be a little confusing, so here is an example: the name Saleh ibn Tariq ibn Khalid al-Fulan translates to “Saleh, son of Tariq, son of Khalid; of the family al-Fulan”.

His ism is Saleh, his nasab is ibn Tariq ibn Khalid, and al-Fulan is his nisba or family name.

In Western countries, Middle Eastern people will often drop parts of their names and simplify them to a first, middle, and last name.

So Saleh ibn Tariq ibn Khalid al-Fulan will go by Saleh al-Fulan.
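That simplification can be sketched in code for names shaped like the example. This sketch assumes the layout is ism first, then any number of nasab pairs, then the nisba. It does not try to detect a kunya or laqab, and since the parts can appear in other orders, treat it only as a starting point. The marker set and function names are assumptions for illustration.

```python
# Assumption: only these patronymic markers are recognised.
NASAB_MARKERS = {"ibn", "bin", "bint"}


def parse_arabic_name(full_name: str) -> dict:
    """Split a name laid out as: ism, zero or more nasab pairs, then the rest."""
    tokens = full_name.split()
    if not tokens:
        return {"ism": "", "nasab": [], "nisba": ""}
    ism, nasab, i = tokens[0], [], 1
    # Each nasab is a marker followed by one name, e.g. "ibn Tariq".
    while i + 1 < len(tokens) and tokens[i].lower() in NASAB_MARKERS:
        nasab.append(f"{tokens[i]} {tokens[i + 1]}")
        i += 2
    return {"ism": ism, "nasab": nasab, "nisba": " ".join(tokens[i:])}


def simplify(full_name: str) -> str:
    """Drop the nasab chain, keeping only the ism and the nisba."""
    parts = parse_arabic_name(full_name)
    return f"{parts['ism']} {parts['nisba']}".strip()


print(simplify("Saleh ibn Tariq ibn Khalid al-Fulan"))  # Saleh al-Fulan
```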

Africa

Naming in Africa varies widely, but most commonly it follows the Western convention of first name, middle name, last name.

Fun Fact:

Traditional African names are very unique and often can tell a story.

Children’s names can be inspired by events surrounding the birth, emotional warnings, order of birth, the day of the week the child is born, faith, and/or day and night.

Sometimes names can be full sentences, and these differ between countries in Africa.

However, in countries such as Ethiopia and Eritrea there technically isn’t a family name. Names there usually consist of a personal name and a separate patronymic.

Children have a given name at birth and get their father’s and sometimes grandfather’s first names added.

In other countries, the last part of the full name is legally treated as the surname, and the names in between are treated as middle names.

For example, popular singer The Weeknd is of Ethiopian descent and his full name is Abel Makkonen Tesfaye. This makes his first name Abel, middle name Makkonen and last name Tesfaye, like any other Western name.

East Asia

Unlike Western names, names in most East Asian countries are presented with the surname or family name first, followed by the given or birth name. However, many people will change the order to follow Western patterns.

Fun Fact:

There are many alternative names Chinese people may have, including nicknames, a Western name, school name, courtesy name, pseudonym, temple name, etc.

For example, in China the name Mao Zedong will be written as Zedong Mao in Western ordering.
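The reordering itself is easy to write, but code cannot tell which order a name is already in. This small sketch therefore takes that as an explicit flag. The function name and the family_first parameter are made up for this example, and it assumes the family name is a single word.

```python
def to_western_order(name: str, family_first: bool = True) -> str:
    """Move a leading family name to the end.

    Assumes a single-word family name. Set family_first=False for names
    that are already in Western order so they are left alone.
    """
    tokens = name.split()
    if not family_first or len(tokens) < 2:
        return " ".join(tokens)
    family, given = tokens[0], tokens[1:]
    return " ".join(given + [family])


print(to_western_order("Mao Zedong"))                      # Zedong Mao
print(to_western_order("Zedong Mao", family_first=False))  # Zedong Mao
```

In a spreadsheet, that flag would usually come from another column, such as the country or the source system, rather than from the name itself.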

Fun Fact:

More than 90% of Chinese people belong to the Han ethnicity, and Han family names are generally one syllable, while given names are generally two syllables.

In Cambodia the same pattern is followed. However, people there do not have a family name and take their mother’s or father’s first name as their surname instead.

Vietnamese naming conventions also include a middle name, because a very large share of people in Vietnam (by common estimates, 30 to 40%) have the last name Nguyen. The middle name is also very important for telling the gender of the person.

For example, Nguyen Van Minh is a man while Nguyen Thi Minh is a woman.
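If you want to use the middle name as a gender hint while cleaning, a minimal sketch could look like this. It only knows the two middle names from the example, Van and Thi, and returns nothing for any other middle name. The lookup table and function names are assumptions for illustration.

```python
import unicodedata
from typing import Optional

# Assumption: only the two middle names from the example are covered.
GENDER_BY_MIDDLE_NAME = {"van": "male", "thi": "female"}


def strip_accents(text: str) -> str:
    """Remove combining marks so that 'Thị' and 'Thi' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def gender_hint(full_name: str) -> Optional[str]:
    """Return 'male', 'female', or None when the middle name gives no hint."""
    tokens = full_name.split()
    if len(tokens) < 3:
        return None  # no middle name to look at
    # Names are family name first, so the middle name is the second word.
    return GENDER_BY_MIDDLE_NAME.get(strip_accents(tokens[1]).lower())


print(gender_hint("Nguyen Van Minh"))  # male
print(gender_hint("Nguyen Thi Minh"))  # female
```

Treat the result as a hint only. Plenty of names will not carry either middle name.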

In Thai culture the first name is followed by the surname, as in Western countries.

It’s common for Thai surnames to be quite long, so people will usually be referred to by their first names, preceded by “khun”, which means Mr. or Miss.

South Asia

Most South Asian naming conventions follow: First name, optional middle name, and surname.

However, in South India it is common to see two initials before the first name, such as H.D. Kumaraswamy Rao.

The initials can represent the village name and the father’s name, and they are followed by the first name and last name. In Western countries the initials are usually dropped and people will go by their first and last name.
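Separating those leading initials is a good fit for a regular expression. The sketch below assumes the initials are single letters, each followed by a period, at the very start of the name. The sample name is made up, and the function name is an assumption for this example.

```python
import re
from typing import Tuple

# One or more "letter + period" groups at the start, then the rest of the name.
LEADING_INITIALS = re.compile(r"^((?:[A-Za-z]\.\s*)+)(.*)$")


def split_initials(name: str) -> Tuple[str, str]:
    """Return (initials, remaining name); initials is '' when there are none."""
    cleaned = name.strip()
    match = LEADING_INITIALS.match(cleaned)
    if not match:
        return "", cleaned
    # Normalise "S. R. " and "S.R. " to the same compact form.
    initials = re.sub(r"\s+", "", match.group(1))
    return initials, match.group(2)


print(split_initials("S.R. Venkat Rao"))  # ('S.R.', 'Venkat Rao')
print(split_initials("Venkat Rao"))       # ('', 'Venkat Rao')
```

Keeping the initials in their own column, instead of deleting them, means nothing is lost if you later need the full legal name.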

Fun Fact:

It is common practice in Pakistan for the father’s first name to become the child’s surname.

Things To Keep in Mind

When cleaning names it’s important to keep different naming conventions in mind to ensure your data gets cleaned the right way. This is especially important if you have a diverse set of names from all over the world.

Where to go from here?

You can use this guide to come up with an implementation to clean names from your database, programming language or spreadsheets.

And if you need to clean names in spreadsheets, you can check out our tool Clean Spreadsheets to automatically clean and transform any names in your spreadsheets.

If you want a custom app or project built using spreadsheets, you can check out our consulting service here: https://www.lovespreadsheets.com!

Happy Data Cleaning!
