Top 5 Baby Names

A look into the top baby boy and girl names in the U.S. from 2014–2019.

A picture of a baby girl.
Photo by kaushal mishra on Unsplash

As part of the Google Data Analytics Certificate course on Coursera, I learned about the BigQuery sandbox and some basic SQL queries.

One of the assignments involved analyzing 2014 baby names, which came in a zipped file attached to the course assignment. The data can also be found on the US Social Security Administration website, although I didn’t see the counts listed there.

For the assignment, I unzipped the file and uploaded the data into BigQuery as a CSV file. Each row in the file has the format “name,sex,number,” where name is 2 to 15 characters, sex is M (male) or F (female), and number is the number of occurrences of the name. I then set the data types for the three columns in the table schema.
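For anyone who wants to sanity-check a file before uploading it, the "name,sex,number" layout described above can be parsed in a few lines of Python. This is only a minimal sketch and was not part of the assignment; the `NameRecord` type and `parse_names` helper are hypothetical names, and the sample rows are made-up placeholders, not real SSA counts.

```python
import csv
import io
from typing import List, NamedTuple


class NameRecord(NamedTuple):
    name: str
    sex: str     # "M" or "F"
    number: int  # occurrences of the name in that year


def parse_names(text: str) -> List[NameRecord]:
    """Parse 'name,sex,number' lines into typed records."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        name, sex, number = row
        if sex not in ("M", "F"):
            raise ValueError(f"unexpected sex value: {sex!r}")
        records.append(NameRecord(name, sex, int(number)))
    return records


# Placeholder rows for illustration only (not real counts).
sample = "Alpha,F,30\nBravo,M,20\n"
records = parse_names(sample)
```

Converting the third field with `int()` plays the same role as choosing an integer type for that column in the table schema: a non-numeric value fails early instead of showing up later in a query.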

To get the top baby names, I entered the query below based on the directions.

A SQL text to query the top 5 boy names from the 2014 file.
Query to get the top 5 baby boy names in descending order.

This query selects the name and the count (total number of babies with that name) from the table for the year specified in the ‘FROM’ clause. It keeps only the boy names (gender ‘M’), sorts them in descending order by count, and shows only the top 5 in the results.
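Since the query itself only appears as a screenshot, here is a rough text version of the same idea. It is a sketch that runs against an in-memory SQLite database from Python rather than BigQuery, so the exact syntax of the course query may differ. The table name `names_2014` and the column names `name`, `gender`, and `count` are assumptions, and the rows are made-up placeholders, not real SSA counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names_2014 (name TEXT, gender TEXT, count INTEGER)")

# Placeholder rows for illustration only (not real SSA counts).
conn.executemany(
    "INSERT INTO names_2014 (name, gender, count) VALUES (?, ?, ?)",
    [
        ("BoyA", "M", 60), ("BoyB", "M", 50), ("BoyC", "M", 40),
        ("BoyD", "M", 30), ("BoyE", "M", 20), ("BoyF", "M", 10),
        ("GirlA", "F", 70),
    ],
)

# Name and count, boys only, highest count first, top 5 rows.
TOP_FIVE_BOYS = """
    SELECT name, count
    FROM names_2014
    WHERE gender = 'M'
    ORDER BY count DESC
    LIMIT 5
"""
top_boys = conn.execute(TOP_FIVE_BOYS).fetchall()
```

Changing `'M'` to `'F'` in the `WHERE` line gives the matching query for girl names.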

The result is shown below where Noah was the number one boy name in 2014.

A chart with the top five baby boy names in 2014.

This got me wondering what the top girl names were in 2014, and then how the top names changed from 2014 to 2019, the latest year included in the file.

So I queried the top names for girls in 2014, then imported the data from 2015–2019 into the sandbox. Since I didn’t know how to export the query results from BigQuery, I copied and pasted all the query results from 2014–2019 into a Google Sheet. Then I exported the sheet to a file to import into Tableau for analysis.
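As an alternative to copying and pasting, the per-year results could be stacked into a single CSV with a year column. The sketch below shows one way to do that in Python; it is not what I did for this post, `combine_years` is a hypothetical helper, and the rows are made-up placeholders.

```python
import csv
import io


def combine_years(results_by_year):
    """Stack per-year query results into one CSV string with a year column.

    results_by_year maps a year to a list of (name, gender, count) rows.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["year", "name", "gender", "count"])
    for year in sorted(results_by_year):
        for name, gender, count in results_by_year[year]:
            writer.writerow([year, name, gender, count])
    return buffer.getvalue()


# Placeholder rows for illustration only (not real counts).
results = {
    2015: [("Alpha", "F", 28)],
    2014: [("Alpha", "F", 30), ("Bravo", "M", 20)],
}
combined_csv = combine_years(results)
```

Keeping the year as its own column is what lets a tool like Tableau compare the same name across years.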

Data from 2014 in a Google sheet. 2015–2019 is not shown.

This is what the data looked like when I imported it into Tableau. The data types looked right for the columns, so I went on to look into the data further.

I didn’t see any errors or blanks as I manually copied the results into the sheet before exporting.

I wanted to see how many names repeated from 2014–2019. This required using each name’s rank, count, and year. I also filtered the data by gender, looking at the girls first and then the boys.
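I did this step in Tableau, but the same logic (rank the names within each year for one gender, then collect the distinct names that ever reach the top) can be sketched in Python. The function names and the `(year, name, gender, count)` row layout are assumptions for illustration, and the rows are made-up placeholders.

```python
from collections import defaultdict


def top_names_by_year(rows, gender, n=5):
    """Return {year: [names ranked by count, highest first]} for one gender."""
    by_year = defaultdict(list)
    for year, name, row_gender, count in rows:
        if row_gender == gender:
            by_year[year].append((count, name))
    return {
        year: [name for _, name in sorted(entries, key=lambda e: -e[0])[:n]]
        for year, entries in by_year.items()
    }


def distinct_top_names(ranked):
    """Return every name that appears in any year's top list, sorted A to Z."""
    return sorted({name for names in ranked.values() for name in names})


# Placeholder rows for illustration only (not real counts).
rows = [
    (2014, "Alpha", "F", 30), (2014, "Bravo", "F", 20), (2014, "Charlie", "F", 10),
    (2015, "Bravo", "F", 25), (2015, "Delta", "F", 15), (2015, "Alpha", "F", 5),
    (2014, "Echo", "M", 99),
]
ranked = top_names_by_year(rows, "F", n=2)
repeating = distinct_top_names(ranked)
```

The length of `repeating` is the number of different names that held a top spot in at least one of the years.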

For the girls:

Five names rotate through the top 5 girl names from 2014–2019. The number 1 spot was held by Emma (from 2014–2018) and then Olivia (in 2019). For the 2nd spot, these two names swapped places in the same years. The 3rd spot went to Sophia (2014–2015) and then Ava (2016–2019). The 4th and 5th places were split among three names: Sophia, Isabella, and Ava.

A chart on Tableau showing top girl names from 2014–2019, with the count and order.

Since Olivia and Emma were the number one names, I looked further into the count of each name. The counts stay fairly close over the six-year time frame and are closest in 2016, when they almost overlap.

Count of Emma and Olivia.

For the boys:

Nine boy names made the top 5 from 2014–2019: William, Oliver, Noah, Mason, Logan, Liam, James, Jacob, and Elijah, in no particular order. The number 1 boy name was Noah (from 2014–2016) and then Liam (from 2017–2019). The 2nd spot was the opposite, with these two names switching places. For the 3rd spot, going from most recent to earliest, the names were Oliver, William, and then Mason. The 4th and 5th spots vary among four to five of the remaining names listed above.

Since Noah and Liam were the top names, I looked further into the count of each name. The counts stay fairly close over the six-year time frame and are closest in 2017, when they come close to overlapping.
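Both comparisons above (Emma and Olivia, Noah and Liam) come down to finding the year where the gap between two names is smallest. I read that off the Tableau charts, but a minimal sketch of the same check in Python might look like this. `closest_year` is a hypothetical helper and the counts are made-up placeholders, not the real numbers.

```python
def closest_year(counts_a, counts_b):
    """Return (year, gap) for the shared year with the smallest count gap.

    counts_a and counts_b map a year to that name's count.
    Ties go to the earliest year.
    """
    shared = sorted(counts_a.keys() & counts_b.keys())
    if not shared:
        raise ValueError("the two names have no years in common")
    year = min(shared, key=lambda y: abs(counts_a[y] - counts_b[y]))
    return year, abs(counts_a[year] - counts_b[year])


# Placeholder counts for illustration only (not real data).
name_a = {2014: 100, 2015: 90, 2016: 80}
name_b = {2014: 70, 2015: 75, 2016: 78}
year, gap = closest_year(name_a, name_b)
```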

It is interesting that, for both girls and boys, just two names held the number 1 spot from 2014–2019.

Given that the data spans 1880–2019, it would be interesting to see whether these top names also appeared in earlier years.

Another application of this data is a baby name generator, either a general one or one based on specific criteria, like ‘give a girl name in the top 50, for the past 5 years’.
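To make the generator idea a bit more concrete, here is a rough sketch of how it might work: build a pool of names that ranked in the top N for the chosen gender in the chosen years, then pick one at random. I have not built this; `suggest_name` and the `(year, name, gender, count)` row layout are hypothetical, and the rows are made-up placeholders.

```python
import random


def suggest_name(rows, gender, top_n, years, rng=None):
    """Pick a random name that ranked in the top_n for gender in any given year."""
    rng = rng or random.Random()
    pool = set()
    for year in years:
        matching = [r for r in rows if r[0] == year and r[2] == gender]
        ranked = sorted(matching, key=lambda r: -r[3])  # highest count first
        pool.update(name for _, name, _, _ in ranked[:top_n])
    if not pool:
        raise ValueError("no names match the given criteria")
    return rng.choice(sorted(pool))


# Placeholder rows for illustration only (not real counts).
rows = [
    (2014, "Alpha", "F", 30), (2014, "Bravo", "F", 20), (2014, "Charlie", "F", 10),
    (2015, "Bravo", "F", 25), (2015, "Delta", "F", 15), (2015, "Alpha", "F", 5),
]
suggestion = suggest_name(rows, "F", top_n=2, years=[2014, 2015], rng=random.Random(0))
```

Passing a seeded `random.Random` makes the pick repeatable, which is handy for testing; leaving it out gives a different suggestion each run.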

Another idea is to see how many other people have your first name.
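A simple version of that lookup would add up a name's counts across the years in the data. The sketch below assumes the same hypothetical `(year, name, gender, count)` rows, uses made-up placeholder numbers, and counts babies given the name in those years, which is not the same as the number of people alive with that name today.

```python
def total_for_name(rows, name):
    """Sum the counts for a name across all years and genders in rows."""
    target = name.strip().lower()
    return sum(count for _, row_name, _, count in rows if row_name.lower() == target)


# Placeholder rows for illustration only (not real counts).
rows = [
    (2014, "Alpha", "F", 30),
    (2015, "Alpha", "F", 5),
    (2015, "Bravo", "F", 25),
]
alpha_total = total_for_name(rows, "alpha")
```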

This was an interesting experiment to do and learn from. Given more time, I would delve into the additional ideas noted above.

Thank you for reading. As always, please let me know of any comments on ways to improve my writing or any questions. Have a great rest of your day!

Interested in product, tech and business. Enjoy trying new foods, singing, and watching movies.