top of page

World life expectancy project

  • mailfreda
  • Aug 19, 2025
  • 2 min read

Updated: Sep 1, 2025

This MySQL workflow prepares and analyses real-world life expectancy data.


Link to code:


First, let's look at the data:



It begins with data cleaning, including removing duplicate records

Let's run a count on the ID that should be unique:

SELECT Country, Year, CONCAT(Country, Year), COUNT(CONCAT(Country, Year))

FROM world_life_expectancy

GROUP BY Country, Year, CONCAT(Country, Year)

HAVING COUNT(CONCAT(Country, Year)) > 1;


With this code, duplicates were identified

Then I used a ROW_NUMBER() window function to systematically delete duplicate rows while keeping one valid entry. Finally, I re-ran the duplicate check to confirm the data was clean.

SELECT Row_ID, CONCAT(Country, Year),

ROW_NUMBER() OVER(PARTITION BY CONCAT(Country, Year) ORDER BY CONCAT(Country, Year)) AS row_num

FROM world_life_expectancy;


Missing status values were imputed by using country-level data :

UPDATE world_life_expectancy t1

JOIN world_life_expectancy t2

ON t1.Country = t2.Country

SET t1.Status = 'Developing'

WHERE t1.Status = '' AND t2.Status <> '' AND t2.Status = 'Developing';


Missing life expectancy values were imputed with calculated averages:

UPDATE world_life_expectancy t1

JOIN world_life_expectancy t2 ON t1.Country = t2.Country AND t1.Year = t2.Year - 1

JOIN world_life_expectancy t3 ON t1.Country = t3.Country AND t1.Year = t3.Year + 1

SET t1.`Life expectancy` = ROUND((t2.`Life expectancy` + t3.`Life expectancy`) / 2, 1)

WHERE t1.`Life expectancy` = '';


Taking a look at factors influencing or correlating with life expectancy

Using exploratory data analysis


Life expectancy trends:

SELECT Country, MIN(`Life expectancy`), MAX(`Life expectancy`),

ROUND(MAX(`Life expectancy`) - MIN(`Life expectancy`), 1) AS life_increase_15_years

FROM world_life_expectancy

GROUP BY Country

HAVING MIN(`Life expectancy`) <> 0

AND MAX(`Life expectancy`) <> 0

ORDER BY life_increase_15_years DESC;


Many African countries and South Africa had a 15-year life expectancy increase over 15 years:

Life expectancy trends in developing countries
Life expectancy trends in developing countries

Versus a low number of increases seen in developed countries that already had a high life expectancies:

Life expectancy trends in some developed countries
Life expectancy trends in some developed countries

A Positive Correlation was shown between GDP and life expectancy:

Avg life expectancy vs GDP (per country) :

SELECT Country, ROUND(AVG(`Life expectancy`), 1), ROUND(AVG(GDP), 1)

FROM world_life_expectancy

GROUP BY Country

HAVING ROUND(AVG(`Life expectancy`), 1) > 0

AND ROUND(AVG(GDP), 1) > 0

ORDER BY ROUND(AVG(GDP), 1) DESC;


Status vs combined averages of life expectancy and countries:

SELECT Status, ROUND(AVG(`Life expectancy`), 1), COUNT(DISTINCT Country)

FROM world_life_expectancy

GROUP BY Status;


Relationship between BMI and life expectancy

According to this, the high BMI in some developed countries does not seem to lower life expectancy and

requires more exploration:

SELECT Country, ROUND(AVG(`Life expectancy`), 1), ROUND(AVG(BMI), 1)

FROM world_life_expectancy

GROUP BY Country

HAVING ROUND(AVG(`Life expectancy`), 1) > 0

AND ROUND(AVG(BMI), 1) > 0

ORDER BY ROUND(AVG(BMI), 1) DESC;



Mortality Trends

Cumulative adult mortality by country and year were investigated:

SELECT Country, Year, `Life expectancy`, `Adult Mortality`,

SUM(`Adult Mortality`) OVER(PARTITION BY Country ORDER BY Year) AS rolling_total

FROM world_life_expectancy;


Together, these steps provide a structured approach to transforming raw health and economic data into meaningful insights.


bottom of page