World life expectancy project
- mailfreda
- Aug 19, 2025
- 2 min read
Updated: Sep 1, 2025
This MySQL workflow prepares and analyses real-world life expectancy data.
Link to code:

First, let's look at the data:

It begins with data cleaning, including removing duplicate records
Let's run a count on the ID that should be unique:
SELECT Country, Year, CONCAT(Country, Year), COUNT(CONCAT(Country, Year))
FROM world_life_expectancy
GROUP BY Country, Year, CONCAT(Country, Year)
HAVING COUNT(CONCAT(Country, Year)) > 1;
With this code, duplicates were identified
Then I used a ROW_NUMBER() window function to systematically delete duplicate rows while keeping one valid entry. Finally, I re-ran the duplicate check to confirm the data was clean.
SELECT Row_ID, CONCAT(Country, Year),
ROW_NUMBER() OVER(PARTITION BY CONCAT(Country, Year) ORDER BY CONCAT(Country, Year)) AS row_num
FROM world_life_expectancy;
Missing status values were imputed by using country-level data :
UPDATE world_life_expectancy t1
JOIN world_life_expectancy t2
ON t1.Country = t2.Country
SET t1.Status = 'Developing'
WHERE t1.Status = '' AND t2.Status <> '' AND t2.Status = 'Developing';
Missing life expectancy values were imputed with calculated averages:
UPDATE world_life_expectancy t1
JOIN world_life_expectancy t2 ON t1.Country = t2.Country AND t1.Year = t2.Year - 1
JOIN world_life_expectancy t3 ON t1.Country = t3.Country AND t1.Year = t3.Year + 1
SET t1.`Life expectancy` = ROUND((t2.`Life expectancy` + t3.`Life expectancy`) / 2, 1)
WHERE t1.`Life expectancy` = '';
Taking a look at factors influencing or correlating with life expectancy
Using exploratory data analysis
Life expectancy trends:
SELECT Country, MIN(`Life expectancy`), MAX(`Life expectancy`),
ROUND(MAX(`Life expectancy`) - MIN(`Life expectancy`), 1) AS life_increase_15_years
FROM world_life_expectancy
GROUP BY Country
HAVING MIN(`Life expectancy`) <> 0
AND MAX(`Life expectancy`) <> 0
ORDER BY life_increase_15_years DESC;
Many African countries and South Africa had a 15-year life expectancy increase over 15 years:

Versus a low number of increases seen in developed countries that already had a high life expectancies:

A Positive Correlation was shown between GDP and life expectancy:
Avg life expectancy vs GDP (per country) :
SELECT Country, ROUND(AVG(`Life expectancy`), 1), ROUND(AVG(GDP), 1)
FROM world_life_expectancy
GROUP BY Country
HAVING ROUND(AVG(`Life expectancy`), 1) > 0
AND ROUND(AVG(GDP), 1) > 0
ORDER BY ROUND(AVG(GDP), 1) DESC;
Status vs combined averages of life expectancy and countries:
SELECT Status, ROUND(AVG(`Life expectancy`), 1), COUNT(DISTINCT Country)
FROM world_life_expectancy
GROUP BY Status;

Relationship between BMI and life expectancy
According to this, the high BMI in some developed countries does not seem to lower life expectancy and
requires more exploration:
SELECT Country, ROUND(AVG(`Life expectancy`), 1), ROUND(AVG(BMI), 1)
FROM world_life_expectancy
GROUP BY Country
HAVING ROUND(AVG(`Life expectancy`), 1) > 0
AND ROUND(AVG(BMI), 1) > 0
ORDER BY ROUND(AVG(BMI), 1) DESC;

Mortality Trends
Cumulative adult mortality by country and year were investigated:
SELECT Country, Year, `Life expectancy`, `Adult Mortality`,
SUM(`Adult Mortality`) OVER(PARTITION BY Country ORDER BY Year) AS rolling_total
FROM world_life_expectancy;
Together, these steps provide a structured approach to transforming raw health and economic data into meaningful insights.

