top of page

HEALTH METRICS

  • mailfreda
  • Aug 26, 2025
  • 3 min read

Updated: Sep 10, 2025

The dataset simulates health metrics for 1,000 individuals, including information on blood pressure, cholesterol levels, BMI, smoking status, and diabetes status. The data was generated randomly, with certain constraints to mimic real-world distributions.

Link to code:


Health and comorbidities
Health and comorbidities

DATA CLEANING
  • Load all data from the enhanced_health_data table

  • Switch to the health_metrics database

  • Check for duplicate names in the dataset

Note: 5 Names are duplicated, but the data to each name is unique

SELECT

*

FROM

enhanced_health_data

WHERE

Name IN ('Jessica Jones' , 'Timothy Williams',

'Amy Dixon',

'Steven Webb',

'Nicole Anderson')

  • Assign row numbers to identify duplicates uniquely

  • Add a primary key (row_id) for unique identification

    ALTER TABLE enhanced_health_data ADD COLUMN row_id INT AUTO_INCREMENT PRIMARY KEY;

    SELECT

    *

    FROM

    enhanced_health_data;

  • Rename duplicate names safely using row numbers to ensure uniqueness

  • Verify that all previously duplicated names are now unique


DATA EXPLORATION
  • Compare gender representation as counts and percentages

    SELECT

    gender,

    COUNT(*) AS count_gender,

    ROUND((COUNT(*) * 100.0 / (SELECT

    COUNT(*)

    FROM

    enhanced_health_data)),

    2) AS percentage

    FROM

    enhanced_health_data

    GROUP BY gender;

  • Examine minimum and maximum age per gender

  • Categorize age groups (young, middle_age, elderly)

  • Count number of people in each age group

  • Determine percentage distribution of age groups

  • SELECT

    CASE

    WHEN age BETWEEN 18 AND 39 THEN 'young'

    WHEN age BETWEEN 40 AND 65 THEN 'middle_age'

    WHEN age BETWEEN 66 AND 80 THEN 'elderly'

    ELSE 'other'

    END AS age_group,

    COUNT(*) AS count_people,

    ROUND( COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2 ) AS percentage

    FROM enhanced_health_data

    GROUP BY age_group;







HYPERTENSION ANALYSIS

  • Identify individuals with type 1 and type 2 hypertension

  • Count of individuals per hypertension classification

  • Determine percentage of individuals in each hypertension classification

    SELECT

    CASE

    WHEN

    `Systolic BP` > 140

    OR `Diastolic BP` > 90

    THEN

    'Type_2_hypertension'

    WHEN

    `Systolic BP` >= 130

    OR `Diastolic BP` >= 80

    THEN

    'Type_1_hypertension'

    ELSE 'No_hypertension'

    END AS Hypertension_classification,

    COUNT(*) AS count_hypertension,

    COUNT(*) * 100 / SUM(Type_2_hypertension

    AND Type_1_hypertension)

    FROM

    enhanced_health_data

    GROUP BY Hypertension_classification;

CHOLESTEROL ANALYSIS

  • Review minimum and maximum cholesterol values

  • Count individuals with high cholesterol (≥ 240), equals zero

  • Count individuals with borderline high cholesterol (200–239)

  • Cholesterol requires further exploration with lipogram counts

BMI AND OBESITY

  • Review minimum and maximum BMI values

  • Determine obesity prevalence (BMI ≥ 30)

  • Count obesity prevalence per gender with percentages

    SELECT

    Gender,

    COUNT(*) AS Obesity_Count,

    ROUND(COUNT(*)*100/SUM(COUNT(*)) OVER(),2) as percentage

    FROM enhanced_health_data

    WHERE BMI >= 30

    GROUP BY Gender;

SMOKING STATUS

  • Count total smokers in the dataset

  • Count smokers per gender with percentages

SELECT Smoker,Gender,COUNT(*) as Smoker_count,

ROUND(COUNT(*)*100/SUM(COUNT(*)) OVER(),2) as percentage

FROM enhanced_health_data

WHERE Smoker ='True'

GROUP BY Smoker,Gender;








DIABETES STATUS

  • Count total diabetic individuals and calculate percentages

    SELECT

    *

    FROM

    enhanced_health_data;

    SELECT Diabetes,COUNT(*) as Diabetes_count,

    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS percentage

    FROM enhanced_health_data

    GROUP BY Diabetes;


HEALTH CLASSIFICATION

  • Count individuals per Health category and gender with percentages

    SELECT health,gender, COUNT(*) as health_classification,

    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS percentage

    FROM enhanced_health_data

    Group by Health,gender

    Order by Health;

  • Health classification for individuals without comorbidities (age < 40, normal BP, BMI < 30, non-smoker, non-diabetic)

    SELECT Health,

    COUNT(*) AS health_without_comorbitities,

    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS percentage

    FROM enhanced_health_data

    WHERE age < 40

    AND `Systolic BP` <= 140

    AND `Diastolic BP` <= 90

    AND BMI < 30

    AND Smoker = 'False'

    AND Diabetes = 'False'

    GROUP BY Health

    ORDER BY Health;

  • Health classification for individuals with all comorbidities (age ≥ 40, high BP, BMI ≥ 30, smoker, diabetic)

    SELECT Health,

    COUNT(*) AS health_with_comorbitities,

    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS percentage

    FROM enhanced_health_data

    WHERE age >= 40

    AND `Systolic BP` > 140

    OR `Diastolic BP` > 90

    AND BMI >= 30

    AND Smoker = 'True'

    AND Diabetes = 'True'

    GROUP BY Health

    ORDER BY Health;

    INSIGHTS

    1. Gender Representation: Males and females are equally represented in the sample.

    2. Age Distribution: All age groups are represented, although the elderly population is smaller compared to the middle-aged and younger groups.

    3. Hypertension Prevalence: 17% of individuals in this sample have type 2 hypertension.

    4. Obesity Trends: Obesity is slightly more prevalent among females.

    5. Smoking Patterns: Both males and females have nearly 50% of individuals who smoke.

    6. Diabetes Prevalence: Diabetes is present across both gender groups.

    7. Health Classification Without Comorbidities: Individuals with no comorbidities generally fall into the Fair or Good health classification.

    8. Health Classification With All Comorbidities: Individuals with all comorbidities show significantly poorer health, highlighting the detrimental impact of multiple coexisting conditions.









\


bottom of page