top of page

US Household income data

  • mailfreda
  • Aug 25, 2025
  • 2 min read

Data cleaning and exploration


Link to code:

Household income and statistics
Household income and statistics
Data cleaning

Look for duplicates

Delete duplicates

DELETE FROM us_project.us_household_income

WHERE row_id IN (

SELECT row_id

FROM (

SELECT row_id,id,

ROW_NUMBER () OVER (PARTITION BY id ORDER BY id) AS row_num

FROM us_project.us_household_income) as duplicates

WHERE row_num>1

);

Update spelling of state names

UPDATE us_project.us_household_income

SET State_Name = 'Georgia'

WHERE State_Name = 'georia'

;

Populate missing data in the place column



SELECT *

FROM us_project.us_household_income

WHERE place = '';


SELECT *

FROM us_project.us_household_income

WHERE county ='Autauga County' ;


UPDATE us_project.us_household_income

SET place = 'Autaugaville'

WHERE COUNTY = 'Autauga County'

AND city ='Vinemont'

;


Look for missing Awater and Aland columns

SELECT DISTINCT Awater,Aland

from us_project.us_household_income

WHERE Awater =0 OR Awater =''OR Awater is NULL

AND Aland =0 OR Aland =''OR Aland is NULL

;

Exploratory data analysis

Landarea per state

SELECT DISTINCT State_Name, SUM(ALand) as total_Aland

FROM us_project.us_household_income

GROUP bY State_Name

ORDER BY total_Aland Desc

;








Comparison of water and land area per state

SELECT DISTINCT State_Name, SUM(Awater) as total_Awater, SUM(Aland) as total_Aland

FROM us_project.us_household_income

GROUP bY State_Name

ORDER BY 3 Desc

;

Top 10 states according to land area

SELECT DISTINCT State_Name, SUM(Awater) as total_Awater, SUM(Aland) as total_Aland

FROM us_project.us_household_income

GROUP bY State_Name

ORDER BY 3 Desc

Limit 10

;








Combing tables with inner join,

Mean of 0 indicates inaccurate reporting, thus filter out

SELECT *

FROM us_project.us_household_income i

INNER JOIN us_project.us_household_income_statistics s

ON i.id=s.id

WHERE mean <>0

;

AVERAGE median and mean income per state

SELECT i.State_name, ROUND(AVG( Mean),1), ROUND(AVG(Median),1)

FROM us_project.us_household_income i

INNER JOIN us_project.us_household_income_statistics s

ON i.id=s.id

WHERE mean <>0

GROUP BY i.State_name

ORDER BY 2 DESC

LIMIT 10

;






 Key Insights
  • Data quality issues (duplicates, misspellings and zero statistics) were systematically identified and corrected.

  • Land and water distribution varies widely across states, with some states dominating in either category.

  • States and cities with the highest average mean/median incomes were identified, offering insights into regional income disparities.

  • Categorical variables such as type also influenced income patterns, with meaningful trends once outliers were excluded.

bottom of page