Skills Used:
– Data Cleaning and Exploration: Utilized tidyverse functions to handle duplicate and missing values, and explored the data using summary statistics, histograms, and boxplots.
– Data Visualization: Created visualizations with ggplot2 to show the distributions of numeric variables and of categorical variables such as gender and union membership.
– Outlier Detection: Employed boxplots to identify and remove outliers in the CreditDebt variable.
– Correlation Analysis: Calculated and visualized correlations between numeric variables using ggcorrplot.
– Scatterplot Analysis: Utilized scatterplots to visually assess relationships between CreditDebt and other numeric variables like DebtToIncomeRatio and HHIncome.
– Linear Regression Modeling: Built linear regression models to investigate relationships between CreditDebt and independent variables, starting with HHIncome and later including OtherDebt and DebtToIncomeRatio.
– Log Transformation: Log-transformed the dependent variable (CreditDebt) and examined whether this improved model fit.
– Model Evaluation: Assessed model assumptions using diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage).
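
The workflow listed above might look roughly like the following R sketch. It runs on synthetic data, since the original dataset is not shown here. The data frame name `customers`, the simulated distributions, and the use of the boxplot whiskers (1.5 × IQR rule) as the outlier cutoff are assumptions for illustration; only the column names come from the list above.

```r
library(dplyr)  # part of the tidyverse

# Synthetic stand-in data (assumption): the real dataset is not available here.
set.seed(42)
n <- 500
customers <- data.frame(
  HHIncome          = rlnorm(n, meanlog = 10.8, sdlog = 0.5),
  DebtToIncomeRatio = runif(n, min = 1, max = 30),
  OtherDebt         = rlnorm(n, meanlog = 7.5, sdlog = 0.8)
)
customers$CreditDebt <- exp(
  2 + 0.45 * log(customers$HHIncome) +
    0.04 * customers$DebtToIncomeRatio +
    rnorm(n, sd = 0.6)
)

# 1. Cleaning: drop exact duplicate rows, then rows with any missing value.
clean <- customers %>%
  distinct() %>%
  na.omit()

# 2. Outliers: keep rows whose CreditDebt lies within the boxplot whiskers.
#    boxplot.stats()$stats is c(lower whisker, lower hinge, median,
#    upper hinge, upper whisker).
whiskers <- boxplot.stats(clean$CreditDebt)$stats
trimmed <- clean %>%
  filter(CreditDebt >= whiskers[1], CreditDebt <= whiskers[5])

# 3. Correlations between the numeric variables.
#    ggcorrplot::ggcorrplot(corr_matrix, lab = TRUE) would draw this as a heatmap.
corr_matrix <- cor(trimmed)

# 4. Linear models: one predictor first, then the fuller specification.
fit_simple <- lm(CreditDebt ~ HHIncome, data = trimmed)
fit_full   <- lm(CreditDebt ~ HHIncome + OtherDebt + DebtToIncomeRatio,
                 data = trimmed)

# 5. Log-transformed response. This needs CreditDebt > 0, which holds for the
#    synthetic data; real data containing zeros would need separate handling.
fit_log <- lm(log(CreditDebt) ~ HHIncome + OtherDebt + DebtToIncomeRatio,
              data = trimmed)

# 6. Diagnostics: plot.lm() draws Residuals vs Fitted, Normal Q-Q,
#    Scale-Location, and Residuals vs Leverage by default.
par(mfrow = c(2, 2))
plot(fit_log)
```

Note that R-squared values from `fit_full` and `fit_log` are not directly comparable, because the two models have different response scales. Comparing the diagnostic plots of each model is usually the more informative check of whether the transformation helped.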