This project performs data exploration, preprocessing, and regression modeling on a dataset. It builds and compares Linear Regression, Lasso (L1), and Ridge (L2) models, tunes hyperparameters via cross-validation, analyzes coefficients, and produces visualizations.
- Data loading and inspection using pandas
- Train/test splitting (80/20 with fixed random state) and feature scaling via `StandardScaler`
- Training of Linear Regression, Lasso, and Ridge regression models
- Cross-validated hyperparameter tuning using `LassoCV` and `RidgeCV` (5-fold CV over log-spaced alphas)
- Coefficient analysis and comparison across all three models
- Model evaluation with R² scores and mean squared error
- Visualization including feature distributions, boxplots, scatter plots, coefficient comparisons, predictions vs. actual plots, and residual analysis
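
The workflow listed above can be sketched roughly as follows. This is a minimal illustration and not the code in `PSUData_Exploration.py`: the synthetic data, the alpha grid (`np.logspace(-3, 2, 30)`), the seed `42`, and all variable names are assumptions chosen for the example. Only the 80/20 split, `StandardScaler`, and 5-fold `LassoCV`/`RidgeCV` come from the description.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 samples, 5 features,
# two of which have no effect on the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 1.5])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# 80/20 split with a fixed random state.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Log-spaced candidate alphas (the exact range is an assumption).
alphas = np.logspace(-3, 2, 30)

models = {
    "Linear": LinearRegression(),
    "Lasso": LassoCV(alphas=alphas, cv=5, max_iter=10000),
    "Ridge": RidgeCV(alphas=alphas, cv=5),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
        "coef": model.coef_,
    }
    print(f"{name}: R2={results[name]['r2']:.3f}, MSE={results[name]['mse']:.3f}")

# The CV models expose the selected regularization strength as `alpha_`.
print("Lasso alpha:", models["Lasso"].alpha_)
print("Ridge alpha:", models["Ridge"].alpha_)
```

Fitting the scaler on the training split alone keeps test-set statistics out of the preprocessing step, so the reported R² and MSE reflect unseen data.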
Python 3.x with the following packages:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
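
Before running the script, it may help to confirm that these packages are importable. The snippet below is a small sketch that uses only the standard library. The helper name `missing_packages` and the `REQUIRED` mapping are hypothetical and not part of this project.

```python
import importlib.util

# Package name as installed -> module name as imported.
# Note that scikit-learn is imported as "sklearn".
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the install names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Any package reported as missing can typically be installed with `pip install <name>`.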

Run the script with `python3 PSUData_Exploration.py`.

Provides hands-on experience with regression modeling, regularization, hyperparameter tuning, and exploratory data analysis.
Suitable for quick experiments on small tabular datasets (e.g., 30+ samples) where comparing OLS, L1, and L2 regularization is needed.
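
As a rough illustration of such a quick experiment, the sketch below fits OLS, Lasso, and Ridge on a 30-sample synthetic dataset and prints the coefficients side by side. The data, the fixed `ALPHA = 0.5`, and the variable names are assumptions for the example and are not taken from the project script.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

# Small synthetic dataset (assumption): 30 samples, 6 features.
rng = np.random.default_rng(0)
n_samples, n_features = 30, 6
X = rng.normal(size=(n_samples, n_features))
# Only the first two features drive the target; the rest are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)

X_scaled = StandardScaler().fit_transform(X)

ALPHA = 0.5  # hypothetical fixed regularization strength, not tuned
ols = LinearRegression().fit(X_scaled, y)
lasso = Lasso(alpha=ALPHA).fit(X_scaled, y)
ridge = Ridge(alpha=ALPHA).fit(X_scaled, y)

# Side-by-side coefficient table.
print(f"{'feature':>8} {'OLS':>8} {'Lasso':>8} {'Ridge':>8}")
for i in range(n_features):
    print(
        f"{i:>8} {ols.coef_[i]:>8.3f} {lasso.coef_[i]:>8.3f} {ridge.coef_[i]:>8.3f}"
    )

# Count coefficients that each penalized model sets exactly to zero.
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))
print(f"Exact zeros: Lasso={n_zero_lasso}, Ridge={n_zero_ridge}")
```

The L1 penalty can set coefficients exactly to zero, which acts as a form of feature selection, while the L2 penalty shrinks coefficients toward zero without generally eliminating them.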
Designed for research purposes associated with data mining and statistical learning coursework. Datasets in this repository are dummy data. Penn State University (PSU), IST 557 Data Mining. Fall 2025.