AlexanderUbaldoGutierrez21/PSUTrainingDataModel

Data Training Model

This project performs data exploration, preprocessing, and regression modeling on a dataset. It builds and compares Linear Regression, Lasso (L1), and Ridge (L2) models, tunes hyperparameters via cross-validation, analyzes coefficients, and produces visualizations.

Capabilities

  • Data loading and inspection using pandas
  • Train/test splitting (80/20 with fixed random state) and feature scaling via StandardScaler
  • Training of Linear Regression, Lasso, and Ridge regression models
  • Cross-validated hyperparameter tuning using LassoCV and RidgeCV (5-fold CV over log-spaced alphas)
  • Coefficient analysis and comparison across all three models
  • Model evaluation with R² scores and mean squared error
  • Visualization including feature distributions, boxplots, scatter plots, coefficient comparisons, predictions vs. actual plots, and residual analysis
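The modeling steps listed above can be sketched roughly as follows. This is a minimal illustration and not the contents of PSUData_Exploration.py: the synthetic data from make_regression, the alpha range, the random state value, and all variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project's dataset (assumption for illustration).
X, y = make_regression(
    n_samples=200, n_features=8, n_informative=4, noise=10.0, random_state=42
)

# 80/20 train/test split with a fixed random state (the value 42 is assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then apply it to both splits.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Log-spaced candidate alphas for 5-fold cross-validation (range is assumed).
alphas = np.logspace(-3, 2, 30)

models = {
    "Linear": LinearRegression(),
    "Lasso": LassoCV(alphas=alphas, cv=5, max_iter=10000),
    "Ridge": RidgeCV(alphas=alphas, cv=5),
}

# Fit each model and record R² and mean squared error on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }
    print(f"{name}: R2={results[name]['r2']:.3f}, MSE={results[name]['mse']:.2f}")

# The alpha selected by cross-validation is exposed as `alpha_`.
print("Lasso alpha:", models["Lasso"].alpha_)
print("Ridge alpha:", models["Ridge"].alpha_)
```

Fitting the scaler on the training split alone keeps test-set statistics out of preprocessing, so the held-out scores are not influenced by the test data.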

Usage

Prerequisites

Python 3.x with the following packages:

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn
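A quick way to confirm the packages above are available before running the script is a small import check. This helper is not part of the repository; the REQUIRED mapping and the printed messages are illustrative. Note that scikit-learn is imported under the module name sklearn.

```python
from importlib.util import find_spec

# Map each package name (as installed) to the module name used at import time.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}

# find_spec returns None when a top-level module cannot be located.
missing = [pkg for pkg, module in REQUIRED.items() if find_spec(module) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```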

Running the Script (macOS Terminal)

python3 PSUData_Exploration.py

Use Cases

Educational / Coursework

Provides hands-on experience with regression modeling, regularization, hyperparameter tuning, and exploratory data analysis.

Prototyping Small-Scale Regression Tasks

Suitable for quick experiments on small tabular datasets (e.g., 30+ samples) that call for comparing ordinary least squares (OLS) with L1- and L2-regularized regression.
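As a rough sketch of such an experiment, the snippet below fits all three models on a 40-sample synthetic dataset and places their coefficients side by side. The data-generating process (only the first two features influence the target), the alpha range, and the feature names x0 to x5 are assumptions made for illustration, not part of the repository.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.preprocessing import StandardScaler

# Small synthetic dataset (assumption): 40 samples, 6 features.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 6
X = rng.normal(size=(n_samples, n_features))

# Only the first two features drive the target; the rest are pure noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)

# Standardize so the regularization penalty treats all features on one scale.
X_s = StandardScaler().fit_transform(X)

# Log-spaced candidate alphas for 5-fold cross-validation (range is assumed).
alphas = np.logspace(-3, 1, 20)

# One column of fitted coefficients per model, one row per feature.
coefs = pd.DataFrame(
    {
        "OLS": LinearRegression().fit(X_s, y).coef_,
        "Lasso": LassoCV(alphas=alphas, cv=5, max_iter=10000).fit(X_s, y).coef_,
        "Ridge": RidgeCV(alphas=alphas, cv=5).fit(X_s, y).coef_,
    },
    index=[f"x{i}" for i in range(n_features)],
)
print(coefs.round(3))
```

In a table like this, L1 regularization tends to push the coefficients of uninformative features toward exactly zero, while L2 regularization shrinks coefficients without usually zeroing them. How strongly either effect shows depends on the alpha that cross-validation selects.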

Research Purposes

Designed for research associated with data mining and statistical learning coursework at Penn State University (PSU): IST 557 Data Mining, Fall 2025. The datasets in this repository are dummy data.