A basic music recommendation engine built with Python, Pandas, and Scikit-learn, using content-based filtering.
This script analyzes a dataset of songs, finding similar tracks based on their audio features (like danceability, energy) and their genre. It then combines this similarity score with a song's popularity to provide recommendations that are both relevant and popular.
The recommendation logic follows these steps:
- Feature Selection: It selects key audio features (`danceability`, `energy`, `loudness`, `tempo`, etc.) and the `track_genre`.
- Scaling: All numeric audio features are scaled to a 0-1 range using `MinMaxScaler` so that no single feature (like `loudness`) can dominate the others.
- Genre Encoding: The `track_genre` column is converted into a one-hot encoded vector using `pd.get_dummies`. This turns categorical genre names into a numeric format that can be used in similarity calculations.
- Master DataFrame: The scaled audio features and the encoded genre vectors are combined into a single `master_features_df`.
- Similarity Calculation: When a user provides a `track_id`, the script calculates the cosine similarity between that song's feature vector and all other songs in the master dataframe.
- Hybrid Scoring: To improve results, the final recommendation score is a weighted blend of two metrics:
  - Similarity Score (Alpha = 87%): How similar the songs are.
  - Popularity Score (1 - Alpha = 13%): How popular the songs are. This helps to surface popular, relevant tracks and push down very obscure ones.
- Cleaning: The final list is cleaned to remove the original song and any duplicate tracks.
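
The steps above can be sketched roughly as follows. This is a minimal illustration on a toy DataFrame, not the code in `app.py`: the function names `build_master_features` and `recommend`, the four-feature subset, the division of `popularity` by 100 (which assumes a 0-100 scale), and the choice to treat rows with the same `track_name` and `artists` as duplicates are all assumptions made for the example.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Subset of the audio features, chosen for illustration only.
AUDIO_FEATURES = ["danceability", "energy", "loudness", "tempo"]
ALPHA = 0.87  # weight of the similarity score, as described above


def build_master_features(df):
    """Scale audio features to [0, 1] and append one-hot genre columns."""
    scaled = pd.DataFrame(
        MinMaxScaler().fit_transform(df[AUDIO_FEATURES]),
        columns=AUDIO_FEATURES,
        index=df.index,
    )
    genres = pd.get_dummies(df["track_genre"], prefix="genre", dtype=float)
    return pd.concat([scaled, genres], axis=1)


def recommend(df, track_id, top_n=5, alpha=ALPHA):
    """Rank tracks by a weighted blend of cosine similarity and popularity."""
    df = df.reset_index(drop=True)
    master_features_df = build_master_features(df)

    # Cosine similarity between the seed track and every track.
    seed_idx = df.index[df["track_id"] == track_id][0]
    seed_vector = master_features_df.loc[[seed_idx]]
    similarity = cosine_similarity(seed_vector, master_features_df)[0]

    # Assumption: popularity is on a 0-100 scale, so divide to get 0-1.
    popularity = df["popularity"] / 100.0
    result = df.assign(
        similarity=similarity,
        score=alpha * similarity + (1 - alpha) * popularity,
    )

    # Cleaning: drop the seed track, then drop duplicates (assumed key:
    # same track name and artists), keeping the highest-scoring copy.
    result = result[result["track_id"] != track_id]
    result = result.sort_values("score", ascending=False)
    result = result.drop_duplicates(subset=["track_name", "artists"])
    return result.head(top_n)[["track_name", "artists", "similarity", "score"]]


# Toy data: "b" and "c" are the same song listed twice, "d" is unrelated.
songs = pd.DataFrame({
    "track_id": ["a", "b", "c", "d"],
    "track_name": ["Seed", "Twin", "Twin", "Far"],
    "artists": ["X", "Y", "Y", "Z"],
    "popularity": [50, 80, 20, 90],
    "track_genre": ["pop", "pop", "pop", "metal"],
    "danceability": [0.8, 0.8, 0.8, 0.1],
    "energy": [0.7, 0.7, 0.7, 0.9],
    "loudness": [-5.0, -5.0, -5.0, -20.0],
    "tempo": [120.0, 120.0, 120.0, 180.0],
})
recommendations = recommend(songs, "a")
print(recommendations)
```

In the toy data the unrelated track has the highest popularity but still ranks below the near-identical one, because the similarity term carries most of the weight.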
- Python 3
- A `dataset.csv` file (see below)
- The Python libraries in `requirements.txt`
- Clone the repository:

      git clone https://github.com/HilariousSoupXD/Project_Medley.git
      cd Project_Medley

- Create and activate a virtual environment:

      python3 -m venv venv
      source venv/bin/activate

- Install dependencies:

      pip install -r requirements.txt

This script requires a dataset.csv file in the same directory.
The file can be obtained from Kaggle: https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset?resource=download
The CSV must contain the following columns:
- `track_id`
- `track_name`
- `album_name`
- `artists`
- `popularity`
- `track_genre`
- `danceability`
- `energy`
- `loudness`
- `speechiness`
- `acousticness`
- `instrumentalness`
- `liveness`
- `valence`
- `tempo`
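
Before running the script, it can help to confirm that the CSV actually has these columns. The snippet below is a small sketch of such a check. The helper name `missing_columns` is hypothetical and not part of the project, and the example uses an in-memory frame in place of the real file.

```python
import pandas as pd

# Columns the script expects, as listed above.
REQUIRED_COLUMNS = [
    "track_id", "track_name", "album_name", "artists", "popularity",
    "track_genre", "danceability", "energy", "loudness", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
]


def missing_columns(df):
    """Return the required columns that are absent from the DataFrame."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


# Stand-in for the real data; with the actual file this would be
# df = pd.read_csv("dataset.csv")
sample = pd.DataFrame(columns=["track_id", "track_name", "tempo"])
missing = missing_columns(sample)
if missing:
    print("Missing columns:", ", ".join(missing))
```

An empty list means every required column is present.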
To run the default example, execute the `app.py` file:

    python app.py