HilariousSoupXD/Project_Medley

Project Medley 🎵

A basic music recommendation engine built with Python, Pandas, and Scikit-learn, using content-based filtering.

This script analyzes a dataset of songs and finds similar tracks based on their audio features (such as danceability and energy) and their genre. It then blends this similarity score with each song's popularity so that recommendations are both relevant and popular.


How It Works

The recommendation logic follows these steps:

  1. Feature Selection: It selects key audio features (danceability, energy, loudness, tempo, etc.) and the track_genre.
  2. Scaling: All numeric audio features are scaled to a 0-1 range using MinMaxScaler so that features with a wide raw range (like loudness or tempo) cannot dominate the others.
  3. Genre Encoding: The track_genre column is converted into a one-hot encoded vector using pd.get_dummies. This turns categorical genre names into a numeric format that can be used in similarity calculations.
  4. Master DataFrame: The scaled audio features and the encoded genre vectors are combined into a single master_features_df.
  5. Similarity Calculation: When a user provides a track_id, the script calculates the cosine similarity between that song's feature vector and the feature vector of every other song in master_features_df.
  6. Hybrid Scoring: To improve results, the final recommendation score is a weighted blend of two metrics:
    • Similarity Score (Alpha = 87%): How similar the songs are.
    • Popularity Score (1 - Alpha = 13%): How popular the songs are. This helps to surface popular, relevant tracks and push down very obscure ones.
  7. Cleaning: The final list is cleaned to remove the original song and any duplicate tracks.
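
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of app.py: the function names build_master_features and recommend, the top_n parameter, and the "genre" column prefix are hypothetical. It also assumes that popularity is on a 0-100 scale (so dividing by 100 puts it in the same range as the similarity), that the DataFrame has a unique index, and that "duplicate" means the same track_name and artists pair.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Audio feature columns, taken from the required-columns list in this README.
AUDIO_FEATURES = [
    "danceability", "energy", "loudness", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
]


def build_master_features(df: pd.DataFrame) -> pd.DataFrame:
    """Steps 1-4: scale audio features to 0-1 and append one-hot genres."""
    scaled = pd.DataFrame(
        MinMaxScaler().fit_transform(df[AUDIO_FEATURES]),
        columns=AUDIO_FEATURES,
        index=df.index,
    )
    genres = pd.get_dummies(df["track_genre"], prefix="genre", dtype=float)
    return pd.concat([scaled, genres], axis=1)


def recommend(df, master_features_df, track_id, alpha=0.87, top_n=5):
    """Steps 5-7: cosine similarity, hybrid score, then cleanup."""
    matches = df.index[df["track_id"] == track_id]
    if len(matches) == 0:
        raise ValueError(f"track_id {track_id!r} not found")

    # Similarity between the seed track and every row (including itself).
    seed_vector = master_features_df.loc[[matches[0]]]
    similarity = cosine_similarity(seed_vector, master_features_df)[0]

    # Assumption: popularity is 0-100, so this maps it onto 0-1.
    popularity = df["popularity"] / 100.0

    result = df[["track_id", "track_name", "artists"]].copy()
    result["similarity"] = similarity
    # Weighted blend: 87% similarity, 13% popularity by default.
    result["score"] = alpha * similarity + (1 - alpha) * popularity

    # Drop the seed track, rank by score, then drop repeated tracks.
    result = result[result["track_id"] != track_id]
    result = result.sort_values("score", ascending=False)
    result = result.drop_duplicates(subset=["track_name", "artists"])
    return result.head(top_n)
```

With a loaded DataFrame df, a call would look like recommend(df, build_master_features(df), some_track_id). Raising alpha weights similarity more heavily; lowering it favors popular tracks.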

How to Use

1. Prerequisites

  • Python 3
  • A dataset.csv file (see below)
  • The Python libraries in requirements.txt

2. Setup

  1. Clone the repository:

    git clone https://github.com/HilariousSoupXD/Project_Medley.git
    cd Project_Medley
  2. Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt

3. Data

This script requires a dataset.csv file in the same directory as app.py.

The file can be obtained from Kaggle: https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset?resource=download

The CSV must contain the following columns:

  • track_id
  • track_name
  • album_name
  • artists
  • popularity
  • track_genre
  • danceability
  • energy
  • loudness
  • speechiness
  • acousticness
  • instrumentalness
  • liveness
  • valence
  • tempo
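
Before running the script, it can help to confirm that the CSV actually has these columns. The check below is a small sketch of one way to do that; the helper names missing_columns and load_dataset are hypothetical and are not part of app.py.

```python
import pandas as pd

# Column names copied from the list above.
REQUIRED_COLUMNS = [
    "track_id", "track_name", "album_name", "artists", "popularity",
    "track_genre", "danceability", "energy", "loudness", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
]


def missing_columns(df: pd.DataFrame) -> list:
    """Return the required columns that are absent from df, in list order."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


def load_dataset(path: str = "dataset.csv") -> pd.DataFrame:
    """Read the CSV and fail early if any required column is missing."""
    df = pd.read_csv(path)
    missing = missing_columns(df)
    if missing:
        raise ValueError(f"{path} is missing columns: {', '.join(missing)}")
    return df
```

Calling load_dataset() with no arguments reads dataset.csv from the current working directory and raises a ValueError that names the missing columns if the file does not match.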

4. Run the Script

To run the default example, simply execute the app.py file:

    python app.py

About

My attempt at recreating Spotify's dope recommendation algorithm.
