
Predicting Streams
This project uses machine learning to predict the streaming success of songs on Spotify. We analyzed 6,513 songs from a Kaggle dataset using audio features (tempo, danceability, energy), song attributes (key, mode), and popularity metrics (artist followers, Grammy awards, TikTok viral status).
After testing 10 different models through nested cross-validation, Gradient Boosting emerged as the best performer, with an R² of 0.8349 and an MSE of 0.3654. The optimal model used 100 estimators, a learning rate of 0.1, and a maximum depth of 3.
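As a rough illustration of how nested cross-validation can be set up for a Gradient Boosting regressor, here is a minimal sketch. It assumes scikit-learn, which the summary does not name, and uses synthetic data from make_regression in place of the Kaggle dataset. The fold counts and the hyperparameter grid are illustrative assumptions; the grid simply includes the reported optimum (100 estimators, learning rate 0.1, depth 3).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic stand-in for the song features and (transformed) stream counts.
# Assumption: the real project would load the Kaggle data here instead.
X, y = make_regression(n_samples=150, n_features=6, noise=10.0, random_state=0)

# Illustrative grid (an assumption); it contains the reported best setting.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.1],
    "max_depth": [2, 3],
}

# Inner loop selects hyperparameters; outer loop estimates generalization.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=3, shuffle=True, random_state=1)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=inner_cv,
    scoring="r2",
)

# Each outer fold refits the whole grid search on its training portion,
# so the held-out score is not biased by the hyperparameter selection.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")
print("Outer-fold R² scores:", outer_scores)
print("Mean R²:", outer_scores.mean())
```

The key design point is that hyperparameter tuning happens only inside each outer training split, so the outer scores serve as a less optimistic estimate than the best score of a single grid search.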
We addressed challenges such as skewed streaming data with a log transformation and engineered additional popularity features to improve predictions. This work demonstrates how machine learning can help the music industry make data-driven decisions for revenue forecasting and risk reduction.
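The effect of a log transformation on skewed stream counts can be sketched with the standard library alone. The stream counts below are made-up values for illustration, not figures from the dataset, and the choice of log1p (log of 1 + x, which tolerates zeros) is an assumption; the summary does not say which variant was used.

```python
import math
import statistics

# Hypothetical stream counts with a long upper tail (not from the dataset).
streams = [1_200, 3_500, 8_000, 15_000, 40_000, 250_000, 12_000_000]


def log_transform(values):
    """Compress large values with log(1 + x)."""
    return [math.log1p(v) for v in values]


def inverse_log_transform(values):
    """Map log-scale values (e.g. predictions) back to stream counts."""
    return [math.expm1(v) for v in values]


log_streams = log_transform(streams)

# A mean far above the median is a simple sign of skew.
raw_ratio = statistics.mean(streams) / statistics.median(streams)
log_ratio = statistics.mean(log_streams) / statistics.median(log_streams)

print(f"mean/median before transform: {raw_ratio:.2f}")
print(f"mean/median after transform:  {log_ratio:.2f}")

# Round trip: predictions made on the log scale can be converted back.
recovered = inverse_log_transform(log_streams)
```

A model trained on the transformed target predicts on the log scale, so error metrics computed there are not in units of raw streams; applying the inverse transform recovers stream counts when those are needed.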