<!-- TITLE: Spotify data -->

# How to get it?
+ There were some public challenges that released huge gobs of data: [sequential skip prediction challenge](https://www.aicrowd.com/challenges/spotify-sequential-skip-prediction-challenge-old).
  + They include some interesting [song features](https://www.semanticscholar.org/paper/Deep-content-based-music-recommendation-Oord-Dieleman/eeff60867041d2ea92d1b38a20c2031d240d8872) that encode some "deep" content of what the song is, for neural net training
  + [Here](https://towardsdatascience.com/predicting-spotify-track-skips-49cf4a48b2a5) is an example analysis
  + All the analyses submitted for this challenge are (by requirement) open source
+ [Top 50 by year](https://www.kaggle.com/leonardopena/top50spotify2019)
+ [Hit predictor dataset](https://www.kaggle.com/theoverman/the-spotify-hit-predictor-dataset) has 40k songs labeled "hit" or "flop"
+ [All the songs](https://www.kaggle.com/yamaerenay/spotify-dataset-19212020-160k-tracks) has 170k+ songs

# What to do with it?
+ First, need to isolate a *behavior* (so we're not just analysing the acoustic shapes of albums, etc.)
  + This could be a skip, or starting on a specific song (i.e. the beginning of a session)
+ Second, there should be some interesting datapoint to tie that behavior to
  + Maybe this just has to do with *who* listens to *what*? Cultural grouping etc.
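
A minimal sketch of what "isolating a behavior" could look like, using the two behaviors mentioned above (skips and session starts). The rows are made-up toy data, and the column names (`session_id`, `session_position`, `track_id`, `skipped`) are assumptions for illustration, not the confirmed schema of any of the linked datasets.

```python
from collections import defaultdict

# Toy rows standing in for a listening-session log. All values are made up,
# and the column names are assumed, not taken from a real dataset schema.
ROWS = [
    {"session_id": "s1", "session_position": 1, "track_id": "t_a", "skipped": False},
    {"session_id": "s1", "session_position": 2, "track_id": "t_b", "skipped": True},
    {"session_id": "s1", "session_position": 3, "track_id": "t_c", "skipped": True},
    {"session_id": "s2", "session_position": 1, "track_id": "t_b", "skipped": False},
    {"session_id": "s2", "session_position": 2, "track_id": "t_a", "skipped": True},
]


def skip_rate_by_track(rows):
    """Fraction of plays of each track that ended in a skip."""
    plays = defaultdict(int)
    skips = defaultdict(int)
    for row in rows:
        plays[row["track_id"]] += 1
        if row["skipped"]:
            skips[row["track_id"]] += 1
    return {track: skips[track] / plays[track] for track in plays}


def session_starters(rows):
    """Count how often each track is the first track of a session."""
    starts = defaultdict(int)
    for row in rows:
        if row["session_position"] == 1:
            starts[row["track_id"]] += 1
    return dict(starts)


if __name__ == "__main__":
    print(skip_rate_by_track(ROWS))
    print(session_starters(ROWS))
```

Either per-track number could then be joined against song features or listener groupings to look for the "who listens to what" pattern.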