Music Analytics: A Decade of Listening Habits

Tools used:

  • Python
  • last.fm's API
  • Jupyter notebooks
  • libraries: numpy, pandas, matplotlib, csv, datetime
  • Stack Overflow

Context

If I'm awake, there's an excellent chance that I've got music playing. My love of listening to music—largely alternative, electronic, and indie—began in middle school, when I joined the BMG Music Club and spent my babysitting money on mail-order CDs. Ever since, music has played in the background of my life, though I haven't bought a CD since Bush Jr. was president.

In 2007, I joined the website last.fm, which lets you link your music apps to their site to track each listen (a "scrobble" in last.fm parlance) and also discover new artists similar to those you already listen to. Although my use of the site is much more passive than it used to be, last.fm has been silently tracking my music listening habits for over a decade. As of this writing, I have 131,226 recorded scrobbles.

The raw data is kindly made available through an API, so I thought it would be fun to play around with it and see what trends I could extract.

Favorites

Within a Python-based Jupyter notebook, I used last.fm's API to download my top artists, albums, and tracks to pandas dataframes. I checked the data to make sure there was nothing I was too embarrassed to share with the world. (All clear.) Then I plotted it using matplotlib to see what my favorites have been cumulatively between 2007 and 2017:

top_artists.png
top_albums.png
top_tracks.png
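
For anyone who wants to try this, here is a minimal sketch of turning a top-artists response into sorted rows. It is not the exact notebook code. It assumes the JSON shape returned by last.fm's user.getTopArtists method (a "topartists" object holding an "artist" list, with play counts stored as strings). The sample payload and its play counts are invented, and the helper name top_artists_rows is hypothetical.

```python
import json

# Invented sample shaped like a user.getTopArtists JSON response (assumed shape).
# The play counts are made up for illustration only.
SAMPLE = json.loads("""
{"topartists": {"artist": [
  {"name": "Oasis", "playcount": "1200"},
  {"name": "Arcade Fire", "playcount": "3000"},
  {"name": "LCD Soundsystem", "playcount": "2500"}
]}}
""")


def top_artists_rows(payload, limit=20):
    """Return (name, plays) tuples sorted by play count, highest first."""
    artists = payload["topartists"]["artist"]
    # The API reports play counts as strings, so convert before sorting.
    rows = [(a["name"], int(a["playcount"])) for a in artists]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


rows = top_artists_rows(SAMPLE)
for name, plays in rows:
    print(f"{name:20s} {plays:5d}")
```

Rows in this form can be passed to pandas.DataFrame(rows, columns=["artist", "plays"]) and plotted as a horizontal bar chart with matplotlib.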

Some interesting things pop out to me from these plots:

  • If you asked me who my favorite band is, I would say Arcade Fire, and the Top Artists plot supports that. However, none of their albums even breaks the top 5. That's because I tend to get obsessed with a few tracks from each album rather than the album in full. I basically listened to the track "Creature Comfort" on loop for a day when it was released.
  • LCD Soundsystem made my #1 track and #1 album, yet they're my #4 artist. That's because I think that Sound of Silver is a masterpiece and usually listen to it in full, while I favor certain songs from multiple albums by artists #1 through #3.
  • I've barely listened to Snow Patrol, Incubus, and Anberlin since college (RPI '09) but I had them in heavy rotation back then, so they're still high up on the list.
  • I've seen 16 out of my top 20 favorite artists live; I love going to concerts. Sadly, the Gallagher brothers hate each other too much to ever get back together, so I'll never see Oasis live. Does Incubus even still tour...?

Timing

I was curious whether there were any time-based trends in my listening habits, so I parsed the UTC timestamp of each scrobble into a datetime object. I then pivoted the data to count how many times I listened to each artist every year. Here are my top 20 artists again, now in table form:

Screen Shot 2018-06-26 at 10.52.01 PM.png
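
The parse-then-pivot step can be sketched roughly as follows. This is an illustration under assumptions, not the notebook itself: it assumes each scrobble is an (artist, Unix timestamp in UTC seconds) pair, the sample rows are invented, and the function name plays_by_artist_and_year is hypothetical. It uses only the standard library; in pandas the same idea would be pd.to_datetime(..., unit="s", utc=True) followed by a pivot table.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Invented scrobbles: (artist, Unix timestamp in seconds, UTC).
scrobbles = [
    ("Disclosure", 1370000000),  # falls in 2013
    ("Disclosure", 1375000000),  # falls in 2013
    ("Anberlin", 1200000000),    # falls in 2008
    ("Disclosure", 1400000000),  # falls in 2014
]


def plays_by_artist_and_year(scrobbles):
    """Count scrobbles per artist per calendar year (years taken in UTC)."""
    table = defaultdict(Counter)
    for artist, uts in scrobbles:
        year = datetime.fromtimestamp(uts, tz=timezone.utc).year
        table[artist][year] += 1
    return table


table = plays_by_artist_and_year(scrobbles)

# Print a small artist-by-year grid; missing cells show as 0.
years = sorted({year for counts in table.values() for year in counts})
print("artist".ljust(12), *years)
for artist, counts in sorted(table.items()):
    print(artist.ljust(12), *(str(counts[year]).rjust(4) for year in years))
```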

When I discover a new artist that I love, I become obsessed for a while. This is particularly apparent in the table with Caravan Palace, Mumford and Sons, Disclosure, and alt-J. Then my interest wanes, sometimes to the point where I stop listening to them at all. Compare, for example, Disclosure and Anberlin. When I came across Disclosure's 2012 album Settle in 2013, I listened to them 855 (!) times in one year. In contrast, I discovered Anberlin in high school and listened to them a lot for the first couple years of scrobbling but have since lost interest.

I often have a visceral reaction to the bands I've listened to excessively over a short period of time because the music reminds me of that period of my life. For example, I was listening to LCD Soundsystem nonstop in the summer of 2012 while I was a JPL intern. But then bands like Oasis and Third Eye Blind are comfortable favorites from high school ("Semi-Charmed Life", anyone?) that never get old for me.

top5.png

This plot is interesting not just because you can see when I listened to my top 5 artists the most, but also because it shows that I listened to all of them less starting in 2014. This was the year I started using Spotify. When I first joined last.fm, I had an IBM ThinkPad and was using a mix of Windows Media Player (barf) and last.fm radio. Then I got a MacBook in 2010 and used iTunes until I switched to Spotify in 2014.

All of a sudden, Spotify made it so easy to discover new music that I started listening to many more artists, each of them fewer times. That explains the 2013-2014 spike:

unique_artists.png
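
Counting distinct artists per year is a small variation on the same grouping. Here is a rough sketch, again assuming (artist, Unix timestamp) pairs; the artist names, timestamps, and the function name unique_artists_per_year are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Invented scrobbles: (artist, Unix timestamp in seconds, UTC).
scrobbles = [
    ("Artist A", 1370000000),  # 2013
    ("Artist A", 1375000000),  # 2013, same artist again
    ("Artist A", 1400000000),  # 2014
    ("Artist B", 1405000000),  # 2014
    ("Artist C", 1410000000),  # 2014
]


def unique_artists_per_year(scrobbles):
    """Map each year to the number of distinct artists scrobbled in it."""
    seen = defaultdict(set)
    for artist, uts in scrobbles:
        year = datetime.fromtimestamp(uts, tz=timezone.utc).year
        seen[year].add(artist)  # a set ignores repeat listens
    return {year: len(artists) for year, artists in sorted(seen.items())}


per_year = unique_artists_per_year(scrobbles)
print(per_year)
```

In pandas, grouping by year and calling nunique() on the artist column would give the same counts.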

Why the dip in 2017? Let's dig a little deeper. Here's a breakdown by year:

scrobbles_by_year.png

Why was 2017 the lowest year in the whole dataset? Two reasons: (1) last.fm had randomly disconnected from Spotify on my work laptop and it took me a while to realize it, and (2) my partner and I began dating. When we're together, we're often listening to music using his Spotify account. As a result, my listening in 2017 was underreported.

Another observation: my grad school years are pretty obvious! I started in August 2009 and defended in August 2014. During my first couple years, I was taking classes and generally didn't listen to music while doing homework. As my classes eased up starting in 2011-2012 and I began spending more time on research, my listening picked up.

Speaking of work, let's see how my listening changes throughout the week:

scrobbles_by_day.png

The Monday through Friday work week is clear, and I interpret the low of Friday as me being less likely to work late. On weekends I'm often out and about, but if I'm home there's usually music playing.
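
A day-of-week tally like the one plotted above can be sketched as below. It assumes the scrobble times are already timezone-aware datetime objects; the sample times and the function name scrobbles_by_weekday are invented. One caveat: the sketch buckets by UTC day, so late-evening listens in a western time zone would land on the following day unless the times are converted to local time first.

```python
from collections import Counter
from datetime import datetime, timezone

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Invented scrobble times (UTC): two on a Monday, one Friday, one Saturday.
times = [
    datetime(2014, 5, 12, 15, 0, tzinfo=timezone.utc),
    datetime(2014, 5, 12, 16, 30, tzinfo=timezone.utc),
    datetime(2014, 5, 16, 14, 0, tzinfo=timezone.utc),
    datetime(2014, 5, 17, 20, 0, tzinfo=timezone.utc),
]


def scrobbles_by_weekday(times):
    """Count scrobbles per weekday; datetime.weekday() returns 0 for Monday."""
    counts = Counter(t.weekday() for t in times)
    return {DAY_NAMES[d]: counts[d] for d in range(7)}


by_day = scrobbles_by_weekday(times)
for day, n in by_day.items():
    print(day, n)
```

Swapping t.weekday() for t.month gives the by-month version of the same count.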

How about by month? I didn't expect this to be interesting but it was. Let's see:

scrobbles_by_month.png

I'm not sure why January is so high. Maybe after the holidays I'm really excited to get back to work? And then by February my enthusiasm tapers? Hmm...

March is oddly low. I was stumped until I recalled that there is a professional meeting for my field every March. I've attended it 6 times between 2007 and 2017, and during that time I don't listen to music for a full week. It's not surprising that when about 55% of Marches (6 of 11) have no scrobbles for nearly a quarter of the month, the effect is noticeable. Cool!
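
As a back-of-the-envelope check on the size of that effect, here is the arithmetic spelled out. It rests on two assumptions that are not in the data: that the dataset covers all 11 Marches from 2007 through 2017, and that my listening would otherwise be spread evenly across the days of the month.

```python
# Assumption: the dataset includes every March from 2007 through 2017.
marches_total = 2017 - 2007 + 1      # 11 Marches
marches_at_meeting = 6               # years I attended the meeting
silent_days = 7                      # one full week without music
days_in_march = 31

share_of_marches = marches_at_meeting / marches_total
share_of_month = silent_days / days_in_march

# Assumption: listening is otherwise uniform across days, so the expected
# shortfall in March scrobbles is simply the product of the two shares.
expected_drop = share_of_marches * share_of_month

print(f"Marches affected: {share_of_marches:.1%}")
print(f"Share of month silent: {share_of_month:.1%}")
print(f"Expected March shortfall: {expected_drop:.1%}")
```

Under those assumptions the meeting alone would account for a March total roughly 12% below what it would otherwise be.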

Random

Playing around with the dataset a bit more, I made the following observations:

  • I have listened to 4,041 unique artists, 8,235 unique albums, and 21,856 unique tracks.
  • I have listened to 1,634 artists only once.
  • I have listened to 796 artists more than 10 times, 239 artists more than 100 times, and 21 artists more than 1,000 times.
  • I have never listened to Miley Cyrus.
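
Tallies like the ones in this list come from counting plays per artist and then counting how many artists clear each threshold. Here is a minimal sketch, assuming the input is simply a list with one artist name per scrobble; the placeholder names and the function name threshold_summary are hypothetical.

```python
from collections import Counter


def threshold_summary(artist_names, thresholds=(10, 100, 1000)):
    """Summarize play counts: unique artists, one-time listens, and
    how many artists were played more than each threshold."""
    plays = Counter(artist_names)
    summary = {
        "unique": len(plays),
        "once": sum(1 for n in plays.values() if n == 1),
    }
    for t in thresholds:
        summary[f">{t}"] = sum(1 for n in plays.values() if n > t)
    return summary


# Invented data: one name per scrobble.
artist_names = ["Artist A"] * 12 + ["Artist B"] * 3 + ["Artist C"]
summary = threshold_summary(artist_names)
print(summary)

# A "never listened to" check is just a membership test on the counts.
never_played = "Artist Z" not in Counter(artist_names)
print(never_played)
```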

Next Steps

I have more questions I'd like to answer that will require some additional work:

  • How do my habits change across the day? I expect it to be obvious that I'm much more of a night person than morning person. However, this one is tricky because I've moved and changed time zones 4 times during the period of data collection, so I'll have to correct the data based on when I lived where.
  • What countries are the bands I listen to from? I'll need to merge this dataset with another one to add geographic information and then incorporate a mapping library, maybe GDAL. I'm guessing the United States is #1 and England is #2 because I love British music.
  • What countries have I never heard any music from? Most, no doubt.
  • What genres do I usually listen to? I know the answer in general, but last.fm lets users add tags to artists and it would be interesting to see what the data says.
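
For the time-of-day question in the first bullet, one possible approach is to look up where I was living when each scrobble happened and shift the UTC time accordingly. The sketch below is only a starting point: the residence dates and UTC offsets are placeholders rather than my real moves, the names RESIDENCES and local_hour are hypothetical, and fixed offsets ignore daylight saving time (named zones from Python's zoneinfo module would handle that properly).

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Hypothetical residence periods, in chronological order:
# (start of period in UTC, UTC offset in hours). Placeholder values only.
RESIDENCES = [
    (datetime(2007, 1, 1, tzinfo=UTC), -5),
    (datetime(2009, 8, 1, tzinfo=UTC), -7),
    (datetime(2014, 9, 1, tzinfo=UTC), -8),
]


def local_hour(utc_time):
    """Return the local hour (0-23) for a UTC scrobble time, using the
    offset of the most recent residence period that had started by then."""
    offset = RESIDENCES[0][1]  # fall back to the first period
    for start, hours in RESIDENCES:
        if utc_time >= start:
            offset = hours
    local = utc_time.astimezone(timezone(timedelta(hours=offset)))
    return local.hour


# The same 03:00 UTC scrobble maps to different local hours over the years.
for year in (2008, 2012, 2016):
    print(year, local_hour(datetime(year, 6, 1, 3, 0, tzinfo=UTC)))
```

Counting scrobbles by the returned hour would then give the across-the-day breakdown.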