Let's Talk
Do you have a project that would benefit from a world-class team of data analysts, pop culture writers, and marketing strategists? We’d love to hear from you.
Get in Touch
Spotify’s former “Data Alchemist” talks with us about his reflections on the music algorithms that inspired his new book.
I first became aware of Glenn McDonald around the turn of the millennium. The writer, programmer, and data analyst had a recurring column on his website called The War Against Silence, with long, ambitious pieces about the stacks of CDs he picked each week. In the years after, I knew him as the proprietor of Needlebase, a site that performed data analysis of music critics’ year-end polls, specifically The Village Voice’s Pazz & Jop poll.
These days, McDonald is known as Spotify’s former “Data Alchemist,” having helped create many of the algorithms that power the platform’s discovery architecture. As part of his work at Spotify, he created the site Every Noise at Once, whose mission he describes as “an ongoing attempt at an algorithmically-generated, readability-adjusted scatter-plot of the musical genre-space.” It contains thousands of hyper-specific genres sourced from around the globe—6,291 of them as of November 2023.
In December 2023, McDonald was laid off from Spotify. Every Noise had to stop its real-time updates, but McDonald knew there was still much for people to learn about music streaming and the data that powers it. In June 2024, he published a new book called You Have Not Yet Heard Your Favourite Song: How Streaming Changes Music. It’s a must-read for anyone interested in the streaming economy, the history of music formats, the relationship between data and discovery, and the role of community in the creation of genre. We spoke to him about these ideas and more.
I read a lot of writing about streaming technology and music. I felt over and over that somebody who knows how this actually works and isn’t afraid of it should write a book like this. Eventually, I was like, Oh, wait, damn it—it’s me. I’m the one who has to write this book.
Ultimately, the goal for me, and the goal for music [in general], is community building. That’s what a genre is—a community. Fans are a community. Artists form communities. A lot of listening is not just what I’m hearing at the moment, but how it helps me relate to the bands and to other fans, or to what we’re trying to accomplish in the world. So that was the thing I was trying to orbit around, the thing that connects all those pieces. You can’t just be a listener—I mean, you can, but in a moral sense, you can’t. Because what you do has effects, and you have to enter into this world.
The content matters. Don’t just look at the metrics and say, “Listening time didn’t go down,” or “Some satisfaction proxy went up by a tenth of a percent.” It’s music. I want an explanation. If the metrics changed, why’d they change? We gave people different music than we gave them before, so how is it different? Do we give more popular music, less popular music, shorter songs, longer songs, deeper cuts?
Some engineering problems are not explainable in that way. They’re too complicated and there are too many variables, but recommending songs to people is not one of them. You’re going to make a playlist for people. It’s got 50 songs on it. They’re going to listen to five of them. The listeners make one decision every three minutes. That’s not very much information.
The question of what is “normal” is hard. [In terms of earnings,] from a current artist’s point of view, it’s most compelling to point at the highest peak in the past as normal. So like, 1999—that was normal. But the CD era was massively inflated in money terms by controlled high prices. It makes more sense to ask, “How big is the whole industry?” The only authoritative thing I could look at was the RIAA database, which is the big graph of what’s happening.
Here’s what happened: We had LPs. We had CDs. Napster killed CDs, then came downloads. But [downloads] weren’t really fixing it. Then streaming came along and the [numbers went] back up. I could look at how long it took for CDs to build to their peak, and how long streaming has been going on. Growth is similar for the same number of years, and we haven’t had as many years in streaming as we had in CDs. As best I can tell, it’s reasonable to think that we’re going to get back to the peak, or close to it.
I think that’s fair. Of course, it’s all tied into everything else that’s going on in the world. Were LPs good for psychedelic trance? We don’t know because they didn’t exist in the same era. Formats make a relatively small difference, compared to human tastes, but they clearly matter.
Telling [artists] that they’re going to get paid after 30 seconds [of a play] definitely changes what people do. All these things make a difference. They’re mostly like gentle forces, nudging the trajectory a little bit. It’s not like nobody makes classical music anymore.
My attachment is sort of the same, but it’s moved up in levels of granularity. When all I had was a few records, then it was very much one record at a time—how many times can I listen to Permanent Waves by Rush and memorize Geddy Lee’s bass pedals? As I went on as a listener, I moved more up to the artist level. I was just as much a fan of Low later as I was of Boston or Rush when I was 16. Now, it’s become more of a genre attachment. There are individual gothic symphonic metal bands that I’m super fond of, and when their new album comes out I’ll listen to it multiple times. But I’m much more attached to the gothic symphonic metal scene as a whole.
I thought about genre before streaming, but it became more central to my thinking because data made it possible to understand it better, and because streaming availability made it more reasonable to explore.
Most people’s experience of genre before streaming was record store bins or radio formats, right? There was no reason to think about there being more than 12 of them. Your record store didn’t have a bachata bin—it had World Music, if you’re lucky. I came to think of genre as a community. It’s not a list of musical criteria—it’s all people.
Working on algorithms, I basically had two sets of data to work with: cultural information about what people listen to, and all the analytical information about the characteristics of music, the machine learning that’s trying to model acoustic characteristics: how “dark” or “bright” it feels, or how “danceable” it is. Everything I tried to do in terms of recommendation or categorization worked well with only the cultural information, and was crap with only the audio information. But it generally worked a little better if you combined them.
That was the point when I thought, this makes sense: music is organized by how people listen to it. Those decisions are often based on sonic characteristics, but they’re much richer than our current computational framework for understanding the audio.
We can analyze two hip-hop songs and think, these songs are the same. They share characteristics. And then a human puts them on, and is like, “that one is in Bulgarian. I don’t speak Bulgarian. This one’s in Romanian. I speak Romanian.” They have a completely different experience of these two songs because one is nonsense and the other is, you know, revelatory or offensive or some combination of them. We might eventually get to the point where we can programmatically analyze everything—model all the dimensions of listening, the listening equivalent of a large language model. It seems like it’s possible that we’ll get there.
So that sent me into [the idea that] almost all the most interesting things that you can find out from data have to do with groups of people and subsets of listening, sometimes subsets of time or place. The same techniques for, “What are the gothic symphonic metal bands that fans of gothic symphonic metal bands most like?” can be used for, “What’s most distinctively Filipino about Filipino music?”
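For readers curious what “most distinctively” might mean in practice, one common way to make it concrete is a lift ratio: an artist’s share of plays within a group of listeners divided by that artist’s share of plays overall. The sketch below is an illustration under that assumption only. It is not a description of how Spotify or Every Noise at Once actually computes anything, and the function name `distinctiveness`, the `min_plays` threshold, and the artist names are all hypothetical.

```python
from collections import Counter


def distinctiveness(group_plays: Counter, all_plays: Counter,
                    min_plays: int = 1) -> list[tuple[str, float]]:
    """Rank artists by lift: share of plays inside a group / share of plays overall.

    Hypothetical helper for illustration. `group_plays` should be a subset of
    `all_plays` (e.g. plays by Filipino listeners vs. plays by everyone).
    """
    group_total = sum(group_plays.values())
    overall_total = sum(all_plays.values())
    scores = {}
    for artist, count in group_plays.items():
        # Skip artists with too few group plays (ratios on tiny counts are noisy)
        # and artists missing from the overall totals (Counter returns 0).
        if count < min_plays or all_plays[artist] == 0:
            continue
        group_share = count / group_total
        overall_share = all_plays[artist] / overall_total
        scores[artist] = group_share / overall_share
    # Highest lift first: what this group plays far more than everyone else does.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Made-up toy counts, not real listening data.
    group = Counter({"Artist A": 30, "Artist B": 10, "Global Star": 60})
    overall = Counter({"Artist A": 40, "Artist B": 100, "Global Star": 9860})
    for artist, lift in distinctiveness(group, overall):
        print(f"{artist}: {lift:.2f}")
    # Artist A: 75.00
    # Artist B: 10.00
    # Global Star: 0.61
```

In the toy data, “Global Star” gets the most plays inside the group, yet it ranks last because everyone plays it. That is the difference between what a group likes and what is distinctive about it. Swapping in a different group (fans of one genre, listeners in one country, one year of plays) reuses the same function unchanged.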
I’m in week two of my new job, which is in AI, which is exactly that. We have these tools. What the hell do we do with them? Chatbots are interesting, but we’re at the point where we made a museum of all human knowledge, and we’re standing outside on the sidewalk asking the security guard questions about what’s inside it.
What do we do with these tools in a practical sense? How can we make them reliable? How can we make them accomplish larger things that we need done? What is the goal? All this technology that gets between people either helps them come together and connect, or it isolates them. And connecting them is the right answer. To me, it’s very clear.
Yeah, it’s all system design. And it’s the same questions, whether it’s a streaming service or the transportation infrastructure for a city. All those things pertain to how human culture works, and they can either help it towards something, or they can undermine things.
No. Multiple times a day, I hear songs that affect me the same way. I don’t feel like music has gotten any less magical. I hear new things and think, “That’s fantastic.” I basically never stopped loving anything I ever loved. So yes, “Magic Power”—I can put it on right now and enjoy it the same way I ever did.