Import Spotify Extended Streaming History into ListenBrainz


ListenBrainz / MusicBrainz is a service that tracks music listens (it has many other components). It can integrate with Spotify so that Spotify notifies it when a song is played, but that only “starts” from the point in time when you connect it (well, it actually also imports the last 30 days before then).

Spotify, however, allows users to export the full (-ish, as I only see things since 2014) history as a set of JSON files. This is actually a very cool dataset as it includes even brief plays, so you can track skipped tracks and things like that.

But for our purposes here, I wanted to get those pre-2021 stats into ListenBrainz. There are a number of tools to do this, but currently all of them are missing a bit on the filter side (or are buggy? or I am buggy?), so this is my quick way to get those stats in.

ListenBrainz for myself after the import

Spotify Extended Streaming History

This is a zip file you can request from Spotify. More details on the process and on what the data contains are available on the Spotify website.

To get it, go to your Account -> Privacy -> Download your data -> Tick the box that says Extended streaming history. Do be aware that it may take 30 days to get it! You’ll get an email from Spotify eventually and you can flip back to this page then :D

Eventually, you’ll get a zip file that contains a bunch of JSON files. There are actually quite a few cool tools and examples for analyzing some of this data (I have not tested these).

But this is beside the point here (I should at some point finish my project that works in this space, but that’s neither here nor there).

Of interest to us are the files called Streaming_History_Audio_<some_year(s)>_<number>.json. The number gives the order of the files, so it’s easy to work either backwards or forwards from any one of them.
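
If you have a lot of these files, a small script can list them in order. This is a minimal Python sketch, assuming the file names follow the pattern above; the helper name ordered_audio_files is my own and not part of any tool mentioned here.

```python
import re
from pathlib import Path

# Matches e.g. Streaming_History_Audio_2018-2019_2.json and captures the
# trailing number, which gives the order of the files.
PATTERN = re.compile(r"Streaming_History_Audio_.+_(\d+)\.json$")


def ordered_audio_files(folder):
    """Return the audio history files in `folder`, sorted by trailing number."""
    numbered = []
    for path in Path(folder).glob("Streaming_History_Audio_*.json"):
        match = PATTERN.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    # Sort numerically so that file 10 comes after file 2.
    return [path for _, path in sorted(numbered, key=lambda item: item[0])]


if __name__ == "__main__":
    for path in ordered_audio_files("."):
        print(path.name)
```

Sorting on the number as an integer matters once you have ten or more files, since a plain alphabetical sort would put _10 before _2.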

Filtering

To import the data into ListenBrainz, we need to convert from the “custom” Spotify JSON to the jsonl expected by ListenBrainz. A tool called elbisaur has a pretty good converter and importer, but unfortunately right now its filter is not working.

So the idea here is to use jq to filter the JSON files before passing them to elbisaur.

We want to filter out three things:

  1. Tracks played for less than X, to remove any tracks we just briefly played. This is subjective of course, but we are working with a 30-second threshold here to match the defaults.
  2. Tracks that are missing an artist or track name. I assume this may be tracks that Spotify no longer has perhaps?
  3. Tracks that you already imported into ListenBrainz. There’s not really an easy way to do this if you have overlapping history, but if you do not, it’s easy to just use a timestamp. This is what we’re doing here.


So, ensure you have jq installed, and prepare some filters. I used:

jq '[.[] | select(.master_metadata_track_name != null and .master_metadata_album_artist_name != null and .ms_played >= 30000)]' Streaming_History_Audio_2018-2019_2.json > filtered_2018_2019_2.json

This excludes any entries with a missing track name or artist name, and requires ms_played to be at least 30,000 (30 seconds).
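
If you would rather not use jq, the same filter is easy to write in Python. This is a rough equivalent of the jq command above, not what I actually ran; the function names keep and filter_file are made up for this sketch, while the field names are the ones used in the jq filter.

```python
import json

MIN_MS_PLAYED = 30_000  # 30 seconds, the same threshold as the jq filter


def keep(entry, min_ms=MIN_MS_PLAYED):
    """Mirror the jq select(): named track, named artist, played long enough."""
    return (
        entry.get("master_metadata_track_name") is not None
        and entry.get("master_metadata_album_artist_name") is not None
        # Treat a missing or null ms_played as 0 so the entry is dropped.
        and (entry.get("ms_played") or 0) >= min_ms
    )


def filter_file(src, dst):
    """Filter one history file into `dst`; return (total, kept) counts."""
    with open(src, encoding="utf-8") as handle:
        entries = json.load(handle)
    kept = [entry for entry in entries if keep(entry)]
    with open(dst, "w", encoding="utf-8") as handle:
        json.dump(kept, handle, ensure_ascii=False)
    return len(entries), len(kept)
```

Calling filter_file("Streaming_History_Audio_2018-2019_2.json", "filtered_2018_2019_2.json") should produce the same kind of file as the jq command, and the returned counts tell you how many entries were dropped.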


If you also need to filter by timestamp (I needed this for my most recent file), pick the first timestamp you see in ListenBrainz (e.g. click on Oldest in the listens history) and convert it to UTC (ListenBrainz shows the local time, but Spotify uses UTC). You can then add it pretty easily; for instance, I used:

jq '[.[]
  | select(
      .master_metadata_track_name != null and
      .master_metadata_album_artist_name != null and
      .ms_played >= 30000 and
      .ts < "2021-02-10T00:00:00Z"
  )]'   Streaming_History_Audio_2020-2022_5.json > filtered_2020_2021.json

The important part here is the .ts < comparison. You can also combine two conditions, one with > and one with <, to select a range.
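
The UTC conversion and the range check can also be sketched in Python. This assumes the ts values all look like the one in the filter above (UTC, ending in Z), in which case plain string comparison sorts them correctly. The function names and the UTC-5 offset in the usage note are only illustrations.

```python
from datetime import datetime, timedelta, timezone


def to_spotify_ts(local_dt):
    """Convert a timezone-aware datetime to a UTC string shaped like `ts`."""
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def in_range(entry, start=None, end=None):
    """True when start <= ts < end; either bound may be left out.

    This compares strings, which only works because every timestamp uses
    the same fixed-width UTC format.
    """
    ts = entry["ts"]
    return (start is None or ts >= start) and (end is None or ts < end)


if __name__ == "__main__":
    # Hypothetical example: 19:00 local time in a UTC-5 timezone.
    local = datetime(2021, 2, 9, 19, 0, tzinfo=timezone(timedelta(hours=-5)))
    print(to_spotify_ts(local))
```

For that example the printed cutoff is 2021-02-10T00:00:00Z, which can be pasted into the jq filter or passed as end to in_range. Making the start bound inclusive is my choice here; switch to > if you want to match a strict range.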

You may also want to look into excluding Podcasts from your import if you did listen to podcasts on Spotify (I have not so did not explore this).
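
I have not tried this, but here is a sketch of what a podcast check could look like in Python. It assumes podcast plays carry an episode_name or spotify_episode_uri field, which is how I remember the export format; verify that against your own files before relying on it. It is also quite possible that podcast entries have no track name, in which case the filter above already drops them.

```python
def is_podcast(entry):
    """Guess whether an entry is a podcast episode.

    Assumption: episodes have `episode_name` or `spotify_episode_uri` set,
    while music tracks leave both as null.
    """
    return (
        entry.get("episode_name") is not None
        or entry.get("spotify_episode_uri") is not None
    )


def music_only(entries):
    """Drop every entry that looks like a podcast episode."""
    return [entry for entry in entries if not is_podcast(entry)]
```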

This gets us one or more jsons that we can pass to elbisaur to get jsonl.

Convert to JSONL

For this, I used Deno to run elbisaur. With Deno, we need to grant specific permissions up front or we will be prompted for them. I chose to grant most permissions, but left the y/n prompt on the final upload in place.

It also needs environment variables configured in the folder you run it from to get the token and username for ListenBrainz. I believe it requires this even for parsing, not just for importing. So you either need a .env file in the local folder formatted like:

LD_TOKEN=<token>
LD_USER=<your_username>

or to set the environment variable as part of your run command:

LD_TOKEN=<token> LD_USER=<your_username> deno run .... [commands are below]

The token can be found on your ListenBrainz settings page.


First, we should check that we can see the songs correctly with the preview mode:

deno run --allow-read --allow-env --allow-write jsr:@kellnerd/elbisaur parse --preview filtered_2020_2021.json

This may surface errors if additional filters are required that I didn’t need (I’m not too sure if anything else can be missing).


Now to generate the real jsonl:

deno run --allow-read --allow-env --allow-write jsr:@kellnerd/elbisaur parse filtered_2020_2021.json

This will create a jsonl at filtered_2020_2021.json.jsonl, which is fair enough.
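
Before uploading, it can be reassuring to check that the jsonl has roughly as many lines as the filtered JSON has entries. This Python sketch only assumes that the output is regular JSONL, i.e. one JSON object per line; I do not know whether elbisaur merges or drops entries on its own, so a small difference is not necessarily a problem. The function name count_jsonl is made up.

```python
import json


def count_jsonl(path):
    """Count the non-empty lines of a JSONL file, parsing each one.

    json.loads raises ValueError on a malformed line, so a truncated or
    corrupted file fails loudly instead of being counted.
    """
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                json.loads(line)
                count += 1
    return count
```

For example, compare count_jsonl("filtered_2020_2021.json.jsonl") with the number of entries in filtered_2020_2021.json.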

Upload to ListenBrainz

We can again use elbisaur to do the import:

deno run --allow-read --allow-env --allow-write jsr:@kellnerd/elbisaur import filtered_2020_2021.json.jsonl

It will take a bit depending on how many songs you have, but it will do the import.

Results

Now repeat the jq filter –> convert to jsonl –> import for each audio file you may have.
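
If you have many files, the parse and import steps can be scripted. This Python sketch just rebuilds the two deno commands shown earlier for each filtered file and prints them; the function names and the dry_run switch are my own. With dry_run=False it would actually run them, and the y/n prompt on the upload should still appear since the commands inherit your terminal.

```python
import subprocess

# The same deno invocation used above, minus the subcommand and file.
ELBISAUR = [
    "deno", "run", "--allow-read", "--allow-env", "--allow-write",
    "jsr:@kellnerd/elbisaur",
]


def commands_for(filtered_json):
    """Return the parse and import commands for one filtered JSON file."""
    return [
        ELBISAUR + ["parse", filtered_json],
        # The parse step writes its output next to the input as <name>.jsonl.
        ELBISAUR + ["import", filtered_json + ".jsonl"],
    ]


def run_all(filtered_files, dry_run=True):
    """Print every command; also execute them when dry_run is False."""
    for name in filtered_files:
        for cmd in commands_for(name):
            print(" ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(["filtered_2018_2019_2.json", "filtered_2020_2021.json"])
```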

You will see your history update in ListenBrainz over the next couple of days. First you will see the Top Artists / Top Albums / Top Songs update, then the song count at the top of your stats page, and then eventually your name in any top 9 pages for artists.

There’s a little bit more on how stats work in the ListenBrainz docs, but they do not specify anything for artists. I coincidentally did this at the same time as the monthly rollover process (and do not need to do it again :D), so your experience may vary.