Smarter Playlists with SQL

Posted on Sun, 19 Oct 2014

Tags: music, coding, sql

I recently discovered that my music player (Clementine) stores its metadata in a SQLite database.

The first thing I did once I knew this was to do a bunch of queries which summarised the data in interesting ways. My favourite so far is a count of albums released on each year, as represented in this graph:

[Graph of albums per year]

However that got boring fairly quickly, and I realised I could use the database to generate playlists with greater control than Clementine's built in "smart playlists" allow.

The smart playlists allow you to pick a series of search terms to filter tracks by, and to choose whether to join them all together with AND or OR.

The most immediately obvious gain is that in SQL you can create more nuanced combinations of the logical oparators, for example:

genre = 'Blues' AND (rating >= 4 OR playcount > 3)

The next thing I wanted to be able to do is to make queries based on information about artists or albums. This wasn't immediately solved by having access to the database, since everything is stored in one big "songs" table. So I created some text-substitution rules, so that I could write some nice easy pseudo-SQL and have it converted into some SQL which would do what I wanted. Here's the substitution rules:

= none = -1
nanoseconds('hh:mm:ss') ((strftime('%s', 'hh:mm:ss') - 946684800) * 1000000000)
unixtime('time', 'offset') strftime('%s', 'time', 'offset')
colA.each(colB) < value (select count(*) from songs S where S.colA = song.colA and not (S.colB < value)) = 0
colA.sum(colB) < value (select sum(colB) from songs S where S.colA = song.colA) < value
colA.count(colB) < value (select count(distinct colB) from songs S where S.colA = song.colA) < value

Here's an example of this in action:

genre like '%reggae%'
and artist.count(title) > 20
and artist+album.each(length) > nanoseconds('00:03:30')
and lastplayed < unixtime('now', '-30 days')

This code is fairlly readable. It states that we want to return all tracks which meet the following conditions:

Once this has been converted to proper SQL it looks like this:

select * from songs song
where unavailable = 0
and genre like '%reggae%'
and (select count(distinct title) from songs S
where unavailable = 0
and S.artist = song.artist
and S.title != -1 and S.title != '') > 20
and (select count(*) from songs S
where unavailable = 0
and S.artist = song.artist
and S.album = song.album
and not (S.length > ((strftime('%s', '00:03:30') - 946684800) * 1000000000)
and S.length != -1 and S.length != '')) = 0
and lastplayed < strftime('%s', 'now', '-30 days')
order by artist, year, album, track, title, filename

When this query is run on my music collection, the result is somethingn like this:

titlealbumartist trackyearlength
/ 10^9
Hold OnSay What You're ThinkingKatchafire 72007235-1
Love LetterSay What You're ThinkingKatchafire 8200721401406417635
Ultra MusicSay What You're ThinkingKatchafire 920073361406416419
Meant To BeSay What You're ThinkingKatchafire 102007316-1
Power Two Dub To PowerPassage To IndicaZion Train 41993549-1
Sending Out A Positive MessagePassage To IndicaZion Train 51993525-1

Great! That's as far as I've got, but the next step will be to convert the query results into a playlist file which can be imported into Clementine.