This is something that comes up at least once for engineers working with web applications. There are several ways to achieve it, with different advantages and disadvantages.
This article is focused on storing tags in a SQL database; it does not suggest this is the best or only way to do it. In fact, there are many other options available with systems like Redis that do a fantastic job of this. However, if your stuck in SQL, feel it's unnecessary to use another tool or simply have existing solutions similar to those below this is may be helpful as a cheatsheet.
I will aim to use ANSI SQL so it remains as portable as possible and will focus on the implementation and understanding rather than the abusing specific features of different SQL engines that may make it more efficient.
Let's Get Relational
Most people will recommend that if your using a relational language like SQL you should be storing the data in a relational fashion, makes sense right? If you've done this before the follow code should make sense and may even look very familiar:
CREATE TABLE artist (artist_id INTEGER,name VARCHAR(64),PRIMARY KEY (artist_id));CREATE TABLE artist_tag (artist_id INTEGER,tag VARCHAR(16),PRIMARY KEY (artist_id, tag));
You may have a third table in the middle that joins unique tags to artists, and it's not that much of a leap beyond what we have - I will just keep it simple for now.
Tags in an OR relationship; that is to fetch any artist where they have at least one of the following tags is quite simple:
-- Find all the artists from any of the 3 decades.SELECT DISTINCT artist_id, nameFROM artistJOIN artist_tag USING (artist_id)WHERE tag IN ('60s', '70s', '80s’);
Most people get stuck with the AND relationship; that is where you want the artist that contains all of the tags in a given set. Theres several says to do this - probably the easiest is to add an additional HAVING clause:
-- We only want artists that are british AND rock.SELECT artist_id, nameFROM artistJOIN artist_tag USING (artist_id)WHERE tag IN ('british', 'rock')GROUP BY artist_idHAVING count(*) = 2;
The count(*) = 2 is more flexible than you think. It also allows you to use specify how many of the tags need to match, so you could say any 3 of the 5 tags:
WHERE tag IN ('british', 'rock', '60s', 'pop', 'alternative')HAVING count(*) >= 3
If your tags are not unique (as in these examples) you simply have to specify tags as DISTINCT:
WHERE tag IN ('british', 'rock')HAVING count(DISTINCT tag) = 2
A less desirable way is to use a JOIN for each of the tags. I say less desirable because this can lead to a more complicated SQL query that is tricky to build. However, if you have a lot of tags to intersect and you know at least one tag is very rare the query planner might be able to do this more efficiently than the method above.
SELECT artist_id, nameFROM artistJOIN artist_tag AS tag1 USING (artist_id)JOIN artist_tag AS tag2 USING (artist_id)WHERE tag1 = 'british' AND tag2 = 'rock’;