Sets – When PK’s Are Too Much

One design problem that trailed around behind me for years was how to handle the situation where the relationship between two elements doesn’t nicely fit the traditional role of a primary key/foreign key relationship. For example, let’s say you created a cool product like Scrabulous and you want to model the players participating in a game. It would be straightforward to model it as:


CREATE TABLE games
(id NUMBER PRIMARY KEY
,status_id NUMBER NOT NULL REFERENCES lk_status(id)
,create_ts DATE default SYSDATE NOT NULL
,update_ts DATE)


CREATE TABLE game_players
(game_id NUMBER REFERENCES games(id)
,user_id NUMBER REFERENCES users(id)
,create_ts DATE default SYSDATE NOT NULL
,update_ts DATE)

Nothing tricky here – a design like this will do the trick 99% of the time. There will be times when you have to look up a game by its status, or find all games created in the past 30 days (e.g. for reporting purposes), or find all games with a particular user_id in it. A “game” and a “game_player” are concrete, first-class entities in your system.

The optimal design is a little less clear when one part of the relationship is not truly an meaningful, independent entity. For example, let’s say you are building a sports application and need a way to model a matchup of 2 teams. Let me walk you through the evolution of the design. The first attempt might be:


CREATE TABLE matchups
(id NUMBER NOT NULL PRIMARY KEY
,home_team_id NUMBER NOT NULL REFERENCES teams(id)
,away_team_id NUMBER NOT NULL REFERNCES teams(id)
,matchup_date DATE NOT NULL
,create_ts DATE default SYSDATE NOT NULL
,update_ts DATE)

This design sucks. If want to find all the matchups for a particular team you’d have to inspect both the home_team_id column and the away_team_id column. Queries using an OR aren’t as bad as outer joins, but they aren’t great and should be avoided when possible. So, next iteration. Instead of trying to cram it all into 1 row, you decide to explode it so that a single matchup is described by 2 rows such as this:


CREATE TABLE matchups
(id NUMBER NOT NULL PRIMARY KEY
,team_id NUMBER NOT NULL REFERENCES teams(id)
,is_home_team CHAR(1) NOT NULL
,matchup_date DATE NOT NULL
,create_ts DATE default SYSDATE NOT NULL
,update_ts DATE)

A little better, but something’s not quite right. A single matchup between 2 teams now produces 2 rows, but the rows aren’t linked in any way. In other words, it’s near impossible to find both sides of a matchup because nothing (other than possibly matchup_date) associates them. So in the next iteration we introduce the concept of a “matchup set”.


CREATE TABLE matchups
(id NUMBER NOT NULL PRIMARY KEY
,matchup_set NUMBER NOT NULL
,team_id NUMBER NOT NULL REFERENCES teams(id)
,is_home_team CHAR(1) NOT NULL
,matchup_date DATE NOT NULL
,create_ts DATE default SYSDATE NOT NULL
,update_ts DATE)

So we can now identify both teams in a matchup because their matchup_set are the same:


SELECT m.id, m.is_home_team
FROM matchups m
WHERE m.matchup_set = 12345

This would return 2 rows. At this point, your refactoring senses start tingling. Every matchup set would implicitly have the same MATCHUP_DATE. Wouldn’t it be better to have a parent table called…well, not sure really what to call it. Guess you’d have to change the MATCHUPS table to be MATCHUP_DETAILS, and then create a new MATCHUPS table. Extracting out MATCHUP_DATE you are left with this design:


CREATE TABLE matchups
(id NUMBER NOT NULL PRIMARY KEY
,matchup_date DATE NOT NULL
,create_ts DATE default SYSDATE NOT NULL
,update_ts DATE)


CREATE TABLE matchup_details
(matchup_id NUMBER NOT NULL REFERENCES matchups(id)
,team_id NUMBER NOT NULL REFERENCES teams(id)
,is_home_team CHAR(1) NOT NULL
,create_ts DATE default SYSDATE NOT NULL
,update_ts DATE)

Doesn’t seem like a big win though. Now we have 2 tables, which creates an extra join in most situations:


SELECT m.id, m.matchup_date, md.team_id, md.is_home_team
FROM matchups m, matchup_details md
WHERE m.id = 12345
AND m.id = md.matchup_id

Now here’s the big IF: if you will ever need to reference a matchup in another table, the last iteration is the correct design. But, again, in that situation a matchup implicitly represents a first-class entity. For those situations where the matchup is a loose association of teams, though, you should consider the benefits of the previous iteration.

In a sense, this pattern inverts the last iteration’s design. Instead of teams belonging in a matchup, the matchup is a property of 2 different teams. As usual, this flexibility comes with benefits and costs. When used correctly you gain a number of things:

  1. fewer joins – there’s no parent table to have to join to all the time
  2. fewer inserts – again, no parent table to have to maintain
  3. #1 and #2 combine to reduce the possibility for your data to become corrupt
  4. there’s less meaningless data in the system – the parent MATCHUP table is mostly cruft

For this you trade the ability to use the matchup set as a foreign key. In some situations, that’s not only perfectly acceptable, it’s preferable.

Be Sociable, Share!
0 comments

Leave a Reply