Outer Joins are Evil
I hate outer joins. Hate them. Hate writing them, hate tuning them, hate jumping through hoops to avoid them. They suck up processing power, they turn otherwise efficient queries into performance pigs, they look ugly, they’re hard to read, and they generally make life more difficult than it needs to be.
Contrary to previous plans, I won’t rehash what an outer join is. The internet being what it is, there are plenty of resources to explain the concept to you, most of which will do a better job than I could. But I would like to present some design techniques I’ve used to avoid outer joins. But before I get into those, I want to make this point crystal clear. Outer joins are evil and you should do whatever is within your power to avoid them. I have gone to some extreme lengths in my designs to avoid outer joins, and every single time the effort has paid off. I recommend you do the same.
Technique #1: Always Populate Column Values
Sometimes avoiding an outer join is as simple as making sure the column always has a value. Might seem straightforward, but I’ve seen many designs that don’t make use of this. Let’s assume you have a table that holds information about some pretty pictures that your users have uploaded into your system. One of the fields you want to track is the status of the picture – whether it has been approved for display (e.g. it’s not naughty). Your schema might look like this:
CREATE TABLE lk_picture_status
(id NUMBER NOT NULL PRIMARY KEY
,status_name VARCHAR2(20) NOT NULL
,create_ts DATE default sysdate NOT NULL
,update_ts DATE)
CREATE TABLE pretty_pictures
(id NUMBER NOT NULL PRIMARY KEY
,picture_status_id NUMBER NOT NULL REFERENCES lk_picture_status(id)
,create_ts DATE default sysdate NOT NULL
,update_ts DATE)
The goal is to ensure that picture_status_id is never null (done here by making the column NOT NULL), otherwise you will have to use an outer join in any queries that fetch the pictures based on status. This could be done in the server-side code that inserts the row or (more preferably) via a column default value or before insert trigger. The latter two have the additional benefit of automatically filling in the column’s value when the row is inserted regardless of what code performs the insert.
Technique #2: Avoid 1:1 Relationship Tables
Generally it’s considered bad practice to design tables with a 1:1 relationship. I’ve seen some variations on the following theme in various places:
CREATE TABLE users
(id NUMBER NOT NULL PRIMARY KEY
,name VARCHAR2(200) NOT NULL
,create_ts DATE default sysdate NOT NULL
,update_ts DATE)
CREATE TABLE user_details
(user_id NUMBER NOT NULL REFERENCES users(id)
,email VARCHAR2(200) NOT NULL
,city VARCHAR2(200)
,state VARCHAR2(20)
,zip VARCHAR2(20)
,create_ts DATE default sysdate NOT NULL
,update_ts DATE)
Where for every row in users there can be 0 or 1 row in user_details. Leaving behind the logic of why you would do that for now, this situation can easily lead to the use of outer joins if user_details does not contain a row for every row in users.
In truth, developers will usually make the the inserts into those tables part of the same transaction, and in most cases this will keep the tables sufficiently in sync. But every once in a while an exception/bug occurs and the child record never gets written. Frequently this become quite a headache to track down and fix because developers never stop to consider that the child record might not have been inserted. So keeping in mind my mantra that data will alway find a way to get corrupted, the best design is to avoid tables with a 1:1 relationship. This has the dual benefit of avoiding the possible data corruption issue while removing any need for an outer join. In the above example, there’s no reason why the relevant columns in the child table can’t be included in the parent table.
An Acceptable (if Unavoidable) Use
Sometimes the lengths required to remove an outer join isn’t worth the effort. Here’s a set of tables that illustrates that point:
CREATE TABLE admins
(id NUMBER NOT NULL PRIMARY KEY
,name VARCHAR2(200) NOT NULL
,is_active CHAR(1) default ‘y’ NOT NULL
,create_ts DATE default sysdate NOT NULL
,update_ts DATE)
CREATE TABLE pretty_pictures_2
(id NUMBER NOT NULL PRIMARY KEY
,was_banned CHAR(1) default ‘n’ NOT NULL
,banning_admin_id NUMBER REFERENCES admins(id)
,create_ts DATE default sysdate NOT NULL
,update_ts DATE)
So a row in pretty_pictures_2 can be banned by an admin, at which point the admin’s admins.id value is placed on the appropriate record in the pretty_pictures table. So if you wanted to report all pictures uploaded within the current day and show the banning admin (if it applicable) you would use:
SELECT pp.id, pp.message, pp.was_banned, nvl(a.name, ”)
FROM pretty_pictures_2 pp, admins a
WHERE pp.banning_admin_id (+) = a.id
AND trunc(pp.create_ts) = trunc(sysdate)
It’s kind of hard to avoid this situation cleanly. You could add a bogus admin in the admins table (e.g. admin_id = 0) that is interpreted to mean that no admin is associated with the record, and thus it’s not actually banned. Then you could safely set a default value on the column as described in Technique #1 above. This is undeniably hackish and probably not worth the effort. Similarly, there’s no meaningful way to adapt Technique #2 in this situation. So you’ll have to let the outer join stand. Luckily I’ve found that this happens infrequently, and when it does it’s on relatively simple queries that don’t get executed often enough to make it on the DBA’s list of naughty queries.
2 comments
Until there is column-level locking in Oracle, 1:1 tables are unavoidable to allow batch updates and user updates to happen at the same time, in your case for users and user_details.
But yes, outer joins are evil.
I would argue that if you have so many user-driven writes executing that it’s affecting batch updates you may have to consider a redesign anyway. The typical web application is about 85% to 95% reads, so it would be very unusual to have that many user-driven writes.
What I’m getting at is that in most cases leaning on the 1:1 table design in hopes that it will improve performance is little more than the work of an overzealous DBA trying to find the “ideal” data model. No comment on what this implies about my old friend NetComrade.