Dangerous Style
Slow week, so thought I’d vent. The top 3 things that seem like good ideas at the time:
- Using IDs in WHERE clauses when a CHAR or VARCHAR column could be used. This is another of those pesky “proper design” practices. As a developer, you should always assume that a row’s ID can change at any time. This is most tempting when joining to lookup tables, where you think the IDs will never change. That might be true in the development environment, but maybe the IDs turn out to be different in the production environment for any number of reasons. All your SQL breaks. It’s always safer to use a unique identifying string value in joins where possible. And if your lookup tables weren’t designed with a unique key on a string column, you should create one. It’s much much safer to do:
SELECT p.name FROM people p, groups g WHERE p.group_id = g.id AND g.name = ‘friends’
than it is to do:
SELECT p.name FROM people p WHERE p.group_id = 1
At that rate, why have a lookup tables there in the first place? (No, I’m not really suggesting this, although I have worked on systems where this was commonplace).
- Misuse/Overuse of IN clauses. When I’m writing ad-hoc queries I tend to throw around the IN clause liberally because it’s usually quicker than a join. In production code, you should severely restrict your use of it. For one, it’s not the most effiecient clause for the database to execute. For another, it’s error prone.Here’s the wrong way to use an IN clause:
SELECT p.name FROM people p WHERE p.group_id in (SELECT id FROM groups WHERE strangers = ‘n’)
I guess some developers have the impression that it’s easier for the DBMS to optimize that because it looks like 2 separate queries squashed together. In most non-trivial cases it’s not easier to optimize. Here’s the right way to use an IN clause:
SELECT p.name FROM people p, groups g WHERE p.group_id = g.id AND g.strangers in (‘n’, ‘notsure’)
In most correct uses, it could be conceptually replaced with one or more OR clauses. So lay off the IN clauses please.
- SELECT * FROM …. queries. There is no reason to ever SELECT * from any table. No reason. It’s a lazy, horrible practice. Spend another 3 minutes and protect yourself from the bugs that will eventually arise when someone adds or modifies a column to that table. If you’re really too lazy to type out the column names, you can do something like this:
SELECT column_name FROM user_tab_columns WHERE table_name = ‘MESSAGES’
The table USER_TAB_COLUMNS is part of Oracle’s data dictionary, which I’m hoping to cover more in future posts. That’s just one way to be lazy and productive at the same time.
I found your ideas intriguing, and subscribed to your newsletter… yet I can’t agree with you about using descriptions instead of id’s.
I find that far too often, the descriptions are required to change by the customers. (We don’t like ‘friends’, could you change that to ‘companions’, we don’t know that they’re actually friends…. Oh, you know that group we are calling ‘companions’? We’d like to put on a more friendly face. Could you change that to ‘friends’…. ad nauseum….)
Now I agree that using column id’s is often a poor idea, so I generally choose to use a unique integer or very occasionally a GUID.
If you don’t mind, why would you choose a descriptive varchar instead?
Jeff,
First, I just wanted to clarify that when I said “a row’s ID” above, I didn’t mean literally oracles “rowid” meta-value. I mean more of what you said later in your post – unique integer numbers. It is this value that I consider to be mutable.
That being said, I see your point. I’m assuming what you’re saying is that the customer’s request to change “friends” to “companions” would cause a change in some lookup table where that text string is associated with a unique identifier. In a proper design sense, it should be generally expected that when a fundamental identifier in the system changes anything that depends on that will have to change as well. Without that concept there can be no appropriate, functional design possible.
Internally I would (and have) split this representation into 2 fields – an “internal-use only” field and a “display” field. So all queries would use the internal-use only value, and this would be immutable. In a sense, it doesn’t matter what this value is because only the application knows its value. The display value associated with it is subject to change at the client’s wish. So, you would do something like this:
— returns ‘Companion’ and ‘Billy Bob’
SELECT rt.display_name, u.name
FROM lk_relationship_types, users u
WHERE u.id = 12345
AND u.relationship_type_id = rt.id
AND rt.internal_name = ‘friend’
If the client wants this to display ‘Friend’ instead of ‘Companion’ you can update the display_name column but you wouldn’t have to change the query at all.
Hope this clears that up. Thanks for the comment.
Sorry, one last comment. As for why I would use a descriptive varchar instead – it’s for readability and maintainability. Reading “WHERE id = 7” is never descriptive enough. That’s why we use words to speak and not numbers 🙂
Secondly, if someone ever re-installs the database (as frequently happens in the applications I’ve worked on) and all of a sudden the ID of 7 is associated with “relative” instead of “companion”, your application breaks.
I can see that. It makes sense in a lot of cases. I’m revamping a database, not of my design, with almost a hundred lookup tables, all of which I have to reference by id numbers. It’s impossible to keep them straight without looking up the actual variables continuously, so I can definitely see your point.
I resist the idea mainly because of a past experience with a database where descriptions were used very poorly as identifiers. The database keeps track of thousands of servers/switches/racks in several data centers. The identifiers they decided on (and I fought against) was a two character representation of the corporation who made the equipment… but with a twist. HP is the code for HP of course. It’s also the code for Compaq and DEC because they own them. SU is the code for Sun, and TS is the code for Toshiba… however Sun bought Toshiba’s storage systems, and the equipment is all tagged with the TS tag both physically as a bar code, and in the database. Many other mergers/splits have happened since the database was created.
The end result being that these descriptive codes now don’t represent the manufacturer, they represent the parent company of the manufacturer as of 2002… Far more confusing than a numerical id.
I’ll have to try out your strategy though, because based on the mess that I am currently working with, the more I think about it, the more I am intrigued by it.
Jeff,
I see your point. Obviously this is a case where the original design drifted over time but the data model didn’t evolve with it. If you are able to make changes to the database (or request that changes be made), I think my 1st suggestion would work well, but it would potentially involve changing a lot of queries to start with. After that you’d see some mental savings as you can maintain the system without going batty, but the fact of the matter is that no one’s invented a way to do cleanup without getting a little dirty. Let me know how it goes, I’d be interested to hear how you approach and solve this.
-Vinny
I’ll let you know how it goes once I start making some architectural changes.
The inventory database, is thankfully out of my hands, as I left that company last year; but my current database (with the hundreds of lookup tables) I thankfully have complete control over. The only constraint I have is time and technical shortcomings, as I have three remote sites that create asynchronous changes to the database using *wince* Access databases that have to be reconciled back to the main SQL 7.0 database as well as my direct users. So any major structural changes have to be pushed out simultaneously to three different sites, and my scripts reconciling changes have to be updated at the same time as well.
It would be easier if the software was more up to date, but I’ve given up on that part of the struggle.
Jeff,
Sounds tricky. Have you done the simultaneous update before? Any tricks/tips you can share?
In general it sounds like you have a nice hackable setup that gives you some room to tinker. Would appear to be very satisfying in some ways.
-Vinny
It is pretty satisfying in some ways. However my plan was to migrate them to a web based fronted to the database, allowing remote sites to access the data real-time. Unfortunately the resources just aren’t available to this project, so that cannot happen.
The main downside is that the tools are a bit kludgey and relatively archaic, and while it is something that I had a lot of experience in, it wasn’t what I had been led to believe I would be doing when I was hired. Thankfully, between my USB key, and ssh access to my personal server, I’m still able to use a decent selection of my regular tools.
I’ve done the simultaneous backup thing quite a few times before, and started to write out a couple of short tips… Six paragraphs later, I figure I should probably get back to work for a bit. I’ll finish it up later today, and send it over if you’re interested.
Jeff,
Yes, I’m very interested. Love to read about it if you have the opportunity to send it over to me. You can email me at vdibart@nodroidsallowed.com or if you post it on a blog of your own just send me the link.
BTW, I’m with you on the USB key and SSH. Must haves for people like us.
-Vinny