Archive for the 'beginner' Category


On Digital Cameras and Wheels

We recently bought my mother a digital photo frame for her birthday. The product itself was really nice – connect via USB, WiFi, flash cards, etc. Shows pictures, plays movies, plays music. Perfect gift for a tech-savvy mom who has a growing family.

But the real star of the show was actually Picasa. If you don’t know, recent versions of Picasa will actually scan faces in your photos and present them to you so you can provide names. It does the hard work, and you can coach it along by accepting its suggestions or providing a correction. It then indexes your images so that you can find all images with Aunt Sally, or all images with your brother *and* Aunt Sally.

With this it became trivial to preload my mom’s digital photo frame with family photos. Go through the index for each family member and copy over some memorable shots. Easy and amazing.

I know it’s not new, and I know Google didn’t do it alone, but this seems to me to be a major milestone for the digital era. Wheels were important, and a lot of things can be done with a wheel, but history truly took a turn when someone strapped a motor to 4 of them and made a car. I feel the same way about digital cameras now. Important, but the dawn of facial recognition on the desktop makes this something else. I can only imagine what comes next. I’m sure someone out there has been spending a lot of time on exactly that. I can’t wait to see it.

The Double-Decker Train Conductor Problem

One of the things I love about being a software developer is the fractal nature of our work. When we design a system we are almost always taking some piece of the universe and attempting to deconstruct it and model it so that it can run inside a computer. Examples of good (or bad) design are all around us, and our work demands that we draw on these examples to create a working piece of software. And software itself is nothing more than a bunch of bits and registers and some electricity that’s pretending to be more than the sum of its parts.

So I found myself reading Coders at Work on the 8:06 train the other day. I don’t usually catch the 8:06. The 8:06 is a double-decker train. And watching the conductor come through to collect our tickets I realized he represented a real-world example of a mutex.

This day there was only 1 conductor for both levels of the double-decker. It dawned on me that it would be very easy for someone to avoid having to pay by hanging out in the upper level and waiting for the conductor to collect the tickets from the lower level, then sneaking down to the lower level while the conductor moves to the upper level.

The lone conductor represented a flawed algorithm. There was no lock on the resource (exit door = I/O stream?). Adding another conductor could solve the leakage problem and lock the resource. But that would limit (or serialize) the free flow of passengers to and from the car.

I could probably go on exaggerating this example for a while but I think you probably get the point.

In Praise of DigiCert

As I’ve mentioned before, if you develop web sites for a living and haven’t read High Performance Web Sites yet you should be ashamed of yourself. The book’s title unfortunately includes the words “Front-End Engineers” in it, which will cause it to be tuned out by many back-end developers. That’s a mistake on their part. The book does contain information on best practices to improve the experience of a visitor to your site, but many of these solutions require the active participation of backend developers. Other solutions are just important for backend developers to be aware of.

Around the same time the book was released the fellows at Yahoo released the Yahoo Y Slow Plugin for Firefox. It requires the Firebug plugin, which all serious web developers should have installed anyway. The plugin will give you a grade on your compliance with the rules – 0 to 100, just like grade school.

My goal is to have each page in my site score at least 90 in the Y Slow rankings (again, just like grade school). This isn’t terribly hard to do if you’re disciplined. I run a Y Slow check on my pages infrequently to verify that I’m maintaining that goal. So I was a little ticked to see the home page of WhizKidSports.com take a hit when I decided to show the DigiCert badge I purchased (see related post here).

The issue was 2 images included by DigiCert’s JavaScript. Y Slow was complaining that neither had a far-future Expires header or ETags configured. That left my score south of 90, so I decided I’d fire off an email to DigiCert customer support asking if there was any way I could convince them to fix it on their side. I wasn’t expecting much, but figured I should give it a shot anyway. That was at 1am Sunday morning.

Around 11am that same morning I got a response from the CTO of DigiCert, Paul Tiemann. Cool fact #1 – the CTO of DigiCert is scanning customer service emails at 11am on Sunday. Seriously.

He profusely thanked me for noticing this and suggesting it to them. Cool fact #2 – the CTO of DigiCert was willing to accept suggestions for improving their service from one of their clients. Seriously.

He got it immediately. As he pointed out, following the Y Slow rules not only helps visitors to my site, it also reduces bandwidth costs for DigiCert. So he had reconfigured the servers to address the issue. Cool fact #3 – the CTO of DigiCert is still close enough to technology to know how to configure ETags and Expires headers on the production servers. Seriously.

I told him that I ran the site back through Y Slow and the news was good. I was back above a grade of 90. And, thanks to this tremendous example of a good business run by good people, I’m a proud DigiCert customer for life.

Adventures in SSL – Part I: Shopping Around

I wanted to do a couple of smaller posts around my efforts to obtain and make effective use of a secure certificate for WhizKidSports.com. The smaller posts will let me expand on some of the finer points where those familiar with the process might be able to give feedback.

The first task was to select an SSL issuer. I narrowed my choices down to 2 – GoDaddy.com and InstantSSL. I was leaning towards InstantSSL until I found a chart that shows the SSL issuers for Y Combinator companies. This had some value to me because I figured these companies are generally at a similar place as my company in terms of size and technical requirements. Strangely, even after seeing that GoDaddy and Comodo (which runs instantssl.com) were two of the most widely adopted issuers on that chart, I decided to go with DigiCert anyway.

In terms of GoDaddy, I generally just don’t think too highly of them. I use them for domain registration, but otherwise I tend not to trust them. They’re a little spammy, and I’ve read articles and blog posts over the years about people who have gotten the shaft because of their policies and practices. Few of these articles tend to be flattering. Also, they have a reputation for bargain basement prices and a ton of questionably valuable products. This is something of the antithesis of what I want people to think when they see a secure certificate on my site.

In terms of Comodo, I found the array of products to be a red flag. I was looking at the InstantSSL product, which seemed to suit my needs. The price was reasonable. But something nagged at me. The only differences that I could detect between this product and the InstantSSL Pro product (which is $25 more per year) are telephone support and a larger warranty. Honestly, I don’t expect to need either, but the point was that I also don’t tend to trust companies who invent arbitrary reasons to justify price differences between very similar products. The other research I turned up was good but not incredible, so I didn’t feel they really closed the deal on my business.

And I know this doesn’t have even close to anything to do with the quality of the product, but both GoDaddy and Comodo suffer from psychotic web page design syndrome (that’s a topic for another post). In short, I’ve learned that a company’s home page is usually the best indicator of the soul of that company. Call it crazy.

Whatever the case, I finally decided on the SSL Plus certificate from DigiCert. Maybe a little more expensive, but still reasonable. The reviews I found were glowing. And once I saw their instructions for installing the certs on all major web servers, including nginx, I was sold. After I went through the typical purchase flow a real human contacted me for some documents to verify my ownership of the domain. As soon as I got them what they needed they issued the certificate. It all went incredibly smoothly and professionally. They even had a cool little wizard that generated the appropriate OpenSSL command to run on the command line. Not essential but nice.

So far so good with DigiCert. Next up I’ll discuss installation, which hit a few tiny snags but was also pretty painless.

(See Part II of this series here)

One Flag to Rule Them All

So right now I’m looking at a table that has at least 3 different columns that control whether the particular row is displayed on the front end. In some cases that’s unavoidable, but it has to be kept in check.

Maybe you can tell me what the difference is between the intent of these columns: status (e.g. pending, active, canceled) and should_display (0 or 1). In addition to that, there’s one part of the code that will ignore a record if one of the FK columns is null but will consider it if it’s non-null.

This is madness. I now have to piece together which columns are significant to which consumers of the data. And then I have to figure out the magical combination of values to make the row appear on the front end. This leads me to some quick rules for database flags:

  • Limit the number of display flags to as few as possible. I usually use an is_active or display_order column to determine whether the row should be retrieved. There will be cases where the row should be retrieved by one consumer and not another, but there should never be more than one column that does almost the same thing.
  • Use descriptive column names. The ones above are too general. is_active tells me exactly what I need to know.
  • You can use a nullable timestamp column to do both boolean checks and date-triggered checks. In other words, if the column is null it means the row is still valid. If it’s not null you have to check it against the current timestamp. This saves a duplicated column and is fairly easy to get across.
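
Here is a minimal sketch of the nullable timestamp rule from the last bullet. Everything named in it is an assumption for illustration: the articles table, the expires_at column, and the use of SQLite through Python's sqlite3 module so the example runs on its own. It is not the schema of the table described above.

```python
import sqlite3


def visible_titles(conn, now):
    """Return titles whose expires_at is NULL (still valid) or later than `now`."""
    rows = conn.execute(
        "SELECT title FROM articles "
        "WHERE expires_at IS NULL OR expires_at > ? "
        "ORDER BY title",
        (now,),
    )
    return [title for (title,) in rows]


# Hypothetical table: expires_at is the single nullable column doing both jobs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT NOT NULL, expires_at TEXT)")
conn.executemany(
    "INSERT INTO articles (title, expires_at) VALUES (?, ?)",
    [
        ("evergreen", None),                      # never expires
        ("old promo", "2009-01-01 00:00:00"),     # already expired
        ("spring sale", "2010-06-01 00:00:00"),   # expires later
    ],
)

print(visible_titles(conn, "2010-01-01 00:00:00"))
```

The timestamps are stored as ISO-style strings, which compare correctly as plain text. A real system would more likely use the database's native timestamp type and its own notion of the current time.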

Magic Button Syndrome

If there’s one concept I’ve fought my entire career it’s that there can be, or even should be, a way to make everything work “automagically”, a term the afflicted developers use lovingly. I recently christened this the “Magic Button Syndrome”.

Usually a bunch of fairly smart developers sit in a room and start dreaming of how a system might work. “We have to make sure we can easily modify the configuration,” one might say. “We should have a means to generate the configuration based on some other configuration file,” another might respond. “Let’s use annotations to make sure that the configurations stay in sync across versions,” someone else might suggest. Yet another person might think it’d be wonderfully cool if you could auto-inject annotations somehow.

Their triumphant moment comes when the CTO is standing over their shoulder screaming about something that needs to be fixed ASAP and they nonchalantly say, “Oh, I can fix that, one second.” They turn to their machines dramatically, edit one or two lines somewhere, smack the return key, twiddle their thumbs, reload the page, and then smile. “No big deal,” they’ll say with a smirk on their face. That’s it. That’s what they live for. They want that one Magic Button moment.

It sounds foolish, but there are plenty of developers like that out there. For these people, it makes perfect sense that if you can automate something little, automating something bigger containing tons of moving parts must be even better. Eventually the automation will reach singularity in the Magic Button.

The problem is that automation suffers from the same law of diminishing returns as does traveling at the speed of light. It takes an infinite amount of energy to accelerate a particle with any mass to the speed of light. In the same way, it takes an infinite amount of energy to create that Magic Button. Not that it stops people from trying. Sure, changing one of the thousands of options that are contained in a config file or database is easy. But if you’ve worked on systems like these you know that doing anything outside of the realm of what the system was designed to do is absolutely, unbearably painful. That Magic Button hides layers of abstraction upon abstraction upon abstraction. Just when you begin to understand what a piece of code does you realize you forgot what code is calling it. In the effort to make something of uber-value, no single component makes any sense.

You find it takes people months to really understand the system. Changes take weeks to test, and lead to repercussions that no one really ever expected. Once all the original developers are gone everyone starts to realize the system needs to be redesigned. It’s become like the pyramids – beautiful, absolutely brilliantly designed, but a total mystery. This time we’ll do it differently. In Ruby maybe. And auto generate all the documentation using XML…

Be A Data Integrity Watchdog

Funny thing happens when you start to put data into a database. It becomes important. At one point it might have seemed like a nice idea to save the visitor’s IP address. Slowly, as the system evolves, little branches of code pop up around the fact that the IP address is populated. Suddenly you find yourself in a position where you have to protect that piece of data. You can’t sit by idly and let that improperly formatted IP address bring down the whole system. You have to guard your system against these intrusions.

And the intrusions will happen. I’ve designed a number of large systems, and the only common denominator is that somehow, at one point or another, at least some of the data will get corrupted. Transactions fail, databases crash, bugs show up in the margins, users enter in stupid information, or hackers attack. I had been spending time thinking about this issue at The Sporting News, but it wasn’t until MLB that it really congealed into something useful.

In the fantasy baseball domain, a typical roster transaction leads to the addition or removal of a player from a particular manager’s roster of players. If the player is added to manager P’s roster he should not be available for any other manager in that league. If he’s removed from P’s roster he should be available to all other managers (including P). A player can’t be on more than one manager’s roster at the same time.

It turns out that every once in a while something hiccups and one of these rules is violated. Over the years I learned that the single most important step to fixing the problem is to make sure it doesn’t get worse. So, for instance, imagine a manager attempts to drop a player from his roster but something goes wrong. The system shows that the player is still on the roster, but he’s also technically available to others. Now that the data is corrupt it’s crucial that the system not allow the player’s status to be modified any further. It can very quickly become an impossible problem to solve if the player is picked up by another manager, then traded to another team, then dropped, etc.

I’ve spent many, many hours fixing transactions by hand. I worked on fantasy applications for over 7 years continuously, and in that time I can’t remember a single year where I didn’t have to fix at least some transactions by hand. Let me tell you, it suuuuucks. It really suuuuucks. Sucks and blows.

With that in mind, I developed a scheme where instead of waiting to hear via the message boards that some player is on two teams, I take matters into my own hands. I designed the system to check each player involved in a transaction for corruption immediately after the transaction is committed. If I detect that one of the fundamental rules was broken (e.g. owned by more than one manager), the player is immediately frozen. No further transactions on that player are allowed until an administrator can come in and fix the issue.
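
A rough sketch of that check-then-freeze idea follows. It is not the actual fantasy system. The roster and frozen_players tables, the function names, and the use of SQLite through Python's sqlite3 module are all assumptions made so the example is self-contained. The add_player function deliberately skips the normal availability check to simulate the kind of hiccup described above.

```python
import sqlite3


def freeze_if_corrupt(conn, player_id):
    """Freeze the player if he is on more than one roster. Returns True if frozen."""
    (owners,) = conn.execute(
        "SELECT COUNT(DISTINCT manager_id) FROM roster WHERE player_id = ?",
        (player_id,),
    ).fetchone()
    if owners > 1:
        conn.execute(
            "INSERT OR IGNORE INTO frozen_players (player_id) VALUES (?)",
            (player_id,),
        )
        conn.commit()
        return True
    return False


def add_player(conn, manager_id, player_id):
    """Add a player to a roster, refusing frozen players, then audit the result."""
    frozen = conn.execute(
        "SELECT 1 FROM frozen_players WHERE player_id = ?", (player_id,)
    ).fetchone()
    if frozen:
        raise RuntimeError(f"player {player_id} is frozen pending admin review")
    conn.execute(
        "INSERT INTO roster (manager_id, player_id) VALUES (?, ?)",
        (manager_id, player_id),
    )
    conn.commit()
    # Audit immediately after the commit so corruption cannot spread.
    freeze_if_corrupt(conn, player_id)


# Hypothetical schema for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roster (manager_id INTEGER, player_id INTEGER)")
conn.execute("CREATE TABLE frozen_players (player_id INTEGER PRIMARY KEY)")

add_player(conn, 1, 42)
add_player(conn, 2, 42)  # simulated hiccup: player 42 is now on two rosters
```

After the second call player 42 is frozen, so a third add_player call for him raises an error instead of making the mess worse. The rows stay exactly as they were for an administrator to sort out.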

So for a small incremental cost I’ve bought myself some peace of mind. And I can absolutely tell you that it paid off, time and time again. It just took a different way of thinking about the problem – being proactive versus reactive, protecting the integrity of all that data.

Wipe Your Feet Before You Come Into My House

My code is my house. I spend a lot of time in it. I fix it up, take care of it lovingly. I indent appropriately and actually spend time spacing out sections so they’re pleasing to the eye. I do this only partly because I’m obsessive compulsive. My greater motivation is that I really feel that these things matter.

Think about the word “code” for a minute. I love the word. I am a coder. I write code. What is code? Code is something that means something to the person that writes it, means something to some people/machines that read it, but means nothing to people who don’t know how to read it. Code is inherently cryptic. So the act of writing code is a struggle against entropy. Over time the code’s intent will change, its implementation will be less clear, or its documentation will drift out of sync with the actual representation.

As when you move into a house, code will never be as nice as it is on day one. Something breaks and you have to fix it quickly, leaving a hole in the wall. People come to visit and leave their shit around. Perfect code never stays perfect. So it’s critical that on day one the code is as clean and clear as it can be. And you should expect to do periodic improvements to keep entropy at bay.

Speaking practically, this implies a number of things. First and foremost, formatting matters. Spacing matters. These things help someone else determine the intention of the code you are writing. Related sections should be grouped together with spacing so someone reading knows what can be moved around and what should stay together. The goal is to make the code as pleasing to someone else’s eye as possible. We all know you are very clever, but a single line that chains together 50 method calls is impossible to decipher. Break it up and I’ll respect you more because you did it for me, not for you.

Everyone has a favorite format. The religious wars about curly braces probably consume half the storage space on slashdot’s servers. I’m not entirely above it – I’m infamous for reformatting code when I take control of it. But if I’m just visiting someone else’s code I have a strict policy that the code I write should be indistinguishable (as much as possible) from theirs. This means formatting it the way they do. Using the same naming conventions. Following their capitalization scheme. The point isn’t to show others how superior my formatting is. It’s to make sure that someone else reading the code doesn’t have an aneurysm.

The Beauty of is_active and display_order

There are a couple of columns that I always include on any table that contains data that will be displayed to front end users. Over time I’ve developed a preference that flag columns like is_active be declared as CHAR(1) with a check constraint to limit them to y and n (or Enums in MySQL). The columns are:

  • is_active
  • display_order

The is_active column is used to determine whether that row should be included in the result set.


SELECT u.name
FROM users u
WHERE u.employer = 'R/GA'
AND u.is_active = 'y'

This is a cheap way to support a “safe” delete. I’m very wary of ever removing non-corrupted data from a database. Besides the fact that it’s the database’s job to keep data, it’s a dangerous thing. All it takes is a simple bug in the routine that deletes the rows and you get to spend all afternoon with a DBA recovering data from tape. Or worse yet explaining to an end user why they have to re-enter their data. It’s usually not worth the “purity” of the data. The is_active column helps avoid that mess, and keeps the data around so that it can be used later for reporting, etc. Think about it – you have a business owner breathing down your neck because there’s some invalid data showing up on the front end. Which would you rather do? Update a column from “y” to “n” knowing you can change it back if you made a mistake, or delete the rows permanently and hope you didn’t mess up?
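
To make the “safe” delete concrete, here is a small sketch. It assumes SQLite through Python's sqlite3 module purely so it can run standalone. The users table mirrors the query above, but the sample rows and the helper names active_names and set_active are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# is_active is a CHAR(1) flag restricted to 'y' or 'n' by a check constraint.
conn.execute(
    "CREATE TABLE users ("
    " name TEXT NOT NULL,"
    " employer TEXT NOT NULL,"
    " is_active CHAR(1) NOT NULL DEFAULT 'y' CHECK (is_active IN ('y', 'n')))"
)
conn.executemany(
    "INSERT INTO users (name, employer) VALUES (?, ?)",
    [("alice", "R/GA"), ("bob", "R/GA")],
)


def active_names(conn, employer):
    """Names of active users for one employer, in alphabetical order."""
    rows = conn.execute(
        "SELECT u.name FROM users u "
        "WHERE u.employer = ? AND u.is_active = 'y' "
        "ORDER BY u.name",
        (employer,),
    )
    return [name for (name,) in rows]


def set_active(conn, name, flag):
    """Flip the flag instead of deleting the row."""
    conn.execute("UPDATE users SET is_active = ? WHERE name = ?", (flag, name))


set_active(conn, "bob", "n")           # the "safe" delete
hidden = active_names(conn, "R/GA")
set_active(conn, "bob", "y")           # changed our mind, nothing was lost
restored = active_names(conn, "R/GA")
print(hidden, restored)
```

The row for bob never leaves the table, so undoing the “delete” is a one-column update rather than a restore from backup.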

The display_order column is an acknowledgement that everyone will have an opinion on how data should be displayed on the front end and the opinion always changes as you go up the corporate ladder.


SELECT c.color as featured_colors
FROM couches c
WHERE c.name = 'lazy-boy-recliner'
ORDER BY c.display_order

It’s never a good idea to rely on row IDs unless you truly have to. So by having a built in display_order column we allow the business to change their mind as frequently as they want with minimal impact to our pretty code.
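
A quick sketch of what that buys you, assuming SQLite through Python's sqlite3 module and some made-up rows in a couches table shaped like the query above. The helper name featured_colors is mine. When the business changes its mind, the fix is a data update and the query never changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE couches ("
    " name TEXT NOT NULL,"
    " color TEXT NOT NULL,"
    " display_order INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO couches (name, color, display_order) VALUES (?, ?, ?)",
    [
        ("lazy-boy-recliner", "brown", 1),
        ("lazy-boy-recliner", "black", 2),
        ("lazy-boy-recliner", "red", 3),
    ],
)


def featured_colors(conn, name):
    """Colors for one couch, in whatever order the business currently wants."""
    rows = conn.execute(
        "SELECT c.color FROM couches c "
        "WHERE c.name = ? "
        "ORDER BY c.display_order",
        (name,),
    )
    return [color for (color,) in rows]


before = featured_colors(conn, "lazy-boy-recliner")

# Someone up the ladder wants red first: change the data, not the code.
conn.execute(
    "UPDATE couches SET display_order = 0 WHERE name = ? AND color = ?",
    ("lazy-boy-recliner", "red"),
)
after = featured_colors(conn, "lazy-boy-recliner")
print(before, after)
```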

Both these columns embody my personal philosophy of letting the database do as much of the work as possible. It takes a while to learn to use the database as more than just a data store – to learn that it can have its own inherent, hidden logic as much as your client code can. In the end it leads to more robust data models that can stand up to changing requirements and emergencies.

Dangerous Style

Slow week, so thought I’d vent. The top 3 things that seem like good ideas at the time:

  1. Using IDs in WHERE clauses when a CHAR or VARCHAR column could be used. This is another of those pesky “proper design” practices. As a developer, you should always assume that a row’s ID can change at any time. This is most tempting when joining to lookup tables, where you think the IDs will never change. That might be true in the development environment, but maybe the IDs turn out to be different in the production environment for any number of reasons. All your SQL breaks. It’s always safer to use a unique identifying string value in joins where possible. And if your lookup tables weren’t designed with a unique key on a string column, you should create one. It’s much much safer to do:

    SELECT p.name FROM people p, groups g WHERE p.group_id = g.id AND g.name = 'friends'

    than it is to do:

    SELECT p.name FROM people p WHERE p.group_id = 1

    At that rate, why have lookup tables there in the first place? (No, I’m not really suggesting this, although I have worked on systems where this was commonplace).

  2. Misuse/Overuse of IN clauses. When I’m writing ad-hoc queries I tend to throw around the IN clause liberally because it’s usually quicker than a join. In production code, you should severely restrict your use of it. For one, it’s not the most efficient clause for the database to execute. For another, it’s error prone. Here’s the wrong way to use an IN clause:

    SELECT p.name FROM people p WHERE p.group_id in (SELECT id FROM groups WHERE strangers = 'n')

    I guess some developers have the impression that it’s easier for the DBMS to optimize that because it looks like 2 separate queries squashed together. In most non-trivial cases it’s not easier to optimize. Here’s the right way to use an IN clause:

    SELECT p.name FROM people p, groups g WHERE p.group_id = g.id AND g.strangers in ('n', 'notsure')

    In most correct uses, it could be conceptually replaced with one or more OR clauses. So lay off the IN clauses please.

  3. SELECT * FROM …. queries. There is no reason to ever SELECT * from any table. No reason. It’s a lazy, horrible practice. Spend another 3 minutes and protect yourself from the bugs that will eventually arise when someone adds or modifies a column to that table. If you’re really too lazy to type out the column names, you can do something like this:

    SELECT column_name FROM user_tab_columns WHERE table_name = 'MESSAGES'

    The table USER_TAB_COLUMNS is part of Oracle’s data dictionary, which I’m hoping to cover more in future posts. That’s just one way to be lazy and productive at the same time.
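
To illustrate the first point in the list, here is a small sketch of how a hard-coded ID can quietly go wrong between environments while a join on the name keeps working. It assumes SQLite through Python's sqlite3 module so it can run on its own. The dev and prod data sets and the make_db and names helpers are invented for the demo.

```python
import sqlite3


def make_db(groups, people):
    """Build an in-memory database with the two tables from the examples above."""
    conn = sqlite3.connect(":memory:")
    # "groups" is double-quoted because some engines treat GROUPS as a keyword.
    conn.execute(
        'CREATE TABLE "groups" (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)'
    )
    conn.execute("CREATE TABLE people (name TEXT NOT NULL, group_id INTEGER NOT NULL)")
    conn.executemany('INSERT INTO "groups" (id, name) VALUES (?, ?)', groups)
    conn.executemany("INSERT INTO people (name, group_id) VALUES (?, ?)", people)
    return conn


# Join on the unique string key of the lookup table.
BY_NAME = (
    'SELECT p.name FROM people p, "groups" g '
    "WHERE p.group_id = g.id AND g.name = 'friends' ORDER BY p.name"
)
# Hard-coded ID: only correct where 'friends' happens to have id 1.
BY_ID = "SELECT p.name FROM people p WHERE p.group_id = 1 ORDER BY p.name"


def names(conn, sql):
    return [name for (name,) in conn.execute(sql)]


# Same logical data, but the lookup IDs were assigned in a different order.
dev = make_db([(1, "friends"), (2, "strangers")], [("ann", 1), ("bob", 2)])
prod = make_db([(1, "strangers"), (2, "friends")], [("ann", 2), ("bob", 1)])

print(names(dev, BY_NAME), names(prod, BY_NAME))
print(names(dev, BY_ID), names(prod, BY_ID))
```

In this made-up data ann is the only friend in both environments. The name-based query finds her in both, while the ID-based query returns a stranger in prod without raising any error.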
