The request_token Pattern

The idea behind the request token is another one of those simple-but-powerful patterns that I’ve come to rely on in various systems. I’ll jump right into an example of a case where I wanted to use it but alas I didn’t get to make the change before I left the job.

The architecture was a simple producer-consumer model. Some piece of the system was responsible for placing a row into a table and another was responsible for finding those rows and processing them. As it turns out, the system required many more consumers than producers, which I realize is not all that uncommon.

(Before you go screaming at me about “enterprise” solutions like Oracle’s Advanced Queueing or JMS, that’s not entirely the point. It’s incidental that this situation looks like a producer-consumer problem, but this pattern in more generally useful. So bear with me and think about how to apply it elsewhere.)

So, applying it to an email system where one piece of the system generates the emails and dumps them into a table and another piece of the system takes them out and sends them, you might have a table that looks like this:


CREATE TABLE email_jobs
(id NUMBER NOT NULL
,email_to VARCHAR2(255) NOT NULL
,email_subject VARCHAR2(255) NOT NULL
,email_body VARCHAR2(255) NOT NULL
,insert_ts DATE DEFAULT SYSDATE NOT NULL
,update_ts DATE
,processed_ts DATE DEFAULT SYSDATE NOT NULL

You can imagine the consumer might wake up, ask for the oldest 10 items in the table, send them off in batch, and then go back to sleep. As you might expect, I had a recurring problem where 2 consumers were both attempting to pull the same item from the table and process it. In the above case, a bug like that might lead to the person getting 2 identical emails, which no one wants. There are ways to protect against these kinds of things at the level, but in reality you just want to ensure that no 2 consumers get the same item.

Enter the request token. With this, each consumer produces the a unique indentifier and marks the rows that it wants with that value. It then requests only the rows with that token, making it virtually impossible to have the same row processed by 2 different consumers.


CREATE TABLE email_jobs
(id NUMBER NOT NULL
,email_to VARCHAR2(255) NOT NULL
,email_subject VARCHAR2(255) NOT NULL
,email_body VARCHAR2(255) NOT NULL
,request_token VARCHAR2(255)
,insert_ts DATE DEFAULT SYSDATE NOT NULL
,update_ts DATE
,processed_ts DATE DEFAULT SYSDATE NOT NULL

Notice the addition of the request_token column. On the application side:


//produces a unique number
$token = generate_token()


//mark some rows with the token – only where the request_token is already null – important!
UPDATE email_jobs SET request_token = $token WHERE <….find oldest rows…> AND request_token is null


//do this so other consumers won’t see these rows
COMMIT


//go back and find the ones that you marked
SELECT ej.id FROM email_jobs ej WHERE request_token = $token

Even if you have more than one process hitting that table, one of them will overwrite the other’s value for the request_token. Therefore, unless your application is sensitive to the number of rows each consumer processes, this is completely safe in that it won’t lead to multiple consumers processing the same row.

In general, the request token pattern pre-marks some data so that it’s easy to find later on. Another example that I’ve used in the past is in account creation. What frequently happens is that you have to insert a row and the update it soon after. The problem is that the insert generates a new unique ID that the update needs to know, but sometimes doesn’t. My solution has been to pass a request token to the code that does the insert and then pass that same value to the code that does the update. As long as the request token is unique they should both be able to address the correct row.

At this point you might have the idea to create the request_token column with a UNIQUE constraint so that no two rows can have the same value. Not so fast. In an even more useful case, there have been times when I’ve had to create a bunch of rows and then manipulate them in bulk. So, for instance, create a bunch of new accounts and set their email address to the same value. Without a column like the request_token, you’d potentially having nothing to group them by except for an insert_ts or similar column. With the request_token, it becomes a very easy thing to do.

Be Sociable, Share!
3 comments

3 Comments so far

  1. Seun Osewa on October 15th, 2008

    Why can’t the consumer just select the last-n-rows with an exclusive lock, which is dropped after the emails have been sent?

  2. vdibart on October 15th, 2008

    Since I’m not aware of any feature on any DBMS that supports exclusive read locks, I’ll assume you mean some kind of write lock. Oracle supports this with the FOR UPDATE clause, for instance.

    There are a couple of reasons you want to avoid this. For one, there’s no way to know if a row is locked for update. So you could have 2 consumers read some data, request the update lock, do a lot of work, then line up behind each other to update those rows with the same exact data. This duplication of effort will get worse as the system scales. In fact, this doesn’t ensure that there will only be one consumer, which will disqualify it as a solution in many instances.

  3. Seun Osewa on October 16th, 2008

    I agree. I think I ‘get’ the pattern now. Thanks for sharing.

Leave a Reply