Archive for the 'intermediate' Category


The LAMP Stack is Dead. Long Live the LAMP Stack!

Recently Mike Driscoll took a shot across the bow of the LAMP stack in an article called Node.js and the JavaScript Age. Arguing that the LAMP age (2000 to 2009) was about shuttling data back and forth from the database, he asserts that in the JavaScript age

The principal role of the server is to ship an application to the client (Javascript), along with data (JSON), and let the client weave those into a DOM.

Mr. Driscoll envisions a world where the server becomes “dumb” and the client increasingly becomes the brains of the operation. This comes after his team migrated a dashboard application from Django (Python) to node.js (JavaScript). Mr. Driscoll’s points are compelling. After more than a decade of technology specialization, we may finally be on the cusp of a fully unified web stack – MongoDB as your data store, node.js as your server environment, and jQuery in the browser – based entirely on JavaScript. We could be witnessing the dawn of the JavaScript stack.

The benefits of the JavaScript stack are clear – code libraries shared across a homogenous team of JavaScript developers leading to reduced development and maintenance costs. All of your unit tests in one language! How can it fail?

In reality it’s probably not going to be that simple.

One big hole (pun intended) in Mr. Driscoll’s design is security. Yes, in some applications only 10% of your server-side templates are HTML. But that 10% is the core of your business, the bits of data you can’t expose directly to the client for them to tinker with – things like account numbers and product prices. This might not be critical for internal-facing reporting apps like Mr. Driscoll’s dashboard, but for public-facing dynamic apps like eBay it’s the difference between success and a crash-and-burn failure.

Furthermore, current web development teams are specialized for a reason. Front end programming has different demands and patterns than does server-side programming, and these are both very different than database design and programming. Startups and smaller teams might have rockstars who operate seamlessly across tiers, but these developers can be hard to find. So even though JavaScript might be a common language across application tiers, the reality is that as your team grows you’ll still need specialized developers to be effective across all three.

Lastly, even though node.js promises performant asynchronous application design, the truth is that many developers face a steep learning curve before they can be effective programming (and debugging) in asynchronous environments. Like Java threading before it, asynchronous programming will likely find its niche solving certain problems but never become mainstream. Or it will be hidden behind so many layers of framework programming as to be nearly unrecognizable and totally unusable.

So LAMP is far from dead. It will persist exactly because it’s a perfectly tuned commodity solution in a market that values commodity solutions (e.g. Google’s famed server farms). It’s the UNIX of the web layer, encouraging extensible designs by nature of its simplicity and interoperability.

Every age has its upstart, and this age is no different. But to win the upstart has to bring more to the table than the established solution, and it remains to be seen whether the JavaScript stack has what it takes to displace the LAMP juggernaut.


The Future is DotCloud

Something about the Donors Choose Hacking Education contest appealed to me so I started brainstorming ideas. After a eureka moment (more to come on this later), I was determined to dig in and start coding. With all the hubbub about fan favorite Slicehost dissolving into Rackspace in the near future, I thought the time was right to explore other options.

Enter DotCloud. Similar to Heroku, PHP Fog, or even Google App Engine, DotCloud is a platform as a service. Like Heroku and PHP Fog it’s backed by Amazon AWS. Unlike those other platforms, though, DotCloud is committed to supporting a variety of open source languages, databases, and message buses. Not just Ruby. Not just PHP. Don’t even get me started about some proprietary environment.

Love. It.

You can mix and match these environments to suit your taste. Want a PHP server with a MongoDB backend? No problem. Spin up a PHP instance and a MongoDB instance and then configure your PHP connection settings. Need some memcached? No problem, throw that in as well. All the instance types are specialized but interoperable, usually requiring nothing more than a little config to connect the dots between the components.
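
For example, once the two instances are up, the PHP side is just an ordinary MongoDB connection. The sketch below is illustrative – the connection string is a placeholder for whatever host, port, and credentials your MongoDB instance reports – and it assumes the standard Mongo PHP extension is installed:

<?php
// Illustrative sketch: a DotCloud PHP instance talking to a DotCloud MongoDB instance.
// The connection string is a placeholder - use the host, port, and credentials reported
// for your own MongoDB service.
$connection = new Mongo('mongodb://user:password@mydb.example.dotcloud.com:12345');
$db         = $connection->selectDB('myapp');
$widgets    = $db->selectCollection('widgets');

$widgets->insert(array('name' => 'example', 'created' => new MongoDate()));
var_dump($widgets->findOne(array('name' => 'example')));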

I’m thoroughly impressed with DotCloud. For a developer there’s much to like, and maybe a few concerns to express. Here’s where I stand:

The Good

  • Easy setup: It’s not an exaggeration to say you can have a site up in minutes.  Their tutorials are clear.  I control the code locally and then push it up to DotCloud when it’s ready. They deploy it automatically for me. Perfect.
  • Easy management: I can’t tell you how much time I wasted worrying over minutia in my Slicehost setups.  What Linux flavor and version should I choose?  What ports should I open?  How many user accounts should I create?  Where should I put my web root?  Is my nginx config optimized?  Is the box locked down enough?  Although it was a great learning experience, the fact of the matter is that I’m not and never will be a sysadmin.  I’m a developer.  The good people at DotCloud are (hopefully) competent sysadmins.  Let them worry about it.  Me, I’ll just focus on the code.  Mission accomplished.
  • Truly technology agnostic: Here’s my favorite line from the help docs: “Q: Do I need to use Git to use DotCloud? A: No. We want to help all developers – and not all developers use git.” Fuck yea. Makes Heroku seem snobby by comparison (git only). Look, I know what they’re hoping to accomplish isn’t easy. The reason why Heroku is so popular is that it’s made for Ruby geeks by Ruby geeks. It’s not easy to perfect one language or platform, much less multiple ones that you can mix and match at will. But that’s where the next point comes in.
  • Great tools: They’ve hit the perfect balance between “we’ll handle that” and “you can do that if you need to”. They’ll set up a MySQL instance for you. But if you need to get in there to hack some data, you can do that. You can deploy without needing to ssh into your web box, but you can if you want to. Flexibility is the key here. I was thrilled to find that I could customize my nginx.conf file to clean up my CodeIgniter URLs (removing index.php). Last night I spent about 20 minutes configuring an SMTP instance so it will automatically use a gmail account for relay when I call mail() from my code. And yes, you can set up cron jobs. They even provide an SSL cert if you need it (as long as you use their domain). It seems they’ve thought of everything.
  • Command line: Their single command line utility “dotcloud” has everything built into it, and it all makes sense. The commands became second nature to me after only a few weeks of use. They nailed it.

The Bad and The Unknown

  • Support: I had a problem pushing some files to my instance recently. Since it’s a beta service, I waited about 6 hours before contacting them about it. There was no notice on their site, and my email went unanswered. The issue did eventually resolve itself, but it was still a little scary. I would expect them to be on top of stuff like that.
  • Background processes: I’m still not really sure where the best place to run background processes is. I typically like to hack up some Perl to do batch updates and stuff like that. Although I can ssh into the instance and set up cron to run some random Perl script, I don’t have permissions to pull down new modules from CPAN, so I’m SOL to some degree. For now I’m running the background processes on a Slicehost box.
  • Pricing: Right now it’s Free (as in Beta). There’s no indication of what the eventual pricing model would look like. My main concern here is that for my small app I’m using 4 separate instances (php, mysql, static, and smtp). A per-instance pricing model is going to kill me.
  • Virtual hosts: Related to pricing, I’m not clear how I would set up virtual hosts. On one of my Slicehost slices I’m hosting 3 instances of WordPress and a random other site, all on unique domains all with the same nginx configuration. I’d hate to have to manage 4 separate instances for this.
  • Scaling: Everything seems to work nicely in a single-server environment. What’s not clear to me yet is how they will handle horizontal scaling of front end web servers running the same code. I’m confident they’ll figure this out, but hopefully it won’t complicate their architecture too much.
  • Performance: Since they’re EC2-backed I’m not as concerned about web server performance. My concern is more about internal network performance, i.e. web server to database. I’m not even sure how to monitor this or what can be done to improve it. If they’ve done their jobs right this shouldn’t be an issue, but it’s likely to come up at some point.
  • Monitoring: I haven’t found any built-in tools for system monitoring yet. These would be nice but again, since it’s EC2-backed it’s possible you can use third party tools for this.
  • Source Control: It might be nice if they offered a git or mercurial instance type so you can keep your code in the cloud as well. I would consider it but I’m not sure how many others would. Right now I’m using CodebaseHQ, which is a poor man’s version of FogBugz/Kiln.

The Future
I’m becoming more comfortable with the idea of deploying a “real” site on DotCloud. Although I don’t imagine R/GA’s clients selecting it any time soon, it seems ideal for everything from rapid prototyping through early startup. Migration to another platform should be simple because you’re not locked into any proprietary languages or workflows. DotCloud is the next logical step in the evolution begun by Amazon years ago – commoditize what can be commoditized, customize the rest. I’m sold.


Rackspace Email Hosting vs. Google Apps

I’d been using Google Apps for receiving emails sent to my domain up until an hour ago. As I’ve mentioned before, I’m running my app on Slicehost, and as usual they had some great instructions for using Google Apps for your email needs.

That was working kinda OK, but there were a couple of things that annoyed me about that solution. The first is that I just don’t want Google involved in every single thing I do online. I generally trust them, but there are some things I don’t want to use them for, namely anything to do with my business (I don’t use Google Analytics either). The second is that I think it’s highway robbery to pay $50 per user per year for the premier account. I only need 2 right now, but down the line I might need more. I didn’t relish the thought of giving them $300 or $400 a year to provide a beefed up version of their free tools.

So today I discovered that Rackspace has an email hosting solution as well. And if you’re a Slicehost customer and need 3 or fewer inboxes (that’s me!), it’s only $3/month. The normal starter package is $10/month for up to 10 inboxes, which is still totally reasonable. So in less than an hour I converted from Google Apps to Rackspace Email Hosting. And of course they have the usual helpful configuration instructions to get you started.

I have a couple of concerns that I’ll follow up on in future posts. The first is that according to the representative I chatted with there’s a limit of about 200 outgoing emails per hour. I think that’s going to be ok for my app, but I guess I’ll see. The other is that I’m pretty useless with mail configuration things and I’m a little nervous about how much effort will be involved in connecting my local postfix to their smtp server for outgoing email. I’m sure I’ll figure that out eventually though.

In any case, for $3/month, moving back to Google won’t be a huge issue if it should come to that. Hopefully it won’t. I’ve already gotten a few small tastes of the fanatical support from Rackspace and I have to say it’s pretty nice so far.


Over-engineering is like Snoring

A lot of developer cycles are spent discussing the benefits of YAGNI and KISS. On the surface it would seem that there is an army of righteous developers fighting against the demons of over-engineering and maximum complexity. And despite our valiant battles, despite all the books and blog posts and rallying calls from respected technology professionals, the demons are still churning out bloated, impossible to maintain code.

I’ll let you in on a little secret. We are the enemy. Not just the guy who sits next to you, or the guy who churned out a mess of code and then left the company. You are the problem. I am the problem. The enemy is us.

Yes, we can all agree in principle that complexity is bad and simplicity is good. The problem is that complexity is completely subjective. Maybe you misjudged or were misinformed about how likely it is that a certain feature will be needed. Maybe you thought of some brilliant solution and you want to leave it as a placeholder in case you need to come back to it later (or so others can see how clever you are). Maybe you don’t want to do it the cheap way because you’re afraid others will snicker at your solution. Maybe you’re afraid a simple solution will lead to longer development times later on. Maybe your definition of simplicity is skewed. Whatever the case, no one sets out to over-complicate a piece of code. And yet it happens time and time again.

There are rules of thumb that can be followed. But what it boils down to is always discipline. It’s not easy to simplify. It sometimes feels wrong. But I’ve never looked at working code and cursed because it was too simple. I’m not even sure it’s possible for working code to be too simple. But it sure as hell is easy for it to be too complex.

So why is over-engineering like snoring? Because no one thinks they do it. And yet somehow there is a market for snoring relief aids.


Adventures in SSL – Part II: Integration Strategy

In my first post about SSL integration on my site, I discussed how I came to a decision about a certificate issuer. I chose DigiCert, and have been very happy with them. One great bonus was their extensive list of instructions for setting up the certs on almost any web server known to man. So even though Part II of this series was intended to be about installation, I think DigiCert has that covered. Their instructions for nginx were spot on, so I wouldn’t be able to add anything meaningful to them anyway.

But buying and installing the certificate is a little different than using it. This post will focus on how I integrated the certificate into the site and what additional nginx configuration I had to make to support that strategy.

After kicking it around for a while I realized I really had two options: convert the entire site to use https, or convert as few pages as possible (e.g. just the login and register pages). The argument for limited use of https is that, all else being equal, the web server will require a little more CPU to encrypt/decrypt the https traffic. This is apparently an issue particularly with nginx, as even its creator has said it can drag down performance for high-traffic sites. Since I’m not expecting Amazon-level traffic, this wasn’t as big a deal to me.

Another argument for limiting the use of https is that some low-cost CDNs, such as Amazon CloudFront, don’t support https traffic. This was a concern for me. I will eventually want to move my images, screencasts, stylesheets, and JS files to a CDN, so the fewer https pages I have the less of an issue this would be.

Related to this, some posts I read claimed that browsers will refuse to cache images, CSS, and scripts if they are served over https. In my testing with Charles in Firefox and IE on Windows I did not experience that. In other words, any files that could be cached by the browser were cached. Yes, it was a limited test, but it covers a lot of the target base of my app. I believe this either used to be the case and no longer is, or it’s one of those old wives’ tales that people just assume is true but have never really taken the time to test.

I saw a couple of benefits for using https for the whole site. The first was that it simplified my application architecture. For instance, say you have a login page that’s intended to be served over https but it includes a common header image that’s present on all pages. That image has to also be served over https on the login page or the user will get a popup warning message that the page contains both secure and insecure content. That message is at least annoying if not scary to some users, so it’s best to avoid it by ensuring that the image is served up via https. But that means you may have a situation where you have 2 copies of that image so that it can be served up by both https and http. Or your configuration might become more complex in order to support 2 virtual servers pointing at the same image file on disk. Either way it’s a complicating factor that I wasn’t thrilled about wasting time on. If the entire site is served over https this issue goes away.

Secondly, it would be easier to configure than having only some pages be served via https. For instance, let’s say the login page is https. If someone asks for that page via http, the server should be nice and redirect them to https. But for almost all other pages it should allow regular http requests to process normally. These exceptions are easy to handle for one or two pages, but for more than a couple that quickly becomes difficult to manage effectively.
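
To make the trade-off concrete, here’s a rough sketch in plain PHP of what the per-page approach ends up looking like. This is illustrative only – it’s not code from my site, and the $securePages list and URL handling are placeholders:

// Illustrative only: with a per-page scheme, every "secure" page needs a check like
// this, and every plain page needs the reverse check.
$securePages = array('member/login', 'member/register');   // hypothetical https-only pages

$current = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
$isHttps = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off';

if (in_array($current, $securePages) && !$isHttps) {
    header('Location: https://www.mysite.com/' . $current, true, 301);
    exit;
}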

Lastly, my application is targeted at kids in the 10 to 15 years old range. For me, the more security the better. As with any site that relies on cookies to identify logged in users, it’s theoretically possible to hijack someone’s session via the cookie value, and if that were to happen it would lead to some seriously bad press for me. Again, if the entire site is accessed over https this issue goes away.

So as you can probably guess, I decided to serve the entire site over https. The big question I haven’t answered here is what effect this had on performance. I’ll discuss that in the final installment of this series. But for those also using nginx, below is an excerpt of the config changes I made to support this. It should be self-explanatory, but leave me a comment if you need any help with it.


# non-secure site - send all requests to https
server {

        server_name www.mysite.com mysite.com;
        listen 80;

        location / {
           rewrite ^/(.*)$ https://www.mysite.com/$1 permanent;
        }
}

# secure site
server {

        server_name www.mysite.com mysite.com;

        listen 443;
        ssl on;
        ssl_certificate /path/to/pem/file;
        ssl_certificate_key /path/to/key/file;
        .....
}


Facebook Status Updates and Infinite Session Keys

Anyone have the first clue as to why Facebook’s developer documentation sucks so hard?

I was developing a simple Facebook application for one of my company’s clients that required me to update a user’s status via a scheduled background process. The developer documentation led me down all kinds of paths by referencing infinite session keys and the “keep me logged in” check box. So I scoured the internets for some examples, only to find that there aren’t many. All these claims that bajillions of people are creating Facebook apps, and not a single one of the people updating a user’s status offline can document it? ARRRGGG!

So, here is what I hope will save someone else a ton of time – a real life, working code sample for updating a user’s Facebook status offline. Careful – make no sudden moves or you might scare this rare beast back into hiding.

Our app is requesting two extended permissions – “offline_access” and “status_update”. This is also using Elliot Haughin’s Facebook plugin for CodeIgniter. Elliot’s package includes an older version of the Facebook PHP Library, so I had to grab the latest version from Facebook and drop it in place. Other than that it was easy to integrate this into my app.

//http://wiki.developers.facebook.com/index.php/Users.hasAppPermission
//must be one of:
//   email, read_stream, publish_stream, offline_access, status_update, photo_upload, 
//   create_event, rsvp_event, sms, video_upload, create_note, share_item
if( $this->facebook_connect->client->users_hasAppPermission("offline_access", $fbUID) &&
    $this->facebook_connect->client->users_hasAppPermission("status_update", $fbUID) ){
    $this->facebook_connect->client->users_setStatus("some status message", $fbUID); 
}

Seriously, that’s it! All those posts, all that searching – for 3 lines of code! The key point that was conveniently left out of other articles is that there is no “session key” required now. Facebook is smart enough to know that the user granted the app permission for offline_access and status_update, so you only need to send the user’s Facebook ID. Moley.

Another annoyance. They make a big deal out of the fact that they provide a REST-ful interface, but none of the examples in their documentation show the format of the REST request (although they do at least provide the REST server URL and a handy hint to include the “Content-Type: application/x-www-form-urlencoded” header). Yes, I get it, you want me to use the PHP Library, which is nicely designed. But for quick and dirty testing I like to whip up some curl commands. If I don’t know how to format the request I can’t easily do that. Bah!


CodeIgniter…Meet Minify

NOTE: This post has an update that explains an improved technique. The technique below will still work (with some tweaks for CodeIgniter 1.7.1 or above), but is probably not preferred at this point.

As a followup to one of my previous posts I wanted to go through how I managed to get CodeIgniter and Minify to play nice with each other. Hopefully this will make someone else’s life easier. For those not using CodeIgniter this post might be either confusing or boring. Or both I guess.

My approach might seem code-heavy compared to other solutions, but it has the virtue of requiring only a small change to a single file that is included by all pages on your site. That’s typically not a problem since the first thing I do when I’m working on a site is to break out the common elements, such as the <html> and <head> tags, into their own included header file.

In CodeIgniter I created a library called MY_Includes.php (/system/application/libraries/MY_Includes.php). This is the core class that contains the mappings between each controller and the JavaScript and CSS files required by the view that will be loaded by the controller that was invoked by the browser. Obviously this implies an extra step: if I create a new JavaScript or CSS file I can’t just go into the globally included header file and add a <script> or <link> tag there – I have to edit MY_Includes.php to map the JavaScript or CSS file to that particular view. Yea, it seems weird to edit a PHP file to add a CSS or JavaScript file, but there are a couple of different factors at work here and this solution made the most sense to me. The big win was that it helped integrate Minify into my codebase with minimal effort.

You can see an edited version of MY_Includes.php here (Note: this is an old version). I wanted to walk through this code a bit to highlight the important parts, but hopefully it’s readable on its own.

First, you’ll notice the constructor requires the name of the controller that was invoked. I’ll show you how I get that later on, but essentially the whole class relies on that piece of information. My application is fairly linear in the sense that once I know the controller’s name I know (barring exception cases) which view will be invoked.

This in turn allows me to map controllers directly to JS and CSS files, which is why you’ll see the init method set up 2 hashes containing the JS and CSS files that I have access to, jsFilesHave and cssFilesHave. The key in the hash is a logical name I will use when adding the file to a view. This will improve readability and reduce errors and maintenance. The value in the hash is a string that specifies where the corresponding source file can be found. This is relative to the web root and is of a form that Minify understands. Whenever I create a new JS or CSS file I have to first add it to one of these hashes so that I can refer to it later in the file.
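
To make that concrete, here’s roughly what those two hashes look like. The entries below are made up for illustration – the real mappings are in the linked file:

// Illustrative entries only: logical name => path to the source file, relative to the
// web root, in a form Minify can resolve.
$this->jsFilesHave = array(
    'jquery'   => '/js/jquery.min.js',
    'validate' => '/js/jquery.validate.js',
    'register' => '/js/register.js',
);

$this->cssFilesHave = array(
    'base'     => '/css/base.css',
    'register' => '/css/register.css',
);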

One other note on the init – I’m not sure if I needed to, but I found it easiest to break with the CodeIgniter way of doing things and use plain PHP require statements to tell the class where to find the Minify source, as in the snippet below from that method.

//from minify examples:
//Add the location of Minify's "lib" directory to the include_path.
ini_set('include_path', '/home/vdibart/minify/lib/.:' . ini_get('include_path'));
require 'Minify/Build.php';
require 'Minify.php';

After init, the constructor will call compileTags. This is the heart of the logic. You can see it populate the cssFilesNeed and jsFilesNeed hashes, first with the files that are common to all views and then the ones depending on which controller was invoked.

Determining which controller was invoked is fairly straightforward. The following code is at the top of my globally included header file:

//for globally included header file
//so we know which CSS or JS files to include
$pageName = $this->uri->segment(1, 0);
$pageName .= "/" . $this->uri->segment(2, "index");
$this->load->library("MY_Includes", $pageName);

So if the requested URL was “http://www.mysite.com/member/register”, this code will pass “member/register” to the constructor of my class. Later on in the same header file I have the following 2 lines, which will extract the appropriate CSS and JS links:

<!-- for globally included header file -->
<link rel="stylesheet" href="<?= $this->CI->my_includes->cssTag(); ?>" type="text/css" media="screen" />
<script src="<?= $this->CI->my_includes->jsTag(); ?>" type="text/javascript" charset="utf-8"></script>

Switching back to the source code of MY_Includes.php, you can see those 2 methods invoke Minify to build the included files and then return a URL that can be used to retrieve the files. There’s a little bit of work in each of those to make the URL look like something that CodeIgniter will work with. So once the PHP executes, the above tags will look like this in the final source code for the page:

<link rel="stylesheet" href="http://www.mysite.com/includetag/css/member-register/1222014216" type="text/css" media="screen" />
<script src="http://www.mysite.com/includetag/js/member-register/1222098068" type="text/javascript" charset="utf-8"></script>
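
Those URLs come out of cssTag() and jsTag(). Reconstructed and simplified, the CSS version boils down to something like the sketch below – the property names are illustrative and the real code is in the linked MY_Includes.php – with Minify_Build used only to grab a last-modified timestamp for the cache-busting segment:

// Simplified sketch of cssTag(); jsTag() is the same idea with the JS hash.
function cssTag()
{
    $files = array();
    foreach ($this->cssFilesNeed as $webPath) {
        // Minify_Build wants real filesystem paths
        $files[] = $_SERVER['DOCUMENT_ROOT'] . $webPath;
    }

    $build   = new Minify_Build($files);                // from Minify/Build.php
    $pageKey = str_replace('/', '-', $this->pageName);  // e.g. 'member/register' -> 'member-register'

    // The last URL segment is the newest mtime of the files, so the URL changes whenever
    // any of them change. The real version prepends the site's base URL.
    return '/includetag/css/' . $pageKey . '/' . $build->lastModified;
}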

So each rendered page on my site has only 1 CSS file and 1 JS file included. And those files are minified and cached. All of that is due to Minify. But you’ll notice there’s one piece of the puzzle still missing. The above <link> and <script> tags refer back to my site, and there has to be something that knows how to interpret that and return the appropriate CSS or JavaScript data. It turns out that “includetag” is a CodeIgniter controller that I created. I’ve included the source code here. There’s not a ton to mention about it. The class loads the exact same helper class, MY_Includes.php, that interfaces with Minify to retrieve the CSS or JS files and returns them to the client.
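
If you’d rather not download it, the controller boils down to something like this reconstructed sketch (CodeIgniter 1.x style – the cssFiles()/jsFiles() accessors and the key handling are illustrative, not the exact source):

<?php
// Reconstructed sketch of the includetag controller.
// Handles URLs of the form /includetag/{css|js}/{controller-key}/{timestamp}
class Includetag extends Controller {

    function css($pageKey = '')
    {
        $this->_serve('css', $pageKey);
    }

    function js($pageKey = '')
    {
        $this->_serve('js', $pageKey);
    }

    function _serve($type, $pageKey)
    {
        // MY_Includes already knows which files each controller needs
        // (and its init takes care of require'ing the Minify classes).
        $this->load->library('MY_Includes', str_replace('-', '/', $pageKey));

        $files = ($type === 'css')
            ? $this->my_includes->cssFiles()
            : $this->my_includes->jsFiles();

        // Minify combines, minifies, and sends the proper caching headers.
        Minify::serve('Files', array(
            'files'  => $files,
            'maxAge' => 86400,
        ));
    }
}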

Hopefully there’s enough to get you through to a working version. To summarize the steps:

  1. Download MY_Includes.php (here – see updated version) and put it in your /system/application/libraries directory
  2. Edit the init method inside of MY_Includes.php to include the correct path to your Minify installation
  3. Edit the init method inside of MY_Includes.php to include your CSS and JS files
  4. Edit the compileTags method inside of MY_Includes.php to include the correct files for each controller
  5. Download includetag.php (here) and put it in your /system/application/controllers directory
  6. Add the two code fragments commented with “for globally included header file” above to the appropriate file in your application
  7. Fire it up

Feel free to post a comment if you have troubles and I’ll walk you through it or edit the post to fix any errors as needed.

NOTE: This post has an update that explains an improved technique. The technique above will still work (with some tweaks for CodeIgniter 1.7.1 or above), but is probably not preferred at this point.


In Praise of Minify

Having read High Performance Web Sites, I figured I’d take a little time out of the development of new features on my side project to look at some basic performance issues. The first stop was YSlow, the Firefox plugin that works with Firebug to give you a simple report on how you rate on the Yahoo! performance scale. Mine being a tiny site, the report before any optimizations was decent but not great. There was definitely room for improvement so I figured I’d put some of the advice I’ve read recently into practice.

The first optimization was very easy. I made sure my images were sufficiently cached by adding a quick .htaccess file in the directory where my images are stored on the server. I saw 2 different techniques for doing this. One was based on file extension, such as the technique discussed here. The second was based on the file’s content-type, which was discussed here. On the margin the one based on content-type seemed a safer bet. That way if I have a file that’s incorrectly named it will still get cached.

The next step was to try to improve my JavaScript and CSS includes. As mentioned in High Performance Web Sites, the files should be minified in order to save bandwidth. They should have far-future expires headers so that the browser doesn’t request them again after the first visit. And the number of includes should be limited so that there are fewer requests to be made. Luckily someone much smarter than I am already developed just about the perfect solution to all of those issues and more. The Minify library for PHP is one of those pieces of code that does exactly what I was hoping it would do in exactly the way I was hoping it would. And to boot, it required as little effort to integrate into my existing code base as could reasonably be expected. I recommend that anyone running even a small site on their own take a look at Minify. There’s absolutely no reason not to be using this wonderful little library. None. Go out right now and do it.

There was one snag in the process of integrating Minify with my project. As I’ve mentioned, I’m using the CodeIgniter framework. It turns out that Minify and CodeIgniter needed a little bit of coaxing to work together, but nothing that got too messy. I’m going to leave that discussion for my next post, which will hopefully not take 4 months to write 🙂


Wherein I Question the Usefulness of MVC

I decided to use CodeIgniter for a PHP project that I’m working on. CodeIgniter is an MVC framework, not too unlike CakePHP. At least I imagine they’re very similar, but I can’t say for sure, as the reason I chose CodeIgniter over CakePHP was that the CakePHP documentation is a mess and I didn’t have time to wade through it. CodeIgniter has been fairly easy to work with so far. I’m sure there are tons of CodeIgniter reviews by developers like me out there, so I won’t bore you with that just yet (future post!).

This post is about Model-View-Controller (MVC) architecture. Like any developer, I’ve read countless retellings of why patterns and MVC are good for your code. True to form, I think those claims are overblown. I’ve worked with people who do everything “By the Book” and I’ve worked with people who hack everything together as best they can. Having seen both sides of it, I honestly can’t say that one made my life any better than the other. Unstructured code, if kept reined in to some degree, can be incredibly flexible and allow you to be agile in the face of rapidly-changing priorities.

For instance, I’m not above having SQL statements in a JSP file. I don’t love it. I try to avoid it if it’s going to get messy. But I don’t think it’s something to be embarrassed about. I can’t tell you how many times I’ve been able to move a change out in minutes rather than weeks because I was able to tweak a query in the JSP. No, it’s not “By the Book”. But it works, and in the end that’s what you get paid for.

My general rule of thumb is that the closer to the end user your code is, the more flexible it has to be. Consider the following range of technologies that flow from the user end to the server side: HTML/CSS, JavaScript, PHP/Java/Ruby, PL/SQL, database schema. HTML needs to be more flexible than Java, which needs to be more flexible than the database schema. So for every 1000 times you tweak your HTML or CSS, you might need to make a couple of changes to your backend Java. Sounds reasonable.

So coming back to MVC, one thing I’ve never understood is why the controller is responsible for selecting which view is invoked. This seems fundamentally flawed to me. In a language like Java the controller is a servlet compiled into a jar file somewhere. To change the behavior of that file you have to go through an entire release process: change code, test, promote to QA, test, promote to production, test. At MLB, a change like that took about 2 weeks from start to finish. (Obviously the situation is a little different if you use PHP, which is why I’ve decided to use an MVC framework for the PHP project).

In essence, it’s like the backend developers are saying “Move aside HTML, let the big boys make the call. We know better which file should be displayed”. You know what, they don’t and they shouldn’t. Yes, I know about Front Controllers. Yawn. Yes, I know you could easily write the system such that the flow through the views is configured using XML so it can be changed on the fly, as they did at MLB. Snore. Don’t get me started on XML for configuration. These are all solutions in search of a problem. These things can be done, but no one has really ever convinced me that they need to be done. Agility requires simplicity. Simplicity can’t be configured with XML.


The request_token Pattern

The idea behind the request token is another one of those simple-but-powerful patterns that I’ve come to rely on in various systems. I’ll jump right into an example of a case where I wanted to use it but alas I didn’t get to make the change before I left the job.

The architecture was a simple producer-consumer model. Some piece of the system was responsible for placing a row into a table and another was responsible for finding those rows and processing them. As it turns out, the system required many more consumers than producers, which I realize is not all that uncommon.

(Before you go screaming at me about “enterprise” solutions like Oracle’s Advanced Queueing or JMS, that’s not entirely the point. It’s incidental that this situation looks like a producer-consumer problem; this pattern is more generally useful. So bear with me and think about how to apply it elsewhere.)

So, applying it to an email system where one piece of the system generates the emails and dumps them into a table and another piece of the system takes them out and sends them, you might have a table that looks like this:


CREATE TABLE email_jobs
(id NUMBER NOT NULL
,email_to VARCHAR2(255) NOT NULL
,email_subject VARCHAR2(255) NOT NULL
,email_body VARCHAR2(255) NOT NULL
,insert_ts DATE DEFAULT SYSDATE NOT NULL
,update_ts DATE
,processed_ts DATE
);

You can imagine the consumer might wake up, ask for the oldest 10 items in the table, send them off in batch, and then go back to sleep. As you might expect, I had a recurring problem where 2 consumers were both attempting to pull the same item from the table and process it. In the above case, a bug like that might lead to a person getting 2 identical emails, which no one wants. There are ways to protect against these kinds of things at the database level, but in reality you just want to ensure that no 2 consumers get the same item.

Enter the request token. With this, each consumer generates a unique identifier and marks the rows that it wants with that value. It then requests only the rows with that token, making it virtually impossible to have the same row processed by 2 different consumers.


CREATE TABLE email_jobs
(id NUMBER NOT NULL
,email_to VARCHAR2(255) NOT NULL
,email_subject VARCHAR2(255) NOT NULL
,email_body VARCHAR2(255) NOT NULL
,request_token VARCHAR2(255)
,insert_ts DATE DEFAULT SYSDATE NOT NULL
,update_ts DATE
,processed_ts DATE
);

Notice the addition of the request_token column. On the application side:


//produces a unique number
$token = generate_token()


//mark some rows with the token – only where the request_token is already null – important!
UPDATE email_jobs SET request_token = $token WHERE <….find oldest rows…> AND request_token is null


//do this so other consumers won’t see these rows
COMMIT


//go back and find the ones that you marked
SELECT ej.id FROM email_jobs ej WHERE request_token = $token

Even if you have more than one process hitting that table at the same time, the worst case is that one of them overwrites the other’s request_token and claims those rows for itself. So unless your application is sensitive to the number of rows each consumer processes, this is completely safe in that it won’t lead to multiple consumers processing the same row.
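
For completeness, here’s a minimal sketch of the consumer side in PHP, using PDO against a MySQL-flavored version of the table (the Oracle version is the same idea). The batch size, the ordering, and the send_email() helper are illustrative assumptions:

// Minimal request-token consumer sketch. Assumes a $pdo connection and the email_jobs
// table from above (with the request_token column).
$token = uniqid('consumer_', true);   // any sufficiently unique value works

// 1. Claim up to 10 unclaimed, unprocessed rows by stamping them with our token.
$claim = $pdo->prepare(
    "UPDATE email_jobs
        SET request_token = :token
      WHERE request_token IS NULL
        AND processed_ts IS NULL
      ORDER BY insert_ts
      LIMIT 10"
);
$claim->execute(array('token' => $token));
// autocommit (or an explicit COMMIT) makes the claim visible to other consumers

// 2. Work only the rows we claimed.
$rows = $pdo->prepare("SELECT id, email_to, email_subject, email_body
                         FROM email_jobs
                        WHERE request_token = :token");
$rows->execute(array('token' => $token));

foreach ($rows->fetchAll(PDO::FETCH_ASSOC) as $job) {
    send_email($job);   // hypothetical helper
    $done = $pdo->prepare("UPDATE email_jobs
                              SET processed_ts = NOW(), update_ts = NOW()
                            WHERE id = :id");
    $done->execute(array('id' => $job['id']));
}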

In general, the request token pattern pre-marks some data so that it’s easy to find later on. Another example that I’ve used in the past is in account creation. What frequently happens is that you have to insert a row and then update it soon after. The problem is that the insert generates a new unique ID that the update needs to know but sometimes doesn’t have. My solution has been to pass a request token to the code that does the insert and then pass that same value to the code that does the update. As long as the request token is unique, both pieces of code can address the correct row.
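
A tiny sketch of that variant, assuming the same $pdo connection as above – the accounts table and its columns are made up, but the shape is the point: both steps receive the same token, so the update never needs the generated ID:

// Hypothetical accounts table, for illustration only.
$token = uniqid('acct_', true);

// Insert side: stamp the new row with the caller's token.
$ins = $pdo->prepare("INSERT INTO accounts (email, request_token) VALUES (:email, :token)");
$ins->execute(array('email' => 'newuser@example.com', 'token' => $token));

// Update side: address the row by token instead of by the generated id.
$upd = $pdo->prepare("UPDATE accounts SET status = 'verified' WHERE request_token = :token");
$upd->execute(array('token' => $token));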

At this point you might have the idea to create the request_token column with a UNIQUE constraint so that no two rows can have the same value. Not so fast. In an even more useful case, there have been times when I’ve had to create a bunch of rows and then manipulate them in bulk – for instance, create a bunch of new accounts and set their email addresses to the same value. Without a column like request_token, you’d potentially have nothing to group them by except an insert_ts or similar column. With the request_token, it becomes a very easy thing to do.

