I will never claim that application profiling and stress testing are my strongpoints, but I’m having a really difficult time understanding the results of some tests I’ve been performing on my application.
Here’s the setup. My application is on 2 256 slices, 1 running nginx and PHP through fast cgi. The other is running MySQL. Outside of things like monit and munin, there is nothing else running on these slices. Perfect time to do some stress testing. The application is fairly database heavy, so I long ago decided to integrate memcache with an eye towards boosting peformance. Or so I thought (notice ominous foreshadowing).
My strategery with memcache is to never assume that it is either running, working, or contains the data I need. So my app will use it if it’s there but will carry on unaffected if it’s not. I left hooks in the app to be able to shut off memcache through config changes for cases where I’m testing via XAMPP and don’t have memcache running locally. This turned out to be very useful.
I have a third slice (which runs this blog and a couple of other smaller sites) that I installed http_load on. I used this box to drive the load tests.
One thing about http_load is that it doesn’t understand cookies. You provide it a URL or list of URLs and it just whacks on them until the server breaks. That poses a problem for apps like mine where being logged in is essential to the experience. So I had to make a few changes to the application to support a load testing mode. Once I change to this mode it will take the session identifier out of my config file instead of the cookie. No muss, no fuss, no meaningful change to the app’s behavior while in test mode, which is essential to ensure I’m comparing apples to apples.
OK, enough setup. Here’s one of my test scripts:
So, run 5 threads for 30 seconds. While that’s going on I’m checking top on my nginx and MySQL slices. First thing I notice – MySQL is pretty much sleeping through the test. Good news. Load on that slice barely breaks above .2. But the fast-cgi processes on the nginx box launch to the top and hog up CPU and memory at an alarming rate. Before the 30 seconds is over load on the nginx box is over 3. Not good. End result was about 27 requests per second. Not horrible, but there’s no way the box could maintain that kind of load long term. I ran this test:
Which simulates 20 requests a second. So what I’m trying to do there is find a reasonable amount of traffic that will stress the server but not kill it. That seemed to be about the breaking point. The server handled 20 requests a second with some negative effects but it seemed it could be stable at that level.
So, had to do some thinking. In an effort to cheer myself up, I figured I’d disable memcache and see how bad it would be without its help. If I got between 20 and 30 with memcache surely I’d only get between 15 and 20 without it.
Well, guess what, nerd. Not so much. To my amazement, the result came in around 36 requests per second without memcache. Not only that, CPU consumption by the fast-cgi threads was reduced and their memory consumption was totally normal. Beyond that, load on the database server didn’t budge.
It’s almost like memcache is penalizing me. Things got a little weirder when I commented out the code to pull some objects out of memcache but left others in. The results got down to 10 per second, which is nearly unbearable. I wish I had a conclusive summary to give but right now my thinking is that the overhead of connecting to memcache and hydrating objects is slower than just getting the data from the database. Or maybe I’m just overusing memcache – storing and retrieving too many small objects for example.
So for the meantime I’m running without memcache, despite the hours and hours of work I put in to integrate it, and all the hopes and dreams of the children.