More on tuning. For web servers, particularly Linux-based ones, we’re always wary of what we call “the magic number” for the response time of a typical web page.
Nathan recently referred to response times and speed of service, quoting Mark Fletcher of Bloglines as saying that the speed of their service is inversely proportional to the exponential of the load. Of course there are usability issues with slow response times as well, but in a client-server/web environment they also affect availability.
This is what we refer to as the magic number: the response time for a page beyond which the load on the web server starts to grow exponentially.
It only takes a simple background job, a bottlenecked database request, request overloading, or some other system process to slow response time to the point at which most users decide that their browser has failed to load the page. Their immediate response is to click stop and/or refresh, and the load on the box almost immediately doubles. This doubling of load effectively halves the magic number: users will now tolerate only half the time they waited the first time before they hit refresh again. Other spin-off effects include opening another browser window and trying that as well, trying to reach the failing page via another page on the site, or getting their workmates to try the same page to see whether they have the problem too.
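The feedback loop described above can be sketched as a toy simulation. This is only an illustration of the doubling and halving, not a measurement of any real server: the function name `refresh_cascade`, the starting load of 100 requests, and the 8-second magic number are all made-up assumptions.

```python
def refresh_cascade(initial_load, magic_number, rounds):
    """Return (round, load, tolerance) tuples for a simple refresh cascade.

    Assumes the idealised behaviour from the text: each round, every waiting
    user hits refresh (load doubles) and is then only willing to wait half
    as long as before (tolerance halves).
    """
    load = initial_load
    tolerance = magic_number
    history = [(0, load, tolerance)]
    for n in range(1, rounds + 1):
        load *= 2        # every impatient user re-sends the request
        tolerance /= 2   # and waits half as long before the next refresh
        history.append((n, load, tolerance))
    return history


# Hypothetical starting point: 100 requests in flight, 8 second magic number.
for n, load, tolerance in refresh_cascade(100, 8.0, 4):
    print(f"round {n}: {load} requests, users wait {tolerance:.1f}s")
```

Under these assumptions, four rounds of refreshing turn 100 requests into 1600 while the time users will wait drops from 8 seconds to half a second, which is why the box has so little chance of catching up on its own.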
Once response time hits the magic number, you are no longer in control and tuning the system; you are just focussing on recovery. I’ve seen people try to tune in these conditions, but they’re really wasting their time. Sure, you can obtain some useful diagnostics before restarting the box, but you’re still not tuning anything.
As well as being a good indicator of excessive load, the magic number is a good benchmark for tuning exercises. The highest priority when performance tuning a system is to prevent it from hitting the magic number, and of course this is almost impossible to do if the system has already leapt past it.
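One way to use the magic number as a benchmark is to track how much headroom is left before the slowest pages reach it. The sketch below is a minimal example under assumed values: `tuning_headroom` is a hypothetical helper, and the sample timings and the 8-second magic number are invented for illustration.

```python
def tuning_headroom(response_times, magic_number):
    """Fraction of the magic number not yet used by the slowest response.

    1.0 means responses are instantaneous, 0.0 means the slowest response
    has reached the magic number, and a negative value means the system
    is already past it and into recovery territory.
    """
    worst = max(response_times)
    return 1.0 - worst / magic_number


# Hypothetical response times in seconds, sampled during a tuning run.
samples = [0.8, 1.1, 2.4, 1.9]
headroom = tuning_headroom(samples, magic_number=8.0)
print(f"headroom: {headroom:.0%}")
```

Comparing this figure before and after a change gives a rough sense of whether the change moved the system away from the magic number or closer to it.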
(Originally posted to Synop weblog)