Checklist for Scaling a Web Application

After reading Todd Hoff’s list of scaling lessons learned, I decided to put together my own list of scaling tricks. These were all learned the hard way as well, and will scale you up to thousands of concurrent users. It’s worth noting that many of these don’t cost anything beyond programmer time, and take less time than ordering and waiting for more hardware. You may still need to throw more hardware at the problem eventually, but these will make sure you’re effectively using the hardware you have.

One code path

Make sure you have an abstract data access layer that ALL queries go through - you can then reroute/cache them as needed without rewriting your entire app every time. You will have to do this eventually, and it’s easiest to do FIRST. If your framework has an ORM, use it.
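If you’re not using an ORM, the layer can be as thin as a single choke point that every query passes through. Here’s a minimal sketch - assuming MySQL with a DB-API driver like mysqlclient; the class and method names are made up, and it already anticipates the master/slave routing covered below:

```python
import random
import MySQLdb  # any DB-API 2.0 driver works; mysqlclient is just an assumption

class DataAccess:
    """Single choke point for every query - reroute or cache here later."""

    def __init__(self, master_conf, slave_confs):
        self.master = MySQLdb.connect(**master_conf)
        self.slaves = [MySQLdb.connect(**conf) for conf in slave_confs]

    def query(self, sql, params=(), use_master=False):
        """ALL reads and writes in the app go through this one method."""
        is_write = sql.lstrip().upper().startswith(
            ("INSERT", "UPDATE", "DELETE", "REPLACE")
        )
        conn = self.master if (is_write or use_master) else random.choice(self.slaves)
        cur = conn.cursor()
        cur.execute(sql, params)
        if is_write:
            conn.commit()
            return cur.lastrowid
        return cur.fetchall()
```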

Set up different servers for different purposes

Make sure your DB servers are separate machines from your webservers. This seems obvious, but often isn’t the case.

Serve static files (images, CSS, etc.) from a separate webserver or CDN - this frees up CPU on your application servers for actual work.

Deploying code

  • deploy your code to each webserver’s local filesystem - don’t use NFS or shared drives. This keeps an NFS hiccup or outage from taking every webserver down with it, and is worth rewriting your deployment system for (you do have a deployment system, right?)
  • set up file caches (especially template caches like Smarty’s) per-machine, NOT on a network share. This avoids write conflicts and loading delays. Write a script that clears the caches on all machines on command - a sketch of one follows this list.
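That clear-the-caches script can be a dozen lines. A sketch, assuming key-based ssh access and made-up hostnames and cache path:

```python
#!/usr/bin/env python
import subprocess

# hypothetical host list and cache path - substitute your own
WEBSERVERS = ["web1.example.com", "web2.example.com", "web3.example.com"]
CACHE_DIR = "/var/www/app/cache"

for host in WEBSERVERS:
    # clear the per-machine file cache (e.g. compiled Smarty templates)
    result = subprocess.run(
        ["ssh", host, f"rm -rf {CACHE_DIR}/*"],
        capture_output=True, text=True,
    )
    status = "ok" if result.returncode == 0 else f"FAILED: {result.stderr.strip()}"
    print(f"{host}: {status}")
```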

Database setup

Set up your database in a master-slave setup with at least 2 slaves - this will open up a lot of scaling options.

Put any columns with fulltext indexes in separate tables (a simple form of vertical partitioning), and do secondary writes in your code to keep them up to date. This speeds up all queries to the original table, reduces the chances of table corruption, and lets you return less data on many queries.
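For example, an article’s searchable body can live in its own MyISAM table (at the time, only MyISAM supported FULLTEXT indexes in MySQL) while the main table stays lean. A sketch reusing the data access layer from earlier - the table and column names are made up:

```python
def save_article(db, article_id, title, body):
    # primary write: the lean InnoDB table everything else queries
    db.query(
        "UPDATE articles SET title = %s WHERE id = %s",
        (title, article_id),
    )
    # secondary write: the MyISAM shadow table that carries the FULLTEXT index
    db.query(
        "REPLACE INTO articles_fulltext (article_id, body) VALUES (%s, %s)",
        (article_id, body),
    )
```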

Use InnoDB for pretty much any table that doesn’t have fulltext indexes. Its row-level (rather than table-level) locking speeds up queries to those tables under concurrent load (there are exceptions, of course, but they depend on your app).

Send all DB writes to the master. This lets you scale reads across the slaves and leaves the master free to handle the writes. Send some reads to the master too - specifically reads-after-writes, where you confirm a query or fetch a newly generated ID before replication has caught up on the slaves. This should be an option passed to your data access layer.
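With the use_master flag from the data access layer sketched earlier, a read-after-write looks something like this (table names hypothetical):

```python
def add_comment(db, post_id, body):
    # the INSERT goes to the master and hands back the auto-increment ID
    new_id = db.query(
        "INSERT INTO comments (post_id, body) VALUES (%s, %s)",
        (post_id, body),
    )
    # read it back from the master - a slave may not have replicated it yet
    return db.query(
        "SELECT id, post_id, body FROM comments WHERE id = %s",
        (new_id,),
        use_master=True,
    )
```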

Failovers

Set up a hot backup server that can be set as the master on short notice. TEST THIS FAILOVER!

In-memory caching

Set up a shared memcached (or a similar technology) to cache DB query results and user info - this avoids redundant read queries and can serve 100x as many queries per second as MySQL. Seriously.

Write changes to the memcached cache as well as to the DB - this avoids re-reads.
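Here’s a sketch of both patterns together - cache-aside reads and write-through writes - using the pymemcache client. The key scheme, TTL, and users table are arbitrary assumptions:

```python
import json
from pymemcache.client.base import Client

cache = Client(("memcached1.example.com", 11211))  # hypothetical host
USER_TTL = 300  # seconds - tune to how stale you can tolerate

def get_user(db, user_id):
    """Cache-aside read: try memcached first, fall back to the DB."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit - no DB query at all
    row = db.query("SELECT id, name, email FROM users WHERE id = %s", (user_id,))[0]
    user = {"id": row[0], "name": row[1], "email": row[2]}
    cache.set(key, json.dumps(user).encode("utf-8"), expire=USER_TTL)
    return user

def save_user(db, user):
    """Write-through: update the DB, then write the change to the cache too."""
    db.query(
        "UPDATE users SET name = %s, email = %s WHERE id = %s",
        (user["name"], user["email"], user["id"]),
    )
    cache.set(f"user:{user['id']}", json.dumps(user).encode("utf-8"), expire=USER_TTL)
```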

Set up multiple memcached servers, and shard your cached data between them. This spreads the load, means losing one server only costs you part of the cache, and lets you scale memcached horizontally.
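A client like pymemcache’s HashClient handles the sharding for you by hashing each key to one server in the pool (hostnames hypothetical):

```python
from pymemcache.client.hash import HashClient

# each key hashes to exactly one server, so every server holds a shard
cache = HashClient([
    ("memcached1.example.com", 11211),
    ("memcached2.example.com", 11211),
    ("memcached3.example.com", 11211),
])

cache.set("user:42", b'{"id": 42}')  # lands on whichever server the key hashes to
print(cache.get("user:42"))
```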

Do less DB writing

This seems obvious, but issue fewer DB write queries. Change your code to batch updates to a table instead of writing rows one at a time. This might mean using your ORM a bit less.
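For example, accumulate counters in memory and flush them in one multi-row statement instead of one UPDATE per row. A sketch, assuming a hypothetical page_views table keyed on page_id:

```python
def flush_view_counts(conn, counts):
    """counts: {page_id: views accumulated in memory since the last flush}"""
    if not counts:
        return
    rows = list(counts.items())
    placeholders = ", ".join(["(%s, %s)"] * len(rows))
    params = [value for row in rows for value in row]
    cur = conn.cursor()
    # one multi-row statement instead of one UPDATE per page
    cur.execute(
        "INSERT INTO page_views (page_id, views) VALUES " + placeholders +
        " ON DUPLICATE KEY UPDATE views = views + VALUES(views)",
        params,
    )
    conn.commit()
```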

Background processing

Move things to batch scripts/cron jobs. This lets single dedicated processes do the heavy lifting instead of delaying page loads.

Move batch scripts/cron jobs to a separate server and slave database.

Consider using a job queue for background work - this keeps your servers from overloading and lets you monitor health based on the number of waiting jobs rather than on CPU usage. It also lets you run lots of copies of the script emptying the same queue.
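The queue doesn’t need to be fancy - a dedicated system like Gearman works, but even a jobs table plus a worker loop gets you the queue-depth monitoring and parallel workers described above. A rough sketch, reusing the data access layer from earlier; the table and column names are assumptions:

```python
import json
import time
import uuid

def run_worker(db, handler, poll_interval=1.0):
    """Claim and process jobs forever. db is the data access layer from earlier."""
    worker_id = uuid.uuid4().hex
    while True:
        # atomically claim the oldest unclaimed job (the UPDATE is atomic,
        # so two workers can never grab the same row)
        db.query(
            "UPDATE jobs SET claimed_by = %s "
            "WHERE claimed_by IS NULL ORDER BY id LIMIT 1",
            (worker_id,),
        )
        rows = db.query(
            "SELECT id, payload FROM jobs WHERE claimed_by = %s AND done = 0",
            (worker_id,),
            use_master=True,  # the claim hasn't replicated to the slaves yet
        )
        if not rows:
            time.sleep(poll_interval)  # queue is empty - back off
            continue
        job_id, payload = rows[0]
        handler(json.loads(payload))
        db.query("UPDATE jobs SET done = 1 WHERE id = %s", (job_id,))

def queue_depth(db):
    """Health metric: number of waiting jobs, not CPU usage."""
    return db.query("SELECT COUNT(*) FROM jobs WHERE claimed_by IS NULL")[0][0]
```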

Summary

These are obviously not the only ways to scale a website, and are actually just the tip of the iceberg, but they cover 90% of the problems that you’re likely to run into while getting started with scaling.