Getting Good IO from Amazon's EBS

2009-07-29 00:23:52 +0000

The performance characteristics of Amazon’s Elastic Block Store are moody, technically opaque and, at times, downright confounding. At Heroku we’ve spent a lot of time managing EBS disks, and I recently spent a very long night trying to figure out how to get the best performance out of them; little did I know I was testing on a night when they were particularly ornery. On a good day an EBS disk can give you 7,000 seeks per second; on a not-so-good day it will give you only 200. On the good days you’ll be singing its praises, and on the bad days you’ll be cursing its name. What I stumbled on that night was a set of techniques that seem to even out the bumps and get decent performance out of EBS disks even when they are performing badly.

Under perfect circumstances a totally untweaked EBS drive running an ext3 filesystem will get you about 80MB/s of read or write throughput and 7,000 seeks per second. Two disks in a RAID 0 configuration will get you about 140MB/s read or write and about 10,000 seeks per second. Those are the best numbers I’ve been able to get out of an EBS setup, and they appear to saturate the IO channel on the EC2 instance (which makes sense, as it’s about what you’d expect from gigabit ethernet). However, when the EBS drives are NOT running their best, which is often, you need a lot more tweaking to get good performance out of them.

The tool I used to benchmark was bonnie++, specifically:

bonnie++ -u nobody -f -d /disk/bonnie

Saturating reads and writes was not very hard, but seeks per second – which is CRITICAL for databases – was much more sensitive and is what I was optimizing for in my tests.
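
Given how much EBS drifts from run to run, I wouldn’t trust any single bonnie++ pass. A minimal sketch of how you might take a few samples back to back (the output paths are arbitrary, and the directory is assumed to be the same /disk/bonnie as above):

    # Sketch: take three bonnie++ samples back to back so one lucky or
    # unlucky run doesn't drive the tuning decision. Paths are arbitrary.
    for run in 1 2 3; do
      bonnie++ -u nobody -f -d /disk/bonnie > /tmp/bonnie-run-$run.txt 2>&1
    done
    cat /tmp/bonnie-run-*.txt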

For my tests I built RAID arrays. I’ve been using mdadm RAID 0:

mdadm --create /dev/md0 --metadata=1.1 --level=0 ...

Each EBS volume is claimed to be redundant to begin with, so I felt safe just striping for speed.
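
For reference, a filled-in version of that command might look like the sketch below. The eight device names and the 256k chunk size are assumptions (the chunk size is the sweet spot I describe further down); substitute whatever devices your instance actually has attached:

    # Hypothetical example: stripe eight attached EBS volumes into /dev/md0.
    # Device names and the 256k chunk size are assumptions, not my exact setup.
    mdadm --create /dev/md0 --metadata=1.1 --level=0 --chunk=256 \
          --raid-devices=8 /dev/sdh /dev/sdi /dev/sdj /dev/sdk \
                           /dev/sdl /dev/sdm /dev/sdn /dev/sdo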

Now, I just need to take a moment to point something out. Performance testing on EBS is very hard. The disks speed up and slow down on their own. A lot. Telling when your tweak is helping vs it just being luck is not easy. It feels a bit like trying to clock the speed of passing cars with a radar gun from the back of a rampaging bull. I fully expect to find that some of my discoveries here are just a mare’s nest, but hopefully others will prove enduring.

After testing, what I found surprised me:

  • More disks are better than fewer. I’ve had people tell me that performance maxed out for them at 20 to 30 disks. I could not measure any improvement above 8 disks. Most importantly, lots of disks seem to smooth out the flaky performance of a single EBS disk that might be busy chewing on someone else’s data.
  • Your IO scheduler matters (but not as much as I thought it would). Do not use noop; use cfq or deadline. I found deadline to be a little better, but YMMV. (This and the other knobs below are pulled together in the sketch after this list.)
  • Larger chunk sizes on the raid made a (shockingly) HUGE difference in performance. The sweet spot seemed to be at 256k.
  • A larger read-ahead buffer on the raid also made a HUGE difference. I bumped it from the default of 256 up to 65536 (blockdev --setra counts 512-byte sectors, so that’s going from 128KB to 32MB of read-ahead).
  • Use XFS or JFS. The biggest surprise to me was how much better XFS and JFS performed on these moody disks. I am used to seeing only minimal gains in disk performance from a filesystem change, but something about the way XFS and JFS group reads and writes plays very nicely with EBS drives.
  • Mounting with noatime helps, but only by about 5%.
  • Different EC2 instance sizes, much to my surprise, did not make a noticeable difference in disk IO.
  • I was not able to reproduce Ilya’s results where a disk performed poorly when newly created but faster after being zeroed out with dd (due to lazy allocation of sectors).
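
Pulled together, the settings above amount to roughly the following. This is a sketch rather than the exact commands from my notes; the device names, the eight-volume array, and the mount point are all assumptions:

    # Sketch of the tuning described above; device and mount names are assumed.
    # Assumes the array was created with mdadm --chunk=256 as in the example earlier.

    # 1. Use the deadline (or cfq) scheduler on each underlying EBS device.
    for dev in sdh sdi sdj sdk sdl sdm sdn sdo; do
      echo deadline > /sys/block/$dev/queue/scheduler
    done

    # 2. Raise read-ahead on the array: 65536 sectors x 512 bytes = 32MB.
    blockdev --setra 65536 /dev/md0

    # 3. Use XFS (or JFS) instead of ext3.
    mkfs.xfs /dev/md0

    # 4. Mount with noatime for a small extra win.
    mkdir -p /disk
    mount -o noatime /dev/md0 /disk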

I’ve included my notes from that day below. I was not running each test three times in a row and taking the standard deviation into account (although I wish I had), and these numbers aren’t easy to reproduce now because it’s been a while since the EBS drives have had such a bad day.

| Scheduler | FS | Disks | Settings | Seq Block W | Seq Block RW | Seq Block R | Random Seeks/s |
|---|---|---|---|---|---|---|---|
| deadline | ext3 | 1 | | 60K | 25K | 50K | 216 |
| deadline | ext3 | 24 | | 125k | 17k | 20k | 1296 |
| deadline | ext3 | 24 | stride=16 | 125k | 18k | 20k | 866 |
| deadline | ext3 | 24 | noatime,stride=16 | 124k | 18k | 19k | 1639 |
| cfq | ext3 | 24 | noatime,stride=16,blockdev --setra 65536 /dev/md0 | 124k | 31k | 38k | 3939 |
| deadline | ext3 | 24 | noatime,chunksize=256,stride=64,blockdev --setra 65536 /dev/md0 | 126k | 37k | 44k | 1720 |
| cfq | ext3 | 24 | noatime,chunksize=256,stride=64,blockdev --setra 65536 /dev/md0 | 124k | 34k | 44k | 4560 |
| cfq | ext3 | 24 | noatime,chunksize=256,stride=64,blockdev --setra 393216 /dev/md0 | 126k | 35k | 43k | 1860 |
| cfq | ext3 | 24 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 129k | 34k | 43k | 1285 |
| cfq | ext3 | 24 | noatime,chunksize=256,blockdev --setra 65536 /dev/sd* | 125k | 35k | 44k | 2557 |
| cfq | ext3 | 16 | noatime,chunksize=256,stride=64,blockdev --setra 65536 /dev/md0 | 125k | 40k | 48k | 2770 |
| noop | ext3 | 16 | noatime,chunksize=256,stride=64,blockdev --setra 65536 /dev/md0 | 124k | 38k | 47k | 2504 |
| deadline | ext3 | 16 | noatime,chunksize=256,stride=64,blockdev --setra 65536 /dev/md0 | 125k | 41k | 46k | 1886 |
| cfq | xfs | 16 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 126k | 62k | 93k | 7428 |
| deadline | xfs | 16 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 118k | 63k | 92k | 10723 |
| cfq | xfs | 16 | noatime,chunksize=512,blockdev --setra 65536 /dev/md0 | 122k | 63k | 92k | 10099 |
| cfq | xfs | 16 | noatime,chunksize=512,blockdev --setra 131072 /dev/md0 | 116k | 64k | 99k | 9664 |
| deadline | xfs | 16 | noatime,chunksize=512,blockdev --setra 131072 /dev/md0 | 118k | 66k | 99k | 6396 |
| cfq | xfs | 24 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 117k | 62k | 89k | 7657 |
| cfq | xfs | 8 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 117k | 62k | 91k | 3059 |
| deadline | xfs | 8 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 139k | 63k | 85k | 10403 |
| deadline | xfs | 8 | noatime,chunksize=256,blockdev --setra 32768 /dev/md0 | 124k | 60k | 82k | 9308 |
| deadline | xfs | 4 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 88k | 48k | 77k | 1133 |
| deadline | xfs | 12 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 119k | 67k | 90k | 8590 |
| cfq | xfs | 12 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 85k | 64k | 91k | 10340 |
| cfq | ext2 | 8 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 141k | 51k | 51k | 3242 |
| deadline | xfs | 8 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 112k, 112k, 115k | 67k, 61k, 61k | 85k, 83k, 86k | 9568, 8541, 8339 |
| deadline | jfs | 8 | noatime,chunksize=256,blockdev --setra 65536 /dev/md0 | 135k, 138k, 85k | 66k, 64k, 33k | 92k, 87k, 79k | 9785, 10109, 8615 |