What the heck is an IOP (and why do I care)? Disk math, and does it matter?

I’ll start with the title question. IOP is an acronym for Input/Output Operation. Yes, it does seem like it should be IOO, but that’s just not the way it worked out.

A related bit of trivia: we generally talk either about the total IOPs for a given task, or about a rate – typically IOPs per second, written as IOPS.

With that the Wikipedia portion of today’s discussion is complete.   Let’s move on to why we care about IOPs.

Most frequently the topic comes up when we’re either measuring a disk system’s performance or attempting to size a disk system for a specific workload or set of workloads. We want to know not how much throughput a given system needs, but how many discrete reads and writes it’s going to generate in a given unit of time.

The reason we want to know is that a given storage system has a discrete number of IOPS it can deliver.  You can read my article on Disk Physics to get a better understanding of why.

In the old days this was mostly a math problem. We knew that a 7.2K RPM drive would deliver 60-80 IOPS, a 10K drive 100-120, and a 15K drive 120-150. We also knew that we had to deal with the RAID penalties associated with write operations to storage arrays. Typical values were 1 additional IO per write for RAID 1 and RAID 10, and 4 additional IOs per write for RAID 5 and RAID 50.

The idea here was fairly simple. If I needed a disk subsystem that would give me 1500 IOPS read, then I needed 10 15K drives to do that (1500/150 = 10). If I needed 1500 IOPS write in a RAID10 config, then I needed 20 15K drives ((1500 + (1500 * 1))/150 = 20). The same 1500 IOPS write in a RAID5 config took more spindles because of the larger RAID penalty, but it was also easily calculated as 50 drives ((1500 + (1500 * 4))/150 = 50).
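The old-school arithmetic above is simple enough to capture in a few lines of Python. This is just a sketch of the formula as described here, using the article’s convention of counting the RAID penalty as *additional* IOs per write:

```python
import math

def drives_needed(iops, write_penalty, per_drive_iops):
    """Spindles required for a pure read or pure write workload.

    write_penalty is the number of ADDITIONAL back-end IOs per write:
    0 for reads, 1 for RAID 1/10, 4 for RAID 5/50 (this article's convention).
    """
    return math.ceil(iops * (1 + write_penalty) / per_drive_iops)

print(drives_needed(1500, 0, 150))  # 1500 IOPS read on 15K drives -> 10
print(drives_needed(1500, 1, 150))  # 1500 IOPS write, RAID10      -> 20
print(drives_needed(1500, 4, 150))  # 1500 IOPS write, RAID5       -> 50
```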

That last, by the way, is why database vendors have always asked that their logs be placed on RAID1 or RAID10 storage. When writing to RAID5 storage it’s necessary to read the RAID stripe, recalculate parity, and re-write it; hence the 4 extra IOs.

The math got a bit more complicated when we had a mix of reads and writes. There we calculate the read and write portions separately and then add the results together. Suppose we had a workload of 3000 IOPS, 50% read and 50% write. That gives us 1500 IOPS read and 1500 IOPS write. On a RAID10 system we’d need 10 drives to satisfy the reads and 20 drives to satisfy the writes, so a total of 30 drives is needed to satisfy the whole 3000 IOPS workload.

Those were the old days, when we could pretty easily look at a disk subsystem and calculate how much performance it should deliver. Modern storage systems, however, have changed the rules.

How did they change the rules?   Well, basically they have a way of making IOPs disappear.

Consider for a moment NetApp’s WAFL filesystem. WAFL caches write operations in NVRAM on the controller and tells the application that the IO is complete. No physical IO operation has actually taken place. So far this sounds like an ordinary write-back cache, but here’s the difference: WAFL doesn’t just perform a “lazy write” of the cached data. It waits until it has accumulated a series of writes that need to go to the physical disks, then looks for a place on disk where it can write all of those blocks down at once, in sequence, thereby taking perhaps 4 or 10 (or more) physical IOPs and combining them into one. WAFL takes this a step further by looking for places on disk where it doesn’t have to read the stripe before writing, in an attempt to also avoid paying the RAID write penalties. This last point is why WAFL performance degrades as the array becomes very full: it becomes harder to find unused space.
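The coalescing idea is easier to see with a toy model. The sketch below is emphatically NOT NetApp’s implementation, just a minimal illustration of the principle: acknowledge writes from a buffer, then flush a batch of them as a single sequential write, so N logical IOs become one physical IO.

```python
class CoalescingCache:
    """Toy model of write coalescing (illustrative only, not real WAFL code).

    Incoming block writes are acknowledged immediately from a buffer
    (standing in for NVRAM); once enough accumulate, the whole batch is
    written out as ONE sequential physical IO.
    """
    def __init__(self, flush_threshold=8):
        self.buffer = {}
        self.flush_threshold = flush_threshold
        self.logical_ios = 0
        self.physical_ios = 0

    def write(self, block_id, data):
        self.logical_ios += 1
        self.buffer[block_id] = data   # acknowledged as complete here
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            self.physical_ios += 1     # one sequential write for the batch
            self.buffer.clear()

cache = CoalescingCache(flush_threshold=8)
for block in range(32):
    cache.write(block, b"data")
print(cache.logical_ios, cache.physical_ios)  # 32 logical IOs -> 4 physical
```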

Another example of vanishing IOPs is Nimble’s CASL filesystem, which expands on what WAFL does in two additional ways. First, it compresses all data as it comes into the array, further reducing the number of IOPs necessary to write it. Second, CASL is built around very large flash-based caches, so that physical IOPs to spinning disk can be avoided for reads. The net effect is that write IOPs are reduced and read IOPs are nearly eliminated entirely. In testing done by Dan Brinkman while he was at Lewan, a Nimble array with 12 7.2K disks was clocked at over 18,000 IOPS. We know that the physical disks were capable of no more than 960 IOPS (80 * 12 = 960). That’s a testament to how effective CASL is at reducing physical IOPs.

A third example of IO reduction is what Atlantis Computing does in their Ilio and USX products when dealing with persistent data (in-memory volumes are a topic for another day). Atlantis takes caching and compression further still by adding inline data deduplication: data is evaluated before being written to determine whether an identical block has already been stored. If it has, no physical write is performed for the block; the filesystem pointer for that block is merely updated to reflect an additional reference. Atlantis also caches data (reads and writes) in RAM or on flash to further reduce physical IO operations.
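Inline deduplication can also be sketched in a few lines. This is a hypothetical toy store, not Atlantis’s actual implementation: each incoming block is fingerprinted, and only previously unseen fingerprints cost a physical write; duplicates just increment a reference count.

```python
import hashlib

class DedupStore:
    """Toy sketch of inline block deduplication (illustrative only).

    Blocks are keyed by a content hash; a duplicate block costs a
    reference-count bump instead of a physical write.
    """
    def __init__(self):
        self.blocks = {}          # fingerprint -> [data, refcount]
        self.physical_writes = 0

    def write(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()
        if fp in self.blocks:
            self.blocks[fp][1] += 1    # pointer/refcount update only
        else:
            self.blocks[fp] = [data, 1]
            self.physical_writes += 1  # only new blocks hit the disk
        return fp

store = DedupStore()
for block in [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]:
    store.write(block)
print(store.physical_writes)  # 4 logical writes, only 2 physical
```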

The extreme case of this is the all-flash storage array (or subsystem), which is available from many vendors these days (Compellent, NetApp, Cisco, Atlantis, and VMware vSAN all offer all-flash options, and there are many more besides). All-flash arrays eliminate physical disk IO by eliminating the physical disks: the flash tier has grown so large that there is no longer any need to store the data on a spinning drive at all. There is still an upper bound for these arrays, but it’s set by the controllers and bandwidth rather than by the physics of the storage medium.

So what’s the net of all this?

The first part is that storage has gotten smarter and more efficient by making better use of CPUs and memory, letting arrays deliver higher performance and better data density with fewer spinning drives.

The second part of the answer is that the old-school disk math around how many IOPS you need and how many spindles (spinning disks) will be required is largely obsolete. Unless you’re building an old-school storage array or using internal disks in your server, the storage is probably doing something to reduce and/or eliminate physical disk IOPs on your behalf. That makes the idea that you can judge the performance of a storage system by the number and type of drives it uses pretty much false. A case of not being able to judge the book by its cover.

You’ll need to discuss your workload with your storage vendor, determine how the array is going to handle your data, and then rely on the vendor to size their solution properly for your needs.
