Fusion IO: Software Defined Storage

Originally Posted Dec 14, 2012

This week I was very privileged to go to Salt Lake City to Fusion IO headquarters and get a deep dive on their technology and how it differs from competitors in the high speed, low latency storage market. (Which is really starting to just be the general storage market these days instead of something niche.) It was super neat to go there with a bunch of my buddies from Cisco and I can’t thank Fusion IO enough for having us and treating us so well. This meeting was brought about primarily because Fusion IO and Cisco have introduced a new mezzanine Fusion IO card for the B series blades, specifically the B200 M3 and B420 M3. The card is the same as the ioDrive2 but in Cisco blade server form factor. This first version is a 758GB drive. We had a great time and learned a ton.

Fusion IO’s main product is branded as ioMemory. It’s marketed as a new memory tier. The co-founder, David Flynn, had the idea of taking flash memory, putting it on a PCI card, slapping a basic controller on it and putting it into the server. The server would then see this as a hard drive. By not using the legacy protocols of SAS or SATA and instead using their own protocol with software, they were able to get IO latency down from milliseconds to microseconds. Couple this with flash memory and it translates to more IOPS, which means applications that normally have to wait for disk reads and writes can do them orders of magnitude faster. One of the examples they cited said that customers were getting 20 times the performance with these drives compared to using standard disk drive arrays. (Not 20% better, but 20x.) The Wikipedia article linked above shows that the fastest 15k RPM hard drive will only get around 200 IOPS. Compare that to a Fusion IO card, which that same article shows at 140,000 IOPS. (My notes also say that they are getting 500,000 IOPS. I’m not sure which is correct, but the idea is that it’s blazing fast.)
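To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. The three IOPS figures are the ones quoted above; the function names are mine, and the "time per operation" line assumes operations are served strictly one at a time, which real devices don't do, so treat it as an order-of-magnitude illustration rather than a benchmark.

```python
# IOPS figures quoted in the post; nothing here was measured by me.
HDD_IOPS = 200            # fastest 15k RPM hard drive
CARD_IOPS_LOW = 140_000   # Fusion IO card, per the same article
CARD_IOPS_HIGH = 500_000  # the number from my notes (unverified)


def speedup(card_iops, disk_iops=HDD_IOPS):
    """How many times more IOPS the card delivers than a single disk."""
    return card_iops / disk_iops


def time_per_op_us(iops):
    """Microseconds per operation, assuming one operation at a time."""
    return 1_000_000 / iops


devices = [
    ("15k disk", HDD_IOPS),
    ("card (article)", CARD_IOPS_LOW),
    ("card (my notes)", CARD_IOPS_HIGH),
]
for label, iops in devices:
    print(f"{label:16s} {iops:>8,d} IOPS  "
          f"{time_per_op_us(iops):8.1f} us/op  "
          f"{speedup(iops):6.0f}x one disk")
```

Under that simplification the disk lands in the millisecond range and the card in the microsecond range, which lines up with the latency claim above.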

If you aren’t familiar with the state of the data center, and I commented on this in my VMworld 2012 post, storage is one of the biggest problems. Numerous blogs, articles, and talks show that storage is the biggest bottleneck and the largest expense in the data center. Duncan Epping commented at VMworld on the topic of performance that “The network is usually blamed but storage is usually the problem”. There is a famine happening in our data centers. Applications are starved for data. They are waiting and waiting to get their data from disks. Applications today are like a bunch of hungry children at home whose moms went to the store to get food and are all stuck in traffic with other moms, taking a long time to make that round trip. Storage IO performance has not kept up with the spectacular rate at which processing power has improved over the last decade or so.

What we have been doing for the last 10 years (me personally and others) is designing storage systems that we think will meet the performance requirements. When we get up and running we soon find that the storage system doesn’t meet the performance needs, so we throw more disks at it until it does. Soon we have tons more capacity than we need and a bigger footprint. I’m not alone in this. This is standard practice. Commenting on this, Jim Dawson, the Vice President of worldwide sales, wrote for all of us to see: DFP = RIP. This, he said, means: disks for performance is dead. He also mentioned that when he was at 3PAR his customers were adding so many disks for performance that they asked him to make smaller disks, because they didn’t need capacity, they needed performance.
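Here is a small sketch of why "disks for performance" leaves you with capacity you never wanted. Only the 200 IOPS per disk comes from the figure quoted earlier; the 600GB disk size, the 20,000 IOPS target and the 5,000GB capacity need are made-up numbers for illustration, and the function name is my own.

```python
import math


def disks_for_performance(target_iops, needed_gb, disk_iops=200, disk_gb=600):
    """Return (disk count, GB bought, GB bought beyond what was needed).

    disk_iops=200 is the 15k RPM figure quoted earlier; disk_gb=600 is a
    hypothetical disk size picked only for this example.
    """
    disks_for_iops = math.ceil(target_iops / disk_iops)
    disks_for_capacity = math.ceil(needed_gb / disk_gb)
    # Whichever requirement needs more spindles decides the purchase.
    count = max(disks_for_iops, disks_for_capacity)
    bought_gb = count * disk_gb
    return count, bought_gb, bought_gb - needed_gb


count, bought_gb, surplus_gb = disks_for_performance(
    target_iops=20_000, needed_gb=5_000)
print(f"{count} disks, {bought_gb:,} GB bought, {surplus_gb:,} GB surplus")
```

With these invented inputs capacity alone would need 9 disks, but the IOPS target needs 100, so most of what you buy is space nobody asked for. That is the situation the 3PAR customers were in when they asked for smaller disks.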

Flash Memory to the rescue

The reason you are probably hearing so much about flash memory now and not before is that the price of flash memory has fallen below the price of DRAM (the kind of memory that, when you pull the power from it, forgets everything that was in it). Flash memory, specifically NAND flash, is the flash that’s used in Fusion IO, SSDs, SSD arrays, and pretty much everything you see out there that’s called flash storage. This type of memory doesn’t forget which bits were flipped to ones or zeros when you pull the power. NAND flash is the building block for nearly all the fast storage you’ve been hearing about. Whether it’s people making USB thumb drives, SSDs, or PCI and SAS devices, or companies like Violin and Texas Memory Systems (now IBM) that make arrays with them using their own controllers, they’re all using NAND flash.

The difference is how the flash is accessed. SSDs go through SAS or SATA controllers that add significant overhead. That makes them slower, since those are legacy protocols designed for hard drive technology. But if you have one in your MacBook Pro like I hope to have soon, then you are not complaining and it’s just fine. Most of the flash storage solutions out there are based on using SAS/SATA protocols to access flash storage: Nimble Storage, Whiptail, etc. It’s simpler to develop that way because the protocols are already defined and the vendors can concentrate on value add at the top, like supporting more protocols or building better management tools.

Fusion IO has two advantages over these technologies. First, since they are on the PCI bus, they are closer to the processor, so access is much faster. Second, they don’t have the overhead of a controller translating older protocols. There’s a driver that sits in the OS and manages it all. Since they don’t go through the standard protocols they can also add better monitoring tools and even add more on top of that to innovate cool solutions. (ioTurbine is an example of this that I’ll get to in a minute.)

Fusion IO secret sauce

The ioDrive2 card is the main product. It’s a PCIe card with a bunch of 25nm NAND flash chips on it. We had this amazing Fusion IO engineer named Bob Wood come in and talk to us about how it works. He schooled us so hard I thought I was back in college. We were worried we were going to get more marketing, but in the words of @ciscoservergeek: Our expectations were amazingly surpassed.

Flash memory has what’s called an Erase Block. This is the smallest unit that can be erased; data is written in smaller pages inside it, but a page can’t be rewritten until its whole erase block has been erased. As flash cells get smaller, having 3 or more electrons leave, or somehow get disturbed, will cause a bit in the erase block to flip and be wrong. The controller software is therefore always checking to make sure things are still the way they should be.

A standard Fusion IO card is built with about 20% of spare capacity that is used when erase blocks get contaminated or have been flipped too many times. Bob equated it to standing on top of a mountain and being struck by lightning. There’s only so many times you can be struck by lightning and still go on. (Apparently NAND flash can handle it more than humans.) When one of these erase blocks is retired, the card draws from the 20% pool. In addition, other erase blocks are reserved for features that handle more error checking. More official information on this “Adaptive Flashback” is here.

I asked then: So if I have a 750GB card, do I only get to see 600GB of space? No, the 20% overhead plus other reserved pools is in addition to the 750GB, so you will see that much capacity. I imagine that the raw capacity is probably from 900GB to 1TB.
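For what it’s worth, here is the arithmetic behind my guess. "About 20% spare" can be read two ways, and I don’t know which one Fusion IO means, so this sketch works out both. The 750GB and 20% figures are from the discussion above; the function names are mine and the extra reserved pools are not counted at all.

```python
USABLE_GB = 750   # what the card presents to the server
SPARE_PCT = 20    # "about 20%" spare capacity, as Bob described it


def raw_if_spare_is_share_of_usable(usable_gb, spare_pct):
    """Reading 1: the spare pool is spare_pct on top of the usable space."""
    return usable_gb * (100 + spare_pct) / 100


def raw_if_spare_is_share_of_raw(usable_gb, spare_pct):
    """Reading 2: the spare pool is spare_pct of all the flash on the card."""
    return usable_gb * 100 / (100 - spare_pct)


low = raw_if_spare_is_share_of_usable(USABLE_GB, SPARE_PCT)
high = raw_if_spare_is_share_of_raw(USABLE_GB, SPARE_PCT)
print(f"raw flash before other reserved pools: {low:.1f} GB to {high:.1f} GB")
```

Either reading gives a floor of 900GB or a bit more, and the other reserved pools push the real number higher, which is why I land on somewhere between 900GB and 1TB.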

Bob told us that the design of the card is a classic engineering tradeoff problem: finding the ultimate efficiency. You have to worry about which NAND flash you use, multiple suppliers, price/performance, how much you can fix in software, how much error checking you need vs. speed, capacity vs. features, etc. It sounded like a fun multivariable calculus problem.

The other thing that was cleared up for me was the nature of the product. Does Fusion IO make one product that’s a hard drive and another one that’s a memory cache? No. Physically, it’s one product. But you can license software to give it more features. You’ll hear messaging about ioTurbine and DirectCache from them. Those marketing terms describe software functions you can put on top of the ioDrive2 by licensing software. ioTurbine is for VMs and DirectCache is for bare metal. It essentially makes the card act as a memory cache for the VM or physical machine.

And this is where I suspect Fusion IO will continue to innovate: Software on the NAND flash. How to make it more useful and do more things.

Fusion IO Tradeoffs

Like every technology, there are tradeoffs and no single technology is going to solve all your data center needs. Isn’t that why we pay architects so much money? To gather all these great technologies and choose the best ones to meet the needs? Anyway, here are some tradeoffs:

Price: It’s no mystery that Fusion IO drives aren’t super cheap. You can buy at least 2 very nice servers for the price of the card, but that may not solve your IO problem. But if you look at it as buying Fusion IO instead of some souped-up disk array, then it might actually be cheaper. In fact, they showed case studies where it was over 70% cheaper than getting big storage arrays.

Redundancy and HA: If you have one card in the server, that’s a single point of failure. Now granted there are no moving parts, so the MTBF goes up, but still you are putting lots of eggs in one basket. If you have a modern application where redundancy is in the software then this isn’t going to be a problem for you. For the legacy apps run in most data centers, Fusion IO talked to us about several different solutions you could use to do HA. A lot of this sounded like what we used to do with xCAT to make it HA. We’d use DRBD and SteelEye, and those were the same things we were told about by Fusion IO.

Now there’s no reason you can’t buy two or more of these cards, put them in the same server and then just use software to RAID them together, but you’re not going to be able to do that in a B200 M3. Furthermore, you’ll want to sync blocks between drives. Fusion IO recognizes that people want this and that’s why ioN is a product that I think we’ll see lots more from. (More on that in a second.)

Capacity vs. Performance: A 750GB drive is not too far away from the 1TB drives I can put in my servers. Fusion IO told us about an Oracle survey where 56% of the big data clusters had less than 5TB of capacity. That doesn’t sound like big data, does it? But big data isn’t really so much about the size of the files as it is about gaining insight into lots of transactions and data points where each individual record can be quite small. And in that game, performance is everything. So even though you can’t get as much capacity on the Fusion IO drives, you can hopefully get the working set on there. They showed examples where entire databases were run off the cards. They also showed that in tiered storage designs the cards form yet another (or alternative?) tier by keeping the most recently used data closer to the processor.
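To make the "keep the most recently used data close" idea concrete, here is a toy read cache in Python. This is only my sketch of a generic least-recently-used (LRU) cache sitting in front of a slow store. It is not how ioTurbine, DirectCache or any Fusion IO software is actually implemented, and the class and variable names are invented for the example.

```python
from collections import OrderedDict


class ReadCache:
    """Toy LRU read cache: a small fast tier in front of a slow store."""

    def __init__(self, backing_store, capacity_blocks):
        self.backing_store = backing_store      # stands in for the disk array
        self.capacity_blocks = capacity_blocks  # stands in for the card's size
        self.blocks = OrderedDict()             # block id -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)   # now the most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.backing_store[block_id]     # slow path: go to the disks
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity_blocks:
            self.blocks.popitem(last=False)     # evict the least recently used
        return data


disk = {n: f"block-{n}" for n in range(1000)}
cache = ReadCache(disk, capacity_blocks=3)
for block_id in [1, 2, 3, 1, 2, 4, 1, 3]:
    cache.read(block_id)
print(f"hits={cache.hits} misses={cache.misses} cached={list(cache.blocks)}")
```

The point of the toy is the working set argument: if the blocks an application keeps coming back to fit in the fast tier, most reads never touch the slow disks, even when the fast tier is far smaller than the total data.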

Shared Storage is still in vogue: Most of the customers I work with have a shared SAN that all the servers have access to. Fusion IO cards are directly attached to individual servers. Fusion IO addresses this with its ioN product which is essentially a shared block storage device created with standard servers and Fusion IO cards. ioN then presents itself as an iSCSI or Fibre Channel Target. It can be used in conjunction with a SAN as a storage accelerator.

The trends we have been hearing about lately show that distributed storage in commodity servers is the future. Indeed, Gary, one of the presenters, mentioned that as well. That would work very well for Fusion IO. But this requires software. Software Defined Storage. (See what I did there?) Something like Hadoop, Lustre, or GPFS NSD could work on this today, but probably not in the way people want for generic applications. ioN right now only supports up to 3 servers. (Sounds like VMware’s VSA, doesn’t it?) I think this technology shows great promise, but it’s not going to be able to replace the SAN in the data center right now.

TL;DR

Fusion IO is having tremendous success in the marketplace. I like the Cisco and Fusion IO partnership because it adds to Cisco’s storage portfolio partnerships and gives Cisco UCS users more options.

The thing that got me most excited was the ioN product. By allowing the common man to build his own Violin Memory / Texas Memory Systems style array with commodity servers, we’re getting more choices in how we do our storage. It still has a bit to go before it can really replace a traditional storage array. It doesn’t have snapshotting, de-duplication, and all those other cool features that your traditional storage has. But just imagine:

– What if you could add SSDs and Spinning drives to your commodity servers along with the Fusion IO cards and ioN allowed you to use that as well?

– What if you then had software that could do the auto tiering, putting the most used data on the fastest Fusion IO cards?

– What if you added Atlantis iLIO for the deduplication in software to get that feature into ioN?

All of this points to one trend: Software is the data center king. It’s got to have a good hardware design underneath, but when it comes down to it, Fusion IO is faster because its software is more efficient.

TL;DR on the TL;DR

Software defined everything is the king: but even the king needs a solid hardware architecture.