{"id":669,"date":"2012-12-17T23:12:42","date_gmt":"2012-12-18T05:12:42","guid":{"rendered":"http:\/\/benincosa.com\/blog\/?p=669"},"modified":"2014-11-19T11:24:33","modified_gmt":"2014-11-19T17:24:33","slug":"fusion-io-software-defined-storage","status":"publish","type":"post","link":"https:\/\/benincosa.com\/?p=669","title":{"rendered":"Fusion IO: Software Defined Storage"},"content":{"rendered":"<p><!--?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?-->Originally Posted Dec 14, 2012<\/p>\n<p>This week I was very privileged to go to Salt Lake City to\u00a0<strong>Fusion<\/strong> <strong>IO<\/strong> headquarters and get a deep dive on their technology and how it differentiates from other competitors in the high speed, low latency storage market.\u00a0\u00a0(Which is really starting to just be the general storage market these days instead of something niche.)\u00a0\u00a0It was super neat to go there with a bunch of my buddies from Cisco and I can\u2019t thank them enough for having us and treating us so go.\u00a0\u00a0This meeting was brought about primarily because\u00a0<strong>Fusion<\/strong> <strong>IO<\/strong> and Cisco have introduced a new Mezzanine\u00a0<strong>Fusion<\/strong> <strong>IO<\/strong> card for the B series blades.\u00a0\u00a0Specifically:\u00a0\u00a0The B200 M3 and B420 M3.\u00a0\u00a0The card is the same as the ioDrive2\u00a0\u00a0but in Cisco blade server form factor.\u00a0\u00a0This first version is 758GB drive.\u00a0\u00a0We had a great time and and learned a ton.<\/p>\n<p><strong>Fusion<\/strong> <strong>IO<\/strong>\u2019s main product is branded as ioMemory.\u00a0\u00a0Its marketed as a new memory tier.\u00a0\u00a0The co-founder, David Flynn had the idea of taking Flash Memory,\u00a0\u00a0putting it on a PCI card, slap a basic controller on it and putting it into the server.\u00a0\u00a0The server would then see this as a hard drive.\u00a0\u00a0By not using legacy protocols of SAS or SATA and using their own protocol with 
software, they were able to get <strong>IO<\/strong> latency down from milliseconds to microseconds. Couple this with flash memory and it translates to more<a href=\"http:\/\/en.wikipedia.org\/wiki\/IOPS\"> IOPS<\/a>, which means applications that normally have to wait on disk reads and writes can now run orders of magnitude faster. One of the examples they cited had customers getting 20 times the performance with these drives compared to standard disk drive arrays. (Not 20% better, but 20 times.) <a href=\"http:\/\/en.wikipedia.org\/wiki\/IOPS\">The Wikipedia article linked above<\/a> shows that the fastest 15k RPM hard drives only get around 200 IOPS. Compare that to a <strong>Fusion<\/strong> <strong>IO<\/strong> card, which the same article lists at 140,000 IOPS. (My notes also say they are getting 500,000 IOPS; I\u2019m not sure which is correct, but the point is that it\u2019s blazing fast.)<\/p>\n<p>If you aren\u2019t familiar with the state of the data center (I commented on this in my<a href=\"http:\/\/benincosa.com\/blog\/?p=643\">VMworld 2012 post<\/a>), storage is one of the biggest problems. Numerous blogs, articles, and talks show that storage is the biggest bottleneck and the largest expense in the data center. Duncan Epping commented on performance at VMworld: \u201cThe network is usually blamed but storage is usually the problem\u201d. There is a famine happening in our data centers. Applications are starved for data. They are waiting and waiting to get their data from disks. Applications today are like a bunch of hungry children at home whose moms all went to the store for food and are stuck in traffic with the other moms, taking forever to make the round trip. Storage <strong>IO<\/strong> performance has not kept up with the spectacular rate that 
processing power has improved over the last decade or so.<\/p>\n<p>What we (me personally, and plenty of others) have been doing for the last 10 years is designing storage systems that we think will meet the performance requirements. Once we get up and running, we soon find that the storage system doesn\u2019t meet the performance needs, so we throw more disks at it until it does. Soon we have far more capacity than we need and a bigger footprint. I\u2019m not alone in this; it\u2019s standard practice. Commenting on this, Jim Dawson, the Vice President of worldwide sales, wrote for all of us to see: DFP = RIP. This, he said, means: Disks For Performance is dead. He also mentioned that when he was at 3PAR, customers were adding so many disks for performance that they asked him to make smaller disk sizes; they didn\u2019t need capacity, they needed performance.<\/p>\n<p><strong>Flash Memory to the rescue<\/strong><\/p>\n<p>The reason you are probably hearing so much about flash memory now, and not before, is that the price of flash memory has fallen below the price of DRAM (the kind of memory that forgets everything it held when you pull the power). Flash memory, specifically NAND flash, is the flash that\u2019s used in <strong>Fusion<\/strong> <strong>IO<\/strong>, SSDs, SSD arrays, and pretty much everything out there that\u2019s called flash storage. This type of memory doesn\u2019t forget which bits were flipped to ones or zeros when you pull the power. NAND flash chips are the building blocks for nearly all the fast storage you\u2019ve been hearing about. From people making USB thumb drives, SSDs, and PCI or SAS flash cards, to Violin or Texas Memory Systems (now IBM) building arrays with their own controllers, they\u2019re all using NAND flash.<\/p>\n<p>The difference is how the 
flash is accessed. SSDs go through SAS or SATA controllers that add significant overhead. That makes them slower, since those are legacy protocols built for hard drive technology. But if you have one in your MacBook Pro, like I hope to soon, then you are not complaining and it\u2019s just fine. Most of the flash storage solutions out there are based on using SAS\/SATA protocols to access flash storage: Nimble Storage, WhipTail, etc. It\u2019s simpler to develop because the protocols are already defined, and they can concentrate on adding value at the top, like putting more protocols or better management tools on it.<\/p>\n<p><strong>Fusion<\/strong> <strong>IO<\/strong> has two advantages over these technologies. First, since they are on the PCI bus, they are closer to the processor, so it\u2019s much faster. Second, they don\u2019t have the overhead of a controller translating older protocols. A driver that sits on the OS manages it all. Since they don\u2019t go through the standard protocols, they can also add better monitoring tools and build more on top of that to innovate cool solutions. (ioTurbine is an example of this that I\u2019ll get to in a minute.)<\/p>\n<p><strong><strong>Fusion<\/strong> <strong>IO<\/strong> secret sauce<\/strong><\/p>\n<p>The ioDrive2 card is their main product. It\u2019s a PCIe card with a bunch of 25nm NAND flash chips on it. We had an amazing <strong>Fusion<\/strong> <strong>IO<\/strong> engineer named Bob Wood come in and talk to us about how it works. He schooled us so hard I thought I was back in<a href=\"http:\/\/berkeley.edu\/index.html\">college<\/a>. We were worried we were going to get more marketing, but in the words of @ciscoservergeek: Our expectations were amazingly surpassed.<\/p>\n<p>Flash memory has what\u2019s called an Erase Block. This is the smallest atomic 
unit that can be erased (writes happen at the smaller page level). As flash geometries get smaller, having 3 or more electrons leave, or somehow get disturbed, can cause the erase block to flip a bit and be wrong. The controller software is therefore always checking to make sure things are still the way they should be.<\/p>\n<p>A standard <strong>Fusion<\/strong> <strong>IO<\/strong> card is built with about 20% of spare capacity that is used when erase blocks get contaminated or flipped too many times. Bob equated it to standing on top of a mountain and being struck by lightning: there are only so many times you can be struck by lightning and still go on. (Apparently NAND flash can handle it better than humans.) When one of these erase blocks is retired, the card draws from the 20% pool. In addition, other erase blocks are reserved for features that handle more error checking. More official information on this<a href=\"http:\/\/www.fusionio.com\/blog\/adaptive-flashback\/\"> \u201cAdaptive Flashback\u201d is here<\/a>.<\/p>\n<p>I asked: So if I have a 750GB card, do I only get to see 600GB of space? No; the 20% overhead plus the other reserved pools is in addition to the 750GB, so you will see the full capacity. I imagine the raw capacity is probably somewhere from 900GB to 1TB.<\/p>\n<p>Bob told us that the design of the card is a classic engineering tradeoff exercise in finding the ultimate efficiency. You have to weigh which NAND flash you use, multiple suppliers, price\/performance, how much you can fix in software, how much error checking you need versus speed, capacity vs. 
features, etc. It sounded like a fun multivariable calculus problem.<\/p>\n<p>The other thing that was cleared up for me was the nature of the product. Does <strong>Fusion<\/strong> <strong>IO<\/strong> make one product that\u2019s a hard drive and another that\u2019s a memory cache? No. Physically, it\u2019s one product, but you can license software to give it more features. You\u2019ll hear messaging about ioTurbine and DirectCache from them. Those marketing terms describe software functions you can layer on top of the ioDrive2 by licensing software. ioTurbine is for VMs and DirectCache is for bare metal. It essentially makes the card act as a memory cache for the VM or physical machine.<\/p>\n<p>And this is where I suspect <strong>Fusion<\/strong> <strong>IO<\/strong> will continue to innovate: software on the NAND flash. How to make it more useful and do more things.<\/p>\n<p><strong><strong>Fusion<\/strong> <strong>IO<\/strong> Tradeoffs<\/strong><\/p>\n<p>Like every technology, there are tradeoffs, and no single technology is going to solve all your data center needs. Isn\u2019t that why we pay architects so much money? To gather all these great technologies and choose the best ones to meet the needs? Anyway, here are some tradeoffs:<\/p>\n<p><strong><em>Price<\/em><\/strong>: It\u2019s no mystery that <strong>Fusion<\/strong> <strong>IO<\/strong> drives aren\u2019t super cheap. You can buy at least two very nice servers for the price of the card, but that may not solve your <strong>IO<\/strong> problem. If you look at it as buying <strong>Fusion<\/strong> <strong>IO<\/strong> instead of some souped-up disk array, it might actually be cheaper. In fact, they showed case studies where it was over 70% cheaper than getting big storage 
arrays.<\/p>\n<p><strong><em>Redundancy and HA<\/em>: <\/strong>If you have one card in the server, that\u2019s a single point of failure. Granted, there are no moving parts, so the MTBF goes up, but you are still putting lots of eggs in one basket. If you have a modern application where redundancy is handled in the software, then this isn\u2019t going to be a problem for you. For the legacy apps run in most data centers, <strong>Fusion<\/strong> <strong>IO<\/strong> talked to us about several different solutions you could use to do HA. A lot of this sounded like what we used to do with xCAT to make it HA: we\u2019d use DRBD and SteelEye, and those were the same things we were told about by <strong>Fusion<\/strong> <strong>IO<\/strong>.<\/p>\n<p>Now there\u2019s no reason you can\u2019t buy two or more of these cards, put them in the same server, and just use software to RAID them together, but you\u2019re not going to be able to do that in a B200 M3. Furthermore, you\u2019ll want to sync blocks between drives. <strong>Fusion<\/strong> <strong>IO<\/strong> recognizes that people want this, and that\u2019s why ioN is a product I think we\u2019ll see lots more from. (More on that in a second.)<\/p>\n<p><strong>Capacity vs. 
Performance: <\/strong>A 750GB drive is not too far off from the 1TB drives I can put in my servers. <strong>Fusion<\/strong> <strong>IO<\/strong> told us about an Oracle survey where 56% of big data clusters had less than 5TB of capacity. That doesn\u2019t sound like big data, does it? But big data isn\u2019t really about the size of the files so much as gaining insight from lots of transactions and data points, where each individual record can be quite small. In that game, performance is everything. So even though you can\u2019t get as much capacity on the <strong>Fusion<\/strong> <strong>IO<\/strong> drives, you can hopefully fit the working set on there. They showed examples where entire databases were run off the cards. They also showed that in tiered storage designs the cards form yet another (or an alternative) tier by keeping the most recently used data closer to the processor.<\/p>\n<p><strong>Shared Storage is still in vogue<\/strong>: Most of the customers I work with have a shared SAN that all the servers have access to. <strong>Fusion<\/strong> <strong>IO<\/strong> cards are directly attached to individual servers. <strong>Fusion<\/strong> <strong>IO<\/strong> addresses this with its ioN product, which is essentially a shared block storage device created with standard servers and <strong>Fusion<\/strong> <strong>IO<\/strong> cards. ioN then presents itself as an iSCSI or Fibre Channel target. It can be used in conjunction with a SAN as a storage accelerator.<\/p>\n<p>The trends we have been hearing about lately show that distributed storage in commodity servers is the future. Indeed, Gary, one of the presenters, mentioned that as well. That would work very well for <strong>Fusion<\/strong> <strong>IO<\/strong>. But it requires software. Software Defined 
Storage. (See what I did there?) Something like Hadoop, Lustre, or GPFS NSDs could work on this today, but probably not in the way people want for generic applications. ioN right now only supports up to 3 servers. (Sounds like VMware\u2019s VSA, doesn\u2019t it?) I think this technology shows great promise, but it\u2019s not going to replace the SAN in the data center right now.<\/p>\n<p><strong>TL;DR<\/strong><\/p>\n<p><strong>Fusion<\/strong> <strong>IO<\/strong> is having tremendous success in the marketplace. I like the Cisco and <strong>Fusion<\/strong> <strong>IO<\/strong> partnership because it adds to Cisco\u2019s storage portfolio partnerships and gives Cisco UCS users more options.<\/p>\n<p>The thing that got me most excited was the ioN product. By allowing the common man to build his own Violin Memory \/ Texas Memory Systems-style array with commodity servers, we\u2019re getting more choices in how we do our storage. It still has a bit to go before it can really replace a traditional storage array. It doesn\u2019t have snapshotting, de-duplication, and all those other cool features that your traditional storage has. But just imagine:<\/p>\n<p>&#8211; What if you could add SSDs and spinning drives to your commodity servers along with the <strong>Fusion<\/strong> <strong>IO<\/strong> cards, and ioN allowed you to use those as well?<\/p>\n<p>&#8211; What if you then had software that could auto-tier the most-used data onto the fastest <strong>Fusion<\/strong> <strong>IO<\/strong> cards?<\/p>\n<p>&#8211; What if you added Atlantis iLIO to get deduplication in software into ioN?<\/p>\n<p>All of this points to one trend: software is the data center king. It\u2019s got to have a good hardware design underneath, but when it comes down to it, <strong>Fusion<\/strong> <strong>IO<\/strong> is 
faster because its software is more efficient.<\/p>\n<p><strong>TL;DR on the TL;DR<\/strong><\/p>\n<p>Software-defined everything is king: but even the king needs a solid hardware architecture.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Originally Posted Dec 14, 2012 This week I was very privileged to go to Salt Lake City to\u00a0Fusion IO headquarters and get a deep dive on their technology and how it differentiates from other competitors in the high speed, low latency storage market.\u00a0\u00a0(Which is really starting to just be the general storage market these days&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[155,992],"tags":[991,998,993,156],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/669"}],"collection":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=669"}],"version-history":[{"count":4,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/669\/revisions"}],"predecessor-version":[{"id":673,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/669\/revisions\/673"}],"wp:attachment":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=669"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=669"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=669"}],"curies":[{"name":"wp","hre
f":"https:\/\/api.w.org\/{rel}","templated":true}]}}