We recently came across an interesting article by our good friend, George Crump of Storage Switzerland, which appeared in Storage Newsletter earlier this month. The article, “SSD Can Now Achieve HDD Price Parity,” argues that SSDs can reach price parity with HDDs today by abandoning HDD form factors in favor of less-expensive, proprietary designs. Crump makes a number of interesting points; however, I believe he overlooks a couple of key ideas and makes incorrect assumptions about data center adoption, the $/IOPS calculation, and the notion that the performance hit of waiting out the HDD-to-SSD exchange negates the advantages of the SSD investment.
First and foremost, data center storage infrastructure will never become predominantly solid-state. If there were a case for a 100% SSD data center, there would already be 100% 15K RPM HDD data centers; the reality is that there aren’t. Data centers will not migrate to a “high-SSD-density data center” because there is no need: current storage system architectures deliver near-equivalent levels of performance at much lower cost by blending devices across tiers of performance. I agree that if $/GB were at parity between SSDs and HDDs there would be no reason not to use 100% SSDs, but that is an unrealistic expectation. Ultimately, the nature of the underlying technologies will continue to prohibit $/GB parity, and storage system architectures will continue to manage the higher cost of performance in cost-effective ways.
Another point I would like to expand on from Crump’s article is the $/IOPS calculation. Crump assumes a “homogeneous” HDD universe. If all HDD spindles were the same, $/IOPS could indeed be dismissed as a “rationalization” on the part of the SSD supplier. In reality, all high-performance data centers are built from tiers of storage, in large part because HDDs themselves have long been segregated on a $/IOPS basis. A 15K RPM HDD costs much more in $/GB than a 7200 or 5400 RPM HDD. Why do customers pay that premium? Because the 15K RPM HDD delivers better value in $/IOPS. To offset the cost of these more expensive spindles, the “bulk store” is built from the lowest-$/GB HDDs the data center operator can use. In this context, the SSD is simply another tier on the performance ladder, one step above 15K RPM HDDs.
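To make the tiering argument concrete, here is a toy comparison of $/GB against $/IOPS across three tiers. Every price, capacity, and IOPS figure below is an illustrative assumption for the sake of the arithmetic, not vendor data:

```python
# Toy $/GB vs. $/IOPS comparison across storage tiers.
# All figures are illustrative assumptions, not vendor data.
tiers = {
    # name: (cost per drive in $, capacity in GB, IOPS per drive)
    "7.2K HDD": (250, 4000, 80),
    "15K HDD":  (400, 600, 200),
    "SSD":      (800, 800, 50000),
}

for name, (cost, gb, iops) in tiers.items():
    print(f"{name:8s}  $/GB = {cost / gb:7.3f}   $/IOPS = {cost / iops:7.4f}")
```

With these assumed numbers the SSD is the worst tier on $/GB but the best by far on $/IOPS, which is exactly why it slots in as a performance tier rather than a bulk-capacity tier.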
The question is not “When will SSDs replace all HDDs?” The question is “When will SSDs replace all upper-tier (10K/15K) HDDs?” The answer is that we are just around the corner from seeing data centers whose top storage tier is filled with SSDs.
A significantly smaller number of SSDs will deliver the same level of performance as a larger array of HDDs; this is a fundamental tenet. If the HDD array’s capacity is underutilized due to short stroking, there is no ambiguity: the effective $/GB ratio favors the SSD. Nor can the peripheral costs of ownership be ignored – floor space, power, cooling, etc. These are real costs that weigh more heavily against the HDD configuration, and in large data centers they are significant. Even when the raw $/GB ratio doesn’t outright favor the SSD, SSDs can still come out ahead once the total cost of ownership (TCO) is calculated.
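The TCO argument can be sketched with back-of-the-envelope arithmetic. In the sketch below, a small SSD array and a much larger 15K HDD array are sized to the same IOPS target; every price, wattage, and rate is an illustrative assumption, not a measured or quoted figure:

```python
# Back-of-the-envelope TCO sketch: hardware plus power and cooling
# over the service life. All inputs are illustrative assumptions.
def tco(drives, price_per_drive, watts_per_drive, years=3,
        usd_per_kwh=0.10, cooling_overhead=0.5):
    """Hardware cost plus energy cost, with cooling as a fixed overhead."""
    hw = drives * price_per_drive
    kwh = drives * watts_per_drive / 1000 * 24 * 365 * years
    power = kwh * usd_per_kwh * (1 + cooling_overhead)
    return hw + power

# 100,000 IOPS target: ~500 15K HDDs at 200 IOPS each,
# versus 2 SSDs at 50,000 IOPS each (hypothetical sizing).
hdd_tco = tco(drives=500, price_per_drive=400, watts_per_drive=15)
ssd_tco = tco(drives=2, price_per_drive=800, watts_per_drive=6)
print(f"HDD array TCO: ${hdd_tco:,.0f}")
print(f"SSD array TCO: ${ssd_tco:,.0f}")
```

Even before counting floor space and the wasted capacity of short-stroked spindles, the power and cooling terms alone tilt the comparison toward the SSD configuration under these assumptions.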
The last point on which I would like to contest Crump is his belief that the performance impact of waiting for the HDD-to-SSD exchange can negate the advantages of the SSD investment altogether. The figure of merit here is the cache hit ratio. Data accesses are rarely totally random: humans are the ones using these systems, and human factors dictate that access patterns show a fair degree of uniformity – think Google search. There will always be hot and cold data, which is why we cache, and we can expect caching to remain effective until human nature or the objectives of our databases change.
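The standard effective-access-time formula shows why the hit ratio dominates this argument. The latency figures below are rough illustrative assumptions (a fast SSD tier backed by a slow HDD tier), not measurements of any particular product:

```python
# Effective access time as a function of the SSD-tier hit ratio:
#   t_eff = h * t_ssd + (1 - h) * t_hdd
# Latency figures are rough illustrative assumptions.
ssd_latency_us = 100    # ~0.1 ms per access served from the SSD tier
hdd_latency_us = 8000   # ~8 ms average seek + rotation on the HDD tier

for hit_ratio in (0.50, 0.90, 0.99):
    eff = hit_ratio * ssd_latency_us + (1 - hit_ratio) * hdd_latency_us
    print(f"hit ratio {hit_ratio:.0%}: effective latency ~ {eff:,.0f} us")
```

Because real workloads concentrate accesses on hot data, hit ratios stay high, and the average access time sits far closer to the SSD's latency than the HDD's.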
One fundamental consideration ignored throughout the article is that the more parts in a single “device,” the sooner it can be expected to fail. Appliance devices have very high component counts, and to address their implicitly lower reliability they must include a fair amount of redundancy. Even so, when an appliance fails, the entire appliance needs replacement. Distributing the reliability factor across a larger number of smaller devices is likely to prove much more economical in the long term: the aggregate failure rate per GB (the reciprocal of MTBF) would be the same, but the cost of replacing an individual HDD-form-factor SSD is substantially lower, both in terms of the actual hardware and in the often-overlooked but arguably more important repair time (MTTR). Not to mention that a far larger percentage of the data center goes offline when an appliance fails than when a single SSD fails.
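The reliability arithmetic behind this point is straightforward: an array's aggregate failure rate is the per-device rate times the device count, so failures are expected either way; what differs is the cost and blast radius of each failure event. The MTBF, prices, and repair times below are illustrative assumptions only:

```python
# Aggregate failure-rate sketch: failure rate = drives / per-drive MTBF.
# All figures (MTBF, prices, repair times) are illustrative assumptions.
drives = 100
drive_mtbf_h = 2_000_000                      # assumed per-drive MTBF, hours
array_failure_rate = drives / drive_mtbf_h    # aggregate failures per hour

hours_in_5_years = 5 * 365 * 24
expected_drive_failures = array_failure_rate * hours_in_5_years
print(f"expected drive failures in 5 years: {expected_drive_failures:.2f}")

# Assumed per-event impact of a repair: swapping one SSD vs. replacing
# (and restoring) an entire appliance.
drive_event = {"hw_cost_usd": 800, "mttr_h": 0.5, "capacity_offline_pct": 1}
appliance_event = {"hw_cost_usd": 80_000, "mttr_h": 24, "capacity_offline_pct": 100}
```

Under these assumptions the failures themselves are equally inevitable in both designs; the economics diverge because each drive-level event costs a fraction of an appliance-level event in hardware, MTTR, and offline capacity.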
In the end, the article brings up several good points. Crump’s arguments may seem like obvious truths at a glance, but a closer look at the points made in the article shows that the rationalization is less valid than it first appears.
- Randy Cohen, Principal Engineer, SMART Storage Systems