Contributed by Alexander Best, Director Technical Business
Development, DataCore Software
The ongoing hype about data deduplication for online storage inspired me to look at it from a slightly different angle. In particular, the promises by some vendors to save 50% of capacity or more set off my internal alarm bells about some aspects of modern storage architecture.
DataCore projects in the last 5 to 10 years show that capacity constraints are seldom the major problem, but why is that?
This appears to be because of our thin-provisioning technology, which allows the user to present far more virtual capacity than is actually available as physical disk. Because the storage can be oversubscribed, the allocated-but-unused overhead drops to roughly 10% to 15% of total capacity, whereas in traditional thick-provisioned SANs this overhead easily consumes up to 60% of the total capacity. Thin provisioning therefore delivers 45% to 50% net capacity savings on average.
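A quick back-of-the-envelope check of those figures, using a hypothetical 100 TB pool and the midpoint of the 10% to 15% range quoted above:

```python
# Back-of-the-envelope: capacity reclaimed by thin provisioning,
# using the overhead figures quoted above on a hypothetical 100 TB pool.
raw_capacity_tb = 100
thick_overhead = 0.60     # allocated-but-unused space in a thick-provisioned SAN
thin_overhead = 0.125     # midpoint of the 10-15% overhead left with thin provisioning

print(f"Usable data, thick: {raw_capacity_tb * (1 - thick_overhead):.0f} TB")   # 40 TB
print(f"Usable data, thin:  {raw_capacity_tb * (1 - thin_overhead):.1f} TB")    # 87.5 TB
print(f"Capacity reclaimed: {thick_overhead - thin_overhead:.1%}")              # 47.5%
```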
Interestingly enough, deduplication promises to deliver similar results, but the major difference between thin provisioning and deduplication is the way capacity saving is achieved.
Thin provisioning deflates LUNs by eliminating the "hot air" inside them. The technology stores only actual user data on the storage devices, so blank space no longer occupies disk capacity. Each block of user data has its own individual location in the backend storage devices and keeps this location occupied as long as the data exists, which leads to predictable performance over long periods of time. If data is deleted, thin provisioning detects the freed block areas and detaches those blocks from the virtual LUN without downtime. In this way, an inflated LUN can later be deflated in the background while the data remains accessible.
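As a purely illustrative sketch of that behavior (the class and method names are invented for this example and do not correspond to any DataCore API), a thin-provisioned LUN can be modeled as a mapping that allocates backend blocks only on first write and returns them to the pool when the host discards them:

```python
class ThinLun:
    """Toy model of a thin-provisioned LUN: virtual blocks map to backend
    blocks only after they have actually been written."""

    def __init__(self, virtual_blocks, backend_pool):
        self.virtual_blocks = virtual_blocks      # advertised (virtual) size in blocks
        self.pool = backend_pool                  # shared list of free backend block IDs
        self.mapping = {}                         # virtual block -> backend block

    def write(self, vblock, data):
        if vblock not in self.mapping:            # first write: allocate on demand
            self.mapping[vblock] = self.pool.pop()
        backend = self.mapping[vblock]
        # ... write 'data' to the backend block here ...
        return backend

    def discard(self, vblock):
        """Reclaim space when the host reports the block as no longer used."""
        backend = self.mapping.pop(vblock, None)
        if backend is not None:
            self.pool.append(backend)             # block returns to the shared pool

    def physical_usage(self):
        return len(self.mapping)                  # only written blocks consume capacity


# A 1,000-block virtual LUN backed by a pool of only 100 physical blocks:
pool = list(range(100))
lun = ThinLun(virtual_blocks=1_000, backend_pool=pool)
lun.write(42, b"user data")
print(lun.physical_usage())   # 1 -- only the written block occupies disk space
```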
Deduplication, on the other hand, detects redundant user data and stores each redundant data set only once, replacing the duplicates with pointers back to the single stored copy. The technique is sometimes called single instancing, because only one instance of the data is held on disk. Applications accessing the data are redirected by an intelligent linking mechanism to the individual storage blocks that hold the deduplicated data sets. The application sees the data through a kind of filter that makes the total data set appear as individual blocks, while the access is redirected to a consolidated set of blocks.
Modifications (write updates) to a deduplicated data block require copying the block to a separate location, updating the pointer tables and finally writing the new data at the newly created location. Block overwrites therefore create duplicated (inflated) data that may later be deduplicated (deflated) again. This logic carries built-in complexity and requires ongoing monitoring and reorganization of the data on disk. The process consumes a significant number of CPU cycles on the storage controller when done in real time. In addition, deduplicated online storage needs a significant number of IOPS for the permanent block-level reorganization caused by everyday operations, and these IOPS steal resources from the applications accessing the storage.
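The following sketch illustrates both mechanisms described above: blocks are single-instanced by content fingerprint, reads follow the pointer table, and an overwrite of a shared block stores a fresh copy before the pointer is re-targeted. All names are invented for illustration; real deduplication engines also have to handle hash collisions, garbage collection and on-disk layout, which are omitted here.

```python
import hashlib

class DedupStore:
    """Toy single-instance store: identical blocks are kept once,
    and virtual addresses point at the shared copy."""

    def __init__(self):
        self.blocks = {}      # fingerprint -> (data, reference count)
        self.pointers = {}    # virtual address -> fingerprint

    @staticmethod
    def _fingerprint(data):
        return hashlib.sha256(data).hexdigest()

    def write(self, vaddr, data):
        # Release the old reference if this address is being overwritten.
        old = self.pointers.get(vaddr)
        if old is not None:
            self._release(old)
        fp = self._fingerprint(data)
        if fp in self.blocks:                        # redundant data: just add a pointer
            stored, refs = self.blocks[fp]
            self.blocks[fp] = (stored, refs + 1)
        else:                                        # new content: store one instance
            self.blocks[fp] = (data, 1)
        self.pointers[vaddr] = fp

    def read(self, vaddr):
        return self.blocks[self.pointers[vaddr]][0]  # redirect via the pointer table

    def _release(self, fp):
        data, refs = self.blocks[fp]
        if refs == 1:
            del self.blocks[fp]                      # last reference: reclaim the block
        else:
            self.blocks[fp] = (data, refs - 1)


store = DedupStore()
store.write(0, b"same content")
store.write(1, b"same content")      # deduplicated: only one physical copy
print(len(store.blocks))             # 1
store.write(1, b"modified content")  # overwrite: new copy stored, pointer re-targeted
print(len(store.blocks))             # 2 -- the data set has inflated again
```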
The concept of deduplication sounds like the perfect model to solve "world storage hunger" and shrink storage consumption to what is essentially needed. Unfortunately, the laws of physics and current trends in disk technology contradict this thinking: what works well for backup and archiving does not necessarily make sense for online, high-performance data sets. Online deduplication causes some major issues that you don't learn about from glossy marketing spec sheets and sales presentations. We will not focus here on data integrity questions (hash collisions) or on limited efficiency (already-compressed file systems); we concentrate on the performance impact of deduplicated data sets.
To explain what I mean, I first have to introduce some baseline numbers. Magnetic storage (spinning disk) development over the past several years has delivered constantly increasing capacity per spindle, but essentially unchanged rotational speeds, access times and transfer rates.
| Disk Type | Rotational Speed | Throughput | IO Performance | Latency |
|---|---|---|---|---|
| ATA 3.5" 7k | 7,200 rpm | 50 MB/s | 80 IO/s | 12 ms |
| FC 3.5" 10k | 10,000 rpm | 60 MB/s | 100 IO/s | 6 ms |
| FC 3.5" 15k | 15,000 rpm | 100 MB/s | 160 IO/s | 4 ms |
| SAS 2.5" 10k | 10,000 rpm | 100 MB/s | 160 IO/s | 4 ms |
| SAS 2.5" 15k | 15,000 rpm | 140 MB/s | 220 IO/s | 3 ms |
The values in the table above are averages; depending on the vendor and the spindle's cache capacity, the individual values may vary a bit. In general, the numbers are largely defined by the physical limitations of magnetic hard disk technology. Over the past 20 years these numbers have not changed much, but capacity per spindle has exploded. Disk drives in the last years of the 20th century had capacities of 4 GB to 36 GB; today we see capacities of up to 1.2 TB for SAS and 10 TB for S-ATA/NL-SAS drives. Looking at this capacity growth, a new picture emerges in the table below.
| Disk Type | Capacity 2000 | IO/GB 2000 | Capacity 2010 | IO/GB 2010 | Capacity 2017 | IO/GB 2017 |
|---|---|---|---|---|---|---|
| ATA 3.5" 7k | 120 GB | 0.66 | 2 TB | 0.04 | 10 TB | 0.008 |
| FC 3.5" 10k | 36 GB | 2.78 | 600 GB | 0.17 | n/a | n/a |
| FC 3.5" 15k | 18 GB | 8.89 | 300 GB | 0.53 | n/a | n/a |
| SAS 2.5" 10k | n/a | n/a | 300 GB | 0.53 | 1.2 TB | 0.13 |
| SAS 2.5" 15k | n/a | n/a | 146 GB | 1.51 | 600 GB | 0.37 |
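The IO/GB column is nothing more than each spindle's IO performance from the first table divided by its capacity in GB; a short Python check on a subset of the rows, using the figures exactly as quoted, reproduces the values:

```python
# IO density (IO/s per GB) = IO performance of the spindle / capacity in GB,
# using the figures from the two tables above.
drives = {
    "ATA 3.5\" 7k":  {"io_per_s": 80,  "capacity_gb": {2000: 120, 2010: 2_000, 2017: 10_000}},
    "FC 3.5\" 15k":  {"io_per_s": 160, "capacity_gb": {2000: 18,  2010: 300}},
    "SAS 2.5\" 15k": {"io_per_s": 220, "capacity_gb": {2010: 146, 2017: 600}},
}

for name, d in drives.items():
    for year, cap in d["capacity_gb"].items():
        print(f"{name}, {year}: {d['io_per_s'] / cap:.3f} IO/GB")
# e.g. FC 3.5" 15k falls from 8.889 IO/GB (2000) to 0.533 (2010);
# SAS 2.5" 15k falls from 1.507 (2010) to 0.367 (2017).
```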
Doesn't this look alarming? Especially when we consider that disk capacity will keep increasing. An application with a capacity requirement of 200 GB, which in 2000 required 12 spindles in a typical RAID-5 disk array, fits in 2017 on a fraction of a 3-spindle array. Doing the math reveals that the same application today gets only about 8% of the IO performance that was available to it in 2000.
These numbers get even more dramatic when you run deduplication on top of a storage architecture that, per gigabyte, already delivers only a small fraction of its former IO performance. Concurrent access to deduplicated data patterns cuts the available IO resources by a further 50%, assuming deduplication really achieves its promised 50% savings: if half of the blocks are eliminated as redundant, each spindle has to serve roughly twice as many IO requests per GB of stored data to cope with the linked IO architecture. Furthermore, monitoring redundant data sets and the resulting reorganization tasks inside the disk unit consume their own additional share of IO resources.
To summarize, the IO performance remaining for applications today would be only a few percent of the 2000 baseline: roughly half of the 8% above, and even less once the reorganization overhead is counted. Ever-increasing computing power makes the situation worse, and the gap between storage and compute performance grows wider and wider.
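Putting the two effects together, using only the numbers already quoted in this article:

```python
# Compounding the per-GB IO loss with the deduplication penalty,
# using only the figures quoted above.
baseline_2000 = 1.00       # IO performance available to the 200 GB application in 2000
capacity_effect = 0.08     # ~8% of that remains in 2017 (per the example above)
dedupe_effect = 0.50       # concurrent access to deduplicated blocks halves it again

remaining = baseline_2000 * capacity_effect * dedupe_effect
print(f"Remaining IO performance vs. 2000: {remaining:.0%}")   # ~4%, before reorg overhead
```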
On the surface, Solid State Disk (SSD) technology appears to offer a way out of this dilemma. The achievable IO and transfer rates sound unbelievable, and because SSDs have no moving parts they can really deliver them. Compared to classical spinning disk media this is a dream come true: latencies below 0.2 ms and performance beyond 100,000 IOPS are a reality. Anything goes! But there is a catch – yes, at least two of them.
First of all, SSD technology is very pricey per GB, so you still find SSDs only in homeopathic doses in current storage implementations. Second, and even more disconcerting, SSDs based on NAND flash die over time: writing to NAND flash cells wears them out, and each individual cell can be overwritten only about 10,000 times (lower-cost MLC technology) or 100,000 times (expensive SLC technology). Depending on the application's write profile, the intelligence of the SSD's wear-leveling algorithm and the cell type, it takes more or less time until the SSD fails and causes write errors. For deduplication, with its associated write-cache usage, SSD is no real solution.
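A simplified endurance estimate makes the second point tangible. The sketch below assumes a hypothetical 400 GB drive and a sustained write load of 2 TB per day, and deliberately ignores write amplification and over-provisioning; only the 10,000/100,000 cycle figures come from the text above.

```python
# Simplified SSD endurance estimate: total writable bytes = capacity * P/E cycles.
# Write amplification and over-provisioning are ignored to keep the example short.
capacity_gb = 400               # hypothetical SSD size
pe_cycles_mlc = 10_000          # cell overwrites for lower-cost MLC (per the text)
pe_cycles_slc = 100_000         # cell overwrites for expensive SLC
write_rate_gb_per_day = 2_000   # assumed sustained write load, e.g. a busy write cache

for name, cycles in [("MLC", pe_cycles_mlc), ("SLC", pe_cycles_slc)]:
    total_writable_gb = capacity_gb * cycles
    lifetime_days = total_writable_gb / write_rate_gb_per_day
    print(f"{name}: worn out after roughly {lifetime_days / 365:.1f} years")
# MLC: ~5.5 years, SLC: ~54.8 years under this write load -- and a deduplication
# write cache can push the daily write volume far higher than assumed here.
```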
So, deduplication is definitely not the key to efficient, performance-oriented storage management. A possible solution to today's major storage issues is thin provisioning combined with a method for intelligent data placement. This intelligent data placement technology is called automatic data tiering, or auto-tiering for short.
DataCore thin-provisioned storage pools are made up of storage devices of different speeds and sizes, potentially from different array and storage manufacturers. The storage administrator classifies which technology makes up which tier. Depending on access frequency and a rich policy set, the data blocks associated with an individual virtual disk are spread across some or all tiers of a tiered storage pool. Changes in the access profile or administrative instructions may influence how data is stored over time and cause blocks to move from tier to tier: hot data bubbles up to faster, premium-priced tiers, while cold spots are moved down to slower, lower-cost tiers.

This makes it possible to use the strength of SSDs where it is needed, while compensating for their weaknesses (capacity and price) by reserving them for data that really deserves them. The technology works at the sub-LUN level and therefore allows hybrid storage layouts within an individual LUN, for example a database volume. A single SSD can replace up to 120 spinning disks, and allocating SSDs dynamically offers a major benefit in total storage efficiency, cost per GB and data center footprint (rack space and carbon dioxide) without any performance compromise.
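To illustrate the principle (this is not DataCore's actual implementation), here is a minimal sketch of such a placement policy: access counts are tracked per sub-LUN block, and a periodic rebalance moves the hottest blocks to the fastest tier that still has free capacity. All names and capacities are invented for this example.

```python
# Toy auto-tiering policy: count accesses per block and periodically migrate
# the hottest blocks to the fastest tier with free capacity.
from collections import Counter

class TieredPool:
    def __init__(self, tiers):
        # tiers: list of (name, capacity_in_blocks), ordered fastest -> slowest
        self.tiers = tiers
        self.placement = {}          # block id -> tier name
        self.heat = Counter()        # block id -> access count in current interval

    def record_access(self, block):
        self.heat[block] += 1

    def rebalance(self):
        """Re-place all known blocks, hottest first, filling fast tiers first."""
        blocks_by_heat = [b for b, _ in self.heat.most_common()]
        free = {name: cap for name, cap in self.tiers}
        for block in blocks_by_heat:
            for name, _ in self.tiers:             # fastest tier first
                if free[name] > 0:
                    self.placement[block] = name   # hot data bubbles up
                    free[name] -= 1
                    break
        self.heat.clear()                          # start a new measurement interval


pool = TieredPool([("ssd", 2), ("sas-15k", 4), ("nl-sas", 100)])
for block, hits in [(1, 50), (2, 40), (3, 5), (4, 1)]:
    for _ in range(hits):
        pool.record_access(block)
pool.rebalance()
print(pool.placement)   # blocks 1 and 2 land on SSD, the colder blocks on slower tiers
```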