Permabits and Petabytes
OEM Data Optimization Solutions for Next Generation Storage
Compression and Dedupe: Business Value and Data Safety
Published by Tom Cook | Filed under Tom Cook, CEO
Many people are saying our Albireo embedded OEM deduplication changes the storage landscape. I am gratified by the response to Albireo from analysts (and here) and the press (and here), and by recent OEM adoption. Albireo is becoming a standard in data optimization because the product provides maximum business value without compromising data safety. Albireo also works well in combination with compression, and over the past several months storage vendors have often asked us about the relative benefits of compression and deduplication as they consider these complementary data optimization choices. I want to share our view of these two technologies and how they measure up in two vital areas: OEM business value and enterprise data safety.
First let’s look at the basics of compression.
Compression works only in file-based storage, compressing each file individually; it does not function across files.
Compression identifies redundant data across a very small window, usually 64 KB.
Compression produces data reduction rates of at most 2X for most data types.
Compression alters the underlying data structures and requires compression and decompression of data.
Compression operates in the data path and impacts read/write performance as a ‘bump in the wire’ (kudos to Storwize for their work to improve performance).
Compression is a potential single point of failure for data retrieval.
Given these attributes, compression can provide a level of data optimization for NAS systems, but the only safe way to implement compression is as an embedded feature in the storage software - deployed, owned and managed by the storage vendor (not OEMed or deployed as a third-party stand-alone appliance in the read/write path). Embedded compression is starting to take hold with NAS OEMs (EMC and IBM speculation) and we expect to see widespread future adoption. Again, Albireo works well with compression, so we support this incremental data optimization move.
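The window and per-file points in the list above can be tried out with a few lines of Python. This is only a rough sketch using the standard `zlib` module, whose DEFLATE format looks back at most 32 KB (other compressors use other window sizes, so the exact figure is an assumption of this example, not a property of every product). The buffer names and sizes are made up for illustration.

```python
import os
import zlib

# Assumption: zlib/DEFLATE, which can only refer back 32 KiB.
block = os.urandom(16 * 1024)    # 16 KiB of incompressible data
filler = os.urandom(64 * 1024)   # pushes the repeat out of the window

near = block + block             # repeat is 16 KiB back: inside the window
far = block + filler + block     # repeat is 80 KiB back: outside the window

near_size = len(zlib.compress(near, 9))
far_size = len(zlib.compress(far, 9))

# Two identical "files" compressed independently: nothing is shared
# between them, so the duplicate costs its full size a second time.
per_file_total = 2 * len(zlib.compress(block, 9))

print(f"repeat inside window : {len(near)} -> {near_size} bytes")
print(f"repeat outside window: {len(far)} -> {far_size} bytes")
print(f"two identical files  : {2 * len(block)} -> {per_file_total} bytes")
```

The repeat inside the window is found and the buffer shrinks to roughly half; the repeat outside the window, and the duplicate in a separate file, are stored again in full.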
And now let’s look at Albireo embedded data deduplication. (See Jered Floyd’s detailed blog on the key attributes for a high performing data optimization solution here.) Here’s how Albireo stacks up:
Albireo supports block, file and unified storage architectures.
Albireo dedupes data across the entire storage pool, up to petabytes of data.
Albireo produces 3.75-100X data reduction for typical enterprise data types.
Albireo doesn’t alter the underlying data structures.
Albireo operates out of the data path with no impact to read and write performance.
Albireo operates as an inline, parallel or post-process operation and is never a failure point for the storage system.
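For readers who want a feel for what deduplicating across a whole pool means, here is a minimal sketch of generic hash-based, fixed-size-chunk deduplication. It is not Albireo's design or API: the `DedupePool` class, the 4 KB chunk size, the use of SHA-256 fingerprints and the file names are all assumptions chosen for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


class DedupePool:
    """Toy pool-wide dedupe index (hypothetical, not a product API)."""

    def __init__(self):
        self.chunks = {}        # fingerprint -> chunk bytes, stored once
        self.files = {}         # file name -> ordered list of fingerprints
        self.logical_bytes = 0  # bytes written by clients

    def write(self, name, data):
        # Simplification: no overwrite or delete handling.
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).digest()
            self.chunks.setdefault(fp, chunk)  # keep only the first copy
            recipe.append(fp)
        self.files[name] = recipe
        self.logical_bytes += len(data)

    def read(self, name):
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def physical_bytes(self):
        return sum(len(c) for c in self.chunks.values())

    def reduction_ratio(self):
        return self.logical_bytes / self.physical_bytes()


# Three made-up "VM images" that share almost all of their chunks.
base = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
pool = DedupePool()
pool.write("vm1.img", base)
pool.write("vm2.img", base)
pool.write("vm3.img", base + bytes([10]) * CHUNK_SIZE)

print(f"logical {pool.logical_bytes} bytes, physical {pool.physical_bytes()} bytes")
print(f"reduction ratio: {pool.reduction_ratio():.2f}X")
```

Because the fingerprint index spans every file in the pool, a chunk written by one file is never stored again for another, regardless of how far apart the copies are.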
So let’s summarize the comparison of Albireo embedded data dedupe and compression technologies in terms of data safety and business value.
| | Albireo Dedupe | Compression |
|---|---|---|
| Data Safety Impact | | |
| Alters Data | NO | YES |
| In Data Path | NO | YES |
| Requires De-/Re-Hydration | NO | YES |
| Business Value Impact | | |
| Optimizes Block | YES | NO |
| Optimizes File | YES | YES |
| Optimizes Unified | YES | YES |
| Reduction Range | 3.75-100X | 2X |
Albireo deduplication outperforms compression for data reduction by an order of magnitude and ensures enterprise-class data safety. Superior business value and data safety - that is Albireo data optimization for primary storage.
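To put the reduction figures from the table in capacity terms, a reduction ratio of N means the data occupies 1/N of its original size, so the space saved is 1 - 1/N. The short sketch below applies that arithmetic to the three ratios quoted in this post; the function name `space_saved` is made up for this example.

```python
def space_saved(reduction_ratio: float) -> float:
    """Fraction of capacity saved for a given reduction ratio (e.g. 2.0 for 2X)."""
    if reduction_ratio < 1.0:
        raise ValueError("a reduction ratio below 1X would mean the data grew")
    return 1.0 - 1.0 / reduction_ratio


# Ratios quoted in the comparison table above.
for label, ratio in [("compression", 2.0),
                     ("dedupe, low end", 3.75),
                     ("dedupe, high end", 100.0)]:
    print(f"{label}: {ratio}X -> {space_saved(ratio):.1%} of capacity saved")
```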
In my next post, I’ll discuss how combining Albireo embedded dedupe and traditional compression provides best of class data optimization.



June 21st, 2010 at 8:39 pm
[...] than leave a lengthy comment on Tom Cook’s blog post from Friday Compression and Dedupe: Business Value and Data Safety (and from a marketing perspective, Friday’s are bad days to post blogs – especially in the [...]
June 22nd, 2010 at 1:04 pm
Tom, I took a few minutes to reply to Steve K’s response to your post. My response can be read here: http://www.thestoragealchemist.com/marketing-fud-and-doing-what-you-do-best/
To be fair, I’m going to respond briefly to yours as well.
Intrafile (traditional) compression can be applied to files (and collections of files using MFC) much larger than 64KB (in fact, GBs and up). Where did you get the 64?
I’m not sure where you get the 2x figure either. Are you claiming 2X efficiency in the context of a data center or an individual file? I ask because we both know traditional lossless compression is capable of far better than 2X with many file types.
I have no idea what you mean by “Compression alters the underlying data structures and requires compression and decompression of data.” That makes no sense at all. Lossless compression and dedupe are virtually identical in their handling of commonality. Both use dictionaries and both require the reconstitution of files from those dictionaries. The major difference is that one is intrafile while the other is interfile.
Only compression on-the-fly is executed “in the data path” so I am not sure why you brought that up.
You also wrote that “Compression is a potential single point of failure for data retrieval.” In what sense (compared to dedupe)?
Mind you, I am a big fan of interfile compression and believe it should be embedded (in storage) and used along with intrafile compression.
June 22nd, 2010 at 5:10 pm
Good points, Joe, thanks. Please see Jered’s blog for the technical comments. We agree on the advantages of embedded compression and dedupe. I’ll take it a step further: as a read-path process, compression should be owned, managed and deployed by the storage vendor. Otherwise, data lock-in with a third-party technology is an issue. Albireo embedded deduplication is out of the read path, so there is never data lock-in for customers of our OEM partners.
P.S. – I presume this is very supportive of the future Storwize business model.
June 23rd, 2010 at 7:08 am
Tom,
Thanks for the comments on my blog. I also thought I would add a note here as well.
I was thinking about this last night. We spoke before about making sure we are always looking out for the customer. I have to agree with Joseph on his take about some of your technical points, and I was just wondering if you were going to add some color to your table or work with your technical marketing team to update it?
Thanks sir!
Steve