<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>Permabits and Petabytes &#187; Jered Floyd, CTO</title>
	<atom:link href="http://blog.permabit.com/index.php?feed=rss2&#038;cat=3" rel="self" type="application/rss+xml" />
	<link>http://blog.permabit.com</link>
	<description>OEM Data Optimization Solutions for Next Generation Storage</description>
	<pubDate>Fri, 03 Sep 2010 00:56:20 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Compression and Dedupe: Redux</title>
		<link>http://blog.permabit.com/index.php/2010/06/compression-and-dedupe-redux/</link>
		<comments>http://blog.permabit.com/index.php/2010/06/compression-and-dedupe-redux/#comments</comments>
		<pubDate>Tue, 22 Jun 2010 21:27:28 +0000</pubDate>
		<dc:creator>Jered Floyd</dc:creator>
		
		<category><![CDATA[Jered Floyd, CTO]]></category>

		<category><![CDATA[CTO]]></category>

		<category><![CDATA[dedupe]]></category>

		<category><![CDATA[deduplication]]></category>

		<category><![CDATA[primary storage]]></category>

		<guid isPermaLink="false">http://blog.permabit.com/?p=867</guid>
		<description><![CDATA[Yesterday The Storage Alchemist at Storwize posted a complaint about Tom&#8217;s discussion of compression and deduplication. We certainly aren&#8217;t savaging compression technologies &#8212; I think perhaps it&#8217;s clearer to consider our points not so much as a criticism of compression, but as a list of concerns regarding bump-in-the-wire optimization appliances. We absolutely agree with Steve [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday The Storage Alchemist at Storwize <a href="http://www.thestoragealchemist.com/marketing-fud-and-doing-what-you-do-best/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.thestoragealchemist.com');">posted a complaint</a> about Tom&#8217;s <a href="http://blog.permabit.com/index.php/2010/06/compression-and-dedupe-business-value-and-data-safety/" >discussion of compression and deduplication</a>. We certainly aren&#8217;t savaging compression technologies &#8212; I think perhaps it&#8217;s clearer to consider our points not so much as a criticism of compression, but as a list of concerns regarding bump-in-the-wire optimization appliances. We absolutely agree with Steve the Alchemist that data compression and data deduplication are two technologies that complement one another well &#8212; we use both in our <a href="http://www.permabit.com/products/data-center-series.asp" >Enterprise Archive Value NAS</a> and <a href="http://www.permabit.com/products/cloud-storage.asp" >Cloud Storage</a> offerings, and we make it possible for our partners to compress (if they so choose) when using our <a href="http://www.permabit.com/albireo/deployment.asp" >Albireo SDK</a>.</p>
<p>I&#8217;ll comment on his technical concerns. <span id="more-867"></span></p>
<p>Compression and deduplication are very similar in that they identify and eliminate redundant data, but the scope of this duplicate identification is vastly different. Traditional compression works on a small window of data and with short duplicate segments so that the compression tables fit efficiently in a very small amount of memory. Storwize may not be using a 64 KB window, but I imagine the order of magnitude is about right&#8230; and that&#8217;s not a criticism of their technology at all. In fact, the way Storwize manages data in chunks so that they can maintain performance is very clever.</p>
<p>Calling deduplication lossy is nonsense; both compression and dedupe replace redundant data with references to other instances of that data, just at different scales as I note above. Unlike Ocarina&#8217;s NFO, which frighteningly throws away actual content, both dedupe and traditional compression return the original bitstream.  Tom&#8217;s point was that Albireo embedded dedupe leverages existing file and block system concepts to make those references so no interaction with our software is required on read, while a compression appliance modifies the data format before it reaches the storage array, which creates data lock-in. Take away the appliance, and the storage is full of uninterpretable data. That&#8217;s a concern for storage vendors and users alike.</p>
<p>As to the chart, when you look at this as &#8216;embedded dedupe&#8217; vs. &#8216;appliance-mediated compression&#8217;, you can see why Tom says that appliance compression alters the data, and Albireo dedupe does not require &#8216;rehydration&#8217;.  As for &#8216;optimizes block&#8217;, I haven&#8217;t yet seen Storwize&#8217;s block optimization products, so I can&#8217;t comment, but I do wonder how they make the space saved by compression available to the user. We agree that savings are absolutely data dependent. In general, deduplication alone offers more savings than compression alone, and both together give the best results by far. Perhaps we can work together to ensure Albireo and Storwize yield optimal results?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.permabit.com/index.php/2010/06/compression-and-dedupe-redux/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Albireo - Storage Optimization Realized</title>
		<link>http://blog.permabit.com/index.php/2010/06/albireo-storage-optimization-realized/</link>
		<comments>http://blog.permabit.com/index.php/2010/06/albireo-storage-optimization-realized/#comments</comments>
		<pubDate>Wed, 09 Jun 2010 14:45:41 +0000</pubDate>
		<dc:creator>Jered Floyd</dc:creator>
		
		<category><![CDATA[Jered Floyd, CTO]]></category>

		<category><![CDATA[dedupe]]></category>

		<category><![CDATA[Dedupe2.0]]></category>

		<category><![CDATA[deduplication]]></category>

		<category><![CDATA[primary dedupe]]></category>

		<category><![CDATA[primary storage]]></category>

		<category><![CDATA[primary storage deduplication]]></category>

		<guid isPermaLink="false">http://blog.permabit.com/?p=824</guid>
		<description><![CDATA[In my last post, I gave the history of Albireo and I mentioned that we came to recognize seven key attributes that are absolute requirements for an integrated primary deduplication solution.
First, Albireo supports block, file and also new unified or converged storage platforms.  By addressing all types of primary storage, we avoid leaving huge amounts [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="../../../../../index.php/2010/06/a-star-is-born/">my last post</a>, I gave the history of Albireo and I mentioned that we came to recognize seven key attributes that are absolute requirements for an integrated primary deduplication solution.</p>
<p>First, Albireo supports <strong><a href="http://www.permabit.com/albireo/architecture.asp" >block, file and also new unified or converged storage platforms</a></strong>.  By addressing all types of primary storage, we avoid leaving huge amounts of users&#8217; data unoptimized.  Additionally, next generation storage platforms put block and file data on the same underlying storage, and Albireo makes it possible to identify and deduplicate data across both.<span id="more-824"></span></p>
<p>Next, we uniquely provide the ability <strong>to scale deduplication across a pool of storage many petabytes in size</strong>, instead of limiting deduplication to smaller islands of a few terabytes.  This is critical to delivering high rates of deduplication.</p>
<p>Further, Albireo delivers <strong>sub-file, content-aware deduplication</strong>.  Whole file single instancing just doesn&#8217;t cut it for common primary data files, like Office documents or virtual system images. Albireo can identify optimal boundaries in a variety of file types and then deduplicate segments as small as the storage can support.  This delivers industry-leading deduplication efficiency to our partners.</p>
<p>As I explained in my last post, Albireo is successful because it is an <strong>embedded, integrated solution</strong>.  Integrating directly with a primary storage vendor&#8217;s technology, it <strong>leverages all of that vendor&#8217;s existing R&amp;D</strong>.  We also provide the capability for Albireo to be <strong><a href="http://www.permabit.com/albireo/deployment.asp" >integrated as inline, post-process, or parallel deduplication</a></strong>, whichever matches the underlying storage platform the best.  This means that there are no rough edges where features of the underlying storage are lost; the deduplication is transparent and automatic.  Users may not even know that their storage is using Albireo for deduplication, except for the levels of savings and performance far beyond what anyone has seen before.</p>
<p>Finally, because Albireo is delivered as a software tool kit, it is <strong>integrated outside of the storage read path</strong>.  Our technology solves the hardest parts of deduplication, namely sub-file duplicate identification, and then leverages existing vendor file and block system metadata for eliminating duplicates.  Because of this, read operations only look at the file or block metadata without the need to consult our indexes, meaning we have no impact on performance, functionality, or data integrity.  Even if our software is turned off, all user data remains accessible.  This is completely unique in the industry.</p>
<p>I&#8217;m extremely excited to be able to now talk publicly about Albireo and the deduplication benefits it provides to existing primary storage technologies.  We&#8217;ve been focused on this for the past year, and the success of our partners&#8217; integration efforts has confirmed the ease of integration and technological power of the Albireo toolkit.  Through our partners, users will be using this soon, and I&#8217;m sure that they&#8217;ll be pleased with the results.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.permabit.com/index.php/2010/06/albireo-storage-optimization-realized/feed/</wfw:commentRss>
		</item>
		<item>
		<title>A Star is Born</title>
		<link>http://blog.permabit.com/index.php/2010/06/a-star-is-born/</link>
		<comments>http://blog.permabit.com/index.php/2010/06/a-star-is-born/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 14:43:32 +0000</pubDate>
		<dc:creator>Jered Floyd</dc:creator>
		
		<category><![CDATA[Jered Floyd, CTO]]></category>

		<category><![CDATA[dedupe]]></category>

		<category><![CDATA[Dedupe2.0]]></category>

		<category><![CDATA[deduplication]]></category>

		<category><![CDATA[primary deduplication]]></category>

		<category><![CDATA[primary storage]]></category>

		<category><![CDATA[primary storage deduplication]]></category>

		<guid isPermaLink="false">http://blog.permabit.com/?p=819</guid>
		<description><![CDATA[In his post, Tom wrote about the top three things we heard from customers about deduplication. Given the wildfire success of deduplication for backup storage, everyone now wants deduplication to optimize primary storage, but nobody is willing to sacrifice performance, functionality, or safety.  This is absolutely sensible - deduplication should be a valuable, cost-saving feature and [...]]]></description>
			<content:encoded><![CDATA[<p>In his <a href="../../../../../index.php/2010/06/left-lane-driving-and-primary-storage-optimization/">post</a>, Tom wrote about the top three things we heard from customers about deduplication. Given the wildfire success of deduplication for backup storage, everyone now wants deduplication to optimize primary storage, but nobody is willing to sacrifice performance, functionality, or safety.  This is absolutely sensible - deduplication should be a valuable, cost-saving feature and not a tradeoff against other core functionality. Nobody has been able to deliver this - until <a href="http://www.permabit.com/albireo/albireo-overview.asp" >Albireo</a>.<span id="more-819"></span></p>
<p>Permabit has been in the deduplication business since 2000, more than ten years, and we&#8217;ve learned a great deal about both technology and customer requirements. In fact, we&#8217;ve explored delivering deduplication to the OEM storage vendor market for some time, after including it for many years in our <a href="http://www.permabit.com/products/data-center-series.asp" >Enterprise Archive</a> product. If you&#8217;re not familiar with it, our Enterprise Archive is a complete stack solution for efficient value-tier storage, delivering our own file interfaces, file system, <a href="http://www.permabit.com/products/rain-ec.asp" >RAIN-EC</a> data protection, and hardware.  When talking with tier 1 storage vendors we were told many times, &#8220;we&#8217;ve invested millions in our file systems and data protection; yours is great, but we really just want deduplication.  Can you give us just that?&#8221;</p>
<p>For a long time I, along with the rest of the industry, thought the answer was &#8220;no&#8221;. When our engineers explored just providing dedupe we ended up with complex &#8220;bump-in-the-wire&#8221; appliances that sat in front of the storage and treated it almost like JBOD, masking performance and functionality, and jeopardizing integrity through data lock-in to the solution. We didn&#8217;t find this acceptable, and refused to try to sell it.  Others weren&#8217;t as resolute: they have tried to bring solutions like this to market and have found it challenging and less than rewarding.</p>
<p>Then, a bit over a year ago, I had an idea. What if we could provide a development kit that delivered the core technologies in deduplication and integrated into the vendor&#8217;s existing storage stack, rather than sitting outside it? We could avoid competing with our partners on functionality, and eliminate the concerns that Tom explained so clearly. That&#8217;s what <a href="http://www.permabit.com/albireo/albireo-overview.asp" >Albireo</a> is - the core technologies that make fast, scalable deduplication possible, packaged in a way that they can be integrated into any storage vendor&#8217;s stack in a matter of days to weeks. Our engineering team took this idea, extracted several years of deduplication effort from our core research, enhanced it further and packaged it as a complete SDK, all in under a year.</p>
<p>I named this project Albireo, after the most visible <a href="http://en.wikipedia.org/wiki/Albireo" onclick="javascript:pageTracker._trackPageview('/outbound/article/en.wikipedia.org');">binary star system</a>. I thought this captured the basic idea of the product: a technology that works alongside an existing storage stack to deliver a powerful deduplicating storage solution that looks and acts as a single product. And of course, it also makes multiple data instances appear as one.</p>
<p>In the process of developing Albireo we learned seven key requirements for an integrated deduplication solution. Be sure to read my next post to find out how these have all been critical to the success of Albireo with our partners.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.permabit.com/index.php/2010/06/a-star-is-born/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Primary Storage Deduplication is the Future</title>
		<link>http://blog.permabit.com/index.php/2009/10/primary-storage-deduplication-is-the-future/</link>
		<comments>http://blog.permabit.com/index.php/2009/10/primary-storage-deduplication-is-the-future/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 21:20:41 +0000</pubDate>
		<dc:creator>Jered Floyd</dc:creator>
		
		<category><![CDATA[Jered Floyd, CTO]]></category>

		<category><![CDATA[CTO]]></category>

		<category><![CDATA[dedupe]]></category>

		<category><![CDATA[deduplication]]></category>

		<category><![CDATA[primary storage]]></category>

		<guid isPermaLink="false">http://blog.permabit.com/?p=514</guid>
		<description><![CDATA[Until now, I&#8217;ve chosen to stay out of the little tempest in a teapot that&#8217;s going on over at Chuck Hollis&#8217; blog, but it doesn&#8217;t seem to be quieting down.  He basically says that Data dedupe has no place on primary storage, which flies in the face of where the dedupe market is going&#8230; [...]]]></description>
			<content:encoded><![CDATA[<p>Until now, I&#8217;ve chosen to stay out of the little <a href="http://chucksblog.emc.com/chucks_blog/2009/09/a-quick-note-on-primary-data-dedupe-and-io-density.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/chucksblog.emc.com');">tempest in a teapot</a> that&#8217;s going on over at Chuck Hollis&#8217; blog, but it doesn&#8217;t seem to be quieting down.  He basically says that <a title="Data Dedupe" href="http://www.permabit.com/products/sdr.asp" >Data dedupe</a> has no place on primary storage, which flies in the face of where the dedupe market is going&#8230; but it&#8217;s not a bad position to take when your company makes a lot of money off of very expensive primary storage. <span id="more-514"></span></p>
<p>Their biggest NAS competitor, NetApp, took the bait and jumped into the fray. In between the vitriol, some good points are made about why Chuck is wrong.  For example, if deduplication is increasing access to common blocks it means that you&#8217;ll be seeing much better cache efficiency, which will offset additional load on the drives.  The &#8220;boot storms&#8221; he talks about with many virtual machines hosted on the same storage are actually less likely to occur with deduplication than without!</p>
<p>Now <a href="http://blogs.hds.com/hu/2009/10/i-agree-with-chuck-on-data-dedupe.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/blogs.hds.com');">Hu Yoshida has weighed in on Chuck&#8217;s side</a>, but then laid out the much more reasonable view that virtualization, tiering and dynamic provisioning are critical in an environment with costly top-tier primary storage.  That&#8217;s correct, but it doesn&#8217;t mean that deduplication at that top tier isn&#8217;t a huge win as well.</p>
<p>Deduplication has seen its first successes in the D2D backup space, where it&#8217;s easy to get a lot of deduplication due to the data patterns and traditional backup schedule.  Applying deduplication beyond backup is hard, because the opportunities for deduplication are fewer and further between, and so these D2D backup devices have <a href="http://blog.permabit.com/?p=390" >never been able to address archive or primary storage effectively</a>. That doesn&#8217;t mean dedupe is bad for primary, it just means that it&#8217;s harder to do.</p>
<p>At Permabit, we consider dedupe for backup to be Dedupe 1.0, and the future for innovation is in Dedupe 2.0, which includes dedupe for primary and cloud storage. We host a forum over at <a href="http://www.dedupe2.com/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.dedupe2.com');">Dedupe 2.0</a> to discuss this further, and recently released our Permabit Cloud Storage product to address these new customer needs.  I can&#8217;t give too much detail, but we&#8217;re constantly at work making our deduplication technology available to ever broader markets.</p>
<p><a title="Dedupe" href="http://www.permabit.com/products/sdr.asp" >Dedupe</a> for primary is a huge win for the storage consumer, but it&#8217;s taken us nearly a decade of extensive technology and patent development to solve the scalability and speed challenges needed for that market.  I think it&#8217;s no coincidence that the two voices denouncing primary dedupe the most, HDS and EMC, have no products to offer that include a feature which will soon become a customer requirement.</p>
<p>If you&#8217;re going to be at <a href="http://snwusa.com/" onclick="javascript:pageTracker._trackPageview('/outbound/article/snwusa.com');">Storage Networking World</a> next week and would like to hear more on primary dedupe, <a href="http://tanejagroup.com/" onclick="javascript:pageTracker._trackPageview('/outbound/article/tanejagroup.com');">Arun Taneja</a> is moderating a panel, &#8220;Primary Storage: The New Frontier for <a title="Data Deduplication" href="http://www.permabit.com/products/sdr.asp" >Data Deduplication</a>&#8221;.  I&#8217;ll be there, along with Val Bercovici from NetApp, Carter George from Ocarina, and Peter Smails from Storwize.  It should be a lively discussion!  Perhaps Chuck will stop by?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.permabit.com/index.php/2009/10/primary-storage-deduplication-is-the-future/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Deduplication and Encryption</title>
		<link>http://blog.permabit.com/index.php/2009/08/deduplication-and-encryption/</link>
		<comments>http://blog.permabit.com/index.php/2009/08/deduplication-and-encryption/#comments</comments>
		<pubDate>Fri, 28 Aug 2009 17:03:28 +0000</pubDate>
		<dc:creator>Jered Floyd</dc:creator>
		
		<category><![CDATA[Jered Floyd, CTO]]></category>

		<category><![CDATA[archiving]]></category>

		<category><![CDATA[dedupe]]></category>

		<category><![CDATA[deduplication]]></category>

		<category><![CDATA[Encryption]]></category>

		<category><![CDATA[enterprise data archive]]></category>

		<category><![CDATA[enterprise storage]]></category>

		<guid isPermaLink="false">http://blog.permabit.com/?p=499</guid>
		<description><![CDATA[I spend a lot of my time talking with Permabit customers and more and more recently I have heard questions on proper use of encryption in their storage environments.  Lost customer data is a huge risk to businesses, and risk often directly translates to cost, either from legal penalties or in cleanup.  For [...]]]></description>
			<content:encoded><![CDATA[<p>I spend a lot of my time talking with Permabit customers and more and more recently I have heard questions on proper use of encryption in their storage environments.  Lost customer data is a huge risk to businesses, and risk often directly translates to cost, either from legal penalties or in cleanup.  For example, companies which handle credit card data have been scrambling to comply with the <a href="https://www.pcisecuritystandards.org/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.pcisecuritystandards.org');">PCI Data Security Standards</a>, and still in the news we hear about horrors like the <a href="http://news.yahoo.com/s/ap/20090817/ap_on_re_us/us_hacker_charges" onclick="javascript:pageTracker._trackPageview('/outbound/article/news.yahoo.com');">theft of 130 million credit card numbers</a>.  Encryption is all about obscuring data, but deduplication is about seeing through and eliminating duplicates within your data.  How can these coexist?<span id="more-499"></span></p>
<p>They can, but it depends a lot on when and how you encrypt your data.  The challenge is in balancing when during your information lifecycle your data is encrypted, and how it is handled.  If you encrypt data high up the stack, in the application, then it&#8217;s more likely to be end-to-end secure, but cannot be easily shared with other applications in your environment.  If you encrypt data lower down the stack, in the storage, it can be easily shared but you must be careful how it is protected in transit.  In many environments, different cryptographic implementations are appropriate for different kinds of data.</p>
<p>Broadly, there are two areas in which to consider encryption implementation &#8212; transport encryption and data encryption.  Transport encryption is where all communication between two applications or servers is encrypted, delivering a secure communications channel.  This means that both data and control commands are encrypted; this is important because it means that an eavesdropper cannot see your data and also cannot tell what sort of things you are doing with that data.  With transport encryption, the data on the other side of the connection is generally handled in unencrypted form, so this does not protect against data leaks or maliciousness within the application.  <a href="http://en.wikipedia.org/wiki/Https" onclick="javascript:pageTracker._trackPageview('/outbound/article/en.wikipedia.org');">HTTPS</a> is a common example of use of transport security.</p>
<p>Data encryption, on the other hand, is where the individual pieces of information being processed are encrypted.  These may be processed by the application in an encrypted form, stored into a database, or written to disk.  An untrusted application, like perhaps a storage system, can be handed data that has been encrypted without concern that the information will be leaked.  An encrypted archive file on disk would be an example here.  In some ways you could consider tape encryption as either or both types of security.  The tape is carrying data and control information between two applications (or two runtime instances of the same application), so it could be considered a form of &#8220;transport&#8221; security, but if you consider the third party handling your tapes as part of your storage infrastructure, it&#8217;s more like a form of data encryption.</p>
<p>When selecting a product that incorporates encryption one thing to consider is the encryption algorithm used; luckily, this is an easy choice.  Only use products that implement AES, the <a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard" onclick="javascript:pageTracker._trackPageview('/outbound/article/en.wikipedia.org');">Advanced Encryption Standard</a>.  Anything else is unlikely to be (or remain) secure. The only possible exception would be Triple DES, but it is a cipher that is showing its age.</p>
<p><strong>Recommendations for Deduplication</strong></p>
<p>If you would like to combine <a href="http://www.permabit.com/products/sdr.asp"  title="deduplication">deduplication</a> and encryption, at some level the storage stack must have access to the unencrypted data so that it can identify duplicates.  For a system with standard protocol interfaces, such as NFS and CIFS, this means not performing data encryption within your application on data that you want to deduplicate.  This doesn&#8217;t mean not to use encryption at all, however.</p>
<p>Data that must be kept very secure and is also unlikely to deduplicate, such as credit card numbers, can be safely encrypted by the application.  For protection of the remaining data, you must select a system that both delivers transport encryption between your application and the storage, and also includes data encryption internally to protect data on disk.  Surprisingly, there aren&#8217;t many options available today.</p>
<p>Several drive vendors have begun to incorporate full-disk encryption (FDE) into their disks, and storage array vendors are just beginning to make use of this.  This means that the data on such drives is protected against theft or loss of the drives, but the weakness is still that the device the drives are in must have the keys to unlock them.  That means if someone walks away with a server and its disks, all bets are off.  FDE drives are compatible with deduplication, though, because any deduplication activities are happening at a higher level in the storage system.</p>
<p><a href="http://www.permabit.com/products/data-center-series.asp" >Permabit Enterprise Archive</a> supports <a href="http://www.permabit.com/products/privacy-access.asp" >encryption</a> at a number of different layers.  This includes encryption to the client, encryption during replication, and encryption on disk.  Permabit always uses the AES cryptographic algorithm, as mentioned above.</p>
<p>First and foremost, transport encryption is used wherever possible. If the application protocol (e.g., NFS, CIFS) supports an encrypted connection, we will deliver that.  Unfortunately, this is not widely available today, with CIFS supporting secure authentication and some recent versions supporting secure transport.</p>
<p><a href="http://www.permabit.com/products/replication.asp" >Replication</a> is always performed over an encrypted channel as well, even if the data being transported is already encrypted.  This ensures that customer data is not replicated to an attacker outside your firewall who has surreptitiously tried to intercept your replication data stream.  Additionally, because the transport channel is encrypted, an eavesdropper cannot tell anything about the sort of data being replicated, such as file sizes.</p>
<p>Finally, Enterprise Archive can optionally encrypt data on disk so that it is protected against theft or loss of the hardware.  This option can be configured on a volume by volume basis. If someone were to walk away with one (or even all) of the hard drives in an Enterprise Archive install, they would not be able to make sense of any of the data.  This offers strong protection against data theft from the equipment.</p>
<p>Permabit&#8217;s on-disk encryption offers additional protection beyond what full-disk hardware encryption offers, because of how encryption keys are handled.  In a system with FDE disks, encryption keys must be stored on the server so as to unlock the disks when they are powered on.  While this means that a stolen disk is of no use, a stolen server necessarily contains the keys that will unlock its disks.  In Enterprise Archive, data encryption happens at the access node layer, significantly reducing the vulnerability profile of the keys.  Because application transports like NFS require cleartext data, the access nodes must have access to the data encryption keys.  For encrypted volumes, they encrypt and decrypt data as it flows from and to the storage nodes in the system.  This means that storage nodes never have the keys necessary to decrypt the data that they hold and protect, a significant separation of responsibility.</p>
<p>Overall, <a href="http://www.permabit.com/products/sdr.asp" >deduplication</a> and encryption are compatible, but to use them together you must take care where you apply the encryption.  For data to be deduplicated, encryption must take place within the storage system, and not at an application or gateway layer.  To ensure data security, make sure information is always encrypted in flight whenever possible, especially during replication or backup.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.permabit.com/index.php/2009/08/deduplication-and-encryption/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Next Steps for Dedupe: Dedupe 2.0 and the Cloud</title>
		<link>http://blog.permabit.com/index.php/2009/06/next-steps-for-dedupe-dedupe-20-and-the-cloud/</link>
		<comments>http://blog.permabit.com/index.php/2009/06/next-steps-for-dedupe-dedupe-20-and-the-cloud/#comments</comments>
		<pubDate>Thu, 25 Jun 2009 20:51:47 +0000</pubDate>
		<dc:creator>Jered Floyd</dc:creator>
		
		<category><![CDATA[Jered Floyd, CTO]]></category>

		<guid isPermaLink="false">http://blog.permabit.com/?p=460</guid>
		<description><![CDATA[A few weeks ago, Tom wrote about the end of the road for Dedupe 1.0.   There&#8217;s no question that Data Domain has won the Dedupe 1.0 game &#8212; that of deduplication for D2D backup &#8212; and while we&#8217;re still waiting to see which of EMC or NetApp gains the right to acquire them, [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago, Tom <a href="http://blog.permabit.com/?p=413" >wrote about the end of the road for Dedupe 1.0</a>.   There&#8217;s no question that Data Domain has won the Dedupe 1.0 game &#8212; that of <a title="deduplication" href="http://www.permabit.com/products/sdr.asp" >deduplication</a> for D2D backup &#8212; and while we&#8217;re still waiting to see which of EMC or NetApp gains the right to acquire them, we all know that both bidders have lost at the price they&#8217;re paying.</p>
<p>Disk-based backup is a huge market, don&#8217;t get me wrong, but it&#8217;s tiny compared with the amount of disk out there for archive and primary storage.  Not only that, but as I wrote about in <a href="http://blog.permabit.com/?p=77" >&#8220;Cutting Costs with Enterprise Archive&#8221;</a>, deduplication for disk-based backup is based on &#8220;doing it wrong&#8221; &#8212; intentionally writing the same data again and again even though there&#8217;s no good reason for having configured your backup software to do so.  <a title="dedupe" href="http://www.permabit.com/enewsletters/summer-09.asp" >Dedupe</a> for backup was always a marketing game, one that Data Domain&#8217;s team excelled at winning.</p>
<p>You might note that we&#8217;ve never pursued this crowded D2D market head-on; instead we&#8217;ve focussed on where more data lives and nobody else has done a good job at addressing the much harder technical problems: Dedupe 2.0, archive and primary storage.  If you&#8217;ve been to our website recently, you may have noticed that we&#8217;ve launched a new product in the <a title="Dedupe 2.0" href="http://www.permabit.com/dedupe2/5-reasons.asp" >Dedupe 2.0</a> space: <a href="http://permabit.com/products/cloud-storage.asp" >Permabit Cloud Storage</a>.<span id="more-460"></span></p>
<p>A few months ago, <a href="http://blog.permabit.com/?p=60" >I made some snide comments</a> about the &#8220;cloud&#8221; storage term.  I&#8217;m not back-pedaling here; I still think that &#8220;cloud&#8221; is really &#8220;Storage as a Service 2.0&#8221;, but I do believe infrastructure is now mature enough that renting a home for your bits is a viable thing to do.</p>
<p>The trap that I pointed out in my last message on cloud, and the reason I think EMC&#8217;s Atmos is misguided, is the problem of API proliferation.  There are now dozens of simple, RESTful APIs out there competing for application adoption, and each locks you into a particular set of service providers.  These standards will eventually coalesce and evolve, but in the meantime cloud providers are aggressively competing on the simplicity and power of their APIs&#8230; all while using pricey storage on the back end.  Permabit Cloud Storage delivers massively scalable storage for these service providers that provides all the rich functionality they need to offer new web APIs, while staying agnostic on these interfaces by internally delivering conventional NFS and CIFS.  Multiple service providers now use Permabit Cloud Storage to offer cloud services to their customers, freeing them from concerns over reliability, privacy and security.</p>
<p>Over at Storage Switzerland, <a href="http://web.me.com/georgeacrump/Site/Articles/Entries/2009/3/23_Can_Cloud_Storage_be_the_Solution_to_Data_Explosion.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/web.me.com');">Joseph Ortiz</a> lays out the case for why cloud archive storage makes sense for small-to-medium businesses that have rapidly growing storage requirements.  Permabit Cloud Storage provides infrastructure that allows service providers to deliver unbeatable pricing to these customers, as well as scalable systems that can be deployed as an internal corporate cloud.  This is just one part of Permabit&#8217;s Dedupe 2.0 strategy &#8212; deduplication beyond the realm of just backup.  There&#8217;s still more excitement to come.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.permabit.com/index.php/2009/06/next-steps-for-dedupe-dedupe-20-and-the-cloud/feed/</wfw:commentRss>
		</item>
		<item>
		<title>All Deduplication is Not the Same</title>
		<link>http://blog.permabit.com/index.php/2009/05/all-deduplication-is-not-the-same/</link>
		<comments>http://blog.permabit.com/index.php/2009/05/all-deduplication-is-not-the-same/#comments</comments>
		<pubDate>Thu, 07 May 2009 02:09:17 +0000</pubDate>
		<dc:creator>Jered Floyd</dc:creator>
		
		<category><![CDATA[Jered Floyd, CTO]]></category>

		<category><![CDATA[archiving]]></category>

		<category><![CDATA[CTO]]></category>

		<category><![CDATA[dedupe]]></category>

		<category><![CDATA[deduplication]]></category>

		<category><![CDATA[primary storage]]></category>

		<category><![CDATA[RAIN-EC]]></category>

		<category><![CDATA[Value Tier]]></category>

		<guid isPermaLink="false">http://blog.permabit.com/?p=390</guid>
		<description><![CDATA[In a short few years deduplication technologies have become commonplace, and in not too much longer will become a requirement for any new storage purchase.  When everyone offers deduplication, are they all the same?  Definitely not!  Most of the deduplication technologies out there only perform in very specific use cases like backup, [...]]]></description>
			<content:encoded><![CDATA[<p>In a short few years deduplication technologies have become commonplace, and in not too much longer will become a requirement for any new storage purchase.  When everyone offers deduplication, are they all the same?  Definitely not!  Most of the deduplication technologies out there only perform in very specific use cases like backup, and cannot handle general purpose data storage. <span id="more-390"></span></p>
<p>The biggest dichotomy in deduplication is dedupe for backup, be it VTL or disk-to-disk (D2D), and dedupe for general purpose primary or archive data storage.   These uses are very different.  In the case of dedupe for backup the system doesn&#8217;t have to work as hard to find commonalities in data so it can &#8220;cheat&#8221; a bit when looking for identical data.  This is largely because dedupe for backup depends on telling customers to keep on going with their same old inefficient backup schedules of regular full backups, even though those backups are now going to spinning disk.  It&#8217;s really easy to get <a href="http://blog.permabit.com/?p=39" >20x dedupe when you tell the customer to store the exact same data 20 times!</a></p>
<p>At the heart of any deduplication system is an index of data that has been seen before, catalogued by some form of &#8220;data fingerprint&#8221;.  (Permabit <a href="http://blog.permabit.com/?p=6" >uses SHA-256, the only fingerprint algorithm currently allowed for Federal data security</a>.)  As new data comes in the system must rapidly fingerprint the data and determine if it has been seen before.  Doing this quickly and efficiently is the core of any deduplication engine, and is an extremely hard problem to solve for hundreds of terabytes or more of data.</p>
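<p>As a rough illustration of the index described above, here is a minimal Python sketch.  It is not Permabit&#8217;s implementation: the <code>FingerprintIndex</code> class, the in-memory dictionary and the <code>chunked</code> helper are all assumptions made for the example, and an index covering hundreds of terabytes cannot simply live in one dictionary.</p>

```python
import hashlib


class FingerprintIndex:
    """Toy in-memory dedupe index keyed by SHA-256 fingerprint (illustrative only)."""

    def __init__(self):
        # fingerprint (32 raw bytes) -> the chunk stored under that fingerprint
        self._chunks = {}

    def write(self, chunk: bytes) -> bool:
        """Store a chunk; return True if an identical chunk was already stored."""
        fp = hashlib.sha256(chunk).digest()
        if fp in self._chunks:
            return True  # seen before: nothing new needs to be written
        self._chunks[fp] = chunk
        return False

    def unique_chunks(self) -> int:
        return len(self._chunks)


def chunked(data: bytes, size: int):
    """Split data into fixed-size pieces (hypothetical chunking policy)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]
```

<p>Writing the same chunk twice stores it once; the second <code>write</code> call returns <code>True</code>.  The hard part the post refers to is doing that lookup quickly when the set of fingerprints no longer fits in memory.</p>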
<p>How does backup dedupe cheat?  Dedupe systems for backup, like those from NEC, IBM, Sepaton and others, depend on something called <strong>temporal locality</strong> &#8212; a technical way of saying that because these are backup images, data that was written together before is likely to be written together again.  A full backup of a system is going to look a whole lot like the last full backup of that same system, or a full backup of a very similar system.</p>
<p>Back to the deduplication index: Because these backup systems know that they&#8217;re going to see the same data over and over they don&#8217;t try to keep an index of all the data stored.  Instead they break it up into smaller chunks based on time, covering only a handful of terabytes of data, or maybe a dozen backups.  When they see a new backup stream start they look and see if it matches any of those small indexes &#8212; if it does, then they read in that index and use only it for deduplication.  This catches a lot of duplicate data, but misses anything that was already seen in a different period.  In practice, it works well enough for backup&#8230; but it only works well enough for backup.</p>
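<p>The effect of a limited index can be sketched in a few lines of Python.  This is a simplification, not any vendor&#8217;s design: it models the &#8220;small index&#8221; as a window of the most recently seen fingerprints rather than as per-period indexes, and the function name, window size and sample data are all made up for the example.</p>

```python
import hashlib
from collections import OrderedDict


def count_duplicates(chunks, window=None):
    """Count chunks recognised as duplicates.

    window=None remembers every fingerprint ever seen; otherwise only the
    `window` most recently seen fingerprints are kept (a stand-in for a
    small, time-limited index).
    """
    seen = OrderedDict()
    hits = 0
    for chunk in chunks:
        fp = hashlib.sha256(chunk).digest()
        if fp in seen:
            hits += 1
            seen.move_to_end(fp)  # refresh recency
        else:
            seen[fp] = True
            if window is not None and len(seen) > window:
                seen.popitem(last=False)  # forget the oldest fingerprint
    return hits


# Hypothetical data: a 10-chunk "full backup" and 20 chunks of unrelated data.
full_backup = [bytes([i]) * 64 for i in range(10)]
other_data = [bytes([100 + i]) * 64 for i in range(20)]

back_to_back = full_backup + full_backup
separated = full_backup + other_data + full_backup
```

<p>With the repeats back to back, a window of 10 finds all 10 duplicates, the same as the unbounded index.  Once other data is written in between, the windowed index finds none of them while the unbounded index still finds all 10.</p>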
<h3>Dedupe for Archive and Primary</h3>
<p>For archive and primary data storage, on the other hand, the storage system needs to search hard to find duplicate data, looking for not just whole files that are the same but also smaller parts of files that are duplicated as different versions of that file.  This is a much harder problem because to get any real efficiency the system must compare against all other data in the system in detail, hundreds of terabytes or more of data.  Without inspecting all data in the system, significant deduplication will be unlikely.</p>
<p>As you might imagine, maintaining such a deduplication index of this scale is an extremely challenging prospect.  Most vendors that claim to serve archive and primary storage take the easy way out &#8212; they just don&#8217;t scale.  Data Domain, for example, tops out at around 30 terabytes of disk, and NetApp won&#8217;t let you dedupe outside of a single 16 TB WAFL volume.  You&#8217;re not likely to get much deduplication there!  Others, like NEC&#8217;s HydraStor, will apply the backup-specific approach I describe above and happily charge you primary storage costs for archive storage with minimal if any deduplication.  Bolt-on solutions like Ocarina are really only just recompressing your JPEG images more efficiently, and require custom reader software to go back and decompress them later.</p>
<p>Only <a href="http://permabit.com/products/sdr.asp" >Permabit&#8217;s Scalable Data Reduction (SDR)</a> delivers truly scalable <a title="Data Deduplication" href="http://www.permabit.com/products/sdr.asp" >data deduplication</a> for non-backup data.   Our advanced grid architecture allows us to protect data far better than RAID by use of our <a href="http://permabit.com/products/rain-ec.asp" >RAIN-EC data protection</a>, and also allows us to distribute the deduplication indexing problem.   This means that Permabit Enterprise Archive can efficiently index and dedupe across hundreds of terabytes of disk &#8212; all real-time.  In just thousandths of a second we identify if each incoming chunk has ever been seen before, something that no other vendor on the market today can claim within an order of magnitude.</p>
<p>Don&#8217;t get stuck with a deduplication solution that doesn&#8217;t perform.  Before making your next storage purchase decision, ask your vendor how broad the scope of their deduplication is.  Do they dedupe sub-file?  Do they dedupe against <em>all</em> previously written data?  What are the limits on the amount of <em>unique</em> data that they can store?  The answers may surprise you.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.permabit.com/index.php/2009/05/all-deduplication-is-not-the-same/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Protecting Against Data Rot</title>
		<link>http://blog.permabit.com/index.php/2009/04/protecting-against-data-rot/</link>
		<comments>http://blog.permabit.com/index.php/2009/04/protecting-against-data-rot/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 21:23:38 +0000</pubDate>
		<dc:creator>Jered Floyd</dc:creator>
		
		<category><![CDATA[Jered Floyd, CTO]]></category>

		<guid isPermaLink="false">http://blog.permabit.com/?p=316</guid>
		<description><![CDATA[Last week one of my favorite technology journalists, David Pogue at the New York Times, wrote about the problem of data rot.  (I wish my web videos were as funny as his.)  He spoke with Dag Spicer of the Computer History Museum about the challenges they face trying to restore data from old [...]]]></description>
			<content:encoded><![CDATA[<p>Last week one of my favorite technology journalists, David Pogue at the New York Times, wrote about <a href="http://pogue.blogs.nytimes.com/2009/03/26/should-you-worry-about-data-rot/" onclick="javascript:pageTracker._trackPageview('/outbound/article/pogue.blogs.nytimes.com');">the problem of data rot</a>.  (I wish my web videos were as funny as his.)  He spoke with Dag Spicer of the <a href="http://www.computerhistory.org/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.computerhistory.org');">Computer History Museum</a> about the challenges they face trying to restore data from old media. <span id="more-316"></span> </p>
<p>I found it interesting, but not surprising, that they have better luck reading very old media versus newer failing media.  While this might seem counterintuitive at first, I think it makes a lot of sense.   Old data storage mechanisms were very simple to understand once they were invented and built.  I remember but ten years ago when disk drives didn&#8217;t do any sort of automatic remapping of defective blocks &#8212; instead the drive came with a label identifying the blocks that tested bad, and you had to enter this into your computer so it knew to avoid them.  Today, that&#8217;s encoded on the drive and much more sophisticated software would be required to reconstruct data from just the raw media.</p>
<p>Similarly, the magnetic domains on today&#8217;s storage are hundreds of thousands of times smaller than those of 20 to 30 years ago. With today&#8217;s read technology we can recover media that has suffered significant damage, but there&#8217;s just so much less damage modern media can sustain and still be readable. </p>
<p>The Computer Museum&#8217;s problems are largely related to abandoned media that has been unmaintained and ignored, though, and Spicer doesn&#8217;t give hope to modern users who are still writing and preserving data.  For users today the situation is not nearly so dire!  Media migration and data preservation can be automated, as with the future-proof media migration built in to every <a href="http://www.permabit.com/products/enterprise-archive.asp" >Permabit Enterprise Archive</a> system. </p>
<p>I last wrote about this in the first article in my data preservation series, <a href="http://blog.permabit.com/?p=62" >No Silver Bullet: Archive Challenges</a>.  Organizations like the National Archives and Records Administration (NARA) recommend copying archive data to new, modern media every three to five years; this solves both the danger of media degradation through media refresh, and the danger of media obsolescence through technology refresh.  With large archive data sets this means that media refresh is occurring on a nearly continuous basis &#8212; there&#8217;s always some data on media reaching the end of its lifecycle.  </p>
<p>Addressing this manually is a time-consuming, error-prone and painful process, as discussed in Pogue&#8217;s interview.  That&#8217;s why we&#8217;ve automated it in our systems.  New storage nodes can be added at any time with the latest and greatest technology, and older nodes can be removed as they reach the end of their life.  All data movement is handled automatically, and data is never at risk during any of these internal migrations.  This ensures that, regardless of how many petabytes of data you have, you&#8217;ll never run into the bit rot problem with a Permabit Enterprise Archive.</p>
<div style="float: right"><a href="http://www.tinfoil.com/" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.tinfoil.com');"> <img src="http://www.tinfoil.com/co-b.jpg" alt="" /> </a></div>
<p>An entirely separate issue that Pogue doesn&#8217;t talk about in his interview with Spicer is the problem of logical readability.  Just because you can get the raw data back doesn&#8217;t mean you can make sense of it.  You may not have an application that can read it anymore!  This would be like playing one of <a href="http://www.tinfoil.com/cylinder.htm" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.tinfoil.com');">Edison&#8217;s robust wax cylinders</a> only to find that it contained a message in a dead language!  More on how to solve that in <a href="http://blog.permabit.com/?p=65" >No Silver Bullet: Logical Readability</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.permabit.com/index.php/2009/04/protecting-against-data-rot/feed/</wfw:commentRss>
		</item>
		<item>
		<title>The Patent Truth</title>
		<link>http://blog.permabit.com/index.php/2009/03/the-patent-truth/</link>
		<comments>http://blog.permabit.com/index.php/2009/03/the-patent-truth/#comments</comments>
		<pubDate>Tue, 17 Mar 2009 21:36:18 +0000</pubDate>
		<dc:creator>Jered Floyd</dc:creator>
		
		<category><![CDATA[Jered Floyd, CTO]]></category>

		<guid isPermaLink="false">http://blog.permabit.com/?p=304</guid>
		<description><![CDATA[Yesterday we announced details on five patents that we were recently awarded.  Being one of the very first companies to develop technology for data deduplication we have had an extensive portfolio of patent filings, however the US Patent Office has been so swamped with work that these are only now being issued, many of [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday we <a href="http://permabit.com/pressreleases/permabit-patent-releases.asp" >announced details on five patents that we were recently awarded</a>.  Being one of the very first companies to develop technology for data deduplication we have had an extensive portfolio of patent filings; however, the US Patent Office has been so swamped with work that these are only now being issued, many of them eight years after filing.  It&#8217;s been very exciting to see these finally pop out the other end of the patent system, and we&#8217;re looking forward to many more finally making it through in the coming year.</p>
<p>Patents are written in a strange dialect of English colloquially known as &#8220;patentese&#8221;, so it&#8217;s hard to casually tell what they&#8217;re about &#8212; this is especially true of patents in high-tech.  It&#8217;s easy to read a patent that&#8217;s for a better mousetrap (even if it&#8217;s titled &#8220;Mechanism for detecting presence, dispatching and retaining murine pests&#8221;), but what is &#8220;Storage system for randomly named blocks of data&#8221; or &#8220;History preservation in a computer storage system&#8221; about?  Allow me to provide a bit of a secret decoder ring. <span id="more-304"></span></p>
<h3>Records Retention</h3>
<p>We have a bunch of new patents, so I&#8217;ll focus on two of the most interesting ones.  The most recent is <a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&#038;r=1&#038;f=G&#038;l=50&#038;co1=AND&#038;d=PTXT&#038;s1=7,478,096.PN.&#038;OS=PN/7,478,096&#038;RS=PN/7,478,096" onclick="javascript:pageTracker._trackPageview('/outbound/article/patft.uspto.gov');">US Patent No. 7,478,096, &#8220;History preservation in a computer storage system&#8221;</a>. If you click through you&#8217;ll see a lot of confusing wording, but the place to begin is at the start.  There you&#8217;ll see &#8220;a method by which a disk-based distributed data storage system is organized for protecting historical records of stored data entities,&#8221; followed by a list of steps and components to make it clearer what we&#8217;re talking about.</p>
<p>From very early on, Permabit has included <a href="http://www.permabit.com/products/permabit-worm-technology.asp" >advanced features for records retention</a> in Permabit Enterprise Archive.  Things like WORM storage and retention policy management solve critical business problems such as <a href="http://www.permabit.com/solutions/compliance.asp" >complying with government regulations</a> including SEC rule 17a-4 and FDA 21 CFR Part 11, both of which require data integrity and enforced retention of records. Until recently, such records were retained electronically only on write-once optical disk, or write-once tape.  These solutions didn&#8217;t provide the accessibility required today to meet litigation discovery requests or to perform data mining operations.  </p>
<p>Permabit Enterprise Archive was one of the very first products to allow enforced WORM retention of records on magnetic disk, and we filed key patents on the enabling technologies.  This &#8216;096 patent protects our grid-based storage technologies that enforce records retention, be it through a file share or an object interface. Multiple versions of files can be stored over time, and even different versions can be protected and preserved.</p>
<h3>Scalable Deduplication</h3>
<p>The other patents I&#8217;ll discuss today protect some of the technologies we&#8217;ve developed that deliver <a href="http://www.permabit.com/products/sdr.asp" >Scalable Data Reduction</a>, our efficient, in-line deduplication mechanism.  The relevant patents here are <a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&#038;Sect2=HITOFF&#038;p=1&#038;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&#038;r=1&#038;f=G&#038;l=50&#038;co1=AND&#038;d=PTXT&#038;s1=7,457,800.PN.&#038;OS=PN/7,457,800&#038;RS=PN/7,457,800" onclick="javascript:pageTracker._trackPageview('/outbound/article/patft.uspto.gov');">US Patent No. 7,457,800</a>, and its partner &#8216;813. The description here is a little less transparent, &#8220;storage system for randomly named blocks of data.&#8221; What does this have to do with deduplication?</p>
<p>To understand, you need a very quick overview of how hash-based deduplication works.  When data is being written into Permabit Enterprise Archive, our SDR technologies break that data up into smaller chunks for deduplication purposes.  We then must very quickly identify if any of those chunks have been previously stored.  We do this by taking a content fingerprint, or cryptographic hash &#8212; today that is the <a href="http://en.wikipedia.org/wiki/SHA_hash_functions" onclick="javascript:pageTracker._trackPageview('/outbound/article/en.wikipedia.org');">SHA-256 hash function</a>, the integrity of which I <a href="http://blog.permabit.com/?p=6" >discussed in a previous blog entry</a>.  Then we look and see if we already have a chunk with the same fingerprint.</p>
<p>It turns out that it&#8217;s really hard to do this when you have lots of data, and that&#8217;s the fundamental limit to scaling for archive deduplication systems.  If you have 100 terabytes of information you might have 10 billion different chunks, each with a 32-byte long name.  That&#8217;s 320 gigabytes just of names!  You can&#8217;t use an old-fashioned file system to store those and check, as it will take many seconds just to walk through the directory structure&#8230; that&#8217;s why companies like NetApp <a href="http://blog.permabit.com/?p=14" >have to have background processes that just can&#8217;t keep up</a>.  Even a traditional database can&#8217;t do this quickly because all the names are evenly (randomly) distributed, so you can&#8217;t predictively cache part of the list.  Think of it like a dictionary &#8212; just because you looked up &#8220;<a href="http://en.wiktionary.org/wiki/aardvark" onclick="javascript:pageTracker._trackPageview('/outbound/article/en.wiktionary.org');">aardvark</a>&#8221; doesn&#8217;t make it more likely you&#8217;ll look up another word beginning with &#8220;A&#8221; next.</p>
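<p>The arithmetic behind those figures is easy to check.  The short Python calculation below uses only the numbers quoted above; the one assumption is decimal units (1 TB = 10<sup>12</sup> bytes), and the variable names are mine.</p>

```python
TERABYTE = 10**12   # decimal units assumed
GIGABYTE = 10**9

stored_bytes = 100 * TERABYTE
chunk_count = 10 * 10**9        # "10 billion different chunks"
fingerprint_bytes = 256 // 8    # a SHA-256 digest is 32 bytes

average_chunk_bytes = stored_bytes // chunk_count
index_bytes = chunk_count * fingerprint_bytes

print(f"implied average chunk size: {average_chunk_bytes} bytes")
print(f"fingerprints alone: {index_bytes / GIGABYTE:.0f} GB")
```

<p>This gives an implied average chunk of 10,000 bytes and 320 GB of fingerprints, before counting any of the location data an index also has to hold.</p>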
<p>To solve this problem companies like Data Domain <a href="http://www.usenix.org/events/fast08/tech/full_papers/zhu/zhu_html/index.html" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.usenix.org');">have used computer science data structures called &#8220;Bloom filters&#8221;</a>.  These make it possible to scale to tens of terabytes, but break down as you go higher, so we developed our own technologies to scale out further, the &#8220;delta indexes&#8221; described in our patent.</p>
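<p>For readers who have not met a Bloom filter, here is a minimal textbook version in Python.  It illustrates the general data structure only; it is neither Data Domain&#8217;s implementation nor the delta indexes from our patent, and the sizing and the double-hashing scheme built on SHA-256 are choices made for this example.</p>

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive k bit positions from one digest via double hashing.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely never added"; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

<p>A &#8220;no&#8221; answer lets the system skip an expensive index lookup for data it has certainly never seen.  The filter has to grow with the number of stored fingerprints to keep the false-positive rate down, which is one reason the approach gets harder as capacity rises.</p>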
<p>This is one way in which we can scale dedupe beyond what other vendors can even dream of today.  The other component is our decentralized grid architecture.  I&#8217;ll explain how that helps SDR in a future post.</p>
<h3>The Patent Pipeline</h3>
<p>As I said up top, many of these patents were filed in the very early days of the company and are just finally making their way through the patent office now.  Well fewer than half of our filings have completed the process to date, so I expect to have more good news to share in the future!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.permabit.com/index.php/2009/03/the-patent-truth/feed/</wfw:commentRss>
		</item>
		<item>
		<title>End of the RAID</title>
		<link>http://blog.permabit.com/index.php/2009/02/end-of-the-raid/</link>
		<comments>http://blog.permabit.com/index.php/2009/02/end-of-the-raid/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 15:20:00 +0000</pubDate>
		<dc:creator>Jered Floyd</dc:creator>
		
		<category><![CDATA[Jered Floyd, CTO]]></category>

		<guid isPermaLink="false">http://76.12.37.208:8079/?p=142</guid>
		<description><![CDATA[To kick off the new year, we recently released our Storage Predictions for 2009.  We&#8217;ve received a lot of interest in this list since we released it, and I personally have been asked about prediction number 3, &#8220;RAID will Hit a Data Dead End&#8221;.  Allow me to explain. 
For this prediction, we say:
RAID Nears [...]]]></description>
			<content:encoded><![CDATA[<p>To kick off the new year, we recently released our <a href="http://www.permabit.com/pressreleases/2009-data-storage-predictions.asp" >Storage Predictions for 2009</a>.  We&#8217;ve received a lot of interest in this list since we released it, and I personally have been asked about <a href="http://www.permabit.com/media/predictions-results.asp#result3" >prediction number 3</a>, &#8220;RAID will Hit a Data Dead End&#8221;.  Allow me to explain. <span id="more-142"></span></p>
<p>For this prediction, we say:</p>
<blockquote><p><strong>RAID Nears Retirement.</strong> As multi-tiered storage continues to evolve, SANs will become more complex, unified networks will emerge, and as newer and larger drive technologies such as 1 TB drives take root, RAID as a data protection technology will become irrelevant.  Advanced data protection schemes based on Erasure Coding technology for long term reliable data storage will take hold putting additional pressure on legacy solutions depending on RAID.</p></blockquote>
<p>RAID is a technology that has served us well, but there are two ways in which it fails to scale going forward.  Most importantly, RAID technologies today have serious problems with large capacity drives, like the 1 and 1.5 TB drives shipping now.  These problems will only become more pronounced with the 2 TB drives soon to be available.</p>
<p>First, RAID has an issue with the bit error rates on high-capacity drives, a problem I discuss in detail in our video <a href="http://www.permabit.com/videos/raid45/permabit-raid45.asp" >&#8220;The Trouble with RAID&#8221;</a>.  The bit error rate is the rate at which a drive will fail to read a block.  These failures are not due to  complete spindle failures, but due to the statistical encodings used to store bits into magnetic domains on the drive.  Drives don&#8217;t incur the penalty of read-after-write to verify the data written, so sometimes they manage to write data that cannot be read later despite the sophisticated error correcting codes used to protect the data on disk.</p>
<p>As I <a href="http://blog.permabit.com/?p=10" >explain in an earlier post</a>, the bit error rate of the drives can be catastrophic for RAID.  In a RAID 4 or 5 rebuild it is necessary to read every bit off all the remaining disks.  There&#8217;s a high probability this may not be possible with high-capacity drives.  In RAID 6, the same problem occurs in the event of a double failure.</p>
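<p>To put a rough number on that risk, the Python sketch below estimates the chance of hitting at least one unreadable bit while reading every surviving drive in full.  The inputs are assumptions, not figures from this post: a bit error rate of 1 in 10<sup>14</sup> is used as a typical spec-sheet value for high-capacity drives, errors are treated as independent, and the array is taken to be eight 1 TB drives with one failed.</p>

```python
import math


def rebuild_read_failure_probability(surviving_drives, drive_bytes, bit_error_rate):
    """P(at least one unreadable bit) when every surviving drive is read in full.

    Treats bit errors as independent, which is a simplifying assumption.
    """
    bits_to_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, computed in a numerically stable way for tiny p
    return -math.expm1(bits_to_read * math.log1p(-bit_error_rate))


# Assumed example: 8-drive RAID 5 of 1 TB drives, one drive lost.
p_hdd = rebuild_read_failure_probability(7, 10**12, 1e-14)
# Same array with the 1-in-10^17 rate quoted for SSDs later in this post.
p_ssd = rebuild_read_failure_probability(7, 10**12, 1e-17)

print(f"assumed 1e-14 drives: {p_hdd:.1%}")
print(f"1e-17 drives:         {p_ssd:.3%}")
```

<p>Under these assumptions the rebuild has roughly a 43% chance of encountering an unreadable bit, against well under a tenth of a percent at 1 in 10<sup>17</sup>.  Real drives will differ, but the probability always climbs with the number of bits that must be read.</p>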
<p>This very problem was raised at the recent Gartner Data Center Conference, in <a href="http://agendabuilder.gartner.com/lsc27/WebPages/SessionDetail.aspx?EventSessionId=804" onclick="javascript:pageTracker._trackPageview('/outbound/article/agendabuilder.gartner.com');">&#8220;The Enterprise Storage Scenario&#8221; by Roger Cox and Dave Russell</a>.  RAID is not a technology that is going to survive with higher and higher capacity drives, and enterprises must look to technologies like advanced erasure coding to meet data protection requirements.</p>
<p><a href="http://www.permabit.com/products/data-center-series.asp" >Permabit Enterprise Archive</a> protects against this pitfall within our <a href="http://www.permabit.com/products/rain-ec.asp" >RAIN-EC storage architecture</a>.  By recording additional recovery information we can rebuild from up to 8K of unreadable data without having to fail a drive or recover from another location.  Even in the event of multiple failures you&#8217;re still protected against the bit error rate, something that RAID can&#8217;t do.</p>
<p><img class="alignright size-full wp-image-147" title="car-falling-sml" src="http://permabit.wordpress.com/files/2009/02/car-falling-sml.jpg" alt="car-falling-sml" width="400" height="266" /></p>
<p>The second problem RAID faces is increased rebuild times.  While drive capacities continue to grow exponentially, drive read performance does not.  The read rate is dependent upon drive spindle speed and linear bit density.  Large capacity drive spindles aren&#8217;t spinning any faster, with all of them in the 5400 or 7200 RPM class.  Bit density is going up, but read rates only improve with the square root of the rate at which capacity increases, because capacity gains from increases in two dimensions (around the disk and across it) while read rate gains from only one.</p>
<p>This means that RAID rebuilds take unacceptably long times on high capacity drives.  Consider a rebuild at 25 MB/s for a set of 2 TB drives &#8212; this will take more than 22 hours!  Can your data be without protection for nearly a full day?</p>
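<p>The 22-hour figure follows directly from the numbers above.  A quick Python check, assuming decimal units (2 TB = 2 &#215; 10<sup>12</sup> bytes, 25 MB/s = 25 &#215; 10<sup>6</sup> bytes per second) and a hypothetical helper name:</p>

```python
def rebuild_hours(drive_bytes, rebuild_bytes_per_second):
    """Hours needed to rewrite one full drive at a constant rebuild rate."""
    return drive_bytes / rebuild_bytes_per_second / 3600


hours = rebuild_hours(2 * 10**12, 25 * 10**6)
print(f"{hours:.1f} hours")
```

<p>That works out to 80,000 seconds, or about 22.2 hours, and the time scales linearly with drive capacity if the rebuild rate stays fixed.</p>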
<p>Permabit Enterprise Archive&#8217;s RAIN-EC architecture helps here as well.  While in a RAID system a group of drives constitute a set, RAIN-EC distributes data in a more sophisticated manner.  The recovery information for data on one drive is spread evenly across all the other drives in the system.  This means that in the event of a drive failure all the other drives participate in the reconstruction process, and each drive is only responsible for a small portion of the recovery.  Thus, the rebuild rate goes up with each additional drive in a RAIN-EC system.  With RAID, adding more drives always makes the rebuild rate go down (or stay the same).</p>
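<p>A simple model shows why spreading recovery work matters.  The Python sketch below is an idealisation and not a description of RAIN-EC internals: it assumes every surviving drive contributes the same fixed reconstruction bandwidth and that nothing else (network, CPU) becomes the bottleneck.  The function name and the 25 MB/s per-drive figure are placeholders.</p>

```python
def distributed_rebuild_hours(drive_bytes, drives_in_system, per_drive_rate):
    """Idealised rebuild time when all surviving drives share the work equally."""
    surviving = drives_in_system - 1
    if surviving < 1:
        raise ValueError("need at least one surviving drive")
    aggregate_rate = surviving * per_drive_rate  # bytes per second in total
    return drive_bytes / aggregate_rate / 3600


for drives in (2, 8, 32, 128):
    hours = distributed_rebuild_hours(2 * 10**12, drives, 25 * 10**6)
    print(f"{drives:4d} drives: {hours:6.2f} hours")
```

<p>In this model the rebuild time for a lost 2 TB drive falls in proportion to the number of surviving drives, which is the behaviour described above; a fixed-size RAID set gets no such benefit from drives outside the set.</p>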
<p>That&#8217;s the pressure from the high capacity side, but RAID arrays, at least for disk, have serious pressure from the other side too &#8212; Solid State Drives (SSD).  SSDs massively outperform low capacity 10K and 15K RPM drives, and within 18 months they&#8217;ll be at an equivalent price.  Additionally, STEC tells me that <a href="http://www.stec-inc.com/product/zeusiops.php" onclick="javascript:pageTracker._trackPageview('/outbound/article/www.stec-inc.com');">their Zeus SSDs</a> have bit error rates as low as 1 in 10^17, which protects in rebuilds significantly better than the equivalent 15K RPM drives.</p>
<p>Given reliability concerns when using high capacity disk drives and the end of road in view for 15K RPM performance-oriented disk,  RAID arrays are being squeezed from both sides.  High performance systems will continue to use similar technology on SSD, but archive systems require more advanced technologies for the future, and the future, as always, is sooner than you think.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.permabit.com/index.php/2009/02/end-of-the-raid/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
