An Active Archive Strategy Can Help Solve More than Just Archiving

June 17th, 2020 by Floyd Christofferson, CEO of Strongbox Data Solutions

Some storage vendors like to present their customers with a binary choice when it comes to managing their data infrastructure, particularly for environments at the hundreds-of-terabytes or multi-petabyte scale: 1) customers can use what such vendors disparage as “very complicated, labor-intensive” multi-tiered, multi-cloud, multi-vendor storage, or 2) they can consolidate everything into what is often claimed to be an “easier” single-vendor replacement solution, which may still include multiple platforms from that one vendor.

There are pros and cons to both approaches. The first choice focuses on the “risky” unknowns of managing inevitable data movement across different storage silos or multiple vendor platforms. The second, vendor-centric approach claims to make this easier by providing an end-to-end solution from a single technology stack. The often unspoken reality is that this approach may also bring a higher cost, the risk of vendor lock-in, and limits on which technologies are available. Both approaches are storage-centric, with data management requirements either handled entirely independently by other stakeholders or not considered at all.

An underlying premise when designing storage infrastructure is often that the data need never be moved off the system. That is a misleading assumption. The use of data has become globalized: a distributed workforce, partners, and collaborators are frequently nowhere near the stored data, and that distance creates unacceptable response times due to higher latency. When data needs to be accessed, used, modified, or archived, files inevitably have to move. And an increasingly large share of data has a long shelf life and will typically outlast whatever storage platform houses it today, which drives even more data movement.

No one enjoys moving hundreds of terabytes or petabytes between storage systems. It is incredibly time-consuming, fraught with potential errors, and places a heavy burden on storage administrators and users alike, especially when the migration must bridge different storage types or vendors. Worse, traditional migrations typically result in downtime, which is unacceptable in a 24×7 world.

There’s a reason most storage vendors charge roughly 30% of the cost of a new storage system in professional services just to move the data from old storage to new during a tech refresh. The process adds operational and other costs, and the interruptions to data access for users and applications often have a direct business impact. This is why storage vendors paint data migration as a complicated, risky, forklift upgrade, and position staying within their ecosystem, and buying more of the same, as the safe alternative.

The problem is, it is distorted and misleading to assert that data management and data placement strategies should primarily be a function of the storage systems. This framing obviously benefits the storage vendor, because it locks customers into that vendor’s ecosystem. But it is upside down, placing the value on the storage platform rather than on the data. And it incorrectly assumes the vendor’s preferred storage system is adequate for every use case, from extremely high performance to deep archiving, over time.

Storage is always a mix of performance, capacity, scalability, functionality, access protocols, and total cost. Use cases generally fall into three classifications: high performance at high cost, low performance at low cost, and mid-level performance at acceptable cost. Most storage systems fall into the mid-level class, balancing performance against capacity, scalability, functionality, and cost rather than leaning heavily in one direction. This is where the vast majority of storage is sold and used, including NAS, object, tape, and cloud storage: the bulge in the bell curve.

So, what does this have to do with devising an active archiving strategy as part of your primary data environment?

A key principle of devising an active archive strategy is the acknowledgement that there is no one-size-fits-all storage platform, and that file movement is inevitable between different storage types over the life cycle of the data. In addition, such data movement may be driven by many different concerns, including storage cost, retention policies, use case, data protection rules, etc. Compare this with the fundamental concept of an active archive strategy, which is that all your data should be available all the time, across any storage type based upon whatever criteria you decide. If you think about it, those are not conflicting principles, but in fact are two sides of the very same coin.

When you combine those two concepts (flexibility of data movement and persistent access to all data all the time), approaching your storage strategy with the mindset of creating an active archive means you have effectively adopted a data-centric approach: one that puts top priority on automating data placement throughout the lifecycle of the digital assets, rather than focusing only on where the data is stored today. By automating a strategy that lets data move seamlessly from one type of storage to another for active archiving, you also ensure that migrations across storage silos for other reasons, such as tech refresh or basic tiering, are simplified, so you never again fall into the black hole of disruptive and costly migration efforts.
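To make the idea of automated, policy-driven data placement concrete, here is a minimal sketch of an age-based tiering rule in Python. Everything in it is illustrative: the `migrate_to_archive` name, the 180-day threshold, and the directory-based “tiers” are assumptions for the example, not any vendor’s product or API, and a real active archive solution would track metadata and preserve user-visible paths rather than simply moving files.

```python
import shutil
import time
from pathlib import Path

ARCHIVE_AGE_DAYS = 180  # hypothetical policy threshold, tune per retention rules


def find_archive_candidates(primary: Path, max_age_days: int = ARCHIVE_AGE_DAYS):
    """Yield files whose last access time is older than the policy threshold."""
    cutoff = time.time() - max_age_days * 86400
    for path in primary.rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            yield path


def migrate_to_archive(primary: Path, archive: Path,
                       max_age_days: int = ARCHIVE_AGE_DAYS):
    """Move stale files to the archive tier, preserving relative paths."""
    moved = []
    # Materialize the candidate list first so we don't mutate the tree
    # while still scanning it.
    for src in list(find_archive_candidates(primary, max_age_days)):
        dest = archive / src.relative_to(primary)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

The same skeleton extends naturally to other criteria mentioned above (storage cost, retention policy, data protection rules) by swapping the predicate in `find_archive_candidates` for whatever policy your organization decides on.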

The end result is that when active archiving strategies are employed holistically in this way, they become a cornerstone of good data governance practices that benefit the entire organization well beyond the active archive use case itself. This is fundamentally a data-centric approach that helps ensure you are making the most of your storage purchases while giving users seamless access across every storage type, including whichever platform you choose for the archive portion of the environment. In the bargain, you will have improved the user experience by making all data accessible at all times, and significantly reduced the complexity IT infrastructure managers face as they manage storage systems over time.





