Is Using Backup Software for Archiving a Good Idea?
A common problem in IT environments today is that backup and archive are often conflated, so too many organizations try to use backup software to handle archive requirements. Over the last decade, dedicated archiving products have struggled to gain widespread adoption. Archive is often perceived as difficult and disruptive: important for the long term, but something that can be deferred. Backup, by contrast, is seen as an immediate need because it protects today’s work. To illustrate the situation, let’s compare analog or ‘paper’ archiving with digital archiving.
Companies have long-established processes for managing their paper archives. Transforming this data to a digital form would take time, add costs that organizations are typically not ready to invest in, and often require a complete change to their current workflows. The result is that backup is often used as a short-term solution to a long-term problem, even though backup solutions are poor substitutes for an effective digital archive strategy.
Today, we are facing a consumerization effect: enterprises, product administrators, and end users want easier-to-use archiving systems that are self-service or require minimal or no IT intervention. Consequently, “old school” archiving products are no longer cost-effective enough, and are too complex to implement and maintain, particularly in heterogeneous storage environments. In addition, digital archives require an understanding of multiple business and user data flows. Traditional systems, which are often monolithic and inflexible, require true policy-based enforcement at the enterprise management layer. As a result, such systems were costly to implement and disruptive to users, which made them difficult for top management to justify. Enterprises therefore have to manage archive differently.
Historically, enterprises used their backup software to create an archive job that moved data from expensive primary storage to lower-cost media. However, they now understand the drawbacks of this approach. First, end users who wish to retrieve data from the archive have to ask their IT service to move the required information back, and the Service Level Agreement (SLA) for such a job is not aligned with users’ needs: it can take days to get any information back. Second, administrators are tied to one application or product even if they want to migrate to another; to do so, they have to migrate all of the “backup” data, which costs a lot of time and causes headaches. Lastly, compliance requires verifying which data has actually been archived, and this is hard to do. For instance, on which media does the data reside, what state are the data and media in, can we retrieve the data, and how can we make sure that it will still be available for the required duration?
These are complex questions and situations that more and more customers are facing. So, how can we reduce these risks and resolve these issues today? In our experience, the following opportunities stand out:
Reducing Vendor Lock-in
Avoid software that ties you to a single vendor. A storage system that can manage data copies on LTFS tape is a good choice. LTFS (Linear Tape File System) writes data to tape as a file system in an open format, so anybody with an LTO drive (LTO-5 or higher) can read any LTFS tape.
Online Data and Metadata Catalogs
Systems that provide online, searchable catalogs of the stored and archived contents give users the visibility they need. With this view, they can access their data as soon as they need it. Another advantage relates to compliance: the catalog makes it possible to verify that the data has actually been copied to dedicated archive tapes.
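To make the idea of a searchable catalog concrete, here is a minimal sketch in Python. It is not taken from any particular product: the class names, fields, file paths, and tape labels are all hypothetical and chosen only for illustration. It shows the two uses described above, namely finding archived files by name and checking that a file has been copied to tape.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    """One archived copy of a file (all fields are illustrative assumptions)."""
    path: str           # original location of the file
    tape_id: str        # label of the tape that holds this copy
    archived_on: date   # when the copy was written


class ArchiveCatalog:
    """In-memory stand-in for an online catalog of archived content."""

    def __init__(self) -> None:
        self._copies: dict[str, list[CatalogEntry]] = {}

    def add(self, entry: CatalogEntry) -> None:
        self._copies.setdefault(entry.path, []).append(entry)

    def search(self, text: str) -> list[str]:
        """Return the archived paths whose name contains `text` (case-insensitive)."""
        needle = text.lower()
        return sorted(p for p in self._copies if needle in p.lower())

    def is_archived(self, path: str, min_copies: int = 1) -> bool:
        """Compliance-style check: does the file exist on at least `min_copies` distinct tapes?"""
        tapes = {entry.tape_id for entry in self._copies.get(path, [])}
        return len(tapes) >= min_copies


# Hypothetical sample data
catalog = ArchiveCatalog()
catalog.add(CatalogEntry("/finance/2019/invoices.zip", "TAPE001", date(2020, 1, 15)))
catalog.add(CatalogEntry("/finance/2019/invoices.zip", "TAPE002", date(2020, 1, 15)))
catalog.add(CatalogEntry("/projects/alpha/design.pdf", "TAPE001", date(2020, 2, 3)))

found = catalog.search("invoices")
two_copies = catalog.is_archived("/finance/2019/invoices.zip", min_copies=2)
missing = catalog.is_archived("/projects/beta/plan.docx")
print(found, two_copies, missing)
```

A real catalog would be backed by a database and would also record checksums, media state, and retention periods, which are the kinds of questions raised earlier about compliance.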
Hybrid Storage and Active Archival System
The ability to virtualize a file system built on hybrid storage, combining flash, disk, and LTFS tape, provides an attractive option. Typically, 80–90% of your data is rarely accessed, whereas 10–20% is regularly used. With a hybrid solution, each class of data is placed on the appropriate tier, which reduces costs while keeping the data easily accessible at any time. Hybrid storage helps to reduce the total cost of ownership (TCO) of the solution. Solutions exist to store data at a scale of hundreds of terabytes for a capital expenditure (CAPEX) of $400-$600 per TB and an operating expenditure (OPEX) of $0.75 per TB per month. This leads to significantly lower investment and operating costs. Now imagine the same calculation using only disk storage.
This becomes even more compelling with automation that uses metadata-driven control to link this type of storage and archiving solution with business applications.
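The cost figures quoted above for hybrid storage can be turned into a quick calculation. The sketch below uses only the article's CAPEX and OPEX numbers; the 500 TB capacity and the five-year horizon are assumptions picked for illustration, and the `CostModel` class is a hypothetical helper, not part of any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    capex_per_tb: float        # one-time cost, USD per TB
    opex_per_tb_month: float   # recurring cost, USD per TB per month

    def total_cost(self, capacity_tb: float, months: int) -> float:
        """One-time CAPEX plus OPEX accumulated over the given number of months."""
        capex = self.capex_per_tb * capacity_tb
        opex = self.opex_per_tb_month * capacity_tb * months
        return capex + opex


CAPACITY_TB = 500   # assumption: "hundreds of terabytes"
MONTHS = 60         # assumption: a five-year horizon

# Low and high ends of the $400-$600 per TB CAPEX range, with $0.75 per TB per month OPEX
hybrid_low = CostModel(capex_per_tb=400.0, opex_per_tb_month=0.75)
hybrid_high = CostModel(capex_per_tb=600.0, opex_per_tb_month=0.75)

low = hybrid_low.total_cost(CAPACITY_TB, MONTHS)
high = hybrid_high.total_cost(CAPACITY_TB, MONTHS)
print(f"Hybrid, {CAPACITY_TB} TB over {MONTHS} months: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions the total comes to between $222,500 and $322,500. To run the disk-only comparison, create another `CostModel` with your own disk CAPEX and OPEX figures; none are given here, so they are left for the reader to supply.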
It is important to understand the opportunities of a hybrid storage and active archive system. Companies should not be afraid of making changes to their archival system, as this approach would give them the chance to remove complexity and provide them with significant added value while successfully dealing with unprecedented data growth.