Why Are Two Thirds of Organizations Failing to Back Up and Archive Correctly?

September 14th, 2020 by Rich Gadomski, Tape Evangelist, Fujifilm Recording Media, U.S.A., Inc.

You would think, by now, that backup best practices would be in the same category as filling up the tank before a long drive or looking both ways before crossing the street. But a new study indicates that most organizations continue to get it fundamentally wrong. How? By continuing to back up long-inactive data that should have been archived and taken out of the backup schedule.

The 2020 Active Archive Alliance survey found that 66% of respondents were still using backup systems to store archive data. What’s wrong with that?

  • It greatly lengthens backup windows: repeatedly backing up unchanging archive data wastes storage resources and adds time to the backup process.
  • As data sets grow, failing to distinguish between backup and archiving becomes increasingly expensive in terms of disk space.
  • Even organizations offloading backups to cheap cloud resources still run up a large bill over time by needlessly backing up cold data.
  • Archiving, on the other hand, frees up expensive capacity by moving less frequently used data to more cost-effective storage locations.

Clearing Up Backup Confusion

One underlying reason for this is confusion between backup and archiving. Backup provides a copy of organizational data for use in recovering from a data loss incident, cyberattack or disaster. These days, that copy is generally written to disk or tape and either retained there or relayed to the cloud. A key point is that backup only copies data, leaving the source data in place. Backup is also used to restore lost or deleted files rapidly.

Archiving is a different concept entirely. Rather than copying data, it moves data classed as inactive to a more cost-effective tier of storage, such as economy disk or tape. This frees up space on higher-tier storage systems such as fast disk or flash. In addition, it shortens the backup window and provides long-term protection against modification or deletion of data.
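To make the copy-versus-move distinction concrete, here is a minimal Python sketch. It assumes plain local directories stand in for primary storage, the backup target and the archive tier, and the helper names `back_up`, `archive` and `demo` are hypothetical. Real backup and archiving products add catalogs, retention rules and verification on top of this; the sketch only shows the core difference.

```python
import shutil
import tempfile
from pathlib import Path


def back_up(source: Path, backup_dir: Path) -> Path:
    """Hypothetical helper: copy the file, leaving the source in place."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves timestamps as well as contents
    return Path(shutil.copy2(source, backup_dir / source.name))


def archive(source: Path, archive_dir: Path) -> Path:
    """Hypothetical helper: move the file to a cheaper tier, freeing primary space."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(source), str(archive_dir / source.name)))


def demo() -> dict[str, bool]:
    """Back up one file and archive another, then report what remains where."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        primary = root / "primary"
        primary.mkdir()
        active = primary / "active_project.txt"
        cold = primary / "closed_project_2015.txt"
        active.write_text("still in use")
        cold.write_text("inactive for years")

        backup_copy = back_up(active, root / "backup")
        archived = archive(cold, root / "archive")

        # Check existence before the temporary directory is cleaned up
        return {
            "active still on primary": active.exists(),
            "backup copy exists": backup_copy.exists(),
            "cold file still on primary": cold.exists(),
            "cold file in archive": archived.exists(),
        }


if __name__ == "__main__":
    for label, present in demo().items():
        print(f"{label}: {present}")
```

Running it shows the backed-up file still on primary storage alongside its copy, while the archived file exists only in the archive directory and no longer occupies primary capacity.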

Some of the reluctance to offload cold data to archiving systems stems from perceptions about speed of response. Obviously, finding a lost file on disk or in a cloud-based backup is faster than retrieving it from an archive, particularly if that archive is in a remote vault. This becomes more of an issue when the distinction between hot, warm and cold data is poorly drawn. If these categories are not defined correctly, data that users still need can end up in the archive, retrievals slow down, and archiving gets a bad name.

According to Fred Moore, President at Horison Information Strategies, the probability that data will be accessed again drops off rapidly after one month. It typically falls below 1% around the 100-day mark, though the threshold might be 90 days or less for some organizations and as much as 120 days for others. In any case, the vast bulk of data is quickly forgotten, yet can't be deleted for compliance reasons or because of its potential future value. It is pointless to back up those files night after night or week after week. Even incremental backup software has to scan through all those inactive files to determine whether any changes have been made, and that ties up time and resources.
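The scanning cost in the last point is easy to see in a simplified incremental pass. This sketch assumes the backup tool detects changes by comparing each file's modification time with the time of the last backup; many real products use change journals, archive bits or snapshots instead, and `incremental_scan` and `simulate` are hypothetical names. Even when only a handful of files have changed, every file still has to be examined.

```python
import os
import tempfile
import time
from pathlib import Path

DAY = 86_400  # seconds


def incremental_scan(root: Path, last_backup: float) -> tuple[int, list[Path]]:
    """Return (files examined, files modified since last_backup).

    Assumes mtime-based change detection: every file must be stat()-ed,
    because the only way to learn that a file is unchanged is to check it.
    """
    examined = 0
    changed: list[Path] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            examined += 1
            if path.stat().st_mtime > last_backup:
                changed.append(path)
    return examined, changed


def simulate(cold_files: int, hot_files: int) -> tuple[int, int]:
    """Build a tree of old (cold) and fresh (hot) files, then scan it."""
    now = time.time()
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for i in range(cold_files):
            path = root / f"cold_{i}.dat"
            path.write_bytes(b"x")
            old = now - 200 * DAY  # untouched for about 200 days
            os.utime(path, (old, old))
        for i in range(hot_files):
            (root / f"hot_{i}.dat").write_bytes(b"y")  # modified just now
        examined, changed = incremental_scan(root, last_backup=now - DAY)
        return examined, len(changed)


if __name__ == "__main__":
    examined, changed = simulate(cold_files=1_000, hot_files=5)
    print(f"examined {examined} files to find {changed} changed ones")
```

The ratio is the point: the work of the scan grows with the total number of files, not with the number that changed, so moving cold files out of the backup scope shortens every run.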

Active Archiving Speeds Retrieval

To make an archive really work in the modern world, organizations need intelligent data management software that can leverage metadata and automate the movement of files according to user-defined policies. This ensures that data ends up on the right storage platform at the right point in its lifecycle. With unstructured data volumes mushrooming, manual grooming and classification of information are not feasible.

Archived material also needs to be retrievable in a timely way, and data needs better management as it moves along its lifecycle from hot to warm to cold. What is known as an active archive fulfills this need. An active archive is a sophisticated approach to archiving that automates data management, keeping data online and easily accessible without IT department intervention. Storage infrastructure and media costs are optimized because flash is reserved for hot data, disk for warm data and, typically, tape for cold data.
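The policy-driven tiering described above can be sketched as a simple age-based rule. This is a minimal illustration, not how any particular active archive product works: it assumes last-access time is the only metadata consulted, and the 30-day and 100-day thresholds are hypothetical values loosely based on Moore's one-month and roughly 100-day figures. `TierPolicy`, `choose_tier` and `plan_moves` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TierPolicy:
    """Hypothetical user-defined policy: age thresholds since last access."""
    hot_days: int = 30    # up to about a month: keep on flash
    warm_days: int = 100  # up to about 100 days: keep on disk; older goes to tape


def choose_tier(last_access: datetime, now: datetime, policy: TierPolicy) -> str:
    """Map a file's time since last access to a storage tier."""
    age = now - last_access
    if age <= timedelta(days=policy.hot_days):
        return "flash"
    if age <= timedelta(days=policy.warm_days):
        return "disk"
    return "tape"


def plan_moves(
    files: dict[str, tuple[str, datetime]], now: datetime, policy: TierPolicy
) -> list[tuple[str, str, str]]:
    """Return (name, current tier, target tier) for files the policy would relocate.

    `files` maps a file name to (current tier, last access time).
    """
    moves = []
    for name, (current, last_access) in sorted(files.items()):
        target = choose_tier(last_access, now, policy)
        if target != current:
            moves.append((name, current, target))
    return moves


if __name__ == "__main__":
    now = datetime(2020, 9, 14)
    files = {
        "render_in_progress.mov": ("flash", now - timedelta(days=2)),
        "q2_sales.csv": ("flash", now - timedelta(days=45)),
        "2019_survey_raw.tar": ("disk", now - timedelta(days=300)),
        "q3_forecast.xlsx": ("disk", now - timedelta(days=10)),
    }
    for name, src, dst in plan_moves(files, now, TierPolicy()):
        print(f"{name}: {src} -> {dst}")
```

Changing the thresholds in `TierPolicy` is the sketch's stand-in for user-defined policy; a real system would also weigh other metadata, such as project, owner or retention class.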

Under this approach, end-user access is transparent regardless of which pool of storage the data resides in. Regulatory compliance issues are taken care of, and rapid search, retrieval and analytics can be done wherever the data sits. This is possible because a virtual file system manages data across storage systems and media types based on user-defined policies.

It is possible to establish an active archive without tape, but it isn't the wisest course: disk and cloud costs skyrocket, and scalability quickly becomes an issue. Such a system functions best with a robotic tape library, which keeps storage costs ultra-low while offering good retrieval speeds and high reliability.

Rude Budget Awakening

Moore notes that at least 60% of all data can be classified as archival, and that the number could exceed 80% by 2025. Anyone who continues to back up or archive all data onto disk is destined for a rude budget awakening. The answer is a strategy that moves inactive but potentially valuable archival data to the right storage tier smoothly and rapidly while containing costs.
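To put Moore's percentages into capacity terms, the arithmetic below assumes a hypothetical organization holding 500 TB of data and computes how much of it would leave the routine backup set if archival data were moved to an archive tier instead of being backed up repeatedly. `split_capacity` is an illustrative name, and the 500 TB figure is an assumption, not data from the survey.

```python
def split_capacity(total_tb: int, archival_pct: int) -> tuple[float, float]:
    """Split total capacity into (backup set TB, archive TB).

    archival_pct is an integer percentage of data classed as archival.
    """
    if not 0 <= archival_pct <= 100:
        raise ValueError("archival_pct must be between 0 and 100")
    archive_tb = total_tb * archival_pct / 100
    backup_tb = total_tb * (100 - archival_pct) / 100
    return backup_tb, archive_tb


if __name__ == "__main__":
    total = 500  # hypothetical organization with 500 TB of data
    for pct in (60, 80):  # Moore's current floor and possible 2025 figure
        backup_tb, archive_tb = split_capacity(total, pct)
        print(f"{pct}% archival: back up {backup_tb:.0f} TB, archive {archive_tb:.0f} TB")
```

With 60% of the data classed as archival, the repeatedly backed-up set shrinks from 500 TB to 200 TB; at 80% it shrinks to 100 TB.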

The hardware, software and management components of an active archive, including modern automated tape systems, facilitate such a strategy. Any organization dealing with hundreds of terabytes of data or more should therefore seriously consider the active archive concept and find out more about the capabilities of modern tape systems.

For more information:

Active Archive and the State of the Industry 2020: A New Age Dawns for Digital Archives
Archival Data Storage: Managing the Archival Upheaval by Horison Information Strategies

