Backup Less, Archive First
The Active Archive Alliance turns 10 years old this year. As a founding partner, QStar Technologies is pleased that the Active Archive concept is even more relevant today than it was 10 years ago. But why is that?
Today, organizations use tiered storage more than ever before. NVMe-based primary storage provides excellent performance but, because of its cost, is purchased in relatively small capacities, so organizations know from the outset that NVMe can only serve as short-term storage. SATA-based SSD adds another layer (tier) with good performance. Below that come several distinct classes of hard disk, with much higher capacity and lower cost per TB, some of which are frequently used for archive.
Hard disk-based archive technologies can be accessed through a file system (SMB share or NFS mount) or as cloud objects (S3 or proprietary APIs), but many share a common theme: data stored there should not need a third-party backup product to protect it. In my opinion, this is the key differentiator between storage and archive. Content is protected using replication or erasure coding, so that if some data is lost, the archived content can still be retrieved and the lost content can be recreated by an internal mechanism, bringing the archive store back to 100% availability.
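To make the erasure coding idea concrete, here is a minimal sketch using single XOR parity, the simplest form of erasure code. It is an illustration only: real archive systems typically use stronger codes that survive several simultaneous losses, and the block contents and the xor_blocks helper are hypothetical.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Three data blocks plus one parity block computed across them.
data_blocks = [b"ARCH", b"IVE!", b"DATA"]
parity = xor_blocks(data_blocks)

# Simulate losing the middle block, then rebuild it from the
# surviving data blocks and the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

The rebuilt block matches the lost one, so the content stays retrievable and the store can write the block back to return to full protection, with no separate backup product involved.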
The same can be achieved with removable media such as LTO tape, at even lower cost per TB. Content is replicated so that the data on each copy of the media is identical. If a piece of media is damaged for any reason, it can be removed and discarded because an identical copy can take its place. The surviving copy is then duplicated to restore full redundancy and keep availability at 100%. If required, for very little additional outlay, third or fourth copies can be made and potentially stored offsite for DR protection.
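The replace-and-recopy cycle for replicated media can be sketched in a few lines. This is a toy model, not a description of any product: the repair_copies function, the dictionary layout, and the "damaged" flag are all assumptions made for illustration.

```python
def repair_copies(copies, target_copies):
    """Discard damaged copies, then clone a healthy one until target_copies exist."""
    healthy = [c for c in copies if not c["damaged"]]
    if not healthy:
        raise RuntimeError("no intact copy left to clone from")
    while len(healthy) < target_copies:
        # Duplicate the surviving copy onto fresh media.
        healthy.append({"data": healthy[0]["data"], "damaged": False})
    return healthy


# Two identical copies of the same content; the first has been damaged.
copies = [
    {"data": b"archived content", "damaged": True},
    {"data": b"archived content", "damaged": False},
]
repaired = repair_copies(copies, target_copies=2)
```

Raising target_copies to 3 or 4 models the additional copies that could be stored offsite.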
Backup and restore is the bane of almost all IT administrators, as running backups and testing restores takes massive amounts of staff time. Defaulting to a solution that needs no backup is surely a smart idea. Traditionally, data “falls” down the hierarchy as it is used less frequently. With an Active Archive “Archive First” approach, data is initially stored on the most secure platform (the archive) and is then distributed to the appropriate tier through caching.
A cache is designed to always be 100% full – for the most frequently used data. Older data is automatically superseded by new, more relevant data, safe in the knowledge that the data is stored in the archive.
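The Archive First flow can be sketched as a small write-through cache. This is a simplified model under stated assumptions: the ArchiveFirstStore class and its method names are hypothetical, plain dictionaries stand in for the archive and the fast tier, and least-recently-used eviction stands in for whatever policy a real system uses to decide which data is most relevant.

```python
from collections import OrderedDict


class ArchiveFirstStore:
    """Toy model: every write lands in the archive first, then in a bounded cache."""

    def __init__(self, cache_capacity):
        self.archive = {}            # stands in for the protected archive tier
        self.cache = OrderedDict()   # stands in for the fast tier, kept in LRU order
        self.cache_capacity = cache_capacity

    def write(self, name, data):
        self.archive[name] = data    # archive first: the durable copy exists before caching
        self._cache_put(name, data)

    def read(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)   # mark as most recently used
            return self.cache[name]
        data = self.archive[name]          # cache miss: fetch from the archive
        self._cache_put(name, data)
        return data

    def _cache_put(self, name, data):
        self.cache[name] = data
        self.cache.move_to_end(name)
        while len(self.cache) > self.cache_capacity:
            # Evict the least recently used entry; the archive still holds it.
            self.cache.popitem(last=False)


store = ArchiveFirstStore(cache_capacity=2)
store.write("scan-001", b"a")
store.write("scan-002", b"b")
store.write("scan-003", b"c")   # pushes scan-001 out of the cache only
```

Because eviction never touches the archive, a later read of "scan-001" simply pulls it back into the cache, which is why the cache can safely run full at all times.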
This is exactly the way most hospital IT solutions work, as patient images must be kept by law for many years. Images are immediately archived and then retrieved as needed to a fast cache for diagnosis and doctor referral. The same applies to media and entertainment (M&E) film or sports TV production. Raw footage is an asset of the production company and therefore has value. This content is written immediately to the archive and only the required elements are retrieved to the fastest storage when needed. Oil and gas exploration, seismic studies, satellite and space telescope systems all work in this way due to the huge capacities involved, so this is not a new concept.
Storing content this way leaves fewer steps at which data can be corrupted, because files are not copied from one storage location to the next, and then the next, as content falls down the hierarchy. Data can also be better protected from ransomware, as users and applications do not necessarily have direct contact with the “real” data; they interact with a cached copy. This also significantly reduces the need for backup, as only versions of changing files need to be protected until they are committed to the archive store.
This solution works well when the archive can be accessed as a single massive file system (Global Namespace) or object store. Having multiple archive destinations is inefficient because it creates silos, perhaps holding the same data, and makes searching for content much more difficult. For disk-based archive solutions we have seen recent growth in scale-up, scale-out NAS environments that use node-based architectures to support trillions of files and exabytes of potential capacity under a single share or mount. The same applies to object storage systems, which also use node-based architectures to store objects rather than files. Some of the manufacturers of these systems are partners in the Active Archive Alliance.
Tape archives have traditionally used server- or appliance-based architectures and, although large, have not been able to provide the same massive single file systems that scale-up, scale-out NAS does. Until now…
QStar Technologies is nearing completion of a node-based tape archive software solution that creates an environment able to support exabytes of capacity and trillions of files, which we are calling Global ArchiveSpace™. This tape-based archive can present tens of thousands of media / slots through a single share, mount or S3 bucket (content can be archived through one interface and read through another) and can use tens or hundreds of tape drives as shared resources.
So, consider archiving as much data as you can, as early as you can, to better secure your data and significantly reduce your need for backup.