Data Management, Object Storage and Tape Technology are Increasingly Important for Active Archives
The continued exponential growth of data volumes will remain the dominant challenge for data storage. Most of this growth is driven by unstructured data, such as audio and video, across verticals including manufacturing, life sciences and entertainment.
Intelligent data and storage management software will meet this challenge by integrating different storage technologies (flash, disk, tape) and architectures (file systems, object storage), so that data is stored in the most appropriate storage class according to its use and purpose. As data becomes inactive at an ever-increasing rate, an economical, technology-independent active archive tier will be of central importance.
This raises the question of which storage concepts and technologies are suitable for integrating an active archive tier into the storage infrastructure.
For an active archive to fulfill its purpose, it must not be treated as an isolated component; it must be integrated into the storage infrastructure. Hierarchical storage management (HSM), information lifecycle management (ILM) and tiering concepts have existed for decades and have matured to serve this purpose. What matters now is making these approaches work with modern concepts such as object-based storage. Object storage systems are essentially designed for large volumes of inactive data. Their cost per terabyte is usually very attractive, thanks to the use of many large, high-density hard drives combined with efficient erasure coding. A typical starting size is somewhere in the hundreds of terabytes.
In the age of big data, analytics and IoT, however, HDD-based object storage systems fill up quickly. As more and more data is saved to them, the question arises whether active and inactive objects should both be stored on the same class of system. The hyperscalers (e.g. AWS and Microsoft Azure) are showing the way: they offer their customers a wide range of storage classes, with data moved between them automatically according to lifecycle policies. The key is to find a storage technology that is more cost-effective than hard disk drives for storing inactive data. That technology is undoubtedly tape.
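A lifecycle policy of this kind can be pictured as a simple age-based rule. The following Python sketch is illustrative only: the tier names, the 30-day and 180-day thresholds, and the names `StoredObject` and `target_tier` are assumptions made for the example, not part of any vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers and idle-time thresholds, chosen only for illustration.
# Each entry means: objects idle for less than this long stay on this tier.
TIER_RULES = [
    (timedelta(days=30), "flash"),
    (timedelta(days=180), "hdd-object"),
]
ARCHIVE_TIER = "tape-archive"  # everything older lands in the active archive


@dataclass
class StoredObject:
    key: str
    last_access: datetime


def target_tier(obj: StoredObject, now: datetime) -> str:
    """Return the tier an object should live on, based on its idle time."""
    idle = now - obj.last_access
    for max_idle, tier in TIER_RULES:
        if idle < max_idle:
            return tier
    return ARCHIVE_TIER


now = datetime(2024, 1, 1)
recent = StoredObject("video/raw-0001.mxf", now - timedelta(days=3))
stale = StoredObject("video/raw-0000.mxf", now - timedelta(days=400))
print(target_tier(recent, now))  # flash
print(target_tier(stale, now))   # tape-archive
```

In practice such rules would be evaluated by the data management software itself; the point of the sketch is only that placement follows from how long an object has been inactive.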
Tape is a future-proof storage medium for meeting the challenge of rising data volumes: it stores unstructured data at high transfer rates and at acceptable cost. In addition, tape provides an “air gap” that protects data, for example against ransomware.
The next question is how to actually store object data on tape. HDD-based object storage systems do not offer a direct connection to tape libraries. At the same time, it is not advisable to copy objects to a file system before storing them on tape, because the conversion carries a risk of information loss for a number of technical reasons; for example, object metadata may have no direct equivalent in a file system. Instead, the S3 objects themselves must be saved directly to tape. This requires data management solutions that receive data via an S3 interface and then write it to tape. The RESTful S3 API exposed by such software is the prerequisite for writing very large data volumes to many tape drives in parallel while reading from them quickly at the same time. Additionally, tape-based object storage software can support erasure coding across tapes to protect data against loss.
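To make the idea of erasure coding across tapes concrete, here is a minimal sketch using single XOR parity, the simplest possible erasure code. An object is split into k data shards plus one parity shard, each imagined to be written to a different tape, and any one lost shard can be rebuilt from the others. This scheme is an assumption chosen for illustration, not the method of any particular product; real software would typically use stronger codes that tolerate several simultaneous losses. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal shards (zero-padded) and append one parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def decode(shards: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the object; at most one shard (one 'tape') may be missing (None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    repaired = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        # The XOR of all surviving shards equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    data_shards = repaired[:-1]  # the last shard is parity
    return b"".join(data_shards)[:original_len]


payload = b"archive object payload"
shards = encode(payload, k=4)  # 4 data shards + 1 parity shard
shards[2] = None               # simulate one unreadable tape
assert decode(shards, len(payload)) == payload
```

With k = 4 the capacity overhead is one extra shard in five, i.e. 25% on top of the data, which is the kind of trade-off between overhead and fault tolerance that erasure coding parameters control.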
In addition to tiering inactive objects from HDD-based object storage to tape, there are other use cases for tape-based object storage as an active archive. Users and applications can use the standardized S3 interface directly to store data on tape for long-term archiving. The management software provides WORM protection and retention management to meet compliance requirements. In conjunction with a file-based HSM/ILM solution, tape-based S3 object storage integrates well as an archive tier: inactive files that waste expensive primary storage space are automatically offloaded to tape and archived by the HSM/ILM software. This approach saves money at the primary storage tier while meeting archiving requirements.
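The WORM and retention behaviour mentioned above can be illustrated with a small in-memory model. It is a sketch under stated assumptions: `WormArchive`, `RetentionError` and their methods are hypothetical names, not the API of any particular product, and a real system would enforce these rules in the storage layer rather than in client code.

```python
from datetime import datetime


class RetentionError(Exception):
    """Raised when an operation would violate WORM or retention rules."""


class WormArchive:
    """Toy write-once store: objects cannot be overwritten, and cannot be
    deleted before their retention date has passed."""

    def __init__(self) -> None:
        self._objects = {}  # key -> (data, retain_until)

    def put(self, key: str, data: bytes, retain_until: datetime) -> None:
        if key in self._objects:
            raise RetentionError(f"{key!r} already exists and is write-once")
        self._objects[key] = (data, retain_until)

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def delete(self, key: str, now: datetime) -> None:
        _, retain_until = self._objects[key]
        if now < retain_until:
            raise RetentionError(
                f"{key!r} is retained until {retain_until:%Y-%m-%d}"
            )
        del self._objects[key]


archive = WormArchive()
archive.put("contracts/2024-001.pdf", b"...", retain_until=datetime(2034, 1, 1))
try:
    archive.delete("contracts/2024-001.pdf", now=datetime(2025, 6, 1))
except RetentionError as err:
    print(err)  # deletion is refused while the retention period is running
```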