AI Demands High Performance: What Can an Active Archive Offer?

June 16th, 2025 by Active Archive Alliance

AI requires the ultimate in performance: vast numbers of CPUs and GPUs packed into highly dense racks so that applications can access information with minimal latency. It demands huge amounts of power to run the data center, along with liquid cooling designs that keep chips and equipment within thermal limits. That, in turn, requires the construction of a new generation of data centers purpose-built to serve AI workloads. But AI also requires fast access to ever-larger amounts of archived data.

“We are seeing significantly larger archive requirements, especially in High Performance Computing (HPC) and AI,” said David Thomson, Senior Vice President of Sales and Marketing at QStar Technologies, during the recent Active Archive Alliance video conference, “AI Needs Active Archive.”

He laid out the archiving requirements for HPC and AI:

* Active archives must be able to work with many petabytes (PB) and even exabytes (EB) of data.
* When that data is not being accessed, it should be parked on the lowest-cost tier of storage available.
* It should avoid relying on large amounts of cached disk storage, as that kind of solution can greatly increase cost.
* For tape to work well with AI at the appropriate level of performance, many tape drives are needed, orchestrated to work seamlessly in parallel. Many tape drives can be addressed as a single pool using a multi-drive configuration option or as a redundant array of independent tapes (RAIT), as sketched in the example after this list.
* High reliability is vital: critical data must be protected with options for media mirroring or for replicating to a second system such as the cloud.
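To make the parallel-drive idea concrete, here is a minimal sketch of RAIT-style striping: a data stream is cut into chunks that are written round-robin across a pool of drives so that their native throughputs add up. Everything here is hypothetical, from the mount paths to the chunk size; a real tape orchestration layer would also handle drive allocation, media management, and error recovery.

```python
import concurrent.futures
import pathlib

# Hypothetical mount points for a pool of tape drives exposed as a file
# system by the tape software; real systems present drives differently.
DRIVE_PATHS = [pathlib.Path(f"/mnt/tape{i}") for i in range(4)]
CHUNK_SIZE = 256 * 1024 * 1024  # 256 MiB stripes, an illustrative choice

def write_stripe(drive: pathlib.Path, name: str, index: int, chunk: bytes) -> int:
    """Write one stripe to one drive and return the bytes written."""
    (drive / f"{name}.stripe{index:06d}").write_bytes(chunk)
    return len(chunk)

def striped_write(name: str, stream) -> int:
    """Cut a stream into chunks and write them round-robin across the pool.

    With N drives each sustaining R MB/s, aggregate throughput approaches
    N * R MB/s: the parallelism effect described in the list above.
    """
    with concurrent.futures.ThreadPoolExecutor(len(DRIVE_PATHS)) as pool:
        futures = []
        index = 0
        while chunk := stream.read(CHUNK_SIZE):
            drive = DRIVE_PATHS[index % len(DRIVE_PATHS)]
            futures.append(pool.submit(write_stripe, drive, name, index, chunk))
            index += 1
        return sum(f.result() for f in futures)
```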

Take the case of the 100 Gbit+ networks often used by AI applications to eliminate bottlenecks. With tape drives arranged in parallel, 25 LTO-9 drives would saturate such a network at a native performance of 400 MB/sec per drive. Thomson advocates multi-node (2 to 64) tape systems with a global namespace as a good match for modern AI needs. Furthermore, by setting up systems that support multiple access protocols (SMB, NFS, and S3), the largest tape libraries from any vendor could be used, or combined, to serve all AI workloads.
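As a back-of-the-envelope check of those figures, the sketch below uses only the numbers quoted above (400 MB/s native LTO-9 throughput and a 100 Gbit/s network, in decimal units):

```python
# Rough throughput arithmetic for the parallel-tape example above.
NETWORK_GBITS = 100   # network line rate, Gbit/s
DRIVE_MBPS = 400      # LTO-9 native throughput, MB/s per drive
DRIVES = 25           # drives striped in parallel

network_mbps = NETWORK_GBITS * 1000 / 8   # 12,500 MB/s of line rate
aggregate_mbps = DRIVES * DRIVE_MBPS      # 10,000 MB/s from the pool

print(f"network : {network_mbps:,.0f} MB/s")
print(f"pool    : {aggregate_mbps:,.0f} MB/s "
      f"({aggregate_mbps / network_mbps:.0%} of line rate)")
```

Twenty-five drives deliver 10 GB/s in aggregate against the 12.5 GB/s line rate of a 100 Gbit/s link, most of its usable capacity once protocol overhead is taken into account.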

David Boland, Vice President of Cloud Strategy at Wasabi, added that active archives must be fully integrated into the overall AI data pipeline. AI workloads, after all, cover a wide range of use cases, and more are steadily being introduced: hyper-personalization, recognition, pattern and anomaly detection, goal-driven systems, autonomous systems, predictive analytics and decision-making, and conversation and human interaction. Current applications include self-driving cars, advanced robotics, drones, knowledge discovery, search and data mining, segmentation, clustering, and content generation; the list goes on.

Regardless of the use case or workload, however, all of them must pass through a multi-step AI pipeline to be effective. There is no point in achieving the highest level of performance in one area only to be defeated by bottleneck after bottleneck as the data traverses the pipeline. Of course, different types of data may follow variations of this pipeline, since they may rely on different hardware, software, and other elements.

Imagine if the data ingestion step of the AI pipeline were slow: the fastest AI engine in the world would still be throttled to the pace at which data arrives.
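The underlying point is that a pipeline's end-to-end throughput is capped by its slowest stage. A minimal illustration, using the stage names from the pipeline described below and invented throughput figures:

```python
# End-to-end pipeline throughput is capped by the slowest stage.
# The MB/s figures here are made up purely for illustration.
stage_throughput_mbps = {
    "ingestion": 800,
    "transformation": 5_000,
    "training": 12_000,
    "validation": 9_000,
    "inferencing": 7_000,
    "archiving": 4_000,
}

bottleneck = min(stage_throughput_mbps, key=stage_throughput_mbps.get)
print(f"pipeline limited to {stage_throughput_mbps[bottleneck]:,} MB/s "
      f"by the '{bottleneck}' stage")
```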

“As the data moves along the pipeline from ingestion to raw data, data transformation, model training, model validation, production inferencing, and archiving, each step must contribute to the whole – and all should be streamlined to support AI performance and accuracy needs,” added Boland.

Clearly, then, different storage media and systems are needed at different points along the journey. High-performance file systems are required for the training and validation steps. Active archive systems fit between the production inferencing and archiving steps of the pipeline, providing enough performance to support rapid data retrieval and the needs of inferencing.
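Since S3 is one of the access protocols mentioned above, here is a minimal sketch of how an inferencing stage might pull an archived object back through an S3-compatible active archive interface. The endpoint URL, bucket, and object key are hypothetical placeholders, and credentials are assumed to come from the environment as usual with boto3.

```python
import boto3

# Minimal sketch: fetch an archived dataset object for inferencing
# through an S3-compatible active archive endpoint. The endpoint,
# bucket, and key below are hypothetical placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://archive.example.com",  # S3-compatible archive
)

response = s3.get_object(
    Bucket="training-archive",
    Key="datasets/sensor-logs/2024-06.parquet",
)
data = response["Body"].read()  # bytes handed to the inferencing stage
```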

Within such a pipeline, archiving technology would also support compliance and regulation. Looking ahead, it is likely that it will eventually need to record every AI decision and every piece of data fed into the model as part of an audit trail.
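As an illustration of what such an audit trail could look like, here is a minimal sketch; the record fields, hashing scheme, and JSON Lines layout are assumptions for illustration, not an established standard.

```python
import dataclasses
import datetime
import hashlib
import json

@dataclasses.dataclass
class AuditRecord:
    """One append-only record per AI decision, destined for the archive.

    Hashing the input rather than embedding it keeps records small while
    still tying each decision to the exact data that produced it.
    """
    timestamp: str
    model_version: str
    input_sha256: str
    decision: str

def log_decision(model_version: str, input_bytes: bytes, decision: str,
                 log_path: str = "audit.jsonl") -> None:
    record = AuditRecord(
        timestamp=datetime.datetime.now(datetime.timezone.utc).isoformat(),
        model_version=model_version,
        input_sha256=hashlib.sha256(input_bytes).hexdigest(),
        decision=decision,
    )
    with open(log_path, "a") as log:  # append-only JSON Lines audit log
        log.write(json.dumps(dataclasses.asdict(record)) + "\n")
```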
