2020 Vision: Data Storage and Active Archive Predictions

January 13th, 2020 by Meredith Bagnulo

Over the last several years, technology vendors have introduced new analytics and big data technologies that allow organizations to tap the value of massive volumes of archival data. At the same time, new applications such as the Internet of Things (IoT), artificial intelligence, and machine learning are putting pressure on today’s data storage infrastructures.

These and other data growth challenges facing the digital archive market are expanding the role of active archives in the data management lifecycle. Members of the Active Archive Alliance recently shared their 2020 predictions for data storage and active archives.

Here are some of the top trends to watch:

IoT and AI Generate Demand for Active Archive
“The exploding Internet of Things (IoT) market with billions of installed devices will continue to expand in 2020, especially as 5G networks start to proliferate. Artificial intelligence (AI) tools will provide the analytics power to derive value from all of the big data generated by countless IoT devices. As a result, the combination of IoT data and AI will have a profound impact on the need for storage. Organizations will want to maintain access to data sets for longer periods of time to support a continuous cycle of data ingest, analytics, and inference. A dramatic increase in cost-effective and efficient storage capacity in the form of active archives will be demanded to support this model.” – Rich Gadomski, Vice President of Marketing, Fujifilm Recording Media U.S.A.

On-Premises Cloud Will Drive Change in How File Systems are Used
“Virtualization providers are finally partnering with cloud providers to unify the experience between on-premise and off-premise cloud services. This trend is just starting now and will drive large pivots in the way we approach on-premise storage solutions. We expect to see smaller file systems that are localized in the equivalent of a ‘VPC’ rather than larger shared file systems that we traditionally manage. Suddenly enterprises will go from dozens of large NAS shares to thousands of smaller siloed ‘software-defined’ storage volumes. This trend drives the need for solutions that can provide visibility and data intelligence across these new storage silos to provide better cost management with tiering and other metadata-driven automation, and to eliminate over-provisioning and operational complexity.” – Floyd Christofferson, CEO, Strongbox Data.

Scale-out File Storage for On-Premises Unstructured Data Active Archives
“Modern file storage solutions deliver performance and economics in a single-tier solution managed by intelligent caching.  Object storage is not the best fit for on-premises customers seeking simplicity to deliver to performance applications and to retain cost-efficiency. It was developed as a precursor to webscale technology and as the storage medium for web technologies. It was meant to be great for datasets that approach the exabyte data level and are geographically distributed. In addition, object storage was easy to place on inexpensive hardware and became the default option for archive on premises. This ease had nothing to do with the nature of objects but had to do with the relatively scalable systems that could handle lots of unstructured data on less costly hardware. In 2020, we believe the on-premises object storage market will evaporate and will become wholly file-based.” – Molly Presley, Global Product Marketing Director, Qumulo.

Healthcare Systems Will Engage in Enterprise-Wide Data Management Planning
“In the healthcare space, M&A activity will continue to spur system transitions. Rather than treating legacy data as an afterthought, we’ll see more healthcare systems engaged in strategic and proactive enterprise-wide data management planning and embracing true information governance and data stewardship. Increasing demands for discrete data will drive the adoption of sophisticated archiving solutions that support a variety of reporting tools to meet emerging trends, including population health initiatives.” – Julie Fogel, Director of Marketing, MediQuant.

Realizing the True Nature of Active Archive Data
“The key difference between primary data and archived data is how often the data is used and how often it changes. Primary data storage needs high-performance, snapshot-based backup solutions. Active archive data needs NO Backup, as active archive solutions protect data through multiple copies, perhaps on multiple technologies, perhaps using encryption and often using cloud or removable tape media for disaster recovery purposes. Identifying when data should be considered for ‘archive’ is critical and AI will play an increasingly important part in this over the next decade. Once identified though, a flexible archive gateway solution is required. One that is not ‘point-in-time’ but changes as archive content changes from very active, to less active and finally to important data which is very infrequently accessed. Content once archived will not be moved from one archive tier to another (creating an opportunity for data corruption). Higher-performance, more expensive copies on disk-based storage will be deliberately lost (deleted) over time while retaining longer-term, lower-cost copies typically on tape, which will continue to be the safest and most economical storage option available for many years to come.” – Dave Thomson, Senior Vice President, Sales and Marketing, QStar Technologies Inc.
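
As a rough illustration of the retention model Thomson describes, the sketch below treats an archived item as a set of copies written once to several tiers, where the faster copies simply expire with age and nothing is migrated between tiers. The tier names, the 90-day disk window, and every identifier here are hypothetical and chosen only for illustration; they do not describe any vendor's product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ArchiveCopy:
    tier: str                          # hypothetical tier label, e.g. "disk" or "tape"
    expire_after_days: Optional[int]   # None means the copy is kept indefinitely


# Hypothetical policy: every copy is written at archive time.
# The disk copy is deleted after 90 days; tape and cloud copies are retained.
POLICY: List[ArchiveCopy] = [
    ArchiveCopy("disk", 90),
    ArchiveCopy("tape", None),
    ArchiveCopy("cloud", None),
]


def surviving_copies(policy: List[ArchiveCopy], age_days: int) -> List[ArchiveCopy]:
    """Return the copies still retained once the content is age_days old.

    Copies are never moved between tiers; a copy either still exists
    or has been deleted because its retention window has passed.
    """
    return [
        c for c in policy
        if c.expire_after_days is None or age_days < c.expire_after_days
    ]


def tiers_at(age_days: int) -> List[str]:
    """Convenience wrapper: names of the tiers that still hold a copy."""
    return [c.tier for c in surviving_copies(POLICY, age_days)]
```

Under this assumed policy, `tiers_at(30)` lists all three tiers, while `tiers_at(365)` lists only tape and cloud: the content became less active and the expensive copy was dropped, but no data was ever copied from one archive tier to another.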

Cloud Egress Cost Will Start to Drive End-User Strategies
“Cloud storage was all the rage in 2019, but some of the larger users of these storage-as-a-service offerings are discovering that not having a local copy is the driving factor in their storage cost. As a result, this is forcing them to take a closer look at hybrid cloud, especially having a local copy while still utilizing the cloud as the second copy. This allows for a local restore and usage of data without egress costs and a cloud-based copy for DR and sharing.” – Matt Starr, CTO of Spectra Logic.
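
The arithmetic behind this argument can be sketched in a few lines. The example below compares a cloud-only setup, where every restore pays egress, with a hybrid setup that serves restores from a local copy. All prices and volumes are placeholder values invented for illustration, not any provider's actual rates, and the model deliberately ignores details such as request fees and occasional disaster-recovery restores from the cloud copy.

```python
def cloud_only_cost(stored_tb, restored_tb, storage_per_tb, egress_per_tb):
    """Monthly cost when the cloud holds the only copy: storage plus egress on every restore."""
    return stored_tb * storage_per_tb + restored_tb * egress_per_tb


def hybrid_cost(stored_tb, storage_per_tb, local_per_tb):
    """Monthly cost with a local copy for restores and a cloud copy for DR (no routine egress)."""
    return stored_tb * storage_per_tb + stored_tb * local_per_tb


def break_even_restored_tb(stored_tb, local_per_tb, egress_per_tb):
    """Monthly restore volume above which the hybrid setup becomes cheaper."""
    return stored_tb * local_per_tb / egress_per_tb


# Placeholder inputs (assumptions, in dollars per TB per month).
STORED_TB = 1000       # data under management
RESTORED_TB = 200      # data read back each month
STORAGE_PER_TB = 5     # assumed cloud storage price
EGRESS_PER_TB = 90     # assumed cloud egress price
LOCAL_PER_TB = 4       # assumed amortized cost of the local copy

cloud_only = cloud_only_cost(STORED_TB, RESTORED_TB, STORAGE_PER_TB, EGRESS_PER_TB)
hybrid = hybrid_cost(STORED_TB, STORAGE_PER_TB, LOCAL_PER_TB)

print(f"cloud only: ${cloud_only}/month")   # 23000 with the placeholder inputs
print(f"hybrid:     ${hybrid}/month")       # 9000 with the placeholder inputs
```

With these assumed numbers, egress accounts for most of the cloud-only bill, and the hybrid setup is cheaper once monthly restores exceed roughly 44 TB. Different prices shift the break-even point, which is why the calculation is worth repeating with real figures.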

Machine Learning & Predictive Analysis Will Further Impact Data Ingestion and Expression
“In 2020, we expect the speed, quality and predictability of data transformation and load processes to be further automated. As its models are exposed to big data sets, machine learning will advance the ingestion, indexing, and identification of information by identifying reliable and meaningful patterns. Once organized and archived, data can then be expressed in a variety of ways through predictive analysis and interoperability. In healthcare, the application of chatbots and algorithms to active archives of legacy medical records could yield new ways to discover, communicate, research, diagnose and treat illness.” – Shannon Larkin, Vice President Marketing & Business Development, Harmony Healthcare IT.

Artificial Intelligence Will Help Rein in Unstructured Data Volumes
“The challenge of rising unstructured data volumes is stimulating emerging data management technologies which use artificial intelligence to boost performance and hardware and software capacity.” – Herve Collard, VP of Marketing at Atempo. 

Tiering of Data, Leveraging Device, Media and Fabric Innovation, Will Expand, Not Contract
“There will continue to be strong exabyte growth in read-centric applications in the data center, from AI, ML, and big data analytics to a variety of business intelligence and accessible archive workloads. These at-scale use cases are driving a diverse set of performance, capacity and cost-efficiency demands on storage tiers, as enterprises deliver increasingly differentiated services on their data infrastructure. To meet these demands, data center architecture will continue advancing toward a model where storage solutions will be consistently provisioned and accessed over fabrics, with the underlying storage platforms and devices delivering to a variety of SLAs, aligned with specific application needs. And while we certainly expect to expand the deployment of TLC and QLC flash in these at-scale, high-growth workloads for higher performance use cases, the relentless demand for exabytes of cost-effective, scalable storage will continue to drive strong growth in capacity enterprise HDD.” – Phil Bullinger, SVP and GM, Western Digital.

As we embark on a new year, the Active Archive Alliance is poised to bring together the best technologies and solutions to help organizations more effectively manage and access their data over the long term. Here’s to making data management more efficient and successful in 2020 and beyond!
