Data Storage Utilization Modeling for Sustainability

August 24th, 2022 by Shawn Brume, Tape Evangelist and Strategist at IBM

According to Wikipedia, carbon neutrality is a state of net zero carbon dioxide emissions. The term climate neutral is often used instead, reflecting the broader inclusion of other greenhouse gases beyond carbon dioxide, even though carbon dioxide is the largest contributor to global warming. Neutrality is achieved through carbon offsetting, through reducing emissions overall, or, most often, through a combination of the two. For the information technology sector, the biggest focus is reducing data center CO2 emissions.

Data centers are focusing on CO2e emissions reductions by improving the efficiency of energy production through renewable resources and through lower-impact methods of generating energy. Data centers are also focusing on reducing the energy consumed for compute power and data storage, the two primary drivers of energy consumption in the data center. The focus on energy reduction and on the efficiency of shared resources in the cloud environment, which represents the largest data centers in the world, is very well understood. What has received less attention is the storage of data and its growing contribution to overall CO2e emissions.

The majority of data center storage in modern cloud environments has focused on high-performance flash for data processing and production workloads, and on HDDs for large volumes of storage and easy connectivity to the infrastructure. The use of HDDs was largely founded on object storage and ease of connectivity, with reasonable performance for most client infrastructures. Today, however, the need for data storage is growing at such a rapid rate that there is a need to embrace new methods, processes, and hardware that support long-term retention of data that never goes away and continuously grows, and that start with a much lower carbon footprint than HDDs or flash.

Because most modern cloud data centers were built on a structure supporting HDD, tiering data out of the HDD infrastructure into lower-carbon storage was never planned for. As a result, large data centers have been slow to adopt technologies that are new to them, like tape. Most of this hesitation derives from long-held beliefs that tape is difficult to use and doesn’t integrate with file systems and object storage interfaces. As many blogs on the active archive have demonstrated, this just isn’t true: there are many software stacks and APIs, as well as the ability to create interfaces when customization is required. Moving data to and from tape is about setting expectations for clients and for workloads. A great example of how expectations can be set can be seen on the AWS pricing page for the Amazon S3 Glacier Deep Archive storage class. How does this tie into climate neutral data storage infrastructures? The answer is simple: start out with products that have lower embedded carbon footprints, use less energy, and have a lower total CO2e lifecycle.

What we have just discussed is where tape storage for digital data is most complementary. Tape products are developed and deployed at a much lower carbon footprint than HDD or flash. The important thing to remember is that data must be placed in the right place at the right time. We cannot compare HDD access time to flash access time or to tape access time; all of these data storage media are important in an overall strategy of sustainability. Keep in mind that sustainability really means enabling a system that meets the needs of the present without compromising the ability of future generations to meet their needs. Consumption of data storage is expected to continue to grow at a compound annual growth rate of 15% to 35% for the next 10 years. By 2030 the world could be storing between 9 and 20 times more data than is currently retained. This massive increase in consumption must be accompanied by a change in how we store data and in our expectations for accessing that data, without sacrificing the need to retain it.
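
To make the growth arithmetic concrete, a compound annual growth rate converts to a multiple of today's volume as (1 + rate) raised to the number of years. The short Python sketch below only illustrates that conversion; the function name `growth_multiple` is hypothetical and is not taken from any study cited here.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Return how many times larger a quantity is after compounding.

    annual_rate is a fraction (0.15 means 15% per year).
    """
    return (1.0 + annual_rate) ** years


# Rates and horizon taken from the paragraph above.
for rate in (0.15, 0.35):
    multiple = growth_multiple(rate, 10)
    print(f"{rate:.0%} CAGR over 10 years -> {multiple:.1f}x")
```

At a constant 35% the ten-year multiple comes out close to 20. The resulting multiple is sensitive to the baseline year and the exact rate assumed, so the figures quoted in the text likely rest on inputs that are not spelled out here.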

In a study conducted by IBM in 2022 that used publicly available data, a comparison of large-scale digital data storage deployments showed that a 10 petabyte Open Compute Project (OCP) Bryce Canyon HDD storage deployment had 5.1 times greater CO2e impact than a comparable enterprise tape storage solution. This was based on a 10-year data retention lifecycle using modern storage methodologies. The energy consumption of HDD over the life cycle, along with the need to refresh the entire environment at Year 5, drives a significant portion of CO2 emissions. The embedded carbon footprint, meanwhile, is 93% lower with tape infrastructure than with the HDD infrastructure.
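
The shape of such a comparison can be sketched in a few lines of Python: lifecycle CO2e is the embedded carbon of each hardware deployment plus the emissions from the energy used over the retention period. This is a minimal sketch and not the model used in the IBM study. The class, the function, the grid intensity, and every number except the 10-year retention period, the Year 5 HDD refresh, and the 93% embedded-carbon difference are placeholder assumptions, so the output will not reproduce the 5.1 times figure.

```python
import math
from dataclasses import dataclass


@dataclass
class StorageSystem:
    name: str
    embedded_kg_co2e: float      # manufacturing emissions of one deployment
    annual_energy_kwh: float     # operational energy per year
    refresh_interval_years: int  # hardware is replaced after this many years


def lifecycle_co2e(system: StorageSystem, retention_years: int,
                   grid_kg_co2e_per_kwh: float) -> float:
    """Embedded carbon of every deployment plus operational emissions."""
    deployments = math.ceil(retention_years / system.refresh_interval_years)
    embedded = deployments * system.embedded_kg_co2e
    operational = (system.annual_energy_kwh * retention_years
                   * grid_kg_co2e_per_kwh)
    return embedded + operational


# PLACEHOLDER inputs for illustration only, not measured values.
hdd = StorageSystem("hdd", embedded_kg_co2e=100.0,
                    annual_energy_kwh=1000.0, refresh_interval_years=5)
# Embedded carbon set 93% lower than HDD, per the text; the energy use and
# the 10-year refresh interval for tape are assumptions.
tape = StorageSystem("tape", embedded_kg_co2e=7.0,
                     annual_energy_kwh=100.0, refresh_interval_years=10)

for system in (hdd, tape):
    total = lifecycle_co2e(system, retention_years=10,
                           grid_kg_co2e_per_kwh=0.4)
    print(f"{system.name}: {total:.0f} kg CO2e over 10 years")
```

Separating embedded from operational emissions makes it easy to see which term dominates once real figures for a given deployment are substituted.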

As the desire to consume more resources continues to grow globally, planning of IT infrastructures will need to begin with a lower carbon footprint expectation. This means starting every project with an assessment of the embedded carbon of the product being deployed, along with an overall sustainability assessment for the entire product lifecycle. Have no doubt that, today and in the future, meeting sustainability requirements will have a real cost in the consumption of resources related to digital infrastructure. In the near future, companies supporting IT infrastructures should be prepared to see carbon footprint impact charges or taxes imposed upon them based on their global impact.

Active archives that include proper tiering of data, placing the right data in the right place at the right time, will be critical to reducing cost and enabling future generations to derive value from the data that’s produced every year. Technologies that greatly reduce carbon emissions from day one will be assessed and used in the data centers of the future at much higher levels than they are today.
