Blogs

Workflow Step One: Archive?

A new paradigm for big media and post-production workflows in a high-resolution world

Ever tried standing on your head to get a new perspective on a problem? It sounds a bit silly, but you never know, it might just do the trick. A few media & entertainment storage architects seem to have begun doing something similar to solve their problem of exploding storage demand, driven by higher resolutions, stereo imagery, and higher frame rates (4K, 5K, 8K, 3D, 48 fps, 60 fps). They are turning their workflow on its head, starting with the last step first.

Software-Defined Active Archive Storage

Active archive is a relatively recent concept, promoted for about a decade, with several commercial products now available. In a nutshell, an active archive solution offers users and applications easy access to archive data via a file interface that is often “mounted” on the user system or application server. Users access archive data much as they would on production systems that expose data through a file system, so the user experience does not change between production and archive systems. Active archives come in many flavors: some are still based on tape with intelligent front-ends, while others extend object storage solutions with file interfaces, via either a gateway or a native implementation.

The Business Value of Combining Disk and Tape for Active Archive

A recently published research paper from analyst group ESG reported that more than 82% of tape-using respondents surveyed anticipate increasing or maintaining their organization’s use of tape technology for long-term data retention. So why does tape remain so popular for archive in general and active archive in particular? The answer lies in the ongoing business value of the technology.

Is the Time Right for an Active Archive?

In an analog world, an archive is where information is put to rest. It is put on the shelf and there it stays, gathering dust. But in a digital world, all information should be online and accessible to satisfy both immediate and long-term needs. This is particularly important as more and more companies seek to extract added value from their legacy data, whether by monetizing it through repurposing or by gaining business intelligence from Big Data analytics.

Clean or dirty, is that really the question?

I recently read “Clicking Clean, How Companies are Creating the Green Internet,” a very interesting report by Greenpeace published this past April. The report reviews the “clean” vs. “dirty” power used by many of the Internet giants, such as Amazon, Google, Apple, eBay and others, to run their vast data centers. It shows what percentage of their power comes from clean sources, such as solar or wind, and what percentage comes from dirty sources, such as coal, gas or nuclear power plants. The report rates each company with grades ranging from “A” to “F” based on its renewable energy efforts.

But regardless of the type of energy used by these companies’ data centers, an even bigger question might be: how do they reduce their energy consumption in the first place?

Why Copies of Data on Disk Alone Is Not a Good Active Archive Strategy

The father of theoretical computer science, Alan Turing, once said, “We can only see a short distance ahead, but we can see plenty there that needs to be done.” The same sentiment holds true in enterprise IT planning, considering that the average company keeps data for 15 years and some data requires indefinite retention. Unstructured data now represents the majority of data being stored, and the problem is exacerbated by the fact that more than 70% of disk capacity is misused[1]. So how do storage managers meet these challenges with decreasing annual budgets, when storage accounts for between 33% and 70% of every dollar spent on IT?

Permanent Active Archives and the Cloud

In many industries, archived data is considered to be the lifeblood of an organization. Broadcast and media, life sciences, oil and gas exploration, and research institutes all create data that must be archived indefinitely. The information they create has significant value to each organization, so preserving that data in an active permanent archive environment makes good sense.

Storing archive data in the cloud, using either a private or a public cloud, is becoming a popular choice, particularly for long-term archives. Private cloud providers typically use an object storage solution that offers self-managing, self-healing technology, automatically recreating data on new media should copies degrade. They continue to work as long as you keep adding new media to the storage pool. Public clouds typically offer similar functionality, and they shift the responsibility for adding more media to a third party. As long as the user pays their monthly bill, data will be preserved forever.

However, due to the demise of some well-known public cloud providers (Iron Mountain Digital and Nirvanix, for example), cloud users are strongly advised by analyst organizations such as Gartner Group to create a cloud exit strategy before signing. Gartner Group published a guide in 2013 called “Devising a Cloud Exit Strategy: Proper Planning Prevents Poor Performance.” In addition, Henry Baltazar, senior analyst at Forrester Research, said, “one of the most significant challenges of cloud storage is the difficulty of moving large amounts of data from a cloud.”

To my knowledge, it has not happened yet, but inevitably a private cloud / object storage vendor will exit the market, stop trading or stop supporting its product at some time in the future. It is therefore equally important to plan for this eventuality. Take, for example, the difficulties and expense facing users of the now “end of life” EMC Centera. A user can expect to pay around $2,000 per TB to migrate data out of Centera if they want to avoid bringing the data back to their primary applications and re-archiving it to a new archive environment.
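To put that per-TB figure in perspective, here is a back-of-the-envelope sketch. The $2,000 per TB rate is the approximate figure quoted above; the archive sizes and the function name `migration_cost_usd` are hypothetical and chosen only for illustration.

```python
def migration_cost_usd(archive_tb: float, usd_per_tb: float = 2000.0) -> float:
    """Estimate the cost of migrating an archive at a flat per-TB rate."""
    return archive_tb * usd_per_tb


# Hypothetical archive sizes, in TB.
for size_tb in (50, 100, 500):
    print(f"{size_tb} TB -> ${migration_cost_usd(size_tb):,.0f}")
```

At this rate the exit cost grows linearly with the archive, which is why planning the exit before the archive grows matters.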

Active archive solutions can help in two ways:

1)      Incorporating an active archive file gateway solution with object storage or cloud separates applications from their archived data, allowing simpler and more efficient migrations to take place under the control of the gateway. From the user’s or application’s perspective, nothing has changed; data is moved from one archive technology to another, in the background, without impacting the user workflow.

Of course, the counterargument is that all we have done is move the risk of the provider ceasing operations from the object storage / cloud provider to the gateway provider. Therefore, the second method is perhaps preferable for permanent archives.

2)      Create hybrid active archives that combine object storage or cloud with a second copy on low-cost media (like tape) in an industry-standard media format (like LTFS).

In this way, for a low additional cost, all archive data is preserved on a long-lasting, application-independent storage medium, which offers a fast and efficient way of getting archive data into a new environment. Tape is very fast at reading and writing large data sets. This removes the need to migrate data OUT of anything; simply stop the cloud service once the data has been written to the new archive.
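The two approaches above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not any vendor's implementation: `InMemoryStore` is a hypothetical stand-in for an object store, a cloud bucket, or an LTFS tape pool, and `ArchiveGateway` is a hypothetical gateway that keeps applications unaware of which store sits behind it.

```python
from typing import Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for an object store, cloud bucket, or LTFS tape pool."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._objects[path] = data

    def get(self, path: str) -> bytes:
        return self._objects[path]

    def paths(self) -> List[str]:
        return list(self._objects)


class ArchiveGateway:
    """Hypothetical file gateway: applications only ever see paths."""

    def __init__(self, primary: InMemoryStore,
                 secondary: Optional[InMemoryStore] = None) -> None:
        self.primary = primary
        self.secondary = secondary  # e.g. the tape copy in a hybrid archive

    def write(self, path: str, data: bytes) -> None:
        # Method 2: every object also lands on the second, low-cost store.
        self.primary.put(path, data)
        if self.secondary is not None:
            self.secondary.put(path, data)

    def read(self, path: str) -> bytes:
        return self.primary.get(path)

    def migrate_primary(self, new_primary: InMemoryStore,
                        source: Optional[InMemoryStore] = None) -> None:
        # Method 1: the gateway moves data behind the scenes. Passing the
        # secondary as the source avoids reading anything out of the old store.
        if source is None:
            source = self.primary
        for path in source.paths():
            new_primary.put(path, source.get(path))
        self.primary = new_primary


cloud = InMemoryStore("cloud")
tape = InMemoryStore("tape")
gateway = ArchiveGateway(primary=cloud, secondary=tape)
gateway.write("/archive/reel_001.mxf", b"frame data")

# The cloud service is being retired: repopulate a new store from tape.
new_store = InMemoryStore("new-object-store")
gateway.migrate_primary(new_store, source=tape)
print(gateway.read("/archive/reel_001.mxf"))
```

The application keeps calling `read` with the same path before and after the migration, which is the point of separating applications from their archived data.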

There are, of course, other considerations for an active permanent archive, such as tape-to-tape migrations that keep data on aging tapes readable, and the use of non-application-specific file formats so that data can be read by new applications in the future, but those are topics for another blog.

It is a Wrap! NAB is Over but the Lessons Continue to Resonate

I recently spoke to a colleague who is a 20+ year veteran of the NAB Show. He has been through the ups and downs of the industry and the show, and he told me that this year he felt an energy at NAB that he has not felt in some time: the aisles were full, and the visitors coming to the booth were more energetic than ever. I am a bit of a “veteran” myself, having attended my first show in 1998. I appreciated my colleague’s feelings and agree with him that the crowd felt alive. At this year’s show, the one common theme on everyone’s mind was archiving: what is the best way to manage, archive and use the data generated by media and entertainment?

Excitement Around Storage

The NAB Show, run by the National Association of Broadcasters (NAB), is the world's largest electronic media show, covering filmed entertainment and the development, management and delivery of content across all mediums. It is also the third-largest annual conference held in Las Vegas, behind the Consumer Electronics Show (CES) and World of Concrete.

This year, attendance jumped to 96,000, comparable to the entire population of a small city. And what an exciting event this turned out to be!