Blogs

The Battle of Backup, Archive and Data Growth

The battle between IT resources and unstructured data is fierce.  Whether the constraint is budget, bandwidth or policy, storing expansive volumes of documents, photos, video and other media, while keeping up with backups and other necessary maintenance, has some IT pros pinned. But new technology and methods of data archiving empower these once-pinned pros to take control of their data and put it in its place.

Pardon me, are you using that?

Sounds like a simple enough question, yet many system administrators do not have the answer when assessing how “active” their data is.  There are many studies from research organizations like Gartner, IDC and others that point to a fact that most administrators know intuitively: much of the data residing on their high-performance primary storage is not being used or accessed.  In addition to the inherent waste associated with this, administrators are being tasked to continuously back up this ever-increasing amount of inactive data using a process that makes no sense: making copy after copy of the same unchanged data while 1) using up CPUs, 2) increasing storage demands (backup and restore staging capacity), 3) creating scheduling nightmares and 4) increasing the administrative burden.
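
One rough way to answer the "are you using that?" question is to total up how many bytes under a directory have not been read recently. The sketch below is an illustration only, not a tool from any vendor mentioned here: the function name `inactive_share` and the 30-day default are assumptions, and it relies on file access times, which many systems update lazily or not at all (for example with `relatime` or `noatime` mount options), so the result is an estimate at best.

```python
import os
import stat
import time


def inactive_share(root, days=30, now=None):
    """Return (inactive_bytes, total_bytes) for regular files under root.

    A file counts as inactive when its last-access time (st_atime) is
    older than `days` days. Access times are only as reliable as the
    file system's mount options allow, so treat the result as an estimate.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    inactive = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets and other special files
            total += st.st_size
            if st.st_atime < cutoff:
                inactive += st.st_size
    return inactive, total
```

Calling `inactive_share("/some/share")` on a hypothetical share returns two byte counts; dividing the first by the second gives the fraction of capacity that has gone untouched for the chosen window.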

Managing the Flood of Data and the Advancements in Genomics

It has become axiomatic that data of all types is growing at mind-numbing rates across all industries. According to Google's Eric Schmidt, as much data is now created every two days as was created from "the dawn of civilization until 2003".  This is not news. Anyone managing an IT infrastructure today is painfully aware that the growth of data is relentless, and shows no sign of letting up.

The LTFS Tape Revolution is Underway

Tape is alive, especially in the archive. With the phase out of optical discs as an alternative for long-term storage and the adoption of Linear Tape File System (LTFS) technology, tape offers more benefits for long-term data archives than ever before.  

As companies evaluate their strategy for tiering data, there’s a never-ending struggle between the need for large-scale unstructured data storage and static budgets for many IT departments. While disk supports rapid queries and access, it can cost as much as 15 times more than tape. Considering the environmental conditions and operational costs of a disk library, it isn’t surprising that IT pros are turning to tape for long-term storage needs.

Data partitioning for fast retrieval

Blurring the Lines

Recently there have been a number of product announcements from backup vendors stating the virtues of using backup applications for archiving. One of the main reasons the Active Archive Alliance was formed was to better educate organizations about archiving and to explain why an Active Archive is superior to backup for archiving.

Let’s look at the ways an Active Archive secures data for medium- to long-term retention periods, compared to using a backup product for archiving. 

Active Archive: Flexibility, Performance, Affordability, Ease of Use

 

For many years during the “tape wars” era, as I’ve come to call it, when most major non-tape vendors were attacking the technology, companies like Spectra Logic often found themselves on the defensive. If you follow any of the dialogues on LinkedIn or other forums, there has been a common theme that the only value proposition tape offers is cost. There is also opposition to this opinion, thus creating what Chris Mellor at The Register describes as a “religious war” between technologies. Instead of firing another arrow amidst that war, I’d like to take a step back and look at why active archives are resonating so well with resellers and customers alike: flexibility, performance, affordability and ease of use.

Active archives combine the best advantages of many technologies, which is why software, tape and disk vendors alike are joining the growing movement. With the data volume and retention requirements of most archives, tape technologies provide some key benefits and are one of the big reasons why active archives are so appealing. However, tape is only one piece of a well-architected active archive. Historically, archives, and especially archives to tape, have had a reputation for being hard to use, cumbersome and unreliable: in other words, a headache. Today, archives to tape are the healer of headaches, not the creator. These issues are not inherent to tape; rather, they are symptoms of problems that customers need to address. The words data migration, media format changes, full restores, lost data, and backup failures all evoke negative connotations at best. The true culprit of these pains is the data management process, or lack thereof, which active archives address with both short- and long-term solutions.

When it comes to infrastructure architecture, flexibility and performance are king, with cost as the regulator. These are the benefits that an active archive delivers, by offering a new approach to data management rather than simply an updated single product with new features. Active archives offer storage and archival features that can be tailored to each organization, ensuring that the short-term storage and long-term retention needs specific to that organization’s data are met. This is possible because active archive is not a single product being promoted by a single vendor or even a single market.

Active Archives as NAS

It’s understandable that people initially mistake active archive for storage tiering or HSM. However, it’s much more than that. Active archives allow any storage medium to be used as NAS storage, in the form of a CIFS or NFS share. When combined with open formats, this allows a company to architect its systems in a vendor-agnostic way, using the most appropriate product for its specific needs. Migration is no longer a major undertaking, but simply a hardware upgrade and adjustment of policies. If a technology becomes obsolete, the data can be easily migrated to a new system. If a vendor goes out of business, the data is likewise not compromised. And much to everyone’s relief, as technology evolves, data can be automatically moved onto new platforms. This prevents the nightmare of realizing that you have large amounts of data sitting on obsolete equipment or formats, because active archives proactively migrate data onto newer equipment as the system changes over time.

Performance seems to be where we all get snagged. Is tape slow, or is it fast? Random access, linear access--there are many ways to represent any technology as fast or slow. Active archive properly sidesteps the I/O battle. It simply takes advantage of the equipment implemented and the policies for where and how data is retained. Regardless of where data resides, it is accessible. This doesn’t mean that you should move transactional, high performance data immediately to tape. It means you can set realistic expectations for data retrieval times, and at no point does an IT administrator have to manually restore data to get it back, provided it is at least in a library or connected via a WAN. The performance of the system is left to the storage devices implemented and the policies around the data management application.  SSD and high performance disk should be a part of a well-designed active archive to meet performance needs.

With performance and flexibility addressed, we move on to cost and ease of use. The active archive advocates, of whom I am one, have hit hard on the cost advantages of an active archive. For today, I’ll briefly note that an active archive is simply less expensive than other strategies, both in capital expense and operational expense, because it uses less expensive storage platforms for data that would traditionally have resided on higher cost systems to maintain accessibility.

All that remains is ease of use. To the user, active archive is an automated system; all files are accessible; older or archived data simply takes a little longer to retrieve. From the administrator’s perspective, active archives need to be properly set up and tuned to optimize performance for the environment. However, once configured, the administrator is not needed for file retrieval, and can easily set up a migration without having to take the system down or spend weeks on planning. Also, in a disaster, data is accessible over a WAN if it’s stored on a live DR site, or can be rebuilt from offsite tapes. In the event that a single system goes down, retrieval is simply dependent on the performance of the device that holds the second copy. 

IT administrators can manage their data more easily and in less time, instead of allowing their data problems to manage them. Components can be upgraded or replaced without overhauling the entire active archive. Thus, active archives are flexible, perform well, are cost effective, and are much easier to administer than other storage strategies—no outsourcing required.

The Cost Advantage of Tape

 

In my previous blog posts, I talked about the advantages of data tape in terms of reliability and capacity and how tape plays a crucial role in supporting active archive systems. While these factors are key considerations in the merits of leveraging tape for tier 3 storage, cost effectiveness is a major factor that favors tape as well.

Leading analysts project that organizations will need to grow their data storage capacity dramatically in the coming years as a result of the explosion of unstructured file data, regulatory compliance and the need to keep data for longer periods in active archive mode. The numbers vary, but the consensus is around 50% data growth annually (there’s no recession in data creation!). Yet IT budgets are barely increasing, so close attention is being paid to storage-related investments that can consume significant portions of CAPEX budgets. With operating costs multiplying the acquisition cost several times over, it becomes clear that storing all data on disk drives is cost prohibitive from both an acquisition and an operations point of view.
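
To put the 50% figure in perspective, compound growth can be worked out in a couple of lines. This is only an arithmetic sketch: the 100 TB starting point and the five-year horizon are assumptions chosen for illustration, not numbers from any analyst report.

```python
def projected_capacity(current_tb, annual_growth, years):
    """Capacity needed after `years` of compound growth at `annual_growth`."""
    return current_tb * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Hypothetical shop starting at 100 TB and growing 50% per year.
    for year in range(6):
        print(year, round(projected_capacity(100, 0.50, year), 1))
```

At 50% a year, capacity grows by a factor of about 7.6 in five years (1.5 to the fifth power is 7.59375), which is the gap a flat budget has to cover.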

In addition, the price of electricity to power and cool disk storage continues to climb, and some areas of the power grid are already overtaxed, creating the problem of simply supplying the needed power to data centers.

TCO studies from leading analysts show tape systems cost less than disk. The Enterprise Strategy Group reported a 2-4X cost advantage in backup applications using LTO-5 compared to disk with de-duplication.

For long term archiving, The Clipper Group did a detailed TCO study and reported a 15X cost advantage for LTO-5 tape vs. disk in an archiving application over a 12-year period.

In yet another recent TCO study done by the Information Storage Industry Consortium, disk system acquisition prices turn out to be 9X more than the equivalent tape system for 500TB of storage over a five-year period.

When it comes to power consumption, tape is far greener than disk and this is where the real cost savings are to be found. The TCO study from The Clipper Group shows that disk consumes at least 238X more power than tape as data on tape consumes little or no energy, and tape does not require the significant energy associated with cooling spinning disks. In fact, Clipper showed that the cost of powering the disk solution over the 12-year period is the same as the cost of an entire tape solution including hardware, media and power!

Undoubtedly flash and disk technologies play a critical role in active archive systems for certain data applications, for example where rapid access time is important. But once again, studies show that anywhere from 60 to 90% of data is rarely accessed after 30 days. So it makes sense to move data from more expensive tiers of storage to the more cost-effective tape tier. And with active archive systems, the data remains accessible. Data storage that is efficient and always available – it’s the best of both worlds! 

The Smart First Step For Big Data

You’ve probably heard the buzz about Big Data, and stories of data mining technologies (e.g. Hadoop-based solutions) that take business analytics to the HPC level, making it possible to make sense of massive quantities of unstructured data.  And you may have heard the buzz from storage vendors about how their highly scalable disk platforms enable all this number crunching. 

But if all this data has reached a magnitude to now be branded Big Data, does it make sense to keep it all on disk? And what about long-term storage of the source data that not only drives the analysis today, but is likely to be needed again at some point in the future? How many spinning disks does it take to store all the interactions of consumers on the Web or to store years of satellite images at a half-meter GSD resolution? 

The unsurprising answer is that Big Data requires scalable active archives to go along with the scalable disk storage systems.  Big Data does not live on disk alone.  With a lower-cost, long-term medium like tape and intelligent software, active archives can deliver large data sets as needed, when needed, to the high-performance disk storage systems for the intense number crunching. And when the analysis is complete, an active archive can preserve the results for the future. That’s a no-brainer.

But another significant characteristic of Big Data is that much of the source data is fixed content—data that never changes.  Consider transaction log files from a bank, satellite images in weather research and raw footage from the movie set.  These files are fixed content from the moment they are created, and when handled properly will never be modified. As fixed content, these files should be preserved in an archive not just to conserve disk space, but because they are irreproducible.

So my advice to those responsible for managing Big Data is to do what a number of Atempo customers are doing with their raw data today. First, archive all your raw data sets onto a low-cost medium like tape as soon as they’re created, capturing and indexing the relevant metadata so you can search and retrieve it later. Make a second copy and send it offsite while you’re at it.

Then, if the data sets aren’t needed right away, remove them from the high-performance storage. When needed for analysis, the data sets can be retrieved quickly and easily from the active archive through the file system or via search. Your raw data sets will be secure and immediately available without crowding your expensive high-performance disk systems.
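
The "capture and index the metadata" step can be as simple as writing a manifest before the data set leaves high-performance storage. The sketch below is a generic illustration and not how Atempo or any other product does it: the function names and the JSON layout are assumptions, and a real archive would typically record far richer, domain-specific metadata.

```python
import hashlib
import json
import os


def manifest_entry(path, chunk_size=1 << 20):
    """Describe one file: path, size, modification time and SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large raw data files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    st = os.stat(path)
    return {
        "path": path,
        "bytes": st.st_size,
        "mtime": int(st.st_mtime),
        "sha256": digest.hexdigest(),
    }


def build_manifest(root):
    """Collect a manifest entry for every file under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            entries.append(manifest_entry(os.path.join(dirpath, name)))
    return entries


def write_manifest(root, out_path):
    """Write the manifest as JSON; keep out_path outside root."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(build_manifest(root), fh, indent=2)
```

Because fixed content never changes, the stored digest doubles as an integrity check: recomputing it after a later retrieval should give the same value.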

Archiving raw data sets is only one way that folks managing Big Data environments can reduce their Big Data management headaches.  At almost every step in analytical workflows there are opportunities to manage data better through active archiving. By taking that first step of archiving raw data sets, you’ll get your Big Data strategy off on the right foot.

Note: Image from http://flowingdata.com/2010/08/17/stacked-area-shows-the-web-is-dead/

The Data Armageddon: Time to Learn What You Don't Know

 

When Thomas Gray inked the phrase, "Ignorance is Bliss, 'tis folly to be wise," I don’t think he considered how best to manage data in our present-day data Armageddon.   If you are a data manager and you adhere to the "ignorance is bliss" school of thought, I would recommend that you refresh your resume immediately!

I have spoken with too many people who have no idea of what is to come concerning the world’s rapidly and exponentially growing data.  Believe it or not, I talked to a person at the Supercomputing show in Seattle who said his organization is actually moving all its data to disk and neglecting the tremendous, inherent values and benefits (low cost, high capacity and performance, to name a few) of tape.  As their data doubles each year, which he said it does, the plan is to continue adding more disk... Really?  In his case, I believe he really thinks ignorance is bliss.  I offered to share with him how customers with hundreds of terabytes to hundreds of petabytes are managing data with intelligent file systems and using both tape and disk in cost-efficient ways, and he refused to listen because his ignorance has caused him to believe that "tape is dead".  Granted, I don’t hear this very often anymore because the HPC community, as a whole, is paving the way for a cost-effective tape-based storage concept we will discuss later, called "Active Archive".  

First, I want to address the ignorance of the individuals who have sipped the "tape is dead" Kool-Aid from certain disk vendors over the past 10 years. Growing up as a teenager in the great state of Texas, I listened to AM radio in my first pickup truck.  (Yes, all it had was an AM radio!)  One of my favorite radio personalities was Mr. Earl Pitts, who would open each controversial topic with his straightforward opinion by saying (insert Texas accent) "Ya know what makes me sick, you know what makes me so angry I could spit?" or something along those lines.  (http://www.youtube.com/watch?v=4DDhrRooNp4)  Then he would talk about something that was usually contrary to the American way, since he was a patriot who was always watching out for our true, red-blooded American values.  Well, I feel sort of like Earl when someone tells me that they think tape is of no value, which simply shows their ignorance.  I want to say “You know what makes me sick, you know what makes me so angry I could spit?” Ignorance!  He would always end his lesson on values and truth by saying “Wake up America!”  Well, when someone tells me “tape is dead”, I want to grab them, shake them and say “Wake up!”

The reality today, regarding data storage, is that it is not folly to be wise and it is not bliss to be ignorant.  Wake up, Storage Admins!  I have to admit that most of the people I talk to around the country at trade shows, in meetings, etc., are awake and aware of the ever-present danger of data explosion.  So, needless to say, my blood pressure stays in check and I don’t get angry as often.  I try to keep things in perspective and just assume that the rest simply don’t know what they don’t know. 

My job, and that of my colleagues, both at Spectra and within the tape industry overall, is to educate as many people as possible about how to reduce the cost, complexity and fear of managing exponentially growing data.  Spectra is leading the charge to create an awareness of how valuable tape can now be in the data center.  Tape is no longer used just for backup.  It was great to see so many of our HPC customers at SC11, most of whom don’t even use the terminology of “backup” any longer.  As tape continued to mature over the last 10 years by getting 700% more reliable, faster and more dense, many of our HPC customers started leveraging the benefits of tape in what we call an “Active Archive”.  In other words, they are using tape as disk.  An active archive is a combination of open system applications, varying types of disk, and tape hardware that intelligently monitors and migrates data across multiple storage devices while maintaining fast user accessibility.  Traditionally, in the backup world, one could only access tapes and the data on them through a proprietary backup application such as NetBackup, Legato, Commvault, etc.  I’m not advocating that corporations discontinue backups altogether, because one should always have a “second” copy of data in the event of a disaster.  However, the premise of an active archive is that all data can be online all the time. 

Obviously, when someone has hundreds of terabytes or even petabytes, it is cost prohibitive to try and keep all data online all the time in the traditional way of keeping it all on primary or secondary disk.  With an active archive file system, the data can be dynamically distributed across multiple storage platforms including disk and tape.  Policies can determine where data is at any given time and it is transparent to the end user where that might be.  They simply have a drive letter and directory with all their files as normal.  Nothing proprietary about access to their data—anytime they need it.  By extending a file system across high performing disk, capacity disk and now tape, the need for IT intervention to retrieve an archived file is minimized, if not eliminated.  This data management approach is being used by many of our HPC customers and they are benefiting tremendously by having a searchable, compliant format to store data for the total lifecycle of a file based on policies, industry regulations and laws.
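
To make "policies can determine where data is" a little more concrete, here is a toy placement rule that picks a tier from how long a file has been idle. It is a sketch under stated assumptions only: the tier names, the 30- and 180-day thresholds and the `choose_tier` function are all hypothetical, and real active archive software applies much richer policies (file type, project, compliance rules and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    max_idle_days: float  # rule applies while idle time is below this value
    tier: str


# Hypothetical policy, ordered from the fastest tier to the cheapest.
POLICY = (
    Rule(30, "performance-disk"),
    Rule(180, "capacity-disk"),
    Rule(float("inf"), "tape"),
)


def choose_tier(idle_days, policy=POLICY):
    """Return the first tier whose idle-time limit has not been reached."""
    for rule in policy:
        if idle_days < rule.max_idle_days:
            return rule.tier
    return policy[-1].tier  # fall back to the coldest tier
```

The user never sees this decision. The file keeps the same path in the share, and only the retrieval time changes with the tier that currently holds it.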

I could go on about the benefits of active archive or the inherent values that are characteristic of the tape technologies of today, but I would rather provide some links to more information on both so you can continue your own research and put aside any tendencies you might have to subscribe to the “ignorance is bliss” philosophy!  Tape is here to stay and is poised to solve your storage headaches today and in the future by offering greater efficiency, better reliability and maximum performance. So wake up!  Data Armageddon: tape’s got this one.

Active Archiving and LTO tape DRIVES the Cloud

 

It is interesting to see all the different ways that LTO tape-based solutions have been used in conjunction with best-of-breed active archive solutions, while remaining below most people’s radar screens.  Thanks to the Active Archive Alliance, we are now bringing recognition to these innovative approaches, which will only help to further promote the benefits of active archiving and tape.  Below I discuss how LTO tape and active archiving are enabling an emerging solution for cloud-based medical record and image archiving.

Telepaxx Medical Archiving is one of the leading vendors in Europe of PACS vendor neutral archiving (VNA).  For over 10 years, they have provided DICOM based cloud storage for healthcare customers in need of long term storage of medical images.  Recently, they teamed with GRAU DATA, an Active Archive Alliance Contributing Member, to complement their cloud offering by providing a gateway for file based archiving into the Telepaxx cloud.  This allows file based solutions like content management, email archiving and Electronic Health Records to be stored in the same infrastructure as DICOM based medical images.  Now healthcare customers have a single consolidated archive for all their enterprise data, in a secure cloud based archive.

The success of this solution over the years has rested on Telepaxx’s ability to store data in a secure, private and cost-effective manner.  The key attributes of LTO tape have made that possible:

Removability – Allows extra copies of encrypted images to be stored in secure off-site vaults.

Customer Privacy – Individual tapes for each customer ensure that each medical institution’s data is maintained separately.

Low power consumption – Facilitates managing multiple petabytes of images with a small energy footprint.

Future proof – Medical images are stored for extended time periods, longer than the usable lifetime of any storage technology.  LTO’s roadmap to higher tape capacities and performance, coupled with an automated forward migration capability, means the image is preserved for its usable life on the most recent generation of LTO.

Cost Effective – The lowest dollar per GB of any storage technology enabling customers to utilize cloud storage at the lowest possible cost. 

High Scalability – Provides a small footprint in a data center while scaling into the multiple petabytes range.

Capacity on demand – Facilitates capacity expansion with minimal hardware investment.

All of the above attributes of LTO technology, coupled with active archive solutions, have enabled Telepaxx to offer such unique capabilities to its customer base.  We are sure that more companies will offer tape-based cloud services because of these benefits.  Telepaxx’s successful deployment of this strategy for over 10 years has demonstrated that an archive can outlive the usable life of an individual storage technology, without impact to the customer or applications using the storage.  Only the combination of active archive with LTO tape can provide the benefits of a cost-effective, future-proofed cloud storage solution.