Blogs

Should You Be Archiving? Here’s How to Know for Sure

By David Cerf

Active archiving is centered on the convergence of various storage technologies to create a balanced, accessible and affordable method for storing data long term. But what if you don’t know how much of your data belongs in this “archive” category? The truth is: the seemingly easiest method of “use what you have, fill it up and buy more” gets unnecessarily expensive. This is especially true if you’re using a high-performance storage array for data that doesn’t need those performance capabilities.

While each company may define archive differently, here are a few common criteria for “archival” data:

  • Data is not accessed regularly
  • Data was created more than 1 year ago and has not been accessed in more than 90 days
  • Data has not been modified or accessed in more than 90 days
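
The last two criteria can be checked directly against filesystem metadata. As a minimal sketch in Python (assuming access times are tracked at all, which they are not on systems mounted with noatime, and using an illustrative 90-day threshold):

```python
import os
import time

NINETY_DAYS = 90 * 24 * 3600  # threshold in seconds

def is_archive_candidate(path, now=None):
    """True if the file was neither accessed nor modified in 90+ days."""
    now = now if now is not None else time.time()
    st = os.stat(path)
    return (now - st.st_atime > NINETY_DAYS
            and now - st.st_mtime > NINETY_DAYS)

def find_archive_candidates(root):
    """Walk a directory tree and yield paths meeting the criterion."""
    for dirpath, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_archive_candidate(path):
                yield path
```

A real assessment tool would also aggregate file sizes per candidate to estimate how much capacity an archive tier could reclaim.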

Sometimes companies simply don’t know how much data they have that should be archived. Fortunately, there are free tools like the storage assessment at www.freemystorage.com that can help you answer this question by showing dynamic reports and transparent views into the state of your storage environments.

With detailed results on the active state of storage and the file make-up of active/inactive data, assessment tools can help organizations understand how to free up their storage and avoid unnecessary upgrades and over-provisioning. They can help organizations save thousands of dollars each year and reduce operating and capital expenses.

Why should end users think differently about their storage?

The Active Archive Alliance thinks there is a better option to meet the growing demand for storage. Data is growing faster than budgets, driving a need for more cost-effective storage and protection. While high-performance storage keeps line-of-business data and applications quickly available, up to 80% of data on primary storage is often inactive and doesn’t need to claim expensive storage capacity. Storage needs to be intelligently balanced between active and inactive data, on-site and off, and must meet both performance requirements and budgets. Active archiving can deliver intelligent storage management that combines tunable, user-defined policies for capacity optimization (no more over-provisioning) and will:

  • Simplify nearline and archival storage management
  • Free up storage capacity
  • Reduce backup volumes and windows
  • Simplify data protection
  • Protect long-term content
  • Reduce storage costs by over 50%

Discover the benefits of reduced primary storage cost with automated data movement into an active archive solution that best fits your needs. Be sure to choose one that can transparently move files to your archive from primary storage like NetApp, Windows, Isilon and any CIFS/NFS or object storage.

Get started at FreeMyStorage.com for a free assessment to help you take control of your storage. 

The Impact of Object Storage on Active Archiving and on Tape Usage

By Rich Gadomski

Object storage has made great strides in the decade since EMC released its Centera system. Since then, numerous other vendors have brought their own object storage systems to the market. Object storage is great for storing unstructured data, since it separates the metadata from the data so the storage system isn’t dependent upon the particular file system or block storage structure. Additionally, administrators do not have to worry about matters such as setting RAID levels or building and managing logical volumes. Lastly, from an integration perspective, object storage is a good platform for archiving because it is massively scalable, cost effective and able to act as a cloud infrastructure for collaboration.

However, there was one major problem in using object storage for archiving – at least until recently.

“You can't do it -- get object storage taped, I mean,” wrote Chris Mellor, storage editor at The Register in March 2012. “There is no way to get the contents of an object storage system onto tape. Instead, it has to stay on spinning disk forever.”

And since it had to stay on spinning disk, this meant continually buying more storage arrays, as well as laying out all the support, networking, licensing, power and cooling needed to keep those disks spinning.

“As the amount of data to be stored grows and grows, tape will become the lowest-cost option,” wrote Mellor. “For high-volume data archive capacities, disk economics suck, and it’s no use pretending data deduplication and thin provisioning can change that. … What is needed is a way to drain off cold, inactive objects from disk and stuff them into a tape archive. Isn't it obvious?”

Well, three years is a long time in IT, and apparently, tape storage vendors did think it was obvious that they should support object storage. A year after Mellor wrote his plea, various storage vendors, including many Active Archive Alliance members, began releasing tape systems and tools that make object storage feasible in a tape environment.

A good example of this is Fujifilm’s Dternity NAS, which allows for both file and object storage on LTFS tape media in its active archive solution. By utilizing a standard S3 interface with an underlying RESTful API, cloud storage users can connect to Dternity directly without needing to program special calls or APIs. Active archives managed by Dternity NAS are easily accessible by CIFS/NFS or S3.

The bottom line is that tape is a viable place for object storage. This opens the door to massively scalable object stores comprising billions of graphical images, for example. Not only is it possible to achieve this, but doing it on tape, which recently demonstrated 220 TB on Barium Ferrite media, means that it can now happen in a cost-effective manner.

Activating an Active Archive with MAM

By David Miller

The flood of information into modern organizations will not ebb any time soon. Connected workers and connected devices link staff and customers 24/7. Smart phones and tablets make producing, sharing, and consuming rich content easier than ever. Professional camera formats have increased in resolution from SD to HD to 4K and beyond. 

In the absence of a better method, networks, storage, and processors will be tied up transferring and transforming this content from office to office, country to country, between organizations and their audiences. The obvious challenge for any organization trying to stay afloat is to optimize storage and infrastructure to control costs.

The end game for cost control is to store content with the fewest copies, the greatest long-term security and the lowest ongoing operating costs: an LTO active archive system. Depending on the economies of scale for a centralized archive, it may be cheaper to have localized archive pools near the locations where the content is generated and commonly used.

The challenge? Linking the archives together and providing universal access.

The obvious challenges to maximizing the use of archived storage are the ease of use of the localized/desktop system; the demanding nature of non-linear video editing systems and users, which require rapid access to large video files with low latency; and the difficulty, or simple inertia, of moving current content to the archive. Localized primary storage is the default for the infrastructure and mindset of most users, so any active archive must offer intuitive access to content in order to be successful. Proprietary storage systems for collaborative video editing are very expensive, yet editors and IT staff are loath to archive content for fear of taking media for other current productions offline. Finally, time-challenged staff are unlikely to move content from primary storage to archive without easy tools and systems to do so.

Enter the management system.  

Storing proxy copies with rapid and universal access from a simple interface (à la YouTube) with simple archive and restore functionality allows organizations to maximize the benefit of their active archive.

  1. A desktop browser interface and/or tablet interface allows users immediate access with minimal IT support. They can maximize productivity if they can search and re-task all of the content in the centralized archive.
  2. If the system can track all of the media in editing projects, then all of the elements in finished products can be archived while those elements still in use in other productions can be maintained in the editing storage.
  3. Simple archive and restore tools for individual digital assets or entire groups will allow users to quickly and efficiently archive content. Automated processes can move long unused content into the archive while maintaining access for preview, collaboration and restore functionalities if needed.

 

Further advantages can be accrued if the federated search and browse can allow access and control over archived content in multiple locations. Content can be archived near where it is produced or where it is most likely to be needed in the future. This saves time and money for moving content to the centralized archive and then moving it back to a different location when needed.

Nimble access to content is the key to enjoying the benefits of an active archive system.

How to Avoid 6 Costly Cloud Storage Snafus with a High-Powered Active Archive

By David Cerf

Storage requirements are growing at 40% year over year, and that’s a top concern for 1 in 4 CTOs. It's nearly impossible to talk about modernizing IT without mentioning the cloud. Among users, cloud storage solutions are as ubiquitous as iPhones, but for enterprise IT, the cloud presents unique challenges. Relying on pay-for-play cloud services can send costs skyrocketing. Sure, cloud storage capacity is cheap, but the service fees pile on as the need to access data arises. Shouldn’t your data remain just that – your data?

Instead of letting cloud services evaporate your IT budget, here are six ways to avoid the risk, high costs and complexity of typical cloud storage by deploying an active archive instead.

  1. Know the true costs. Not all storage is alike. Cloud storage includes the cost per GB as well as transfer fees, access fees and other fees to get your data back. Hybrid active archive solutions can slash costs by 70%. With no hidden charges, it's easy to calculate costs and meet budgets today and in the future.
  2. Choose non-proprietary solutions. Make sure that you can get your data securely whenever you need it. Technology continues to evolve as do your business needs. Pick a vendor that does not lock up your data with proprietary systems. Active archive solutions that use LTFS tape can give you peace of mind in knowing that your data is always readable and recoverable.
  3. Ensure your data is safe and secure. Don't be the next company making headlines for the wrong reasons. By maintaining a clear chain-of-custody, you'll bypass painfully long retrieval times associated with bandwidth limits. Hybrid active archive solutions can provide storage onsite and offsite to seamlessly work with your current applications and processes.
  4. Read the fine print. Check your vendor's SLA and make sure you understand how your data is protected, secure and accessible.
  5. Simple is always better – Occam's razor. Your solution needs the intelligence to ensure that your data never changes or goes missing. An easy-to-use web interface for accessing data is also key.
  6. Leverage what you already own. Gateway technologies in combination with your existing storage can deliver rapid ROI. Look for vendors that can deliver online, nearline and archive in a seamless solution.
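
The arithmetic behind point 1 is easy to sketch. The model below is purely illustrative (the rates are placeholders, not any vendor's actual pricing), but it shows how retrieval fees, rather than capacity, come to dominate cloud bills as access grows:

```python
def annual_cloud_cost(tb_stored, tb_retrieved_per_year,
                      storage_per_gb_month=0.01,  # placeholder rate
                      egress_per_gb=0.09):        # placeholder rate
    """Capacity charges plus per-GB retrieval (egress) fees."""
    gb_stored = tb_stored * 1024
    storage = gb_stored * storage_per_gb_month * 12
    egress = tb_retrieved_per_year * 1024 * egress_per_gb
    return storage + egress

# At these rates, 100 TB stored with 20 TB retrieved per year costs
# 12,288 + 1,843.20 = 14,131.20 per year; every additional terabyte
# retrieved adds about 92 regardless of how little is stored.
```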

Many active archive solutions have incorporated S3 interfaces, allowing these solutions to serve as a simple target for offloading data from a cloud service such as Amazon Simple Storage Services. Because many of these active archive solutions employ hybrid storage architectures with technologies like LTFS tape, they can be intrinsically protected. Instead of offloading to disk alone, users can leverage a system that will provide automatic data protection at the most economical cost per gigabyte available.

Before you consider “inexpensive” cloud storage for your data preservation, be sure to check out the offerings from active archive solutions. The ability to take new technology combined with familiar storage methods can be even more effective than simply scuttling data storage off to a cloud service provider where you might lose access, risk security, or pay more than you should. 

The Decision Tree for Archiving Data

by Dave Thomson

Many users understand they have a need to archive data for compliance reasons, to improve data preservation or to reduce storage costs. Beyond these base user requirements, you have your own specific environment that an archive must support. For example:

  • Total capacity
  • New capacity per day
  • Smallest/average/largest file size
  • Average file age and last retrievals dates
  • Estimated retrievals per day (and type of retrieval – single file or file sets)
  • Existing archived data (technology and formats used)
  • Redundancy requirements
  • Plan for archive migration

Follow our decision tree for data archiving
In all circumstances, we follow a decision tree that provides you with the best and most economical solution for your individual circumstances. A variety of data storage technologies are available including tape, disk, object storage and cloud storage.
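
As a toy illustration of such a decision tree (the thresholds here are invented for the example; a real assessment weighs all of the criteria listed above):

```python
def suggest_archive_tier(total_tb, retrievals_per_day, needs_offsite_copy):
    """Map a few of the criteria above to a storage technology.
    Thresholds are illustrative, not recommendations."""
    if retrievals_per_day > 100:
        return "object storage disk"   # frequent, latency-sensitive access
    if total_tb > 200:
        return "LTFS tape library"     # large, mostly cold capacity
    if needs_offsite_copy:
        return "cloud archive"         # offsite redundancy built in
    return "nearline disk"
```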

This blog was excerpted from Around the Storage Block.

Back to Basics

David Thomson – SVP Sales and Marketing – QStar Technologies

It has been five years since a group of companies came together to form the Active Archive Alliance. That group agreed that the term “archive” was regularly being misused, often to represent retaining backups for long periods of time. We saw archive in a different way: as a process separate from backup, one that secures non-changing data while keeping it available to the user or application that created it.

Today it seems that although many organizations understand this message, many more do not. I am still perplexed when IT staff fail to understand the significant advantages of using active archive technology. This inspired me to write this blog and to restate the benefits of using active archive, and what it means in 2015.

How much data within an organization is static or unchanging? For many organizations, it is a significant percentage, and there are simple, sometimes free, tools to help users understand how much data we are talking about. 

We do not archive changing or evolving data; that data is secured using RAID, replication, snapshots and backup, all of which are expensive and possibly time-consuming tasks. Most of the time involved is dedicated to ensuring that the processes are working correctly and that, if something fails, data is recoverable and not lost.

Archiving is about securing unchanging data in a different way. As data is ingested into the archive, it is written to multiple places or media. Should one site or medium fail, there is always a second and sometimes a third place from which to access the data. This could be an automatic switch to a second repository or require manual intervention; the choice is left to the organization based on its budget and minimum response times.
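
The write-to-multiple-places ingest policy can be sketched in a few lines. This is a simplified illustration using plain directories as stand-ins for archive repositories; real archive software adds checksums, catalogs and media management:

```python
import os
import shutil

def ingest(src, repositories):
    """Copy an object into every repository so one failure still
    leaves at least one good copy."""
    copies = []
    for repo in repositories:
        dst = os.path.join(repo, os.path.basename(src))
        shutil.copy2(src, dst)
        copies.append(dst)
    return copies

def retrieve(name, repositories):
    """Read from the first repository that still holds the object,
    falling back automatically to the next (the 'automatic switch')."""
    for repo in repositories:
        path = os.path.join(repo, name)
        if os.path.exists(path):
            return path
    raise FileNotFoundError(name)
```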

By relocating significant amounts of data that is unchanging into an archive environment, primary data sets of constantly changing information can be more easily and cost effectively protected. Backup windows are reduced, the replicated capacities are reduced and the frequency of snapshots could be increased.
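
The backup-window effect is simple arithmetic. Assuming a sustained backup throughput and that the archived share drops out of the full-backup set entirely (both simplifications):

```python
def backup_window_hours(total_tb, inactive_fraction, throughput_mb_s=500):
    """Full-backup duration before and after archiving inactive data."""
    def hours(tb):
        return tb * 1024 * 1024 / throughput_mb_s / 3600
    active_tb = total_tb * (1 - inactive_fraction)
    return hours(total_tb), hours(active_tb)

# At 500 MB/s, a 100 TB full backup takes about 58 hours; archive the
# 80% that is inactive and the same backup finishes in under 12 hours.
```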

Active archives can be as fast or slow, as expensive or low-cost as an organization needs.  You are not forced to use tape libraries, although many organizations do, due to their low total cost of ownership. Many active archives use SSD, disk, optical and/or cloud to store and secure data. It all depends on the individual requirements of the organization and the static data they are archiving.

If architected correctly, active archives can benefit the entire organization by categorizing data and protecting it using the most economical methods for that data type.

How to Implement an Active Archive for HPC Research

By Eric Polet

The world of high performance computing requires ever-present data accessibility along with scalable capacity. The data management required for computational and data-intensive methods in a high-performance infrastructure can create unique challenges for those tasked with maintaining data and ensuring its long-term integrity and availability. Research and high performance computing (HPC) sites face the challenge of retaining the ever-growing amount of data being generated by employees and computers. The data’s value extends far beyond what can be gained from it today. To remain useful, information needs to be kept for decades for reexamination when future advancements are achieved.

HPC requires active archive solutions that are, among other things, reliable, scalable, cost-effective and energy efficient. As data volumes grow, it’s imperative that new solutions can sustain the organization’s anticipated data growth while seamlessly replacing legacy equipment. The National Computational Infrastructure (NCI), home of the Southern Hemisphere’s fastest supercomputer and Australia’s highest performance research cloud, was facing this data growth problem. NCI’s supercomputer supports more than 5PB of data that must be backed up and archived. NCI faced significant forecasted growth and wanted to implement an updated, single archive solution. This goal was achieved with an active archive solution created by Spectra Logic and SGI.

How did they do it?

NCI selected an active archive approach to manage its data, which has proven to operate flawlessly. Active archive solutions turn offline archives into visible, accessible extensions of online storage systems, enabling fast and easy access to archived data. “The incorporation of an active archive solution provides a platform for storage growth,” said NCI associate director Allan Williams. “It allows us to keep our primary data online and accessible to users, while also increasing the reliability of our stored data across physical sites.” The organization is able to easily scale its storage solution as its data continues to expand due to NCI’s depth of engagement with research communities and organizations. Some of the key features gained by the implementation of NCI’s active archive solution are:

  • Extreme scalability
  • Intelligent data management
  • High data reliability
  • Portable data storage solution
  • Low cost per terabyte
  • Reduction in energy costs and space
  • Performance and uptime

HPC organizations that need a scalable storage solution face a number of difficult decisions on how to store and archive their data. Important factors to consider when selecting an archive solution include scalability, data reliability and affordability. Active archive’s intelligent data management framework provides organizations file-level access to data at a significant reduction in cost. When NCI introduced its active archive solution, it gained a dense, high-capacity storage solution for its cloud installation with significant economies of scale and data integrity safeguards. By selecting an active archive solution, NCI has created a long-lasting and reliable storage solution for the country’s largest supercomputer.

A Perfectly Rational Approach to Data Hoarding

by Mark Pastor

People and the companies they work for hoard data; it's a fact brought out in survey after survey. Hoarders are not always proud of their habit and are often curious about the options available. Contrary to popular belief, in many cases it is OK to hoard data. Sometimes it is necessary, and in many cases the data being saved can be of great value to the company. Having clarity on the purpose and requirements in your own organization will provide insight into best practices for maximizing the value of the content you keep with the greatest efficiency.

The Four Hoarder Personas

There are four hoarder personas: Pacifist, Captive, Opportunist and Capitalist. Take a look below to decide which of these best describes your situation and to get ideas on best practices and technologies for your situation.

Pacifist. This persona describes an individual or an organization espousing the policy that it is OK to keep everything, even when there are no requirements to retain data. There are no formal data deletion policies or guidelines for deciding what to delete. These users don't take the time to delete their content, and IT is not empowered to delete it for them. The risk and cost of doing nothing different are tolerable on all fronts. Storage and protection costs are acceptable; backup windows are satisfactory; there is no legal exposure from keeping all that content lying around; and there is no motivation to shave costs of storage or infrastructure. If this describes your situation, congratulations on finding a rare nirvana.

Captive. Regulations and corporate policies are driving the need to hoard data for years or even decades. The day-to-day business value of the preserved content is negligible. Time-to-data and performance metrics, if they exist, will help decide between the likely technology choices below. Organizations involved in finance and healthcare are well represented in this persona.

Opportunist. This group generates and acquires valuable content. They have made substantial investments to develop the content, and it would be sinful to not have it available when a perfect use arises in the future. They often want to contrast with, or build upon, historical snapshots or perhaps take advantage of an opportunity to monetize the content. The use of the Opportunist’s hoarded content is generally unplanned. An opportunity will surface, and if it is not easy to get to the relevant content, the opportunity to leverage it may quickly disappear. The organization that can be nimble and regularly draw from the past can gain tremendous advantage. Those who can impressively go beyond only current content will be the star performers.

Capitalist. Content is king. Capitalists are in the content business and generate or capture content that is difficult if not impossible to reproduce. They market, sell and otherwise monetize their content. Their data and content are core to their business strategy, and success is measured by how quickly they can deliver the content, how economically they can store it until it is needed, and even by the volume of the repository from which they draw.

Which type of hoarder are you?

Use Case Requirements and Technologies

The personas above each carry a set of requirements for data storage architectures. Longer time to access data is acceptable to some while completely unacceptable to others. However, in almost all cases, when hoarding large amounts of content, the most important thing to avoid is using expensive high-performance storage for the hoarded content.

There are many great tools available to help understand how much of a company’s content is not active (typically 50% to 80%) and to reinforce that inactive content should be stored on a less expensive tier (LTFS tape, object storage disk or cloud). Cheap NAS is not a good option once the cost to protect content is considered: protection software and replication hardware will be added, raising the cost of ownership and burdening infrastructure.

When discussing best practices, referring to specific storage technology choices is unavoidable. Two key areas must be understood to have a complete view of best hoarding practices: data movers and storage technologies.

The table below simplifies and summarizes the key attributes of storage technology choices that need to be considered for the various hoarding architectures.

Best Practices Based on Persona

Pacifists and Captives: Leverage Your Backup Process. Retained data is not strategic for you, so investments should be focused on protecting the currently active data and leveraging that process for long-term retention. Disk with deduplication or tape backup are both very acceptable alternatives. Speedy access to retained content is not critical, so it is acceptable to leverage backup jobs for retention by copying tape backup to deep archive, or sending a copy of backup data to be archived in a cloud.

Opportunists: Deploy a Cost-Effective Active Archive.  You want to take advantage of content when it’s needed, and you cannot predict when that will be.  LTFS tape or object storage disk are very cost-effective means of hoarding content.  These technology choices enable ready access (active archive) to content. Where high growth, larger scale and global access are important, object storage is the obvious choice, though LTFS tape behind a global access infrastructure is still worth considering.

Capitalists: Integrate Active Access and Content Protection. Disk backup is critical when practical, but backing up very large content sets is not always practical. Some content sets are tens to hundreds of terabytes or more. For these environments, archive and protection need to be one and the same. Data-dispersed object storage is perfect for this use case. Data can be cost-effectively and simultaneously stored and protected. Smaller environments (i.e., less than 200TB of data) may do well with LTFS tape, but larger environments still need to consider object storage for their hoard.

As you can see there are many good reasons for hoarding data, and as the motivations for hoarding become clear, so does the best way to manage it.

As previously published on Wired Innovation Insights, Jan 6, 2015 http://insights.wired.com/profiles/blogs/a-perfectly-rational-approach-to-data-hoarding#axzz3O4ctINC3

2015 Trends in Data Storage and Archiving

As we predicted at the end of last year, active archives became a more mainstream best practice in 2014. Businesses and organizations are recognizing the value of active archives in addressing their overall long-term data storage needs.

As we begin 2015, Active Archive Alliance members shared their predictions for data storage as it relates to active archives in the coming year. Here's a look at what’s to come according to some of the industry’s top storage experts:

  • Advanced Data Tape Will Carry More of the Storage Load

With all the significant innovation occurring in the tape market, the pieces are in place for tape solutions to expand their presence in the data center and carry more of the storage load in 2015. The timing could not be better as users struggle with increasing data loads and limited budgets. New and exciting innovations like LTFS, Barium Ferrite, tape NAS, Flape (flash + tape), tape in the cloud, new high capacity formats and newly extended roadmaps are all coming together to provide best-practice solutions for data protection and active archiving.

  • There Will be Increased Adoption of Storage Tiers

The need for large-scale data capacity is driving the implementation of an increasing number of tiers of storage across a growing number of organizations.  There will be an increase in Tier 0 with a tidal wave of flash adoption for the fastest form of storage as well as a multi-tier approach to long-term data, with the rapid adoption of public cloud and an anticipated swift increase in private cloud creation. Combinations of flash, disk and tape are being used in both public and private clouds to meet custom requirements. An increasingly complex storage environment will become the norm, with specific data being placed on specific storage technologies for specific periods of time with automated "data fluidity" systems controlling the life-cycle process.

  • Greater Intelligence Between Applications and Storage Will Simplify Active Archive Deployments

Applications that can be integrated with storage will improve overall storage management by removing complexity and helping organizations to better utilize active archive solutions. Solutions will use intelligence to deliver the right storage to meet application performance while driving efficiencies that help keep storage costs within targeted budget requirements.

  • There Will Be a Move to Object Storage as an Archive

There is a big movement in the industry toward object storage as an archive. Object storage is attractive for several reasons: 1) it is massively scalable; 2) it is cost effective; and 3) it is able to also act as a cloud infrastructure for collaboration. The trend is being accelerated because there are many ways to access an object-based archive these days, including NFS, CIFS, mobile OSes and more.

As the demand for more cost-effective, long-term storage options continues, active archives will proliferate. The Active Archive Alliance will support technology expansion and innovation to address the newest advancements in data storage.

Alliance members Crossroads Systems, DataDirect Networks, Fujifilm, and QStar contributed to this blog.