Structured management of unstructured data with object-based storage

Storing data has become a significant challenge as data volumes grow. Object-based storage is essential for building world-class cloud infrastructure.

Vivek Tyagi Jul 07th 2017

Technology advancements have created a world of connected devices, and individuals are generating more and more data every minute. According to IDC, digital data is expected to cross 163 zettabytes by 2025.

The datacenter market, with the adoption of cloud, is changing the way enterprises operate. Unstructured data from mobile devices, social media, sensors, video and audio has played a significant role in altering how client-to-cloud communication happens. Today, connected devices such as cars, drones, robots, cameras and wearables generate IoT data every second of the day, and there needs to be a way to deal with all this data cost-effectively and efficiently.

Analyzing and extracting value from this data helps in better decision-making for products and services that enrich our lives. However, not all data yields valuable insights immediately, and not all of it can be stored economically for future analytics. Such usage patterns require a cost-effective solution with adequate performance, scalability and durability. Object-based storage (OBS) solutions can be deployed to help capture all this data and support innovation.

Traditional storage solutions, namely block and file storage, have significant shortcomings for large amounts of unstructured data. File-based storage manages data in a folder hierarchy, which tends to suffer performance problems at scale. Similarly, block-based storage manages disk sectors collectively as blocks, which can deliver blazing performance, but only as long as the amount of data being stored remains relatively modest.

OBS solutions store all data (documents, audio/video files, images, emails, etc.) as objects in a flat address space, avoiding the hierarchies found in file and block storage. Instead, each object carries metadata that describes it, in addition to its address. Every object has a unique identifier, making it easier to track data. Because metadata is user-defined, data analytics tools can leverage it efficiently when processing large volumes of data.
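
The flat address space described above can be illustrated with a minimal in-memory sketch. This is not any vendor's API: the class name `FlatObjectStore`, its methods and the metadata keys are all hypothetical and chosen only for illustration.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory object store: one flat dictionary, no folders.

    Hypothetical illustration only; not a real product's interface.
    """

    def __init__(self):
        # Maps object ID -> (data bytes, user-defined metadata dict).
        self._objects = {}

    def put(self, data, metadata):
        """Store an object and return its unique identifier."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (bytes(data), dict(metadata))
        return object_id

    def get(self, object_id):
        """Return (data, metadata) for the given identifier."""
        return self._objects[object_id]

    def find(self, **criteria):
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid
            for oid, (_, meta) in self._objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = FlatObjectStore()
video_id = store.put(b"...video bytes...", {"type": "video", "source": "drone"})
image_id = store.put(b"...image bytes...", {"type": "image", "source": "camera"})

# An analytics tool can select objects by metadata instead of walking folders.
drone_objects = store.find(source="drone")
```

Objects are located by identifier or by metadata rather than by path, which is the property the article attributes to OBS. A production system would index the metadata instead of scanning every object as this sketch does.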

OBS solutions have a number of benefits when compared with SAN or NAS architectures: 

1. Extreme scalability

OBS operates on a flat address space and supports metadata. This enables objects to be combined logically into buckets and delivers efficient capacity scaling. As a result, scalability is virtually unlimited, making OBS a good match for data at scale. IDC’s recent paper “The Economic and Operational Benefits of Moving File Data to Object-Based Storage” states that “Several end-user organizations claim that management becomes a challenge once seven to eight NAS arrays are deployed.” Clearly, an alternative to file storage is needed for large unstructured data needs.

2. Advanced data availability

Data availability is one of the key factors in data storage. It ensures that data is available at a required level of performance in any situation. In traditional storage architectures, RAID technology provides data availability, and RAID rebuilds are still used to re-create data after drive failures. However, rebuilds can take days or weeks depending on the volume of data being stored. While this approach focuses on drive redundancy, OBS focuses on data durability, availability and redundancy.

In OBS, data availability is achieved through advanced erasure coding. Erasure coding protects data by breaking it into fragments, expanding and encoding them with parity data pieces, and storing them across a set of different locations. Hence, data remains accessible even if multiple components fail. With geo-spreading, or geographic spreading, only one third of the object data is stored at each of the different locations, which reduces network traffic while still providing high availability. Compared to traditional protection schemes such as RAID and triple-mirror data replication, geo-spread OBS models provide very high data accessibility and resiliency at a substantially lower cost.
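
The idea behind erasure coding can be sketched with the simplest possible scheme: a single XOR parity fragment. This is an assumption made for brevity, not the coding any particular OBS product uses. Production systems typically use codes with several parity fragments so they can survive multiple failures, whereas this sketch survives exactly one. The function names are hypothetical.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal-size fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = fragments[0]
    for frag in fragments[1:]:
        parity = xor_bytes(parity, frag)
    return fragments + [parity]


def recover(fragments):
    """Rebuild the single missing fragment (marked None) from the others."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) != 1:
        raise ValueError("this sketch can rebuild exactly one missing fragment")
    present = [frag for frag in fragments if frag is not None]
    rebuilt = present[0]
    for frag in present[1:]:
        rebuilt = xor_bytes(rebuilt, frag)
    repaired = list(fragments)
    repaired[missing[0]] = rebuilt
    return repaired


def decode(fragments, original_length):
    """Join the data fragments (all but the parity) and strip padding."""
    return b"".join(fragments[:-1])[:original_length]


payload = b"object data spread across several locations"
stored = encode(payload, k=4)   # 4 data fragments + 1 parity fragment
stored[2] = None                # simulate a failed drive or site
restored = decode(recover(stored), len(payload))

# Raw capacity consumed per byte of user data.
overhead_parity = len(stored) / 4   # 5 fragments for 4 fragments of data
overhead_triple_mirror = 3.0        # three full copies
```

The last two lines show why erasure coding tends to cost less than keeping three full copies: this toy layout consumes 1.25 bytes of raw capacity per byte of data, compared with 3 bytes for triple mirroring. Real ratios depend on the number of data and parity fragments a system is configured with.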

3. Advanced data durability

Data durability means being able to read data back exactly as it was originally stored. OBS with data scrubbing technology protects data from bit failures and drive-level failures. If a data fragment becomes corrupt, the remaining fragments, which carry redundant data, are used to construct a new fragment in a self-healing operation conducted in the background, so there is no need to replace the entire drive. A geo-spread OBS model with data scrubbing makes data durability and protection even more efficient. Cloud service providers and datacenters that deploy OBS architectures get some of the highest data availability and data durability available in the market today.
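
A scrubbing pass of the kind described above can be sketched as follows. The details are assumptions made for illustration: fragments are checked against SHA-256 checksums recorded at write time, and a corrupt fragment is rebuilt from a single XOR parity fragment. Actual products may use different integrity checks and stronger codes, and the function names here are hypothetical.

```python
import hashlib


def xor_all(fragments):
    """XOR a list of equal-length byte strings together."""
    result = bytes(len(fragments[0]))
    for frag in fragments:
        result = bytes(x ^ y for x, y in zip(result, frag))
    return result


def checksum(fragment):
    """Checksum recorded when a fragment is written."""
    return hashlib.sha256(fragment).hexdigest()


def scrub(fragments, checksums):
    """Verify every fragment and rebuild a corrupt one in place.

    Returns the list of fragment indexes that were repaired.
    """
    bad = [i for i, frag in enumerate(fragments) if checksum(frag) != checksums[i]]
    if len(bad) > 1:
        raise ValueError("single-parity sketch can repair only one fragment")
    for i in bad:
        healthy = [frag for j, frag in enumerate(fragments) if j != i]
        fragments[i] = xor_all(healthy)  # XOR of the rest equals the lost one
    return bad


data_fragments = [b"AAAA", b"BBBB", b"CCCC"]
fragments = data_fragments + [xor_all(data_fragments)]  # append parity
checksums = [checksum(frag) for frag in fragments]

fragments[1] = b"BxBB"                 # simulate silent bit corruption
repaired = scrub(fragments, checksums)
```

After the pass, `fragments[1]` holds the original bytes again, and only that fragment was rewritten. This mirrors the point in the text that a corrupt fragment can be healed in the background without replacing the whole drive.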

4. Simplified data management

OBS offers a simplified and cost-effective way to manage data. Its flat architecture treats the object store as a single collection of objects, even when they are located in disparate storage system hardware and locations. Because one system is spread across different locations, data can be managed easily. This means multiple racks of storage, even multiple locations, can be managed from a single pane of glass for significant management efficiency.

Most OBS systems have high storage density and can scale from 740TB to more than 52PB of data, which helps reduce the data center footprint. Customers can grow their object store in easy-to-install capacity increments as needed, while enabling on-premises and off-premises cloud-based strategies. This helps deliver lower operating expenses (OpEx) compared to SAN or NAS implementations.

Storing data has become a significant challenge as data volumes grow. Object-based storage is essential for building world-class cloud infrastructure. It offers a cost-effective solution with extreme scalability, advanced data availability and durability, and a simplified management structure, all of which have a positive effect on business operations.

The author is Director for India business development, Commercial sales and Support, SanDisk Brand at Western Digital Corporation. 

Disclaimer: This article is published as part of the IDG Contributor Network. The views expressed in this article are solely those of the contributing authors and not of IDG Media and its editor(s).