Digital Data Management Trends 2018: A Mind-Boggling Forecast

Sunil Uttamchandani, Co-Founder, Mithi Software Technologies. Founded in 1999, Mithi is an award-winning software product company and maker of products that help substantially lower costs and improve productivity.

Digitization is the buzzword of the next decade. Adopting digitization is a great opportunity for all kinds of businesses to rise to the next level in terms of speed of innovation, time to market, distribution and fulfillment, lower cost and time for service delivery, and optimised resource usage. If every interaction and transaction is going to happen digitally, the amount of data that will be generated and will need to be stored can well be termed "Big Data". It is also worth noting that more than 90% of all digital data is unstructured, comprising emails, files, music, video, images etc.

The Cambrian explosion of digital data:
"Between the dawn of civilization and 2003, we only created five exabytes; now we're creating that amount every two days. By 2020, that figure is predicted to sit at 53 zettabytes (53 trillion gigabytes)-- an increase of 50 times."-- Hal Varian, Chief Economist at Google.

This article focuses on the business perspective of managing this data (the WHAT) rather than on the HOW, which is the subject of a more technical article in future. There are various architectures and platforms which can be deployed to ingest and manage the data, and some of them are covered in our data management forecast for 2018. Read on.

BIG Data is the New Oil
We keep hearing the phrase “Data is the new oil”. Going forward, however, thanks to deep digitization even by smaller organizations, this phrase can be expanded to “Big Data is the new oil”.

Big Data is characterized by large Volumes of data, having a Variety of forms and structures and being generated and ingested at a high Velocity.

And Big Data is no longer generated only by large ecommerce companies, stock exchanges, airlines, etc. Given the pace of digitisation, even smaller organizations are stepping into Big Data territory, collecting and storing information that can help reshape their businesses. Interestingly, this includes click stream data, user activity and transactions, ancillary data surrounding transactions, notifications etc. To top it all, most of this data has no end of life and is critical for deeper analysis and for uncovering patterns that provide market insights.

For example, a business user, on average, transacts and accumulates 4 GB of email every year. For a 100 member team, this means about 400 GB of data added to the storage pool every year, which crosses 1 TB in under three years. And this is only email.
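
To make the email arithmetic concrete, here is a minimal sketch in Python. The 4 GB per user per year and the 100 member team are the figures quoted above; the decimal conversion (1 TB = 1000 GB) and the function names are assumptions made only for illustration. At these figures the team adds about 400 GB a year and reaches the 1 TB mark after roughly two and a half years.

```python
GB_PER_USER_PER_YEAR = 4   # figure quoted in the article
TEAM_SIZE = 100            # figure quoted in the article
GB_PER_TB = 1000           # assumption: decimal units (1 TB = 1000 GB)


def yearly_growth_gb(users, gb_per_user=GB_PER_USER_PER_YEAR):
    """Email data added to the storage pool per year, in GB."""
    return users * gb_per_user


def years_to_reach(target_tb, users, gb_per_user=GB_PER_USER_PER_YEAR):
    """Years of steady growth needed to accumulate target_tb terabytes."""
    return (target_tb * GB_PER_TB) / yearly_growth_gb(users, gb_per_user)


if __name__ == "__main__":
    print("GB added per year:", yearly_growth_gb(TEAM_SIZE))
    print("Years to reach 1 TB:", years_to_reach(1, TEAM_SIZE))
```

The same two functions can be reused with a different team size or per-user figure to estimate how quickly other data sources would fill the storage pool.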

This trend of hyper growth in data shows no sign of slowing down in the foreseeable future.

The Death of Backups
Not too long ago (and even today in many organizations), data management meant backups to another medium, like drives, tapes, and in some cases even the cloud.
Most organizations believe that their data security requirements are met as long as they are maintaining backups.

However, these backups are for all practical purposes notional when it comes to supporting the business requirement of data analysis, selective retrieval, business intelligence, knowledge mining, compliance and more.

Along the way, there were several attempts to develop tools and systems that use the backup as a live source of data, but they failed miserably. Backups are typically periodic snapshots, they do not guarantee capturing changes between two backup runs, and their original purpose was disaster recovery only: in case of a crash, you can restore the last good state of the system from the most recent backup.

While backups, managed by the application administrators, will stay on to handle the disaster recovery part of the deployment (at least for some more time), businesses will become aware that backup is not equal to data management.

In 2018, businesses will become more aware that backups and data archival are different and that each has a different purpose in the IT architecture. Newer tools will be introduced into the environment which archive data in real time to a separate operational infrastructure, ensure that all of that data is search ready and available on demand, and serve as a “near source” of data for the business.

Big Data on Tap
Traditionally businesses have been used to the idea that whenever they need to pull up historical data, they would need to deploy a team to locate the relevant offline storage devices, mount them for access, sync them to the clients and then locate the required information. This process can take days.

In 2018 and beyond, businesses will expect “data on tap”. They will want to search for any data from any period, on demand and in seconds, via a discovery console.

For this to happen, the backend systems will need re-architecting to incorporate large storage banks, central or distributed, along with methods to ingest data in flight, store and index it so that it is search ready, active and online at all times, and make it available via applications and dashboards.

This will be a fully parallel, tamper-proof, one-way system which only ingests data from various data sources. It allows the business to track changes, history of conversations and trends, and is effectively a repository of every event and action between the business and its stakeholders.
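
As a rough illustration of such a one-way, search-ready store, here is a minimal in-memory sketch in Python. Everything in it is hypothetical: the class name `AppendOnlyArchive`, its methods, the simple word index, and the use of a SHA-256 hash chain to make tampering detectable are assumptions chosen for illustration, not a description of any particular product or of how a production archive is built.

```python
import hashlib
import json
from collections import defaultdict


class AppendOnlyArchive:
    """Hypothetical archive: records can be ingested and searched, never edited."""

    def __init__(self):
        self._records = []                 # append-only list of stored records
        self._index = defaultdict(set)     # word -> ids of records containing it
        self._last_hash = "0" * 64         # starting value for the hash chain

    @staticmethod
    def _digest(source, text, prev):
        payload = json.dumps({"source": source, "text": text, "prev": prev},
                             sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def ingest(self, source, text):
        """Store a record, chain it to the previous one, and index its words."""
        digest = self._digest(source, text, self._last_hash)
        record_id = len(self._records)
        self._records.append({"source": source, "text": text,
                              "prev": self._last_hash, "hash": digest})
        for word in text.lower().split():
            self._index[word].add(record_id)
        self._last_hash = digest
        return record_id

    def search(self, word):
        """Return the text of every record containing the word, oldest first."""
        ids = sorted(self._index.get(word.lower(), ()))
        return [self._records[i]["text"] for i in ids]

    def verify(self):
        """Recompute the hash chain; False means a stored record was altered."""
        prev = "0" * 64
        for rec in self._records:
            expected = self._digest(rec["source"], rec["text"], prev)
            if rec["prev"] != prev or rec["hash"] != expected:
                return False
            prev = rec["hash"]
        return True
```

A caller would only ever use `ingest` to add data from a source such as email or chat, `search` to look it up on demand, and `verify` to check that nothing stored earlier has been changed. A real deployment would replace the in-memory list and word index with durable storage and a proper search engine.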

Inadequacy of On-Premise
Any modern business, going forward, will generate tens of terabytes of data year on year. And this may need retention across several years. Even if a business doesn’t have a regulatory mandate to retain data across years, it is in its own interest to retain data over a long period so that it can analyse trends, gain insights into customer behavior, and decide product roadmaps.

Deploying systems and teams on premise to manage this humongous growth will turn out to be costly and complex.

We are already seeing a trend of moving workloads which need elasticity in compute and storage to the cloud. Cloud platforms like AWS and cloud native SaaS tools for archival are the best bet as core infrastructure components in the data management architecture of an enterprise. They provide opex based costing and zero upfront provisioning, leading to pay per use, handle all the backend complexity of scaling their systems as your consumption grows, and have the necessary capability to keep all data online and extremely durable.

2018 and beyond is a transformational period for any business of any size due to the digital thrust from governments, users, and other collaborating businesses. Without a modern, automated, scalable and reliable data management strategy in place, these businesses are likely to be left behind on the growth curve.