Removing data fragmentation leads to positive IT disruption

What’s the point of having a vast collection of data if you can’t effectively use it? Ezat Dayeh, SE manager, Cohesity, explains how organising your data could open the door to leveraging the full potential of big data and AI. 

We hear a lot about the potential of big data. IT analysts regularly name it a trend that will dictate the future of IT in modern organisations. Thought leaders take the stage and discuss AI-driven futures, the edge and big data’s eventual ability to power everything from smart cities and autonomous vehicles to diagnostic tools in healthcare.

While it is true that data is the most valuable resource for organisations, how many are truly using the data in their possession? How many still have the majority of data sat on legacy infrastructure, siloed in databases? How many know how to take advantage of growing datasets, or indeed, know what data they have at their disposal at all?

To realise change, behave differently and add versatility

According to Forrester’s study of big data, most enterprises can only analyse 12% of their data at any one time. Moreover, somewhere between 60% and 73% of all data within an enterprise goes completely unused for analytics. Earlier this year, Experian released findings showing that many businesses believe a third of their data is inaccurate, and that 70% don’t even have the control they need to impact strategic objectives. Talk to anyone working closely with data and they will give you several reasons why this is happening.

Budget, legacy technology, skills shortages and security concerns all contribute to the apparent inaction among companies when it comes to taking control of their data sets. There is also another reason: mass data fragmentation. This is a technical term for ‘data that is siloed, scattered and copied all over the place’, leading to an incomplete view of the data and an inability to extract real value from it. These data sets tend to be located on secondary storage, used for backups, archives, object storage, file shares, testing and development, and analytics. Fragmentation can be compounded by duplication of the same data sets between on-premises storage and one or more public cloud locations. The result is data sets that are difficult to locate, let alone analyse and manage.
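As a rough illustration of how duplicated copies might be surfaced, the sketch below groups stored objects by a hash of their content. It is a minimal example under stated assumptions, not a description of any particular product: the function name `find_duplicate_copies`, the location labels and the sample data are all hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicate_copies(objects):
    """Group storage locations by the SHA-256 digest of their content.

    `objects` maps a location label (hypothetical, for illustration only)
    to the raw bytes stored there. Only groups with more than one
    location are returned, i.e. content that exists in several places.
    """
    by_digest = defaultdict(list)
    for location, content in objects.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(location)
    return {d: sorted(locs) for d, locs in by_digest.items() if len(locs) > 1}


# Hypothetical sample: the same customer file copied to three locations.
sample = {
    "onprem/backup/customers.csv": b"id,name\n1,Ada\n",
    "cloud-a/archive/customers.csv": b"id,name\n1,Ada\n",
    "cloud-b/test/customers_copy.csv": b"id,name\n1,Ada\n",
    "onprem/fileshare/orders.csv": b"id,total\n1,9.99\n",
}

duplicates = find_duplicate_copies(sample)
for digest, locations in duplicates.items():
    print(digest[:12], locations)
```

Hashing content rather than comparing file names means renamed copies are still caught. A real estate would need to stream large files and read from actual storage systems, which this sketch does not attempt.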

The vast majority of business data is dark, so it is no wonder that big data isn’t being fully leveraged. Any CIO could tell you that the inability to manage and harness insights is a big competitive disadvantage when it comes to customer satisfaction and the development of products and services.

This is a wider issue than IT. Poor visibility into your data and where it is located leads to a host of challenges, including major compliance risks and security vulnerabilities. Since GDPR came into force in May 2018, the issues around data handling and compliance, particularly for personally identifiable information (PII), have been escalated to board level, with significant financial penalties promised for those that get it wrong. For those below the executive level, the more immediate problem is the time, cost and overall resource drain that accompanies managing fragmented data. It is akin to a leaky tap that needs to be fixed, such is its detrimental effect on a data-led business.

Implement a new data management model

Data fragmentation is not an impossible puzzle to solve, but doing so requires investment and attention from CIOs and IT leaders so that their data delivers better business outcomes rather than sitting in a silo.

In such cases, the cloud becomes top of mind. And with good reason: many organisations have been looking to incorporate the public cloud into their architecture for less mission-critical data and apps. 79% of enterprises said they are likely to move some legacy tape data to the cloud.

But the cloud comes with its own challenges. Silos can emerge in the public cloud too; indeed, they already have. Recent research shows that 45% of senior IT decision-makers say their IT teams are spending between 30% and 70% of their time managing data and apps in public cloud environments today. Mass data fragmentation in the public cloud, coupled with explosive data growth, can create the same lack of visibility and compliance issues that businesses are trying to avoid.

Hybrid IT environments that blend public cloud with on-premises infrastructure are strong solutions for most businesses. But that requires a platform that allows enterprises to build a full hybrid cloud strategy for data and apps, offering critical capabilities such as long-term retention, full lifecycle disaster recovery, backup from on-premises, and the ability to bring critical applications and analytics to the data, so that no further copies of that data need to be made.

Protection and enablement as priorities

Keeping large amounts of data stored across multiple infrastructures reduces visibility, which is also a security risk. Ransomware is increasingly targeted at businesses that do not have their data practices in order. Best practice means establishing clear oversight over all data sets, allowing IT teams to take control and give other employees appropriate levels of access to sensitive data. Solutions with built-in analytics capabilities can be applied directly to the data, flagging to IT teams when their data has changed or when files have recently been accessed, modified or, even worse, deleted.
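To make the idea of flagging changed or deleted files concrete, here is a minimal sketch that compares two snapshots of a data set. It assumes each snapshot is a simple mapping from file path to a content fingerprint. The function name `diff_snapshots`, the paths and the fingerprints are hypothetical and stand in for whatever a real platform records.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots that map file path -> content fingerprint.

    Returns the paths that were added, deleted or modified between the
    two points in time, each as a sorted list.
    """
    previous_paths = set(previous)
    current_paths = set(current)
    return {
        "added": sorted(current_paths - previous_paths),
        "deleted": sorted(previous_paths - current_paths),
        "modified": sorted(
            path
            for path in previous_paths & current_paths
            if previous[path] != current[path]
        ),
    }


# Hypothetical snapshots taken a day apart.
monday = {"hr/payroll.xlsx": "a1f3", "sales/q1.csv": "9c2e", "legal/nda.pdf": "77b0"}
tuesday = {"hr/payroll.xlsx": "e4d9", "sales/q1.csv": "9c2e", "sales/q2.csv": "51aa"}

changes = diff_snapshots(monday, tuesday)
print(changes)
```

A sudden spike in the modified or deleted lists is the kind of signal that could prompt an IT team to investigate, although the thresholds and alerting would depend on the environment.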

As a last line of defence against targeted ransomware, enterprises need solutions that provide the ability to recover instantly from disaster at scale. Not only does this shield an organisation from the demands of an attacker, it also allows it to avoid costly downtime and focus on quickly remediating the problem.

Will AI set a new direction?

Once these hurdles are overcome, the real advantages of big data become apparent. AI, machine learning, and a new generation of applications that run directly on top of a large organisation's data are huge drivers of value for leading companies. A recent survey by New Vantage Partners revealed that 97% of enterprises are currently investing in artificial intelligence (AI) capabilities. With business leaders now looking at deploying AI capabilities, specifically machine learning, to boost productivity and automate low-value but high-need tasks, having reliable sets of big data, rather than many siloed data sets, is a solid way to ensure your AI efforts aren’t wasted.

Embrace risk, drive market disruption

To get good results from AI deployments, data needs to be accessible and manageable, enabling analytics, testing, malware scanning, auditing and script building. Siloed data is hard to access, which is a risk to future success. Accessible data is therefore crucial for training machine learning algorithms, developing automation, and building reliable apps and services to power a business.

There can be no doubt that big data can enable new technologies and strategies, and as such will be a key competitive differentiator for businesses. What must be clear to all IT decision-makers is that the data management associated with big data projects will be costly, time-consuming and a risk to the business if not handled correctly.

Wherever data workloads exist, data storage strategies must be carefully considered, or big data will only deliver bigger challenges. You have to prioritise the management of big data to realise the business outcomes so often touted. It's no longer about keeping the lights on; it's about providing innovations to the wider business to deliver better outcomes.