Unclassified data could be a silent saboteur to AI ambitions

Mark Molyneux, EMEA CTO at Cohesity, warns that unless businesses bring order to their data chaos, the costs – from multi-million-dollar breaches to garbage-in-garbage-out AI – will keep piling up.

Every day, approximately 400 million terabytes of new data are generated around the world. From Instagram reactions and Slack messages to endless Zoom call recordings, all of this information needs somewhere to live, but holding onto unnecessary data can quickly rack up costs for organisations.

What’s worse, most businesses don’t even know what they’re storing. Employees inadvertently download all kinds of personal files – from bills and passport scans to pictures of their dog – many of which pose a GDPR risk. While their desktops might be cluttered, the problems actually run much deeper.

Without proper visibility into their data or its storage locations, businesses struggle to manage storage efficiently, comply with regulations, and fully leverage the power of AI – because if you put garbage in, you get garbage out.

This is why businesses must move beyond poor data management practices and take data indexing and classification seriously.

Unclassified data: The hidden risk

Many businesses neglect their data practices and underestimate the risks. According to IBM’s Cost of a Data Breach Report 2023, the global average cost of a data breach is approximately $4.45 million, a figure that poor data management practices can push even higher.

Think about it – critical files scattered across desktops and servers, buried in email threads, or saved under vague names like ‘Final_v3_UPDATED.’ Not only does this make data harder to access when needed, but it also increases security risk. More businesses than you think are powered by duct tape and willpower, and the consequences can be dire.

It’s not just about security, wasted storage costs, inefficiencies, duplicated efforts, redundant systems, lost time, and fed-up staff – though it is all of these things and more. It’s about missed opportunities. When companies aren’t able to extract vital insights from their data, they miss out on market trends, misunderstand customer needs, and end up with incorrect conclusions and poor decision-making.

The other big danger right now is that businesses are facing far stricter data compliance requirements than ever: DPA, GDPR, DORA, LGPD, BDSG, HIPAA – and there are more in the pipeline. Unclassified data makes it near impossible to really ensure compliance, leaving businesses open to hefty fines and reputational damage.

One of the most newsworthy examples of this in recent years is TalkTalk’s 2025 data breach. The breach originated from unauthorised access to a third-party supplier’s system, and the fallout reportedly exposed the personal information of more than 18.8 million current and former TalkTalk subscribers. It highlights that dangers can come from an organisation’s wider ecosystem, and that organisations have a responsibility to ensure security and risk management are solid downstream.

The solution? Businesses should get serious about their data management practices. By organising and structuring data, organisations can reduce costs, improve efficiency, and stay compliant. The hidden cost of unclassified data is too high to ignore – it’s time to clean up the chaos. Businesses do not, and should not, need to wait for a regulator to tell them to do this.

Why data classification is essential

When data management practices are more chaotic than a toddler with a box of crayons, data classification might sound like an insurmountable challenge, but at its core, it’s just indexing data based on type, structure, relevance, and sensitivity, and then connecting the data to a relevant record policy – which defines for companies what they need to keep, why, and for how long.
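To make that concrete, here is a minimal Python sketch of the idea: index a file by type and sensitivity, then attach a record policy that says why it is kept and for how long. Everything in it is an illustrative assumption rather than a recommended scheme: the keyword rules, the policy names, and the retention periods are hypothetical, and a real classifier would inspect file contents, not just file names.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class RecordPolicy:
    """What to keep, why, and for how long (all values are hypothetical)."""
    name: str
    reason: str
    retention_days: int


# Hypothetical policies; real retention periods come from legal and compliance teams.
POLICIES = {
    "personal": RecordPolicy("personal-data", "contains personal identifiers", 30),
    "financial": RecordPolicy("financial-records", "statutory record keeping", 2555),
    "general": RecordPolicy("general", "no specific obligation identified", 365),
}

# Hypothetical keyword rules standing in for a real sensitivity classifier.
KEYWORDS = {
    "personal": ("passport", "payslip"),
    "financial": ("invoice", "receipt", "ledger"),
}


def classify(filename: str) -> dict:
    """Index a file by type and sensitivity, then link it to a record policy."""
    path = PurePath(filename)
    stem = path.stem.lower()
    sensitivity = "general"
    for label, words in KEYWORDS.items():
        if any(word in stem for word in words):
            sensitivity = label
            break
    policy = POLICIES[sensitivity]
    return {
        "file": filename,
        "type": path.suffix.lstrip(".").lower() or "unknown",
        "sensitivity": sensitivity,
        "policy": policy.name,
        "retention_days": policy.retention_days,
    }


if __name__ == "__main__":
    for name in ("passport_scan.JPG", "invoice_2024_03.pdf", "Final_v3_UPDATED.docx"):
        print(classify(name))
```

The point of the sketch is the last step: a label on its own changes nothing, whereas a label tied to a policy tells the business when a file can safely be deleted.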

Most companies already do this to a degree. Many have some data that is indexed and classified, such as customer records and transaction logs, sitting neatly in databases where it is easy to search and analyse. Alongside it sits the unclassified data: everything else, from scattered emails to PDFs and videos. Poor data management matters more now because of where we sit in time: the early stages of the AI revolution. If AI is meant to give us faster and more accurate insights, it needs strong foundations to draw from. Otherwise, AI is flying blind.

Take ChatGPT, for example. It generates responses based on broad training data, which is great for drafting emails but not so great for precise, data-driven insights about a specific business, where it can produce irrelevant or misleading information.

That’s where modern data storage and indexing solutions come in. Many third-party providers don’t just store data; they make it smart by using advanced methodologies and proprietary natural language processing applications. The real game-changer? Retrieval-Augmented Generation (RAG). Unlike generic internet-trained models that draw from anything and everything, a RAG system retrieves information directly from properly indexed data and grounds its answers in it, which improves accuracy and reliability and, importantly, lets it cite the source of its context.
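The retrieval half of RAG can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular vendor implements it: the three-document index and its file paths are invented, scoring is simple word overlap where production systems typically use vector embeddings, and the final call to a language model is left out entirely.

```python
import re
from collections import Counter

# Hypothetical indexed documents; each passage keeps a pointer to its source.
INDEX = [
    {"source": "policies/retention.txt", "text": "Customer invoices are kept for seven years."},
    {"source": "hr/handbook.txt", "text": "Staff must not store passport scans on shared drives."},
    {"source": "it/backup.txt", "text": "Backups run nightly and are tested every quarter."},
]

STOPWORDS = {"a", "and", "are", "for", "how", "is", "of", "on", "the", "to"}


def tokens(text: str) -> Counter:
    """Lowercase word counts with common filler words removed."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def retrieve(question: str, index: list, top_k: int = 2) -> list:
    """Return up to top_k passages sharing the most words with the question."""
    q = tokens(question)
    scored = []
    for doc in index:
        overlap = sum((q & tokens(doc["text"])).values())
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question: str, passages: list) -> str:
    """Assemble the grounded prompt that would be handed to a language model."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return (
        "Answer using only the context below and cite the source.\n"
        f"{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    question = "How long are customer invoices kept?"
    print(build_prompt(question, retrieve(question, INDEX)))
```

Because every passage carries its source, the answer can point back to the document it came from. That only works if the underlying data has been indexed in the first place.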

The impact? Businesses get actionable intelligence from better-organised data, smoother workflows, cost savings, regulatory compliance, and happier colleagues. According to KPMG’s Global Tech Report 2024, the proportion of execs reporting a positive impact on profitability from data and analytics has risen by 25 percentage points on average. But most importantly, it creates a rock-solid foundation for AI-driven insights – insights that don’t just work today but get smarter and more valuable over time.

Making a start on smarter data management

Businesses can streamline data management by moving from cluttered local systems to the cloud, where automated services handle organisation and indexing. These solutions automatically process and index data, extracting metadata and structuring information efficiently. This ensures seamless access, improved searchability, and optimised storage without manual intervention. This is only part of the journey, though: organisations still need to apply classification here to avoid falling into the ‘keep everything’ trap.
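As a rough picture of what automated indexing involves, the Python sketch below walks a folder, records basic metadata for each file, and uses content hashes to flag exact duplicates that waste storage. It is a simplified assumption of the approach rather than a description of any cloud service: the field names are hypothetical, and it reads each file fully into memory, which would not suit very large files.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def build_index(root) -> list:
    """Walk a folder tree and record basic metadata for every file."""
    index = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        index.append({
            "path": str(path),
            "extension": path.suffix.lower(),
            "size_bytes": stat.st_size,
            "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
            # The content hash identifies byte-for-byte copies regardless of file name.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return index


def find_duplicates(index: list) -> list:
    """Group file paths whose contents are identical."""
    by_hash = defaultdict(list)
    for entry in index:
        by_hash[entry["sha256"]].append(entry["path"])
    return [paths for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    entries = build_index(".")
    print(f"Indexed {len(entries)} files")
    for group in find_duplicates(entries):
        print("Duplicate copies:", group)
```

An index like this answers what is stored and where, but not whether it should be kept. That decision still depends on classification and a record policy.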

Alternatively, organisations can establish clear data management goals that are aligned with business objectives, implement strong security practices, and then investigate ways to automate processes to take the administrative burden off staff. It’s a complex process, but plenty of businesses do it. Once that’s up and running, it’s important to train staff and set expectations, and then to trial AI applications through pilot tests.

Whether handled in-house or with a third-party partner, the priority is to establish strong governance frameworks and create structured data management before compliance pressures mount. In a fiercely competitive landscape, it’s time to ditch outdated storage habits and treat data for what it truly is: a strategic business asset.