
Dealing with data? Don’t try to treat it all the same


According to Economist Impact, 37% of companies rank having a data-driven mindset as a business-critical priority, while a further 57% rate it as a medium or high priority for their organisations.

The same research found that 80% of companies had implemented big data analytics. Yet the proportion of companies that see real value from their data remains strikingly low: the Economist Impact research found that 56% of executives did not perceive value from their big data analytics projects. Similarly, according to PwC, only 16% of companies have so far achieved business value from implementing their data and analytics projects in the cloud.

So, why are so many companies having trouble getting their data and analytics projects to work in practice? Because different use cases and applications rely on different types of data, and what works for one will not work for another. To know what will work, and equally what won't, you have to understand some key details about how these data sets are created, stored and accessed over time.

Big and simple vs. small and complex

One type of data set is generally known as ‘big data’. Over the past decade, the term has been used to describe the data sets created by applications that serve online customers. The techniques and technologies built up around big data were created expressly to deal with the huge volumes of data flowing in at all times. Companies like Netflix and Meta built data pipelines that could manage petabytes of data coming in every single day.

Today, far more companies create and use these kinds of data sets. No longer the sole preserve of the big social media and online companies, they have sprung up at thousands of organisations, and ‘big data’ has become the norm for many. These data sets are large and update rapidly, but they are also well-ordered, which makes them easier to analyse: petabytes of information can be scanned and put to use quickly.

However, not all data sets follow this pattern. Operational data is the data created by business applications as orders are taken and managed through the enterprise resource planning (ERP) applications that run the business. This includes financial and accounting systems, supply chain operations and other processes. Instead of being orderly data that can be processed quickly at scale, operational data sets are highly connected and extremely dense.

The challenge here is that ERP systems are built to get every ounce of performance out of transactions as they are made. Each business function has its own system of record, and these are optimised to improve performance within that specific function. For example, a customer sale will lead to the creation of invoices for payment, to sales orders in manufacturing and production, and to the requisite orders in supply chain processes and financial ledger systems in internal accounts. These systems all connect to each other, and each customer record has to be updated in each of them. In practice, tens of thousands of individual database tables that track business data elements and relationships have to be updated over time. Because this optimisation generally takes place within each domain, no single ERP provides a joined-up view across the entire business.
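
As a loose illustration of that fan-out (all system and field names here are invented for the sketch), a single sale becomes separate writes against several systems of record:

    # A sketch of write-side fan-out in an ERP landscape: one customer
    # sale becomes separate records in several systems. Names invented.
    from dataclasses import dataclass, field

    @dataclass
    class SystemOfRecord:
        name: str
        rows: list = field(default_factory=list)

        def write(self, record: dict) -> None:
            # Each system keeps its own copy, shaped for its own function.
            self.rows.append(record)

    billing = SystemOfRecord("billing")
    manufacturing = SystemOfRecord("manufacturing")
    supply_chain = SystemOfRecord("supply_chain")
    ledger = SystemOfRecord("general_ledger")

    def record_sale(cust_id: int, sku: str, qty: int) -> None:
        # One business event, four differently shaped rows to keep in sync.
        billing.write({"cust_id": cust_id, "invoice_for": sku, "qty": qty})
        manufacturing.write({"sales_order": sku, "qty": qty})
        supply_chain.write({"replenish": sku, "qty": qty})
        ledger.write({"account": "receivables", "cust_id": cust_id})

    record_sale(cust_id=42, sku="WIDGET-1", qty=3)

Each of those stores is optimised for its own workload, which is exactly why no single one of them can answer a question about the business as a whole.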

Analysing operational data can provide great insight into how the business is performing. However, the separation of these data sets makes it hard to answer the questions the business wants to ask. Crucially, the approaches that work for big data sets will routinely fail with interconnected ERP data: the data pipeline model that we know and use today is built for big data, not for ERP data.

Designing data analytics approaches

A data pipeline is the set of tools and processes that a team will use to get value from data. It takes information from business applications and then cleanses, organises and presents that data to those who need it. For operational data, trying to apply a pipeline in this way is not effective.
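
As a rough sketch of those stages (the file, table and column names are hypothetical), a classic pipeline strings them together in sequence:

    # A minimal data pipeline sketch: extract, cleanse, organise, present.
    # All file and column names are hypothetical.
    import pandas as pd

    def extract(path: str) -> pd.DataFrame:
        # Extract: pull raw records exported by a business application.
        return pd.read_csv(path)

    def cleanse(df: pd.DataFrame) -> pd.DataFrame:
        # Cleanse: drop incomplete rows and normalise types.
        df = df.dropna(subset=["order_id", "amount"])
        df["amount"] = df["amount"].astype(float)
        return df

    def organise(df: pd.DataFrame) -> pd.DataFrame:
        # Organise: aggregate into the shape analysts expect.
        return df.groupby("region", as_index=False)["amount"].sum()

    def present(df: pd.DataFrame) -> None:
        # Present: hand the result to the people who need it.
        print(df.to_string(index=False))

    present(organise(cleanse(extract("orders.csv"))))

This works well when the source data arrives in one orderly stream; the trouble starts when it doesn't.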

The approaches used within ERP systems to improve transaction speed and keep customer records updated across multiple tables are not really compatible with how you might implement analytics through a data pipeline. Instead of straightforward data that is already organised, operational data is distributed across multiple disparate systems. Rather than being able to look at a single transaction in one place, the information needed might be spread across 50 or more distinct tables. These tables may then need multiple lookups and calculations to produce the final results that an analyst wants.
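
To make that concrete, here is a hedged sketch in Python: answering one question, revenue by customer region, already chains lookups across four ERP-style tables (all names and figures are invented; a real landscape would involve far more tables):

    # Hypothetical slices of ERP tables; a real system spreads this
    # across dozens of normalised tables per business function.
    import pandas as pd

    customers = pd.DataFrame({"cust_id": [1, 2], "region": ["EMEA", "APAC"]})
    orders = pd.DataFrame({"order_id": [10, 11], "cust_id": [1, 2]})
    lines = pd.DataFrame({"order_id": [10, 10, 11],
                          "sku": ["A", "B", "A"],
                          "qty": [2, 1, 5]})
    prices = pd.DataFrame({"sku": ["A", "B"], "unit_price": [9.5, 20.0]})

    # One business question = a chain of joins and calculations.
    revenue_by_region = (lines.merge(prices, on="sku")
                              .assign(revenue=lambda d: d["qty"] * d["unit_price"])
                              .merge(orders, on="order_id")
                              .merge(customers, on="cust_id")
                              .groupby("region", as_index=False)["revenue"].sum())
    print(revenue_by_region)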

Getting that data into a data pipeline involves understanding all those connections. To manage this, analysts usually try to break the connections down into smaller and smaller sections, the aim being to create a simplified view of the data that can serve the query, rather than handling all the connections in one go. The problem with this approach is that it oversimplifies the data, meaning analysts can only answer predefined questions. Anything else means a long trip back to the source system to extract the data and massage it into shape, which slows the time to insight and, in turn, the time to results.
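
Continuing the invented numbers above, the trade-off looks like this: the simplified view answers its one predefined question instantly, but the detail needed for anything else has been aggregated away:

    # A pre-built, simplified view: the join logic ran once upstream,
    # and only region-level totals survived. Figures are invented.
    import pandas as pd

    revenue_by_region = pd.DataFrame({"region": ["EMEA", "APAC"],
                                      "revenue": [39.0, 47.5]})

    # The predefined question is instant:
    print(revenue_by_region)

    # A new question ("revenue by product?") cannot be answered here:
    # the SKU-level detail was aggregated away, so the analyst has to
    # go back to the source systems and build yet another view.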

To solve this problem, we have to stop looking at every data analysis problem as if a more complex pipeline is the solution. Instead, we have to consider how to handle connected data sets from the off. In practice, this means making data accessible to users without the headache of managing pipelines to move the data to them.

It also involves having the right approach for analysis in place before any queries are made. Gartner defines this approach as query acceleration: the entire data set is scanned and prepared for analysis before any query is created. It brings the whole set of data to the problem, so that it can be used to answer questions quickly. This improves the query process too, as analysts can pose the questions they want to ask over time, rather than sticking to a predefined set.
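
In spirit (this is a simplification, not Gartner's formal definition or any vendor's implementation), the difference is that the fully connected model is prepared once, up front, and arbitrary questions then run against it:

    # Query acceleration in spirit: join the whole connected data set
    # once, before any question is asked. Tables as in the sketch above.
    import pandas as pd

    customers = pd.DataFrame({"cust_id": [1, 2], "region": ["EMEA", "APAC"]})
    orders = pd.DataFrame({"order_id": [10, 11], "cust_id": [1, 2]})
    lines = pd.DataFrame({"order_id": [10, 10, 11],
                          "sku": ["A", "B", "A"],
                          "qty": [2, 1, 5]})
    prices = pd.DataFrame({"sku": ["A", "B"], "unit_price": [9.5, 20.0]})

    # Prepare the full model before any query exists.
    model = (lines.merge(prices, on="sku")
                  .merge(orders, on="order_id")
                  .merge(customers, on="cust_id")
                  .assign(revenue=lambda d: d["qty"] * d["unit_price"]))

    # Ad-hoc questions now run directly, with no rebuild in between.
    print(model.groupby("region", as_index=False)["revenue"].sum())
    print(model.groupby("sku", as_index=False)["revenue"].sum())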

Apply the right approach as you need it

To make data analytics projects successful, we have to be clear about the goals and objectives we need to meet. The small percentage of companies that have so far succeeded with their data and analytics projects shows why we have to look at the kinds of data we hold across the business, and then apply the right tools and approaches where they are needed. As more and more companies look to data to create competitive advantage and support their decision-making, getting this right will be essential.

Nick Jewell
Technology Evangelist at Incorta
