
The sustainability challenge of dark data


With the IPCC recently issuing a stark environmental warning to the world, and the UN reiterating the need to reduce global carbon emissions, questions are now being asked about how the tech industry can improve its green credentials – especially as data storage, data mining and file sharing increase.

Data generation and storage show no sign of slowing

Modern enterprises continuously generate and accumulate vast amounts of data. This includes routine activities across enterprise systems, machines, sensors, and demand-side digitalisation.

According to IDC, worldwide enterprise data storage is growing at a compound annual growth rate (CAGR) of 27%. As a result, global data storage is anticipated to reach more than 11 zettabytes per year by 2025, up from 2.6 zettabytes in 2018, and will likely double every two to three years, requiring ever more storage and management.
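The growth figures above can be cross-checked with a little compound-growth arithmetic. The sketch below assumes the 27% CAGR applies uniformly from the 2018 baseline of 2.6 zettabytes, a simplification the IDC figures may not follow exactly. The function names are illustrative only.

```python
import math


def project(start_value: float, annual_growth: float, years: int) -> float:
    """Compound a starting value forward at a fixed annual growth rate."""
    return start_value * (1 + annual_growth) ** years


def doubling_time(annual_growth: float) -> float:
    """Years needed for a quantity to double at a fixed annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


storage_2018_zb = 2.6  # zettabytes in 2018, from the text
cagr = 0.27            # 27% per year, from the text (assumed constant)

storage_2025_zb = project(storage_2018_zb, cagr, 2025 - 2018)
years_to_double = doubling_time(cagr)

print(f"Projected 2025 storage: {storage_2025_zb:.1f} ZB")
print(f"Doubling time: {years_to_double:.1f} years")
```

Under these assumptions the projection lands at roughly 14 zettabytes by 2025, consistent with "more than 11 zettabytes", and the doubling time comes out just under three years, in line with the "two to three years" quoted above.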

All of this data comes in multiple forms: dark, redundant, critical, and so on. The term ‘dark data’ refers to unstructured and inert content, the opposite of critical structured data. Redundant data, by contrast, is semi-structured information with a high risk of becoming dark. The data also comes in a wide range of formats, including information streams, customer call records, log files, master data, and manually entered operator data.

Unstructured data is also becoming more prevalent. This is because we have become accustomed to converting text, pictures, and music into digital formats for computer processing, social networking, search engine inquiries, and real-time streaming. The result is a large volume of digital data that needs to be stored – and much of this data won’t even be accessed later.

Data vs. information

Although the terms data and information are often used interchangeably, data transforms into information only when seen in context or analysed to provide insights. With the emergence of the Internet of Things (IoT), devices such as smartphones and smartwatches have data-gathering chips installed into them and broadcast that data via the internet.

The IoT has the potential to create hidden information in logs, metadata, text fields and documents, video, audio, and photographs. However, about 90% of the data generated by IoT devices is never accessed, and up to 60% of that data loses its value within milliseconds of generation.

Estimates of the share of dark data in all data vary. One study found that dark data is the most important subset of unstructured data, accounting for 90% of all data. Yet less than one per cent of it is ever accessed again after it is gathered, whether for business analytics or decision making. Another study from Veritas Global reveals that an average of 54% of the data stored by enterprises worldwide is classified as dark, since the individuals in charge of it are unaware of its content and usefulness.

The environmental impact of dark data

Dark data has both pros and cons. Its disadvantages are frequently more obvious than its benefits, and they are serious. Like dark matter, such a large body of concealed data is difficult to analyse, because dark data far exceeds the amount of visible data. While visible data can be easily accessed in databases, dark data requires a more sophisticated extraction process before it can be actively used.

Storing and safeguarding dark data often costs more than the data itself is worth and, in some cases, carries a higher risk. This is because parts of this data might become valuable over time, making it a target for theft and cyber crime activity such as ransomware or distributed denial of service (DDoS) attacks.

Furthermore, storing massive amounts of dark data on conventional hard disk drives (HDDs) and solid-state drives (SSDs) wastes a significant amount of energy simply to keep the stored data alive. That energy comes mostly from non-renewable resources and therefore increases CO2 emissions. In addition, the heat produced as a by-product of data production, traffic, and storage necessitates cooling.

Storage also drives power demand: its share of overall data centre energy consumption could rise from 11% to 19% once the infrastructure impact of storage cooling is taken into account. Although it is impossible to calculate precisely the emissions associated with data storage, the entire ICT sector is estimated to account for about 1.4% of global CO2 emissions, because much of the large amount of energy the industry consumes is carbon-intensive.

It is also worth mentioning that, according to Veritas Global’s Databerg Report, the storage power required to hold and process global dark data is estimated to emit 5.8 million metric tons of CO2 annually. In the US, for example, power consumption from data centre storage alone was estimated at 14 billion kWh in 2020, resulting in almost 6.5 million metric tons of CO2 emissions.
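As a rough sanity check on the scale of the US figure, electricity use can be converted to emissions by multiplying it by a grid emission factor. The sketch below uses a factor of 0.46 kg of CO2 per kWh, an assumption chosen purely for illustration and not a value taken from the report. Real factors vary by region and year.

```python
def emissions_tonnes(energy_kwh: float, kg_co2_per_kwh: float) -> float:
    """Convert electricity use (kWh) to CO2 emissions in metric tons."""
    return energy_kwh * kg_co2_per_kwh / 1000.0  # 1,000 kg per metric ton


us_storage_energy_kwh = 14e9  # 14 billion kWh in 2020, from the text
assumed_grid_factor = 0.46    # kg CO2 per kWh; hypothetical illustrative value

us_storage_emissions_t = emissions_tonnes(us_storage_energy_kwh, assumed_grid_factor)

print(f"{us_storage_emissions_t / 1e6:.2f} million metric tons of CO2")
```

With this assumed factor, 14 billion kWh corresponds to emissions on the order of six to seven million metric tons of CO2. A cleaner or dirtier grid mix would shift the result proportionally.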

Rethinking storage of digital assets

According to Veritas Global, research on big data indicates that 91 zettabytes of dark data will exist within the next five years, more than four times the quantity currently held. A large portion of this will need to be stored, requiring significant storage space unless data users and organisations change their habits. Thus, for large-scale data centres and cloud providers, finding ways to use electricity more effectively is critical.

Given the size of data storage facilities, seemingly minor modifications can have a significant impact on costs, environmental impact, and carbon footprint. Although cleaner energy sources are an essential area of focus for green data centres, reducing waste and optimising resources can also play a big part in their progress.

The combination of improved energy efficiency, dark data filtration, the removal of unnecessary information, and greener energy sources will yield the best results in terms of overall cost savings and carbon emissions reduction.

Aoife Foley and Dlzar Al Kez
Aoife Foley, IEEE Senior Member and Reader in the School of Mechanical and Aerospace Engineering, and Dlzar Al Kez, IEEE Student Member and PhD student in the School of Electronics, Electrical Engineering and Computer Science – Queen’s University Belfast
