Getting distributed systems right

November 28, 2022

Edge computing is growing. According to the Linux Foundation State of the Edge 2021 report, up to $800 billion will be spent on new and replacement IT server equipment and edge computing facilities between 2019 and 2028.

Gartner expects there to be 18 billion connected things on the internet by 2030, covering enterprise IT and devices in automobiles. Gartner also predicted that the majority of enterprise data would be created outside of on-premises data centres by 2022, as workloads move to the cloud and to the edge.

To support all these connected devices – and to make them useful – businesses will create new applications that will work at scale. These applications will be ‘edge-native’ and built to work in this environment. For instance, supply chain management teams create data using smart sensors and barcode scanners to monitor product availability at multiple factories and warehouses. Tracking the locations of products and vehicles within the supply chain in real time generates an unprecedented amount of data from the network edge.

These implementations create real-time transactional data that needs to be analysed for insights, so that decisions around cost efficiencies and process optimisation can be made. Without this data, companies are effectively blind to their operational performance.

Edge computing and distributed data

These edge-native applications will have a curious relationship with data. On the one hand, applications, transactions and processing will sit closer to the customer to reduce latency and handle data in real time. On the other hand, this data will also need to be centralised for analysis and to look for patterns at scale.

To manage this consistently, your approach will have to be fully distributed. Most traditional databases will have a primary server that controls how data is processed, with secondary servers that then store and manage that data over time. However, this model does not fit well with how edge-native applications will function – they will need to manage data across edge locations and central data centre or cloud environments, without having the bottleneck of a primary server.

Fully distributed databases run across multiple locations, with each node treated as an equal. Groups of nodes in specific locations can interact with each other, for example to keep a service available if a data centre goes down or a connection is lost, and data is replicated across nodes so that no single node holds the only copy. A good example of this is the open source database Apache Cassandra, which provides geographical fault tolerance as well as fast performance.
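To make the peer-based layout above more concrete, here is a minimal Python sketch of how copies of a record might be spread across equal nodes in two locations. It is a toy illustration, not how Cassandra is implemented: the `Ring` class, the `token` function, the node names and the location names are all hypothetical, and the hash function is an arbitrary choice.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def token(value: str) -> int:
    """Hash a string to a ring position (SHA-256 is an arbitrary choice here)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring of equal peers; there is no primary node."""

    def __init__(self, nodes):
        # nodes: iterable of (node_name, location) pairs
        self.entries = sorted((token(name), name, loc) for name, loc in nodes)
        self.tokens = [t for t, _, _ in self.entries]

    def replicas(self, key: str, per_location: int):
        """Walk clockwise from the key's position and pick nodes until
        each location holds up to `per_location` copies."""
        start = bisect_right(self.tokens, token(key))
        counts = Counter()
        chosen = []
        for step in range(len(self.entries)):
            _, name, loc = self.entries[(start + step) % len(self.entries)]
            if counts[loc] < per_location:
                counts[loc] += 1
                chosen.append((name, loc))
        return chosen


# Hypothetical cluster: three nodes at an edge site, three in a central site.
ring = Ring([
    ("edge-a", "warehouse"), ("edge-b", "warehouse"), ("edge-c", "warehouse"),
    ("core-a", "central"), ("core-b", "central"), ("core-c", "central"),
])
placement = ring.replicas("sensor-42", per_location=2)

# If the whole edge site is lost, the central copies are still available.
survivors = [name for name, loc in placement if loc != "warehouse"]
```

Because every node is found by walking the same ring, any node can work out where a record lives. No single server has to coordinate the others, which is the property the edge scenario depends on.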

For edge-native applications, Cassandra can help organisations store and manage data in the same way across both edge and centre, while keeping data closer to customers to reduce latency. This data can also be used centrally for analytics at the same time, whether to inform machine learning models or to run the business.

As well as being managed at rest, this data has to be moved to where it is needed. This will rely on application event streaming, where new data is recognised and then directed to the right destination. In the supply chain example, this could be data from a sensor that passes a specific threshold. This data can be streamed to several places at once: to the local team as an alert so that they can take action, and to any central analytics service for processing.
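The sensor example can be sketched in a few lines of Python. This is only an in-memory illustration of the routing idea, not a real streaming platform: `Reading`, `ThresholdRouter`, the sensor and site names and the threshold value are all hypothetical, and in practice the sinks would be topics or queues rather than Python lists.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Reading:
    sensor_id: str
    site: str
    value: float


Sink = Callable[[Reading], None]


@dataclass
class ThresholdRouter:
    threshold: float
    local_sinks: List[Sink] = field(default_factory=list)
    central_sinks: List[Sink] = field(default_factory=list)

    def publish(self, reading: Reading) -> bool:
        """Fan a reading out to every sink if it passes the threshold."""
        if reading.value <= self.threshold:
            return False
        for sink in self.local_sinks + self.central_sinks:
            sink(reading)
        return True


# Lists stand in for a local alerting channel and a central analytics feed.
local_alerts: List[Reading] = []
central_feed: List[Reading] = []
router = ThresholdRouter(
    threshold=8.0,
    local_sinks=[local_alerts.append],
    central_sinks=[central_feed.append],
)

router.publish(Reading("freezer-7", "warehouse-north", 3.5))  # below threshold
router.publish(Reading("freezer-7", "warehouse-north", 9.2))  # passes threshold
```

Only the second reading reaches the sinks, and it reaches both the local and the central destination from a single publish call.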

Supporting edge and data

To make this work more efficiently for companies, we must support developers in how they want to work. This means supporting the tools that developers want to use, such as GraphQL, REST and gRPC APIs. These APIs will connect the applications to the underlying data that is fuelling the end-user experience. Rather than having to understand how the underlying database works in order to get started, developers should be able to use the APIs that they are already familiar with. APIs abstract away the implementation details of the database, which accelerates feature velocity and makes future changes to the underlying data models easier to implement.
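As a rough sketch of what that abstraction buys, the Python below puts one API-style function in front of two different data models. Everything here is hypothetical (`StockStore`, `FlatStore`, `NestedStore`, `get_stock` and the sample data), and a real service would expose this over GraphQL, REST or gRPC rather than a plain function call.

```python
from typing import Dict, Protocol, Tuple


class StockStore(Protocol):
    """What the API layer needs from storage, and nothing more."""

    def stock_level(self, product_id: str, site: str) -> int: ...


class FlatStore:
    """Original data model: one row per (product, site) pair."""

    def __init__(self, rows: Dict[Tuple[str, str], int]) -> None:
        self.rows = rows

    def stock_level(self, product_id: str, site: str) -> int:
        return self.rows.get((product_id, site), 0)


class NestedStore:
    """Remodelled data: site -> product -> count."""

    def __init__(self, sites: Dict[str, Dict[str, int]]) -> None:
        self.sites = sites

    def stock_level(self, product_id: str, site: str) -> int:
        return self.sites.get(site, {}).get(product_id, 0)


def get_stock(store: StockStore, product_id: str, site: str) -> dict:
    """API handler: the response shape does not depend on the data model."""
    return {
        "product": product_id,
        "site": site,
        "available": store.stock_level(product_id, site),
    }


old_model = FlatStore({("pallet-jack", "leeds"): 12})
new_model = NestedStore({"leeds": {"pallet-jack": 12}})

before = get_stock(old_model, "pallet-jack", "leeds")
after = get_stock(new_model, "pallet-jack", "leeds")
```

The caller sees the same response before and after the data model changes, so application code written against the API does not need to be rewritten.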

Supporting multiple cloud services such as AWS, Google Cloud and Microsoft Azure can help with getting closer to where customers and devices are located. At the same time, telecoms providers may have locations that are even closer to where devices are, so any implementation should be able to deploy and run across a mix of different cloud and on-premises data centres. This hybrid model helps to support deployment out to the network edge.

Alongside APIs and cloud, you should also look at cloud native technologies, such as Kubernetes. Timelines for organisations to get to the cloud vary, but getting cloud-ready now and modernising existing applications using the best of cloud native technology means you can minimise that transition time. For example, Kubernetes is proving to be popular with telecoms operators as it can host containers and orchestrate them over time. This makes it possible for telecoms companies to host their customers’ application containers and run them effectively. It also makes it easier to run in hybrid environments across on-premises, edge and central cloud services.

As more companies start to embrace edge computing for their applications, they will have to think about their edge data too. Using a fully distributed database that can run in geographically dispersed locations will be essential.
