You may already be familiar with graph databases. They are a technology for addressing complexity, designed specifically to help you gain critical insights into connected information at scale. Enterprise users are turning to graphs to solve big enterprise challenges and build Knowledge Graphs, for example.
You may be less familiar with the idea that graphs can be useful in a data centre context. From dependency management to automated microservice monitoring, graphs can help optimise and automate some of the complexities of day-to-day jobs. An idea that’s also emerging is the use of graph technology as a tool for lowering costs.
It’s an approach that is good for your bottom line, as well as helping your company meet key ESG goals around combatting climate change.
Graph tech in practice
Graph technology tends to be lightweight, with a far smaller server footprint, and so needs less power and cooling for the same (or better) functionality than other kinds of database. A good example is Adobe’s Behance social media network. Behance is like Twitter for the creative community, and is based on networks of creatives who publish their content to subscribers.
The first version of Behance needed no fewer than 120 servers running MongoDB and 20TB of disk to operate. Even with that server capacity, it didn’t provide sufficient business value or operational simplicity for the Behance team.
They needed less complexity, storage, and infrastructure cost. They tried to solve those issues by rebuilding the service atop 48 servers running Cassandra. It was considered an improvement, but the platform’s disk footprint grew to 50TB (because of write amplification), and the platform lacked the features that would allow it to evolve to meet user needs.
David Fox, a Software Engineer at Adobe, responsible for backend infrastructure and performance for the app, describes the issues: “For every action a user took on our app, we would use a ‘fanout’ method to populate the activity feed of every user that followed them with the new activity item. For users who were followed by thousands of people, resource utilisation skyrocketed every time they did that, and our application worker processes that processed those items would experience delays because of all the work they needed to do to populate the activity feeds in Cassandra.”
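To make the cost of the “fanout” method concrete, here is a minimal in-memory sketch in Python. It is illustrative only: the function and variable names are hypothetical, and it does not reproduce Behance’s schema or code. It contrasts fanout on write, where one feed entry is written per follower, with storing each activity once and resolving the feed at read time by following relationships. The second approach is a common alternative that a graph model makes natural, though the source does not say that this is exactly how the Behance team implemented it.

```python
from collections import defaultdict

# Hypothetical in-memory model; names are illustrative, not Behance's schema.
followers = defaultdict(set)   # author -> users who follow that author
following = defaultdict(set)   # user -> authors that user follows


def follow(user, author):
    """Record that `user` follows `author` in both lookup directions."""
    followers[author].add(user)
    following[user].add(author)


# Approach 1: fanout on write. Every follower's feed gets its own copy.
feeds = defaultdict(list)      # user -> list of (author, item)


def publish_fanout(author, item):
    """Copy the item into each follower's feed; return the number of writes."""
    for user in followers[author]:
        feeds[user].append((author, item))
    return len(followers[author])


# Approach 2: store once, resolve at read time via the follow relationships.
activities = defaultdict(list)  # author -> list of items


def publish_once(author, item):
    """Store the item a single time; return the number of writes."""
    activities[author].append(item)
    return 1


def read_feed(user):
    """Build the feed by walking the authors that `user` follows."""
    return [(author, item)
            for author in sorted(following[user])
            for item in activities[author]]


# A creator followed by 1,000 users publishes one item.
for n in range(1000):
    follow(f"user{n}", "popular_artist")

fanout_writes = publish_fanout("popular_artist", "new project")
single_writes = publish_once("popular_artist", "new project")
print(fanout_writes, single_writes)
```

With fanout, the write cost grows with the follower count, which matches the resource spikes Fox describes for users followed by thousands of people. Storing the activity once moves that work to read time, where it becomes a traversal of follow relationships.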
Fox says that he also experienced a lot of challenges maintaining his Cassandra cluster, which led to the Behance team having to devote a significant amount of ops/developer time to supporting the cluster.
Furthermore, with the schema format Adobe was using, the Behance team didn’t have an efficient way to delete feed items from the database, so the disk usage got larger over time. The consequence was that when disk usage became high, the team would have to perform maintenance tasks to stabilise the cluster. The rigid schema structure meant the team couldn’t easily make any improvements to the main activity feed feature, and in effect they were just working to keep it running with little hope of improving it.
It was only when the code was ported to a graph database that everything fell into place. And here’s the carbon footprint benefit: the current full production version runs on just three servers for the same workload, with a huge drop in data storage and substantially improved functionality. In effect, that is a 40-fold reduction in hardware compared with the original 120-server deployment.
Fox explains, “Our Neo4j activity implementation has led to a great decrease in complexity, storage, and infrastructure costs. Our full dataset size is now around 40 GB (yes, Giga), down from 50 TB of data that we had stored in Cassandra. We’re able to power our entire activity feed infrastructure using a cluster of 3 Neo4j instances, down from 48 Cassandra instances of pretty much equal specs.”
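The reduction factors implied by the figures quoted in this article can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1,000 GB) and takes Fox’s “around 40 GB” as exactly 40 GB, so the storage ratio is approximate.

```python
# Figures as quoted in the article.
servers_mongodb = 120     # first version of Behance
servers_cassandra = 48    # second version
servers_neo4j = 3         # current graph-based version

storage_cassandra_gb = 50 * 1000  # 50 TB, assuming 1 TB = 1,000 GB
storage_neo4j_gb = 40             # "around 40 GB", treated as exact here

server_reduction_vs_mongodb = servers_mongodb / servers_neo4j
server_reduction_vs_cassandra = servers_cassandra / servers_neo4j
storage_reduction = storage_cassandra_gb / storage_neo4j_gb

print(server_reduction_vs_mongodb)    # 40.0
print(server_reduction_vs_cassandra)  # 16.0
print(storage_reduction)              # 1250.0
```

The factor of 40 is measured against the original 120-server MongoDB deployment. Measured against the 48 Cassandra instances, the server count falls by a factor of 16, while stored data shrinks by roughly three orders of magnitude.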
Graphs: powerful tools in managing infrastructure
The move to a graph platform has led to reduced infrastructure costs for Adobe, Fox adds. Graphs have also made life easier for the team in terms of operational management and administration. And if Adobe can achieve a 40-fold reduction in the hardware deployed for a service like this, perhaps more applications of graphs to complex data centre jobs are also possible?
You can also run a graph database on low-power hardware such as ARM-based servers, making a significant dent in the energy you need to run your app. This and other examples show that graphs continue to prove themselves as powerful tools in managing your infrastructure, while also making a welcome contribution to tackling the IT carbon footprint problem.