Thomas LaRock, head geek at SolarWinds, discusses how anomaly detection can maximise database performance.
Detecting anomalies has come a long way in the last decade. In the data centre of ten years ago, “anomaly detection” meant a system or database administrator (DBA) receiving a rules-based alert tied to a hard value. For example, if CPU spiked to 80%, an email alert would be generated automatically. Unusual behaviour and events were identified by nothing more than a simple rule.
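The kind of hard-value rule described above can be sketched in a few lines. This is an illustrative sketch only: the function name and sample readings are hypothetical, and the 80% figure is simply the example value from the text.

```python
# Hypothetical fixed threshold, taken from the 80% CPU example in the text.
CPU_ALERT_THRESHOLD = 80.0  # percent


def rule_based_alerts(cpu_samples, threshold=CPU_ALERT_THRESHOLD):
    """Return the positions of samples that reach or exceed the fixed threshold."""
    return [i for i, value in enumerate(cpu_samples) if value >= threshold]


# Made-up CPU readings, in percent.
samples = [35.0, 42.5, 81.0, 79.9, 95.2]
alerts = rule_based_alerts(samples)
print(alerts)  # [2, 4]
```

The rule has no notion of context. A routine job that always pushes CPU past 80% alerts every time, while an unusual jump that stays under the line never does.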
Then, we started to see more progressive system admins and DBAs put baseline monitoring systems in place. These baselines established the perceived normal state of a data centre and triggered an alert if activity deviated from that norm. While an improvement, it still wasn’t true anomaly detection.
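One common way to express a baseline is the historical mean plus or minus a few standard deviations. The article does not say how any particular baseline was calculated, so the sketch below is an assumption: the function name, the tolerance of three standard deviations, and the sample history are all hypothetical.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, value, tolerance=3.0):
    """True if value lies more than `tolerance` standard deviations
    from the mean of the historical readings."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return value != baseline
    return abs(value - baseline) > tolerance * spread


# Made-up CPU history, in percent (mean 40, standard deviation 2).
history = [40.0, 42.0, 38.0, 41.0, 39.0, 40.0, 43.0, 37.0]

print(deviates_from_baseline(history, 45.0))  # False: within normal variation
print(deviates_from_baseline(history, 60.0))  # True: far outside the baseline
```

Note that a reading of 60% would never trip a fixed 80% rule, yet it is clearly abnormal for this machine. The baseline still only measures distance from an average, though, which is why it falls short of true anomaly detection.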
Real database anomaly detection is based on data science and machine learning (ML). What’s exciting is that, today, the accessibility of the technology means everyone can move closer to delivering it.
The data rush
The volume of data being produced in today’s organisations only continues to rise — to such an extent that Forbes research predicts more than 150 zettabytes (150 trillion gigabytes) of data will require analysis by 2025.
On top of this, system administrators are monitoring more applications and systems than ever. When was the last time you heard someone say they’re managing fewer? These factors underline the sheer necessity of anomaly detection. System admins and DBAs need a signal to cut through the noise and point to the real issues in need of attention, not just activity that deviates from a baseline.
The risk of missing a critical anomaly grows with the complexity of IT environments. Without anomaly detection in place, it becomes increasingly likely that activity in need of attention will be missed.
No quick fix
With ML, there’s potential to establish intelligent, sophisticated anomaly detection models. But such models aren’t always plug-and-play, and any vendor saying otherwise should be ignored. It’s just not realistic. System admins and DBAs need to be able to train ML models by pointing out which spikes are normal and which ones aren’t. It’s only through continuous feedback and development that ML models learn what constitutes an anomaly.
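To make the feedback loop concrete, here is a deliberately small sketch: a nearest-neighbour classifier that judges a new spike by the spikes an admin has already labelled. It is a stand-in for the far richer models a real deployment would use, not a recommended design. The class name, the two features (hour of day and CPU percentage), and every data point are hypothetical, and the hour is treated as a plain number, so midnight wraparound is ignored.

```python
from dataclasses import dataclass, field
from math import dist


def _features(hour, cpu):
    """Scale both inputs to roughly 0..1 so neither dominates the distance."""
    return (hour / 24.0, cpu / 100.0)


@dataclass
class FeedbackDetector:
    """Classifies spikes by comparing them with admin-labelled spikes."""
    labelled: list = field(default_factory=list)  # [(features, is_anomaly), ...]

    def give_feedback(self, hour, cpu, is_anomaly):
        """Record an admin's verdict on a spike that was observed."""
        self.labelled.append((_features(hour, cpu), is_anomaly))

    def is_anomaly(self, hour, cpu, k=3):
        """Majority vote among the k most similar labelled spikes."""
        if not self.labelled:
            return True  # no feedback yet: surface everything for review
        point = _features(hour, cpu)
        nearest = sorted(self.labelled, key=lambda item: dist(item[0], point))[:k]
        votes = sum(1 for _, flagged in nearest if flagged)
        return votes * 2 > len(nearest)


detector = FeedbackDetector()
# The admin marks the nightly backup spikes as normal...
for hour, cpu in [(2, 90.0), (2, 92.0), (3, 88.0)]:
    detector.give_feedback(hour, cpu, is_anomaly=False)
# ...and similar-sized mid-afternoon spikes as genuine problems.
for hour, cpu in [(14, 91.0), (15, 94.0), (13, 89.0)]:
    detector.give_feedback(hour, cpu, is_anomaly=True)

print(detector.is_anomaly(2, 89.0))   # False: resembles the backup window
print(detector.is_anomaly(14, 93.0))  # True: resembles the flagged spikes
```

The two queries have almost the same CPU value. What separates them is the admin's feedback, and that feedback would look different in another data centre.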
It’s also important to remember that the best ML model for my data centre may not necessarily work for yours. Every data centre is configured differently, meaning anomalies look different in each. A one-size-fits-all anomaly detection solution most likely won’t cut it.
Taking matters into your own hands
The difficulty of building an ML model for anomaly detection without a vendor solution is that the onus falls firmly on the system admin or DBA to build it. This requires a full understanding of an environment’s workloads in order to judge which algorithms best suit each.
Once a tailored model is in place, system admins and DBAs can set modern metrics for anomaly detection that better reflect today’s data centres. Traditional metrics assume data centres are physical and on-premises. But, as DBAs will already know, today’s data centres are hybrid, a mixture of on-premises and cloud. Database workloads are virtualised, and performance is squeezed out of hardware. Reducing latency in modern data centres requires a break from old-school thinking, and ditching old-school metrics is an obvious start.
Once an effective ML model is put in place and working to the right metrics, what benefits will be felt? Fundamentally, it boils down to solving issues faster. Modern data centres produce a lot of noise; a signal cutting through the noise to highlight issues is an invaluable asset. Ideally, these signals will bring attention to issues before they cause any problems for the end-user.
Effective anomaly detection also allows system admins and DBAs to enhance the scalability of their monitoring. The ability to monitor ten machines can jump to 100 — just as long as admins are confident in their anomaly detection model and its ability to catch issues. Ultimately, this equates to more dollars per technician.
Overcoming challenges with knowledge
There are challenges with establishing anomaly detection models that all system admins and DBAs should keep in mind. Firstly, as already touched upon, vendors offering anomaly detection solutions should be put under the microscope. Artificial intelligence (AI) is a vague concept at the best of times. “AI-powered” anomaly detection vendor solutions often boil down to nothing more than rules-based recommendations and code, hardly the ML-driven predictive analytics behind real anomaly detection today.
There needs to be a conversation going back and forth between system admins, DBAs, and vendors. System admins and DBAs need to articulate a vision for their ideal model and speak up when a vendor solution isn’t providing enough business value. This requires the confidence that comes from a solid understanding of data science.
This understanding will help system admins and DBAs answer some key questions that crop up in the process of setting up anomaly detection — whether it’s via a vendor solution or a unique model. What data needs to be fed into the model? What requires predictions or alerts? How can the correct data be sourced and fed into the model? Answering these questions is vital to making sure models function as effectively as possible.
Thinking inquisitively about data in this way puts system admins and DBAs in a good position to evaluate the overall suitability of IT networks for producing predictive analytics. For example, some legacy hardware (such as certain routers and switches) doesn’t produce any kind of metrics that can be fed into an anomaly detection model. Without an API or log file metrics, key benchmarks such as CPU utilisation remain a mystery.
Answering the key questions and successfully identifying the legacy systems incompatible with predictive analytics comes back to one important point: system admins and DBAs need to be comfortable with data science. This might sound daunting, but it shouldn’t. Long gone are the days when data science was reserved exclusively for sophisticated, specialist roles, mainly in finance and insurance institutions.