In the coming years, there are few parts of daily life that artificial intelligence won’t touch in one way or another, and the automotive sector is no exception. When it comes to vehicles, however, implementing AI will place enormous demands on data centre technology. Christian Ott, director of Solution Engineering at NetApp, explores how storage and compute can keep pace with the challenge.
Mention applying AI in the automotive sector and most people’s first thought is self-driving cars. In the current wave of research and development in autonomous vehicles (AVs), the leading thirty companies are estimated to have invested $16 billion in building a car that can take itself from A to B – and, as we all know, that technology hasn’t arrived just yet.
There is more, however, to automotive AI than AVs. Indeed, AI-assisted driving is already part of Advanced Driver-Assistance Systems (ADAS), performing feats like autonomously changing lanes and parking.
These developments led one market research firm to predict an $81 billion ADAS market by 2025. The streets those cars travel along are becoming increasingly smart, too: communication between vehicles and their environment is predicted to be the fastest-growing segment of total smart-city spending, which will reach $124 billion this year.
These projects will rely on AI to manage traffic flow, monitor for incidents, and improve the overall efficiency of transport networks. Automotive manufacturing is also feeling this trend, with smart factories – which, amongst other things, will bring AI to the production line – expected to add $160 billion a year to the industry by 2023.
$16 billion, $81 billion, $124 billion, $160 billion: all of these innovations involve huge sums of money, but what they really have in common is the even larger quantities of data behind them – and that means the data centre will need to step up.
AVs are a perfect example of the challenge at hand. A vehicle surveying streets in order to provide training data for AV development might drive eight hours a day and 250 days a year, resulting in 2,000 hours of video data.
If the vehicle has five cameras running at thirty frames per second, that’s one billion images per year. If the cameras are, conservatively, recording two-megapixel images, we’re dealing with a terabyte of data every hour, or two petabytes every year.
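As a rough check on those figures, the arithmetic can be reproduced in a few lines of Python. The bytes-per-frame figure is an assumption for a compressed two-megapixel image; the other parameters come straight from the scenario above.

```python
# Back-of-the-envelope data volume for a single AV survey vehicle.
# BYTES_PER_FRAME is an assumption (a compressed two-megapixel image);
# the remaining parameters come from the scenario described above.

HOURS_PER_DAY = 8
DAYS_PER_YEAR = 250
CAMERAS = 5
FPS = 30
BYTES_PER_FRAME = 2 * 10**6   # assumed ~2 MB per two-megapixel frame

hours_per_year = HOURS_PER_DAY * DAYS_PER_YEAR            # 2,000 hours
frames_per_year = CAMERAS * FPS * 3600 * hours_per_year   # ~1.08 billion images
bytes_per_hour = CAMERAS * FPS * 3600 * BYTES_PER_FRAME
bytes_per_year = bytes_per_hour * hours_per_year

print(f"Hours of video per year:  {hours_per_year:,}")
print(f"Images captured per year: {frames_per_year:,}")
print(f"Data per hour:            {bytes_per_hour / 1e12:.2f} TB")
print(f"Data per year:            {bytes_per_year / 1e15:.2f} PB")
```

Run as written, this lands on roughly a terabyte an hour and just over two petabytes a year – the same order of magnitude quoted above.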
While only around a third of this data is likely to be cleansed, labelled, and made fit for training purposes, it illustrates the scale of data that is typical of automotive AI applications.
Remember, too, that this is just one survey car; a future with millions of AVs, each with a more data-intensive sensor array than our test car and each sending at least some of the data it gathers back to a central store for further AI training, can comfortably become an exabyte-scale data task. If we’re considering a city-wide mesh of sensors and cameras informing automated traffic decisions, or factory processes communicating with one another across the globe, we can expect a similar picture to emerge.
These big numbers, of course, are not incidental to the training of effective AI systems. Large datasets are necessary to produce good results, and maximising the amount of good-quality data available is vital.
The automotive sector represents the bleeding edge of edge computing: these are real-world situations in motion, generating huge amounts of data in unpredictable conditions and making potentially life-and-death decisions.
The problem, then, is not just scale, but also speed and accuracy – while faults in a content delivery network can lead to a frustrating customer experience, faults in automotive AI could have far more serious consequences.
Across automotive AI applications, the big data challenge will be establishing a continuous data pipeline that can square up to the size of the task at hand.
Data is collected at the edge, in vehicles and other machines either surveying the environment or doing productive work. This data, from diverse sources, needs to be consolidated and then normalised for training in a data lake, either in the cloud or on premises.
The data is then provided, with the addition of external datasets, to a training cluster running on GPUs which can massively parallelise the training process. Finally, the training data and new data produced by the training process need to be stored indefinitely to be fed back into the data lake in future training cycles and for compliance/litigation reasons.
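As a deliberately simplified sketch of that edge-to-core flow, the outline below expresses the stages as plain Python functions. The function names, file layout and stubbed training step are illustrative placeholders rather than any particular product’s API.

```python
# Minimal sketch of the edge-to-core-to-cloud pipeline described above.
# Stage names, paths and the stubbed training job are placeholders only.

from pathlib import Path

def ingest_from_edge(raw_dir: Path, lake_dir: Path) -> list[Path]:
    """Consolidate raw sensor files collected at the edge into the data lake."""
    lake_dir.mkdir(parents=True, exist_ok=True)
    ingested = []
    for source_file in raw_dir.glob("**/*.bin"):
        target = lake_dir / source_file.name
        target.write_bytes(source_file.read_bytes())
        ingested.append(target)
    return ingested

def normalise(files: list[Path]) -> list[dict]:
    """Normalise heterogeneous records into a common training schema (stub)."""
    return [{"path": str(f), "label": None, "cleansed": False} for f in files]

def train_on_gpu_cluster(samples: list[dict]) -> dict:
    """Hand the prepared dataset to a (hypothetical) GPU training job."""
    # In practice this would submit a massively parallel training job;
    # here it simply records how much data would be fed to the cluster.
    return {"model_version": 1, "samples_seen": len(samples)}

def archive(samples: list[dict], model: dict, archive_dir: Path) -> None:
    """Retain data and model artefacts for future cycles and compliance."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    (archive_dir / f"model_v{model['model_version']}.json").write_text(str(model))

if __name__ == "__main__":
    dataset = normalise(ingest_from_edge(Path("edge_capture"), Path("data_lake")))
    model = train_on_gpu_cluster(dataset)
    archive(dataset, model, Path("archive"))
```

In a production system each of these stages would be a distributed service in its own right; the point of the sketch is simply that the same data must move, unbroken, from edge capture through the lake to the training cluster and on into long-term retention.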
This pipeline, from edge to core to cloud, requires innovative approaches. The training and inference steps in AI development, in particular, are extraordinarily compute-intensive, and consistently push the limits of currently available technology.
Supplying GPU clusters with the data they need, at the rate they need it, demands large-scale flash storage arrays. And let’s not forget, the data being used is often sensitive, involving individuals’ locations and habits, meaning that the whole, seamless, high-speed pipeline needs to be managed in a policy-compliant and privacy-aware way. The challenge is real – but we’re rising to meet it.
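To put rough numbers on ‘the rate they need it’, consider the following assumption-laden sketch: the cluster size, per-GPU consumption rate and sample size are hypothetical figures chosen only to illustrate the order of magnitude, not measured requirements of any real system.

```python
# Illustrative throughput estimate for feeding a GPU training cluster.
# Every figure below is an assumption made for the sake of the sketch.

GPUS = 64                       # assumed size of the training cluster
SAMPLES_PER_SEC_PER_GPU = 500   # assumed images each GPU consumes per second
BYTES_PER_SAMPLE = 2 * 10**6    # assumed ~2 MB per training image

aggregate_bytes_per_sec = GPUS * SAMPLES_PER_SEC_PER_GPU * BYTES_PER_SAMPLE
print(f"Aggregate read throughput needed: {aggregate_bytes_per_sec / 1e9:.0f} GB/s")
# Roughly 64 GB/s of sustained reads under these assumptions – the kind of
# figure that pushes storage towards large-scale flash arrays.
```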
Over the last decade or so, developments in high-speed home networking and content delivery networks have upended the way that we consume media and communicate with each other on a daily basis.
For a glimpse at the next wave of change in data centre infrastructure, there are few better places to look than automotive AI.