The potential for AI to improve business performance and competitiveness demands a different approach to managing the data lifecycle. Kurt Kuckein, director of marketing at DDN, examines five key areas to consider when creating an AI platform that not only delivers faster time to value, but also scales rapidly.
Analytics, AI and Machine Learning continue to make extensive inroads into data-oriented industries, presenting significant opportunities for enterprises and research organisations. However, the potential for AI to improve business performance and competitiveness demands a different approach to managing the data lifecycle.
Companies that plan to fully embrace these innovative technologies should consider how the data at the heart of their competitive differentiation will need to scale. While current workloads may fit easily into in-node storage, companies that do not consider what happens when Deep Learning is ready to be applied to larger data sets will be left behind.
Here are five key areas to consider when creating and developing an AI data platform that delivers better answers, faster time to value, and the capability to scale rapidly.
Saturate your AI platform
Given the heavy investment organisations are making in GPU-based compute systems, the data platform must be capable of keeping Machine Learning systems saturated across throughput, IOPS, and latency, eliminating the risk of this resource being underutilised.
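One simple way to spot that risk in practice is to sample GPU utilisation while training jobs run; persistently low utilisation usually means the compute is waiting on data. The Python sketch below uses the pynvml bindings (an assumed tool, not something named here) to do exactly that:

```python
# Sketch: sample GPU utilisation to spot I/O starvation during training.
# Uses the pynvml bindings (pip install nvidia-ml-py); this is an assumption,
# not a tool mentioned in the article. Persistently low utilisation while
# jobs are running often points to a storage bottleneck.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    while True:
        utils = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]
        print("GPU utilisation (%):", utils)
        time.sleep(5)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```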
Saturation-level I/O means cutting out application wait times. As far as the storage system is concerned, this requires different, appropriate responses depending on application behaviour: GPU-enabled in-memory databases, for example, start up faster when they can be populated quickly from the data warehousing area.
GPU-accelerated analytics demand large thread counts, each with low-latency access to small pieces of data. Image-based deep learning for classification, object detection and segmentation benefits from high streaming bandwidth, random access, and fast memory-mapped calls. In a similar vein, recurrent networks for text and speech analysis benefit from high-performance random small-file access.
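As a rough illustration of that random, memory-mapped access pattern, the Python sketch below (assuming PyTorch and a hypothetical pre-processed image file, neither of which is specified here) feeds a training loop from a memory-mapped array using many concurrent loader workers:

```python
# Sketch: random, memory-mapped reads feeding a GPU training loop.
# Assumes a hypothetical pre-processed file "train_images.npy" holding
# N images as float32 arrays of shape (N, 3, 224, 224); not from the article.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class MemmapImageDataset(Dataset):
    def __init__(self, path):
        self.path = path
        # Read just the header for the length; the data itself stays on disk.
        self.length = np.load(path, mmap_mode="r").shape[0]
        self.images = None  # each worker re-opens the memmap lazily

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.images is None:
            # mmap_mode avoids reading the whole file up front; every item
            # access becomes a small random read against storage.
            self.images = np.load(self.path, mmap_mode="r")
        # Copy the slice so the returned tensor owns its memory.
        return torch.from_numpy(np.array(self.images[idx]))

loader = DataLoader(
    MemmapImageDataset("train_images.npy"),
    batch_size=64,
    shuffle=True,      # random access pattern, as described above
    num_workers=8,     # many concurrent readers to keep GPUs saturated
    pin_memory=True,   # faster host-to-GPU transfers
)

for batch in loader:
    pass  # feed the batch to the GPU model here
```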
Build massive ingest capability
Ingest for storage systems means write performance and coping with large concurrent streams from distributed sources at huge scale. Successful AI implementations extract more value from data, but they also tend to gather ever more data as a consequence of that success.
Systems should deliver balanced I/O, performing writes just as fast as reads, along with advanced parallel data placement and protection. That way, data sources added to augment and improve acquisition can be absorbed at any rate while the platform concurrently serves Machine Learning compute systems.
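To make that requirement concrete, a minimal test such as the sketch below (stream counts, sizes and paths are illustrative assumptions, not DDN-specific values) can measure aggregate write throughput from concurrent ingest streams before committing to a platform:

```python
# Sketch: measure aggregate write throughput from concurrent ingest streams.
# The mount point, stream count and sizes are illustrative assumptions.
import os
import time
from concurrent.futures import ThreadPoolExecutor

TARGET_DIR = "/mnt/ai_ingest"       # hypothetical mount point
STREAMS = 16                        # concurrent writers
CHUNK = 4 * 1024 * 1024             # 4 MiB per write
CHUNKS_PER_STREAM = 256             # 1 GiB per stream

def write_stream(stream_id: int) -> int:
    path = os.path.join(TARGET_DIR, f"ingest_{stream_id}.bin")
    buf = os.urandom(CHUNK)
    written = 0
    with open(path, "wb") as f:
        for _ in range(CHUNKS_PER_STREAM):
            f.write(buf)
            written += CHUNK
        f.flush()
        os.fsync(f.fileno())        # make sure the data actually hits storage
    return written

start = time.time()
with ThreadPoolExecutor(max_workers=STREAMS) as pool:
    total = sum(pool.map(write_stream, range(STREAMS)))
elapsed = time.time() - start
print(f"Wrote {total / 1e9:.1f} GB in {elapsed:.1f}s "
      f"({total / 1e9 / elapsed:.2f} GB/s aggregate)")
```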
Flexible and fast access to data
Flexibility for AI means addressing data manoeuvrability. As AI-enabled data centres move from initial prototyping and testing towards production and scale, a flexible data platform should provide the means to scale independently in multiple areas: performance, capacity, ingest capability, flash-to-HDD ratio and responsiveness for data scientists. Such flexibility also implies expanding a namespace without disruption, eliminating data copies and complexity during growth phases. For organisations entering AI, flexibility also means good performance regardless of the choice of data formats.
Scale simply and economically
Scalability should be measured not only in terms of performance, but also manageability and economics. A successful AI implementation can start with a few terabytes of data and ramp to petabytes. While flash should always be the media for live AI training data, it can become economically unfeasible to hold hundreds of terabytes or petabytes of data entirely on flash.
Alternative hybrid models can suffer from limitations around data management and data movement. Loosely coupled architectures that combine all-flash arrays with separate HDD-based data lakes present complicated environments for managing hot data efficiently.
Integration and data movement techniques are key here. Start small with a flash deployment and then choose your scaling strategy according to demand: either scale with flash only, or combine it with deeply integrated HDD pools, ensuring data moves transparently and natively at scale.
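In a tightly integrated platform that movement happens natively inside the file system, but as a rough sketch of the underlying policy, the snippet below (mount points and the age threshold are illustrative assumptions) demotes files that have not been accessed recently from a flash tier to an HDD pool:

```python
# Sketch: age-based demotion from a flash tier to an HDD pool.
# Mount points and the 30-day threshold are illustrative assumptions;
# an integrated platform would do this transparently in the file system.
import os
import shutil
import time

FLASH_TIER = "/mnt/flash/datasets"   # hypothetical hot tier
HDD_TIER = "/mnt/hdd/datasets"       # hypothetical capacity tier
MAX_AGE_DAYS = 30

cutoff = time.time() - MAX_AGE_DAYS * 86400

for root, _, files in os.walk(FLASH_TIER):
    for name in files:
        src = os.path.join(root, name)
        if os.stat(src).st_atime < cutoff:           # not accessed recently
            rel = os.path.relpath(src, FLASH_TIER)
            dst = os.path.join(HDD_TIER, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)                     # demote cold data
```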
Understanding the whole environment
Since what matters is delivering performance to the application, not just how fast the storage can push out data, integration and support services must span the whole environment to deliver faster results. This underscores the importance of partnering with a provider that really understands every aspect of the environment, from containers, networks, and applications all the way to file systems and flash. Expert tuning of the platform to your workflow and growth direction is paramount to removing barriers on the path to value from AI and enabling the extraction of more insights from data.
The new AI data centre must be optimised to extract maximum value from data: ingesting, storing, and transforming data and then feeding it through hyper-intensive analytics workflows. This requires a data platform that is not constrained by protocol or file system limitations and that does not become excessively costly at scale.
Any AI data platform provider chosen to help accelerate analytics and Machine Learning must have deep domain expertise in dealing with data sets and I/O that far exceed the capabilities of standard solutions, and must have the tools at hand to create tightly integrated solutions at scale.