The potential for AI to improve business performance and competitiveness demands a different approach to managing the data lifecycle. Kurt Kuckein, director of marketing at DDN, examines five key areas to consider when creating an AI platform that not only delivers faster time to value, but also scales rapidly.
Analytics, AI and Machine Learning continue to make extensive inroads into data-oriented industries, presenting significant opportunities for enterprises and research organisations. However, the potential for AI to improve business performance and competitiveness demands a different approach to managing the data lifecycle.
Companies that plan to fully embrace these innovative technologies should consider how the data at the heart of their competitive differentiation will need to scale. While current workloads may fit easily into in-node storage, companies that do not consider what happens when Deep Learning is ready to be applied to larger data sets will be left behind.
Here are five key areas to consider when creating and developing an AI data platform that delivers better answers, faster time to value, and the capability to scale rapidly.
Saturate your AI platform
Given the heavy investment organisations are making in GPU-based compute systems, the data platform must be capable of keeping Machine Learning systems saturated across throughput, IOPS, and latency, eliminating the risk of this resource being underutilised.
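One simple way to spot that risk in practice is to sample GPU utilisation while training jobs run; persistently low utilisation usually means the compute is waiting on data. The Python sketch below uses the pynvml bindings (an assumed tool, not something named here) to do exactly that:

```python
# Sketch: sample GPU utilisation to spot I/O starvation during training.
# Uses the pynvml bindings (pip install nvidia-ml-py); this is an assumption,
# not a tool mentioned in the article. Persistently low utilisation while
# jobs are running often points to a storage bottleneck.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    while True:
        utils = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]
        print("GPU utilisation (%):", utils)
        time.sleep(5)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```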
Saturation-level I/O means cutting out application wait times. As far as the storage system is concerned, this requires different, appropriate responses depending on application behaviour: GPU-enabled in-memory databases, for example, start up faster when they can be populated quickly from the data warehousing area.
GPU-accelerated analytics demand large thread counts, each with low-latency access to small pieces of data. Image-based deep learning for classification, object detection and segmentation benefits from high streaming bandwidth, random access, and fast memory-mapped calls. In a similar vein, recurrent networks for text and speech analysis benefit from high-performance random small-file access.
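As a rough illustration of that random, memory-mapped access pattern, the Python sketch below (assuming PyTorch and a hypothetical pre-processed image file, neither of which is specified here) feeds a training loop from a memory-mapped array using many concurrent loader workers:

```python
# Sketch: random, memory-mapped reads feeding a GPU training loop.
# Assumes a hypothetical pre-processed file "train_images.npy" holding
# N images as float32 arrays of shape (N, 3, 224, 224); not from the article.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class MemmapImageDataset(Dataset):
    def __init__(self, path):
        self.path = path
        # Read just the header for the length; the data itself stays on disk.
        self.length = np.load(path, mmap_mode="r").shape[0]
        self.images = None  # each worker re-opens the memmap lazily

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.images is None:
            # mmap_mode avoids reading the whole file up front; every item
            # access becomes a small random read against storage.
            self.images = np.load(self.path, mmap_mode="r")
        # Copy the slice so the returned tensor owns its memory.
        return torch.from_numpy(np.array(self.images[idx]))

loader = DataLoader(
    MemmapImageDataset("train_images.npy"),
    batch_size=64,
    shuffle=True,      # random access pattern, as described above
    num_workers=8,     # many concurrent readers to keep GPUs saturated
    pin_memory=True,   # faster host-to-GPU transfers
)

for batch in loader:
    pass  # feed the batch to the GPU model here
```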
Build massive ingest capability
Ingest for storage systems means write performance and coping with large concurrent streams from distributed sources at huge scale. Successful AI implementations extract more value from data, but they also tend to gather ever more data as a consequence of that success.
Systems should deliver balanced I/O, performing writes just as fast as reads, along with advanced parallel data placement and protection. That way, data sources added to augment and improve acquisition can be absorbed at any rate while the platform concurrently serves Machine Learning compute systems.
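To make that requirement concrete, a minimal test such as the sketch below (stream counts, sizes and paths are illustrative assumptions, not DDN-specific values) can measure aggregate write throughput from concurrent ingest streams before committing to a platform:

```python
# Sketch: measure aggregate write throughput from concurrent ingest streams.
# The mount point, stream count and sizes are illustrative assumptions.
import os
import time
from concurrent.futures import ThreadPoolExecutor

TARGET_DIR = "/mnt/ai_ingest"       # hypothetical mount point
STREAMS = 16                        # concurrent writers
CHUNK = 4 * 1024 * 1024             # 4 MiB per write
CHUNKS_PER_STREAM = 256             # 1 GiB per stream

def write_stream(stream_id: int) -> int:
    path = os.path.join(TARGET_DIR, f"ingest_{stream_id}.bin")
    buf = os.urandom(CHUNK)
    written = 0
    with open(path, "wb") as f:
        for _ in range(CHUNKS_PER_STREAM):
            f.write(buf)
            written += CHUNK
        f.flush()
        os.fsync(f.fileno())        # make sure the data actually hits storage
    return written

start = time.time()
with ThreadPoolExecutor(max_workers=STREAMS) as pool:
    total = sum(pool.map(write_stream, range(STREAMS)))
elapsed = time.time() - start
print(f"Wrote {total / 1e9:.1f} GB in {elapsed:.1f}s "
      f"({total / 1e9 / elapsed:.2f} GB/s aggregate)")
```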
Flexible and fast access to data
Flexibility for AI means addressing data manoeuvrability. As AI-enabled data centres move from initial prototyping and testing towards production and scale, a flexible data platform should provide the means to scale independently in multiple areas: performance, capacity, ingest capability, flash-to-HDD ratio and responsiveness for data scientists. Such flexibility also implies expanding a namespace without disruption, eliminating data copies and complexity during growth phases. For organisations entering AI, flexibility also means good performance regardless of the choice of data formats.
Scale simply and economically
Scalability should be measured not only in terms of performance, but also manageability and economics. A successful AI implementation can start with a few terabytes of data and ramp to petabytes. While flash should always be the media for live AI training data, it can become economically unfeasible to hold hundreds of terabytes or petabytes of data entirely on flash.
Alternative hybrid models can suffer from limitations around data management and data movement. Loosely coupled architectures that combine all-flash arrays with separate HDD-based data lakes present complicated environments for managing hot data efficiently.
Integration and data movement techniques are key here. Start small with a flash deployment and then choose your scaling strategy according to demand: either scale with flash only, or combine it with deeply integrated HDD pools, ensuring data moves transparently and natively at scale.
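In a tightly integrated platform that movement happens natively inside the file system, but as a rough sketch of the underlying policy, the snippet below (mount points and the age threshold are illustrative assumptions) demotes files that have not been accessed recently from a flash tier to an HDD pool:

```python
# Sketch: age-based demotion from a flash tier to an HDD pool.
# Mount points and the 30-day threshold are illustrative assumptions;
# an integrated platform would do this transparently in the file system.
import os
import shutil
import time

FLASH_TIER = "/mnt/flash/datasets"   # hypothetical hot tier
HDD_TIER = "/mnt/hdd/datasets"       # hypothetical capacity tier
MAX_AGE_DAYS = 30

cutoff = time.time() - MAX_AGE_DAYS * 86400

for root, _, files in os.walk(FLASH_TIER):
    for name in files:
        src = os.path.join(root, name)
        if os.stat(src).st_atime < cutoff:           # not accessed recently
            rel = os.path.relpath(src, FLASH_TIER)
            dst = os.path.join(HDD_TIER, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)                     # demote cold data
```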
Understanding the whole environment
Since what matters is delivering performance to the application, not just how fast the storage can push out data, integration and support services must span the whole environment to deliver faster results. This underscores the importance of partnering with a provider that really understands every aspect of the environment, from containers, networks, and applications all the way to file systems and flash. Expert tuning of the platform to your workflow and growth direction is paramount to removing barriers on the path to value from AI and enabling the extraction of more insights from data.
The new AI data centre must be optimised to extract maximum value from data: ingesting, storing, and transforming data and then feeding it through hyper-intensive analytics workflows. This requires a data platform that is not constrained by protocol or file system limitations and that does not become excessively costly at scale.
Any AI data platform provider chosen to help accelerate analytics and Machine Learning must have deep domain expertise in dealing with data sets and I/O that far exceed the capabilities of standard solutions, and must have the tools at hand to create tightly integrated solutions at scale.