Is the real AI revolution happening above the model layer?

Manvinder Singh
VP of Product Management for AI at Redis

Manvinder Singh, VP of Product Management for AI at Redis, believes the spotlight is shifting from racing to build the best AI models towards creating robust application architectures – and the powerful infrastructure to make them enterprise-ready.

The AI conversation is shifting, with the focus moving beyond model innovation to the development and deployment of AI applications – and the infrastructure that powers them. Developers are realising that it’s time to focus higher up in the stack. This shift is driven by a convergence of factors, from the maturing foundation model landscape to the growing demand for rapidly deploying AI agents in real-world use cases.

Firstly, this shift reflects a growing recognition that while AI models hold immense potential, deploying them at scale remains a significant challenge. A recent MIT Technology Review survey found that while 79% of companies planned generative AI deployments in 2023, only 5% had production use cases by May 2024 – underscoring the hurdles of real-world implementation. As a result, there is a heightened focus on, and investment in, improving the accuracy, performance, and reliability of AI applications to make them truly enterprise-ready.

Secondly, the AI model landscape has changed dramatically in the past 12 months. OpenAI’s GPT-4 series held the top spot on performance leaderboards for a while, but recent models from Anthropic, Google, Meta, and DeepSeek have reached comparable levels. Over the last year we saw models from each of these providers match or surpass the ranking of OpenAI’s top models on LMArena.ai, the popular crowdsourced benchmarking platform for AI models. 

Enterprises and developers now have more choice when selecting high-performing base models, making them less dependent on any single AI provider. And if an existing model doesn’t work for a new use case, developers can simply try a different one rather than investing effort in tuning it.

Finally, the most significant driver of this shift is the much-discussed ‘Rise of AI Agents’. These advanced applications promise to amplify workforce productivity by orders of magnitude. However, building high-performing AI agents is a complex engineering challenge – demanding thoughtful architectural design, the right technology stack, and rigorous testing and iteration to ensure reliability and efficiency.

Tackling the memory requirement for AI Agents

Building AI agents is a complex challenge that demands careful design decisions and rigorous human-in-the-loop testing. Unlike traditional software, there is no one-size-fits-all blueprint for deploying agentic applications. As a result, more developers are recognising the need to invest in ‘Agent Engineering’ – the discipline of architecting, optimising, and iterating on AI agents.

One major challenge in this space is managing long-term memory. Just like human colleagues, AI agents need to remember relevant information and learn over time to improve performance. This requires an efficient memory layer – essentially an in-memory database – that can store, retrieve, and manage memories while handling factors like relevance and decay. As AI agents become more sophisticated, this memory layer will be a critical component of AI application infrastructure.

Specialisation will drive success, but also add complexity

The first wave of generative AI applications focused on broad use cases – think ChatGPT-style interfaces that provide information. However, as AI apps evolve from chat-based interactions to automating real-world workflows, they must develop a deeper understanding of context. This includes recognising the role of specific functions within an organisation and integrating with specialised tools to execute tasks like a human.

This shift brings increased complexity to AI infrastructure. As applications connect with multiple enterprise systems, builders will need to rethink onboarding, identity management, privacy controls and authentication. These challenges will drive rapid innovation and fundamentally reshape IT infrastructure to support AI-driven automation at scale.

The growing need for speed in AI

The prospect of AI agents actively working for organisations is rapidly becoming a reality. No longer seen as passive tools, these agents are evolving into dynamic decision-makers – expected to respond instantly and take actions faster than current language models allow. However, agentic applications often rely on iterative loops of planning and reflection, repeatedly calling base models within a single task execution. This can sometimes take minutes – an unacceptable delay for real-world applications that require real-time responsiveness.
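The latency cost of those iterative loops is easy to see in a sketch. The loop below is hypothetical (the shape of plan/execute/reflect phases varies by framework), and `call_model` is a stand-in for a real LLM API call – the point is simply that a single task fans out into several model invocations, so each call's latency is multiplied:

```python
def run_agent(task: str, call_model, max_iters: int = 5) -> list:
    """Toy plan-act-reflect loop. Each iteration issues three model
    calls, so end-to-end latency grows with every round."""
    history = []
    for _ in range(max_iters):
        plan = call_model(f"Plan the next step for: {task}. History: {history}")
        result = call_model(f"Execute: {plan}")
        history.append((plan, result))
        verdict = call_model(f"Is the task '{task}' complete? History: {history}")
        if verdict.strip().lower().startswith("yes"):
            break
    return history
```

Even this minimal loop makes three model calls per iteration; with per-call latencies of a few seconds, a handful of iterations quickly reaches the multi-minute delays described above.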

To meet these demands, AI infrastructure must prioritise low-latency, real-time technologies. Choosing the right components – such as a high-performance vector database for rapid knowledge retrieval – will be critical to maintaining speed. Additionally, organisations will need to adopt emerging technologies like semantic caching, which accelerates responses by checking past AI outputs for similar queries before triggering costly new model inferences. As AI applications mature, optimising for speed will be just as important as optimising for intelligence.

What comes next?

As we move into 2025, the conversation around AI will centre less on groundbreaking innovations in model design and more on addressing the practicalities of application development, agent architectures, scaling and implementation. The journey from potential to production has revealed critical bottlenecks, driving a shift in how organisations approach AI. 

Prioritising infrastructure efficiency, embracing practical solutions, and fostering the development of compound AI systems will be at the forefront. Success is not merely a matter of adopting the technology, but also of preparing the workforce for the change it brings. As we venture into this uncharted territory, it is essential to keep updating and refining these architectures and practices.
