We argue that, in the absence of a latency constraint, Large Language Models have by now achieved a sufficient level of peak intelligence to effectively solve most Customer Support tasks. As such, in our industry and others like it, the pragmatic move is to shift our attention from expanding the maximum achievable intelligence to optimising the quantity of intelligence produced per second, which we will refer to as intelligence density.
To make this argument:
- We start by explicitly defining intelligence for AI agents along three essential dimensions.
- We then introduce a taxonomy categorising AI products according to their requirements, illustrating why certain applications thrive while others see slower progress.
- With this foundation, we make the argument that, in Customer Support and similar applications, current state-of-the-art models have already achieved sufficiently high levels of peak intelligence, shifting the bottleneck to intelligence density.
- We show how this reframing offers an alternative path to building effective and reliable systems without relying on ever-increasing levels of peak intelligence.
- Finally, we discuss how this reframing should change the way we build AI applications today.
What is Intelligence?
In this article, we will be making a point about the intelligence of state-of-the-art Large Language Models (LLMs) and Large Reasoning Models (LRMs)—essentially LLMs post-trained to develop native reasoning abilities—in the context of the field of AI-powered Customer Support. In order to do that, we need to start by defining what we mean by intelligence. This definition has to respect the general principles that govern intelligence, but be made tangible with respect to what we expect from AI Agents.
Let us define intelligence in an LLM-centric manner: assuming a fixed and sufficient amount of base world knowledge, we argue that, with all other dimensions held constant, the more intelligent model will be the one that:
- Is able to reason about the most intellectually complex tasks most correctly and systematically.
- Is able to understand the nuances of human expression and interaction most accurately.
- Is able to discern the limits of its own world knowledge most clearly.
The first of these dimensions is the most obvious, and the most closely watched. But the other two play crucially important roles, particularly in high-risk domains: they provide the context required for problem-solving, and they enable the agent to understand and set the boundaries within which reasoning should take place. Together, the three dimensions allow us to derive the intelligent behaviours we expect from, and find valuable in, an AI Agent.
Let us consider for a moment the concept of “instruction-following intelligence”, particularly topical in the Customer Support space. The term refers to the ability of LLMs to consistently and faithfully follow the instructions given to them. While useful as a short-hand, we argue that following instructions is not a form of intelligence in itself. Rather, it is an ability born of traits that map back to the concept of intelligence.
In contrast to that framing, the three dimensions of intelligence we outlined are all individually necessary and jointly sufficient to capture “instruction-following intelligence”. That is, to follow an instruction intelligently the agent needs to:
- Have the ability to reason through the logical steps in the instruction, as well as any steps that would logically follow from them and be required to achieve the instruction’s goals.
- Identify and understand what exactly each top-level logical step in the instruction is and is not about, i.e. where its exact boundaries lie.
- Be able to clearly delineate what set of knowledge it should ground its problem-solving in, both with respect to its learned knowledge (through training) and given knowledge (in-context).
Each of these abilities directly maps to one of the three dimensions we used above to define intelligence, and together they yield the ability to expertly follow even the most complex of instructions. Remove one, and the agent will fail to reliably produce the desired outcomes. And this applies more generally: agents need to be able to understand what they know and what they do not, understand what they are told by humans (and other agents, which for now communicate just like us), and be able to reason about it all.
Why is this important? Because we need to deeply understand what today’s agents are capable of and where they fall short if we want to make progress in how we build and leverage these systems. Simply concluding that an agent “lacks instruction-following intelligence” does not guide us toward a solution. Instead, progress comes from understanding specifically where and why the agent fails: whether it is in interpreting the instruction and the interaction, in reasoning through these elements, or in recognising the boundaries of its own knowledge within the context of the instruction.
Now, equipped with this definition, we want to make the following case. While in many domains—such as cutting-edge scientific research or high-risk medical applications—frontier LLMs still fall short across one or more of these three dimensions of intelligence, there are domains where they already surpass the necessary threshold. We argue that Customer Support is precisely one domain in which LLMs demonstrably meet the intelligence threshold today. A more intelligent LLM may help us make progress, of course, but it is not a necessary condition for that progress. Understanding this will have a big impact on the products we build today.
Latency, Intelligence and Building AI Products
Latency is a decisive factor in the outcomes we deliver for our customers and the experiences we provide to our end-users with Fin. It is thus an important leverage point for us, one that can make or break an interaction, and as such, a defining factor of the solutions we build.
This is not, however, necessarily the case for everyone building AI products right now. We can broadly split the AI products currently being built into five categories:
- Trivial Peak Intelligence, any latency profile (TI): e.g. summarisation, real-time translation agents.
- Lower Peak Intelligence, Higher Latency (LIHL): e.g. data labelling, web-search agents.
- Higher Peak Intelligence, Higher Latency (HIHL): e.g. app builder, research agents.
- Lower Peak Intelligence, Lower Latency (LILL): e.g. content moderation, customer support agents.
- Higher Peak Intelligence, Lower Latency (HILL): there aren’t applications here yet, and we will attempt to explain why this might be the case.
Leveraging this taxonomy, we can take stock of where progress lies for each category. Arguably, over the last year in particular, TI and LIHL use-cases have mostly been solved. HIHL use-cases have been seeing strong progress as of late, with notable examples such as OpenAI’s Deep Research and Anthropic’s Claude Code. And while we have not yet seen the same level of achievement in the LILL domain outside of purely informational interactions, among which Intercom’s very own Fin is a notable example, we are now starting to see it, not least right here at Intercom.
Note that HIHL and LILL use-cases seem to have been on somewhat similar timelines of progress towards real-world impact up to now. And the reason for this is simple: new versions of state-of-the-art LLMs have had latency profiles similar to those of their predecessors, but stronger capabilities across all three dimensions of our definition of intelligence, and in particular the first two. Because up to this point both categories have required these higher levels of peak intelligence, and because latency has remained broadly stable, improvements to the state of the art have benefitted both sets of applications at the frontier equally.
However, from here on out the paths of HIHL and LILL will diverge, at least partially. In certain HIHL tasks we will be able to make great progress, while others will remain unsolved for some time – the key obstacle being that the word “higher” in “Higher Peak Intelligence” is unbounded, and in the limit no latency budget can compensate for infinite intelligence requirements. But we now have a clear opportunity to hit escape velocity on almost all LILL use-cases. And this is possible due to the confluence of two key factors: the commoditisation of these higher levels of peak intelligence and the push towards technology that significantly speeds up inference.
Intelligence Frontier and Intelligence Density
Let us be clear – there are still situations in the Customer Support space that only humans can resolve. However, this is more closely tied to the “embodiment” problem than the “intelligence” problem: most fundamentally, LLMs lack the interface and tools to build their own context (a situation that, perhaps, web browsing agents can help change). But outside of this question, with the latest wave of LLM and LRM releases, peak intelligence itself is no longer the critical limitation in AI-powered Customer Support.
Today, the primary constraint is not the models’ frontier level of intelligence, but rather the density of intelligence they can produce. But how does realising that take us any closer to solving LILL use-cases? In the context of LLMs and LRMs, the concept of density of intelligence is typically framed as the amount of intelligence produced per token. However, this framing gets the denominator wrong, and obscures the real operational bottleneck. To help us understand this, let us consider an example.
Imagine a slow but very densely intelligent LRM that reads a duplicate-charge ticket in one comprehensive pass and takes sixty seconds to produce a refund resolution. In contrast, a faster, less intelligent model performs thousands of tiny checks—extracting amounts, verifying timestamps, grouping by invoice, checking refund history—one by one, some in parallel, some in sequence, and pools the results into the same conclusion in the same sixty seconds. The high-throughput model’s many rapid, lower-intelligence passes collectively reconstruct the deep reasoning of the slow model. Assuming the faster model’s peak intelligence is still sufficient for each of these smaller steps, the system as a whole matches the speed and performance of the highly intelligent, but slower, model.
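To make the second system more concrete, here is a minimal sketch of that fan-out-and-pool pattern. It is illustrative only: `call_fast_model` stands in for any low-latency model endpoint, and the ticket fields and check prompts are hypothetical rather than a description of how Fin actually works.

```python
# Illustrative sketch only: not a real pipeline, and not Fin's implementation.
import asyncio

async def call_fast_model(prompt: str) -> str:
    """Placeholder for a low-latency, lower-peak-intelligence model call."""
    # In a real system this would hit an inference endpoint; here we stub it out.
    return f"[stub answer for: {prompt[:48]}...]"

async def resolve_duplicate_charge(ticket: dict) -> str:
    # Decompose the ticket into many small, independently answerable checks.
    checks = [
        f"Extract every charge amount and its timestamp from: {ticket['body']}",
        f"Do any two charges share an amount within the same billing window? {ticket['body']}",
        f"Group the charges by invoice id: {ticket['body']}",
        f"Has this customer already been refunded? History: {ticket['refund_history']}",
    ]
    # Fan the independent checks out concurrently; each is a cheap, fast pass.
    partial_results = await asyncio.gather(*(call_fast_model(c) for c in checks))

    # A final pass pools the partial results into a single resolution, standing in
    # for the one comprehensive pass of the slower, denser model.
    pooled = "\n".join(partial_results)
    return await call_fast_model(
        "Given these verified facts, decide whether to issue a refund "
        f"and draft the customer reply:\n{pooled}"
    )

if __name__ == "__main__":
    ticket = {
        "body": "Charged twice for invoice INV-123 on 2024-05-01.",
        "refund_history": "none",
    }
    print(asyncio.run(resolve_duplicate_charge(ticket)))
```

The specific decomposition is not the point; what matters is that each individual call only needs modest peak intelligence, while the aggregate recovers the slower model’s conclusion within the same wall-clock budget.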
And if we extend this reasoning to other parts of a notional “intelligence-latency frontier”—the boundary representing the optimal achievable trade-off between peak intelligence and latency for any given amount of intelligence required—we get the same outcome, even if for wildly different system implementations. That is, conditional on a high-enough level of peak intelligence for the faster agent, we can trade off peak intelligence for lower latency along the curve and still maintain the same level of performance.
Therefore, in reality, the right metric for measuring density of intelligence is the amount of intelligence produced per unit of inference time, and not per token. Shifting the denominator to seconds instead of number of tokens fundamentally reshapes our approach because it highlights that there are two actionable optimisation levers:
- Peak Intelligence: Enhancing the quality and depth of understanding, reasoning and problem-solving abilities embedded within each token.
- Throughput (Tokens per Second): Increasing the speed at which these intelligently produced tokens are generated.
The key to solving LILL use cases is therefore to recognise that, in a world where we are no longer limited by the upper bounds of intelligence, simply having more tokens per second at the same level of intelligence is sufficient to solve the Customer Support space. Given a high-enough level of peak intelligence, dividing the latency per token by 100 is functionally equivalent to multiplying the intelligence per token by 100. The shapes of the resulting solutions are very different, but the bounds of what they can achieve are a lot more similar than they might appear at first glance.
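To make that equivalence explicit, here is a back-of-the-envelope formulation (the notation is ours, introduced purely for illustration): write the intelligence density as the intelligence carried per token multiplied by the throughput in tokens per second, the latter being the inverse of the per-token latency.

```latex
D \;=\; i_{\text{tok}} \times r,
\qquad
r \;=\; \frac{1}{\ell_{\text{tok}}}
```

Because the density is a simple product, dividing the per-token latency by 100 raises it by exactly the same factor as multiplying the per-token intelligence by 100; provided the per-token intelligence already clears the required peak threshold, either lever yields the same density.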
Ultimately, this reframing illustrates a vital principle: when peak intelligence levels are sufficient, the key driver of value shifts to making LLMs and the systems we build around them denser in intelligence. High-enough peak intelligence is necessary, but once achieved, the opportunities for value creation move towards latency and density, where significant opportunities for differentiation remain under-explored and are cheaper to capture. Work on intelligence density is where we, at the application layer, must focus to make progress and capture value better and faster.
Corollaries
From the various arguments we have made throughout this article, we arrive at a few critical corollaries.
The first is this: many difficult problems are actually solvable provided we have infinite time. However, once we explicitly condition these tasks on a finite, practical time budget – which is always the case in real-world scenarios – they once again become challenging, and sometimes effectively impossible. Real-world tasks inevitably impose constraints, and solutions divorced from practical considerations become irrelevant for value creation irrespective of their intellectual elegance.
This leads us naturally to our second corollary: as the time allowed for task completion approaches infinity, the value of any model or system that requires full utilisation of such a time budget to reach a solution correspondingly approaches zero. A solution’s worth is inextricably linked to the constraints of its practical context, making latency-bounded, resource-limited tasks arguably the most faithful metrics for measuring intelligence. In other words, without clearly defining a time budget, any attempt to measure intelligence is at best inconsequential, and at worst meaningless. And we are not alone in thinking this.
Finally, this framing also helps us revisit and hopefully understand a point we made earlier in the article – why it is that we have seen no practical examples of HILL applications to date (think: Iron Man’s Jarvis AI). The reason is straightforward: HILL scenarios leave us with no latency budget through which we might compensate for deficits in intelligence. Without latency to leverage, the only way to solve the challenge is to relentlessly pursue ever-higher levels of peak intelligence. But incremental intelligence improvements are costly and hard to come by, and thus the absence of practical HILL applications is not merely incidental—it is structural. Solving HILL use cases effectively becomes the costliest form of AI progress, and thus the rarest.
Why does this matter?
Understanding these notions of peak and density of intelligence, and where we stand with respect to each of them on the timeline, directly impacts the products we build. A few months ago, we came to deeply understand how to leverage the agency and reliability trade-off in building agentic AI solutions to produce superior instruction-following agents. That realisation was how we built the future then.
Now, recognising the primacy of intelligence density over frontier intelligence in the Customer Support space provides us with a new, clear direction. This shift in focus will shape the next wave of AI-powered Customer Support beyond information gathering use cases. It will allow us to deploy agentic systems that follow strict and demanding instructions to execute complex and personalised tasks, interfacing with and modifying customers’ internal and external systems and databases, rapidly and reliably enough to meaningfully enhance real human interactions across the various dimensions our customers care about. And this realisation is how we build the future now.