The Real Cost of AI in CX

1 July

Written By Geoff Willshire, Chief Product Officer

AI in CX is sold as a cost-saving story.

Fewer agents. Lower cost per contact. More issues resolved without a human.

That promise is real. But it is only half the equation.

The better question isn't just what AI saves you. It is what it costs to run AI well enough to trust it in front of customers. Because once AI is doing real work in CX, the bill isn't just compute and tokens. It is also the cost of assurance, and the cost of organizing the people and process around it.

AI in CX has four costs:

Compute.
Tokens.
Assurance.
Organizational cost.

Most leaders see the first two. Some see the third. Very few price in the fourth.

Cost one: Compute

AI-led CX runs on cloud infrastructure, and that infrastructure scales with ambition.

More channels, more concurrency, more real-time voice and language processing all draw on compute you pay for whether the interaction resolves cleanly or not. This is the layer most teams budget for, because it looks like traditional cloud spend.

It is also the layer that quietly grows every time you extend an agent into a new journey.

Cost two: Tokens

This is the layer the recent Salesforce numbers make impossible to ignore.

Last year Salesforce made headlines when it burned through more than 12 trillion AI tokens running its agentic use cases. By its most recent quarter, it had processed 28.6 trillion tokens to date, up 152% on the prior quarter.

Every prompt, every retrieval, every step an agent takes through a multi-turn conversation consumes tokens. And token usage rises with autonomy.

A scripted IVR has a fixed cost per call. An agentic system that reasons, clarifies, and calls backend systems does not. The same interaction can cost very different amounts depending on how the agent handles it, and that cost is often invisible until you aggregate it across millions of conversations.

Anthropic has put numbers on the same pattern. In its 2025 engineering research on multi-agent systems, a single agent used roughly four times the tokens of a chat interaction, and a multi-agent system about fifteen times. The reason is straightforward: each turn adds context, and context compounds. Anthropic was candid that the higher multiplier only pays off when the task is valuable enough to justify it, and that routine, consumer-grade question answering can't absorb it.
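To see why context compounds, here is a minimal sketch. It assumes the full conversation history is resent as input on every turn, with no caching, truncation, or summarization. The function name and the figures are illustrative assumptions, not numbers from Anthropic or Salesforce.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int, system_prompt: int = 0) -> int:
    """Total input tokens billed across a conversation when every turn
    resends the whole history (assumption: no caching or summarization)."""
    total = 0
    context = system_prompt
    for _ in range(turns):
        context += tokens_per_turn  # new content joins the running history
        total += context            # the full history is sent as input this turn
    return total


if __name__ == "__main__":
    one_turn = cumulative_input_tokens(turns=1, tokens_per_turn=200)
    ten_turns = cumulative_input_tokens(turns=10, tokens_per_turn=200)
    # Ten turns cost far more than ten times one turn, because each
    # turn pays again for everything that came before it.
    print(one_turn, ten_turns, ten_turns / one_turn)
```

Under these assumptions input tokens grow roughly with the square of the turn count, which is one way an agent that clarifies and retries can cost several times what a single exchange does.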

That is the uncomfortable part. Token spend and quality aren't the same thing.

An agent can burn compute and tokens and still give a customer the wrong answer. Spending more doesn't buy you trust. It only buys you scale.

Cost three: Assurance

The third cost is the one that should protect the first two.

You can't trust an AI system you aren't continuously validating. But validation has its own economics, and this is where many CX teams get caught out.

Most testing and assurance platforms still price the way the old IVR world priced: per port, per session, or per API call. That model was tolerable when test volumes were predictable. It becomes a problem when AI volume starts to compound.

Think about what that pricing does to behaviour. If every test run carries a per-session charge, thorough coverage gets expensive fast. So teams ration. They test a subset of paths and sample instead of validating continuously. They lean on production incidents and agent feedback to tell them what broke.

Coverage stops being a risk decision and becomes a budget decision.
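A rough sketch of that rationing effect, assuming a flat per-session price and a fixed monthly test budget. Every name and number here is hypothetical and chosen only to show the shape of the problem.

```python
def affordable_coverage(monthly_budget: float, price_per_session: float,
                        paths: int, runs_per_path: int) -> float:
    """Fraction of customer-journey paths that fit inside the budget
    when each test run is billed as one session (hypothetical pricing)."""
    if price_per_session <= 0 or paths <= 0 or runs_per_path <= 0:
        raise ValueError("price, paths and runs_per_path must be positive")
    sessions_needed = paths * runs_per_path
    sessions_affordable = int(monthly_budget // price_per_session)
    return min(1.0, sessions_affordable / sessions_needed)


if __name__ == "__main__":
    # Same budget, same price; only the number of paths to validate grows.
    for paths in (100, 400, 1200):
        share = affordable_coverage(5000, 0.50, paths, runs_per_path=30)
        print(f"{paths} paths -> {share:.0%} of paths covered")
```

With the budget held constant, coverage falls as the agent is extended into more journeys, which is the point at which sampling starts to replace continuous validation.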

That is a big gamble when the system you're testing makes its own decisions in front of real customers.

Forrester named a related structural trap: the platform that designs the runtime and assembles the hidden context also bills for the tokens it creates, so weak design can quietly become a recurring platform tax you can't easily see, tune, or challenge. Spending less doesn't solve it. Proving independently whether what you pay for actually works does.

Testing also burns tokens and compute. That is the honest reality. Synthetic conversations against an agentic system consume resources too.

But the cost of not testing is usually far higher. An agent that loops, misroutes, or hallucinates burns tokens producing interactions that were never going to resolve. If that failure goes unwatched, you pay for it at scale, for weeks, before it shows up in the averages.

And in a multi-agent system the failure rarely stays contained. HFS Research, in a 2026 survey of 202 large enterprises, found that 21% had already hit cascading failures, where one broken agent pushed contradictory outputs and runaway loops across the rest of the system. One fault becomes many, and each one keeps spending tokens and compute on work that was never going to resolve. Catching the first break, independently and before it multiplies, is what continuous testing is for.

So testing has a cost. Run well, it is a small, predictable one that protects you from a much larger, unpredictable one.
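As a back-of-the-envelope illustration, the sketch below estimates the token spend wasted on failing conversations for as long as a fault goes undetected. All inputs are made-up assumptions, and the model counts only tokens, not compute, customer harm, or rework.

```python
def wasted_token_cost(daily_conversations: int, failure_rate: float,
                      tokens_per_failed_conversation: int,
                      price_per_million_tokens: float,
                      days_undetected: int) -> float:
    """Token spend on conversations that were never going to resolve,
    accumulated until the fault is caught (illustrative model only)."""
    failed_conversations = daily_conversations * failure_rate * days_undetected
    wasted_tokens = failed_conversations * tokens_per_failed_conversation
    return wasted_tokens * price_per_million_tokens / 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 50,000 conversations a day, 2% of them failing,
    # 20,000 tokens burned per failed conversation, 5.00 per million tokens.
    caught_next_day = wasted_token_cost(50_000, 0.02, 20_000, 5.0, days_undetected=1)
    caught_in_three_weeks = wasted_token_cost(50_000, 0.02, 20_000, 5.0, days_undetected=21)
    print(round(caught_next_day, 2), round(caught_in_three_weeks, 2))
```

In this simple model the waste scales linearly with the number of days the fault stays hidden, so detection time is the lever that continuous testing pulls.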

Cost four: Organizational cost

There is a fourth cost that usually gets missed: organizational cost.

This is the cost of the operating model around AI. It includes scarce specialists, manual test setup, slow handoffs, repeated context switching, and the coordination overhead required to keep AI safe and useful at scale.

It is easy to underestimate because it doesn't show up as a neat line item on the AI budget. But it is real.

If every model update, prompt change, or integration shift needs specialist scripting and specialist review, then the organization becomes the bottleneck. Product waits on QA. QA waits on engineering. Engineering waits on availability. Availability turns into a queue.

That isn't just inefficient. It is expensive.

And it gets worse as AI scales, because AI changes more often than traditional CX systems. New intents, new prompts, new policies, new integrations, new model versions. Every change creates a fresh demand for validation and governance.

If the process depends on too many scarce people, the hidden tax keeps growing.

This is where QAI matters.

QAI lets anyone who can describe a customer journey test it in plain English. That means the people closest to the customer experience can create tests, run them, and understand the result without waiting for a specialist bottleneck.

So the benefit isn't just faster testing. It is lower organizational drag.

What the four costs add up to

The four costs interact.

Compute rises with ambition.

Tokens rise with autonomy.

Assurance rises with risk.

Organizational cost rises when the system depends on scarce people and manual process.

That is why the cheapest-looking AI stack can be the most expensive to operate. It can hide its true cost in a mix of infrastructure, token spend, test friction, and human bottlenecks.

The real question for a CX leader isn't only, “What does our AI cost to run?”

It is also:

What does it cost to test properly?
What does it cost to govern continuously?
What does it cost the organization every time the system changes?

Those are the questions that determine whether AI in CX creates value or just creates expense.

Spend on the right cost

None of this is a reason to slow down AI in CX.

The point is to spend on the costs that matter, and not on the costs that quietly add overhead without improving outcomes.

The assurance layer should build trust, catch failures that hide inside healthy-looking averages, and give you an evidence trail when a board or regulator asks for it. That is worth paying for.

What isn't worth paying for is a pricing model that penalizes you for testing thoroughly, or a feature estate built for someone else’s problem.

If you want to trust the AI you deploy, the spend worth protecting is the one that proves it works.

The token and compute bill will keep climbing as agents take on more work.

The assurance bill should be the one cost in your CX stack that scales with your risk, not with your vendor’s price list.

And the organizational cost should be the one you actively work to reduce.

It is worth remembering what good CX is actually worth. Watermark Consulting’s Customer Experience ROI study found that CX leaders delivered 7.8 times the total shareholder return of laggards over 18 years. More and more of those experiences are now automated, so whether that advantage holds increasingly depends on whether the AI behind them is continuously tested, not merely assumed to work.

Because in AI CX, the real savings don't come from running less.

They come from running better.
