
The AI Agent Blueprint is a strategic map for launching and scaling AI in customer service.

It helps customer service, CX, and AI transformation leaders deploy fast, scale with confidence, and achieve meaningful business transformation with AI.

2.5 Evaluate

This four-step process helps you apply both evaluation lenses – business performance and conversation quality – in a structured, outcome-driven way.

Note: You don't need to run a multi-vendor evaluation to make a confident decision. In many cases, a single-threaded proof of concept (POC) with the strongest-fit solution is the fastest and clearest path forward. It gives you more control, lets you go deeper, and builds a stronger signal around how the AI Agent performs in your real-world environment.

Step 1: Define what success looks like

Before testing an AI Agent, align your success criteria and metrics to what matters most across the two evaluation lenses – business performance and conversation quality.

Use a mix of quantitative and qualitative metrics to get a complete view of value and make a compelling case for adoption.

For example:

Business performance

Use quantitative metrics like:

  • Resolution rate
  • Deflection/containment rate
  • Time saved
  • CSAT (if testing in a live environment)
  • Customer Experience Score (if testing in a live environment)

These will provide measurable proof of the AI Agent's impact, making it easier to justify investment and compare against benchmarks.

Conversation quality

Use qualitative signals like:

  • It understood what the customer was asking.
  • It answered questions accurately and on-brand.
  • It knew when to hand over to a human agent and did it smoothly.

These capture subjective but crucial elements like trust, usability, and perceived value – all of which are key drivers in decision-making.

Note: We do not recommend tracking metrics in isolation. By combining both quantitative and qualitative metrics, you’ll get a more complete view of the AI Agent’s impact. This approach will also help ensure the evaluation isn’t just about hitting numbers, but also about demonstrating real-world fit and usability, addressing both executive and customer concerns.

“Keep what’s important to your business up front and center. For us, that was transparency and control over the customer experience. Focusing on the end goal helped us come to the right decision because we knew what was important to us.”
George Dilthey, Head of Support, Clay

Step 2: Build a realistic test environment and train the AI Agent

Once you’ve defined what success looks like, you can begin testing.

1. Set up your test environment using real customer questions and your current knowledge base or help content

You can choose to test in a sandbox environment or with live conversations. The important thing is to use real customer questions to test the AI Agent from a business performance and conversation quality perspective.

Once you are confident with the baseline performance, we recommend testing the AI Agent with real users to validate the quantitative and qualitative metrics in the real world.

Source a range of customer conversations to test against:

  • Complex queries that typically require multiple touchpoints from different team members.

  • Vague queries that don’t contain any “real” information and require further clarification to resolve.

  • Edge cases that have been difficult for your human team to resolve.

  • A few sensitive scenarios, such as billing disputes and cases where customers have become frustrated.

  • Examples of queries in different languages, if you provide multilingual support.

Take this a step further and prepare variations of the same questions to test how the AI Agent handles different types of communication:

  • Difficult questions that require information from multiple sources to answer.

  • Different phrasings of the same question.

  • Incomplete or fragmented queries.

  • Questions with typos or grammatical errors.

  • Conversations with various levels of formality.
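Some of these variations can be produced mechanically from a question you already have. The sketch below is one possible way to do that, not a feature of any particular AI Agent: the function name `make_variants` and the specific transformations are assumptions made for illustration. It only covers typos, fragments, and informal phrasing; genuinely different phrasings and multi-source questions still need to be written by someone who knows your customers.

```python
import random


def make_variants(question: str, seed: int = 0) -> dict[str, str]:
    """Return a few rough variants of one test question (illustrative only)."""
    words = question.split()
    if not words:
        raise ValueError("question must contain at least one word")

    rng = random.Random(seed)  # fixed seed keeps the test set reproducible

    # Typo: swap two adjacent characters inside the longest word.
    longest = max(words, key=len)
    if len(longest) >= 2:
        i = rng.randrange(len(longest) - 1)
        typo_word = longest[:i] + longest[i + 1] + longest[i] + longest[i + 2:]
    else:
        typo_word = longest
    typo = question.replace(longest, typo_word, 1)

    # Fragment: keep only the first half of the words.
    fragment = " ".join(words[: max(1, len(words) // 2)]).rstrip("?.!")

    return {
        "original": question,
        "typo": typo,
        "fragment": fragment,
        # Informal: lower-case, no closing punctuation.
        "informal": question.lower().rstrip("?.!"),
    }
```

Running each real customer question through a helper like this gives you several test inputs per question, so you can check whether the AI Agent's answer stays consistent across them.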

The goal is to simulate what happens in reality. Any AI Agent can look impressive in a controlled setting, but performing well when faced with the real challenges customers bring is what separates “good enough” from great.

If you’re evaluating more than one solution, make sure you set up your AI Agents in the same way for a fair comparison. Split out the conversation volume equally so you can accurately test each of the solutions.
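If you are splitting live conversations between solutions, one simple way to keep the groups equal is to shuffle the conversations and deal them out in turn. This is a minimal sketch under that assumption; `split_equally` and the solution labels are hypothetical names, and your helpdesk may offer its own routing rules that do the same job.

```python
import random


def split_equally(
    conversation_ids: list[str], solutions: list[str], seed: int = 0
) -> dict[str, list[str]]:
    """Assign conversations to solutions so group sizes differ by at most one."""
    if not solutions:
        raise ValueError("at least one solution is required")

    # Shuffle a copy so the assignment does not follow arrival order.
    shuffled = conversation_ids[:]
    random.Random(seed).shuffle(shuffled)

    # Deal the shuffled conversations out round-robin.
    groups: dict[str, list[str]] = {name: [] for name in solutions}
    for index, conversation_id in enumerate(shuffled):
        groups[solutions[index % len(solutions)]].append(conversation_id)
    return groups
```

Shuffling first avoids giving one solution all the morning traffic or all the conversations from a single queue, which would skew the comparison.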

2. Train the AI Agent

Prepare your knowledge base or help content

You need quality content for an AI Agent to deliver good results. Assess your knowledge base content for:

  • Coverage: Make sure you have adequate coverage for the testing cohort to give the AI Agent all the information it needs to address key questions and topics. For example, if you want to test whether the AI Agent can fully resolve queries on a specific topic, like your accounting product, or for a specific audience, like your Freemium users, it must have access to relevant content for both.

  • Accuracy: To prevent the AI Agent from learning outdated information, make sure what you’re exposing it to is accurate and up to date. For example, if your return policy has changed from 60 days to 30 days, update this.

  • Structure: The more straightforward and comprehensive your articles are, the easier it will be for the AI Agent to consume them. Focus on simple language and an easy-to-scan structure.

You don’t have to reformat or rewrite all your help content before running tests. Keep these criteria in mind and return to them if content gaps emerge during testing or you spot any glaring issues.

Configure the AI Agent’s rules, tone, and behavior

Modern AI Agents let you control how they communicate and act – for example, you can instruct them to provide concise or comprehensive responses, use specific terminology for your industry, or follow protocols that match your support policies.

For Fin, we call this Guidance. It enables you to define Fin’s communication style, coach it to gather context and clarify issues, and set rules for routing and handovers.


Step 3: Score performance and analyze results

Run your test conversations through the AI Agent and evaluate results through your two performance lenses:

Business performance

How well does it deliver results?

  • Resolution rate: How many queries did the AI Agent resolve end-to-end?

  • Deflection/containment rate: How many queries did the AI Agent manage without needing to hand them over to a human?

  • Time saved: How quickly did the AI Agent resolve queries, and how much time would it have taken your team to handle them?

  • CSAT (if in a live environment): How are customers rating the AI Agent vs human agents?

  • Customer Experience (CX) Score (if you’re testing Fin): What’s the CX Score?
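The first three business performance metrics are simple ratios and sums over your test conversations. The sketch below shows one way to compute them. The record fields, the `summarize` function, and the choice to count time saved only for conversations the AI Agent resolved are all assumptions for illustration; vendors may define these metrics differently, so check their definitions before comparing numbers.

```python
from dataclasses import dataclass


@dataclass
class ConversationResult:
    resolved_by_ai: bool           # AI Agent resolved the query end-to-end
    handed_to_human: bool          # conversation was passed to a human agent
    ai_minutes: float              # time the AI Agent took
    human_baseline_minutes: float  # your estimate of the time your team would take


def summarize(results: list[ConversationResult]) -> dict[str, float]:
    """Compute resolution rate, containment rate, and minutes saved."""
    total = len(results)
    if total == 0:
        raise ValueError("no test conversations to summarize")

    resolved = sum(r.resolved_by_ai for r in results)
    contained = sum(not r.handed_to_human for r in results)

    # Assumption: time is only "saved" when the AI Agent actually resolved the query.
    minutes_saved = sum(
        r.human_baseline_minutes - r.ai_minutes for r in results if r.resolved_by_ai
    )

    return {
        "resolution_rate": resolved / total,
        "containment_rate": contained / total,
        "minutes_saved": minutes_saved,
    }
```

Note that containment can be higher than resolution: a conversation the customer abandons is never handed over, but it was not resolved either. Tracking both makes that gap visible.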

Conversation quality

How well does it communicate?

Accuracy

  • Did the AI Agent understand the customer’s intent? Did it ask for clarification when needed?

  • Did it pull from the right knowledge sources?

  • Did it personalize the response appropriately for the customer’s context?

Behavior

  • Did it maintain the right tone for your brand?

  • Did it escalate appropriately when issues were beyond its scope?

  • Did it route to the correct team when handoffs were needed?

Experience

  • Was the overall interaction smooth and satisfying?

  • How would a typical customer rate this response?

  • Would your support team feel confident standing behind this answer?
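To make these qualitative questions comparable across conversations, you could ask reviewers to give each conversation a score per dimension and then average the scores. The sketch below assumes a 1 to 5 scale and the three dimension names used above; the scale and the `average_scores` function are illustrative choices, not part of any product.

```python
from statistics import mean

# The three conversation quality dimensions described above.
DIMENSIONS = ("accuracy", "behavior", "experience")


def average_scores(reviews: list[dict[str, int]]) -> dict[str, float]:
    """Average reviewer scores (assumed 1-5) for each quality dimension."""
    if not reviews:
        raise ValueError("at least one review is required")

    # Reject scores outside the assumed 1-5 scale before averaging.
    for review in reviews:
        for dimension in DIMENSIONS:
            if not 1 <= review[dimension] <= 5:
                raise ValueError(f"{dimension} score must be between 1 and 5")

    return {d: mean(review[d] for review in reviews) for d in DIMENSIONS}
```

A low average on one dimension tells you where to look first, for example revisiting routing rules if behavior scores lag behind accuracy.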

Important note: you should also evaluate the vendor, not just the AI Agent

The vendor behind the AI Agent matters just as much as the solution itself. You’re choosing a partner for transformation, one that will help you evolve how your business delivers customer experience.

This isn’t traditional vendor management. You’re betting on a vision of the future. Ask:

Are they pushing boundaries?

  • Are they shaping the future of AI-powered customer experience, or reacting to it?

  • Do they have a clear point of view on where AI is headed?

  • Are they building capabilities that stretch beyond today’s benchmarks, and not just keeping pace?

Are they a true partner, not just a provider?

  • What does their product roadmap look like? How does customer feedback shape it?

  • What kind of support will you get post-launch? Is it ongoing, or does the relationship shift to basic technical support?

  • Are they transparent about current limitations? Vendors who acknowledge gaps and commit to fixing them are more likely to be honest partners than those who oversell capabilities.

Are they built for long-term success?

  • How long do companies like yours typically stay with this vendor? Look at their existing customer base, retention, and growth rates.

  • How do they respond to hard questions during evaluation? Vendors who get defensive about limitations or rush you toward a decision may not be committed to long-term success. The best vendors welcome hard questions and help you spot risks before they become problems.

Ask yourself: does this vendor feel like someone who will help us reinvent customer experience, or just someone selling software? Great AI Agents are backed by great partners. Look for vendors that are obsessed with support, transparent about how the technology works, and committed to co-building the future with you.

“We’re in a world where lots is changing and we’re still learning about AI. The most important thing I would express to other CX leaders is that it’s crucial to have the right partners alongside you to teach you what you need to know to be successful in this world.”
Jess Bergson, Head of CX, Clay

Step 4: Decide whether the AI Agent is the right fit

Weight your findings based on your own priorities. If accuracy is non-negotiable, don’t compromise, even if other qualities like tone or personality feel strong. If you need immediate deployment, factor in integration complexity and vendor support quality.

Consider both immediate performance needs and long-term operational success.
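One way to apply this weighting is to treat non-negotiable criteria as hard minimums and combine everything else into a weighted average. The sketch below assumes all scores are normalized to a 0 to 1 range; the `overall_fit` function, the criterion names, and the example weights are hypothetical and should be replaced with your own.

```python
from typing import Optional


def overall_fit(
    scores: dict[str, float],
    weights: dict[str, float],
    minimums: dict[str, float],
) -> Optional[float]:
    """Return a weighted score, or None if a non-negotiable minimum is missed."""
    # Hard gate: a strong score elsewhere cannot make up for a missed minimum.
    for criterion, floor in minimums.items():
        if scores[criterion] < floor:
            return None

    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Weighted average of the scores (assumed normalized to 0-1).
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight
```

For example, with accuracy weighted twice as heavily as tone and resolution rate, an AI Agent scoring 0.9, 0.8, and 0.6 respectively comes out at roughly 0.8 overall, but it is rejected outright if you set the accuracy minimum at 0.95.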

The trade-offs you have to consider here are:

  • If business performance is low, you’ll struggle to show ROI.

  • If conversation quality is bad, customer trust may suffer.

Ultimately, the AI Agent you choose should be the one that fits your goals, supports your team, and will help you scale sustainably.
