Use case

Customer Support Agents

AI customer support agents handle thousands of refund decisions a day. When one misfires, it doesn’t fail loudly — it costs money quietly, and you find out from a support ticket, not a trace.

Where things go wrong

Over-refund / under-refund

Your refund agent interprets an ambiguous policy and issues $1,000 instead of $500, or $100 instead of $500. Either the customer is angry or your finance team is calling. Neither outcome surfaces in a trace unless you’re scoring the output.

Double the intended refund, or a support escalation that costs more to resolve than the refund itself.

Invented departments or procedures

The agent tells the customer to contact "our coupon department" — a department that does not exist. The customer calls support again. The support rep has no context for why the AI said what it said. The second interaction costs more than the first.

Compounded support load and customer trust erosion with no trace of what the agent said.

Policy drift across prompt versions

A prompt update ships that changes how the refund policy is interpreted. Outputs that passed eval against the old policy now fail silently against the new one. No regression test was run because eval was manual and infrequent.

A class of failures goes undetected across hundreds of interactions before someone notices the pattern.

Confident wrong answer with no uncertainty signal

The agent states a wrong policy interpretation with the same confident tone it uses for correct answers. Nothing in the output signals that it’s wrong. The customer acts on it. The downstream cost hits finance, not the agent.

Financial exposure and customer harm with no signal in the output that would trigger a human review.

Eval + control loop

What happens when a rule fires

Customer Support Agents control loop: the original span scores faithfulness 0.42, below threshold, which triggers the fallback model; the result resolves to 0.89.

  • Step 1: Original span arrived
  • Step 2: Eval fires (faithfulness 0.42, below threshold)
  • Step 3: Fallback model (next call on the same failure path)
  • Step 4: Resolved (faithfulness 0.89)

The response

How TruLayer closes the loop

  • Faithfulness
  • Hallucination
  • Function Call

TruLayer’s 25 built-in evaluators score every agent output inline as each span arrives — not in a nightly batch, not after a support ticket surfaces the pattern. For customer support agents, the evaluators that matter most are faithfulness (is the response grounded in the actual policy document?), tool-call correctness (did the agent invoke the right action with the right parameters?), and hallucination (did the agent assert a fact — a department name, a policy term, a refund amount — that isn’t in its grounding context?). Every decision, scored inline, every call.
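To make the hallucination question concrete, here is a deliberately naive sketch of a grounding check for refund amounts. It is not how TruLayer's evaluators work; the function names `extractAmounts` and `ungroundedAmounts`, the regular expression, and the sample strings are all assumptions made up for illustration. It only shows the shape of the question: does every dollar amount in the answer also appear in the grounding context?

```typescript
// Illustrative only: a toy grounding check, not a TruLayer evaluator.
// Matches dollar amounts such as $500, $1,000 or $49.99.
const AMOUNT = /\$\d[\d,]*(?:\.\d+)?/g

// Collect every dollar amount mentioned in a piece of text.
function extractAmounts(text: string): string[] {
  return text.match(AMOUNT) ?? []
}

// Return the amounts the answer asserts that the grounding context never mentions.
function ungroundedAmounts(answer: string, context: string): string[] {
  const grounded = new Set(extractAmounts(context))
  return extractAmounts(answer).filter((amount) => !grounded.has(amount))
}

// Hypothetical policy text and agent answer.
const policy = 'Refunds are capped at $500 per order.'
const answer = 'We have issued a refund of $1,000 to your card.'

const flagged = ungroundedAmounts(answer, policy)
```

An exact string comparison like this misses paraphrases ("five hundred dollars") and says nothing about department names or policy terms, which is why a scored evaluator is needed for real traffic.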

When an eval rule fires — for example, a faithfulness score below threshold on a refund decision — the control loop acts on the next call on the same failure path. The three action types are: retry with a corrected or more specific prompt; fall back to a more conservative model; or route the case to a human review queue (HITL escalation). The queue holds the case for a human agent to review before the same failure class can repeat. The choice of action is determined by the rule you define — no code change required to update it.
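The mapping from a fired rule to an action can be pictured with a small sketch. The types and the `pickAction` function below are hypothetical and do not reflect TruLayer's actual rule schema, which is configured in the dashboard. The thresholds (0.7 and 0.8) are assumed values, and the sketch assumes every score lies in [0, 1] with higher meaning better.

```typescript
// Hypothetical types for illustration; not TruLayer's rule schema.
type Action = 'retry' | 'fallback_model' | 'human_review'

interface EvalRule {
  evaluator: string // e.g. 'faithfulness'
  threshold: number // scores below this value fire the rule (assumed semantics)
  action: Action
}

type SpanScores = Partial<Record<string, number>>

// Return the action of the first rule whose score falls below its threshold,
// or null when no rule fires.
function pickAction(rules: EvalRule[], scores: SpanScores): Action | null {
  for (const rule of rules) {
    const score = scores[rule.evaluator]
    if (score !== undefined && score < rule.threshold) {
      return rule.action
    }
  }
  return null
}

// Assumed example rules; the thresholds are placeholders.
const rules: EvalRule[] = [
  { evaluator: 'faithfulness', threshold: 0.7, action: 'fallback_model' },
  { evaluator: 'hallucination', threshold: 0.8, action: 'human_review' },
]

const fired = pickAction(rules, { faithfulness: 0.42, hallucination: 0.95 })
const passed = pickAction(rules, { faithfulness: 0.89, hallucination: 0.95 })
```

Keeping the action as data on the rule, rather than as branching logic in the agent, is what lets the response change without a code change.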

After a remediation action fires, TruLayer runs a regression check: if the corrected output also fails eval on the same rule, it surfaces an alert. You see a per-trace before/after delta — the original eval score alongside the post-remediation score — so you know whether the fix actually worked or introduced a new class of failure. The system doesn’t just close the loop; it tells you whether the loop closed correctly.
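The before/after comparison can be sketched as follows. The `RemediationDelta` shape and the `checkRemediation` function are hypothetical names, not TruLayer's trace schema, and the 0.7 threshold is an assumed value; the 0.42 and 0.89 scores come from the control-loop example earlier on this page.

```typescript
// Hypothetical shape for illustration; not TruLayer's trace schema.
interface RemediationDelta {
  before: number // eval score of the original span
  after: number // eval score after the remediation action
  delta: number // after minus before
  resolved: boolean // true when the post-remediation score meets the threshold
  regressionAlert: boolean // true when the corrected output still fails the rule
}

// Compare the original score with the post-remediation score on the same rule.
function checkRemediation(before: number, after: number, threshold: number): RemediationDelta {
  const resolved = after >= threshold
  return { before, after, delta: after - before, resolved, regressionAlert: !resolved }
}

// Scores from the control-loop example; the threshold is an assumption.
const result = checkRemediation(0.42, 0.89, 0.7)

// A remediation that improves the score but still fails the rule raises an alert.
const stillFailing = checkRemediation(0.42, 0.5, 0.7)
```

The second case is the one worth watching: a positive delta alone does not mean the fix worked, only that the score moved.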

See it in practice

Instrument your customer support agent in two lines.

Wrap your LLM client. Every span from the wrapped client is captured and scored by every built-in evaluator. Eval rules and control-loop actions are configured in the dashboard.

agent.ts
import { TruLayer } from '@trulayer/sdk'
import OpenAI from 'openai'

const tl = new TruLayer({ apiKey: process.env.TRULAYER_API_KEY })
const openai = tl.instrument(new OpenAI())

// Every span from this client is captured, scored by all 25
// built-in evaluators, and surfaced in the support project.
// Eval rules + control-loop actions are configured in the dashboard,
// not in your application code.

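// Placeholder input; in a real agent this is the customer's message.
const task = 'I was charged twice for my order. Can I get a refund?'

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: task }],
})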

Ship reliable customer support agents.

Free tier includes 1M spans / month · No credit card