Use case
Customer Support Agents
AI customer support agents handle thousands of refund decisions a day. When one misfires, it doesn’t fail loudly — it costs money quietly, and you find out from a support ticket, not a trace.
Where things go wrong
Over-refund / under-refund
Your refund agent misreads an ambiguous refund policy and issues $1,000 instead of $500, or $100 instead of $500. Either the customer is angry or your finance team is calling. Neither outcome surfaces in a trace unless you’re scoring the output.
Double the intended refund, or a support escalation that costs more to resolve than the refund itself.
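One way to score this failure mode is a deterministic check that compares the amount the agent issued with the amount the policy allows. The sketch below assumes the entitled amount can be computed independently of the agent; `checkRefund`, `RefundCheck`, and the tolerance parameter are illustrative names, not part of any SDK.

```typescript
// Hypothetical types and names for illustration only.
type RefundVerdict = 'ok' | 'over_refund' | 'under_refund'

interface RefundCheck {
  verdict: RefundVerdict
  deltaUsd: number // issued minus entitled; positive means overpaid
}

// Compare the amount the agent issued with the amount the policy allows.
// Amounts are integer cents so the comparison is exact.
function checkRefund(
  entitledCents: number,
  issuedCents: number,
  toleranceCents = 0
): RefundCheck {
  const delta = issuedCents - entitledCents
  let verdict: RefundVerdict = 'ok'
  if (delta > toleranceCents) verdict = 'over_refund'
  else if (delta < -toleranceCents) verdict = 'under_refund'
  return { verdict, deltaUsd: delta / 100 }
}

// The two cases described above: $1,000 or $100 issued against a $500 entitlement.
const over = checkRefund(50000, 100000)
const under = checkRefund(50000, 10000)
console.log(over, under)
```

Here the first call reports an over-refund of $500 and the second an under-refund of $400. A check like this only works when the correct amount is computable from structured data; ambiguous policy language still needs a model-based evaluator.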
Invented departments or procedures
The agent tells the customer to contact "our coupon department" — a department that does not exist. The customer calls support again. The support rep has no context for why the AI said what it said. The second interaction costs more than the first.
Compounded support load and customer trust erosion with no trace of what the agent said.
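As a rough illustration of how an invented department could be caught, the sketch below checks department names in a reply against an allowlist. The allowlist contents, the function name `findUnknownDepartments`, and the regex heuristic are all assumptions made for this example; a production evaluator would compare the reply with the agent's actual grounding context.

```typescript
// Hypothetical allowlist. In practice it would be derived from the agent's
// grounding context (org chart, help-center articles, policy documents).
const KNOWN_DEPARTMENTS: ReadonlySet<string> = new Set([
  'billing',
  'returns',
  'technical support',
])

// Return department or team names mentioned in a reply that are not in the
// allowlist. The regex is a heuristic: it matches "our/the <one or two words>
// department/team" and will miss other phrasings.
function findUnknownDepartments(
  reply: string,
  known: ReadonlySet<string> = KNOWN_DEPARTMENTS
): string[] {
  const pattern = /\b(?:our|the)\s+([a-z]+(?:\s[a-z]+)?)\s+(?:department|team)\b/gi
  const unknown: string[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(reply)) !== null) {
    const name = (match[1] ?? '').toLowerCase()
    if (!known.has(name)) unknown.push(name)
  }
  return unknown
}

const flagged = findUnknownDepartments(
  'Please contact our coupon department, or reach the billing team by chat.'
)
console.log(flagged)
```

For this reply the function flags `coupon` and accepts `billing`.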
Policy drift across prompt versions
A prompt update ships that changes how the refund policy is interpreted. Outputs that passed eval against the old policy now fail silently against the new one. No regression test was run because eval was manual and infrequent.
A class of failures goes undetected across hundreds of interactions before someone notices the pattern.
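The regression test that was never run can be sketched as a golden-set comparison between two prompt versions. Everything here is hypothetical: the `GoldenCase` shape, the stub deciders that stand in for real agent calls, and the toy 30-day and 14-day refund windows.

```typescript
// All names and the toy refund policy here are hypothetical.
interface GoldenCase {
  id: string
  orderTotalCents: number
  daysSincePurchase: number
  expectedRefundCents: number
}

// Stands in for "run the agent with a given prompt version and read the
// refund amount it decided on".
type Decide = (c: GoldenCase) => number

// IDs of cases the baseline got right and the candidate gets wrong.
function findRegressions(
  cases: GoldenCase[],
  baseline: Decide,
  candidate: Decide
): string[] {
  return cases
    .filter(
      c =>
        baseline(c) === c.expectedRefundCents &&
        candidate(c) !== c.expectedRefundCents
    )
    .map(c => c.id)
}

const goldenSet: GoldenCase[] = [
  { id: 'day-10', orderTotalCents: 10000, daysSincePurchase: 10, expectedRefundCents: 10000 },
  { id: 'day-20', orderTotalCents: 10000, daysSincePurchase: 20, expectedRefundCents: 10000 },
  { id: 'day-45', orderTotalCents: 10000, daysSincePurchase: 45, expectedRefundCents: 0 },
]

// Stub deciders: v1 applies a 30-day window; v2 has drifted to a 14-day window.
const promptV1: Decide = c => (c.daysSincePurchase <= 30 ? c.orderTotalCents : 0)
const promptV2: Decide = c => (c.daysSincePurchase <= 14 ? c.orderTotalCents : 0)

const regressed = findRegressions(goldenSet, promptV1, promptV2)
console.log(regressed)
```

Only the `day-20` case regresses: the old prompt refunds it in full and the new one refunds nothing. Running a comparison like this on every prompt change is what turns a silent drift into a failed check.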
Confident wrong answer with no uncertainty signal
The agent states a wrong policy interpretation with the same confident tone it uses for correct answers. Nothing in the output signals that it’s wrong. The customer acts on it. The downstream cost hits finance, not the agent.
Financial exposure and customer harm with no signal in the output that would trigger a human review.
Eval + control loop
What happens when a rule fires
The response
How TruLayer closes the loop
- Faithfulness
- Hallucination
- Function Call
TruLayer’s 25 built-in evaluators score every agent output inline as each span arrives — not in a nightly batch, not after a support ticket surfaces the pattern. For customer support agents, the evaluators that matter most are faithfulness (is the response grounded in the actual policy document?), tool-call correctness (did the agent invoke the right action with the right parameters?), and hallucination (did the agent assert a fact — a department name, a policy term, a refund amount — that isn’t in its grounding context?). Every decision, scored inline, every call.
When an eval rule fires (for example, a faithfulness score below threshold on a refund decision), the control loop applies a remediation action to the next call on the same failure path. There are three action types: retry with a corrected or more specific prompt; fall back to a more conservative model; or route the case to a human review queue (HITL escalation), which holds the case for a human agent to review before the same failure class can repeat. The rule you define determines which action runs, and updating it requires no code change.
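To make the rule-to-action mapping concrete, here is a minimal sketch of how a threshold rule could select one of the three action types. The `Rule` and `Action` shapes, the queue name, and the assumption that scores run from 0 to 1 with higher being better are all hypothetical; they are not TruLayer's configuration schema, which lives in the dashboard.

```typescript
// Hypothetical shapes for illustration; not the product's rule schema.
type Action =
  | { kind: 'retry'; promptHint: string }
  | { kind: 'fallback'; model: string }
  | { kind: 'escalate'; queue: string }

interface Rule {
  evaluator: string
  threshold: number // rule fires when the score is below this value
  action: Action
}

// Assumed convention: scores in [0, 1], higher is better.
type Scores = Record<string, number>

// Return the first rule whose evaluator score is below its threshold.
// Evaluators with no score are skipped rather than treated as failures.
function firstFiredRule(rules: Rule[], scores: Scores): Rule | undefined {
  return rules.find(r => {
    const score = scores[r.evaluator]
    return score !== undefined && score < r.threshold
  })
}

// Turn an action into a human-readable plan for the next call.
function describeAction(action: Action): string {
  switch (action.kind) {
    case 'retry':
      return `retry with hint: ${action.promptHint}`
    case 'fallback':
      return `fall back to model ${action.model}`
    case 'escalate':
      return `route to human review queue ${action.queue}`
  }
}

const rules: Rule[] = [
  { evaluator: 'faithfulness', threshold: 0.7, action: { kind: 'escalate', queue: 'refund-review' } },
  { evaluator: 'hallucination', threshold: 0.5, action: { kind: 'retry', promptHint: 'cite the policy section' } },
]

const fired = firstFiredRule(rules, { faithfulness: 0.42, hallucination: 0.9 })
console.log(fired ? describeAction(fired.action) : 'no rule fired')
```

Because the rules are plain data, changing a threshold or swapping an action means editing a rule, not the application code that calls the model.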
After a remediation action fires, TruLayer runs a regression check: if the corrected output also fails eval on the same rule, it surfaces an alert. You see a per-trace before/after delta — the original eval score alongside the post-remediation score — so you know whether the fix actually worked or introduced a new class of failure. The system doesn’t just close the loop; it tells you whether the loop closed correctly.
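The before/after delta can be illustrated with a small comparison of the two scores against the rule's threshold. The `RemediationResult` fields and status labels are assumptions for this sketch, and it covers only the same-rule check, not detection of a new failure class on other evaluators.

```typescript
// Hypothetical record of one remediation; field names are illustrative.
interface RemediationResult {
  traceId: string
  rule: string
  threshold: number // score must reach this value to pass
  scoreBefore: number // eval score of the original output
  scoreAfter: number // eval score of the remediated output
}

interface Assessment {
  delta: number // scoreAfter minus scoreBefore
  status: 'fixed' | 'still_failing'
}

// Compare the post-remediation score with the original on the same rule.
function assessRemediation(r: RemediationResult): Assessment {
  return {
    delta: r.scoreAfter - r.scoreBefore,
    status: r.scoreAfter >= r.threshold ? 'fixed' : 'still_failing',
  }
}

const fixedCase = assessRemediation({
  traceId: 't-1',
  rule: 'faithfulness',
  threshold: 0.7,
  scoreBefore: 0.25,
  scoreAfter: 0.75,
})

const failingCase = assessRemediation({
  traceId: 't-2',
  rule: 'faithfulness',
  threshold: 0.7,
  scoreBefore: 0.25,
  scoreAfter: 0.5,
})

console.log(fixedCase, failingCase)
```

The second case improves by 0.25 but still falls short of the threshold, which is the situation that should raise an alert rather than be counted as a fix.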
See it in practice
Instrument your customer support agent in two lines.
Wrap your LLM client. Every span from the wrapped client is captured and scored by every built-in evaluator. Eval rules and control-loop actions are configured in the dashboard.
import { TruLayer } from '@trulayer/sdk'
import OpenAI from 'openai'

const tl = new TruLayer({ apiKey: process.env.TRULAYER_API_KEY })
const openai = tl.instrument(new OpenAI())

// Every span from this client is captured, scored by all 25
// built-in evaluators, and surfaced in the support project.
// Eval rules + control-loop actions are configured in the dashboard,
// not in your application code.

// `task` holds the customer's message; this value is a placeholder.
const task = 'I was charged twice for my last order. Can I get a refund?'

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: task }],
})

Ship reliable customer support agents.
Free tier includes 1M spans / month · No credit card