Control Loop v0.1 — now live

Your AI nails the demo. TruLayer makes it nail production.

Evals score every output. A closed control loop retries with a fallback model, gates high-stakes actions on human approval, and rolls back automatically when a fix introduces a new regression — turning every production failure into a system fix before it hits the next user.

Start free · Read the docs

Free tier includes 1M spans / month · No credit card

< 50 ms ingestion latency

1M free spans / month

SOC 2 Type II in progress

99.9% uptime target

Works with

OpenAI, Anthropic, Claude (MCP), LangChain, LangGraph, AutoGen, CrewAI, LlamaIndex, PydanticAI, DSPy, Haystack, Vercel AI SDK, Mastra, LlamaIndex-TS, Custom LLMs

How teams use TruLayer

Observe. Evaluate. Improve.

Most tools stop at the trace. TruLayer takes you from “something’s wrong” to “here’s the fix, shipped” in one platform.

Observe

See what’s happening. Understand why.

Distributed traces, failure clustering, anomaly detection, and semantic search: everything you need to go from “something’s wrong” to “here’s exactly why.” A configurable retry depth prevents runaway cascades. When a trace has been retried N times without passing its eval, it escalates to the human-in-the-loop queue automatically.
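To make the retry-depth behavior concrete, here is a minimal sketch of the idea in plain TypeScript. It is not TruLayer SDK code: `runWithRetryDepth`, `Attempt`, `Outcome`, and the `escalate` callback are hypothetical names, and the pass threshold is an assumed numeric score.

```typescript
// Hypothetical types for illustration; none of these names come from the TruLayer SDK.
type Attempt = { output: string; score: number };
type Outcome =
  | { status: "passed"; output: string; attempts: number }
  | { status: "escalated"; attempts: number };

// Runs one initial attempt plus up to `maxRetries` retries.
// If no attempt reaches `passThreshold`, the full history is handed to `escalate`.
function runWithRetryDepth(
  generate: (attempt: number) => Attempt,
  passThreshold: number,
  maxRetries: number,
  escalate: (history: Attempt[]) => void,
): Outcome {
  const history: Attempt[] = [];
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = generate(attempt);
    history.push(result);
    if (result.score >= passThreshold) {
      return { status: "passed", output: result.output, attempts: history.length };
    }
  }
  // Retry budget exhausted: stop retrying and route to a human queue.
  escalate(history);
  return { status: "escalated", attempts: history.length };
}

// Usage: scores of roughly 0.4, 0.5, 0.6 never reach 0.9, so the trace escalates.
const humanQueue: Attempt[][] = [];
const outcome = runWithRetryDepth(
  (n) => ({ output: `draft ${n}`, score: 0.4 + 0.1 * n }),
  0.9,
  2,
  (history) => humanQueue.push(history),
);
```

The hard cap on attempts is what keeps a failing trace from retrying indefinitely; the escalation callback receives every attempt so a reviewer can see how the scores moved.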

Explore observe

Evaluate

Know whether it was correct, not just whether it ran.

25 pre-built evaluators, eval rules on any span, regression testing against golden datasets, and score trends over time.

Explore evaluate

Improve

Close the loop before the next user hits it.

AI-suggested prompt improvements, self-healing actions, human-in-the-loop approval, and remediation diffs — the full control loop.

Explore improve

Who uses TruLayer

AI agents handle millions of decisions. Here’s where they go wrong.

TruLayer keeps them on track — automatically, at the system level, before the same failure hits the next user.

Customer & Revenue

Customer Support Agents

Thousands of refund decisions a day. One bad policy interpretation costs you twice.

Your refund agent handles thousands of decisions a day. One bad policy interpretation issues $1,000 instead of $500, another invents a department that doesn’t exist — and you find out from a support ticket, not a trace. Eval rules score every refund decision inline. When a rule fires, the control loop retries with a corrected prompt or routes the case to a human queue — the next customer gets the right answer.
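As an illustration of what an inline rule on a refund decision might check, here is a minimal sketch. The `RefundDecision` shape, the `POLICY` values, and `evaluateRefund` are assumptions made for this example, not TruLayer's rule format; the $500 cap mirrors the scenario above and the department names are invented.

```typescript
// Hypothetical decision shape and policy; not part of any TruLayer API.
type RefundDecision = { orderId: string; amountUsd: number; department: string };
type RuleResult = { passed: boolean; reasons: string[] };

const POLICY = {
  maxRefundUsd: 500,
  departments: new Set(["billing", "returns"]),
};

// Checks one decision against the policy and collects every violation found.
function evaluateRefund(decision: RefundDecision): RuleResult {
  const reasons: string[] = [];
  if (decision.amountUsd > POLICY.maxRefundUsd) {
    reasons.push(`amount ${decision.amountUsd} exceeds cap ${POLICY.maxRefundUsd}`);
  }
  if (!POLICY.departments.has(decision.department)) {
    reasons.push(`unknown department "${decision.department}"`);
  }
  return { passed: reasons.length === 0, reasons };
}

// Usage: a $1,000 refund fails the cap; a $200 refund to a real department passes.
const overCap = evaluateRefund({ orderId: "A1", amountUsd: 1000, department: "billing" });
const valid = evaluateRefund({ orderId: "A2", amountUsd: 200, department: "returns" });
```

A failed result like `overCap` is the kind of signal that would trigger a retry with a corrected prompt or a hand-off to the human queue.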

Explore customer support agents

Outbound Sales Agents

Deprecated pricing, opted-out prospects, and a deal that collapsed.

Your SDR agent quoted a pricing tier you deprecated six months ago, then emailed a prospect who had opted out last quarter. The deal collapsed; legal is now involved. Faithfulness scoring flags outputs that drift from your pricing and compliance context. The same failure class doesn’t reach the send queue twice.

Explore outbound sales agents

Engineering & Operations

Agentic Coding Agents

Wrong-scope refactor. Deleted file. Test edited to pass. Found at CI, hours late.

Your coding agent refactored a module that a parallel branch already rewrote, deleted a file based on a truncated context window, and edited a test to make it pass. CI catches the deletion hours after the agent session closed; the rest only surfaces in staging. Function-call correctness, prompt injection, and faithfulness evaluators score every tool call inline. When a rule fires, the control loop retries with a corrected file scope or routes the next agent run on the same failure path to a human review queue.

Explore agentic coding agents

AI Ops Agents

Restarted the wrong service. Now you have two incidents.

Your incident response agent restarted a healthy service that shared a label with the degraded one. Now you have two incidents and it’s 2am. Tool-call correctness evaluators score every automated action inline. When a rule fires, the loop routes to a human before the next runbook step executes — not after the postmortem.

Explore AI Ops agents

Data & Documents

Document Extraction Agents

Wrong total line. Dropped tax ID. Silent type mismatch. Pipeline reported success.

Your invoice extraction agent read the subtotal line instead of total-due and wrote the wrong amount to your ERP. A second invoice dropped a vendor tax ID because the field label varied from the template — the PII landed in the CRM anyway. A third had its amount field silently coerced to string; the type validation failed, the pipeline reported success, and nobody found out until reconciliation. PII leakage and tool-call correctness evaluators run inline on every span. When an extracted field is missing, malformed, or the wrong type, the control loop retries with a targeted prompt, falls back to a stricter extraction template, or routes to a human review queue — before the record propagates downstream.
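The three failure types named above (missing, malformed, wrong type) can be checked with a small validator before a record moves downstream. This is a sketch under assumptions: the field names `totalDue` and `vendorTaxId`, the tax ID pattern, and `validateInvoice` are hypothetical and not taken from TruLayer.

```typescript
// Hypothetical field names and format rule, for illustration only.
type FieldIssue = {
  field: string;
  problem: "missing" | "wrong_type" | "malformed";
};

function validateInvoice(record: Record<string, unknown>): FieldIssue[] {
  const issues: FieldIssue[] = [];

  // totalDue must be present, a number (not a numeric string), finite, and non-negative.
  const amount = record["totalDue"];
  if (amount === undefined || amount === null) {
    issues.push({ field: "totalDue", problem: "missing" });
  } else if (typeof amount !== "number") {
    issues.push({ field: "totalDue", problem: "wrong_type" });
  } else if (!Number.isFinite(amount) || amount < 0) {
    issues.push({ field: "totalDue", problem: "malformed" });
  }

  // vendorTaxId must be a non-empty string matching an assumed pattern.
  const taxId = record["vendorTaxId"];
  if (taxId === undefined || taxId === null || taxId === "") {
    issues.push({ field: "vendorTaxId", problem: "missing" });
  } else if (typeof taxId !== "string") {
    issues.push({ field: "vendorTaxId", problem: "wrong_type" });
  } else if (!/^[A-Z0-9-]{5,20}$/i.test(taxId)) {
    issues.push({ field: "vendorTaxId", problem: "malformed" });
  }

  return issues;
}

// Usage: an amount coerced to a string and a dropped tax ID both surface as issues.
const bad = validateInvoice({ totalDue: "1200.00" });
const good = validateInvoice({ totalDue: 1200, vendorTaxId: "12-3456789" });
```

An empty list means the record can proceed; any entry is a reason to retry, fall back to a stricter template, or route to review.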

Explore document extraction agents

Finance & Reporting Agents

Last quarter’s forecast. This quarter’s actuals. One board deck.

Your analyst agent applied last quarter’s forecast model to this quarter’s actuals. The board deck had the wrong numbers. The model produced a clean, confident summary with no error indicators. Faithfulness scoring catches context-window drift before it propagates. The control loop retries with corrected grounding context; the trace shows exactly where the eval score dropped.

Explore finance & reporting agents

Regulated & High-Stakes

Legal Research Agents

The citation looked right. The case doesn’t exist.

Your research agent cited a case that doesn’t exist. The associate submitted the brief. Opposing counsel found it in writing. Hallucination and faithfulness evaluators run on every span. When a citation diverges from verified sources, the control loop escalates to a human reviewer instead of letting it move downstream.

Explore legal research agents

Clinical AI Assistants

Wrong dosage range. Delivered with full confidence.

Your health assistant described the right condition with the wrong dosage range — confidently, with no indication it was wrong. Nothing in the pipeline flagged it. Clinical faithfulness scoring runs inline as each trace arrives, not in a nightly batch. Outputs that deviate from grounding context route to a clinical review queue automatically.

Explore clinical AI assistants

See all use cases

Instrument in minutes

Two lines. Full visibility.

Wrap any OpenAI, Anthropic, or custom LLM client with TruLayer. Every call, chain, and tool use is automatically traced — no manual spans, no config files.

✓Auto-captures inputs, outputs, tokens, latency, cost
✓Propagates trace context across async agent hops
✓Zero overhead — proxy-based, not monkey-patching

agent.ts

import { TruLayer } from '@trulayer/sdk'
import OpenAI from 'openai'

const tl = new TruLayer({ apiKey: process.env.TRULAYER_API_KEY })
const openai = tl.instrument(new OpenAI())

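// `task` (the user prompt) and `agentTools` (your tool definitions) come from your own agent code.
const response = await openai.chat.completions.create({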
  model: 'gpt-4o',
  messages: [{ role: 'user', content: task }],
  tools: agentTools,
})

// TruLayer automatically captures:
//   ✓ inputs, outputs, tool calls
//   ✓ token usage and cost per hop
//   ✓ latency and eval scores on every span
//   ✓ trace context across async agent steps
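For readers curious what "proxy-based, not monkey-patching" can look like in practice, here is a generic sketch using the standard JavaScript `Proxy` object. It does not show TruLayer's internals: the `instrument` function below, the `Span` shape, and the fake client are assumptions for illustration, and the timing only covers synchronous calls.

```typescript
// Hypothetical span record; a real tracer would capture far more.
type Span = { path: string; args: unknown[]; result: unknown; durationMs: number };

// Wraps an object in a Proxy that records every method call, recursing into
// nested objects so that paths like `chat.completions.create` are captured.
// The original client object is never modified.
function instrument<T extends object>(target: T, spans: Span[], path = ""): T {
  return new Proxy(target, {
    get(obj, prop, receiver) {
      const value = Reflect.get(obj, prop, receiver);
      const name = path ? `${path}.${String(prop)}` : String(prop);
      if (typeof value === "function") {
        return (...args: unknown[]) => {
          const start = Date.now();
          const result = value.apply(obj, args);
          spans.push({ path: name, args, result, durationMs: Date.now() - start });
          return result;
        };
      }
      if (value !== null && typeof value === "object") {
        return instrument(value as object, spans, name);
      }
      return value;
    },
  });
}

// Usage with a stand-in client (not the real OpenAI SDK).
const fakeClient = {
  chat: { completions: { create: (req: { model: string }) => ({ text: `echo:${req.model}` }) } },
};
const spans: Span[] = [];
const client = instrument(fakeClient, spans);
const reply = client.chat.completions.create({ model: "demo-model" });
```

Because the wrapper sits in front of the client instead of rewriting its methods, removing instrumentation is just a matter of using the original object again.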

How it works

From deploy to confidence in three steps

Instrument

Wrap your LLM client with two lines of code. Supports OpenAI, Anthropic, and any custom model.

Observe

Every agent step, tool call, and chain hop appears in your trace explorer in real time.

Evaluate & Fix

Automatic evals score every output. Failure alerts and auto-remediation close the loop.

Built for this

Purpose-built for production AI

TruLayer is purpose-built for teams shipping LLM features to production — not adapted from general-purpose APM.

One stack, not three stitched together

You already have traces, evals, and a feedback queue — in three different tools, with three integrations to maintain and no shared data model. TruLayer puts tracing, automated evaluation, and human feedback in a single pipeline. One place to see what ran, how it scored, and what to fix.

OTEL-native from day one

If your stack already speaks OpenTelemetry, TruLayer fits without a rewrite. Send spans over OTLP — same exporter you use for everything else. No proprietary SDK to lock in, no parallel instrumentation to maintain. When you outgrow TruLayer, your traces leave with you.

First trace in under five minutes

Sign up, add two lines, see your first trace. No YAML, no collector config, no support ticket. The quickstart is written for engineers who have a running agent today — not for teams evaluating a six-month rollout.

Ready to see your first trace? Start the quickstart

Pricing

Simple, usage-aligned pricing

Start free. Scale with your usage. No credit card required on Starter.

Starter

$0 / month

No credit card required

1M spans / month
3 seats
2.5K evals / month
Anomaly detection
30-day retention
Docs & GitHub support

Start free

Pro

$149 / month

$5 / additional 1M spans

20M spans / month
5 seats
90-day retention
50K evals / month
Anomaly detection + webhooks
Semantic search
Email support

Get Pro

Team

$699 / month

$4 / additional 1M spans

100M spans / month
15 seats
180-day retention
250K evals / month
SSO / SAML included
Slack support (trulayerai.slack.com)

Get Team

Biz

Coming soon

Large teams beyond 15 seats. Contact us for custom pricing.

Enterprise

Custom volume · Dedicated support · Compliance-ready

Need custom volume, procurement, or dedicated support? Let's talk.

See full pricing + calculator

Annual billing saves 20% on the Team plan.
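As a rough worked example of the span pricing listed above, the sketch below estimates a monthly bill from span volume alone. It uses the listed base prices, included spans, and overage rates, and it assumes that partial millions round up to the next 1M; the page does not say how overage is metered. Seats, evals, and the annual discount are ignored, and `monthlyCostUsd` is a hypothetical helper.

```typescript
type Plan = { name: string; baseUsd: number; includedSpansM: number; overageUsdPerM: number };

// Figures taken from the plan list above.
const PRO: Plan = { name: "Pro", baseUsd: 149, includedSpansM: 20, overageUsdPerM: 5 };
const TEAM: Plan = { name: "Team", baseUsd: 699, includedSpansM: 100, overageUsdPerM: 4 };

// Estimated monthly cost in USD for a given span volume (in millions of spans).
// Assumption: overage is billed per started 1M spans.
function monthlyCostUsd(plan: Plan, spansM: number): number {
  const overageM = Math.max(0, Math.ceil(spansM - plan.includedSpansM));
  return plan.baseUsd + overageM * plan.overageUsdPerM;
}

// Pro at 50M spans: 149 + 30 * 5 = 299
const proAt50 = monthlyCostUsd(PRO, 50);
// Team at 150M spans: 699 + 50 * 4 = 899
const teamAt150 = monthlyCostUsd(TEAM, 150);
```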

Reliable AI.
Not just observable AI.

Observability tells you what broke. TruLayer tells you what broke and why, then fixes it automatically. Start free.

Start free
