Everything you need to ship reliable AI

Five scenarios, one platform. Instrument, observe, evaluate, improve, and govern your production AI — without stitching tools together.

Instrument

Two lines. Your entire AI stack, visible.

Start seeing every LLM call, tool invocation, and retrieval step in under 5 minutes.

Distributed Tracing

Auto-captures every hop in your agent pipeline. Zero config, full fidelity.

OpenTelemetry Native

OTLP in and out via /v1/otlp/export. No lock-in — export to any backend.

CI-Safe Testing Modes

TRULAYER_MODE=local runs the SDK in-process with no network calls. TRULAYER_MODE=replay deterministically replays golden trace files. Keep evals out of production pipelines.

MCP Server & Claude Code Skills

Query traces from an AI agent or directly from your IDE. Observability lives in the AI workflow you already use.

Observe

See what’s happening. Understand why.

From raw traces to patterns and anomalies — without writing a single query.

Session Replay

Step through any agent run chronologically. Latency and token breakdowns at each step.

Live Trace Feed

Real-time stream of every trace as it lands. Watch production behavior unfold span by span.

Failure Clustering

Group similar errors automatically. See patterns, not noise.

Anomaly Detection

Automatic alerting when score, latency, or error rate drifts outside baseline.
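One common way to flag drift outside a baseline is a z-score check. The sketch below is an illustration only: the function name, the threshold of 3 standard deviations, and the choice of z-scores are assumptions, not a description of how the platform computes its baselines.

```python
import statistics


def is_anomalous(baseline, value, z_threshold=3.0):
    """Return True when `value` sits more than `z_threshold` sample
    standard deviations away from the mean of `baseline`.

    Hypothetical helper: the z-score method and default threshold are
    assumptions made for this example.
    """
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two observations")
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: any change at all counts as drift.
        return value != mean
    return abs(value - mean) / stdev > z_threshold
```

The same check could run per metric (score, latency, error rate), each against its own recent window.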

Semantic Search

Find traces by meaning, not just metadata filters. Powered by vector similarity on your span embeddings.
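As a rough picture of what vector similarity means here, the sketch below ranks stored span embeddings against a query embedding by cosine similarity. The function names and the in-memory dictionary are hypothetical; a production system would typically use a vector index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_spans(query_embedding, span_embeddings, top_k=3):
    """Return the `top_k` (span_id, similarity) pairs, most similar first.

    `span_embeddings` is a hypothetical {span_id: vector} mapping.
    """
    scored = [
        (span_id, cosine_similarity(query_embedding, embedding))
        for span_id, embedding in span_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```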

Evaluate

Know whether it was correct, not just whether it ran.

Score quality at every step, not just the final output.

Eval Catalog

25 pre-built LLM evaluators: hallucination, faithfulness, toxicity, retrieval accuracy, and more.

Eval Rules

Attach any scorer to any span. Score intermediate steps, not just final answers.

Datasets & Regression Testing

Build golden trace sets. Re-run them on every change to catch regressions before users do.

Eval Trends

Track scores over time. Know when a model update silently hurt quality.

BYOK Eval Models

Use your own Anthropic or OpenAI key for premium-tier evaluators. Pay providers directly.

Improve

Close the loop before the next user hits it.

From detected failure to shipped fix — without leaving the platform.

Prompt Improvements

AI-proposed prompt edits based on failure clusters. A/B tested automatically, shipped or held for approval.

Self-Healing Actions

Retry with exponential backoff, fall back to a secondary model, or escalate, all automatically. Configurable retry depth (1–10, default 3) prevents runaway cascades: when a trace has been retried that many times without passing eval, it routes to the human-in-the-loop queue. Operators see the escalation reason and a "Retry cap hit" metric.
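The retry-then-escalate behavior described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `self_heal`, `HealResult`, the 0.5-second base delay, and the doubling schedule are all hypothetical names and values, and only the 1–10 retry range, the default of 3, and the "Retry cap hit" reason come from the description.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HealResult:
    output: Optional[str]
    retries: int
    escalated: bool
    reason: Optional[str] = None


def self_heal(
    run: Callable[[], str],
    passes_eval: Callable[[str], bool],
    max_retries: int = 3,
    base_delay: float = 0.5,  # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> HealResult:
    """Run once, then retry with exponential backoff until the output
    passes eval or the retry cap is reached."""
    if not 1 <= max_retries <= 10:
        raise ValueError("max_retries must be between 1 and 10")

    output = run()
    if passes_eval(output):
        return HealResult(output, retries=0, escalated=False)

    for retry in range(1, max_retries + 1):
        # Delay doubles on each retry: base, 2x base, 4x base, ...
        sleep(base_delay * 2 ** (retry - 1))
        output = run()
        if passes_eval(output):
            return HealResult(output, retries=retry, escalated=False)

    # Cap reached without a passing eval: hand off to a human.
    return HealResult(None, retries=max_retries, escalated=True,
                      reason="Retry cap hit")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller would route any result with `escalated=True` to its review queue.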

Human-in-the-Loop Approval

Any action can require owner approval before it ships. Full audit trail of who approved what and when.

Remediation Diffs

Before/after comparison: changes in eval score, latency, and embedding similarity for every eval rule that ran on both spans. Know the fix worked, and see it inline in trace detail.

Model Routing

Route traffic by eval score, cost, or latency. Swap models without code changes.

Alert Rules & Webhooks

Slack, PagerDuty, or any HTTP endpoint with HMAC-signed payloads. Alert on the metrics that matter to you.
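On the receiving side, an HMAC-signed payload is usually checked by recomputing the signature over the raw request body and comparing it in constant time. The sketch below assumes HMAC-SHA256 with a hex-encoded signature; the hash algorithm, the encoding, and how the signature is delivered (typically a request header) are assumptions, so check the webhook documentation for the actual scheme.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"),
                               received_signature.encode("utf-8"))
```

Verification should run against the exact bytes received, before any JSON parsing or re-serialization, since even a whitespace change alters the digest.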

Govern

Operate AI you can audit, explain, and trust.

Enterprise-grade controls that don’t get in the way.

Multi-tenant RBAC

Owner, member, and viewer roles. Enterprise SAML via Clerk with zero config. Every tenant’s data fully isolated.

PII & Secret Redaction

Server-side auto-scrubbing strips emails, tokens, API keys, and common PHI patterns at ingest. Every project, no config.
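To give a feel for pattern-based scrubbing, here is a minimal sketch that masks email addresses and one style of API key. The regular expressions, the `sk-`/`pk-` key prefix, and the placeholder strings are illustrative assumptions and are far narrower than what a real redaction pipeline would need.

```python
import re

# Illustrative patterns only; real redaction needs much broader coverage.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
API_KEY_PATTERN = re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{16,}\b")  # hypothetical key format


def redact(text: str) -> str:
    """Replace matched emails and API-key-like tokens with placeholders."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = API_KEY_PATTERN.sub("[API_KEY]", text)
    return text
```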

Kill Switch

Instantly halt agent traffic for any project. One click, immediate effect.

Feedback Collection

Thumbs up/down on any trace. Feeds directly into eval datasets so quality compounds over time.

CI Gate

Block merges when eval scores drop below threshold. Quality as a merge gate.
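Conceptually, a merge gate boils down to comparing eval scores with a threshold and returning a non-zero exit code on failure. The sketch below shows that logic in isolation; the function names, the score dictionary, and the 0.80 threshold in the usage note are hypothetical, and fetching the scores from the platform is left out.

```python
import sys


def failing_evals(scores, threshold):
    """Names of evals whose score is below the threshold, sorted."""
    return sorted(name for name, score in scores.items() if score < threshold)


def gate_exit_code(scores, threshold):
    """Return 0 when every eval meets the threshold, 1 otherwise."""
    failures = failing_evals(scores, threshold)
    for name in failures:
        print(f"FAIL {name}: {scores[name]:.2f} < {threshold:.2f}",
              file=sys.stderr)
    return 1 if failures else 0
```

In a CI step, a script would end with something like `sys.exit(gate_exit_code(scores, threshold=0.80))` so that a failing eval fails the job and blocks the merge.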

Enterprise

Compliance-ready capabilities

Built for regulated workloads. Audit trails, data subject requests, custom retention, and a dedicated support channel.

Tamper-evident audit log

SHA-256 hash chaining over every audit event with periodic anchoring to S3 WORM storage. Detect any retroactive edit; export a verifiable audit trail for compliance reviews.
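Hash chaining makes edits detectable because each entry's hash covers the previous entry's hash, so changing one event invalidates every hash after it. The sketch below illustrates the idea with an in-memory list; the entry layout, the all-zero genesis value, and the JSON canonicalization are assumptions, and anchoring to external storage is not shown.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for an empty log


def chain_hash(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({
        "event": event,
        "prev_hash": prev_hash,
        "hash": chain_hash(prev_hash, event),
    })


def verify_chain(log: list):
    """Return the index of the first inconsistent entry, or None if the
    whole chain verifies."""
    prev_hash = GENESIS_HASH
    for index, entry in enumerate(log):
        if entry["prev_hash"] != prev_hash:
            return index
        if entry["hash"] != chain_hash(prev_hash, entry["event"]):
            return index
        prev_hash = entry["hash"]
    return None
```

Periodically copying the latest hash to write-once storage is what stops an attacker from simply recomputing the whole chain after an edit.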

GDPR data subject requests

Right-to-erasure via the DSR API: delete or export all spans, evals, and feedback for a given subject ID across hot and cold storage in one call.

Custom data retention

Retain spans, eval results, and audit events beyond the standard 90 days. Retention windows are negotiated at contract time to match your compliance requirements — contact support to adjust.

Dedicated support & SLA

Named customer success engineer, shared Slack channel, and a contractual uptime SLA. Priority routing for incidents and feature requests.

Available on the Enterprise plan. Contact sales →

How it works

Instrument. Observe. Act.

1. Instrument

Wrap your LLM client with two lines of code. Every call, chain, and tool use is traced automatically.

2. Observe

Every agent step appears in your trace explorer in real time with latency, token, and cost breakdowns.

3. Act

Automatic evals score every output. Self-healing rules retry, fall back, or escalate before the next user hits the same failure.

Start free

Get full observability and self-healing for your AI agents. Starter is free — 1M spans / month, no credit card.
