Everything you need to ship reliable AI
Five scenarios, one platform. Instrument, observe, evaluate, improve, and govern your production AI — without stitching tools together.
Instrument
Two lines. Your entire AI stack, visible.
Start seeing every LLM call, tool invocation, and retrieval step in under 5 minutes.
Distributed Tracing
Auto-captures every hop in your agent pipeline. Zero config, full fidelity.
OpenTelemetry Native
OTLP in and out via /v1/otlp/export. No lock-in — export to any backend.
CI-Safe Testing Modes
TRULAYER_MODE=local runs the SDK in-process with no network calls. TRULAYER_MODE=replay deterministically replays golden trace files. Keep evals out of production pipelines.
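As a rough sketch of how a test suite might pin the mode for a single test, the helper below temporarily sets the TRULAYER_MODE environment variable and restores the previous value afterwards. Only the variable name and the values local and replay come from the description above; the helper name trulayer_mode is hypothetical and is not part of the SDK.

```python
import os
from contextlib import contextmanager


@contextmanager
def trulayer_mode(mode):
    """Temporarily set TRULAYER_MODE (hypothetical helper, not an SDK API)."""
    previous = os.environ.get("TRULAYER_MODE")
    os.environ["TRULAYER_MODE"] = mode
    try:
        yield
    finally:
        # Restore whatever was there before, so tests do not leak state.
        if previous is None:
            os.environ.pop("TRULAYER_MODE", None)
        else:
            os.environ["TRULAYER_MODE"] = previous
```

Code that runs inside `with trulayer_mode("local"):` sees the variable set; anything after the block sees the original environment.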
MCP Server & Claude Code Skills
Query traces from an AI agent or directly from your IDE. Observability lives in the AI workflow you already use.
Observe
See what’s happening. Understand why.
From raw traces to patterns and anomalies — without writing a single query.
Session Replay
Step through any agent run chronologically. Latency and token breakdowns at each step.
Live Trace Feed
Real-time stream of every trace as it lands. Watch production behavior unfold span by span.
Failure Clustering
Group similar errors automatically. See patterns, not noise.
Anomaly Detection
Automatic alerting when score, latency, or error rate drifts outside baseline.
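To make "drifts outside baseline" concrete, here is one common way to define it: flag a value whose z-score against recent history exceeds a threshold. This is an illustrative assumption, not a description of the platform's detector; the function name and the default threshold of 3.0 are made up for the example.

```python
from statistics import mean, stdev


def is_anomalous(baseline, value, z_threshold=3.0):
    """Return True if value sits more than z_threshold std devs from the baseline mean."""
    if len(baseline) < 2:
        return False  # not enough history to estimate spread
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as drift.
        return value != mu
    return abs(value - mu) / sigma > z_threshold


latency_ms = [100, 102, 98, 101, 99]
print(is_anomalous(latency_ms, 101))
print(is_anomalous(latency_ms, 150))
```

The same check applies to eval scores or error rates; only the baseline series changes.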
Semantic Search
Find traces by meaning, not just metadata filters. Powered by vector similarity on your span embeddings.
Evaluate
Know whether it was correct, not just whether it ran.
Score quality at every step, not just the final output.
Eval Catalog
25 pre-built LLM evaluators: hallucination, faithfulness, toxicity, retrieval accuracy, and more.
Eval Rules
Attach any scorer to any span. Score intermediate steps, not just final answers.
Datasets & Regression Testing
Build golden trace sets. Re-run them on every change to catch regressions before users do.
Eval Trends
Track scores over time. Know when a model update silently hurt quality.
BYOK Eval Models
Use your own Anthropic or OpenAI key for premium-tier evaluators. Pay providers directly.
Improve
Close the loop before the next user hits it.
From detected failure to shipped fix — without leaving the platform.
Prompt Improvements
AI-proposed prompt edits based on failure clusters. A/B tested automatically, shipped or held for approval.
Self-Healing Actions
Retry with exponential backoff, fall back to a secondary model, or escalate, all automatically. A configurable retry cap (1–10, default 3) prevents runaway cascades: once a trace has been retried that many times without passing eval, it routes to the human-in-the-loop queue. Operators see the escalation reason and a "Retry cap hit" metric.
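The retry cap can be pictured with a minimal sketch like the one below. It assumes a call that produces a result and an eval that returns pass or fail; the function name, the returned dictionary shape, and the 0.5 second base delay are all hypothetical. Only the 1–10 range, the default of 3, and the "Retry cap hit" escalation reason come from the description above. Model fallback is left out to keep the sketch short.

```python
import time


def run_with_retries(call, passes_eval, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Retry call() with exponential backoff until eval passes or the cap is hit."""
    if not 1 <= max_retries <= 10:
        raise ValueError("max_retries must be between 1 and 10")
    result = call()
    retries = 0
    while not passes_eval(result):
        if retries == max_retries:
            # Cap reached: stop retrying and hand off to a human reviewer.
            return {"status": "escalated", "reason": "Retry cap hit",
                    "retries": retries, "result": result}
        sleep(base_delay * 2 ** retries)  # 0.5s, 1s, 2s, ...
        retries += 1
        result = call()
    return {"status": "passed", "retries": retries, "result": result}
```

Passing sleep as a parameter keeps the backoff testable without real delays.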
Human-in-the-Loop Approval
Any action can require owner approval before it ships. Full audit trail of who approved what and when.
Remediation Diffs
Before/after comparison: eval score deltas, latency change, and embedding similarity for every eval rule that ran on both spans. Know the fix worked, and see it inline in trace detail.
Model Routing
Route traffic by eval score, cost, or latency. Swap models without code changes.
Alert Rules & Webhooks
Slack, PagerDuty, or any HTTP endpoint with HMAC-signed payloads. Alert on the metrics that matter to you.
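On the receiving end, an HMAC-signed payload is typically checked by recomputing the signature over the raw request body and comparing it in constant time. The sketch below assumes HMAC-SHA256 with a hex-encoded signature; the actual algorithm, encoding, and header name are not stated above, so treat those details as placeholders.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, received_hex)
```

Verify against the exact bytes received, before any JSON parsing, since re-serializing the payload can change the bytes and invalidate the signature.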
Govern
Operate AI you can audit, explain, and trust.
Enterprise-grade controls that don’t get in the way.
Multi-tenant RBAC
Owner, member, and viewer roles. Enterprise SAML via Clerk with zero config. Every tenant’s data fully isolated.
PII & Secret Redaction
Server-side auto-scrubbing strips emails, tokens, API keys, and common PHI patterns at ingest. Every project, no config.
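For a sense of what pattern-based scrubbing looks like, here is a deliberately small sketch. The three patterns and the placeholder format are illustrative assumptions; they are not the platform's rule set, which covers more cases than a short example can.

```python
import re

# Illustrative patterns only: an email address, an "sk-" style API key,
# and a bearer token in an Authorization header.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("API_KEY", re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b")),
    ("BEARER_TOKEN", re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/-]+=*")),
]


def redact(text: str) -> str:
    """Replace each match with a labeled placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text
```

Running this at ingest, before anything is stored, is what keeps the raw values out of traces entirely.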
Kill Switch
Instantly halt agent traffic for any project. One click, immediate effect.
Feedback Collection
Thumbs up/down on any trace. Feeds directly into eval datasets so quality compounds over time.
CI Gate
Block merges when eval scores drop below threshold. Quality as a merge gate.
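The gate itself reduces to a threshold check that a CI job can turn into an exit code. The sketch below assumes scores arrive as a mapping from evaluator name to a number; the function name and data shape are hypothetical.

```python
def gate(scores, threshold):
    """Return (exit_code, failing): exit code 1 if any score is below threshold."""
    failing = {name: score for name, score in scores.items() if score < threshold}
    return (1 if failing else 0), failing


code, failing = gate({"faithfulness": 0.92, "toxicity": 0.71}, threshold=0.8)
print(code, failing)
# In a CI script, finish with sys.exit(code) so a nonzero code blocks the merge.
```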
Enterprise
Compliance-ready capabilities
Built for regulated workloads. Audit trails, data subject requests, custom retention, and a dedicated support channel.
Tamper-evident audit log
SHA-256 hash chaining over every audit event with periodic anchoring to S3 WORM storage. Detect any retroactive edit; export a verifiable audit trail for compliance reviews.
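Hash chaining works by folding each event's hash into the next one, so editing any earlier event breaks every hash after it. A minimal sketch, assuming events are JSON-serializable dictionaries and the chain starts from an all-zero genesis value (both assumptions; the production format and the S3 anchoring step are not shown):

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry


def chain_hash(prev_hash, event):
    """SHA-256 over the previous hash plus a canonical JSON encoding of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def build_chain(events):
    entries, prev = [], GENESIS
    for event in events:
        digest = chain_hash(prev, event)
        entries.append({"event": event, "prev_hash": prev, "hash": digest})
        prev = digest
    return entries


def verify_chain(entries):
    """Recompute every link; any retroactive edit makes this return False."""
    prev = GENESIS
    for entry in entries:
        if entry["prev_hash"] != prev or chain_hash(prev, entry["event"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Anchoring the latest hash to write-once storage at intervals means even someone who rewrites the whole chain cannot match the anchored value.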
GDPR data subject requests
Erasure and access requests via the DSR API: delete or export all spans, evals, and feedback for a given subject ID across hot and cold storage in one call.
Custom data retention
Retain spans, eval results, and audit events beyond the standard 90 days. Retention windows are negotiated at contract time to match your compliance requirements — contact support to adjust.
Dedicated support & SLA
Named customer success engineer, shared Slack channel, and a contractual uptime SLA. Priority routing for incidents and feature requests.
Available on Enterprise plan. Contact sales →
How it works
Instrument. Observe. Act.
Instrument
Wrap your LLM client with two lines of code. Every call, chain, and tool use is traced automatically.
Observe
Every agent step appears in your trace explorer in real time with latency, token, and cost breakdowns.
Act
Automatic evals score every output. Self-healing rules retry, fall back, or escalate before the next user hits the same failure.
Start free
Get full observability and self-healing for your AI agents. Starter is free — 1M spans / month, no credit card.