Use case

Document Extraction Agents

Document extraction pipelines fail silently. The invoice total is wrong. The vendor tax ID was dropped. The type validation failed and the pipeline reported success. You find out three reconciliation cycles later — or when the auditor asks.

Where things go wrong

Wrong field extracted (subtotal vs. total-due)

Your extraction agent reads the subtotal line instead of the total-due line and writes the wrong amount to your ERP. The downstream accounts payable run is now short. Finance does not know it yet.

AP reconciliation failure discovered weeks later; correcting it requires a manual audit of every invoice processed in the window.
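
One inexpensive guard against this failure is an arithmetic cross-check on the extracted amounts before the ERP write. The sketch below is illustrative only: the `InvoiceAmounts` shape, its field names, and the one-cent tolerance are assumptions, not part of any TruLayer API, and it assumes the invoice also lists a subtotal and a tax amount to reconcile against.

```typescript
// Hypothetical shape for the amounts an extraction agent returns.
interface InvoiceAmounts {
  subtotal: number
  tax: number
  totalDue: number
}

// Returns true when totalDue reconciles with subtotal + tax within a tolerance.
// Amounts are compared in integer cents to avoid floating-point drift.
function totalReconciles(a: InvoiceAmounts, toleranceCents = 1): boolean {
  const toCents = (x: number) => Math.round(x * 100)
  const expected = toCents(a.subtotal) + toCents(a.tax)
  return Math.abs(expected - toCents(a.totalDue)) <= toleranceCents
}

// The agent wrote the subtotal into totalDue: the check fails.
const suspicious = totalReconciles({ subtotal: 1000, tax: 80, totalDue: 1000 })
// The agent read the total-due line: the check passes.
const consistent = totalReconciles({ subtotal: 1000, tax: 80, totalDue: 1080 })
```

A check like this cannot catch a zero-tax invoice where subtotal and total are legitimately equal, so it complements a faithfulness eval and does not replace one.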

Dropped field due to template variation

A vendor uses a non-standard layout in which the tax ID field carries a different label from the one in the templates the model was trained on. The field is dropped from the output. The redaction pass then runs on an incomplete record, and the PII lands in the CRM anyway.

PII compliance exposure; the record is in a system it should not be in, with no audit trail of how it got there.

Silent type mismatch

An amount field is extracted as a string instead of a float. The downstream type validation fails, the error is swallowed, and the pipeline reports success. The record is written. Nobody finds out until the next reconciliation run surfaces an unexplained variance.

Data integrity failure that is invisible in the pipeline logs; correcting it requires tracing back through every record written in the affected window.
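
The swallowed-error pattern is easier to avoid when a parse failure is a value the caller has to handle and not an exception that something upstream can catch and ignore. The following is a minimal sketch under stated assumptions: `ParseResult` and `parseAmount` are hypothetical names, and the cleanup step assumes dollar signs and comma thousands separators, which will not hold for every locale.

```typescript
// Result of parsing one field: either a typed value or an explicit error.
type ParseResult =
  | { ok: true; value: number }
  | { ok: false; field: string; reason: string }

// Accepts a finite number or a numeric string; everything else is an error.
function parseAmount(field: string, raw: unknown): ParseResult {
  if (typeof raw === 'number' && Number.isFinite(raw)) {
    return { ok: true, value: raw }
  }
  if (typeof raw === 'string') {
    // Assumption: strip dollar signs, comma separators, and whitespace.
    const cleaned = raw.replace(/[$,\s]/g, '')
    if (/^-?\d+(\.\d+)?$/.test(cleaned)) {
      return { ok: true, value: Number(cleaned) }
    }
  }
  return { ok: false, field, reason: `expected a number, got ${JSON.stringify(raw)}` }
}

// A numeric string is coerced to a number before the write.
const parsed = parseAmount('total_due', '1,080.00')
// A non-numeric value produces an explicit failure the caller must branch on.
const rejected = parseAmount('total_due', 'N/A')
```

Because the return type is a discriminated union, TypeScript will not let the caller read `value` without first checking `ok`, which makes "report success anyway" a deliberate choice and not an accident.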

Confidence without completeness

The model extracts 11 of 12 required fields and returns a clean, structured output with no indication that a field is missing. The downstream system accepts the partial record. The missing field causes a failure two steps later, attributed to the wrong service.

Cascading failures misattributed to downstream systems; root cause is invisible without per-span eval scoring on extraction completeness.
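
A completeness check can be expressed as a score over the required fields, so that a partial record is flagged at the extraction span and not two services later. This is a sketch, not TruLayer's evaluator: the field list is hypothetical and shortened to four entries for readability, and `missingFields` and `completenessScore` are illustrative names.

```typescript
// Hypothetical list of fields the downstream system requires.
const REQUIRED_FIELDS = ['vendor_name', 'vendor_tax_id', 'invoice_number', 'total_due'] as const

// Returns the required fields that are absent, null, or empty strings.
function missingFields(record: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter((name) => {
    const value = record[name]
    return value === undefined || value === null || value === ''
  })
}

// Completeness as a 0..1 score, so it can be thresholded like any other eval.
function completenessScore(record: Record<string, unknown>): number {
  return 1 - missingFields(record).length / REQUIRED_FIELDS.length
}

// A clean-looking record that is silently missing the tax ID.
const partialRecord = { vendor_name: 'Acme Ltd', invoice_number: 'INV-001', total_due: 1080 }
const missing = missingFields(partialRecord)
const score = completenessScore(partialRecord)
```

Here `missing` names the absent field, which is the detail that lets a later failure be attributed to extraction and not to the service that happened to trip over it.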

Eval + control loop

What happens when a rule fires

Document Extraction Agents control loop: the original span scores pii_leakage 0.61 (detected), which triggers human review.

STEP 1: Original span arrived
STEP 2: Eval fires (pii_leakage 0.61, detected)
STEP 3: Human review (next call on the same failure path)
STEP 4: Human queue (awaiting review)

The response

How TruLayer closes the loop

  • PII Leakage
  • Function Call
  • Faithfulness

For document extraction pipelines, the evaluators that matter are PII leakage (did a protected field appear in an output it shouldn’t reach?), tool-call correctness (did the extraction agent call the write-to-ERP action with the correct parameters and field types?), and faithfulness (does the extracted output match the source document — specifically, did the model pull from the right field?). All 25 evaluators run inline on every span as each trace arrives. You don’t configure a nightly batch job and wait; every extraction is scored the moment it completes.

When an eval rule fires — for example, a PII leakage detection on an extraction output — the control loop acts before the record propagates downstream. The action choices are: retry with a prompt that targets the specific failing field; fall back to a stricter extraction template or a more conservative model; or route the record to a human review queue for manual verification before the write proceeds. The HITL escalation path is the key mechanism for compliance-sensitive pipelines: a human reviews the flagged extraction before it touches the downstream system, not after the fact.
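
The decision among those three actions can be pictured as a small policy function. The sketch below only illustrates the logic; in TruLayer the rules and actions are configured in the dashboard, and the `Action` and `EvalResult` types, the `chooseAction` name, and the escalation order are all assumptions made for the example.

```typescript
// Hypothetical types: these are not TruLayer SDK types.
type Action = 'proceed' | 'retry' | 'fallback' | 'human_review'

interface EvalResult {
  evaluator: string // e.g. 'pii_leakage' or 'faithfulness'
  fired: boolean    // whether the rule for this evaluator fired
}

// Picks the next action for a span given which rules fired and how many
// attempts have been made so far. The policy is illustrative:
//   - PII leakage always goes to a human before the write proceeds.
//   - Other failures get one targeted retry, then a stricter fallback,
//     then human review.
function chooseAction(results: EvalResult[], attempt: number): Action {
  const fired = results.filter((r) => r.fired)
  if (fired.length === 0) return 'proceed'
  if (fired.some((r) => r.evaluator === 'pii_leakage')) return 'human_review'
  if (attempt === 0) return 'retry'
  if (attempt === 1) return 'fallback'
  return 'human_review'
}

// A PII detection skips retries and goes straight to the review queue.
const onPii = chooseAction([{ evaluator: 'pii_leakage', fired: true }], 0)
// A faithfulness failure on the first attempt gets a targeted retry.
const onFaithfulness = chooseAction([{ evaluator: 'faithfulness', fired: true }], 0)
```

Sending PII detections directly to review, with no retry first, reflects the compliance framing above: a retry that happens to pass does not prove the first output never left the pipeline.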

TruLayer’s per-trace before/after deltas surface exactly where the eval score dropped and what changed after the remediation action. For extraction pipelines where a single bad record can contaminate a reconciliation cycle, knowing which span produced the failure — and whether the retry resolved it — is the difference between a one-record fix and a full audit. The trace shows you both.

See it in practice

Instrument your document extraction agent in two lines.

Wrap your LLM client. Every span from the wrapped client is captured and scored by every built-in evaluator. Eval rules and control-loop actions are configured in the dashboard.

agent.ts
import { TruLayer } from '@trulayer/sdk'
import OpenAI from 'openai'

const tl = new TruLayer({ apiKey: process.env.TRULAYER_API_KEY })
const openai = tl.instrument(new OpenAI())

// Every span from this client is captured, scored by all 25
// built-in evaluators, and surfaced in the extraction project.
// Eval rules + control-loop actions are configured in the dashboard,
// not in your application code.

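// `task` is a placeholder for your own extraction prompt
// (instructions plus the document text).
const task = 'Extract the vendor name, tax ID, and total due from this invoice.'

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: task }],
})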

Ship reliable document extraction agents.

Free tier includes 1M spans / month · No credit card