Autonomy you can put your name on.

Every action an agent takes carries its reasoning, its confidence, and the evidence behind it — and a person reviews anything that can't be undone. So when a surveyor, a client, or your board asks how a file was decided, the answer is already assembled.

Traceable by design

Every decision shows its work.

Agent decision · every action inspectable

Confidence: 96%

  • Action: Verify RN license · TX board
  • Source evidence: board result · captured 9:41 · linked
  • Reasoning: Active, unencumbered, matches file
  • Policy gate: passed · read-only lookup · ok
  • Supervisor review: AI supervisor · consistent · ok
Written to a durable audit trail · reproducible on demand

One-way doors

The irreversible gets heightened scrutiny.

Reversible actions proceed. Anything that can't be undone or that changes external state escalates: an AI supervisor with a lower threshold for flagging reviews the primary agent's proposal, and a human gives the final approval.

Two-way door

Reversible

Draft a message, save a note, read a portal. Easily undone — the agent proceeds.

proceeds automatically

One-way door

Irreversible or external

Submit to a facility, send to a board. Heightened scrutiny.

Primary agent proposes → AI supervisor reviews (lower flag threshold) → Human approves

Human oversight

Built for people to review fast and intervene.

A purpose-built review surface presents each item with its evidence and a recommendation, so most decisions take seconds. Approve, correct, or redirect an item, or pause the whole operation and hand it back to a person.

Review queue · 3 waiting

  • Flagged background check: evidence + recommendation attached
  • Low-confidence document read: handwritten card · 61%
  • License discrepancy: name mismatch across sources

Approve · Correct · Redirect · Pause operation

Injection & abuse defense

We assume hostile input will show up.

The agents read messages, documents, and external sources, so all such content is treated as data, never as instruction, and the blast radius of any misreading is kept small.

Direct prompt injection

An instruction hierarchy, relevance classification, and tool mediation keep a message from ever gaining authority.

Indirect injection

Documents and external text are labeled untrusted and quoted as evidence — never followed as instructions.

Data extraction

Role-bound retrieval, tenant isolation, and redaction keep data within what the user is authorized to access.

Excessive agency

Agents get only the granular tools a task needs; high-impact and irreversible writes require extra checks.

Knowledge pollution

Uploaded knowledge passes source tracking, versioning, and review before it can become agent context.

Unbounded consumption

Request budgets, rate limits, and anomaly detection protect spend and availability.

And people, too

A human team reviews the agents' work — sampling live operations, catching drift, and keeping the system compliant.

We built this as an applied research system: useful autonomy, constrained by software boundaries, evals, and human review.

How we built it →

Questions

How is safety built into every LLM call?

Every meaningful output carries a confidence score, a reasoning summary, the source evidence it drew on, the policy that gated it, and any supervisor or human review — all written to a durable, reproducible audit trail. Nothing the agents do is a black box a person can't inspect after the fact.

How do you handle irreversible actions differently?

We distinguish two-way doors (reversible — draft a message, read a portal) from one-way doors (irreversible or externally visible — submit to a facility, send to a board). One-way doors get heightened scrutiny: the primary agent is reviewed by an AI supervisor with a lower threshold for flagging, and a human gives final approval before anything executes.

What kinds of decisions require human review?

By default: flagged background checks, low-confidence document reads, discrepancies between sources, and irreversible or externally visible actions. Beyond that, you can place custom checkpoints anywhere — by facility, requirement type, or risk level. A human team also samples live work to catch drift and keep the system compliant.

How do you defend against prompt injection?

Content the agents read (messages, documents, external pages) is treated as data, never instruction. An instruction hierarchy, relevance classification, tenant isolation, and complete tool mediation keep untrusted input from gaining authority, and least-privilege tools keep the blast radius small. See "How we built it" for the full model.

Does review create a bottleneck?

No. Items arrive pre-assembled with evidence and a recommendation, so most decisions take seconds — and the rest of the file keeps moving in parallel while an item waits.

See it work a real file.

Thirty minutes, one placement, worked live from start to submit-ready.