Your agent had production credentials.
You can't prove what it did with them.

When your auditor asks who authorized that database write, application logs don't answer. Diplomat evaluates every tool call before it reaches your tool and writes a tamper-evident receipt of the decision at that moment.

76% of tool calls unguarded across 16 open-source agent repos · 1,992 Python + 11,379 TypeScript files benchmarked · CERTFR-2026-ACT-016 · Apache 2.0

Ticket TK-2847: 3 agent actions evaluated
read_customer_profile · CRM (Salesforce) · 0.4 ms · CONTINUE
cancel_subscription · Billing (Stripe) · 0.6 ms · REVIEW
export_customer_pii · S3 Bucket (unverified) · 0.3 ms · STOP

When the agent had the credentials, and nobody had the trail.

These are not hypothetical scenarios. Both were publicly documented. Both share the same root cause: a tool call executed before anyone asked whether it was authorized.

March 2026 The Block · Axios
ROME — Alibaba Research Agent
A coding agent created a reverse SSH tunnel to an external IP and redirected GPU resources to crypto mining, outside its intended sandbox. No prompt or instruction asked for it.
diplomat-gate would have stopped it:
create_ssh_tunnel(target="external_ip") STOP 0.4ms
Reason: outbound network call outside policy scope
April 2026 DEV Community
PocketOS — Coding Agent in Cursor
A coding agent deleted an entire production database in 9 seconds: every table, every backup. Recovery was only possible from a 3-month-old snapshot. The agent executed exactly what it was asked to do; no human had defined whether that action was authorized.
diplomat-gate would have flagged it:
delete_database(target="production") REVIEW 0.3ms
Reason: irreversible action — human approval required by policy
That missing question is what Diplomat answers: sub-millisecond, deterministic, no LLM in the path.
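Both verdicts above come from fixed rules rather than a model's judgment. A minimal sketch of that kind of rule evaluation, assuming a hypothetical rule set keyed on tool name and target; `evaluate`, `Verdict`, and the tool/target sets are illustrative names, not diplomat-gate's actual policy format:

```python
from enum import Enum


class Verdict(Enum):
    CONTINUE = "CONTINUE"
    REVIEW = "REVIEW"
    STOP = "STOP"


# Hypothetical rule inputs modeled on the two incidents above.
OUTBOUND_NETWORK_TOOLS = {"create_ssh_tunnel"}
ALLOWED_NETWORK_TARGETS = {"internal_registry"}  # assumed allow-list
IRREVERSIBLE_TOOLS = {"delete_database"}


def evaluate(tool: str, args: dict) -> tuple:
    """Return (verdict, reason). Same input, same output: no model, no network."""
    if tool in OUTBOUND_NETWORK_TOOLS and args.get("target") not in ALLOWED_NETWORK_TARGETS:
        return Verdict.STOP, "outbound network call outside policy scope"
    if tool in IRREVERSIBLE_TOOLS and args.get("target") == "production":
        return Verdict.REVIEW, "irreversible action: human approval required by policy"
    return Verdict.CONTINUE, "no rule matched"


for tool, args in [
    ("create_ssh_tunnel", {"target": "external_ip"}),
    ("delete_database", {"target": "production"}),
    ("read_customer_profile", {"customer_id": "cus_1"}),
]:
    verdict, reason = evaluate(tool, args)
    print(f"{tool}: {verdict.value} ({reason})")
```

Because the rules are plain comparisons, the same call always yields the same verdict, which is what lets an auditor replay a decision later.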

See what Diplomat finds in your stack.

Select your framework and what your agent does to get the typical risk profile from our benchmark. Zero upload, no network calls: everything is pre-calculated.

Risk profile report

For a LangChain (Python) agent that processes payments via Stripe, here's what we found in production code:

214 tool calls in this category · 79% unguarded · 16% partial guards · 5% fully governed

Typical patterns found:

process_refund
import stripe
from langchain_core.tools import tool

@tool
def process_refund(charge_id: str) -> dict:
    """Refund a Stripe charge."""
    # Missing: amount cap, duplicate check, authorization
    return stripe.Refund.create(charge=charge_id)
create_subscription
import stripe
from langchain_core.tools import tool

@tool
def create_subscription(customer_id: str, plan: str) -> dict:
    """Subscribe a customer to a price."""
    # Missing: plan validation, spending limit
    return stripe.Subscription.create(customer=customer_id, items=[{'price': plan}])
update_payment_method
import stripe
from langchain_core.tools import tool

@tool
def update_payment_method(customer_id: str, pm_id: str) -> dict:
    """Set a customer's default payment method."""
    # Missing: ownership verification
    return stripe.Customer.modify(customer_id, invoice_settings={'default_payment_method': pm_id})
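The comments in the process_refund pattern name the checks it lacks. A minimal, framework-free sketch of those three guards, assuming a per-agent allow-list of charge IDs, an in-memory duplicate set, an added `amount_cents` argument (the original tool refunds the full charge), and a hypothetical 50,000-cent cap; `RefundGuard` and its methods are illustrative, not part of Stripe, LangChain, or Diplomat:

```python
from dataclasses import dataclass, field

# Hypothetical cap, in the smallest currency unit (e.g. cents).
MAX_REFUND_CENTS = 50_000


@dataclass
class RefundGuard:
    """Checks for the three guards the process_refund pattern lacks."""

    authorized_charges: set  # charge IDs this agent is allowed to refund
    refunded: set = field(default_factory=set)

    def check(self, charge_id: str, amount_cents: int) -> str:
        if charge_id not in self.authorized_charges:
            return "STOP: charge not authorized for this agent"
        if charge_id in self.refunded:
            return "STOP: duplicate refund"
        if amount_cents > MAX_REFUND_CENTS:
            return "REVIEW: amount above cap"
        return "CONTINUE"

    def record(self, charge_id: str) -> None:
        # Call only after the refund succeeded, so a failed attempt can be retried.
        self.refunded.add(charge_id)


guard = RefundGuard(authorized_charges={"ch_123"})
print(guard.check("ch_123", 2_000))  # CONTINUE
guard.record("ch_123")
print(guard.check("ch_123", 2_000))  # STOP: duplicate refund
print(guard.check("ch_999", 2_000))  # STOP: charge not authorized for this agent
```

Keeping `check` separate from `record` means a refund that fails at the payment provider does not get wrongly flagged as a duplicate on retry.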

Run `diplomat-agent scan` on your real code to see your own numbers.

Talk to Josselin about your case ->

Platform vendors will govern their own agents.
Nobody governs the agent your team built.

Stripe agents

Stripe governs Stripe. SLAs, audit logs, role-based access - all native to the platform.

Not your problem.
Salesforce agents

Salesforce governs Salesforce. Einstein Trust Layer, action approvals, lineage. Same story.

Not your problem.
Your custom agents

Your team built an agent that calls all three. Who governs that one?

Your problem. Diplomat solves it.

Diplomat sits in your agent's process. Zero network calls. Decision in <1 ms. Receipt written to a hash chain you own.

Every decision is a cryptographic proof.

Each verdict generates an immutable receipt: action, policy, outcome, timestamp. Receipts are hash-chained, so modifying any one of them breaks the chain from that receipt onward. No LLM in the path.

<1 ms evaluation latency · 0 LLM calls required · 100% deterministic
Immutable Action Receipt
Receipt ID: rc-20260302-094809-TK2847-003
Action: export_customer_pii
Target: S3 Bucket (unverified)
Verdict: STOP
Evaluation: 0.3 ms
Executed: false
Hash: sha256:9f3a...7c2d
Previous: sha256:8b1e...4a9f
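The Hash and Previous fields of the receipt are what make the trail tamper-evident: each receipt's hash covers its own content plus the previous receipt's hash. A minimal sketch using only Python's standard library, assuming SHA-256 over canonical JSON and an all-zero genesis value; the receipt layout and the `append`/`verify` helpers are illustrative, not Diplomat's actual format:

```python
import hashlib
import json
from typing import Optional

# Assumed placeholder "previous" value for the very first receipt.
GENESIS = "sha256:" + "0" * 64


def receipt_hash(body: dict, previous: str) -> str:
    # Canonical JSON (sorted keys, fixed separators) so identical content always hashes identically.
    payload = json.dumps({"body": body, "previous": previous}, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, body: dict) -> dict:
    """Link a new receipt to the hash of the one before it."""
    previous = chain[-1]["hash"] if chain else GENESIS
    receipt = {"body": body, "previous": previous, "hash": receipt_hash(body, previous)}
    chain.append(receipt)
    return receipt


def verify(chain: list) -> Optional[int]:
    """Return the index of the first receipt that fails verification, or None if intact."""
    previous = GENESIS
    for index, receipt in enumerate(chain):
        if receipt["previous"] != previous or receipt["hash"] != receipt_hash(receipt["body"], previous):
            return index
        previous = receipt["hash"]
    return None


chain: list = []
append(chain, {"action": "read_customer_profile", "verdict": "CONTINUE", "executed": True})
append(chain, {"action": "cancel_subscription", "verdict": "REVIEW", "executed": False})
append(chain, {"action": "export_customer_pii", "verdict": "STOP", "executed": False})
print(verify(chain))  # None

chain[1]["body"]["verdict"] = "CONTINUE"  # tamper with the middle receipt
print(verify(chain))  # 1
```

A hash chain is tamper-evident rather than tamper-proof: someone able to rewrite every later receipt could rebuild a consistent chain, which is why the latest hash is typically anchored somewhere the writer cannot change.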

From bottleneck to baseline.

100% of tool calls evaluated
Before: manual review of every agent action.
After: an automated verdict in <1 ms, with no human in the loop for safe actions.
<2 min incident investigation
Before: no proof when something goes wrong.
After: hash-chained receipts turn every decision into verifiable evidence.
1 line to integrate
Before: months of custom governance logic.
After: one SDK integration line; safe actions proceed instantly.

Three products. One stack. One purpose.

Know what your agents can do before they do it. Govern what they're allowed to do at runtime. Prove what they did after the fact.

diplomat-agent and diplomat-gate are Apache 2.0 and self-sufficient. You can run them forever without us. diplomat.run is what you need the day your auditor, your procurement team, or your board asks for cross-tenant evidence: it provides the paperwork, not the technology.

diplomat-agent (Know): static AST scan, pre-deploy. Apache 2.0, on GitHub.
diplomat-gate (Decide): runtime enforcement in <1 ms. Apache 2.0, on GitHub.
diplomat.run (Prove): hosted audit, dashboard, EU AI Act export.

What Diplomat is not

Not an agent framework
Your agents already know how to act. We govern whether they should.
Not an observability dashboard
Datadog tells you what happened. Diplomat decides what's allowed to happen.
Not a policy engine
OPA and Cedar return allow or deny. Diplomat returns a verdict, an explanation, and an immutable receipt. The difference is accountability, not just access control.
Architecture
Agent -> Diplomat -> Tool
intercept -> evaluate -> verdict -> receipt
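The intercept, evaluate, verdict, receipt flow can be sketched in a few lines of plain Python. This assumes a hypothetical policy table keyed by tool name, an in-memory receipt list (no hashing, to keep the focus on control flow), and a review queue for REVIEW verdicts; `intercept`, `POLICY` and `review_queue` are illustrative names, not diplomat-gate's API:

```python
import time
from typing import Any, Callable

# Hypothetical policy table; unknown tools fall back to REVIEW (an assumed default).
POLICY = {
    "read_customer_profile": "CONTINUE",
    "cancel_subscription": "REVIEW",
    "export_customer_pii": "STOP",
}

receipts: list = []
review_queue: list = []


def intercept(tool_name: str, tool_fn: Callable[..., Any], **kwargs: Any) -> Any:
    """Evaluate a tool call, record a receipt, and run the tool only on CONTINUE."""
    start = time.perf_counter()
    verdict = POLICY.get(tool_name, "REVIEW")  # evaluate: a dict lookup, no LLM
    elapsed_ms = (time.perf_counter() - start) * 1000
    allowed = verdict == "CONTINUE"
    # The receipt is written before the tool runs; "executed" records whether the call was let through.
    receipts.append({
        "action": tool_name,
        "verdict": verdict,
        "evaluation_ms": round(elapsed_ms, 3),
        "executed": allowed,
    })
    if verdict == "REVIEW":
        review_queue.append((tool_name, kwargs))  # park the call for a human
    return tool_fn(**kwargs) if allowed else None


def read_customer_profile(customer_id: str) -> dict:
    return {"id": customer_id}


print(intercept("read_customer_profile", read_customer_profile, customer_id="cus_1"))
print(intercept("export_customer_pii", lambda **kw: "exported", bucket="unverified"))
print(receipts[-1]["verdict"], receipts[-1]["executed"])
```

Defaulting unknown tools to REVIEW rather than CONTINUE is a design choice of this sketch: a tool nobody wrote a rule for gets a human look instead of silently running.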

Try it now - open source.

diplomat-agent runs locally. Apache 2.0. Zero dependencies. Reads your Python or TypeScript repo, maps every side-effecting tool call, tells you which ones have no guards. No data leaves your machine.

pip install diplomat-agent
diplomat-agent scan .
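How diplomat-agent actually classifies tool calls and guards is not described here. As a rough illustration of what flagging tool functions with no guards can mean using Python's standard `ast` module, here is a deliberately naive heuristic; recognizing tools only by a bare `@tool` or `@tool(...)` decorator, treating any `if`, `assert` or `raise` as a guard, and not attempting to detect side effects are all assumptions of this sketch:

```python
import ast

# Naive heuristic (an assumption of this sketch): any if/assert/raise counts as a guard.
GUARD_NODES = (ast.If, ast.Assert, ast.Raise)


def is_tool(fn: ast.AST) -> bool:
    """True if the function carries a bare @tool or @tool(...) decorator."""
    for dec in fn.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if isinstance(target, ast.Name) and target.id == "tool":
            return True
    return False


def unguarded_tools(source: str) -> list:
    """Names of @tool functions whose bodies contain no guard statement."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and is_tool(node):
            if not any(isinstance(inner, GUARD_NODES) for inner in ast.walk(node)):
                found.append(node.name)
    return found


SAMPLE = '''
@tool
def process_refund(charge_id: str) -> str:
    return stripe.Refund.create(charge=charge_id)

@tool
def capped_refund(charge_id: str, amount: int) -> str:
    if amount > 50_000:
        raise ValueError("amount above cap")
    return stripe.Refund.create(charge=charge_id, amount=amount)
'''

print(unguarded_tools(SAMPLE))  # ['process_refund']
```

Static analysis only parses the source, so nothing is imported or executed and no network access is needed.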

What you want to know before going further

Is Diplomat a SaaS or a library I install?
Both. diplomat-gate is an open-source Python library that runs inside your agent's process - zero network calls, sub-millisecond decisions. diplomat.run is the hosted control plane on top of it: cross-tenant audit, dashboard, compliance exports. You can run gate alone forever without diplomat.run if you want. The hosted plane is the value-add for teams that need cross-team visibility or EU AI Act Article 12 exports.
How is this different from Guardrails AI, HumanLayer, or NeMo Guardrails?
Guardrails AI validates LLM outputs (content shape, profanity, PII redaction). HumanLayer routes human approvals over the network. NeMo Guardrails focuses on conversational safety. Diplomat works at a different layer: it intercepts tool calls, the actions your agent takes against external systems (DB writes, payments, emails, shell commands), and decides whether they proceed, get reviewed, or stop. Sub-millisecond, deterministic, no LLM in the path. Most teams end up running both: output guardrails and Diplomat.
What languages and frameworks do you support?
Python (LangChain, LangGraph, OpenAI SDK, Anthropic SDK, custom code) and TypeScript (Vercel AI SDK, OpenAI Agents JS, Mastra, custom code). Other languages: not yet. The integration is one line - a decorator on your tool function, or a wrapper around your agent's tool invocation.
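A minimal sketch of what a one-line decorator integration can look like in plain Python, assuming a policy callable that takes the tool name and its bound arguments and returns a verdict string. `gated`, `ActionBlocked`, `refund_policy` and the 50,000-cent threshold are hypothetical, not diplomat-gate's actual API, and how this composes with a framework's own tool decorator is not shown:

```python
import functools
import inspect
from typing import Any, Callable


class ActionBlocked(Exception):
    """Raised when the policy does not let a tool call through."""


def gated(policy: Callable[[str, dict], str]) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator: ask `policy` for a verdict before the tool body runs."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Bind positional and keyword arguments to parameter names for the policy.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            verdict = policy(fn.__name__, dict(bound.arguments))
            if verdict != "CONTINUE":
                raise ActionBlocked(f"{fn.__name__}: {verdict}")
            return fn(*args, **kwargs)

        return wrapper

    return decorator


def refund_policy(tool: str, args: dict) -> str:
    # Illustrative rule: refunds above 50,000 cents need a human.
    if tool == "process_refund" and args.get("amount_cents", 0) > 50_000:
        return "REVIEW"
    return "CONTINUE"


@gated(refund_policy)
def process_refund(charge_id: str, amount_cents: int) -> str:
    # Stand-in for the real payment call so the sketch runs without Stripe.
    return f"refunded {amount_cents} on {charge_id}"


print(process_refund("ch_123", 2_000))  # refunded 2000 on ch_123
```

Raising an exception on a non-CONTINUE verdict keeps the tool body unreachable unless the policy explicitly allows it; a real integration might instead return a structured verdict the agent can read.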
Does my code leave my machine when I scan?
No. diplomat-agent (scanner) runs locally - static AST analysis, no network. diplomat-gate (runtime) runs inside your agent process - no network. Only diplomat.run (hosted) receives data, and only what you explicitly push: receipts, metadata, never source code.
We've already built our own approval system. Why switch?
You probably haven't. Most "approval systems" are ad-hoc Slack notifications wrapped around a single tool. Diplomat gives you: policy-as-code (one place for all rules), hash-chained receipts (audit-grade proof), sub-millisecond decisions (no Slack roundtrip for safe actions), and review queues (for actions that need a human). If you've built all of that already, you should sell it. If you've built less, that's the gap.
We're a young startup. Why should we trust a company founded in 2026?
Three reasons. (1) The core libraries are Apache 2.0 - your installation survives us. (2) The benchmark is public and reproducible - 16 Python repos and 3 TypeScript repos, every commit pinned. (3) Hash chains have no vendor lock-in - if we disappear, your audit trail keeps working. The risk profile of using us is closer to using a Python library than to using a SaaS.

Your agent is in production. The question is whether you can prove what it did.

Run the scanner on your repo, or talk to Josselin for 30 minutes.

EU AI Act Article 12 · DORA · Article 26 deployers · CERTFR-2026-ACT-016 · 1 design partner in production · 2 in integration