Decision Control Plane · EU AI Act Article 12 · DORA · Article 26 deployers

Your agent had production credentials.
You can't prove what it did with them.

When your auditor asks who authorized that database write, application logs can't answer. Diplomat writes a tamper-evident receipt the moment the action is requested, before it reaches your tool.

See what Diplomat finds in your stack → Talk to Josselin (30 min) →

76% of tool calls unguarded across 16 open-source agent repos · 1,992 Python + 11,379 TS files benchmarked · CERTFR-2026-ACT-016 · Apache 2.0

Ticket TK-2847 - 3 agent actions evaluated

read_customer_profile

→ CRM — Salesforce

0.4ms CONTINUE

cancel_subscription

→ Billing — Stripe

0.6ms REVIEW

export_customer_pii

→ S3 Bucket (unverified)

0.3ms STOP

Documented in 2025–2026

When the agent had the credentials, and nobody had the trail.

These are not hypothetical scenarios. Both were publicly documented. Both share the same root cause: a tool call executed before anyone asked whether it was authorized.

March 2026 The Block · Axios

ROME — Alibaba Research Agent

A coding agent created a reverse SSH tunnel to an external IP and redirected GPU resources to crypto mining. No prompt or instruction asked for it, and it happened outside the intended sandbox.

diplomat-gate would have stopped it:

create_ssh_tunnel(target="external_ip") → STOP 0.4ms

Reason: outbound network call outside policy scope

April 2026 DEV Community

PocketOS — Coding Agent in Cursor

A coding agent deleted an entire production database in 9 seconds. Every table. Every backup. Recovery was only possible from a 3-month-old snapshot. The agent executed exactly what it was asked to do — no human had defined whether it was authorized.

diplomat-gate would have flagged it:

delete_database(target="production") → REVIEW 0.3ms

Reason: irreversible action — human approval required by policy

Both incidents share the same root cause: a tool call executed before anyone asked whether it was authorized. That missing question is what Diplomat answers, in under a millisecond, deterministically, with no LLM in the path.
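To make the idea concrete, here is a minimal sketch of deterministic, rule-based evaluation for the two calls above. It is not the diplomat-gate API: `Verdict`, `Rule`, `RULES`, and `evaluate` are hypothetical names, and the rules are assumptions modeled on the two incidents.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    CONTINUE = "CONTINUE"
    REVIEW = "REVIEW"
    STOP = "STOP"


@dataclass(frozen=True)
class Rule:
    """One policy rule: if `matches` is true for a call, return `verdict`."""
    matches: Callable[[str, dict], bool]
    verdict: Verdict
    reason: str


# Hypothetical rules, checked in order; the first match wins.
RULES = [
    Rule(
        matches=lambda action, args: action == "create_ssh_tunnel",
        verdict=Verdict.STOP,
        reason="outbound network call outside policy scope",
    ),
    Rule(
        matches=lambda action, args: action.startswith("delete_")
        and args.get("target") == "production",
        verdict=Verdict.REVIEW,
        reason="irreversible action: human approval required by policy",
    ),
]


def evaluate(action: str, args: dict) -> tuple:
    """Return (verdict, reason) for the first matching rule, else CONTINUE."""
    for rule in RULES:
        if rule.matches(action, args):
            return rule.verdict, rule.reason
    return Verdict.CONTINUE, "no rule matched"
```

The evaluation is a plain loop over predicates, so the same call always produces the same verdict. This sketch lets unmatched actions continue; a stricter policy could default to REVIEW instead.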

Try it now

See what Diplomat finds in your stack.

Select your framework and what your agent does. Get the typical risk profile from our benchmark - zero upload, zero network call, everything is pre-calculated.

Risk profile report

For a LangChain (Python) agent that pays via Stripe or processes payments, here's what we found in production code:

214

tool calls in this category

79%

unguarded

16%

partial guards

5%

fully governed

Typical patterns found:

process_refund

@tool
def process_refund(charge_id: str) -> str:
    return stripe.Refund.create(charge=charge_id)
    # No: amount cap, duplicate check, authorization

create_subscription

@tool
def create_subscription(customer_id: str, plan: str) -> dict:
    return stripe.Subscription.create(customer=customer_id, items=[{'price': plan}])
    # No: plan validation, spending limit

update_payment_method

@tool
def update_payment_method(customer_id: str, pm_id: str) -> dict:
    return stripe.Customer.modify(customer_id, invoice_settings={'default_payment_method': pm_id})
    # No: ownership verification
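For contrast, here is a sketch of what a guarded `process_refund` could look like, covering the three gaps named above (amount cap, duplicate check, authorization). Everything in it is an assumption for illustration: `MAX_REFUND_CENTS`, `RefundBlocked`, the `approved_by` parameter, and `issue_refund`, which stands in for the real payment provider call so the example runs without credentials.

```python
from typing import Optional

# Hypothetical limit and in-memory state, for illustration only.
MAX_REFUND_CENTS = 50_000
_refunded_charges = set()


class RefundBlocked(Exception):
    """Raised when a refund fails one of the guards."""


def issue_refund(charge_id: str, amount_cents: int) -> dict:
    """Stand-in for the payment provider call; returns a fake result."""
    return {"charge": charge_id, "amount": amount_cents, "status": "succeeded"}


def process_refund(
    charge_id: str, amount_cents: int, approved_by: Optional[str] = None
) -> dict:
    # Guard 1: authorization. Someone must be on record as the approver.
    if not approved_by:
        raise RefundBlocked("no approver recorded")
    # Guard 2: amount cap. Reject non-positive or oversized refunds.
    if amount_cents <= 0 or amount_cents > MAX_REFUND_CENTS:
        raise RefundBlocked("amount outside allowed range")
    # Guard 3: duplicate check. Refund each charge at most once.
    if charge_id in _refunded_charges:
        raise RefundBlocked("charge already refunded")
    result = issue_refund(charge_id, amount_cents)
    _refunded_charges.add(charge_id)
    return result
```

Writing these checks by hand in every tool is exactly the custom governance logic that tends not to get written, which is why 79% of the calls in this category have none.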

Run `diplomat-agent scan` on your real code to see your own numbers.

Talk to Josselin about your case ->

Why Diplomat exists

Platform vendors will govern their own agents.
Nobody governs the agent your team built.

Stripe agents

Stripe governs Stripe. SLAs, audit logs, role-based access - all native to the platform.

Not your problem.

Salesforce agents

Salesforce governs Salesforce. Einstein Trust Layer, action approvals, lineage. Same story.

Not your problem.

Your custom agents

Your team built an agent that calls both. Who governs that one?

Your problem. Diplomat solves it.

Diplomat sits in your agent's process. Zero network calls. Decision in <1 ms. Receipt written to a hash chain you own.

System of Record

Every decision is a cryptographic proof.

Each verdict generates an immutable receipt recording the action, policy, outcome, and timestamp. Receipts are hash-chained, so modifying one invalidates every receipt that follows it. No LLM in the path.

<1 ms

evaluation latency

0

LLM calls required

100%

deterministic

Immutable Action Receipt

Receipt ID: rc-20260302-094809-TK2847-003
Action: export_customer_pii
Target: S3 Bucket (unverified)
Verdict: STOP
Evaluation: 0.3ms
Executed: false
Hash: sha256:9f3a...7c2d
Previous: sha256:8b1e...4a9f
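The Hash and Previous fields are what make the trail verifiable. Below is a minimal sketch of a hash chain using Python's standard `hashlib` and `json`. The receipt layout, the `GENESIS` marker, and the function names are assumptions for illustration, not Diplomat's actual receipt format.

```python
import hashlib
import json

# Hypothetical "previous" value for the first receipt in a chain.
GENESIS = "sha256:" + "0" * 64


def receipt_hash(body: dict, previous: str) -> str:
    """Hash the receipt body together with the previous receipt's hash."""
    payload = json.dumps({"body": body, "previous": previous}, sort_keys=True)
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_receipt(chain: list, body: dict) -> dict:
    """Append a receipt that commits to the current head of the chain."""
    previous = chain[-1]["hash"] if chain else GENESIS
    receipt = {
        "body": body,
        "previous": previous,
        "hash": receipt_hash(body, previous),
    }
    chain.append(receipt)
    return receipt


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited body or broken link fails the check."""
    previous = GENESIS
    for receipt in chain:
        if receipt["previous"] != previous:
            return False
        if receipt["hash"] != receipt_hash(receipt["body"], previous):
            return False
        previous = receipt["hash"]
    return True
```

Changing a verdict in an earlier receipt makes `verify_chain` return False, because the stored hash no longer matches the recomputed one. A chain like this only detects edits by someone who cannot also recompute every later hash, so the head hash usually needs to be stored somewhere the editor cannot reach.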

What changes

From bottleneck to baseline.

100% of tool calls evaluated

Before: manual review of every agent action.
After: automated verdict in <1 ms, with no human in the loop for safe actions.

<2 min incident investigation

Before: no proof when something goes wrong.
After: hash-chained receipts, so every decision is a cryptographic fact.

1 line to integrate

Before: months of custom governance logic.
After: one SDK integration line. Safe actions proceed instantly.

The Stack

Three products. One stack. One purpose.

Know what your agents can do before they do it. Govern what they're allowed to do at runtime. Prove what they did after the fact.

diplomat-agent and diplomat-gate are Apache 2.0 and self-sufficient. You can run them forever without us. diplomat.run is what you need the day your auditor, your procurement team, or your board asks for cross-tenant evidence — not the technology, the paperwork.

diplomat-agent · Know · Static AST scan. Pre-deploy. Apache 2.0. GitHub ->

diplomat-gate · Decide · Runtime enforcement. <1 ms. Apache 2.0. GitHub ->

diplomat.run · Prove · Hosted audit, dashboard, EU AI Act export.

What Diplomat is not

Not an agent framework

Your agents already know how to act. We govern whether they should.

Not an observability dashboard

Datadog tells you what happened. Diplomat decides what's allowed to happen.

Not a policy engine

OPA and Cedar return allow or deny. Diplomat returns a verdict, an explanation, and an immutable receipt. The difference is accountability, not just access control.

Architecture

Agent

Diplomat

Tool

intercept -> evaluate -> verdict -> receipt
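The four steps can be sketched as a decorator that wraps a tool function. This is an illustration under stated assumptions, not the diplomat-gate API: `gated`, `POLICY`, `RECEIPTS`, and `ActionNotExecuted` are hypothetical names, and the policy is a plain dictionary keyed by function name.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory receipt log.
RECEIPTS: list = []

# Hypothetical policy: action name -> verdict. Unknown actions go to REVIEW.
POLICY = {
    "read_customer_profile": "CONTINUE",
    "cancel_subscription": "REVIEW",
    "export_customer_pii": "STOP",
}


class ActionNotExecuted(Exception):
    """Raised when the verdict is REVIEW or STOP."""


def gated(func: Callable[..., Any]) -> Callable[..., Any]:
    """Intercept a tool call, evaluate it, record a receipt, then maybe run it."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        action = func.__name__                   # intercept
        verdict = POLICY.get(action, "REVIEW")   # evaluate -> verdict
        RECEIPTS.append({                        # receipt, written before the tool runs
            "action": action,
            "verdict": verdict,
            "executed": verdict == "CONTINUE",   # whether the call is allowed to run
            "timestamp": time.time(),
        })
        if verdict != "CONTINUE":
            raise ActionNotExecuted(f"{action}: {verdict}")
        return func(*args, **kwargs)

    return wrapper


@gated
def read_customer_profile(customer_id: str) -> dict:
    return {"id": customer_id}


@gated
def export_customer_pii(bucket: str) -> str:
    return f"exported to {bucket}"
```

The receipt is appended before the wrapped function is called, so a blocked action still leaves a record. A real gate would route REVIEW to an approval queue instead of raising.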

Open Source

Try it now - open source.

diplomat-agent runs locally. Apache 2.0. Zero dependencies. Reads your Python or TypeScript repo, maps every side-effecting tool call, tells you which ones have no guards. No data leaves your machine.

pip install diplomat-agent
diplomat-agent scan .
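For a sense of what a static AST scan involves, here is a deliberately crude sketch using Python's standard `ast` module. It is not how diplomat-agent works internally. The heuristic (a `@tool` function with no `if` or `raise` in its body counts as unguarded), the function names, and the `billing.cancel` call in the sample are all assumptions for illustration.

```python
import ast

# Sample source to scan. It is only parsed, never executed.
SOURCE = '''
@tool
def process_refund(charge_id: str) -> str:
    return stripe.Refund.create(charge=charge_id)

@tool
def cancel_subscription(sub_id: str, confirmed: bool) -> str:
    if not confirmed:
        raise PermissionError("confirmation required")
    return billing.cancel(sub_id)
'''


def decorator_name(node: ast.expr) -> str:
    """Return the bare name of a decorator such as @tool or @tool(...)."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def find_unguarded_tools(source: str) -> list:
    """List @tool functions whose body contains no `if` or `raise` statement."""
    unguarded = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if not any(decorator_name(d) == "tool" for d in node.decorator_list):
            continue
        has_guard = any(
            isinstance(n, (ast.If, ast.Raise)) for n in ast.walk(node)
        )
        if not has_guard:
            unguarded.append(node.name)
    return unguarded
```

Because the source is parsed rather than imported, nothing in the scanned code runs and nothing needs to leave the machine. For a real count, run `diplomat-agent scan` on your repo.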

Python - View on GitHub -> PyPI -> npm ->

Questions

What you want to know before going further

Is Diplomat a SaaS or a library I install?

Both. diplomat-gate is an open-source Python library that runs inside your agent's process - zero network calls, sub-millisecond decisions. diplomat.run is the hosted control plane on top of it: cross-tenant audit, dashboard, compliance exports. You can run diplomat-gate on its own indefinitely, without diplomat.run. The hosted plane is the value-add for teams that need cross-team visibility or EU AI Act Article 12 exports.

How is this different from Guardrails AI, HumanLayer, or NeMo Guardrails?

Guardrails AI validates LLM outputs (content shape, profanity, PII redaction). HumanLayer routes human approvals over the network. NeMo Guardrails focuses on conversational safety. Diplomat is at a different layer: it intercepts tool calls - the actions your agent takes against external systems (DB writes, payments, emails, shell commands) - and decides whether they proceed, get reviewed, or stop. Sub-millisecond, deterministic, no LLM in the path. Most teams end up running guardrails AND Diplomat.

What languages and frameworks do you support?

Python (LangChain, LangGraph, OpenAI SDK, Anthropic SDK, custom code) and TypeScript (Vercel AI SDK, OpenAI Agents JS, Mastra, custom code). Other languages: not yet. The integration is one line - a decorator on your tool function, or a wrapper around your agent's tool invocation.

Does my code leave my machine when I scan?

No. diplomat-agent (scanner) runs locally - static AST analysis, no network. diplomat-gate (runtime) runs inside your agent process - no network. Only diplomat.run (hosted) receives data, and only what you explicitly push: receipts, metadata, never source code.

We've already built our own approval system. Why switch?

You probably haven't. Most "approval systems" are ad-hoc Slack notifications wrapped around a single tool. Diplomat gives you: policy-as-code (one place for all rules), hash-chained receipts (audit-grade proof), sub-millisecond decisions (no Slack roundtrip for safe actions), and review queues (for actions that need a human). If you've built all of that already, you should sell it. If you've built less, that's the gap.

You're a young startup. Why should we trust a company founded in 2026?

Three reasons. (1) The core libraries are Apache 2.0 - your installation survives us. (2) The benchmark is public and reproducible - 16 Python repos and 3 TypeScript repos, every commit pinned. (3) Hash chains have no vendor lock-in - if we disappear, your audit trail keeps working. The risk profile of using us is closer to using a Python library than to using a SaaS.

Your agent is in production. The question is whether you can prove what it did.

Run the scanner on your repo, or talk to Josselin for 30 minutes.

Talk to Josselin → pip install diplomat-agent

EU AI Act Article 12 · DORA · Article 26 deployers · CERTFR-2026-ACT-016 · 1 design partner in production · 2 in integration
