Agent Governance and Trust

When an agent acts on the customer's behalf inside your product, the customer needs to scope, approve, and audit what it does. Permissions and audit stop being internal compliance — they become product features your customer evaluates you on.

Why trust is a product surface now

Before agents, your customer's relationship with your product was 1:1. One human user signed in and did what they intended to do, and audit logs recorded who clicked what. Permissions lived in a settings page that nobody looked at unless they had to.

When an agent acts on the customer's behalf, the model breaks. The customer is now the delegator, not the actor. They need to:

Choose which capabilities the agent can use, before the agent runs.
Decide which actions need their explicit approval, during execution.
See what the agent actually did, after the fact.
Revoke the agent's access without breaking the customer's other integrations.

Each of those is a product surface — UI, API, and policy at the same time. The trust story your product tells is no longer "we're SOC 2 compliant." It's "here's how your admin governs what agents can do in our system." For a B2B SaaS at L2/L3 customer maturity, this is now part of the sales evaluation.

The three-layer scope hierarchy

The convergent pattern across Stripe, the MCP specification, and the early B2B SaaS movers is three layers of scope, applied together as defense in depth:

Layer 1 — Organization policy

The customer's administrators decide whether agent access is enabled at all, and in which environments. Stripe's /settings/mcp dashboard is the archetype: per-account, per-environment toggles for sandbox vs. live. The toggle is intentionally separate from API key management — the org-level decision is "do we allow agents," not "do we have keys."

This layer is what enterprise buyers ask about first. If you don't have an org-level switch, you fail the procurement checklist before scope discussion begins.
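As a rough illustration of the org-level switch, here is a minimal sketch of a per-environment policy that defaults to off. The class name `OrgAgentPolicy` and the environment names are assumptions made for this example, not Stripe's data model.

```python
from dataclasses import dataclass, field


@dataclass
class OrgAgentPolicy:
    """Hypothetical org-level switch: agent access per environment, off by default."""

    enabled: dict = field(
        default_factory=lambda: {"sandbox": False, "live": False}
    )

    def set_enabled(self, environment: str, value: bool) -> None:
        # Only known environments can be toggled.
        if environment not in self.enabled:
            raise ValueError(f"unknown environment: {environment}")
        self.enabled[environment] = value

    def allows(self, environment: str) -> bool:
        # Unknown environments are denied instead of raising.
        return self.enabled.get(environment, False)


policy = OrgAgentPolicy()
policy.set_enabled("sandbox", True)
```

The check is deliberately independent of whether a valid key exists: a request from an agent would have to pass `policy.allows(...)` before any credential is even inspected.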

Layer 2 — Credential scope

The credential the agent uses (OAuth bearer token, restricted API key, scoped PAT) determines which tools are even visible to the agent. Stripe's restricted API keys (rk_*) are the canonical pattern: "tool availability is determined by the permissions you configure on the restricted key."

Critical security requirement: tokens must be audience-bound. The MCP specification's normative use of RFC 8707 resource indicators means every token is bound to the canonical URI of the server it was issued for, and that server is expected to reject tokens issued for any other audience. A token for https://mcp.your-product.com cannot be replayed against https://mcp.competitor.com. SaaS vendors that haven't moved to audience-bound tokens remain exposed to token replay and confused-deputy attacks.
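The audience check itself can be small. The sketch below assumes the token has already been signature-verified and decoded into a claims dictionary with a JWT-style `aud` claim (a string or a list of strings); the helper names `canonical_uri` and `audience_matches` are hypothetical, and the normalization shown is a simplification.

```python
from urllib.parse import urlsplit


def canonical_uri(uri: str) -> str:
    """Lowercase the scheme and host and drop a trailing slash (simplified)."""
    parts = urlsplit(uri)
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"


def audience_matches(claims: dict, server_uri: str) -> bool:
    """Return True only if the token was issued for this server."""
    aud = claims.get("aud")
    if aud is None:
        return False  # A token with no audience is never accepted.
    audiences = [aud] if isinstance(aud, str) else list(aud)
    expected = canonical_uri(server_uri)
    return any(
        isinstance(a, str) and canonical_uri(a) == expected for a in audiences
    )


claims = {"aud": "https://mcp.your-product.com"}
```

With these claims, `audience_matches(claims, "https://mcp.your-product.com/")` accepts the token, while the same call against `https://mcp.competitor.com` rejects it.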

Layer 3 — Runtime step-up authorization

Some operations need scope the credential doesn't currently have. The MCP spec encodes this at the protocol level: a 403 Forbidden response with WWW-Authenticate: Bearer error="insufficient_scope", scope="files:write" triggers a scope-elevation flow. The agent can request the additional scope; the user is asked to consent; if approved, the operation proceeds.

This is the protocol-level encoding of "ask permission when you cross a line." It's what makes least-privilege practical for agents — you don't have to pre-authorize every possible scope at credential creation; you escalate as needed, with consent.
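On the server side, the step-up challenge amounts to a status code and one header. The following is a minimal sketch with no web framework; the function name `check_scope` and the tuple return shape are assumptions for illustration, while the header value follows the format quoted above.

```python
def check_scope(granted: set, required: str) -> tuple:
    """Return (status, headers); a 403 carries the scope the client should request."""
    if required in granted:
        return 200, {}
    # Tell the client which scope is missing so it can start a consent flow.
    challenge = f'Bearer error="insufficient_scope", scope="{required}"'
    return 403, {"WWW-Authenticate": challenge}


status, headers = check_scope({"files:read"}, "files:write")
```

Here `status` is 403 and the header names `files:write`, which is what lets the client ask the user for exactly that scope and nothing broader.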

Confirmation as protocol, not UX polish

The MCP specification states that clients SHOULD show confirmation prompts for tool invocations. SHOULD is normative language in the RFC 2119 sense: a strong recommendation that a client needs a good reason to skip, not an optional design nicety. Stripe explicitly recommends "enabling human confirmation of tools" and warns about prompt injection across composed servers. Anthropic's MCP connector ships with allowlist/denylist configs that institutionalize denylisting destructive tools by default.

For a product that exposes write capabilities to agents, the design implication is concrete:

Read tools can flow without prompts (after credential scope check).
Write tools should be confirmable per-call, with a customer-configurable threshold (e.g., "auto-approve under $X, prompt above").
Destructive tools (delete, send-to-customer, charge) should be confirmable by default and impossible to silently auto-approve.
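The three rules above can be sketched as a single decision function. The `ToolKind` enum, the `needs_confirmation` name, and the threshold parameter are assumptions for this example; the point is only that the destructive branch ignores any configured threshold.

```python
from enum import Enum
from typing import Optional


class ToolKind(Enum):
    READ = "read"
    WRITE = "write"
    DESTRUCTIVE = "destructive"


def needs_confirmation(
    kind: ToolKind,
    amount: Optional[float] = None,
    auto_approve_under: Optional[float] = None,
) -> bool:
    """Decide whether a call must be confirmed by a human before it runs."""
    if kind is ToolKind.READ:
        return False  # Credential scope was already checked upstream.
    if kind is ToolKind.DESTRUCTIVE:
        return True  # Never auto-approved, whatever the threshold says.
    # WRITE: auto-approve only when a threshold is set and the amount is known.
    if auto_approve_under is None or amount is None:
        return True
    return amount >= auto_approve_under
```

For example, with `auto_approve_under=50`, a write of 20 proceeds without a prompt and a write of 80 prompts. A write with no configured threshold also prompts, so the safe behavior is the default.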

Tool descriptions are part of the consent surface — they're what the user reads before approving. Clear, concrete, accurate descriptions are now a security property, not just developer UX.

Capability over endpoint — workflow-shaped tools

A common failure mode is exposing the REST API verbatim as MCP tools. The result is a 200-tool catalog where the agent has to make the same orchestration decisions a human integrator would — and agents are worse at orchestration than humans.

Workato names the alternative pattern "Skill — a pre-orchestrated business action … that encapsulates logic, sequencing, error handling, approvals, and security into a single callable action." The marketing is proprietary; the architectural insight is not. Stripe applies the same logic: create_payment_link is not a thin wrapper over POST /v1/payment_links. It's a curated agent-shaped capability with workflow semantics baked in.

For each capability your product exposes to agents, ask:

What's the workflow, not the endpoint? "Send a welcome email" is a workflow; POST /v1/messages is an endpoint. Expose the workflow.
What eligibility checks belong inside the tool? A cancel_subscription tool should check "is this subscription cancellable" before attempting cancellation, not surface every possible failure mode to the agent.
What approval gates are embedded? A refund_payment tool over $X should require human confirmation as part of its contract, not as an external policy layer.
What does the tool return that the agent actually needs? Not a 5 KB JSON object with 40 fields; the 3 fields the agent needs to make its next decision.

A small catalog of well-shaped tools beats a large catalog of thin wrappers. Fewer tools mean fewer orchestration decisions, fewer failure modes, fewer audit lines, and faster customer onboarding.
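To make the contrast with a thin wrapper concrete, here is a sketch of a workflow-shaped cancel_subscription tool. The `Subscription` fields, the set of cancellable statuses, and the in-memory `store` are all assumptions for illustration; a real implementation would call your billing backend.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    id: str
    status: str  # hypothetical values: "active", "past_due", "canceled"
    current_period_end: str


CANCELLABLE = {"active", "past_due"}  # assumed eligibility rule


def cancel_subscription(store: dict, subscription_id: str) -> dict:
    """Check eligibility inside the tool and return only what the agent needs."""
    sub = store.get(subscription_id)
    if sub is None:
        return {"ok": False, "reason": "not_found"}
    if sub.status not in CANCELLABLE:
        return {"ok": False, "reason": f"not_cancellable:{sub.status}"}
    sub.status = "canceled"
    # Three fields, not the full record.
    return {"ok": True, "subscription_id": sub.id, "access_until": sub.current_period_end}


store = {"sub_1": Subscription("sub_1", "active", "2025-01-31")}
result = cancel_subscription(store, "sub_1")
```

A second call on the same subscription returns a single `not_cancellable:canceled` reason instead of a raw backend error, which is the eligibility check doing its job.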

The admin console

Layer 8 of the product surface stack — the admin / governance console — is where the customer's administrator actually does the work. Stripe's /settings/mcp is the archetype, but the requirements are broader than what Stripe currently ships:

Enable / disable agent access per environment.
View scoped credentials currently in use, with creation date, last-used timestamp, IP scope.
Revoke any credential immediately (kill switch).
Audit what an agent has done, queryable by time range, customer, capability, status.
Configure per-tool approval thresholds and denylists.
Set spend caps and rate limits per agent identity.
Receive alerts when agents hit anomalous patterns (excessive retries, sudden volume changes).
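The kill switch in that list depends on agent credentials being separate records from everything else. The sketch below uses a hypothetical in-memory `CredentialRegistry`; the names and the `kind` field are assumptions, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    credential_id: str
    kind: str  # hypothetical: "agent" or "integration"
    revoked: bool = False


class CredentialRegistry:
    """Hypothetical store where each credential can be revoked on its own."""

    def __init__(self) -> None:
        self._creds = {}

    def issue(self, credential_id: str, kind: str) -> Credential:
        cred = Credential(credential_id, kind)
        self._creds[credential_id] = cred
        return cred

    def revoke(self, credential_id: str) -> bool:
        # Returns False if the credential does not exist.
        cred = self._creds.get(credential_id)
        if cred is None:
            return False
        cred.revoked = True
        return True

    def is_active(self, credential_id: str) -> bool:
        cred = self._creds.get(credential_id)
        return cred is not None and not cred.revoked


registry = CredentialRegistry()
registry.issue("agent_key_1", "agent")
registry.issue("crm_sync_key", "integration")
registry.revoke("agent_key_1")
```

After the revoke, the agent credential is inactive while `crm_sync_key` keeps working, which is the property the console has to guarantee.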

Most early competitors don't have all of this. Braze ships the permission warning but not the kill switch or the audit query model. ActiveCampaign and HubSpot's admin surfaces for MCP are thin. Customer.io's positioning is strong but the dedicated agent console isn't visible in the public docs.

This is a real wedge for differentiation. The first competitor in a category to ship a credible agent admin console becomes the procurement-safe choice for enterprise customers — independent of who has the best tool catalog.

Audit as a queryable resource — the literature gap

The MCP specification covers authorization extensively but is largely silent on audit. The Workato framing names "unified audit log" as a pillar of Enterprise MCP but doesn't specify how. Stripe's audit story is inherited from the existing Stripe dashboard. None of the early B2B SaaS movers ship documented audit-log APIs designed for agent forensics.

The questions an agent audit log must answer:

Identity — which agent, on whose behalf, with which credential, at which time?
Intent — what was the agent's stated goal? What tool calls did it plan? What did it actually do?
Outcome — what state changed? What was rolled back? What human approvals were involved?
Lineage — this tool call was triggered by this prior call, which was triggered by this user request.
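One way to make those four questions answerable is to store them as fields on each event and link events by parent. This is a sketch under assumptions: the `AuditEvent` field names are invented for illustration, since no published schema exists to copy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    event_id: str
    agent_id: str            # identity: which agent
    on_behalf_of: str        # identity: which user delegated
    credential_id: str       # identity: which credential
    timestamp: str           # ISO 8601
    tool: str                # what was actually called
    stated_goal: str         # intent, as reported by the agent
    outcome: str             # e.g. "succeeded", "rolled_back"
    approved_by: Optional[str] = None      # human approval, if any
    parent_event_id: Optional[str] = None  # lineage link


def lineage(events: dict, event_id: str) -> list:
    """Walk parent links from an event back to the originating request."""
    chain, seen = [], set()
    current = events.get(event_id)
    while current is not None and current.event_id not in seen:
        seen.add(current.event_id)  # guards against accidental cycles
        chain.append(current.event_id)
        parent = current.parent_event_id
        current = events.get(parent) if parent else None
    return chain


def _ev(eid: str, tool: str, parent: Optional[str]) -> AuditEvent:
    return AuditEvent(eid, "agent_7", "user_42", "cred_1", "2025-01-01T00:00:00Z",
                      tool, "refund duplicate charge", "succeeded", None, parent)


events = {e.event_id: e for e in (
    _ev("e1", "user_request", None),
    _ev("e2", "lookup_payment", "e1"),
    _ev("e3", "refund_payment", "e2"),
)}
```

Calling `lineage(events, "e3")` walks from the refund back through the lookup to the user request that started it.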

Existing audit logs were built for human users; they don't capture lineage or intent well. A queryable, agent-aware audit log — exposed both as a UI in the admin console and as an API for the customer's own compliance tooling — is part of the trust surface that distinguishes deliberate competitors from tacked-on ones.

The Braze pattern — admin-side disclosure

Braze's MCP documentation is the most security-explicit of the early competitors. It includes a direct, plain-language warning to the customer's administrator:

"Only assign the API key permissions you want your agent to have. Agents may try to write data through any write permission you grant."

That sentence is doing real work. It tells the administrator:

Agents are not assumed-safe by default.
Granting a permission means granting it to the agent, not just to the trusted human team.
The administrator is the responsible party for what an agent can do.

Most competitors bury this in a footnote or assume the administrator already understands it. Treating it as primary product copy is a trust signal — and a hedge against the customer blaming you when their agent does something destructive.

Common failure modes

The MCP server with no admin console. Customers can connect; they can't see what's connected. They won't stay if they care about compliance.
API keys reused as MCP credentials. No scope difference between human and agent access. Either keys are over-scoped (security risk) or under-scoped (human workflows break).
Tool descriptions written like internal Jira tickets. "Updates customer record using upsert semantics" — useless as a consent affordance. Tool descriptions are read by the user before they consent.
No revocation path. Customers can grant access; they can't revoke without rotating their main API key, which breaks everything else.
Audit log indistinguishable from human audit log. A row that says "API call to POST /v1/messages" doesn't tell you human-vs-agent or the lineage.

A governance diagnostic

For each capability your product exposes to agents:

Scope — is access gated at org level (enable/disable per environment) AND credential level (scoped tokens, not full API keys) AND runtime level (step-up scope for sensitive operations)?
Consent — does the tool description, as the user sees it before approving, accurately describe what the tool will do?
Confirmation — for write and destructive operations, does the protocol surface a confirmation request? Is the customer's admin able to configure thresholds?
Audit — can the customer query "what did this agent do, on whose behalf, in this window"? Can they filter by capability, status, value at risk?
Revocation — can the customer revoke agent access in seconds without breaking their other integrations?

If you answer "no" or "partial" on more than one, the trust surface isn't yet a product feature — it's a known liability with a launch date attached.
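The scoring rule is simple enough to encode directly. In this sketch the dimension keys and the "yes" / "partial" / "no" answer values are assumptions chosen to mirror the checklist; a missing answer is treated as "no".

```python
DIMENSIONS = ("scope", "consent", "confirmation", "audit", "revocation")


def trust_surface_ready(answers: dict) -> bool:
    """True when at most one dimension is answered 'no' or 'partial'."""
    gaps = [d for d in DIMENSIONS if answers.get(d, "no") != "yes"]
    return len(gaps) <= 1


all_yes = {d: "yes" for d in DIMENSIONS}
two_gaps = {**all_yes, "audit": "partial", "revocation": "no"}
```

With these inputs, `trust_surface_ready(all_yes)` passes and `trust_surface_ready(two_gaps)` fails.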
