Provenwright MCP Gateway Readiness Audit

Sample report · 2026-06-09

Sample MCP Gateway Readiness Audit

An illustrative, anonymized sample of the report you receive — based on a real, read-only assessment of a public production LLM/agent gateway (a mid-size SaaS platform running multiple MCP servers across multiple teams).

Production-Ready with caveats

3 Green

4 Yellow

0 Red


Executive Summary

Verdict: Production-Ready with caveats

This verdict applies to the gateway as written and assumes an operator who turns on the controls the platform provides.

This gateway is architecturally sound on the dimensions that matter most for safety. Authorization is enforced at the gateway on caller identity — resolved as a strict intersection of key, team, end-user, agent, and org permissions — and the model never decides what it may call (no authorization is expressed in prompts). That single property is what defeats prompt-injection-to-tool-call. Secrets are referenced, never inlined. Identity is JWT/OIDC-rooted on the call path and propagates end-to-end. MCP tokens use OAuth token-exchange (RFC 8693) with audience and scope binding, matching the MCP 2025-06-18 resource-server model. Cost is genuinely bounded — budget overruns raise an enforced exception rather than merely alerting.

The central risk is not a missing control but a permissive default. Per-tool least-privilege and third-party-server pinning are opt-in, and one top-level authorization resolver path fails open on an unexpected exception. The highest-leverage moves are to make that path fail closed and to make least-privilege and pinning required at onboarding.

The two highest-leverage moves

Change the one fail-open authorization line so the resolver returns "no servers" (not "all allow-all servers") on an unexpected error, and add a regression test. Effort: small. This removes the only fail-open path in an otherwise fail-closed authorization resolver.

Pin and allowlist the third-party MCP catalog by version + digest, closing the floating-tag supply-chain exposure. Effort: medium.

Scope & Methodology

We assess 7 dimensions of MCP gateway readiness through read-only inspection — source code, configuration, CI workflows, and deploy artifacts at a pinned commit. Every finding cites a specific artifact (file and line) captured in an evidence index, and every gap-matrix color traces to at least one cited row.

Where a control is present in code but operator-configured (budgets, rate limits, allowlists, guardrails, IDP), we assess the code-level default and flag the dependency explicitly; where a live check could not run (fault injection, real trace pull), we mark it static-only and report the limitation rather than implying coverage we don't have.

Read-only inspection

Source code, config, CI workflows, and deploy artifacts at a pinned commit. No write access, no system modification.

Evidence-backed findings

Every finding cites a specific artifact (file and line). Every gap-matrix color traces to at least one evidence-index row.

Stated limitations

Live checks that could not run (fault injection, real trace pull) are marked static-only. We report the limitation rather than implying coverage we don't have.

Scored Gap Matrix

All 7 dimensions at a glance. Each status traces directly to the findings for that dimension under Per-Dimension Findings.

#	Dimension	Status	Finding	Severity
01	Tool-access governance & RBAC	Partial	Strong gateway-enforced intersection RBAC — but per-tool least-privilege is opt-in.	High
02	Fail-close vs fail-open	Partial	Per-level authz failures fail closed; one top-level resolver path fails open to allow-all servers.	Critical
03	MCP / agent onboarding flow	Partial	Dual source-of-truth (declarative config and runtime DB); third-party servers unpinned; no tool-PR CI gate.	High
04	Observability & tracing	Pass	First-class OpenTelemetry + GenAI semconv; W3C trace-context extracted (propagation operator-configurable).	Medium
05	Multi-LLM routing & cost controls	Pass	Declarative routing; budget caps enforced (not alert-only); rate limits per key/model/MCP server.	Low
06	Security, secrets & identity (IDP)	Pass	Zero inline secrets; JWT/OIDC on call path; end-user identity propagates; OAuth token-exchange with audience+scope.	Low
07	Production-readiness gaps	Partial	Block-by-default guardrail, rate-limit 429s, alerting present — but no single tested global kill-switch or enforced canary.	Medium

Legend: Pass (green, production-grade) = control exists, is enforced, and is verifiable. Partial (yellow) = intent exists but has gaps such as opt-in enforcement, manual steps, or no audit trail; a known, bounded risk. Absent/unsafe (red) = no effective control, or the control fails open.

Per-Dimension Findings

Dimension 01: Tool-Access Governance & RBAC

What we looked at

Where the authorization decision lives, whether the model ever participates in it, and how granular the grants are.

What we found

Authorization is enforced at the gateway on caller identity, resolved as a strict intersection of key → team → end-user → agent → org permissions, with org as a ceiling. A full-tree search found no in-prompt tool gating — no "only call this if the user is an admin" logic in any system prompt. Deny-by-default holds for unmapped callers. Per-tool granularity exists, but it is opt-in per server: with no allowlist configured, the tool-permission check returns "allow."
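The intersection rule described above can be illustrated with a minimal sketch. The function name, the use of `None` for "no grant recorded", and the server names are assumptions made for illustration; they are not taken from the audited gateway.

```python
from typing import Iterable, Optional


def resolve_allowed_servers(
    org: Optional[set[str]],
    levels: Iterable[Optional[set[str]]],
) -> set[str]:
    """Intersect the grants of every level, with the org grant as the ceiling.

    Hypothetical model: each level is a set of server names, and None means
    the caller is unmapped at that level.
    """
    if org is None:
        return set()  # unmapped org: deny by default
    allowed = set(org)
    for grant in levels:
        if grant is None:
            return set()  # unmapped caller at any level: deny by default
        allowed &= grant  # a level can only narrow access, never widen it
    return allowed


# Illustrative grants (hypothetical server names).
org = {"github", "jira", "billing"}
key = {"github", "jira"}
team = {"github", "billing"}
end_user = {"github", "jira"}
agent = {"github"}

print(resolve_allowed_servers(org, [key, team, end_user, agent]))  # {'github'}
```

Because every level only narrows the set, a grant at one level cannot exceed what the org ceiling permits, and nothing in a prompt can add a server to the result.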

Why it matters

The hard part — keeping the model out of the authorization decision — is done correctly. The residual risk is operator misconfiguration: a write or external tool on a server with no tool allowlist is callable by anyone with server access.
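To make the misconfiguration risk concrete, here is a small sketch contrasting an opt-in tool allowlist with a required one. The function and its `require_allowlist` flag are hypothetical and only model the behavior described in this finding.

```python
from typing import Optional


def tool_allowed(
    tool: str,
    allowlist: Optional[set[str]],
    require_allowlist: bool,
) -> bool:
    """Decide whether a tool may be called on a server.

    allowlist is None when the operator configured no allowlist for the server.
    """
    if allowlist is None:
        # Opt-in mode: a missing allowlist means "allow".
        # Required mode: a missing allowlist means "deny".
        return not require_allowlist
    return tool in allowlist


# Opt-in mode: a write tool on an unconfigured server is callable.
print(tool_allowed("delete_record", None, require_allowlist=False))  # True
# Required mode: the same call is denied until an explicit grant exists.
print(tool_allowed("delete_record", None, require_allowlist=True))   # False
```

The only difference between the two modes is what an absent configuration means, which is why flipping the default removes the risk without changing the check itself.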

Dimension 02: Fail-Close vs Fail-Open

What we looked at

Every exception handler on the authorization and call path — does a degraded check deny or allow?

What we found

Every per-level permission resolver fails closed: on an unexpected exception it logs and returns an empty set, resolving to "no access" downstream. One exception: the top-level policy resolver returns the set of allow-all servers (not an empty set) on an unexpected error — a partial fail-open, bounded to servers an operator already marked public. Every upstream call carries an explicit timeout; a per-tool parameter allowlist rejects unexpected arguments; and concurrency/rate limits shed load with HTTP 429.

Why it matters

Fail-open in an authorization resolver is the single highest-risk class in the framework — it's how a degraded check silently becomes "allow." Here it is bounded, but it is still the one line that errs toward exposure. It is also a one-line fix.
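A fail-closed resolver and its regression test might look like the following sketch. The function names and the injected `lookup` callable are assumptions for illustration, not the gateway's actual code.

```python
import logging
from typing import Callable

logger = logging.getLogger("authz")


def resolve_servers_fail_closed(
    lookup: Callable[[str], set[str]],
    caller_id: str,
) -> set[str]:
    """Return the servers a caller may reach, or an empty set on any error."""
    try:
        return set(lookup(caller_id))
    except Exception:
        # Fail closed: log the failure and deny, never fall back to allow-all.
        logger.exception("authorization lookup failed for %s; denying", caller_id)
        return set()


def test_deny_on_error() -> None:
    """Regression test: an unexpected exception must resolve to no access."""

    def broken_lookup(_caller_id: str) -> set[str]:
        raise RuntimeError("policy store unavailable")

    assert resolve_servers_fail_closed(broken_lookup, "key-123") == set()


test_deny_on_error()
```

Keeping the test in the standing suite is what prevents a later refactor from reintroducing an allow-all fallback in the exception handler.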

Dimension 03: MCP / Agent Onboarding Flow

What we looked at

How servers and tools are registered, whether the running tool set is reconstructable from source control, and whether onboarding enforces governance.

What we found

A declarative config block exists, but MCP servers can also be created at runtime via an authenticated REST endpoint that writes to a database — a dual, mutable source of truth where running state can drift from source control. The curated third-party catalog references stdio servers via unpinned floating-tag commands (e.g. npx -y @vendor/mcp-server) with no version, digest, or checksum. CI is otherwise strong (schema-sync, CodeQL, supply-chain scorecard, unit tests) but has no required-field gate on a tool-registration PR.
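A first-pass check for floating-tag commands could be sketched as below. It only tests for an exact version pin in an `npx` command; verifying a digest would need registry or lockfile integrity data and is out of scope here. The helper names and the version `1.4.2` are hypothetical.

```python
import re
import shlex
from typing import Optional

# Exact versions such as 1.2.3 or 1.2.3-beta.1; ranges and dist-tags do not match.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$")


def npx_package_spec(command: str) -> Optional[str]:
    """Return the first non-flag argument after `npx`, or None.

    Simplification: flags that take a value are not handled.
    """
    args = shlex.split(command)
    if not args or args[0] != "npx":
        return None
    for arg in args[1:]:
        if not arg.startswith("-"):
            return arg
    return None


def is_pinned(command: str) -> bool:
    """True only when the package spec carries an exact version."""
    spec = npx_package_spec(command)
    if spec is None:
        return False
    # Split on the last "@" so a leading scope ("@vendor/...") is not read as a version.
    name, sep, version = spec.rpartition("@")
    if not sep or not name:
        return False
    return bool(EXACT_VERSION.match(version))


print(is_pinned("npx -y @vendor/mcp-server"))        # False
print(is_pinned("npx -y @vendor/mcp-server@1.4.2"))  # True
```

A check like this is cheap enough to run over the whole catalog in CI, which makes an unpinned entry a build failure rather than a review comment.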

Why it matters

You cannot fully reconstruct the live tool set from source control when servers can be added via the API, and an unpinned third-party server is one tampered package away from a supply-chain incident (OWASP LLM03:2025, Supply Chain). Onboarding is the natural enforcement point for the controls that are bolt-on today.
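An onboarding gate of the kind this finding points toward could start as a required-field check on each registration. The field names below are hypothetical stand-ins for a tool allowlist, an RBAC grant, a rate limit, and a pinned source.

```python
# Hypothetical field names for a server registration record.
REQUIRED_FIELDS = ("tool_allowlist", "rbac_grant", "rate_limit", "pinned_source")


def missing_fields(registration: dict) -> list[str]:
    """List required fields that are absent or empty in a registration."""
    return [name for name in REQUIRED_FIELDS if not registration.get(name)]


def gate(registration: dict) -> bool:
    """Pass only when every required governance field is present."""
    return not missing_fields(registration)


candidate = {
    "name": "issue-tracker",
    "tool_allowlist": ["list_issues", "get_issue"],
    "rbac_grant": "team:support",
    "rate_limit": {"rpm": 60},
    # "pinned_source" is missing, so the gate should fail.
}

print(missing_fields(candidate))  # ['pinned_source']
print(gate(candidate))            # False
```

Run as a required CI check on tool-registration PRs, a gate like this turns the controls from optional extras into preconditions for merging.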

Dimension 04: Observability & Tracing

What we looked at

Whether per-model/token/cost attribution and end-to-end request reconstruction are possible, and whether trace context survives hops.

What we found

OpenTelemetry is a first-class integration with dedicated GenAI semantic-convention mapping (operation name, token usage, cache metrics), multiple exporters, and inbound W3C traceparent extraction.

Why it matters

The building blocks for end-to-end reconstruction and per-team cost attribution are present and standards-aligned. Caveat: context propagation into every MCP hop is operator-configurable (not guaranteed by default), and no live trace could be pulled in a read-only assessment — so "reconstruct one real request" is noted as unverified.
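For readers unfamiliar with what "trace context survives hops" means in practice, the sketch below parses a W3C `traceparent` header and builds the header a gateway would forward to the next hop. It handles only the version `00` layout, and the function names are assumptions; in a real deployment the OpenTelemetry SDK's propagators would normally do this work.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse an inbound traceparent header, or return None if it is invalid."""
    match = TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero identifiers are invalid under the W3C Trace Context spec.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: TraceContext, new_span_id: str) -> str:
    """Build the outbound header: same trace-id, this hop's span as parent."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{new_span_id}-{flags}"


inbound = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(inbound)
print(child_traceparent(ctx, "b7ad6b7169203331"))
```

If the outbound header is not set on an MCP hop, the downstream spans start a new trace and the request can no longer be reconstructed end to end, which is the gap the caveat above describes.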

Dimension 05: Multi-LLM Routing & Cost Controls

What we looked at

Whether routing is a declarative policy and whether cost is enforced or merely alerted.

What we found

Routing is a central declarative mapping from virtual model names to physical deployments, with router-level retries and timeouts. Budget caps are enforced — overruns raise a budget-exceeded exception, not just an alert. Rate limits (rpm/tpm) are expressible per key, per model, and per MCP server.

Why it matters

This is the dimension the platform is purpose-built for, and it shows: there is a real, enforced path against bill-shock and cost-based DoS (OWASP LLM10:2025, Unbounded Consumption).
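The difference between an enforced budget and an alert-only one can be shown in a few lines. The class and exception names here are hypothetical; the sketch only models "raise before spending" as described in the finding.

```python
class BudgetExceededError(Exception):
    """Raised when a request would push spend past the configured cap."""


class Budget:
    def __init__(self, max_usd: float) -> None:
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record spend, refusing any charge that would exceed the cap."""
        if self.spent_usd + cost_usd > self.max_usd:
            # Enforced: the request is rejected, not merely reported.
            raise BudgetExceededError(
                f"charge of {cost_usd} would exceed cap of {self.max_usd}"
            )
        self.spent_usd += cost_usd


team_budget = Budget(max_usd=1.0)
team_budget.charge(0.75)  # accepted
try:
    team_budget.charge(0.5)  # would bring spend to 1.25, so it is refused
except BudgetExceededError as err:
    print("blocked:", err)
```

An alert-only design would log at the same point and let the request proceed, so the cap would bound nothing.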

Dimension 06: Security, Secrets & Identity (IDP)

What we looked at

Whether secrets are inlined, where identity is rooted, and whether the MCP token model matches the spec.

What we found

No inline secret values — the only key-shaped hits are docstring API examples; real config uses environment references. Identity is JWT/OIDC-rooted on the gateway call path (not just dashboard login), and the end-user identity propagates through to MCP handling and spend logs rather than collapsing to one shared credential. MCP tokens use OAuth token-exchange (RFC 8693) with audience and scope binding. A deliberate policy avoids leaking caller bearer tokens upstream.

Why it matters

This is the strongest dimension — the secure defaults are in the code, and the token model is the one the MCP spec asks for. Who-can-do-what is answerable and revocable at the IDP.
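As a reference point for the token model, this sketch builds the form body of an RFC 8693 token-exchange request with audience and scope binding. The parameter names and URN values come from RFC 8693; the helper name, audience URL, and scope strings are illustrative assumptions.

```python
from urllib.parse import urlencode

# Grant type and token type identifiers defined by RFC 8693.
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def token_exchange_body(subject_token: str, audience: str, scopes: list[str]) -> str:
    """Form-encode a token-exchange request bound to one audience and scope set."""
    return urlencode(
        {
            "grant_type": GRANT_TYPE,
            "subject_token": subject_token,
            "subject_token_type": ACCESS_TOKEN_TYPE,
            "audience": audience,          # the MCP server the new token is for
            "scope": " ".join(scopes),     # space-delimited, as in OAuth 2.0
        }
    )


body = token_exchange_body(
    subject_token="caller-token-placeholder",      # hypothetical value
    audience="https://mcp.example.internal",       # hypothetical MCP server
    scopes=["tools.read", "tools.call"],           # hypothetical scopes
)
print(body)
```

Because the exchanged token names a single audience and a narrow scope, the caller's original bearer token never needs to travel upstream, and a token captured at one MCP server is not valid at another.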

Dimension 07: Production-Readiness Gaps

What we looked at

Guardrails, alerting, the ability to stop a misbehaving tool, and staged-rollout discipline.

What we found

A dedicated MCP security guardrail defaults to block (not alert), alongside a catalog of injection/jailbreak guardrails and MCP-specific permission guardrails. Slack/email/Prometheus alerting (including hanging-request detection) ships. Tools and servers can be disabled via config without a binary redeploy. Helm chart, hardened compose, and Terraform ship for staged deploy.

Why it matters

The operational levers exist, so an incident is recoverable — but with friction. There is no single tested global kill-switch surfaced in code, no enforced canary/blue-green, and the red-team guardrails are available but operator-wired, not a standing CI gate.
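A single stop lever covering a tool, a server, or the whole gateway could be modeled as below. This is a hypothetical sketch of the missing control, not something found in the audited code; the class and field names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class KillSwitch:
    """One place to check before any MCP tool call is dispatched."""

    gateway_disabled: bool = False
    disabled_servers: set[str] = field(default_factory=set)
    disabled_tools: set[tuple[str, str]] = field(default_factory=set)

    def is_callable(self, server: str, tool: str) -> bool:
        if self.gateway_disabled:
            return False  # global stop: nothing is callable
        if server in self.disabled_servers:
            return False  # server-level stop
        return (server, tool) not in self.disabled_tools  # tool-level stop


switch = KillSwitch()
switch.disabled_tools.add(("crm", "delete_contact"))
print(switch.is_callable("crm", "delete_contact"))  # False
print(switch.is_callable("crm", "get_contact"))     # True

switch.gateway_disabled = True
print(switch.is_callable("crm", "get_contact"))     # False
```

The value of a lever like this comes from testing it: a kill-switch that has never been exercised in a drill is an assumption, not a control.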

Verdict & Top Risks

Top risks — ranked by impact

Risk	Dimension	Ratings
Authorization resolver returns the allow-all set on an unexpected error, a bounded fail-open path	02	Low / High
Unpinned floating-tag third-party MCP servers, a supply-chain exposure (OWASP LLM03:2025, Supply Chain)	03	Medium / High
Per-tool least-privilege off by default, leaving a write/external tool callable by any key with server access	01	Medium / Medium
Running tool set can drift from source control via runtime DB writes	03	Medium / Medium
No single tested global kill-switch and no enforced staged rollout	07	Medium / Medium

90-Day Roadmap

A sequenced fix plan derived directly from the gap matrix, ordered by risk-reduction-per-effort. Each item traces to a yellow finding and its cited evidence.

Phase 1

Return an empty set (deny) instead of the allow-all set on an unexpected exception, and add a regression test that asserts deny-on-error. Expected outcome: the authorization resolver fails closed everywhere, with a standing test preventing regression.

Replace every floating-tag command with a version + digest pin and allowlist the pinned set. Expected outcome: no floating-tag MCP server can resolve, and the third-party set is reproducible and tamper-evident.

Phase 2

Require a per-server tool allowlist so the permission check denies absent an explicit grant, and gate the "allow all keys" foot-gun behind an audited override. Expected outcome: a write/external tool is callable only by identities with an explicit grant.

Fail any tool/server PR unless tool allowlist, RBAC grant, rate limit, and pinned source are present, and resolve the dual source-of-truth so running state cannot drift. Expected outcome: governance is enforced at onboarding with a full audit trail.

Phase 3

Add a tested kill-switch that can disable a tool, a server, or the whole MCP gateway without a redeploy, and enforce a staged rollout (canary/blue-green) with a tested one-command rollback. Expected outcome: a misbehaving tool or the whole gateway can be stopped in seconds, verified by test.

Run the red-team guardrails as a standing CI gate with a refusal-rate threshold on every tool/server PR. Expected outcome: injection, jailbreak, and tool-misuse regressions are caught before merge, not in production.

Net effect: Phase 1 removes the only fail-open path and the floating supply chain. Phase 2 makes least-privilege and governance default-on. Phase 3 makes readiness durable with a tested stop lever and a standing safety gate.
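The standing safety gate in Phase 3 could reduce to a threshold check over red-team results. In this sketch each result records whether an adversarial prompt was refused; the function names and the 0.95 threshold are hypothetical.

```python
def refusal_rate(refused: list[bool]) -> float:
    """Fraction of adversarial test cases that were refused."""
    if not refused:
        return 0.0
    return sum(refused) / len(refused)


def gate_passes(refused: list[bool], min_refusal_rate: float = 0.95) -> bool:
    """Pass only when the suite ran and met the refusal threshold.

    An empty suite fails the gate, so a skipped run cannot pass silently.
    """
    return bool(refused) and refusal_rate(refused) >= min_refusal_rate


# 18 of 20 adversarial cases refused: rate 0.9, below the threshold.
weak_run = [True] * 18 + [False] * 2
# All 20 refused.
clean_run = [True] * 20

print(gate_passes(weak_run))   # False
print(gate_passes(clean_run))  # True
```

Treating an empty result set as a failure keeps the gate itself fail-closed, in line with the Phase 1 principle.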

What You Receive

A fixed-scope, read-only engagement delivering four artifacts designed to be read together, plus a live walkthrough.

Readiness Report: the full narrative, covering executive summary, scope, methodology, per-dimension findings, verdict, top risks, and prioritized recommendations.

Scored Gap Matrix: the one-screen view of all 7 dimensions with status, one-line finding, severity, effort, and a pointer to the roadmap item that closes each gap.

90-Day Roadmap: the sequenced fix plan, phased by risk-reduction-per-effort, with owners, effort, expected outcomes, and dependencies. Doubles as the scoping input for a fixed-price implementation engagement.

Evidence Index: every finding cited to a specific artifact (file and line), with the observed behavior, the rubric line it matches, its color, and its severity. Every gap-matrix color traces back to at least one row here.

Plus a live review session

A walkthrough of the findings and roadmap with your engineering and security stakeholders, so the report lands as a shared plan, not a PDF on a shelf.

Free open-source tool

Want a free first read of your own stack?

This sample is a paid audit deliverable — specific findings, evidence index, cited roadmap. Before that, run the open-source scanner for a free heuristic read across all 7 dimensions. It's read-only, takes seconds, and tells you exactly where to look.


Run it on your repo

npx mcp-gateway-scan ./your-gateway

100% read-only · no network calls · secret values redacted (location only)

Ready to see your stack's real scores?

Book a 15-minute discovery call. We map your stack and scope a fixed-price audit — you decide whether to proceed with the number in front of you. No deck, no obligation.


Fixed scope · Fixed price · Written scope statement before any audit work begins
