This is a sample: a condensed, anonymized version of the report you receive, based on a real read-only assessment of a public production LLM/agent gateway. Names, paths, and identifiers have been genericized. The findings, scores, and verdict are real. Your report is specific to your stack.

Provenwright MCP Gateway Readiness Audit

Sample report · 2026-06-09

Sample MCP Gateway Readiness Audit

An illustrative, anonymized sample of the report you receive — based on a real, read-only assessment of a public production LLM/agent gateway (a mid-size SaaS platform running multiple MCP servers across multiple teams).

Production-Ready with caveats
3 Green
4 Yellow
0 Red

Verdict: Production-Ready with caveats

For the gateway as written, assuming an operator who turns on the controls the platform provides.

This gateway is architecturally sound on the dimensions that matter most for safety. Authorization is enforced at the gateway on caller identity — resolved as a strict intersection of key, team, end-user, agent, and org permissions — and the model never decides what it may call (no authorization is expressed in prompts). That single property is what defeats prompt-injection-to-tool-call. Secrets are referenced, never inlined. Identity is JWT/OIDC-rooted on the call path and propagates end-to-end. MCP tokens use OAuth token-exchange (RFC 8693) with audience and scope binding, matching the MCP 2025-06-18 resource-server model. Cost is genuinely bounded — budget overruns raise an enforced exception rather than merely alerting.

The central risk is not a missing control but a defaulting one. Per-tool least-privilege and third-party-server pinning are opt-in, and one top-level authorization resolver path fails open on an unexpected exception. The highest-leverage moves are to make that path fail closed and to make the opt-in controls required at onboarding.

The two highest-leverage moves

Change the one fail-open authorization line

Make the resolver return "no servers" (not "all allow-all servers") on an unexpected error, and add a regression test. Effort: small. This removes the only fail-open path in an otherwise fail-closed authorization resolver.

Pin and allowlist the third-party MCP catalog

Pin every catalog entry by version and digest to close the floating-tag supply-chain exposure. Effort: medium.


We assess 7 dimensions of MCP gateway readiness through read-only inspection — source code, configuration, CI workflows, and deploy artifacts at a pinned commit. Every finding cites a specific artifact (file and line) captured in an evidence index, and every gap-matrix color traces to at least one cited row.

Where a control is present in code but operator-configured (budgets, rate limits, allowlists, guardrails, IDP), we assess the code-level default and flag the dependency explicitly; where a live check could not run (fault injection, real trace pull), we mark it static-only and report the limitation rather than implying coverage we don't have.

Read-only inspection

Source code, config, CI workflows, and deploy artifacts at a pinned commit. No write access, no system modification.

Evidence-backed findings

Every finding cites a specific artifact (file and line). Every gap-matrix color traces to at least one evidence-index row.

Stated limitations

Live checks that could not run (fault injection, real trace pull) are marked static-only. We report the limitation rather than implying coverage we don't have.


All 7 dimensions at a glance. Status colors trace directly to the per-dimension findings in Section 4.

Scored Gap Matrix — Sample Assessment

Status key: Production-grade · Partial · Absent / unsafe

| # | Dimension | Status | Finding | Severity |
|---|-----------|--------|---------|----------|
| 01 | Tool-access governance & RBAC | Partial | Strong gateway-enforced intersection RBAC, but per-tool least-privilege is opt-in. | High |
| 02 | Fail-close vs fail-open | Partial | Per-level authz failures fail closed; one top-level resolver path fails open to allow-all servers. | Critical |
| 03 | MCP / agent onboarding flow | Partial | Dual source-of-truth (declarative config and runtime DB); third-party servers unpinned; no tool-PR CI gate. | High |
| 04 | Observability & tracing | Pass | First-class OpenTelemetry + GenAI semconv; W3C trace-context extracted (propagation operator-configurable). | Medium |
| 05 | Multi-LLM routing & cost controls | Pass | Declarative routing; budget caps enforced (not alert-only); rate limits per key/model/MCP server. | Low |
| 06 | Security, secrets & identity (IDP) | Pass | Zero inline secrets; JWT/OIDC on call path; end-user identity propagates; OAuth token-exchange with audience+scope. | Low |
| 07 | Production-readiness gaps | Partial | Block-by-default guardrail, rate-limit 429s, alerting present, but no single tested global kill-switch or enforced canary. | Medium |

Legend: Production-grade = control exists, is enforced, and is verifiable. Partial = intent exists but has gaps — opt-in enforcement, manual steps, or no audit trail; known, bounded risk. Absent/unsafe = no effective control, or the control fails open.


Dimension 01 — Tool-Access Governance & RBAC

Partial
What we looked at

Where the authorization decision lives, whether the model ever participates in it, and how granular the grants are.

What we found

Authorization is enforced at the gateway on caller identity, resolved as a strict intersection of key → team → end-user → agent → org permissions, with org as a ceiling. A full-tree search found no in-prompt tool gating — no "only call this if the user is an admin" logic in any system prompt. Deny-by-default holds for unmapped callers. Per-tool granularity exists, but it is opt-in per server: with no allowlist configured, the tool-permission check returns "allow."

Why it matters

The hard part — keeping the model out of the authorization decision — is done correctly. The residual risk is operator misconfiguration: a write or external tool on a server with no tool allowlist is callable by anyone with server access.
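The resolution order and the opt-in tool check described above can be illustrated with a minimal sketch. It assumes each level exposes its grants as a set of server names; `effective_servers`, `tool_allowed`, the `default_deny` flag, and the example server and tool names are hypothetical and are not taken from the assessed gateway.

```python
from typing import Optional, Set


def effective_servers(key: Set[str], team: Set[str], end_user: Set[str],
                      agent: Set[str], org: Set[str]) -> Set[str]:
    """Strict intersection: a server is reachable only if every level grants it."""
    # org is the ceiling, so nothing outside it can survive the intersection.
    return org & key & team & end_user & agent


def tool_allowed(tool: str, allowlist: Optional[Set[str]],
                 default_deny: bool = False) -> bool:
    """Per-tool check; default_deny=False mirrors the opt-in behavior in the finding."""
    if allowlist is None:
        # No allowlist configured for this server.
        return not default_deny
    return tool in allowlist


granted = effective_servers(
    key={"search", "billing"}, team={"search", "billing"},
    end_user={"search"}, agent={"search", "billing"}, org={"search", "crm"},
)
unmapped = effective_servers(set(), {"search"}, {"search"}, {"search"}, {"search"})
```

Here `granted` contains only `search`, and the unmapped key resolves to an empty set. With no allowlist, `tool_allowed("delete_invoice", None)` returns `True`, which is the gap the finding describes; passing `default_deny=True` turns the same call into a deny.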

Dimension 02 — Fail-Close vs Fail-Open

Partial
What we looked at

Every exception handler on the authorization and call path — does a degraded check deny or allow?

What we found

Every per-level permission resolver fails closed: on an unexpected exception it logs and returns an empty set, resolving to "no access" downstream. One exception: the top-level policy resolver returns the set of allow-all servers (not an empty set) on an unexpected error — a partial fail-open, bounded to servers an operator already marked public. Every upstream call carries an explicit timeout; a per-tool parameter allowlist rejects unexpected arguments; and concurrency/rate limits shed load with HTTP 429.

Why it matters

Fail-open in an authorization resolver is the single highest-risk class in the framework — it's how a degraded check silently becomes "allow." Here it is bounded, but it is still the one line that errs toward exposure. It is also a one-line fix.
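As an illustration of the fail-closed pattern, the sketch below shows a top-level resolver that denies on any unexpected error. It is not the gateway's code: `resolve_servers`, the `lookup` callable, and the server names are assumptions made for the example.

```python
import logging
from typing import Callable, Iterable, Set

logger = logging.getLogger("gateway.authz")


def resolve_servers(caller_id: str,
                    lookup: Callable[[str], Iterable[str]],
                    allow_all_servers: Set[str]) -> Set[str]:
    """Top-level resolver sketch: caller grants plus servers marked allow-all."""
    try:
        return set(lookup(caller_id)) | allow_all_servers
    except Exception:
        logger.exception("authz resolution failed for caller %s; denying", caller_id)
        # Fail closed. A fail-open variant would return allow_all_servers here.
        return set()


def broken_lookup(caller_id: str) -> Set[str]:
    """Stand-in for a permission store that is unavailable."""
    raise RuntimeError("permission store unavailable")


# Regression check: an unexpected error must resolve to no access.
denied = resolve_servers("key-123", broken_lookup, {"public-docs"})
assert denied == set()
```

The final assertion is the shape of regression test that keeps the deny-on-error behavior from silently reverting.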

Dimension 03 — MCP / Agent Onboarding Flow

Partial
What we looked at

How servers and tools are registered, whether the running tool set is reconstructable from source control, and whether onboarding enforces governance.

What we found

A declarative config block exists, but MCP servers can also be created at runtime via an authenticated REST endpoint that writes to a database — a dual, mutable source of truth where running state can drift from source control. The curated third-party catalog references stdio servers via unpinned floating-tag commands (e.g. npx -y @vendor/mcp-server) with no version, digest, or checksum. CI is otherwise strong (schema-sync, CodeQL, supply-chain scorecard, unit tests) but has no required-field gate on a tool-registration PR.

Why it matters

You cannot fully reconstruct the live tool set from source control when servers can be added via the API, and an unpinned third-party server is one tampered package away from a supply-chain incident (OWASP LLM03:2025 Supply Chain; LLM05 in the 2023 list). Onboarding is the natural enforcement point for the controls that are bolt-on today.

Dimension 04 — Observability & Tracing

Pass
What we looked at

Whether per-model/token/cost attribution and end-to-end request reconstruction are possible, and whether trace context survives hops.

What we found

OpenTelemetry is a first-class integration with dedicated GenAI semantic-convention mapping (operation name, token usage, cache metrics), multiple exporters, and inbound W3C traceparent extraction.

Why it matters

The building blocks for end-to-end reconstruction and per-team cost attribution are present and standards-aligned. Caveat: context propagation into every MCP hop is operator-configurable (not guaranteed by default), and no live trace could be pulled in a read-only assessment — so "reconstruct one real request" is noted as unverified.
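To make the propagation caveat concrete, here is a minimal sketch of carrying an inbound W3C `traceparent` header into an outbound MCP hop. The header layout (version `00`, 32-hex trace ID, 16-hex parent ID, 2-hex flags) follows the W3C Trace Context format; the function name is hypothetical, and a real deployment would normally rely on the OpenTelemetry propagators rather than hand-rolled parsing.

```python
import re
import secrets
from typing import Optional

# Version 00 layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def child_traceparent(inbound: Optional[str]) -> Optional[str]:
    """Return a traceparent for the next hop, or None if the inbound one is invalid."""
    match = TRACEPARENT.match(inbound or "")
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero trace or parent IDs are invalid under the spec.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # Keep the trace ID and flags; mint a new span ID for this hop.
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


inbound = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outbound = child_traceparent(inbound)
```

If every MCP hop forwards a header built this way, the trace ID stays constant end to end, which is what makes single-request reconstruction possible.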

Dimension 05 — Multi-LLM Routing & Cost Controls

Pass
What we looked at

Whether routing is a declarative policy and whether cost is enforced or merely alerted.

What we found

Routing is a central declarative mapping from virtual model names to physical deployments, with router-level retries and timeouts. Budget caps are enforced — overruns raise a budget-exceeded exception, not just an alert. Rate limits (rpm/tpm) are expressible per key, per model, and per MCP server.

Why it matters

This is the dimension the platform is purpose-built for, and it shows: there is a real, enforced path against bill-shock and cost-based DoS (OWASP LLM10:2025 Unbounded Consumption).
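The difference between an enforced cap and an alert can be shown in a few lines. This is a sketch under stated assumptions: the `Budget` class, the `BudgetExceededError` name, and the dollar amounts are illustrative and do not reproduce the gateway's own types.

```python
class BudgetExceededError(Exception):
    """Raised when a request would push spend past the configured cap."""


class Budget:
    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Enforce before recording spend, so an over-cap request is rejected
        # instead of being reported after the money is gone.
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceededError(
                f"cap {self.cap_usd} would be exceeded by a charge of {cost_usd}"
            )
        self.spent_usd += cost_usd


budget = Budget(cap_usd=10.0)
budget.charge(6.0)  # accepted, 6.0 of 10.0 used
try:
    budget.charge(5.0)  # would reach 11.0, so it is rejected
    rejected = False
except BudgetExceededError:
    rejected = True
```

Because the rejected charge never updates `spent_usd`, a smaller request that still fits under the cap can succeed afterwards.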

Dimension 06 — Security, Secrets & Identity (IDP)

Pass
What we looked at

Whether secrets are inlined, where identity is rooted, and whether the MCP token model matches the spec.

What we found

No inline secret values — the only key-shaped hits are docstring API examples; real config uses environment references. Identity is JWT/OIDC-rooted on the gateway call path (not just dashboard login), and the end-user identity propagates through to MCP handling and spend logs rather than collapsing to one shared credential. MCP tokens use OAuth token-exchange (RFC 8693) with audience and scope binding. A deliberate policy avoids leaking caller bearer tokens upstream.

Why it matters

This is the strongest dimension — the secure defaults are in the code, and the token model is the one the MCP spec asks for. Who-can-do-what is answerable and revocable at the IDP.
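For readers unfamiliar with the token model, the sketch below builds the form body of an RFC 8693 token-exchange request with audience and scope binding. The parameter names and URN values are the ones defined by RFC 8693; the function name, the audience URL, and the scope strings are hypothetical, and the sketch only builds the request body without sending it.

```python
from urllib.parse import parse_qs, urlencode


def token_exchange_form(subject_token: str, audience: str, scope: str) -> str:
    """Build the application/x-www-form-urlencoded body for a token exchange."""
    params = {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": subject_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        # Bind the new token to one MCP server and a narrow scope.
        "audience": audience,
        "scope": scope,
    }
    return urlencode(params)


body = token_exchange_form(
    subject_token="caller-access-token",             # placeholder value
    audience="https://mcp.example.internal/billing",  # hypothetical MCP server
    scope="tools:read tools:call",                    # hypothetical scopes
)
fields = parse_qs(body)
```

The upstream MCP server then receives a token minted for it alone, so the caller's original bearer token never leaves the gateway.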

Dimension 07 — Production-Readiness Gaps

Partial
What we looked at

Guardrails, alerting, the ability to stop a misbehaving tool, and staged-rollout discipline.

What we found

A dedicated MCP security guardrail defaults to block (not alert), alongside a catalog of injection/jailbreak guardrails and MCP-specific permission guardrails. Slack/email/Prometheus alerting (including hanging-request detection) ships. Tools and servers can be disabled via config without a binary redeploy. Helm chart, hardened compose, and Terraform ship for staged deploy.

Why it matters

The operational levers exist, so an incident is recoverable — but with friction. There is no single tested global kill-switch surfaced in code, no enforced canary/blue-green, and the red-team guardrails are available but operator-wired, not a standing CI gate.
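One possible shape for a single stop lever is a small flag set consulted on every tool call. This is an assumption about how such a control could look, not a description of the assessed gateway; the `KillSwitch` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class KillSwitch:
    gateway_disabled: bool = False
    disabled_servers: Set[str] = field(default_factory=set)
    disabled_tools: Set[str] = field(default_factory=set)  # entries are "server/tool"

    def blocks(self, server: str, tool: str) -> bool:
        """True if the call must be refused at any of the three levels."""
        return (
            self.gateway_disabled
            or server in self.disabled_servers
            or f"{server}/{tool}" in self.disabled_tools
        )


switch = KillSwitch(disabled_tools={"billing/refund"})
tool_blocked = switch.blocks("billing", "refund")    # True: this tool is disabled
other_allowed = not switch.blocks("billing", "read")  # True: other tools still work
switch.gateway_disabled = True                        # the global lever
```

Keeping all three levels in one object makes the global lever testable: a single unit test can assert that flipping `gateway_disabled` blocks every call.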


Verdict

Production-Ready with caveats. No dimension is red. The three safety-critical dimensions — RBAC (the model is kept out of the authorization path), fail-close (correct everywhere except one bounded line), and identity/secrets (green) — are all solid. The yellows are defaulting and operational gaps, not absent controls: exactly the profile of a capable platform that needs config hardening, not re-architecture.

Top risks — ranked by impact

1. Authorization resolver returns the allow-all set on an unexpected error, a bounded fail-open path. Dimension 02 · Low · High

2. Unpinned floating-tag third-party MCP servers: supply-chain exposure (OWASP LLM05). Dimension 03 · Medium · High

3. Per-tool least-privilege off by default: a write/external tool callable by any key with server access. Dimension 01 · Medium · Medium

4. Running tool set can drift from source control via runtime DB writes. Dimension 03 · Medium · Medium

5. No single tested global kill-switch / no enforced staged rollout. Dimension 07 · Medium · Medium

A sequenced fix plan derived directly from the gap matrix, ordered by risk-reduction-per-effort. Each item traces to a yellow finding and its cited evidence.

Phase 1 · Weeks 0–2

Launch-blocking / highest-leverage

The only fail-open path and the only floating supply chain. Small effort, highest risk reduction.

Change the one fail-open resolver line

Effort: S

Return an empty set (deny) instead of the allow-all set on an unexpected exception; add a regression test that asserts deny-on-error. Outcome: the authorization resolver fails closed everywhere, with a standing test preventing regression.

Pin the third-party MCP catalog

Effort: M

Replace every floating-tag command with a version + digest pin and allowlist the pinned set. Outcome: no floating-tag MCP server can resolve; the third-party set is reproducible and tamper-evident.
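A pin check for the catalog could look like the sketch below. It assumes each entry carries an npm-style package spec plus a separately recorded SHA-256 digest; that storage layout, the function names, and the version `1.4.2` are illustrative assumptions, since the report does not describe the catalog schema.

```python
import re
from typing import List, Optional, Tuple

EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")
SHA256_HEX = re.compile(r"^[0-9a-f]{64}$")


def split_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split an npm-style spec into (name, version); scoped names start with '@'."""
    at = spec.rfind("@")
    if at <= 0:  # no '@' at all, or only the leading scope marker
        return spec, None
    return spec[:at], spec[at + 1:]


def pin_problems(spec: str, sha256: Optional[str]) -> List[str]:
    """Return reasons an entry is not pinned; an empty list means it is pinned."""
    problems = []
    _, version = split_spec(spec)
    if version is None or not EXACT_VERSION.match(version):
        problems.append("no exact version")
    if sha256 is None or not SHA256_HEX.match(sha256):
        problems.append("no sha256 digest")
    return problems


floating = pin_problems("@vendor/mcp-server", None)
pinned = pin_problems("@vendor/mcp-server@1.4.2", "a" * 64)  # placeholder digest
```

The unversioned spec from the finding produces both problems, while the versioned entry with a digest produces none. Tags such as `latest` also fail the exact-version check.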

Phase 2 · Weeks 2–6

Default-on the controls

Convert opt-in safety controls to default-on, and shift governance left to onboarding.

Make per-tool least-privilege default-on

Effort: M

Require a per-server tool allowlist so the permission check denies absent an explicit grant; gate the "allow all keys" foot-gun behind an audited override. Outcome: a write/external tool is callable only by identities with an explicit grant.

Add a tool-PR CI validation gate

Effort: M

Fail any tool/server PR unless tool allowlist, RBAC grant, rate limit, and pinned source are present; resolve the dual source-of-truth so running state cannot drift. Outcome: governance is enforced at onboarding with a full audit trail.
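The required-field gate can be sketched as a small validator that a CI job runs against the server definitions changed in a PR. The field names below are hypothetical stand-ins for the four requirements listed above, and the definitions are assumed to be already parsed into dictionaries.

```python
from typing import Dict, List

# Hypothetical field names for the four required controls.
REQUIRED_FIELDS = ("tool_allowlist", "rbac_grant", "rate_limit", "pinned_source")


def missing_fields(server_def: dict) -> List[str]:
    """Required fields that are absent or empty in one server definition."""
    return [name for name in REQUIRED_FIELDS if not server_def.get(name)]


def gate(server_defs: Dict[str, dict]) -> int:
    """Return a process exit code: 0 if every definition is complete, else 1."""
    failed = False
    for name, server_def in sorted(server_defs.items()):
        missing = missing_fields(server_def)
        if missing:
            print(f"{name}: missing {', '.join(missing)}")
            failed = True
    return 1 if failed else 0


exit_code = gate({
    "billing": {"tool_allowlist": ["read_invoice"], "rbac_grant": "team-finance",
                "rate_limit": {"rpm": 60}, "pinned_source": "@vendor/mcp-server@1.4.2"},
    "crm": {"tool_allowlist": [], "rbac_grant": "team-sales"},
})
```

An empty allowlist counts as missing, so `crm` fails the gate and `exit_code` is 1. A CI step would pass that value to `sys.exit`.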

Phase 3 · Weeks 6–12

Operationalize & prove

The operational table-stakes that make readiness durable.

Ship a single tested global kill-switch

Effort: M/L

Provide a single tested control that disables a tool, a server, or the whole MCP gateway without a redeploy, and add an enforced staged rollout (canary/blue-green) with a tested one-command rollback. Outcome: a misbehaving tool or the whole gateway can be stopped in seconds, verified by test.

Wire red-team / injection guardrails into a standing CI eval gate

Effort: M

Add a refusal-rate threshold on every tool/server PR. Outcome: injection, jailbreak, and tool-misuse regressions are caught before merge, not in production.
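A refusal-rate gate reduces to a ratio and a threshold. The sketch below assumes the eval harness yields one boolean per adversarial prompt (`True` when the gateway refused or blocked it); the 0.95 threshold and the function names are illustrative, not values from the assessment.

```python
from typing import Sequence


def refusal_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of adversarial prompts that were refused or blocked."""
    if not outcomes:
        raise ValueError("no eval outcomes; refusing to pass an empty run")
    return sum(outcomes) / len(outcomes)


def gate_passes(outcomes: Sequence[bool], threshold: float = 0.95) -> bool:
    """True if the run meets the threshold; CI would fail the PR otherwise."""
    return refusal_rate(outcomes) >= threshold


clean_run = [True] * 20
regressed_run = [True] * 18 + [False] * 2  # two prompts got through
```

`gate_passes(clean_run)` is true and `gate_passes(regressed_run)` is false, since 18 of 20 is a 0.9 refusal rate. Treating an empty run as an error keeps a misconfigured harness from passing the gate by default.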

Net effect: Phase 1 removes the only fail-open path and the floating supply chain. Phase 2 makes least-privilege and governance default-on. Phase 3 makes readiness durable with a tested stop lever and a standing safety gate.


A fixed-scope, read-only engagement delivering four artifacts designed to be read together, plus a live walkthrough.

Readiness Report

The full narrative: executive summary, scope, methodology, per-dimension findings, verdict, top risks, and prioritized recommendations.

Scored Gap Matrix

The one-screen view: all 7 dimensions with status, one-line finding, severity, effort, and a pointer to the roadmap item that closes each gap.

90-Day Roadmap

The sequenced fix plan, phased by risk-reduction-per-effort, with owners, effort, expected outcomes, and dependencies. Doubles as the scoping input for a fixed-price implementation engagement.

Evidence Index

Every finding cited to a specific artifact (file and line), with the observed behavior, the rubric line it matches, its color, and its severity. Every gap-matrix color traces back to at least one row here.

Plus a live review session

A walkthrough of the findings and roadmap with your engineering and security stakeholders, so the report lands as a shared plan, not a PDF on a shelf.


Free open-source tool

Want a free first read of your own stack?

This sample is a paid audit deliverable — specific findings, evidence index, cited roadmap. Before that, run the open-source scanner for a free heuristic read across all 7 dimensions. It's read-only, takes seconds, and tells you exactly where to look.

Run it on your repo

npx mcp-gateway-scan ./your-gateway

100% read-only · no network calls · secret values redacted (location only)


Ready to see your stack's real scores?

Book a 15-minute discovery call. We map your stack and scope a fixed-price audit — you decide whether to proceed with the number in front of you. No deck, no obligation.

Fixed scope · Fixed price · Written scope statement before any audit work begins