Sample MCP Gateway Readiness Audit

01

Executive Summary

Verdict: Production-Ready with caveats

This verdict applies to the gateway as written and assumes an operator who turns on the controls the platform provides.

This gateway is architecturally sound on the dimensions that matter most for safety. Authorization is enforced at the gateway on caller identity — resolved as a strict intersection of key, team, end-user, agent, and org permissions — and the model never decides what it may call. That single property is what defeats prompt-injection-to-tool-call. Secrets are referenced, never inlined. Identity is JWT/OIDC-rooted on the call path and propagates end-to-end. Cost is genuinely bounded — budget overruns raise an enforced exception rather than merely alerting.

The central risk is not a missing control but a defaulting one. Per-tool least-privilege and third-party-server pinning are opt-in, and one top-level authorization resolver path fails open on an unexpected exception. The highest-leverage response is to make that resolver path fail closed and to make the opt-in controls required at onboarding.

The two highest-leverage moves

1. Change the one fail-open authorization line

Return "no servers" (not "all allow-all servers") on an unexpected error, and add a regression test. Effort: small.

2. Pin and allowlist the third-party MCP catalog

Pin each server by version + digest, closing the floating-tag supply-chain exposure. Effort: medium.

02

Scope & Methodology

We assess 7 dimensions of MCP gateway readiness through read-only inspection — source code, configuration, CI workflows, and deploy artifacts at a pinned commit. Every finding cites a specific artifact (file and line) captured in an evidence index.

Read-only inspection

Source code, config, CI workflows, and deploy artifacts at a pinned commit. No write access, no system modification.

Evidence-backed findings

Every finding cites a specific artifact (file and line). Every gap-matrix color traces to at least one evidence-index row.

Stated limitations

Live checks that could not run are marked static-only. We report the limitation rather than implying coverage we don't have.

03

Scored Gap Matrix

All 7 dimensions at a glance. Status colors trace directly to the per-dimension findings in Section 4.

Scored Gap Matrix — Sample Assessment


#	Dimension	Status	Finding	Severity
01	Tool-access governance & RBAC	Partial	Strong gateway-enforced intersection RBAC — but per-tool least-privilege is opt-in.	High
02	Fail-close vs fail-open	Partial	Per-level authz failures fail closed; one top-level resolver path fails open to allow-all servers.	Critical
03	MCP / agent onboarding flow	Partial	Dual source-of-truth (declarative config and runtime DB); third-party servers unpinned; no tool-PR CI gate.	High
04	Observability & tracing	Production-grade	First-class OpenTelemetry + GenAI semconv; W3C trace-context extracted (propagation operator-configurable).	Medium
05	Multi-LLM routing & cost controls	Production-grade	Declarative routing; budget caps enforced (not alert-only); rate limits per key/model/MCP server.	Low
06	Security, secrets & identity (IDP)	Production-grade	Zero inline secrets; JWT/OIDC on call path; end-user identity propagates; OAuth token-exchange with audience+scope.	Low
07	Production-readiness gaps	Partial	Block-by-default guardrail, rate-limit 429s, alerting present — but no single tested global kill-switch or enforced canary.	Medium

Legend: Production-grade = control exists, is enforced, and is verifiable. Partial = intent exists but has gaps. Absent/unsafe = no effective control, or the control fails open.

04

Per-Dimension Findings

Dimension 01: Tool-Access Governance & RBAC

What we looked at

Where the authorization decision lives, whether the model ever participates in it, and how granular the grants are.

What we found

Authorization is enforced at the gateway on caller identity, resolved as a strict intersection of key → team → end-user → agent → org permissions. Deny-by-default holds for unmapped callers. Per-tool granularity exists, but it is opt-in per server: with no allowlist configured, the tool-permission check returns "allow."

Why it matters

The hard part — keeping the model out of the authorization decision — is done correctly. The residual risk is operator misconfiguration: a write or external tool on a server with no tool allowlist is callable by anyone with server access.
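The intersection rule and the opt-in tool check can be sketched as follows. This is a minimal illustration, not the gateway's code: the level names, `allowed_servers`, `tool_allowed`, and the `require_allowlist` flag are all hypothetical.

```python
from typing import Optional

# Hypothetical permission levels; the names mirror the report, not the gateway's code.
LEVELS = ("key", "team", "end_user", "agent", "org")


def allowed_servers(grants: dict[str, set[str]]) -> set[str]:
    """Strict intersection: a server is reachable only if every level grants it."""
    result: Optional[set[str]] = None
    for level in LEVELS:
        level_grant = grants.get(level, set())  # an unmapped level grants nothing
        result = set(level_grant) if result is None else result & level_grant
    return result or set()


def tool_allowed(tool: str, allowlist: Optional[set[str]], require_allowlist: bool) -> bool:
    """Per-tool check. With require_allowlist=False this mirrors the opt-in
    behavior described above: no allowlist configured means allow."""
    if allowlist is None:
        return not require_allowlist
    return tool in allowlist
```

With `require_allowlist=True`, a server that has no allowlist denies every tool, which is the default-on behavior the roadmap recommends.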

Dimension 02: Fail-Close vs Fail-Open

What we looked at

Every exception handler on the authorization and call path — does a degraded check deny or allow?

What we found

Every per-level permission resolver fails closed. One exception: the top-level policy resolver returns the set of allow-all servers on an unexpected error — a partial fail-open, bounded to servers an operator already marked public.

Why it matters

Fail-open in an authorization resolver is the single highest-risk class — it's how a degraded check silently becomes "allow." Here it is bounded, but it is still the one line that errs toward exposure. It is also a one-line fix.
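A sketch of the fail-closed shape and its regression test, under stated assumptions: `resolve_servers` and the `lookup` callable are hypothetical stand-ins for the gateway's top-level resolver, whose real signature we do not reproduce here.

```python
import logging

logger = logging.getLogger("authz")


def resolve_servers(caller_id: str, lookup) -> frozenset[str]:
    """Return the servers the caller may reach; on any unexpected error, return none."""
    try:
        return frozenset(lookup(caller_id))
    except Exception:
        # Fail closed: an empty set denies. The fail-open variant would return
        # the allow-all (public) server set here instead.
        logger.exception("authz resolver failed for %s; denying", caller_id)
        return frozenset()


def test_resolver_denies_on_error():
    """Regression test: an exception in the policy lookup must yield deny."""
    def broken_lookup(_caller_id):
        raise RuntimeError("policy store unavailable")

    assert resolve_servers("key-123", broken_lookup) == frozenset()
```

The test pins the behavior so that a later refactor cannot quietly reintroduce the allow-all fallback.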

Dimension 03: MCP / Agent Onboarding Flow

What we looked at

How servers and tools are registered, whether the running tool set is reconstructable from source control, and whether onboarding enforces governance.

What we found

A declarative config block exists, but MCP servers can also be created at runtime via an authenticated REST endpoint that writes to a database — a dual, mutable source of truth. The curated third-party catalog references servers via unpinned floating-tag commands with no version, digest, or checksum.

Why it matters

You cannot fully reconstruct the live tool set from source control when servers can be added via the API, and an unpinned third-party server is one tampered package away from a supply-chain incident (OWASP LLM05 in the 2023 list; LLM03 in the 2025 list).
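One way to detect unpinned catalog entries is a simple reference check. The sketch below assumes, purely for illustration, that each entry carries a container-style `image` reference of the form `repo:version@sha256:<digest>`; the audited catalog's real entry shape may differ, and `unpinned_entries` is a hypothetical helper.

```python
import re

# Assumed reference shape: repo:version@sha256:<64 hex chars>, with "latest" rejected.
PINNED_REF = re.compile(r"^[\w./:-]+:(?!latest@)[\w.-]+@sha256:[0-9a-f]{64}$")


def unpinned_entries(catalog: list[dict]) -> list[str]:
    """Return the names of catalog entries not pinned by version and digest."""
    return [
        entry["name"]
        for entry in catalog
        if not PINNED_REF.match(entry.get("image", ""))
    ]
```

Run against the catalog in CI, a non-empty result would fail the build, so a floating tag never reaches a deployable branch.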

Dimension 04: Observability & Tracing

What we looked at

Whether per-model/token/cost attribution and end-to-end request reconstruction are possible, and whether trace context survives hops.

What we found

OpenTelemetry is a first-class integration with dedicated GenAI semantic-convention mapping (operation name, token usage, cache metrics), multiple exporters, and inbound W3C traceparent extraction.

Why it matters

The building blocks for end-to-end reconstruction and per-team cost attribution are present and standards-aligned. Caveat: context propagation into every MCP hop is operator-configurable (not guaranteed by default).
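To make the propagation caveat concrete, here is a standard-library sketch of what carrying W3C trace context into an MCP hop involves. It is hand-rolled for illustration (`child_traceparent` is a hypothetical name, and marking a new trace as sampled is an assumption); in practice an OpenTelemetry propagator would normally do this work.

```python
import re
import secrets
from typing import Optional

# traceparent = version "-" trace-id "-" parent-id "-" trace-flags (lowercase hex)
TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def child_traceparent(inbound: Optional[str]) -> str:
    """Continue the caller's trace if the header is valid, else start a new one."""
    match = TRACEPARENT.match(inbound or "")
    valid = (
        match is not None
        and match.group(1) != "ff"          # version ff is invalid
        and set(match.group(2)) != {"0"}    # all-zero trace-id is invalid
        and set(match.group(3)) != {"0"}    # all-zero parent-id is invalid
    )
    if valid:
        trace_id, flags = match.group(2), match.group(4)
    else:
        trace_id, flags = secrets.token_hex(16), "01"  # assumption: sample new traces
    # A fresh span id becomes the parent-id seen by the next hop.
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

If the outbound header is not set on the MCP request, the downstream server starts an unrelated trace and end-to-end reconstruction breaks at that hop.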

Dimension 05: Multi-LLM Routing & Cost Controls

What we looked at

Whether routing is a declarative policy and whether cost is enforced or merely alerted.

What we found

Routing is a central declarative mapping from virtual model names to physical deployments, with router-level retries and timeouts. Budget caps are enforced — overruns raise a budget-exceeded exception, not just an alert.

Why it matters

This is the dimension the platform is purpose-built for, and it shows: there is a real, enforced path against bill-shock and cost-based DoS (OWASP LLM10:2025, Unbounded Consumption).
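The difference between an enforced cap and an alert-only cap is whether the request is rejected. A minimal sketch, with hypothetical names throughout (`Budget`, `BudgetExceededError`); the report only establishes that the gateway raises a budget-exceeded exception, not what it is called.

```python
class BudgetExceededError(Exception):
    """Hypothetical exception name; stands in for the gateway's budget error."""


class Budget:
    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Reject the request rather than record spend that would exceed the cap."""
        if self.spent_usd + cost_usd > self.max_usd:
            raise BudgetExceededError(
                f"budget {self.max_usd:.2f} USD exceeded: "
                f"{self.spent_usd:.2f} spent, {cost_usd:.2f} requested"
            )
        self.spent_usd += cost_usd
```

An alert-only design would log at the same point and still let the call through, which is what leaves a cost-based DoS unbounded.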

Dimension 06: Security, Secrets & Identity (IDP)

What we looked at

Whether secrets are inlined, where identity is rooted, and whether the MCP token model matches the spec.

What we found

No inline secret values. Identity is JWT/OIDC-rooted on the gateway call path, and the end-user identity propagates through to MCP handling and spend logs. MCP tokens use OAuth token-exchange (RFC 8693) with audience and scope binding.

Why it matters

This is the strongest dimension — the secure defaults are in the code, and the token model is the one the MCP spec asks for. Who-can-do-what is answerable and revocable at the IDP.
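For readers unfamiliar with RFC 8693, the request body of a token exchange looks roughly like this. The grant-type and token-type URNs are the ones the RFC defines; the function name, the example audience, and the scope strings are illustrative assumptions, not values taken from the audited gateway.

```python
from urllib.parse import urlencode

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def token_exchange_body(subject_token: str, audience: str, scopes: list[str]) -> str:
    """Form-encoded body for an RFC 8693 token-exchange request."""
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,        # binds the new token to one MCP server
        "scope": " ".join(scopes),   # space-delimited, narrowed to what the call needs
    })
```

Binding audience and scope at exchange time means a token issued for one MCP server is not usable at another, even if it leaks.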

Dimension 07: Production-Readiness Gaps

What we looked at

Guardrails, alerting, the ability to stop a misbehaving tool, and staged-rollout discipline.

What we found

A dedicated MCP security guardrail defaults to block (not alert). Slack/email/Prometheus alerting ships. Tools and servers can be disabled via config without a binary redeploy. Helm chart, hardened compose, and Terraform ship for staged deploy.

Why it matters

The operational levers exist, so an incident is recoverable — but with friction. There is no single tested global kill-switch surfaced in code, no enforced canary/blue-green, and the red-team guardrails are available but operator-wired, not a standing CI gate.
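A single stop lever could take roughly the following shape. This is a sketch under assumptions: `KillSwitch` is a hypothetical in-memory stand-in for whatever flag store the gateway would re-read at runtime, and none of these names come from the audited code.

```python
class KillSwitch:
    """In-memory stand-in for a flag store that is re-read without a redeploy."""

    def __init__(self):
        self.global_off = False
        self.disabled_servers: set[str] = set()
        self.disabled_tools: set[tuple[str, str]] = set()

    def permits(self, server: str, tool: str) -> bool:
        """Checked on every call, before dispatch to the MCP server."""
        if self.global_off or server in self.disabled_servers:
            return False
        return (server, tool) not in self.disabled_tools
```

The point of a tested kill-switch is that all three scopes (tool, server, whole gateway) go through one check that is exercised in CI, rather than three separate config edits discovered during an incident.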

05

Verdict & Top Risks

Top risks — ranked by impact

#	Risk	Dim	Likelihood	Impact
1	Authorization resolver returns the allow-all set on an unexpected error (a bounded fail-open path)	2	Low	High
2	Unpinned floating-tag third-party MCP servers: supply-chain exposure (OWASP LLM05 in the 2023 list; LLM03 in the 2025 list)	3	Medium	High
3	Per-tool least-privilege off by default: a write/external tool is callable by any key with server access	1	Medium	Medium
4	Running tool set can drift from source control via runtime DB writes	3	Medium	Medium
5	No single tested global kill-switch / no enforced staged rollout	7	Medium	Medium

06

90-Day Roadmap

A sequenced fix plan ordered by risk-reduction-per-effort. Each item traces to a Partial (yellow) finding in the gap matrix and its cited evidence.

Phase 1 Weeks 0–2

Launch-blocking / highest-leverage

Return an empty set (deny) instead of the allow-all set on an unexpected exception; add a regression test that asserts deny-on-error.

Replace every floating-tag command with a version + digest pin and allowlist the pinned set, so that no floating-tag MCP server can resolve.

Phase 2 Weeks 2–6

Default-on the controls

Require a per-server tool allowlist so the permission check denies absent an explicit grant; gate the "allow all keys" foot-gun behind an audited override.

Add a CI gate that fails any tool/server PR unless a tool allowlist, RBAC grant, rate limit, and pinned source are present, and resolve the dual source of truth so the running tool set is reconstructable from source control.
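The onboarding gate in Phase 2 amounts to a required-fields check on each server definition. A minimal sketch, assuming a hypothetical manifest dictionary whose field names (`tool_allowlist`, `rbac_grants`, `rate_limit`, `pinned_source`) are invented here for illustration:

```python
# Hypothetical manifest field names, one per control the gate requires.
REQUIRED_FIELDS = ("tool_allowlist", "rbac_grants", "rate_limit", "pinned_source")


def onboarding_violations(manifest: dict) -> list[str]:
    """List the required governance fields that are missing or empty."""
    return [field for field in REQUIRED_FIELDS if not manifest.get(field)]
```

A CI job would call this for every server definition touched by the PR and exit non-zero if any list is non-empty. Treating an empty allowlist the same as a missing one is deliberate: both leave the server without an explicit grant.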

Phase 3 Weeks 6–12

Operationalize & prove

Ship a tested kill-switch that disables a tool, a server, or the whole MCP gateway without a redeploy, plus an enforced staged rollout (canary/blue-green) with a tested one-command rollback.

Run the red-team guardrails as a standing CI gate with a refusal-rate threshold on every tool/server PR, so that injection, jailbreak, and tool-misuse regressions are caught before merge.
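A refusal-rate gate reduces to one comparison over the red-team run's outcomes. The sketch below is illustrative only: the function names are hypothetical and the 0.95 default threshold is an assumed placeholder, not a value recommended by this assessment.

```python
def refusal_rate(outcomes: list[bool]) -> float:
    """Fraction of adversarial cases the gateway refused (True = refused)."""
    if not outcomes:
        # An empty run must not count as a pass.
        raise ValueError("no red-team cases ran; refusing to report a pass")
    return sum(outcomes) / len(outcomes)


def gate_passes(outcomes: list[bool], threshold: float = 0.95) -> bool:
    """True when the observed refusal rate meets the (assumed) threshold."""
    return refusal_rate(outcomes) >= threshold
```

Raising on an empty result set matters: a gate that passes when the suite silently fails to run is itself a fail-open control.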

Net effect: Phase 1 removes the only fail-open path and the floating supply chain. Phase 2 makes least-privilege and governance default-on. Phase 3 makes readiness durable with a tested stop lever and a standing safety gate.

07

What You Receive

A fixed-scope, read-only engagement delivering four artifacts designed to be read together, plus a live walkthrough.

Readiness Report

The full narrative: executive summary, scope, methodology, per-dimension findings, verdict, top risks, and prioritized recommendations.

Scored Gap Matrix

All 7 dimensions with status, one-line finding, severity, effort, and a pointer to the roadmap item that closes each gap.

90-Day Roadmap

The sequenced fix plan, phased by risk-reduction-per-effort, with owners, effort, expected outcomes, and dependencies.

Evidence Index

Every finding cited to a specific artifact (file and line), with the observed behavior, the rubric line it matches, its color, and its severity.

Plus a live review session

A walkthrough of the findings and roadmap with your engineering and security stakeholders, so the report lands as a shared plan, not a PDF on a shelf.
