QUANTA
by Ryshe
White Paper V · June 2026 · 19 min read

The Architecture of Secure AI Context Control

A reference architecture for an Azure-native, customer-owned AI context gateway

Ryshe · AI, Cloud & Security

Abstract

Every prompt an enterprise sends to a language model carries proprietary context: customer records, internal documents, system instructions, and the reasoning of agents acting on the business. A context gateway sits on that path, so it inherits the strongest trust requirement in the stack. This paper specifies a reference architecture for that gateway as an Azure-native system that can run inside the customer's own subscription, under the customer's own identity and keys, with an append-only record of everything it does. It covers design goals, placement and data flow, the Azure building blocks, tenancy models, identity and network controls, data handling, auditability, a threat model, deployment, managed operation, and compliance alignment.

Key takeaways
  1. The gateway is the most privileged component in the stack, so it must be designed for security before features.
  2. Four properties are treated as non-negotiable: control, privacy, auditability, and no unnecessary data egress.
  3. An Azure-native design maps each concern to a managed service, reducing the surface the customer must operate.
  4. A customer-owned, in-subscription deployment keeps data residency, key custody, and isolation under the customer's control.
  5. Identity and Key Vault replace stored secrets; private endpoints keep model and service traffic off the public internet.
  6. An append-only, hash-chained audit log turns 'what did the AI send' from a question into exportable evidence.
  7. Ryshe operates the gateway inside the customer tenant against published service levels, without taking custody of customer data.

Executive summary

A context gateway is the most privileged component in an enterprise AI stack. Everything sent to a model passes through it, which means every secret, every customer record, and every internal instruction passes through it too. That privilege is the reason the gateway must be designed for security first and features second. An architecture that compresses prompts brilliantly but cannot prove what it forwarded, or that quietly copies sensitive context out of the customer's control, has solved the wrong problem.

This paper treats four properties as non-negotiable: control over what is sent to which model, privacy of the context in transit and at rest, auditability of every decision, and no unnecessary movement of data out of the customer's trust boundary. Each is stated as a design goal that constrains the architecture, not a feature that can be toggled off. The Azure-native building blocks, the tenancy models, and the operational model all follow from those constraints rather than the other way around.

  • 100% of requests pass through the gateway, so it is the natural place to enforce and record policy (by construction)
  • 0 secrets stored in application configuration; identity and Key Vault replace them (target state)
  • 1 append-only, hash-chained audit log per environment, exportable as evidence (illustrative design)

The result is a gateway that an enterprise can deploy inside its own Azure subscription, bind to its own Entra ID tenant, and back with its own Key Vault, while Ryshe operates it as a managed service that never requires data to leave the customer boundary. The remainder of this paper specifies how each layer is built and why each choice serves the four design goals.

I. Design goals

A reference architecture is only as good as the constraints it refuses to relax. Before naming a single service, the gateway commits to four properties that hold on every request and that no feature is allowed to compromise. Everything downstream is a consequence of these goals, and any design choice that violates one is rejected regardless of the convenience it offers.

Control over what is sent

The gateway decides, per request, which model receives which context. Policy can route a workload to an approved model, strip fields a workload is not permitted to send, cap the context a caller may forward, and refuse requests that fall outside a workload's declared scope. Control is enforced at the point the request is made, not described in a document that applications are trusted to honor.
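The per-request checks described above can be sketched in a few lines. This is a minimal illustration rather than the gateway's actual policy engine: the `WorkloadPolicy` fields, the `enforce` function, and the character-based context cap are all hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass


class PolicyViolation(Exception):
    """Raised when a request falls outside the workload's declared scope."""


@dataclass(frozen=True)
class WorkloadPolicy:
    # Hypothetical policy shape, chosen for illustration only.
    allowed_models: frozenset
    blocked_fields: frozenset = frozenset()
    max_context_chars: int = 8_000


def enforce(policy: WorkloadPolicy, model: str, context: dict) -> dict:
    """Return the context the workload may forward, or raise PolicyViolation."""
    # Route only to approved models.
    if model not in policy.allowed_models:
        raise PolicyViolation(f"model {model!r} is not approved for this workload")
    # Strip fields the workload is not permitted to send.
    permitted = {k: v for k, v in context.items() if k not in policy.blocked_fields}
    # Cap the context; refuse rather than silently truncate.
    size = sum(len(v) for v in permitted.values())
    if size > policy.max_context_chars:
        raise PolicyViolation(f"context of {size} chars exceeds cap of {policy.max_context_chars}")
    return permitted
```

Refusing an oversized request, rather than truncating it, is one possible design choice; it keeps the decision visible to the caller and to the audit record.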

Privacy of context

Context is sensitive by default. It is encrypted in transit to the gateway and onward to the provider, encrypted at rest wherever it is persisted, and redacted at the edge when a workload's policy requires it. The default posture is to capture as little as a workload needs and to keep what is captured inside the customer's trust boundary.

Auditability of decisions

Every consequential action the gateway takes is recorded in a way that can be replayed and verified: which policy applied, what was compressed, which model was chosen, and what was forwarded. The record is append-only and tamper-evident, so an auditor can trust not only the contents but the claim that the contents were not altered after the fact.

No unnecessary data egress

Data does not leave the customer's trust boundary unless the workload's purpose requires it. Calling a model is a deliberate egress the workload asked for. Copying prompts to a vendor's telemetry system, a shared analytics store, or a multi-tenant database is not, and the architecture is built so those copies never need to happen.

Goals constrain features, not the reverse

Each section that follows is written to satisfy these four goals. When a tradeoff appears, the goal wins. A gateway that cannot prove what it sent, or that moves data it did not need to move, has failed at its primary job no matter how well it performs the secondary ones.

II. Placement and data flow

The gateway sits on the one path that every AI request must cross: between the applications, agents, and copilots that originate context and the model providers that consume it. Because all context passes through this single point, it is the natural place to enforce policy, apply compression, and record evidence. Nothing reaches a model without passing the gateway, and nothing the gateway does is invisible to the audit log.

Request lifecycle through the context gateway
Apps, agents, and copilots send a request to the gateway over a private endpoint. The gateway authenticates the caller through Entra ID, resolves the workload's policy, redacts and compresses the context, then forwards to Azure OpenAI or another approved provider over a private path. The response is scored against the workload's evaluation set, the full decision is written to the append-only audit log, and the result is returned to the caller. Apps -> Gateway (authenticate -> policy -> redact -> compress -> forward) -> Provider -> Gateway (score -> log -> return) -> Apps.

The request lifecycle, step by step

A single request moves through a fixed sequence of stages. Each stage is observable, each is governed by the workload's policy, and each contributes to the record written at the end. The order matters: authentication and policy precede any handling of payload, and logging happens before the response is returned so that no forwarded request can escape the record.

  1. Authenticate. The caller presents an Entra ID token. The gateway validates it and resolves the caller to a workload identity with an associated policy. Unauthenticated or unscoped requests are rejected before any payload is read.
  2. Apply policy. The workload's policy determines the permitted models, the fields that must be redacted, the maximum context size, and whether payload capture is enabled. Policy is resolved per request, so a change takes effect on the next call.
  3. Redact and compress. Sensitive fields are masked at the edge per policy, then compression reduces the context to what the task needs, behind an evaluation gate so quality is preserved.
  4. Forward. The gateway calls the approved provider over a private path, using a managed identity or a Key Vault-held credential. The raw provider credential is never exposed to the calling application.
  5. Score. The response is evaluated against the workload's quality checks where configured, producing a pass or fail signal that travels with the record.
  6. Log. The policy decision, the compression result, the model chosen, the token counts, and a hash of what was forwarded are written to the append-only audit log before the response leaves the gateway.
  7. Return. The completion is returned to the caller. From the application's point of view the gateway is a drop-in endpoint; everything above happened in the path of a single call.

Because the lifecycle is fixed and every stage is recorded, the gateway can answer the question that matters most to a regulated enterprise after the fact: for any request, what context reached which model, under which policy, and was the result within quality bounds.
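The fixed ordering of the lifecycle can be made concrete with a small sketch. Each stage is passed in as a plain function, so the code shows only the sequencing: authentication and policy come before any payload handling, and the audit entry is written before the completion is returned. The function name `handle_request`, the stage signatures, and the `policy["model"]` key are assumptions made for this illustration, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from typing import Callable


def handle_request(
    token: str,
    context: str,
    *,
    authenticate: Callable[[str], str],      # token -> workload id; raises on failure
    resolve_policy: Callable[[str], dict],   # workload id -> policy
    redact: Callable[[str, dict], str],
    compress: Callable[[str, dict], str],
    forward: Callable[[str, dict], str],     # calls the approved provider
    score: Callable[[str, dict], bool],
    audit_log: list,
) -> str:
    workload = authenticate(token)                         # 1. before any payload is read
    policy = resolve_policy(workload)                      # 2. resolved per request
    prepared = compress(redact(context, policy), policy)   # 3. redact, then compress
    completion = forward(prepared, policy)                 # 4. provider call
    passed = score(completion, policy)                     # 5. quality signal
    audit_log.append({                                     # 6. log before returning
        "workload": workload,
        "model": policy["model"],
        "payload_hash": hashlib.sha256(prepared.encode("utf-8")).hexdigest(),
        "passed": passed,
    })
    return completion                                      # 7. return to the caller
```

Because the log append sits between scoring and the return statement, no code path in this sketch returns a completion that was forwarded but not recorded.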

III. Azure-native building blocks

The gateway is assembled from managed Azure services rather than bespoke infrastructure, so the customer operates as little undifferentiated plumbing as possible. Each concern in the architecture maps to a service chosen for its fit with the four design goals, its support for private networking and managed identity, and its place in the Well-Architected Framework. The mapping below is the canonical reference for what runs where.

| Concern | Azure service | Role in the gateway |
| --- | --- | --- |
| Gateway runtime | Azure Container Apps | Runs the stateless request path that authenticates, applies policy, compresses, and forwards |
| Admin console | Azure App Service | Hosts the operator and customer console for policy, dashboards, and audit export |
| Identity | Microsoft Entra ID | Authenticates callers and operators, issues workload identities, enforces RBAC |
| Secrets | Azure Key Vault | Holds provider credentials and signing material; nothing sensitive lives in config |
| Network | Azure Private Endpoints | Keeps gateway, provider, and data traffic on private paths off the public internet |
| Telemetry | Application Insights / Log Analytics | Captures metrics, traces, and the audit stream within the customer workspace |
| Retrieval | Azure AI Search | Serves grounding passages for retrieval workloads under the same policy and logging |
| Background jobs | Azure Durable Functions | Runs evaluation, purge, audit export, and long compression tasks out of the request path |
| Infrastructure | Bicep | Declares every resource as code so an environment is reproducible and reviewable |
| Delivery | GitHub Actions | Builds, tests, and deploys the gateway with environment gates and rollback |

Mapping of architectural concern to Azure service.

Why managed services, and why these

Each choice reduces the customer's operational surface and inherits Azure's own controls. Container Apps gives a scalable, stateless runtime without a cluster to patch. Key Vault and Entra ID remove stored secrets from application configuration entirely. Private Endpoints keep traffic on the Azure backbone. Application Insights and Log Analytics keep telemetry in a workspace the customer owns. The common thread is that the gateway composes services the enterprise already trusts and already audits, rather than asking it to trust new infrastructure.

IV. Tenancy

The single most consequential deployment decision is where the gateway runs. Two models are supported. In the managed single-tenant model, Ryshe runs a dedicated instance for the customer in Ryshe's own subscription. In the customer-owned model, the same software runs inside the customer's Azure subscription, bound to the customer's Entra ID tenant and Key Vault, and Ryshe operates it there. Both are single-tenant; they differ in who holds the subscription and the keys.

| Dimension | Managed single-tenant | Customer-owned in-subscription |
| --- | --- | --- |
| Data residency | Ryshe-selected region | Customer-chosen region in customer subscription |
| Key custody | Ryshe-held Key Vault | Customer-held Key Vault |
| Isolation | Dedicated instance, Ryshe boundary | Full isolation within customer boundary |
| Onboarding time | Days | One to three weeks |
| Operations | Ryshe-operated end to end | Ryshe-operated inside customer tenant |
| Data egress to Ryshe | Telemetry within Ryshe boundary | None required |

Comparison of the two tenancy models.

When each fits

The managed model suits teams that want value quickly and are comfortable with a dedicated instance in Ryshe's boundary, often for early adoption or lower-sensitivity workloads. The customer-owned model suits regulated enterprises that require data residency, key custody, and isolation to stay inside their own subscription. For those organizations the gateway is software they own and Ryshe operates, not a service they send data to. The architecture is identical across both; only the boundary moves.

V. Identity, secrets, and network

The three controls that most determine whether a gateway can be trusted are how it knows who is calling, how it holds its secrets, and how its traffic flows. The architecture takes the strict position on all three: identity is always Entra ID, secrets always live in Key Vault, and service traffic always travels private paths.

Entra ID for sign-in and authorization

Operators and console users sign in through Entra ID single sign-on, inheriting the enterprise's own conditional access, multi-factor, and lifecycle controls. Authorization is role-based: who may edit policy, who may read audit logs, and who may export evidence are distinct roles, granted through the customer's existing identity governance rather than a separate user store the customer must manage.

Managed identity for service-to-service

Components authenticate to each other and to Azure resources with managed identities rather than shared keys. The gateway reads from Key Vault, writes to Log Analytics, and queries AI Search as itself, with access granted through Azure RBAC. There is no service account password to rotate because there is no password.

Key Vault for every secret

Provider credentials, signing keys for the audit chain, and any other sensitive material live in Key Vault, retrieved at runtime through managed identity and never written to configuration, environment files, or images. Rotation is a Key Vault operation that the running gateway picks up without redeployment.
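One way a running process could pick up a rotated secret without redeployment is to hold it in memory for a short period and re-fetch it when that period lapses. The sketch below shows only that caching logic and is an assumption about mechanism, not a description of the gateway's implementation. `RefreshingSecret` and its parameters are hypothetical; in a real deployment the `fetch` callable might wrap a Key Vault read made under a managed identity, for example `SecretClient.get_secret` from the `azure-keyvault-secrets` package.

```python
import time
from typing import Callable, Optional


class RefreshingSecret:
    """Caches a secret in memory and re-fetches it once ttl_seconds have elapsed."""

    def __init__(
        self,
        fetch: Callable[[], str],                       # hypothetical: reads from the secret store
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,    # injectable for testing
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Fetch on first use, and again whenever the cached copy is older than the TTL.
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

The secret lives only in process memory in this sketch; nothing is written to configuration, environment files, or images.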

Private endpoints and no public exposure

The gateway reaches Azure OpenAI, AI Search, Key Vault, and storage over private endpoints, so that traffic stays on the Azure backbone and never traverses the public internet. The administrative surface is restricted to the customer's network. The result is that the sensitive paths, context to model and gateway to secret store, are private by construction.

No secrets in config, ever

A secret in an environment variable or a checked-in file is a secret that will eventually leak through a log, a backup, or a screen share. The architecture removes the category: identity replaces credentials for service-to-service calls, and Key Vault holds the few credentials that must exist. If a design needs a secret in config to work, it is the wrong design.

VI. Data handling and retention

Because the gateway sees the most sensitive part of every request, how it handles and retains payload is a first-order security decision, not an afterthought. The default is to capture as little as a workload needs, to redact at the edge before anything is stored, and to keep retention short and purge enforceable.

Payload capture is a policy choice

Capturing full request and response bodies is useful for debugging and for evaluation, and risky for privacy. The architecture makes it a per-workload setting with three positions: capture nothing beyond metadata, capture hashes and token counts only, or capture full payloads for a bounded window. A workload defaults to the least capture that lets it function, and raising capture is an audited change.
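The three capture positions map naturally onto an enumeration. The following sketch is illustrative only: the `CaptureMode` names, the record keys, and the use of SHA-256 are assumptions introduced here.

```python
import hashlib
from enum import Enum


class CaptureMode(Enum):
    METADATA_ONLY = "metadata_only"
    HASHES_AND_COUNTS = "hashes_and_counts"
    FULL_PAYLOAD = "full_payload"


def build_capture(mode: CaptureMode, workload: str, prompt: str, token_count: int) -> dict:
    """Build the stored record for one request under the workload's capture mode."""
    record = {"workload": workload}  # metadata is always kept
    if mode in (CaptureMode.HASHES_AND_COUNTS, CaptureMode.FULL_PAYLOAD):
        record["payload_hash"] = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        record["token_count"] = token_count
    if mode is CaptureMode.FULL_PAYLOAD:
        record["payload"] = prompt  # kept only for the workload's bounded window
    return record
```

For example, `build_capture(CaptureMode.METADATA_ONLY, "wl-1", "hello", 3)` returns `{"workload": "wl-1"}`, with no trace of the prompt.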

Redaction at the edge

When a workload's policy names sensitive fields, the gateway masks them before the context is forwarded and before any capture is written. Redaction happens at the edge of the gateway, so a masked value never reaches storage in the clear, and the audit record notes that redaction occurred without preserving what was removed.
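A field-level version of this masking step might look like the sketch below. It assumes the context arrives as a flat mapping of field names to strings, which is a simplification; `redact_fields` and the mask token are hypothetical. The function returns the names of the masked fields so the audit record can note that redaction occurred without keeping the removed values.

```python
def redact_fields(context: dict, sensitive: set, mask: str = "[REDACTED]") -> tuple:
    """Mask sensitive fields; return the masked context and the names that were masked."""
    masked = {k: (mask if k in sensitive else v) for k, v in context.items()}
    # Only field names are reported; the original values are not retained anywhere.
    redacted_names = sorted(k for k in context if k in sensitive)
    return masked, redacted_names
```

Calling `redact_fields({"name": "Ada", "ssn": "123-45-6789"}, {"ssn"})` returns `({"name": "Ada", "ssn": "[REDACTED]"}, ["ssn"])`.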

Retention and purge

Captured payloads carry a retention window after which a Durable Functions job purges them. Purge is enforced by the platform, not by operator discipline, and the purge action is itself recorded so an auditor can confirm that data which should be gone is gone. Audit log entries, which contain hashes rather than payloads, are retained far longer than payloads because they hold no sensitive content.
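The purge logic itself is simple, and a sketch makes the "purge is recorded" property explicit. Here an in-memory dictionary of capture timestamps stands in for the payload store and a list stands in for the audit log; both are stand-ins for illustration, and this is not Durable Functions code. The name `purge_expired` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(payloads: dict, retention: timedelta, now: datetime, audit_log: list) -> list:
    """Delete payloads captured at least `retention` ago and record the purge."""
    expired = sorted(
        pid for pid, captured_at in payloads.items() if now - captured_at >= retention
    )
    for pid in expired:
        del payloads[pid]
    if expired:
        # The purge action is itself written to the audit log.
        audit_log.append({"action": "purge", "at": now.isoformat(), "payload_ids": expired})
    return expired
```

The audit entry lists identifiers only, so it can outlive the payloads it refers to.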

Data classification

Workloads declare the classification of the context they send, and policy can bind handling rules to classification: a workload tagged for regulated data can be forced to redaction-on and full-capture-off regardless of the developer's preference. Classification turns an organizational policy into an enforced gateway behavior.
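Binding handling rules to classification amounts to letting the classification override whatever the workload requested. The sketch below assumes a single hypothetical label, `"regulated"`, and a two-flag `Handling` record; real classification schemes will be richer.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Handling:
    redaction: bool
    full_capture: bool


# Hypothetical rule table: classification label -> settings that are forced.
FORCED_BY_CLASSIFICATION = {
    "regulated": {"redaction": True, "full_capture": False},
}


def effective_handling(classification: str, requested: Handling) -> Handling:
    """Apply forced settings for the classification on top of the requested handling."""
    return replace(requested, **FORCED_BY_CLASSIFICATION.get(classification, {}))
```

With this rule table, `effective_handling("regulated", Handling(redaction=False, full_capture=True))` yields `Handling(redaction=True, full_capture=False)`, while an unlisted classification leaves the request unchanged.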

VII. Auditability

Auditability is the property that distinguishes a governed gateway from a fast one. It is not enough to log what happened; the log must be trustworthy enough that an auditor will accept it as evidence that nothing was altered after the fact. The architecture achieves this with an append-only, hash-chained record mirrored to immutable storage.

How the hash chain works

Each audit entry includes a cryptographic hash of its own contents combined with the hash of the entry before it. Because every entry's hash depends on all prior entries, changing or removing any past entry changes every hash that follows, which is detectable by recomputing the chain. The first entry anchors the chain, and the latest hash is a compact fingerprint of the entire history. Tampering is not prevented by access control alone; it is made evident by mathematics.

| Field | Contents | Sensitive? |
| --- | --- | --- |
| Sequence | Monotonic entry number | No |
| Timestamp | Server time of the decision | No |
| Workload | Caller and policy identifiers | No |
| Decision | Model chosen, redaction and compression applied | No |
| Payload hash | Hash of forwarded context, not the context | No |
| Token counts | Input and output token totals | No |
| Prev hash | Hash of the previous entry | No |
| Entry hash | Hash of this entry plus prev hash | No |

Anatomy of an append-only audit entry.
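A hash chain of this shape can be sketched in a few lines. The paper does not name a hash function or serialization, so SHA-256, canonical JSON, and an all-zero anchor for the first entry are assumptions made here, as are the function names.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed anchor used as prev_hash for the first entry


def entry_hash(sequence: int, body: dict, prev_hash: str) -> str:
    """Hash this entry's contents together with the previous entry's hash."""
    canonical = json.dumps({"sequence": sequence, "body": body},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(chain: list, body: dict) -> dict:
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    sequence = len(chain)
    entry = {
        "sequence": sequence,
        "body": body,
        "prev_hash": prev_hash,
        "entry_hash": entry_hash(sequence, body, prev_hash),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash from the anchor; any edit or mid-chain removal fails."""
    prev_hash = GENESIS
    for i, entry in enumerate(chain):
        expected = entry_hash(i, entry["body"], prev_hash)
        if (entry["sequence"] != i or entry["prev_hash"] != prev_hash
                or entry["entry_hash"] != expected):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Recomputation alone cannot detect removal of the most recent entries, because a truncated chain is still internally consistent. Catching that requires comparing the latest hash against a copy held elsewhere, which is one reason to keep a separate immutable mirror of the chain.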

WORM mirroring and exportable evidence

The chain is mirrored to write-once, read-many (WORM) storage with an immutability policy, so that even an operator with broad permissions cannot rewrite history within the retention window. The log is exportable as a self-contained evidence bundle: the entries, the chain, and the verification method, which an external auditor can validate independently without access to the running system. The record contains hashes and metadata, not payloads, so it can be shared without exposing the context it describes.

VIII. Threat model and mitigations

A security architecture is judged by the threats it names and the controls it places against them, not by general assurances. The gateway's position on the request path makes it a high-value target, so the threat model is explicit and each threat is paired with a concrete control already present in the architecture above.

| Threat | Control |
| --- | --- |
| Prompt injection | Policy-scoped tools, output evaluation, and forwarded-context logging for after-the-fact review |
| Data exfiltration | Edge redaction, no unnecessary egress, private endpoints, and least-capture defaults |
| Secret leakage | Key Vault for all secrets, managed identity for service calls, no secrets in config |
| Supply chain | Bicep-declared infrastructure, pinned dependencies, signed builds through GitHub Actions |
| Insider misuse | Entra ID RBAC, separation of policy and audit roles, append-only hash-chained log |
| Denial of wallet | Per-workload context caps, rate limits, and spend attribution to halt runaway cost |

Threats to the gateway and the controls that address them.

Reading the model

Two entries deserve emphasis. Denial of wallet, where an attacker or a misconfigured agent drives cost through unbounded calls or oversized context, is a threat characteristic of metered, pay-per-token AI services, and the gateway is the natural place to cap it because it is the one component that sees every call. Insider misuse is addressed less by prevention than by evidence: the append-only log means that a privileged action leaves a trace that the actor cannot quietly erase, which changes the calculus for anyone considering misuse.
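A per-workload spend cap, the core of the denial-of-wallet control, can be sketched as a simple counter. `TokenBudget` and its methods are hypothetical names; the sketch tracks a single budget period and leaves out period resets and rate limiting.

```python
class TokenBudget:
    """Tracks token spend per workload against a fixed cap for the current period."""

    def __init__(self, caps: dict) -> None:
        self._caps = dict(caps)
        self._spent = {workload: 0 for workload in caps}

    def try_spend(self, workload: str, tokens: int) -> bool:
        cap = self._caps.get(workload)
        # Refuse unknown workloads and any call that would push spend past the cap.
        if cap is None or self._spent[workload] + tokens > cap:
            return False
        self._spent[workload] += tokens
        return True

    def spent(self, workload: str) -> int:
        """Spend attributed to the workload so far."""
        return self._spent.get(workload, 0)
```

A refused call leaves the running total untouched, so a workload that hits its cap can still make smaller calls that fit in the remaining budget.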

IX. Deployment

An architecture that cannot be deployed reproducibly cannot be trusted, because each hand-built environment is a chance for an unreviewed difference to undermine the controls above. The gateway is delivered entirely through infrastructure-as-code and a gated pipeline, so that what runs in production is exactly what was reviewed and approved.

Infrastructure as code

Every resource, from Container Apps to Key Vault to private endpoints, is declared in Bicep. An environment is created by applying the templates, not by clicking through the portal, which means it is reviewable in a pull request, reproducible in a new subscription, and identical between environments save for parameters. Drift is detectable because the desired state is written down.

Environments and promotion

Changes flow through separate development, staging, and production environments, each its own set of resources with its own audit chain. Promotion from one to the next is a pipeline step with its own approval, so nothing reaches production without passing the gates before it.

CI/CD with blue-green and rollback

GitHub Actions builds and tests every change, then deploys with a blue-green strategy: the new revision runs alongside the current one and receives traffic only after health checks pass. If a problem appears, traffic shifts back to the prior revision in one step, because the prior revision is still running. Because the gateway's request path is stateless and the audit chain is append-only, a rollback affects routing without ever rewriting history.

X. Operations as a managed service

The customer-owned tenancy model raises a fair question: if the gateway runs inside the customer's subscription, who keeps it running. The answer is that Ryshe operates it there, as a managed service that works inside the customer's boundary rather than pulling data out of it. The customer owns the subscription, the keys, and the data; Ryshe owns the burden of running the software well.

Operating inside the customer tenant

Ryshe operators reach the gateway through the customer's Entra ID with scoped, audited roles, the same identity plane the customer's own staff use. Operational actions are subject to the same RBAC and the same append-only log as any other privileged action, so the customer can see exactly what was done on their behalf. Nothing about managed operation requires customer data to leave the boundary.

Service-level objectives

The managed service is held to published objectives so that 'managed' is a measurable commitment rather than a marketing word.

  • 99.9% request-path availability objective, measured monthly (target)
  • < 1 day objective to apply a critical security patch after release (target)
  • 100% of privileged operator actions written to the append-only audit log (by design)

These objectives are illustrative of the operating posture and are set per engagement. The point is that managed operation is governed by the same instruments as the rest of the architecture: identity, RBAC, and an audit log that the customer, not Ryshe, ultimately controls.
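To put the availability figure in concrete terms, an objective can be converted into an error budget. The helper below is a plain arithmetic sketch with a hypothetical name; it assumes a 30-day month, under which a 99.9% objective allows about 43.2 minutes of request-path downtime.

```python
def allowed_downtime_minutes(availability: float, days_in_month: int = 30) -> float:
    """Minutes of downtime per month permitted by an availability objective."""
    total_minutes = days_in_month * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1.0 - availability)
```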

XI. Compliance alignment

The architecture is built to map cleanly onto the control families that regulated enterprises are already audited against, so that adopting the gateway extends an existing compliance posture rather than opening a new gap to be assessed. The mapping below connects common frameworks to the gateway capabilities that satisfy them.

Control families mapped to gateway capabilities

  • SOC 2: the access controls (Entra ID RBAC), change management (Bicep plus gated CI/CD), and monitoring (App Insights plus the audit log) speak directly to the Security and Availability trust criteria.
  • ISO/IEC 27001: identity, secrets in Key Vault, network isolation, and logging map to access control, cryptography, operations security, and communications security control sets.
  • HIPAA-adjacent: edge redaction, least-capture defaults, enforced retention and purge, encryption in transit and at rest, and an append-only access record support the technical safeguards expected for sensitive data, even where the deployment is not itself a covered system.

In every case the gateway supplies evidence, the hash-chained log, that auditors can verify rather than take on trust.

No architecture grants compliance on its own; compliance is a property of the whole system and its operation. What the gateway provides is alignment by construction: its controls are the same controls these frameworks ask for, instrumented in a way that produces the evidence an audit needs.

XII. Conclusion

A context gateway earns its place by being the one component that sees every AI request, and that same position is why it carries the heaviest trust requirement in the stack. The architecture in this paper meets that requirement by refusing to relax four goals: control over what is sent, privacy of the context, auditability of every decision, and no unnecessary movement of data. Each Azure-native choice, each tenancy option, and each operational practice is downstream of those goals. The outcome is a gateway an enterprise can own, run in its own subscription under its own keys, and prove the behavior of, on every request. Secure context control is not a feature added at the end. It is the shape of the architecture from the first decision.

References

  1. Microsoft, Azure Architecture Center and the Azure Well-Architected Framework.
  2. NIST, Special Publication 800-53: Security and Privacy Controls for Information Systems and Organizations.
  3. ISO/IEC 27001, Information security, cybersecurity and privacy protection: Information security management systems.
  4. OWASP, Top 10 for Large Language Model Applications.
  5. Center for Internet Security, CIS Benchmarks for Microsoft Azure.

Artwork: Marble herm, public domain (CC0), The Met, Open Access.