Every prompt an enterprise sends to a language model carries proprietary context: customer records, internal documents, system instructions, and the reasoning of agents acting on the business. A context gateway sits on that path, so it inherits the strongest trust requirement in the stack. This paper specifies a reference architecture for that gateway as an Azure-native system that can run inside the customer's own subscription, under the customer's own identity and keys, with an append-only record of everything it does. It covers design goals, placement and data flow, the Azure building blocks, tenancy models, identity and network controls, data handling, auditability, a threat model, deployment, managed operation, and compliance alignment.
- 1. The gateway is the most privileged component in the stack, so it must be designed for security before features.
- 2. Four properties are treated as non-negotiable: control, privacy, auditability, and no unnecessary data egress.
- 3. An Azure-native design maps each concern to a managed service, reducing the surface the customer must operate.
- 4. A customer-owned, in-subscription deployment keeps data residency, key custody, and isolation under the customer's control.
- 5. Identity and Key Vault replace stored secrets; private endpoints keep model and service traffic off the public internet.
- 6. An append-only, hash-chained audit log turns 'what did the AI send' from a question into exportable evidence.
- 7. Ryshe operates the gateway inside the customer tenant against published service levels, without taking custody of customer data.
Executive summary
A context gateway is the most privileged component in an enterprise AI stack. Everything sent to a model passes through it, which means every secret, every customer record, and every internal instruction passes through it too. That privilege is the reason the gateway must be designed for security first and features second. An architecture that compresses prompts brilliantly but cannot prove what it forwarded, or that quietly copies sensitive context out of the customer's control, has solved the wrong problem.
This paper treats four properties as non-negotiable: control over what is sent to which model, privacy of the context in transit and at rest, auditability of every decision, and no unnecessary movement of data out of the customer's trust boundary. Each is stated as a design goal that constrains the architecture, not a feature that can be toggled off. The Azure-native building blocks, the tenancy models, and the operational model all follow from those constraints rather than the other way around.
The result is a gateway that an enterprise can deploy inside its own Azure subscription, bind to its own Entra ID tenant, and back with its own Key Vault, while Ryshe operates it as a managed service that never requires data to leave the customer boundary. The remainder of this paper specifies how each layer is built and why each choice serves the four design goals.
I. Design goals
A reference architecture is only as good as the constraints it refuses to relax. Before naming a single service, the gateway commits to four properties that hold on every request and that no feature is allowed to compromise. Everything downstream is a consequence of these goals, and any design choice that violates one is rejected regardless of the convenience it offers.
Control over what is sent
The gateway decides, per request, which model receives which context. Policy can route a workload to an approved model, strip fields a workload is not permitted to send, cap the context a caller may forward, and refuse requests that fall outside a workload's declared scope. Control is enforced at the point the request is made, not described in a document that applications are trusted to honor.
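A minimal sketch of per-request enforcement, assuming a hypothetical `WorkloadPolicy` record. The field names, the character-based size cap, and the `enforce` function are illustrative choices, not the gateway's actual policy schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadPolicy:
    """Hypothetical per-workload policy; field names are illustrative."""
    allowed_models: frozenset
    blocked_fields: frozenset = frozenset()
    max_context_chars: int = 8000


class PolicyViolation(Exception):
    """Raised when a request falls outside the workload's declared scope."""


def enforce(policy: WorkloadPolicy, model: str, context: dict) -> dict:
    """Return the context the workload may forward, or raise PolicyViolation."""
    if model not in policy.allowed_models:
        raise PolicyViolation(f"model {model!r} is not approved for this workload")
    # Strip fields the workload is not permitted to send.
    permitted = {k: v for k, v in context.items() if k not in policy.blocked_fields}
    # Cap the context the caller may forward (measured in characters here).
    size = sum(len(str(v)) for v in permitted.values())
    if size > policy.max_context_chars:
        raise PolicyViolation(
            f"context of {size} chars exceeds cap of {policy.max_context_chars}"
        )
    return permitted
```

Because the check runs inside the request path and raises on violation, a non-compliant request is refused rather than merely flagged.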
Privacy of context
Context is sensitive by default. It is encrypted in transit to the gateway and onward to the provider, encrypted at rest wherever it is persisted, and redacted at the edge when a workload's policy requires it. The default posture is to capture as little as a workload needs and to keep what is captured inside the customer's trust boundary.
Auditability of decisions
Every consequential action the gateway takes is recorded in a way that can be replayed and verified: which policy applied, what was compressed, which model was chosen, and what was forwarded. The record is append-only and tamper-evident, so an auditor can trust not only the contents but the claim that the contents were not altered after the fact.
No unnecessary data egress
Data does not leave the customer's trust boundary unless the workload's purpose requires it. Calling a model is a deliberate egress the workload asked for. Copying prompts to a vendor's telemetry system, a shared analytics store, or a multi-tenant database is not, and the architecture is built so those copies never need to happen.
Each section that follows is written to satisfy these four goals. When a tradeoff appears, the goal wins. A gateway that cannot prove what it sent, or that moves data it did not need to move, has failed at its primary job no matter how well it performs the secondary ones.
II. Placement and data flow
The gateway sits on the one path that every AI request must cross: between the applications, agents, and copilots that originate context and the model providers that consume it. Because all context passes through this single point, it is the natural place to enforce policy, apply compression, and record evidence. Nothing reaches a model without passing the gateway, and nothing the gateway does is invisible to the audit log.
The request lifecycle, step by step
A single request moves through a fixed sequence of stages. Each stage is observable, each is governed by the workload's policy, and each contributes to the record written at the end. The order matters: authentication and policy precede any handling of payload, and logging happens before the response is returned so that no forwarded request can escape the record.
- 1. Authenticate. The caller presents an Entra ID token. The gateway validates it and resolves the caller to a workload identity with an associated policy. Unauthenticated or unscoped requests are rejected before any payload is read.
- 2. Apply policy. The workload's policy determines the permitted models, the fields that must be redacted, the maximum context size, and whether payload capture is enabled. Policy is resolved per request, so a change takes effect on the next call.
- 3. Redact and compress. Sensitive fields are masked at the edge per policy, then compression reduces the context to what the task needs, behind an evaluation gate so quality is preserved.
- 4. Forward. The gateway calls the approved provider over a private path, using a managed identity or a Key Vault-held credential. The raw provider credential is never exposed to the calling application.
- 5. Score. The response is evaluated against the workload's quality checks where configured, producing a pass or fail signal that travels with the record.
- 6. Log. The policy decision, the compression result, the model chosen, the token counts, and a hash of what was forwarded are written to the append-only audit log before the response leaves the gateway.
- 7. Return. The completion is returned to the caller. From the application's point of view the gateway is a drop-in endpoint; everything above happened in the path of a single call.
Because the lifecycle is fixed and every stage is recorded, the gateway can answer the question that matters most to a regulated enterprise after the fact: for any request, what context reached which model, under which policy, and was the result within quality bounds.
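The stage ordering can be sketched as a single function. This is an illustration under stated assumptions: `authenticate` and `forward` are hypothetical injected callables, the policy is a plain dictionary, compression is reduced to truncation, the quality check is a placeholder, and token counts are omitted.

```python
import hashlib
from typing import Callable


def handle_request(token: str, context: dict, *,
                   authenticate: Callable[[str], dict],
                   forward: Callable[[str, str], str],
                   audit_log: list) -> str:
    """Run one request through the fixed stage order and return the completion."""
    # 1. Authenticate: resolve the caller to a policy before touching the payload.
    policy = authenticate(token)  # expected to raise on an invalid token
    # 2. Apply policy: pick the approved model and the fields to mask.
    model = policy["model"]
    # 3. Redact and compress (compression is stood in for by truncation).
    masked = {k: "[REDACTED]" if k in policy["redact"] else v
              for k, v in context.items()}
    prompt = " ".join(f"{k}={v}" for k, v in sorted(masked.items()))
    prompt = prompt[: policy["max_chars"]]
    # 4. Forward to the approved provider.
    completion = forward(model, prompt)
    # 5. Score: a placeholder quality check.
    passed = bool(completion.strip())
    # 6. Log before the response leaves the gateway.
    audit_log.append({
        "workload": policy["workload"],
        "model": model,
        "payload_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "quality_pass": passed,
    })
    # 7. Return.
    return completion
```

The log entry is appended before the `return`, so in this sketch a forwarded request cannot reach the caller without a record, and a request that fails authentication never reaches the payload-handling stages.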
III. Azure-native building blocks
The gateway is assembled from managed Azure services rather than bespoke infrastructure, so the customer operates as little undifferentiated plumbing as possible. Each concern in the architecture maps to a service chosen for its fit with the four design goals, its support for private networking and managed identity, and its place in the Well-Architected Framework. The mapping below is the canonical reference for what runs where.
| Concern | Azure service | Role in the gateway |
|---|---|---|
| Gateway runtime | Azure Container Apps | Runs the stateless request path that authenticates, applies policy, compresses, and forwards |
| Admin console | Azure App Service | Hosts the operator and customer console for policy, dashboards, and audit export |
| Identity | Microsoft Entra ID | Authenticates callers and operators, issues workload identities, enforces RBAC |
| Secrets | Azure Key Vault | Holds provider credentials and signing material; nothing sensitive lives in config |
| Network | Azure Private Endpoints | Keeps gateway, provider, and data traffic on private paths off the public internet |
| Telemetry | Application Insights / Log Analytics | Captures metrics, traces, and the audit stream within the customer workspace |
| Retrieval | Azure AI Search | Serves grounding passages for retrieval workloads under the same policy and logging |
| Background jobs | Azure Functions (Durable Functions) | Runs evaluation, purge, audit export, and long compression tasks out of the request path |
| Infrastructure | Bicep | Declares every resource as code so an environment is reproducible and reviewable |
| Delivery | GitHub Actions | Builds, tests, and deploys the gateway with environment gates and rollback |
Why managed services, and why these
Each choice reduces the customer's operational surface and inherits Azure's own controls. Container Apps gives a scalable, stateless runtime without a cluster to patch. Key Vault and Entra ID remove stored secrets from the design entirely. Private Endpoints keep traffic on the Azure backbone. App Insights and Log Analytics keep telemetry in a workspace the customer owns. The common thread is that the gateway composes services the enterprise already trusts and already audits, rather than asking it to trust new infrastructure.
IV. Tenancy
The single most consequential deployment decision is where the gateway runs. Two models are supported. In the managed single-tenant model, Ryshe runs a dedicated instance for the customer in Ryshe's own subscription. In the customer-owned model, the same software runs inside the customer's Azure subscription, bound to the customer's Entra ID tenant and Key Vault, and Ryshe operates it there. Both are single-tenant; they differ in who holds the subscription and the keys.
| Dimension | Managed single-tenant | Customer-owned in-subscription |
|---|---|---|
| Data residency | Ryshe-selected region | Customer-chosen region in customer subscription |
| Key custody | Ryshe-held Key Vault | Customer-held Key Vault |
| Isolation | Dedicated instance, Ryshe boundary | Full isolation within customer boundary |
| Onboarding time | Days | One to three weeks |
| Operations | Ryshe-operated end to end | Ryshe-operated inside customer tenant |
| Data egress to Ryshe | Telemetry within Ryshe boundary | None required |
When each fits
The managed model suits teams that want value quickly and are comfortable with a dedicated instance in Ryshe's boundary, often for early adoption or lower-sensitivity workloads. The customer-owned model suits regulated enterprises that require data residency, key custody, and isolation to stay inside their own subscription. For those organizations the gateway is software they own and Ryshe operates, not a service they send data to. The architecture is identical across both; only the boundary moves.
V. Identity, secrets, and network
The three controls that most determine whether a gateway can be trusted are how it knows who is calling, how it holds its secrets, and how its traffic flows. The architecture takes the strict position on all three: identity is always Entra ID, secrets always live in Key Vault, and service traffic always travels private paths.
Entra ID for sign-in and authorization
Operators and console users sign in through Entra ID single sign-on, inheriting the enterprise's own conditional access, multi-factor, and lifecycle controls. Authorization is role-based: who may edit policy, who may read audit logs, and who may export evidence are distinct roles, granted through the customer's existing identity governance rather than a separate user store the customer must manage.
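The separation of roles can be illustrated with a small permission check. The role names and permission strings below are hypothetical; in the architecture described, role assignments would come from the customer's Entra ID rather than from a table in code.

```python
# Hypothetical role names and permission strings, for illustration only.
ROLE_PERMISSIONS = {
    "policy-editor": {"policy:read", "policy:write"},
    "audit-reader": {"audit:read"},
    "evidence-exporter": {"audit:read", "audit:export"},
}


def is_allowed(roles: list, permission: str) -> bool:
    """True if any of the caller's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

In this sketch a policy editor cannot read the audit log and an audit reader cannot export evidence, which is the separation the paragraph above describes.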
Managed identity for service-to-service
Components authenticate to each other and to Azure resources with managed identities rather than shared keys. The gateway reads from Key Vault, writes to Log Analytics, and queries AI Search as itself, with access granted through Azure RBAC. There is no service account password to rotate because there is no password.
Key Vault for every secret
Provider credentials, signing keys for the audit chain, and any other sensitive material live in Key Vault, retrieved at runtime through managed identity and never written to configuration, environment files, or images. Rotation is a Key Vault operation that the running gateway picks up without redeployment.
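One way a running process could pick up a rotated secret without redeployment is a short-lived in-memory cache in front of the secret store. The sketch below assumes a hypothetical `fetch` callable that reads the current secret version (in a real deployment it might wrap a Key Vault client authenticated by managed identity); the class name and the TTL value are illustrative.

```python
import time
from typing import Callable


class CachedSecret:
    """Holds a secret in memory only and re-fetches it after ttl_seconds."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical callable that reads the secret store
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the expiry logic can be tested
        self._value = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached value, refreshing it once the TTL has elapsed."""
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

The trade-off is the usual one for caches: a shorter TTL picks up a rotation sooner at the cost of more calls to the secret store.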
Private endpoints and no public exposure
The gateway reaches Azure OpenAI, AI Search, Key Vault, and storage over private endpoints, so that traffic stays on the Azure backbone and never traverses the public internet. The administrative surface is restricted to the customer's network. The result is that the sensitive paths, context to model and gateway to secret store, are private by construction.
A secret in an environment variable or a checked-in file is a secret that will eventually leak through a log, a backup, or a screen share. The architecture removes the category: identity replaces credentials for service-to-service calls, and Key Vault holds the few credentials that must exist. If a design needs a secret in config to work, it is the wrong design.
VI. Data handling and retention
Because the gateway sees the most sensitive part of every request, how it handles and retains payload is a first-order security decision, not an afterthought. The default is to capture as little as a workload needs, to redact at the edge before anything is stored, and to keep retention short and purge enforceable.
Payload capture is a policy choice
Capturing full request and response bodies is useful for debugging and for evaluation, and risky for privacy. The architecture makes it a per-workload setting with three positions: capture nothing beyond metadata, capture hashes and token counts only, or capture full payloads for a bounded window. A workload defaults to the least capture that lets it function, and raising capture is an audited change.
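The three capture positions can be sketched as an enumeration that decides what a stored record contains. The names `CaptureMode` and `capture_record` are illustrative, and whitespace word counts stand in for real tokenizer counts.

```python
import hashlib
from enum import Enum


class CaptureMode(Enum):
    METADATA_ONLY = "metadata"
    HASHES_AND_COUNTS = "hashes"
    FULL_PAYLOAD = "full"


def capture_record(mode: CaptureMode, workload: str,
                   prompt: str, completion: str) -> dict:
    """Build the stored record for one request under the given capture mode."""
    record = {"workload": workload, "capture_mode": mode.value}
    if mode in (CaptureMode.HASHES_AND_COUNTS, CaptureMode.FULL_PAYLOAD):
        record["prompt_sha256"] = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        # Word counts are a stand-in for tokenizer-based counts.
        record["input_tokens"] = len(prompt.split())
        record["output_tokens"] = len(completion.split())
    if mode is CaptureMode.FULL_PAYLOAD:
        # Full bodies are stored only in the most permissive mode.
        record["prompt"] = prompt
        record["completion"] = completion
    return record
```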
Redaction at the edge
When a workload's policy names sensitive fields, the gateway masks them before the context is forwarded and before any capture is written. Redaction happens at the edge of the gateway, so a masked value never reaches storage in the clear, and the audit record notes that redaction occurred without preserving what was removed.
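A minimal sketch of field-level masking, assuming context arrives as a nested dictionary and policy names sensitive fields by key. The `redact` function and the `[REDACTED]` marker are illustrative; real redaction may also need pattern-based detection inside free text, which this sketch does not attempt.

```python
MASK = "[REDACTED]"


def redact(context: dict, sensitive_fields: set) -> tuple:
    """Return (masked copy of context, sorted names of the fields that were masked)."""
    masked = {}
    redacted_names = []
    for key, value in context.items():
        if key in sensitive_fields:
            masked[key] = MASK
            redacted_names.append(key)
        elif isinstance(value, dict):
            # Recurse so nested records are masked as well.
            masked[key], nested = redact(value, sensitive_fields)
            redacted_names.extend(f"{key}.{name}" for name in nested)
        else:
            masked[key] = value
    return masked, sorted(redacted_names)
```

The function returns the names of the masked fields but not their values, so an audit entry can note that redaction occurred without preserving what was removed.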
Retention and purge
Captured payloads carry a retention window after which a Durable Functions job purges them. Purge is enforced by the platform, not by operator discipline, and the purge action is itself recorded so an auditor can confirm that data which should be gone is gone. Audit log entries, which contain hashes rather than payloads, are retained far longer than payloads because they hold no sensitive content.
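The purge step can be sketched as a function that a scheduled job would call. The in-memory `payloads` dictionary and the record layout are assumptions made for illustration; the actual store and job wiring are not specified in this paper.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(payloads: dict, audit_log: list, now: datetime) -> int:
    """Delete payloads past their retention window and record each purge.

    `payloads` maps an id to
    {"stored_at": datetime, "retention": timedelta, "body": str}.
    """
    expired = [pid for pid, p in payloads.items()
               if now >= p["stored_at"] + p["retention"]]
    for pid in expired:
        del payloads[pid]
        # The purge record names what was removed, never its contents.
        audit_log.append({"action": "purge", "payload_id": pid,
                          "at": now.isoformat()})
    return len(expired)
```

A caller would pass a timezone-aware timestamp such as `datetime.now(timezone.utc)`; passing `now` explicitly keeps the function deterministic and easy to test.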
Data classification
Workloads declare the classification of the context they send, and policy can bind handling rules to classification: a workload tagged for regulated data can be forced to redaction-on and full-capture-off regardless of the developer's preference. Classification turns an organizational policy into an enforced gateway behavior.
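Binding handling rules to classification amounts to applying forced settings on top of whatever the developer requested. In the sketch below the `regulated` label, the `Handling` record, and the `FORCED` table are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Handling:
    redaction_on: bool
    full_capture: bool


# Hypothetical classification labels mapped to settings a workload cannot override.
FORCED = {"regulated": {"redaction_on": True, "full_capture": False}}


def effective_handling(classification: str, requested: Handling) -> Handling:
    """Apply classification-bound rules on top of what the developer requested."""
    return replace(requested, **FORCED.get(classification, {}))
```

For example, `effective_handling("regulated", Handling(redaction_on=False, full_capture=True))` yields a `Handling` with redaction on and full capture off, regardless of the requested values.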
VII. Auditability
Auditability is the property that distinguishes a governed gateway from a fast one. It is not enough to log what happened; the log must be trustworthy enough that an auditor will accept it as evidence that nothing was altered after the fact. The architecture achieves this with an append-only, hash-chained record mirrored to immutable storage.
How the hash chain works
Each audit entry includes a cryptographic hash of its own contents combined with the hash of the entry before it. Because every entry's hash depends on all prior entries, changing or removing any past entry changes every hash that follows, which is detectable by recomputing the chain. The first entry anchors the chain, and the latest hash is a compact fingerprint of the entire history. Tampering is not prevented by access control alone; it is made evident by mathematics.
| Field | Contents | Sensitive? |
|---|---|---|
| Sequence | Monotonic entry number | No |
| Timestamp | Server time of the decision | No |
| Workload | Caller and policy identifiers | No |
| Decision | Model chosen, redaction and compression applied | No |
| Payload hash | Hash of forwarded context, not the context | No |
| Token counts | Input and output token totals | No |
| Prev hash | Hash of the previous entry | No |
| Entry hash | Hash of this entry plus prev hash | No |
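The chaining rule described above can be sketched in a few lines. This is an illustration, not the gateway's actual log format: SHA-256, canonical JSON serialization, and an all-zero anchor for the first entry are assumptions, and the function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed anchor value used as the first entry's prev hash


def entry_hash(body: dict, prev_hash: str) -> str:
    """Hash the canonical JSON of the entry body together with the previous hash."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(chain: list, body: dict) -> dict:
    """Append a new entry whose hash covers its body and the previous entry's hash."""
    prev = chain[-1]["entry_hash"] if chain else GENESIS
    body = {"sequence": len(chain), **body}
    entry = {**body, "prev_hash": prev, "entry_hash": entry_hash(body, prev)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; an edited, reordered, or removed interior entry fails."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        body = {k: v for k, v in entry.items()
                if k not in ("prev_hash", "entry_hash")}
        if entry["sequence"] != i or entry["prev_hash"] != prev:
            return False
        if entry["entry_hash"] != entry_hash(body, prev):
            return False
        prev = entry["entry_hash"]
    return True
```

Recomputation alone does not detect truncation of the most recent entries. Detecting that requires the latest hash to be anchored somewhere the writer cannot alter, which is the role an immutable mirror can play.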
WORM mirroring and exportable evidence
The chain is mirrored to write-once, read-many (WORM) storage with an immutability policy, so that even an operator with broad permissions cannot rewrite history within the retention window. The log is exportable as a self-contained evidence bundle: the entries, the chain, and the verification method, which an external auditor can validate independently without access to the running system. The record contains hashes and metadata, not payloads, so it can be shared without exposing the context it describes.
VIII. Threat model and mitigations
A security architecture is judged by the threats it names and the controls it places against them, not by general assurances. The gateway's position on the request path makes it a high-value target, so the threat model is explicit and each threat is paired with a concrete control already present in the architecture above.
| Threat | Control |
|---|---|
| Prompt injection | Policy-scoped tools, output evaluation, and forwarded-context logging for after-the-fact review |
| Data exfiltration | Edge redaction, no unnecessary egress, private endpoints, and least-capture defaults |
| Secret leakage | Key Vault for all secrets, managed identity for service calls, no secrets in config |
| Supply chain | Bicep-declared infrastructure, pinned dependencies, signed builds through GitHub Actions |
| Insider misuse | Entra ID RBAC, separation of policy and audit roles, append-only hash-chained log |
| Denial of wallet | Per-workload context caps, rate limits, and spend attribution to halt runaway cost |
Reading the model
Two entries deserve emphasis. Denial of wallet, where an attacker or a misconfigured agent drives cost through unbounded calls or oversized context, is a threat characteristic of metered services such as pay-per-token AI, and the gateway is the natural place to cap it because it is the one component that sees every call. Insider misuse is addressed less by prevention than by evidence: the append-only log means that a privileged action leaves a trace that the actor cannot quietly erase, which changes the calculus for anyone considering one.
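A per-workload cap of the kind listed against denial of wallet can be sketched as a token budget checked before each call. The class name, the single fixed window, and the choice to give unknown workloads no budget are assumptions for illustration.

```python
class BudgetExceeded(Exception):
    """Raised when a call would exceed a per-call or per-window cap."""


class TokenBudget:
    """Tracks tokens per workload within one window and refuses calls past the cap."""

    def __init__(self, caps: dict, max_tokens_per_call: int):
        self._caps = caps                      # workload -> tokens allowed per window
        self._max_call = max_tokens_per_call
        self._used = {}

    def charge(self, workload: str, tokens: int) -> int:
        """Record usage and return the remaining budget, or raise BudgetExceeded."""
        if tokens > self._max_call:
            raise BudgetExceeded(f"{tokens} tokens exceeds the per-call cap")
        used = self._used.get(workload, 0)
        cap = self._caps.get(workload, 0)      # unknown workloads get no budget
        if used + tokens > cap:
            raise BudgetExceeded(f"{workload} would exceed its window budget")
        self._used[workload] = used + tokens
        return cap - self._used[workload]

    def reset_window(self) -> None:
        """Start a new accounting window (for example, once per hour)."""
        self._used.clear()
```

Charging before forwarding means a runaway agent is stopped at the cap rather than discovered on the invoice.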
IX. Deployment
An architecture that cannot be deployed reproducibly cannot be trusted, because each hand-built environment is a chance for an unreviewed difference to undermine the controls above. The gateway is delivered entirely through infrastructure-as-code and a gated pipeline, so that what runs in production is exactly what was reviewed and approved.
Infrastructure as code
Every resource, from Container Apps to Key Vault to private endpoints, is declared in Bicep. An environment is created by applying the templates, not by clicking through the portal, which means it is reviewable in a pull request, reproducible in a new subscription, and identical between environments save for parameters. Drift is detectable because the desired state is written down.
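Drift detection reduces to comparing the declared state with the observed state. The sketch below assumes both have already been flattened into dictionaries keyed by resource name; how they are obtained from the templates and from Azure is outside its scope, and `detect_drift` is a hypothetical helper.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Compare declared resources with deployed ones and report the differences."""
    missing = sorted(set(desired) - set(actual))        # declared but not deployed
    unexpected = sorted(set(actual) - set(desired))     # deployed but not declared
    changed = sorted(name for name in set(desired) & set(actual)
                     if desired[name] != actual[name])  # settings differ
    return {"missing": missing, "unexpected": unexpected, "changed": changed}
```

An empty result in all three lists means the environment matches what was reviewed.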
Environments and promotion
Changes flow through separate development, staging, and production environments, each its own set of resources with its own audit chain. Promotion from one to the next is a pipeline step with its own approval, so nothing reaches production without passing the gates before it.
CI/CD with blue-green and rollback
GitHub Actions builds and tests every change, then deploys with a blue-green strategy: the new revision runs alongside the current one and receives traffic only after health checks pass. If a problem appears, traffic shifts back to the prior revision in one step, because the prior revision is still running. Because the gateway's request path is stateless and the audit chain is append-only, a rollback affects routing without ever rewriting history.
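The promote-and-rollback behavior can be modeled as a router that tracks the active and previous revisions. This is a simplified sketch with hypothetical names; it switches all traffic at once and does not model the pipeline or the platform's own revision management.

```python
class Router:
    """Sends all traffic to one revision and keeps the previous one for rollback."""

    def __init__(self, active: str):
        self.active = active
        self.previous = None

    def promote(self, candidate: str, healthy) -> bool:
        """Shift traffic to candidate only if its health check passes."""
        if not healthy(candidate):
            return False
        self.previous, self.active = self.active, candidate
        return True

    def rollback(self) -> bool:
        """Shift traffic back to the prior revision, which is still running."""
        if self.previous is None:
            return False
        self.active, self.previous = self.previous, self.active
        return True
```

Rollback here changes only which revision receives traffic; nothing in it touches stored state, which mirrors the point that routing can be reversed without rewriting the audit history.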
X. Operations as a managed service
The customer-owned tenancy model raises a fair question: if the gateway runs inside the customer's subscription, who keeps it running? The answer is that Ryshe operates it there, as a managed service that works inside the customer's boundary rather than pulling data out of it. The customer owns the subscription, the keys, and the data; Ryshe owns the burden of running the software well.
Operating inside the customer tenant
Ryshe operators reach the gateway through the customer's Entra ID with scoped, audited roles, the same identity plane the customer's own staff use. Operational actions are subject to the same RBAC and the same append-only log as any other privileged action, so the customer can see exactly what was done on their behalf. Nothing about managed operation requires customer data to leave the boundary.
Service-level objectives
The managed service is held to published objectives so that 'managed' is a measurable commitment rather than a marketing word.
These objectives are illustrative of the operating posture and are set per engagement. The point is that managed operation is governed by the same instruments as the rest of the architecture: identity, RBAC, and an audit log that the customer, not Ryshe, ultimately controls.
XI. Compliance alignment
The architecture is built to map cleanly onto the control families that regulated enterprises are already audited against, so that adopting the gateway extends an existing compliance posture rather than opening a new gap to be assessed. The mapping below connects common frameworks to the gateway capabilities that satisfy them.
- SOC 2: the access controls (Entra ID RBAC), change management (Bicep plus gated CI/CD), and monitoring (App Insights plus the audit log) speak directly to the Security and Availability trust criteria.
- ISO/IEC 27001: identity, secrets in Key Vault, network isolation, and logging map to access control, cryptography, operations security, and communications security control sets.
- HIPAA-adjacent: edge redaction, least-capture defaults, enforced retention and purge, encryption in transit and at rest, and an append-only access record support the technical safeguards expected for sensitive data, even where the deployment is not itself a covered system.
In every case the gateway supplies evidence, the hash-chained log, that auditors can verify rather than take on trust.
No architecture grants compliance on its own; compliance is a property of the whole system and its operation. What the gateway provides is alignment by construction: its controls are the same controls these frameworks ask for, instrumented in a way that produces the evidence an audit needs.
XII. Conclusion
A context gateway earns its place by being the one component that sees every AI request, and that same position is why it carries the heaviest trust requirement in the stack. The architecture in this paper meets that requirement by refusing to relax four goals: control over what is sent, privacy of the context, auditability of every decision, and no unnecessary movement of data. Each Azure-native choice, each tenancy option, and each operational practice is downstream of those goals. The outcome is a gateway an enterprise can own, run in its own subscription under its own keys, and prove the behavior of, on every request. Secure context control is not a feature added at the end. It is the shape of the architecture from the first decision.
References
- [1] Microsoft, Azure Architecture Center and the Azure Well-Architected Framework.
- [2] NIST, Special Publication 800-53: Security and Privacy Controls for Information Systems and Organizations.
- [3] ISO/IEC 27001, Information security, cybersecurity and privacy protection: Information security management systems.
- [4] OWASP, Top 10 for Large Language Model Applications.
- [5] Center for Internet Security, CIS Benchmarks for Microsoft Azure.

