QUANTA
by Ryshe
Research

White papers

How we think about controlling, compressing, governing, and measuring every call your enterprise sends to an LLM.

White Paper I · June 2026 · 18 min read

The Hidden Cost of AI Context Bloat

How oversized prompts and unmanaged context quietly inflate enterprise LLM spend, and what a gateway recovers

Enterprises pay for every token they send to a model, and most send far more than the task requires. System prompts, retrieved passages, conversation history, and tool schemas accumulate into context that is rarely audited and almost never optimized. This paper quantifies where context bloat comes from, models how it compounds as adoption scales, and shows what a gateway-level approach recovers in cost, latency, and control. It includes a worked monthly cost example, a measurement framework, and an implementation playbook.
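To make the arithmetic behind context bloat concrete, here is a minimal sketch of how input-token spend scales with prompt size. It is not the paper's worked example: the call volume, token counts, and per-token price are hypothetical placeholders, and the function name is ours.

```python
def monthly_input_cost(calls_per_month, input_tokens_per_call, usd_per_million_input_tokens):
    """Monthly spend on input tokens, in US dollars."""
    total_tokens = calls_per_month * input_tokens_per_call
    return total_tokens * usd_per_million_input_tokens / 1_000_000


# All figures below are hypothetical, chosen only to illustrate the calculation.
CALLS_PER_MONTH = 1_000_000
USD_PER_MILLION_INPUT_TOKENS = 3.00

bloated = monthly_input_cost(CALLS_PER_MONTH, 6_000, USD_PER_MILLION_INPUT_TOKENS)
trimmed = monthly_input_cost(CALLS_PER_MONTH, 2_500, USD_PER_MILLION_INPUT_TOKENS)
recovered = bloated - trimmed

print(f"bloated:   ${bloated:,.2f}")
print(f"trimmed:   ${trimmed:,.2f}")
print(f"recovered: ${recovered:,.2f}")
```

Because cost is linear in both call volume and tokens per call, any surplus context is paid for again on every call, which is why the effect compounds as adoption grows.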

White Paper II · June 2026 · 16 min read

Why Enterprise AI Needs a Context Gateway

The missing control plane between enterprise applications and large language models

Enterprises already place gateways in front of every other critical dependency. APIs pass through an API gateway, traffic through a load balancer and firewall, identity through a single sign-on broker. Large language model traffic is the exception. Applications call models directly, with no shared point where policy can be enforced, cost attributed, context inspected, or behavior recorded. This paper defines the context gateway as the control plane that closes that gap: a single policy-enforcing path that every application points at instead of the model. It explains the four capabilities a gateway consolidates, why those capabilities cannot live in application code, the SDK, or a provider dashboard, and how an OpenAI-compatible gateway is adopted as a configuration change rather than a migration. It closes with a staged rollout, a point-solution comparison, and honest answers to the objections an architect should raise.
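The claim that adoption is a configuration change rather than a migration can be sketched as follows. This is an illustration under stated assumptions, not the product's documented setup: the environment variable name `LLM_BASE_URL` and the gateway hostname are hypothetical, and the only thing taken as given is that an OpenAI-compatible endpoint serves the same `/chat/completions` path as the provider.

```python
import os

# Default used when no gateway is configured: the provider's public endpoint.
DIRECT_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(env=None):
    """Return the base URL the application should call.

    LLM_BASE_URL is a hypothetical variable name. Setting it redirects
    traffic to the gateway without touching application code.
    """
    env = os.environ if env is None else env
    return env.get("LLM_BASE_URL", DIRECT_BASE_URL).rstrip("/")


def chat_completions_url(env=None):
    """Build the chat completions endpoint from whichever base URL is active."""
    return resolve_base_url(env) + "/chat/completions"


# Before: the application talks to the provider directly.
print(chat_completions_url({}))

# After: one setting changes; the request path and payload stay the same.
print(chat_completions_url({"LLM_BASE_URL": "https://gateway.example.internal/v1/"}))
```

The design point is that the application keeps building the same request; only the destination is read from configuration.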

White Paper III · June 2026 · 17 min read

AI FinOps Beyond Token Dashboards

From counting tokens to controlling the cost of enterprise AI

Most enterprises now have a token dashboard. Very few have AI FinOps. A dashboard reports what was spent after it was spent, in an aggregate that no team can act on. AI FinOps is a discipline: it attributes spend to the workflows and customers that caused it, forecasts where the curve is heading, applies reduction levers behind a quality gate, and assigns ownership across Finance, Engineering, Governance, and Product. This paper adapts the FinOps inform, optimize, and operate cycle to large language model spend, defines cost per successful outcome as the metric that matters, and gives a 90-day rollout. It is written for the CIO and CFO who have seen the AI line item and now need a way to govern it.
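One plausible reading of cost per successful outcome is total spend divided by the number of outcomes that succeeded, computed per workflow so that failed calls still count toward cost. The sketch below assumes that reading; the record fields, workflow names, and figures are hypothetical, and the paper's own definition may differ.

```python
from collections import defaultdict


def cost_per_successful_outcome(records):
    """Attribute spend to workflows and divide by successful outcomes.

    Each record is a dict with hypothetical keys: "workflow" (str),
    "cost" (float, USD), and "success" (bool). Workflows with no
    successes map to None, since the ratio is undefined for them.
    """
    spend = defaultdict(float)
    successes = defaultdict(int)
    for record in records:
        spend[record["workflow"]] += record["cost"]  # failures still cost money
        if record["success"]:
            successes[record["workflow"]] += 1
    return {
        workflow: (spend[workflow] / successes[workflow] if successes[workflow] else None)
        for workflow in spend
    }


# Hypothetical call records for illustration only.
records = [
    {"workflow": "support_triage", "cost": 0.04, "success": True},
    {"workflow": "support_triage", "cost": 0.06, "success": False},
    {"workflow": "contract_review", "cost": 0.50, "success": True},
    {"workflow": "contract_review", "cost": 0.50, "success": True},
    {"workflow": "experimental_agent", "cost": 0.25, "success": False},
]

result = cost_per_successful_outcome(records)
print(result)
```

Unlike a token total, this figure rises when a workflow retries or fails often, which is the behavior a team can actually act on.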

White Paper IV · June 2026 · 16 min read

Governing Agentic Workflows Before They Scale

Why autonomous AI raises the stakes on context control and auditability

An agent is not a bigger chatbot. It is a system that turns one user action into many model calls, selects its own tools, reaches across enterprise data, and decides for itself what context goes into each request. That autonomy is exactly what makes agents useful and exactly what makes them hard to govern. This paper maps the new risk surface agents introduce, explains why per-application controls fail at agent scale, and presents a control framework that lives at the context gateway beneath every agent. It includes a phased rollout that moves an organization from observe-only to fully governed without rewriting a single agent.

White Paper V · June 2026 · 19 min read

The Architecture of Secure AI Context Control

A reference architecture for an Azure-native, customer-owned AI context gateway

Every prompt an enterprise sends to a language model carries proprietary context: customer records, internal documents, system instructions, and the reasoning of agents acting on the business. A context gateway sits on that path, so it inherits the strongest trust requirement in the stack. This paper specifies a reference architecture for that gateway as an Azure-native system that can run inside the customer's own subscription, under the customer's own identity and keys, with an append-only record of everything it does. It covers design goals, placement and data flow, the Azure building blocks, tenancy models, identity and network controls, data handling, auditability, a threat model, deployment, managed operation, and compliance alignment.

White Paper VI · June 2026 · 17 min read

Context Compression Without Quality Loss

Methods, measurement, and guardrails for reducing prompt context while preserving answer quality

Most prompts sent to a language model carry far more context than the task in front of it requires. Compression removes the surplus, but done carelessly it removes the answer along with the noise. This paper treats compression as a quality-constrained optimization rather than a race to the smallest prompt. It gives a taxonomy of methods, a comparison of their reductions and risks, an evaluation gate that decides what ships, an honest before-and-after measurement template, the failure modes that catch teams out, a tuning loop, a per-workload strategy, and the cases where the right amount of compression is none.

White Paper VII · June 2026 · 18 min read

Prompt and Context Governance for Regulated Industries

Controlling what AI sees and sends in financial services, healthcare, and government

Regulated enterprises are adopting language models faster than they are governing them. The prompt is where the risk lives: it is the surface on which sensitive customer records, protected health information, and controlled government data are assembled and sent to a model that the enterprise does not own. This paper sets out the control requirements that financial services, healthcare, and government share, maps each to a gateway control that addresses it, and describes a deployment posture where regulated data never leaves the customer tenant. It covers data classification and class-based egress, redaction before any external call, and the hash-stamped audit record an examiner expects. It describes control intent rather than legal requirements, and it is not legal advice.
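The abstract does not say how the hash-stamped audit record is built. One common way to make such a record tamper-evident is a hash chain, in which each entry's hash covers its own content and the previous entry's hash. The sketch below assumes that technique; the field names, the SHA-256 choice, and the sample events are ours, not the paper's.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(event, prev_hash):
    """SHA-256 over a canonical JSON encoding of the event and the prior hash."""
    body = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(log, event):
    """Append an event to the log, stamping it with a chained hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    record = {"event": event, "prev_hash": prev_hash, "hash": _digest(event, prev_hash)}
    log.append(record)
    return record


def verify(log):
    """Return True only if every record's hash and chain link are intact."""
    prev_hash = GENESIS_HASH
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(record["event"], record["prev_hash"]):
            return False
        prev_hash = record["hash"]
    return True


# Hypothetical gateway events for illustration only.
audit_log = []
append_record(audit_log, {"action": "redact", "data_class": "PHI", "fields": 3})
append_record(audit_log, {"action": "egress_blocked", "data_class": "restricted"})

print(verify(audit_log))  # chain is intact

audit_log[0]["event"]["fields"] = 0  # simulate after-the-fact tampering
print(verify(audit_log))  # the altered record no longer matches its hash
```

A chain like this detects edits to past entries; it does not by itself prevent them, so in practice it would sit on top of storage that is itself append-only.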

White Paper VIII · June 2026 · 16 min read

The Enterprise LLM Observability Maturity Model

Five stages from blind spend to governed, forecasted, quality-aware AI operations

Most enterprises monitor their language model usage the way they monitored their first cloud bill: a single aggregate number that arrives late, explains nothing, and prompts a meeting. This paper presents a five-level maturity model for LLM observability, from no visibility at all to a governed practice that attributes cost and quality to the workflow that caused them and forecasts both. It defines what is visible at each level, what remains hidden, the action each level makes possible, and the concrete steps to advance. It treats observability not as charts but as the precondition for control.


Artwork: classical sculpture of Hermes & Mercury, public domain (CC0), The Met, Open Access.