Spec Review Assistant
Gateway live
Spec Review Assistantv1.4 · in production 56 days

Executive Dashboard

How this AI workflow is performing, and how much it has improved since launch.

Review accuracy
1.2 pts
94.1%
12.9 pts since launch
Cost / review
2.2%
$0.0089
$0.0103 at launch
p95 latency
70.0ms
1,840ms
per spec clause
Reviewer hours saved / wk
9.4%
142 hrs
≈ $14.2k labor / wk

Improvement over time

Review accuracy by week. Markers show versions promoted into production.

+12.9 pts
v1.0Apr 22v1.1Apr 29v1.2May 13v1.3May 27v1.4Jun 10
This week

28 reviewer corrections became evaluation data and drove +1.2 pts of accuracy.

The loop is closing on its own, every correction your team makes is permanent, tested, and shipped.

Failure rate

5.9%
2.3 pts

Down from 18.8% at launch.

Where the AI still struggles

Accuracy by spec division, the loop targets the weakest slices first.

Recent activity

M. Okafor version promoted
Promoted v1.4 to production after eval pass (+2.3 pts).
5d ago
M. Okafor review corrected
Corrected fire-rating verdict; added to eval set.
6d ago
system failure detected
LLM-judge flagged wrong_standard (groundedness 0.41).
38m ago
D. Vance data reindex
Re-indexed standards library; dropped superseded editions.
6h ago
M. Okafor version promoted
Promoted citation guardrail.
2w ago
a.ryan@northforge.com access login
Entra ID SSO sign-in from 10.4.x (East US 2).
2h ago

Trend and historical metrics on this dashboard are illustrative sample data. Live actions, spec-review runs and the eval benchmark, are genuinely measured when a model key is connected (see Improvements → Test against benchmark).