Spec Review Assistantv1.4 · in production 56 days

Executive Dashboard

How this AI workflow is performing, and how much it has improved since launch.

Review accuracy

▲1.2 pts

94.1%

12.9 pts since launch

Cost / review

▼2.2%

$0.0089

$0.0103 at launch

p95 latency

▼70.0ms

1,840ms

per spec clause

Reviewer hours saved / wk

▲9.4%

142 hrs

≈ $14.2k labor / wk

Improvement over time

Review accuracy by week. Markers show versions promoted into production.

+12.9 pts

v1.0Apr 22v1.1Apr 29v1.2May 13v1.3May 27v1.4Jun 10

This week

28 reviewer corrections became evaluation data and drove +1.2 pts of accuracy.

The loop is closing on its own, every correction your team makes is permanent, tested, and shipped.

Failure rate

5.9%

▼2.3 pts

Down from 18.8% at launch.

Where the AI still struggles

Accuracy by spec division, the loop targets the weakest slices first.

Recent activity

Audit log

M. Okafor version promoted

Promoted v1.4 to production after eval pass (+2.3 pts).

5d ago

M. Okafor review corrected

Corrected fire-rating verdict; added to eval set.

6d ago

system failure detected

LLM-judge flagged wrong_standard (groundedness 0.41).

38m ago

D. Vance data reindex

Re-indexed standards library; dropped superseded editions.

6h ago

M. Okafor version promoted

Promoted citation guardrail.

2w ago

a.ryan@northforge.com access login

Entra ID SSO sign-in from 10.4.x (East US 2).

2h ago

Trend and historical metrics on this dashboard are illustrative sample data. Live actions, spec-review runs and the eval benchmark, are genuinely measured when a model key is connected (see Improvements → Test against benchmark).