Spec Review Assistant

Improvement Recommendations

Evidence-backed changes derived from real failures and reviewer corrections. Every suggestion is tested against your eval set before it can ship, so nothing ships on guesswork.

Add fire-rating edition check from reviewer corrections

Derived from 7 reviewer corrections · fire-rating + citation-edition slices

Prompt · high confidence

5 of the last 9 high-severity failures are Division 07 fire-rating clauses where the model cited a superseded ASTM/UL edition. Reviewer corrections converge on the same fix: confirm the hourly rating and require the current test-method edition.

− Current

For fire-rating clauses, confirm the assembly is fire-rated and cite the relevant standard.

+ Proposed

For fire-rating clauses, you MUST (1) confirm the required hourly rating (F-rating and T-rating) is explicitly met, and (2) cite only the current edition of the test method retrieved from the index (e.g. ASTM E814-21, UL 1479). If the retrieved edition is superseded, return needs_review.
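The proposed rule can also be mirrored as a deterministic post-check on the model's output. Below is a minimal Python sketch. It assumes the reviewer output exposes the required and listed F- and T-ratings and the cited edition as structured fields. `FireRatingClaim`, `check_fire_rating`, the `"pass"`/`"fail"` verdict labels, and the `CURRENT_EDITIONS` table are all hypothetical. The table reuses the example edition from the prompt above and is not a verified list of current editions.

```python
from dataclasses import dataclass

# Hypothetical table of current test-method editions. "ASTM E814-21" is the
# example from the proposed prompt, not a verified current edition.
CURRENT_EDITIONS = {
    "ASTM E814": "ASTM E814-21",
}


@dataclass
class FireRatingClaim:
    required_f_hours: float  # F-rating the spec section requires
    required_t_hours: float  # T-rating the spec section requires
    listed_f_hours: float    # F-rating of the cited tested assembly
    listed_t_hours: float    # T-rating of the cited tested assembly
    standard: str            # test method, e.g. "ASTM E814"
    cited_edition: str       # edition the model actually cited


def check_fire_rating(claim: FireRatingClaim) -> str:
    """Return "needs_review", "fail", or "pass" ("pass"/"fail" are hypothetical labels)."""
    current = CURRENT_EDITIONS.get(claim.standard)
    # Unknown standard or superseded edition: defer to a human reviewer.
    if current is None or claim.cited_edition != current:
        return "needs_review"
    # Both hourly ratings must meet or exceed the requirement.
    if (claim.listed_f_hours < claim.required_f_hours
            or claim.listed_t_hours < claim.required_t_hours):
        return "fail"
    return "pass"
```

Returning needs_review for any standard missing from the table keeps the check conservative, because an unverified edition is never treated as current.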

Projected impact (untested)
Verdict accuracy: +3.2 pts
Test before applying; Quanta won't ship an unproven change.

Boost recency weighting on standards index

Observed in 4 flagged logs this week · wrong_standard + hallucinated_citation

Retrieval · medium confidence

Hallucinated and wrong-edition citations correlate with the index returning multiple editions of the same standard. Applying a recency boost so the current edition ranks first should reduce wrong-edition citations.

− Current

Hybrid search, top-k 6, no recency weighting.

+ Proposed

Hybrid search, top-k 6, recency boost on edition metadata, dedupe superseded editions.
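Below is a minimal Python sketch of the proposed reranking step. It assumes each hybrid-search hit carries a precomputed fused score and an `edition_year` parsed from edition metadata. Both are hypothetical fields, and a real index may need a finer-grained edition key to separate, say, `-13` from `-13a`. The sketch first drops superseded editions of the same standard, then adds a small recency boost before taking the top 6. `Hit`, `rerank`, and `recency_weight` are illustrative names, not part of any existing API.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    standard: str       # e.g. "ASTM E814"
    edition_year: int   # assumed to be parsed from edition metadata
    score: float        # assumed precomputed hybrid (lexical + vector) score


def rerank(hits: list[Hit], top_k: int = 6, recency_weight: float = 0.1) -> list[Hit]:
    # 1. Dedupe superseded editions: keep only the newest edition per standard.
    newest: dict[str, Hit] = {}
    for hit in hits:
        kept = newest.get(hit.standard)
        if kept is None or hit.edition_year > kept.edition_year:
            newest[hit.standard] = hit
    candidates = list(newest.values())
    if not candidates:
        return []
    # 2. Recency boost: scale edition year into [0, 1] across the candidates.
    oldest = min(h.edition_year for h in candidates)
    span = max(h.edition_year for h in candidates) - oldest or 1

    def boosted(h: Hit) -> float:
        return h.score + recency_weight * (h.edition_year - oldest) / span

    # 3. Rank by boosted score and keep the top k.
    return sorted(candidates, key=boosted, reverse=True)[:top_k]
```

Deduping before boosting means a superseded edition can never outrank its replacement, whatever its raw score. The boost then only nudges ordering between different standards.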

Projected impact (untested)
Verdict accuracy: +1.8 pts

Route low-groundedness clauses to gpt-5.5, simple clauses to gpt-5.4-mini

Cost model over 1,240 logs · no accuracy regression in shadow eval

Routing · medium confidence

62% of clauses are straightforward and pass on the cheaper model. Reserve the stronger model for clauses that score below 0.7 groundedness on the first pass. Projected 31% cost reduction at equal accuracy.
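How the 62% figure translates into savings depends on the price ratio between the two models. The following is a back-of-envelope model that makes two assumptions: every clause costs the same per model call, and every escalated clause pays for both passes. Let $c_m$ and $c_f$ be the per-clause cost on gpt-5.4-mini and gpt-5.5, let $r = c_m / c_f$, and let $e$ be the escalation rate. Then:

$$
\frac{C_{\text{routed}}}{C_{\text{all gpt-5.5}}} = \frac{c_m + e\,c_f}{c_f} = r + e,
\qquad
\text{savings} = 1 - r - e.
$$

With $e = 0.38$ (the 38% of clauses that do not pass on the cheaper model), the projected 31% reduction implies $r \approx 0.31$. Under these assumptions, each additional point of escalation costs one point of savings.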

− Current

All requests → gpt-5.5.

+ Proposed

First pass → gpt-5.4-mini; escalate to gpt-5.5 when groundedness < 0.7 or verdict = needs_review.
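The escalation rule can be sketched in Python as follows. The sketch assumes a hypothetical `call_model(model, clause)` wrapper that returns a verdict and a groundedness score in [0, 1]. `Result` and `review_clause` are illustrative names. The threshold comparison is strict, so a clause scoring exactly 0.7 stays on the cheaper model.

```python
from typing import Callable, NamedTuple


class Result(NamedTuple):
    verdict: str         # e.g. "pass", "fail", or "needs_review"
    groundedness: float  # groundedness score in [0, 1]


def review_clause(
    clause: str,
    call_model: Callable[[str, str], Result],  # hypothetical model wrapper
    threshold: float = 0.7,
) -> tuple[str, Result]:
    """Cheap first pass; escalate to the stronger model only when needed."""
    first = call_model("gpt-5.4-mini", clause)
    # Escalate on low groundedness or an explicit needs_review verdict.
    if first.groundedness < threshold or first.verdict == "needs_review":
        return "gpt-5.5", call_model("gpt-5.5", clause)
    return "gpt-5.4-mini", first
```

Returning the model name alongside the result makes it straightforward to log the escalation rate, which is the main driver of the projected cost reduction.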

Projected impact (untested)
Verdict accuracy: +0.1 pts