May 12, 2026 · 6 min · AI Engineering
Shipping AI Products That Don't Break in Prod
The notebook lies
Every model looks brilliant in a notebook. The cold truth shows up the moment a real user types something you didn't anticipate, and the eval set you spent two weeks curating turns out to cover 30% of reality.
"If it isn't observable, it isn't shipped."
— an SRE, probably
- p99 latency budget: 350ms
- Eval coverage gap: ~40%
- Cost per 1k requests: $0.18
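A p99 budget only helps if something checks it. Here is a minimal sketch, assuming you already collect per-request latencies in milliseconds. It computes a nearest-rank percentile and compares it with the 350ms target. The function names and the choice of nearest-rank (rather than an interpolating percentile) are illustrative assumptions, not the post's actual tooling.

```typescript
// Nearest-rank percentile: sort ascending, take the value at 1-based rank ceil(p * n / 100).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs stay exact in floating point.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical constant mirroring the 350ms p99 budget above.
const P99_BUDGET_MS = 350;

// True when the p99 of the observed latencies fits inside the budget.
function withinBudget(latenciesMs: number[], budgetMs: number = P99_BUDGET_MS): boolean {
  return percentile(latenciesMs, 99) <= budgetMs;
}
```

Consider 100 requests where 98 take 120ms and two take 900ms. That window fails the check, because two slow requests out of 100 are enough to push p99 over budget.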
Boring infra that saved us
- Per-request token + cost logging
- Shadow eval on a sample of live traffic
- Hard timeouts, soft fallbacks
- Prompt versioning in git, not Notion
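The first two items can share one code path. The idea is to emit one structured record per request and decide at that point whether the request also feeds the shadow eval. This is a sketch under stated assumptions: the prices, the 5% default sampling rate, the field names, and `logRequest` itself are all hypothetical. The FNV-1a hashing of the request id is one simple way to make sampling deterministic, and it is swapped in here for illustration. Each record also stores the prompt's git revision, which ties cost and quality back to a specific prompt version.

```typescript
// Hypothetical per-million-token prices in USD; substitute your provider's real rates.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface RequestLog {
  requestId: string;
  promptVersion: string; // e.g. the git commit SHA of the prompt file
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  shadowEval: boolean; // true if this request is also sent to the offline eval queue
}

function costUsd(inputTokens: number, outputTokens: number, price: Pricing): number {
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1_000_000;
}

// Deterministic sampling: 32-bit FNV-1a over the id's UTF-16 code units, mapped to [0, 1).
// The same request id always gets the same decision, so retries don't skew the sample.
function inShadowSample(requestId: string, rate: number): boolean {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < requestId.length; i++) {
    h ^= requestId.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (h >>> 0) / 0x100000000 < rate;
}

function logRequest(
  requestId: string,
  promptVersion: string,
  inputTokens: number,
  outputTokens: number,
  price: Pricing,
  shadowRate: number = 0.05,
): RequestLog {
  const entry: RequestLog = {
    requestId,
    promptVersion,
    inputTokens,
    outputTokens,
    costUsd: costUsd(inputTokens, outputTokens, price),
    shadowEval: inShadowSample(requestId, shadowRate),
  };
  // One JSON line per request, easy to ship to whatever log pipeline feeds the dashboard.
  console.log(JSON.stringify(entry));
  return entry;
}
```

At illustrative rates of $3 and $15 per million tokens, a request with 1,000 input and 500 output tokens costs $0.0105. Multiplying the mean per-request cost by 1,000 gives a figure you can compare directly with the cost-per-1k number above.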
```ts
// Always wrap the model call: cap it at 8s and serve the cached answer instead.
const out = await withTimeout(
  llm.invoke(prompt),
  { ms: 8000, fallback: cachedAnswer },
);
```
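The post doesn't show `withTimeout`. One plausible sketch, assuming it takes a promise plus `{ ms, fallback }` as in the wrapped model call, races the call against a timer and resolves with the fallback if the timer wins. Falling back on a thrown error as well as on a timeout is an added assumption, chosen to match "hard timeouts, soft fallbacks".

```typescript
// Hypothetical options shape, inferred from the call site { ms, fallback }.
interface TimeoutOptions<T> {
  ms: number;
  fallback: T;
}

async function withTimeout<T>(work: Promise<T>, opts: TimeoutOptions<T>): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Resolves (never rejects) with the fallback once the deadline passes.
  const deadline = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(opts.fallback), opts.ms);
  });
  try {
    return await Promise.race([work, deadline]);
  } catch {
    // Soft fallback: a failed model call degrades to the fallback instead of surfacing an error.
    return opts.fallback;
  } finally {
    // Clear the timer so a fast response doesn't leave a pending timeout behind.
    clearTimeout(timer);
  }
}
```

Racing does not cancel the underlying request, which may keep running (and consuming tokens) after the fallback is served. If your client accepts an `AbortSignal`, aborting it when the timer fires turns this into a true hard timeout.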
Build the dashboard before the demo. You will need it on day three, not day thirty.