May 12, 2026 · 6 min · AI Engineering

Shipping AI Products That Don't Break in Prod

The notebook lies

Every model looks brilliant in a notebook. The cold truth shows up the moment a real user types something you didn't anticipate, and the eval set you spent two weeks curating turns out to cover 30% of reality.

"If it isn't observable, it isn't shipped."

an SRE, probably

  • p99 latency budget: 350ms
  • Eval coverage gap: ~40%
  • Cost / 1k req: $0.18

Boring infra that saved us

  • Per-request token + cost logging
  • Shadow eval on a sample of live traffic
  • Hard timeouts, soft fallbacks
  • Prompt versioning in git, not Notion
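
The first two items can be sketched in a few lines. This is a minimal sketch under assumptions, not the setup described in this post: the prices, the field names, and the helpers `requestCostUsd`, `inShadowSample`, `logUsage`, and `costPer1kRequests` are all hypothetical. It emits one structured log line per request and picks a reproducible sample of traffic for shadow eval.

```typescript
// Hypothetical per-1k-token prices in USD; substitute your provider's real rates.
interface Pricing {
  inputPer1k: number;
  outputPer1k: number;
}

// Hypothetical record shape: one of these per model call.
interface UsageRecord {
  requestId: string;
  promptVersion: string; // e.g. the git commit SHA of the prompt file
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  costUsd: number;
  shadowEval: boolean; // true if this request is sampled for shadow eval
}

// Cost of a single request from its token counts.
function requestCostUsd(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens / 1000) * p.inputPer1k + (outputTokens / 1000) * p.outputPer1k;
}

// Deterministic sampling: hash the request id into [0, 1) and compare to the rate,
// so the same request always gets the same decision.
function inShadowSample(requestId: string, rate: number): boolean {
  let h = 0;
  for (let i = 0; i < requestId.length; i++) {
    h = (h * 31 + requestId.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return h / 0x100000000 < rate;
}

// Emit one structured JSON line per request; any log pipeline can aggregate these.
function logUsage(record: UsageRecord): string {
  const line = JSON.stringify(record);
  console.log(line);
  return line;
}

// Average cost per request, scaled to 1,000 requests.
function costPer1kRequests(records: UsageRecord[]): number {
  if (records.length === 0) return 0;
  const total = records.reduce((sum, r) => sum + r.costUsd, 0);
  return (total / records.length) * 1000;
}
```

Hashing the request id rather than calling `Math.random()` makes the sampling decision reproducible, which helps when you later need to explain why a given request was or was not evaluated.
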

```ts
// always wrap the model call
const out = await withTimeout(
  llm.invoke(prompt),
  { ms: 8000, fallback: cachedAnswer },
);
```
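
The post does not show how `withTimeout` is defined, so here is one possible sketch that matches the call shape above. Assumptions: the `TimeoutOptions` type is hypothetical, and this version also returns the fallback when the model call rejects, which is one reading of "soft fallbacks" rather than a confirmed detail.

```typescript
// Hypothetical options type matching the `{ ms, fallback }` argument used above.
interface TimeoutOptions<T> {
  ms: number;   // hard deadline in milliseconds
  fallback: T;  // value returned on timeout or failure
}

async function withTimeout<T>(work: Promise<T>, opts: TimeoutOptions<T>): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves with the fallback once the deadline passes.
  const deadline = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(opts.fallback), opts.ms);
  });

  try {
    // Whichever promise settles first wins the race.
    return await Promise.race([work, deadline]);
  } catch {
    // Assumption: a failed model call degrades to the fallback instead of throwing.
    return opts.fallback;
  } finally {
    // Clear the timer so it does not keep the process alive after a fast response.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

`Promise.race` only stops waiting; it does not cancel the underlying request. If the model client accepts an `AbortSignal`, passing one in and aborting on timeout avoids paying for a response nobody reads.
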

🧭 Build the dashboard before the demo. You will need it on day three, not day thirty.