Flaky tests are reliability debt that compounds silently across every merge. Treat them like production incidents: capture a signature, reproduce deterministically, classify the failure mode, then stabilize with clear ownership before the team normalizes random red builds as background noise.
Extract the Exact Failing Signature
# In CI logs, capture:
# - test name
# - stack trace
# - worker/shard
# - runtime and seed (if available)
No signature, no diagnosis.
Reproduce Under CI-Like Constraints
CI=1 TZ=UTC pnpm test -- --runInBand
CI=1 TZ=UTC pnpm test -- --repeatEach=20
Match environment and reduce nondeterminism.
Classify Failure Type
- Timing/race: missing awaits, leaking async work
- State bleed: shared global/module state
- External dependency: clock/network/file-system assumptions
- Order dependency: tests pass only in a specific sequence
Stabilize, Then Harden
# Prefer deterministic setup/teardown
# Replace sleep-based waits with explicit polling/assertions
If a test is critical and still flaky, quarantine it temporarily with a tracked issue and owner.
Add Guardrails
- Run failed suite repeatedly in CI nightly.
- Alert when flake rate crosses threshold.
- Fail PRs on newly introduced flaky signatures.