GitHub Actions Flaky Test Isolation Quick Snip | Naseebullah (Senior Software Engineer)

Flaky tests are reliability debt that compounds silently across every merge. Treat them like production incidents: capture a signature, reproduce deterministically, classify the failure mode, then stabilize with clear ownership before the team normalizes random red builds as background noise.

Extract the Exact Failing Signature

# In CI logs, capture:
# - test name
# - stack trace
# - worker/shard
# - runtime and seed (if available)

No signature, no diagnosis.

Reproduce Under CI-Like Constraints

CI=1 TZ=UTC pnpm test -- --runInBand
CI=1 TZ=UTC pnpm test -- --repeatEach=20

Match environment and reduce nondeterminism.

Classify Failure Type

Timing/race: missing awaits, leaking async work
State bleed: shared global/module state
External dependency: clock/network/file-system assumptions
Order dependency: tests pass only in a specific sequence

Stabilize, Then Harden

# Prefer deterministic setup/teardown
# Replace sleep-based waits with explicit polling/assertions

If a test is critical and still flaky, quarantine it temporarily with a tracked issue and owner.

Add Guardrails

Run failed suite repeatedly in CI nightly.
Alert when flake rate crosses threshold.
Fail PRs on newly introduced flaky signatures.