
Flaky Test Detection

A flaky test is one that passes and fails inconsistently - without any code changes. Harness automatically detects these tests and marks them with a FLAKY badge so your team knows which tests are unreliable.

How Detection Works

Harness automatically detects flaky tests by analyzing test results across pipeline executions. A test is identified by its suite name + class name + repository + test name.

Detection Criteria

A test is marked flaky when both conditions are met:

  1. Same-commit inconsistency: The test has both passed AND failed on the same commit
  2. Within the observation window: The inconsistency occurred within the last 14 days

Example: If a build for commit abc123 ran twice - once where the test passed and once where it failed - the test is marked flaky.
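The detection rule above can be sketched in a few lines. This is an illustrative model only, not Harness's actual implementation: given a test's run history as `(commit, passed, timestamp)` tuples, a test is flaky if any single commit saw both a pass and a fail inside the 14-day window.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=14)  # the observation window described above

def is_flaky(results, now):
    """results: list of (commit, passed, timestamp) for one test."""
    # Ignore runs older than the observation window
    recent = [r for r in results if now - r[2] <= WINDOW]
    by_commit = {}
    for commit, passed, _ in recent:
        by_commit.setdefault(commit, set()).add(passed)
    # Same-commit inconsistency: one commit produced both True and False
    return any(outcomes == {True, False} for outcomes in by_commit.values())

now = datetime(2024, 6, 15)
history = [
    ("abc123", True,  now - timedelta(days=2)),
    ("abc123", False, now - timedelta(days=2)),   # same commit, opposite result
    ("def456", False, now - timedelta(days=30)),  # outside the window: ignored
]
print(is_flaky(history, now))  # True
```

Note that the day-30 failure alone would never trigger detection: only inconsistency on the same commit, within the window, counts.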

Why 14 days?

The 14-day window defines how far back Harness looks when detecting flaky tests. Test behavior older than 14 days is ignored. This ensures flaky detection reflects recent, relevant behavior—not issues that were fixed weeks ago.

Auto-Recovery

Flaky status is automatically cleared in two ways:

  1. 5 consecutive passes: After 5 successful runs with no flaky occurrences, the status clears immediately
  2. Time expiry: If no same-commit inconsistency occurs for 14 days, the flaky status naturally expires
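The consecutive-pass rule can be sketched the same way (again, a hypothetical model rather than Harness's code): walk the runs in order, count the pass streak, reset it on any failure, and clear the flaky status as soon as the streak reaches 5.

```python
def status_after(run_results, flaky=True, required_passes=5):
    """Sketch of the recovery rule: 5 consecutive passes clear flaky status.
    run_results: chronological list of booleans (True = pass)."""
    streak = 0
    for passed in run_results:
        if not flaky:
            break  # already recovered
        if passed:
            streak += 1
            if streak >= required_passes:
                flaky = False  # status clears immediately on the 5th pass
        else:
            streak = 0  # any failure resets the streak
    return flaky

print(status_after([True] * 5))                       # False: recovered
print(status_after([True] * 4 + [False] + [True] * 3))  # True: streak was reset
```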

Detection Parameters

| Parameter | Value |
| --- | --- |
| Observation window | 14 days |
| Flaky trigger | Pass + fail on the same commit |
| Auto-recovery | 5 consecutive passes |
| Scope | Per repository |

View Flaky Tests

In the Harness UI

  1. Open your pipeline execution
  2. Click the Tests tab
  3. Look for the FLAKY badge next to test names
  4. Use Filter → Flaky to show only flaky tests

Via CLI

List all flaky tests for a repository:

```shell
hcli test-management flaky get \
  --account-id="$HARNESS_ACCOUNT_ID" \
  --repo="https://github.com/your-org/your-repo.git" \
  --api-key="$HARNESS_API_KEY" \
  --endpoint="https://app.harness.io/gateway/ti-service"
```

Example output:

```text
Found 6 flaky test(s):
- com.example.PaymentTest::testRefundTimeout
- com.example.ApiTest::testWebhookRetry
- tests.integration.test_api::test_concurrent_requests
```
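If you want to feed this list into other tooling, the output format shown above is easy to parse. A minimal sketch (assuming the `- Class::test` line format stays as shown; `parse_flaky_list` is a hypothetical helper, not part of the CLI):

```python
def parse_flaky_list(output):
    """Turn `hcli test-management flaky get` output (format as shown
    above) into (class_or_module, test_name) pairs."""
    tests = []
    for line in output.splitlines():
        line = line.strip()
        # Test lines look like "- com.example.PaymentTest::testRefundTimeout"
        if line.startswith("- ") and "::" in line:
            cls, name = line[2:].split("::", 1)
            tests.append((cls, name))
    return tests

sample = """Found 6 flaky test(s):
- com.example.PaymentTest::testRefundTimeout
- com.example.ApiTest::testWebhookRetry
"""
print(parse_flaky_list(sample))
```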

Manually Mark a Test

Sometimes you know a test is flaky before automatic detection catches it. Mark it manually:

```shell
hcli test-management flaky set \
  --account-id="$HARNESS_ACCOUNT_ID" \
  --repo="https://github.com/your-org/your-repo.git" \
  --api-key="$HARNESS_API_KEY" \
  --endpoint="https://app.harness.io/gateway/ti-service" \
  --class-name="com.example.PaymentTest" \
  --test-name="testRefundTimeout" \
  --marking=true
```

Marking Options

| `--marking` value | Effect |
| --- | --- |
| `true` | Force mark as flaky (overrides auto-detection) |
| `false` | Force mark as stable (overrides auto-detection) |
| `unset` | Remove the manual marking and let auto-detection decide |

Flaky vs Quarantine

| | Flaky | Quarantine |
| --- | --- | --- |
| Test runs? | Yes | Yes |
| Failure blocks pipeline? | Yes (unless quarantined) | No |
| Purpose | Track unreliable tests | Unblock deployments |
| Recovery | Automatic (5 passes) | Manual removal or automatic via policies |

A test can be both flaky AND quarantined. When quarantined, the flaky test runs but doesn't block the pipeline.

Common Causes of Flaky Tests

| Cause | Example | Fix |
| --- | --- | --- |
| Race conditions | Test depends on thread timing | Add synchronization or waits |
| External services | Test calls real APIs | Mock external dependencies |
| Shared state | Tests don't clean up after themselves | Isolate test data |
| Time sensitivity | Test checks "now" vs. expected time | Use fixed test clocks |
| Resource contention | Tests compete for ports/files | Use unique resources per test |
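To illustrate one of these fixes, here is the "fixed test clock" pattern from the table. The function names are hypothetical examples: instead of calling the real clock inside the code under test, the current time is injected as a parameter, so the test result never depends on when the suite happens to run.

```python
from datetime import datetime, timezone

def is_expired(expires_at, now=None):
    """Hypothetical token-expiry check. Taking `now` as a parameter
    lets tests pin the clock; production callers omit it."""
    now = now or datetime.now(timezone.utc)
    return now >= expires_at

# Deterministic test: the "current" time is fixed, not read from the system.
fixed_now = datetime(2024, 1, 1, tzinfo=timezone.utc)
expiry = datetime(2024, 1, 2, tzinfo=timezone.utc)
assert not is_expired(expiry, now=fixed_now)
assert is_expired(expiry, now=datetime(2024, 1, 3, tzinfo=timezone.utc))
```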

Automate with Policies

Instead of manually managing flaky tests, use policies to automate quarantine:

```json
[
  {
    "when": ["test is flaky", "test failed"],
    "action": ["mark quarantine"]
  }
]
```

This policy automatically quarantines any flaky test that fails, preventing it from blocking your pipeline while still tracking its status.
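Conceptually, such a policy is an AND over its `when` conditions: every condition must hold before the actions fire. A minimal sketch of that evaluation logic (illustrative only; `actions_for` and the state dictionary are assumptions, not Harness's policy engine):

```python
import json

# The policy from above, as data.
POLICY = json.loads("""
[
  {"when": ["test is flaky", "test failed"], "action": ["mark quarantine"]}
]
""")

def actions_for(test_state, policies):
    """Return the actions of every rule whose conditions ALL hold."""
    actions = []
    for rule in policies:
        if all(test_state.get(cond, False) for cond in rule["when"]):
            actions.extend(rule["action"])
    return actions

state = {"test is flaky": True, "test failed": True}
print(actions_for(state, POLICY))  # ['mark quarantine']
```

A flaky test that passes (or a stable test that fails) matches only one of the two conditions, so no action is taken and the result still counts normally.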

Next Steps