Flaky Test Detection
A flaky test is one that passes and fails inconsistently - without any code changes. Harness automatically detects these tests and marks them with a FLAKY badge so your team knows which tests are unreliable.
How Detection Works
Harness automatically detects flaky tests by analyzing test results across pipeline executions. A test is identified by its suite name + class name + repository + test name.
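For illustration only, that identity can be pictured as a composite key along these lines (a minimal sketch; the field names are assumptions, not Harness's internal schema):

```python
from typing import NamedTuple

class TestIdentity(NamedTuple):
    """Composite key that tracks a single test across pipeline executions."""
    repository: str   # e.g. "https://github.com/your-org/your-repo.git"
    suite_name: str   # e.g. "payments"  (hypothetical suite name)
    class_name: str   # e.g. "com.example.PaymentTest"
    test_name: str    # e.g. "testRefundTimeout"
```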
Detection Criteria
A test is marked flaky when both conditions are met:
- Same-commit inconsistency: The test has both passed AND failed on the same commit
- Within the observation window: The inconsistency occurred within the last 14 days
Example: If a build for commit abc123 ran twice, once where the test passed and once where it failed, the test is marked flaky.
The 14-day window defines how far back Harness looks when detecting flaky tests. Test behavior older than 14 days is ignored. This ensures flaky detection reflects recent, relevant behavior—not issues that were fixed weeks ago.
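To make the rule concrete, here is a minimal sketch of the detection logic, assuming a simplified result record of (commit SHA, passed flag, timestamp); the actual Harness data model is not exposed:

```python
from datetime import timedelta

OBSERVATION_WINDOW = timedelta(days=14)

def is_flaky(results, now):
    """A test is flagged as flaky if, within the observation window, any single
    commit has produced both a passing and a failing run."""
    recent = [r for r in results if now - r[2] <= OBSERVATION_WINDOW]
    outcomes_by_commit = {}
    for commit_sha, passed, _timestamp in recent:
        outcomes_by_commit.setdefault(commit_sha, set()).add(passed)
    # Same-commit inconsistency: one commit saw both True (pass) and False (fail).
    return any(outcomes == {True, False} for outcomes in outcomes_by_commit.values())
```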
Auto-Recovery
Flaky status is automatically cleared in two ways:
- 5 consecutive passes: After 5 successful runs with no flaky occurrences, the status clears immediately
- Time expiry: If no same-commit inconsistency occurs for 14 days, the flaky status naturally expires
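As a rough sketch of these two recovery paths (thresholds taken from the parameters table below; the counters themselves are hypothetical):

```python
def recovered_status(consecutive_passes, days_since_last_inconsistency):
    """Returns "stable" when either auto-recovery condition is met, otherwise "flaky"."""
    if consecutive_passes >= 5:
        return "stable"   # cleared immediately after 5 clean runs
    if days_since_last_inconsistency > 14:
        return "stable"   # flaky status expires with the 14-day window
    return "flaky"
```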
Detection Parameters
| Parameter | Value |
|---|---|
| Observation window | 14 days |
| Flaky trigger | Pass + fail on same commit |
| Auto-recovery | 5 consecutive passes |
| Scope | Per repository |
View Flaky Tests
In the Harness UI
- Open your pipeline execution
- Click the Tests tab
- Look for the FLAKY badge next to test names
- Use Filter → Flaky to show only flaky tests
Via CLI
List all flaky tests for a repository:
hcli test-management flaky get \
--account-id="$HARNESS_ACCOUNT_ID" \
--repo="https://github.com/your-org/your-repo.git" \
--api-key="$HARNESS_API_KEY" \
--endpoint="https://app.harness.io/gateway/ti-service"
Example output:
Found 6 flaky test(s):
- com.example.PaymentTest::testRefundTimeout
- com.example.ApiTest::testWebhookRetry
- tests.integration.test_api::test_concurrent_requests
Manually Mark a Test
Sometimes you know a test is flaky before automatic detection catches it. Mark it manually:
hcli test-management flaky set \
--account-id="$HARNESS_ACCOUNT_ID" \
--repo="https://github.com/your-org/your-repo.git" \
--api-key="$HARNESS_API_KEY" \
--endpoint="https://app.harness.io/gateway/ti-service" \
--class-name="com.example.PaymentTest" \
--test-name="testRefundTimeout" \
--marking=true
Marking Options
| --marking value | Effect |
|---|---|
| true | Force mark as flaky (overrides auto-detection) |
| false | Force mark as stable (overrides auto-detection) |
| unset | Remove manual marking, let auto-detection decide |
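For intuition, the override semantics in the table resolve roughly like this sketch (not actual Harness code):

```python
def effective_flaky_status(manual_marking, auto_detected):
    """manual_marking is "true", "false", or "unset" (the --marking values above);
    auto_detected is the boolean result of automatic detection."""
    if manual_marking == "true":
        return True        # forced flaky; auto-detection is ignored
    if manual_marking == "false":
        return False       # forced stable; auto-detection is ignored
    return auto_detected   # unset: fall back to automatic detection
```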
Flaky vs Quarantine
| | Flaky | Quarantine |
|---|---|---|
| Test runs? | Yes | Yes |
| Failure blocks pipeline? | Yes (unless quarantined) | No |
| Purpose | Track unreliable tests | Unblock deployments |
| Recovery | Automatic (5 passes) | Manual removal or automatic via policies |
A test can be both flaky AND quarantined. When quarantined, the flaky test runs but doesn't block the pipeline.
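In code terms, the gating difference amounts to something like this (illustrative only):

```python
def failure_blocks_pipeline(test_failed, is_quarantined):
    """Both flaky and quarantined tests still run; only quarantine stops a
    failure from blocking the pipeline, regardless of flaky status."""
    return test_failed and not is_quarantined
```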
Common Causes of Flaky Tests
| Cause | Example | Fix |
|---|---|---|
| Race conditions | Test depends on thread timing | Add synchronization or waits |
| External services | Test calls real APIs | Mock external dependencies |
| Shared state | Tests don't clean up after themselves | Isolate test data |
| Time sensitivity | Test checks "now" vs. expected time | Use fixed test clocks |
| Resource contention | Tests compete for ports/files | Use unique resources per test |
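As one example of these fixes, the time-sensitivity row usually means injecting a clock the test controls instead of reading the real time. A minimal illustration (the FixedClock helper is hypothetical, not part of Harness):

```python
from datetime import datetime

class FixedClock:
    """Test double that always returns the same instant, removing time sensitivity."""
    def __init__(self, instant):
        self._instant = instant

    def now(self):
        return self._instant

def is_expired(expiry, clock):
    # Production code takes the clock as a dependency instead of calling datetime.now().
    return clock.now() >= expiry

# The test's outcome no longer depends on when it runs:
clock = FixedClock(datetime(2024, 1, 1, 12, 0, 0))
assert is_expired(datetime(2024, 1, 1, 11, 59, 59), clock)
```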
Automate with Policies
Instead of manually managing flaky tests, use policies to automate quarantine:
[
{
"when": ["test is flaky", "test failed"],
"action": ["mark quarantine"]
}
]
This policy automatically quarantines any flaky test that fails, preventing it from blocking your pipeline while still tracking its status.
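As a mental model, a rule fires when every condition in its when list holds, and then its action list is applied. The sketch below mirrors that shape; it is not the actual policy engine:

```python
def evaluate_policies(rules, test_state):
    """rules follow the JSON shape above; test_state maps condition strings
    such as "test is flaky" and "test failed" to booleans."""
    actions = []
    for rule in rules:
        if all(test_state.get(condition, False) for condition in rule["when"]):
            actions.extend(rule["action"])
    return actions

rules = [{"when": ["test is flaky", "test failed"], "action": ["mark quarantine"]}]
print(evaluate_policies(rules, {"test is flaky": True, "test failed": True}))
# ['mark quarantine']
```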
Next Steps
- Quarantine flaky tests to unblock deployments
- Set up policies to automate test management