Linux time chaos
Linux time chaos is a chaos fault that skews the system clock on the target Linux machine by OFFSET for DURATION, then restores the original time. When DISABLE_NTP is true, NTP synchronization is paused before the skew so the clock does not snap back; NTP is re-enabled during revert. The fault runs through the Linux Chaos Infrastructure (LCI) systemd service installed on the target VM.
Use this fault to test how a workload behaves when time jumps unexpectedly: whether short-lived TLS certificates are rejected, whether HMAC/JWT validation surfaces clean errors, whether scheduled jobs misfire, whether database transactions and replication tolerate the skew, and whether monitoring detects the drift within the alerting SLA.
If you have not installed the Linux Chaos Infrastructure yet, go to Linux Chaos Infrastructure to install the agent and connect the VM to the control plane.
Use cases
Run this fault when you want to answer concrete questions like:
- Certificate validation: When the clock jumps forward by OFFSET, do TLS certificates that just expired fail cleanly?
- JWT/HMAC tolerance: Are JWTs or HMAC tokens rejected for iat/exp skew, and does the application surface a clear error?
- Scheduled jobs: Do cron, systemd timers, or quartz schedulers double-fire (or skip) when time jumps?
- Database safety: Does the database/replication layer tolerate the time skew, or does it abort transactions with "clock went backwards" errors?
Prerequisites
- Linux Chaos Infrastructure installed: The linux-chaos-infrastructure systemd service is active on the target VM and the infrastructure is in CONNECTED state. Go to Linux Chaos Infrastructure to install it.
- NTP daemon controllable: When DISABLE_NTP=true, the fault stops and restarts systemd-timesyncd, chronyd, or ntpd. Confirm one of these is the active time sync daemon with timedatectl status.
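The daemon check in the last prerequisite can be scripted. The following is a minimal sketch, not the fault's own detection logic: active_time_daemon and unit_is_active are hypothetical helper names, and the unit names are taken as written above (some distributions register the services under different unit names, so adjust the list if needed).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper: report which supported time sync daemon is active.

# Wrapper around systemctl so the check can be stubbed or swapped out.
unit_is_active() {
  systemctl is-active --quiet "$1" 2>/dev/null
}

# Print the first active daemon from the list named in the prerequisites.
# Returns non-zero when none of them is active.
active_time_daemon() {
  local unit
  for unit in systemd-timesyncd chronyd ntpd; do
    if unit_is_active "$unit"; then
      echo "$unit"
      return 0
    fi
  done
  echo "no supported time sync daemon is active" >&2
  return 1
}
```

Calling active_time_daemon on the target VM before the experiment prints the daemon name, or exits non-zero if DISABLE_NTP would have nothing to act on.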
Supported environments
The fault has been tested on the following Linux distributions. Go to Linux fault requirements to see the full compatibility matrix.
| Platform | Support status |
|---|---|
| Ubuntu 16+, Debian 10+ | Supported |
| CentOS 7+, RHEL 7+, Fedora 30+ | Supported |
| openSUSE LEAP 15.4+ / SUSE Linux Enterprise 15+ | Supported |
Permissions required
This fault is classified as an Advanced Linux fault. It requires the Linux Chaos Infrastructure systemd service to run as the root user and root group on the target VM so it can call date (or clock_settime) and manage the NTP daemon. No cloud credentials are needed.
Fault tunables
Configure the following fault parameters when you add Linux time chaos to an experiment in Chaos Studio. Defaults are shown for reference.
Chaos parameters
| Tunable | Description | Default |
|---|---|---|
| DURATION | Total duration of the fault. Accepts [hours]h[minutes]m[seconds]s format. | 30s |
| OFFSET | Signed time offset to apply (for example, +24h, -30m, +5m). | +24h |
| DISABLE_NTP | Stop the NTP daemon before applying the skew (so NTP does not snap the clock back) and re-enable it during revert. | true |
| RAMP_TIME | Wait period in seconds before and after the fault. Go to ramp time to read how it is applied. | 0 |
Tunables that apply to every fault are documented in common tunables for all faults.
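When planning an experiment it can help to convert OFFSET and DURATION values into seconds, for example to compare the skew against a certificate lifetime or a token expiry window. The sketch below is a hypothetical helper named to_seconds, not the parser the fault uses. It assumes the [hours]h[minutes]m[seconds]s format with an optional leading sign; whether the fault itself accepts compound OFFSET values such as +1h30m is not confirmed here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert "+24h", "-30m", or "1h30m15s" into signed seconds.
to_seconds() {
  local value="$1" sign=1
  local re='^([0-9]+h)?([0-9]+m)?([0-9]+s)?$'

  # Strip an optional leading sign and remember it.
  case "$value" in
    +*) value="${value#+}" ;;
    -*) sign=-1; value="${value#-}" ;;
  esac

  # Reject empty input and anything that is not h, m, s components in order.
  if [[ -z "$value" || ! "$value" =~ $re ]]; then
    echo "invalid duration: $1" >&2
    return 1
  fi

  local h="${BASH_REMATCH[1]%h}" m="${BASH_REMATCH[2]%m}" s="${BASH_REMATCH[3]%s}"
  # 10# forces base 10 so values like "08" are not read as octal.
  local total=$(( 10#${h:-0} * 3600 + 10#${m:-0} * 60 + 10#${s:-0} ))
  echo $(( sign * total ))
}
```

For example, to_seconds +24h prints 86400 and to_seconds -30m prints -1800.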
Fault execution in brief
(Optionally) stops the NTP daemon, sets the system clock to now + OFFSET, holds for DURATION, restores the original time, and re-enables the NTP daemon.
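The sequence above can be sketched as a shell function. This is an illustration under stated assumptions, not the fault's implementation: time_chaos_sketch, shift_clock, run_step, and DRY_RUN are hypothetical names, NTP is toggled with timedatectl set-ntp rather than by stopping a specific daemon, and the clock is set with GNU date. The script defaults to a dry run that only prints what it would do; applying it for real requires root and changes the system clock.

```bash
#!/usr/bin/env bash
# Sketch of the fault lifecycle. DRY_RUN=1 (the default) only prints the steps.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or print it when in dry-run mode.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Shift the wall clock by a signed number of seconds relative to its current value.
shift_clock() {
  local delta_s="$1"
  if [ "$DRY_RUN" = "1" ]; then
    echo "would shift clock by ${delta_s}s"
  else
    date -s "@$(( $(date +%s) + delta_s ))"   # GNU date: set time from an epoch value
  fi
}

# Arguments: offset in signed seconds, duration in seconds, disable_ntp (true|false).
time_chaos_sketch() {
  local offset_s="$1" duration_s="$2" disable_ntp="${3:-true}"

  if [ "$disable_ntp" = "true" ]; then
    run_step timedatectl set-ntp false   # pause synchronization so the skew holds
  fi

  shift_clock "$offset_s"                # apply now + OFFSET
  run_step sleep "$duration_s"           # hold for DURATION
  shift_clock "$(( -offset_s ))"         # the skewed clock kept ticking, so undo the same offset

  if [ "$disable_ntp" = "true" ]; then
    run_step timedatectl set-ntp true    # let NTP converge on real time again
  fi
}
```

Running time_chaos_sketch 86400 30 true in dry-run mode prints the five steps in order, which is a quick way to review the plan before an experiment.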
Expected behavior during fault execution
- The system clock reports a time offset by OFFSET from real time for the duration of the fault.
- date and timedatectl reflect the skewed time.
- Applications that read the system clock (TLS, JWT, scheduled jobs, distributed coordination) see the skewed time.
- After the duration ends, the clock is set back to the (NTP-corrected or pre-fault) real time, and NTP resumes if it was disabled.
The Linux Chaos Infrastructure service performs the revert on the target VM. After NTP is re-enabled, re-synchronization may take a few seconds to converge.
Signals to watch
Attach resilience probes to assert each layer:
- System clock: Use a command probe running date -u on the target VM and assert the value is offset by OFFSET.
- TLS errors: Use a Prometheus probe on TLS-error counters for clients that hit certificate-expiry boundaries.
- End-to-end availability: Use an HTTP probe on a user-visible endpoint that depends on time (token validation, scheduling).
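The system clock assertion can be expressed as a small script for a command probe. This is a sketch under assumptions: assert_skew is a hypothetical helper, the reference epoch must come from a machine that is not under test (how you obtain it is left open), and the 5-second default tolerance is an arbitrary illustration value.

```bash
#!/usr/bin/env bash
# Hypothetical probe helper: succeed when the observed skew matches the expected offset.
# Arguments:
#   $1 expected offset in signed seconds (for example 86400 for +24h)
#   $2 reference epoch from an unskewed source
#   $3 tolerance in seconds (default 5)
#   $4 observed epoch (default: this machine's current clock)
assert_skew() {
  local expected_s="$1" reference_epoch="$2" tolerance_s="${3:-5}"
  local observed_epoch="${4:-$(date -u +%s)}"

  local actual_s=$(( observed_epoch - reference_epoch ))
  local diff_s=$(( actual_s - expected_s ))
  # Take the absolute value of the difference.
  if [ "$diff_s" -lt 0 ]; then
    diff_s=$(( -diff_s ))
  fi

  if [ "$diff_s" -le "$tolerance_s" ]; then
    echo "skew ok: ${actual_s}s"
  else
    echo "skew mismatch: expected ${expected_s}s, got ${actual_s}s" >&2
    return 1
  fi
}
```

During the chaos window, a call such as assert_skew 86400 "$REFERENCE_EPOCH" exits 0 only if the target clock is about 24 hours ahead of the reference. REFERENCE_EPOCH is a placeholder for the value you supply.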
Verify the fault execution effect
- Confirm the clock skew on the target VM.

      date -u
      timedatectl status

  date reports a time offset by OFFSET during the chaos window and returns to real time afterwards.
- Confirm NTP state.

      timedatectl status | grep -E "synchronized|NTP service"
      sudo systemctl status systemd-timesyncd chronyd ntpd 2>/dev/null | grep Active

  With DISABLE_NTP=true, NTP is inactive during the chaos window and active afterwards.
- Inspect Linux Chaos Infrastructure logs.

      sudo journalctl -u linux-chaos-infrastructure -n 100 --no-pager
Recovery and cleanup
- End of duration: The fault sets the clock back and re-enables NTP when DURATION elapses.
- Abort the experiment: Stopping the experiment from Chaos Studio triggers the same restore.
- Manual recovery: If the clock is left skewed, run sudo timedatectl set-ntp true and (on systemd-timesyncd) sudo systemctl restart systemd-timesyncd to force re-sync.
- Workload recovery: Applications that cached the skewed time should be restarted if they hold long-lived clock-derived state (for example, JWT issuance windows).
Limitations
- Backward skew is more disruptive: A negative OFFSET moves the wall clock backwards, which may break monotonic-clock assumptions in some libraries, and is the highest-risk variant.
- NTP daemon dependency: If the active NTP daemon is none of systemd-timesyncd, chronyd, or ntpd, the DISABLE_NTP toggle has no effect.
- Single VM scope: Each fault run targets one VM.
- Snap-back risk: With DISABLE_NTP=false, NTP may correct the skew within seconds, dampening the chaos effect.
- Subsecond precision: OFFSET is applied via date/clock_settime; subsecond drift relative to other machines is not controlled.
Troubleshooting
Linux time chaos fault did not skew the clock in Harness Chaos Engineering
If DISABLE_NTP=false, the NTP daemon may have snapped the clock back within seconds. Set DISABLE_NTP=true and re-run. Confirm with date -u that the clock is skewed during the chaos window.
NTP did not re-enable after the experiment
The fault restarts the NTP daemon during revert. If NTP is still off, enable it manually with sudo timedatectl set-ntp true and sudo systemctl start systemd-timesyncd (or chronyd/ntpd, whichever your distro uses).
Application kept seeing the old time after the experiment
Some applications cache clock readings or derive deadlines from a wall clock at startup. Restart the application after the experiment to flush cached time-based state.
Related faults
- Linux service restart: Restart the application to flush time-derived state.
- Linux DNS spoof: Redirect dependencies instead of skewing time.
- Linux network latency: Add network latency instead of skewing time.