Pod network partition
Pod network partition is a Kubernetes pod-level chaos fault that isolates a target pod by creating a temporary NetworkPolicy that blocks ingress, egress, or both for a configurable duration. The partition is enforced by the cluster's CNI at the dataplane, so this fault has lighter prerequisites than the other pod network faults: it works wherever the cluster CNI supports NetworkPolicy (Calico, Cilium, AWS VPC CNI in policy mode, Azure CNI, etc.). When the fault ends, the policy is deleted and connectivity is restored as soon as the CNI reconciles the change.
Use this fault to test how a service behaves when it is fully cut off from a specific peer or dependency: a network partition between leader and follower, a failed gateway, a split namespace, or an annotated dependency you want to isolate without changing application code.
If you have not configured the chaos infrastructure yet, go to Quickstart to install the chaos infrastructure and run an experiment end to end.
Use cases
Run this fault when you want to answer concrete questions like:
- Leader-follower partition: Cut off the leader of a stateful service from its followers. Does the follower set elect a new leader within the configured timeout? Does the original leader step down cleanly when the partition heals?
- Cross-namespace isolation: Block one workload from a specific peer namespace and confirm the workload either fails fast or degrades to a documented fallback (cache, default value, error response).
- Dependency cutoff: Isolate a workload from its database, message bus, or external API and confirm the circuit breaker engages, the cache serves stale data, and callers are not exposed to long timeouts.
- Probe and ejection behavior: Does the kubelet mark the pod NotReady because its outbound health check fails? Does the service mesh outlier-detection eject the pod from upstream pools within the SLO?
Prerequisites
- Kubernetes version: 1.21 or later. Go to What's supported to confirm distribution support.
- CNI with NetworkPolicy enforcement: The cluster's CNI plugin must enforce NetworkPolicy. Plain Flannel and the default AWS VPC CNI in non-policy mode do not. Running kubectl api-resources | grep -i networkpolicy confirms only that the NetworkPolicy API is served, which is true on every cluster; to confirm enforcement, check your CNI's documentation or apply a test policy and verify that traffic is blocked.
- Target pods are Running: The application pods you intend to target are in the Running state before the fault is launched.
- Workload selector defined: The chaos experiment knows the target workload (Deployment, StatefulSet, etc.) by kind, namespace, and either names or labels.
Supported environments
| Platform | Support status |
|---|---|
| Amazon EKS (with Calico, Cilium, or VPC CNI in policy mode) | Supported |
| Azure AKS | Supported |
| Google GKE | Supported |
| Red Hat OpenShift | Supported |
| Rancher | Supported |
| VMware Tanzu | Supported |
| Self-managed Kubernetes with policy-enforcing CNI | Supported |
| GKE Autopilot | Supported with Autopilot setup |
| Clusters with no NetworkPolicy enforcement | Not supported (the policy is created but has no effect) |
Permissions required
The fault runs under the chaos infrastructure's service account.
| Resource (apiGroup) | Verbs | Why it is needed |
|---|---|---|
| networkpolicies (networking.k8s.io) | get, list, create, delete, patch | Create and delete the chaos NetworkPolicy in the target namespace |
| pods ("") | get, list | Discover target pods and confirm they are Running |
| deployments, statefulsets, replicasets, daemonsets (apps) | get, list | Resolve the target workload to the pods it owns |
| events ("") | get, list, create, patch, update | Record fault progress as Kubernetes events |
| jobs (batch) | get, list, create, delete, deletecollection | Run the chaos job that drives the fault |
The default Harness chaos infrastructure service account already includes these permissions.
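If you run the fault under a custom service account, the permissions table can be expressed as RBAC rules. The following is a minimal Python sketch that builds a Role from the table and prints it as JSON, which kubectl apply -f accepts. The role name chaos-pod-network-partition, the placeholder namespace, and the choice of a namespaced Role rather than a ClusterRole are assumptions for illustration, not the manifest that Harness ships.

```python
import json

# (apiGroup, resources, verbs) copied from the permissions table.
PERMISSIONS = [
    ("networking.k8s.io", ["networkpolicies"], ["get", "list", "create", "delete", "patch"]),
    ("", ["pods"], ["get", "list"]),
    ("apps", ["deployments", "statefulsets", "replicasets", "daemonsets"], ["get", "list"]),
    ("", ["events"], ["get", "list", "create", "patch", "update"]),
    ("batch", ["jobs"], ["get", "list", "create", "delete", "deletecollection"]),
]


def build_role(namespace, name="chaos-pod-network-partition"):
    """Return a Role manifest (as a dict) granting the verbs listed above."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {"apiGroups": [group], "resources": resources, "verbs": verbs}
            for group, resources, verbs in PERMISSIONS
        ],
    }


if __name__ == "__main__":
    # "my-app-namespace" is a placeholder for the target namespace.
    print(json.dumps(build_role("my-app-namespace"), indent=2))
```

A Role still needs a RoleBinding to the chaos service account before the permissions take effect.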
Fault tunables
Configure the following fault parameters when you add Pod network partition to an experiment in Chaos Studio. Defaults are shown for reference.
Chaos parameters
| Tunable | Description | Default |
|---|---|---|
| POLICY_TYPES | Which traffic direction to block. One of ingress, egress, or all. | all |
| TOTAL_CHAOS_DURATION | Duration of the fault in seconds. | 60 |
Partition scope (what is blocked)
If you set none of the following, the partition blocks all traffic in the chosen direction. Combine them to narrow the partition.
| Tunable | Description | Default |
|---|---|---|
| DESTINATION_IPS | Comma-separated CIDRs to block (for example, 10.0.0.0/16,192.168.1.0/24). | "" |
| DESTINATION_HOSTS | Comma-separated hostnames to block. The helper resolves them and adds the resolved IPs to the policy. | "" |
| POD_SELECTOR | Label selector matching peer pods to block, in key=value,key=value form. | "" |
| NAMESPACE_SELECTOR | Label selector matching peer namespaces to block, in key=value,key=value form. | "" |
| PORTS | Comma-separated ports (or port/protocol pairs) the policy should restrict to. | "" |
Runtime and helper
| Tunable | Description | Default |
|---|---|---|
| RAMP_TIME | Wait period in seconds before and after the fault. Go to ramp time to read how it is applied. | 0 |
Tunables that apply to every chaos fault are documented in common tunables for all faults.
With no DESTINATION_*, POD_SELECTOR, or NAMESPACE_SELECTOR set, the chaos NetworkPolicy denies all traffic in the chosen direction, including DNS to kube-system. If you only want to cut off a single dependency, always set at least one scope tunable.
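The value formats described in the tables above can be checked before you launch an experiment. The sketch below shows one way to parse them in Python. The function names are hypothetical, the fault's own parsing is not documented here and may differ, and two behaviors are assumptions: a bare port is treated as TCP (the NetworkPolicy default), and POLICY_TYPES is matched case-insensitively.

```python
def parse_policy_types(value="all"):
    """Map POLICY_TYPES (ingress, egress, or all) to NetworkPolicy policyTypes."""
    mapping = {
        "ingress": ["Ingress"],
        "egress": ["Egress"],
        "all": ["Ingress", "Egress"],
    }
    key = value.strip().lower()
    if key not in mapping:
        raise ValueError(f"POLICY_TYPES must be ingress, egress, or all, got {value!r}")
    return mapping[key]


def parse_selector(value):
    """Parse 'key=value,key=value' into a label dict; '' means no selector."""
    labels = {}
    for pair in filter(None, (p.strip() for p in value.split(","))):
        key, sep, val = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        labels[key.strip()] = val.strip()
    return labels


def parse_ports(value):
    """Parse '8080,53/UDP' into NetworkPolicy-style port entries."""
    ports = []
    for item in filter(None, (p.strip() for p in value.split(","))):
        port, _, protocol = item.partition("/")
        # Assumption: a bare port means TCP, the NetworkPolicy default.
        ports.append({"port": int(port), "protocol": (protocol or "TCP").upper()})
    return ports
```

For example, parse_selector("app=foo,tier=db") yields a two-entry label dictionary, and an empty string yields an empty one, which corresponds to leaving the tunable unset.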
Fault execution in brief
The fault creates a Kubernetes NetworkPolicy in the target pod's namespace that denies ingress, egress, or both for the configured duration. The policy can optionally be scoped to specific destinations, namespaces, pod selectors, or ports so that other traffic remains unaffected.
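To make the mechanism concrete, the following Python sketch builds the kind of NetworkPolicy manifest such a fault could apply. It is an illustration under stated assumptions, not the manifest the fault actually generates: the policy name and target labels are hypothetical, only IPv4 is handled, and only the harness.io/chaos label is taken from the kubectl commands on this page. Because NetworkPolicy is an allow-list API, the sketch expresses "block these CIDRs" as "allow everything except these CIDRs".

```python
def build_partition_policy(namespace, target_labels, policy_types, blocked_cidrs=()):
    """Build a NetworkPolicy dict that isolates pods matching target_labels.

    With no blocked_cidrs, the listed directions get no allow rules, so all
    traffic in those directions is denied. With blocked_cidrs, traffic is
    allowed to and from every IPv4 address except those CIDRs.
    """
    spec = {
        "podSelector": {"matchLabels": dict(target_labels)},
        "policyTypes": list(policy_types),
    }
    if blocked_cidrs:
        peer = {"ipBlock": {"cidr": "0.0.0.0/0", "except": list(blocked_cidrs)}}
        if "Ingress" in policy_types:
            spec["ingress"] = [{"from": [peer]}]
        if "Egress" in policy_types:
            spec["egress"] = [{"to": [peer]}]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": "pod-network-partition-demo",  # hypothetical name
            "namespace": namespace,
            "labels": {"harness.io/chaos": "pod-network-partition"},
        },
        "spec": spec,
    }
```

Calling build_partition_policy("shop", {"app": "checkout"}, ["Egress"], ["10.0.0.0/16"]) produces an egress-only policy that leaves all destinations reachable except 10.0.0.0/16; omitting the last argument produces a deny-all policy for the chosen directions.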
Expected behavior during fault execution
- TCP connections in the blocked direction stop progressing. Existing sockets time out per their tcp_user_timeout and OS keepalive settings (often 5 to 15 minutes). New connection attempts usually hang until the client's connect timeout, because most CNIs drop blocked packets silently; a CNI configured to reject traffic instead fails them immediately with connection refused or host unreachable.
- UDP traffic (DNS, QUIC) drops silently. If you blocked egress without explicitly allowing DNS, name resolution fails.
- The target pod can still reach anything not covered by the policy. With POD_SELECTOR=app=foo, only pods with that label are blocked.
- Health probes from the kubelet still work (kubelet talks to the pod over the node's network, which is not subject to pod-level NetworkPolicy).
- Service meshes that use sidecar-to-sidecar TCP detect the failures and may eject the pod from upstream pools.
When the fault ends, the NetworkPolicy is deleted and connectivity is restored as fast as the CNI can reconcile (typically within one second). TCP connections that timed out during the partition do not automatically reconnect; the application must handle that itself.
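How a blocked connection attempt ends (a timeout, an immediate refusal, or a name-resolution error) tells you which of the behaviors above you are seeing. The following Python sketch classifies a single TCP connection attempt. The function name is hypothetical and the two-second timeout is an arbitrary choice for illustration.

```python
import socket


def classify_connect(host, port, timeout=2.0):
    """Attempt one TCP connection and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No answer within the timeout: packets were most likely dropped.
        return "timed out"
    except ConnectionRefusedError:
        # The peer (or something on the path) actively rejected the attempt.
        return "refused"
    except OSError as exc:
        # Covers DNS resolution failures and unreachable hosts or networks.
        return f"error: {exc.strerror or exc}"
```

Run from inside the target pod (if the image includes Python), a blocked destination would be expected to report "timed out" or "refused", while an unblocked one reports "connected".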
Signals to watch
Attach resilience probes to assert each layer:
- Application failover: Use an HTTP probe against the dependency to confirm the partition is in effect, and a second HTTP probe against the application's user-facing endpoint to confirm graceful degradation.
- Cluster events: Use a Kubernetes probe to confirm the NetworkPolicy was created and deleted as expected.
- Mesh ejection: Use a Prometheus probe on Istio/Envoy cluster.outlier_detection.ejections_active or your mesh's equivalent.
Verify the fault execution effect
While the experiment is running, confirm that the partition is actually in effect:
- Inspect the chaos NetworkPolicy.

      kubectl get networkpolicy -n <namespace> -l harness.io/chaos=pod-network-partition -o yaml

  You should see the policy with policyTypes and the rules you configured.

- Test connectivity from the target pod.

      kubectl exec -n <namespace> <pod-name> -- nc -zv -w 2 <blocked-host> <port>

  The connection should fail. For an unblocked destination, the same command should succeed.
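If you prefer a compact summary over reading the full YAML, the same kubectl get command can emit JSON with -o json, which is straightforward to post-process. The sketch below is a hypothetical helper that reports the policy types and rule counts for each returned policy; it assumes the standard NetworkPolicy fields and nothing specific to this fault.

```python
import json


def summarize_policies(kubectl_json):
    """Summarize the JSON printed by `kubectl get networkpolicy ... -o json`."""
    doc = json.loads(kubectl_json)
    # kubectl returns a List when queried by label, or one object by name.
    items = doc.get("items", [doc])
    summary = []
    for item in items:
        spec = item.get("spec", {})
        summary.append({
            "name": item["metadata"]["name"],
            "policyTypes": spec.get("policyTypes", []),
            "ingressRules": len(spec.get("ingress") or []),
            "egressRules": len(spec.get("egress") or []),
        })
    return summary
```

A policy that lists a direction in policyTypes but has zero rules for it denies all traffic in that direction, which is what an unscoped partition should look like.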
Recovery and cleanup
- End of duration: The chaos NetworkPolicy is deleted automatically. The CNI reconciles within seconds and connectivity returns.

- Abort the experiment: Stopping the experiment from Chaos Studio triggers the same cleanup path.

- Failed cleanup: If automated cleanup did not complete, delete the chaos NetworkPolicy manually:

      kubectl delete networkpolicy -n <namespace> -l harness.io/chaos=pod-network-partition

- Stuck connections: TCP sockets that timed out during the partition stay closed; the application reconnects on its own schedule.
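Because closed sockets are not reopened for you, the application needs its own reconnect logic. One common pattern is retrying with exponential backoff, sketched below in Python. This is a generic illustration rather than anything the fault provides; the function name, attempt count, and delays are hypothetical defaults.

```python
import time


def reconnect_with_backoff(connect, attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    connect is any zero-argument callable that returns a connection or raises
    OSError (for example, a wrapper around socket.create_connection).
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

The sleep parameter exists so the wait can be replaced in tests; production code would normally also add jitter so that many clients do not reconnect at the same instant when the partition heals.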
Limitations
This fault is not appropriate in the following scenarios:
- CNI plugins without NetworkPolicy enforcement: Plain Flannel, default AWS VPC CNI (without policy mode), and a few others do not enforce policies. The fault creates the policy but it has no effect.
- hostNetwork pods: Pods using the host network namespace are not subject to namespaced NetworkPolicy. The fault refuses to inject on hostNetwork: true pods.
- Cluster-scoped resources: This fault works at the pod level. To partition a node from the rest of the cluster, use Node network loss with NETWORK_PACKET_LOSS_PERCENTAGE: 100.
- kubelet-to-pod traffic: NetworkPolicy does not apply to kubelet probe traffic in most CNI implementations, so liveness and readiness probes still work even under "block all" rules. To break probes, target a different layer.
Troubleshooting
Pod network partition created but the target pod can still reach the blocked destination
The most likely cause is that the cluster CNI does not enforce NetworkPolicy. Check which CNI is running with kubectl describe pod -n kube-system -l k8s-app=<cni>, or consult your CNI's docs. Plain Flannel and the default AWS VPC CNI in non-policy mode do not enforce policies. Switch to Calico or Cilium, or enable AWS VPC CNI policy mode.
Pod network partition blocks more traffic than expected in Harness Chaos Engineering
With no DESTINATION_*, POD_SELECTOR, or NAMESPACE_SELECTOR set, the chaos NetworkPolicy denies all traffic in the chosen direction, including DNS to kube-system. Either set scope tunables to narrow the partition, or add an explicit allow rule outside this fault for DNS (UDP/53 to kube-system pods).
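The explicit DNS allow rule mentioned above could look like the manifest built by the following Python sketch. The policy name and target labels are hypothetical. Matching kube-system through the kubernetes.io/metadata.name label relies on that label being set automatically, which recent Kubernetes versions do. TCP/53 is included as an assumption because DNS falls back to TCP for large responses; remove it to match the UDP-only rule described above.

```python
def build_dns_allow_policy(namespace, target_labels):
    """Build a NetworkPolicy dict allowing DNS egress to kube-system pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},  # hypothetical name
        "spec": {
            "podSelector": {"matchLabels": dict(target_labels)},
            "policyTypes": ["Egress"],
            "egress": [{
                "to": [{
                    "namespaceSelector": {
                        "matchLabels": {"kubernetes.io/metadata.name": "kube-system"}
                    }
                }],
                "ports": [
                    {"protocol": "UDP", "port": 53},
                    {"protocol": "TCP", "port": 53},  # assumption: TCP fallback
                ],
            }],
        },
    }
```

NetworkPolicies are additive, so this allow rule takes effect alongside the chaos policy. Note that a policy like this also isolates the selected pods for egress on its own: if no other policy allows the workload's remaining traffic, apply it only for the experiment window.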
NetworkPolicy not deleted after pod-network-partition ends
Automated cleanup did not complete. Delete the chaos NetworkPolicy manually with kubectl delete networkpolicy -n <namespace> -l harness.io/chaos=pod-network-partition. Verify connectivity returns with a test from inside the target pod.
Permission denied creating NetworkPolicy for pod-network-partition
The chaos service account lacks the create verb on networkpolicies in the networking.k8s.io group for the target namespace. Verify with kubectl auth can-i create networkpolicies --as=system:serviceaccount:<chaos-ns>:<sa-name> -n <target-ns>. Grant the missing role binding.
Related faults
- Pod network loss: Drop a percentage of packets without creating a NetworkPolicy. Useful when the CNI does not enforce policies.
- Pod network latency: Add delay rather than blocking traffic.
- Node network loss: Partition an entire node from the rest of the cluster.
- Common pod fault tunables: Shared environment variables for selecting target pods and workloads.