Pod JVM Kafka Latency
The Pod JVM Kafka Latency fault simulates slow Kafka producer or consumer operations by delaying the Kafka calls made by the Java process running inside a Kubernetes pod. This helps test the application's behavior and resilience under Kafka performance degradation.
Tip: JVM chaos faults use the Byteman utility to inject faults into the JVM.
Use cases
Pod JVM Kafka latency:
- Validate the application's resilience by simulating Kafka latency to ensure it can handle slow message processing, implement proper timeouts, and maintain functionality under degraded Kafka performance.
- Assess whether the monitoring systems and alerting mechanisms can accurately detect and report Kafka performance degradation in real time.
- Test timeout configurations and retry mechanisms when Kafka operations are slow.
- Verify that the application maintains acceptable performance levels when Kafka operations experience latency.
- Validate queue management and backpressure handling when message processing is delayed.
Permissions required
Below is a sample Kubernetes role that defines the permissions required to execute the fault.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: hce
  name: pod-jvm-kafka-latency
spec:
  definition:
    scope: Namespaced
    permissions:
      # manage helper pods and discover target pods
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["create", "delete", "get", "list", "patch", "deletecollection", "update"]
      # record chaos events
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create", "get", "list", "patch", "update"]
      # read pod logs
      - apiGroups: [""]
        resources: ["pods/log"]
        verbs: ["get", "list", "watch"]
      # resolve the parent workloads of target pods (these resources live in the "apps" API group)
      - apiGroups: ["apps"]
        resources: ["deployments", "statefulsets"]
        verbs: ["get", "list"]
      # run and clean up chaos jobs
      - apiGroups: ["batch"]
        resources: ["jobs"]
        verbs: ["create", "delete", "get", "list", "deletecollection"]
Java requirements
This fault requires the following Java-specific prerequisites:
- The Java process must allow agent attachment (Attach API must be available).
- Utilities like ps, pgrep, and bash must be available in the target container.
- File permissions must allow the JVM to read and execute agent files.
- Agent attachment must not be restricted by user or security context configurations.
- The target container image must not use a restricted/minimal Java runtime that removes attach-related modules.
Supported environments
| Platform | Support Status |
|---|---|
| GKE (Google Kubernetes Engine) | ✅ Supported |
| EKS (Amazon Elastic Kubernetes Service) | ✅ Supported |
| AKS (Azure Kubernetes Service) | ✅ Supported |
| GKE Autopilot | ✅ Supported |
| Self-managed Kubernetes | ✅ Supported |
Mandatory tunables
| Tunable | Description | Notes |
|---|---|---|
| KAFKA_MODE | The Kafka operation mode to target (producer or consumer). | Supported values: producer, consumer. For more information, go to Parameters. |
| KAFKA_TOPIC | The name of the Kafka topic to be targeted. | For more information, go to Parameters. |
| LATENCY | The latency (in milliseconds) to inject into Kafka operations. | For example, 2000 (for 2 seconds). For more information, go to Parameters. |
Optional tunables
| Tunable | Description | Notes |
|---|---|---|
| TOTAL_CHAOS_DURATION | Duration for which chaos is injected into the target resource. Should be provided in [numeric-hours]h[numeric-minutes]m[numeric-seconds]s format. | Default: 30s. Examples: 1m25s, 1h3m2s, 1h3s. For more information, go to duration of the chaos. |
| TRANSACTION_PERCENTAGE | The proportion of total Kafka operations to be targeted. | Supports values in the (0.00, 1.00] range. If not provided, all Kafka operations are targeted. For more information, go to Parameters. |
| POD_AFFECTED_PERCENTAGE | Percentage of total pods to target. Provide numeric values. | Default: 0 (corresponds to 1 replica). For more information, go to pods affected percentage. |
| JAVA_HOME | Path to the Java installation directory. | For example, /tmp/dir/jdk. |
| BYTEMAN_PORT | Port used by the Byteman agent. | Default: 9091. |
| CONTAINER_RUNTIME | Container runtime interface for the cluster. | Default: containerd. Supported values: docker, containerd, and crio. For more information, go to container runtime. |
| SOCKET_PATH | Path to the containerd, crio, or docker socket file. | Default: /run/containerd/containerd.sock. For more information, go to socket path. |
| RAMP_TIME | Period to wait before and after injecting chaos. Should be provided in [numeric-hours]h[numeric-minutes]m[numeric-seconds]s format. | Default: 0s. Examples: 1m25s, 1h3m2s, 1h3s. For more information, go to ramp time. |
| SEQUENCE | Sequence of chaos execution for multiple target pods. | Default: parallel. Supports serial and parallel. For more information, go to sequence of chaos execution. |
| TARGET_CONTAINER | Name of the container subject to Kafka latency injection. | Default: none. For more information, go to target specific container. |
| TARGET_PODS | Comma-separated list of application pod names subject to pod JVM Kafka latency. | If not provided, the fault selects target pods randomly based on provided appLabels. For more information, go to target specific pods. |
| NODE_LABEL | Restricts the target pods to those scheduled on nodes that match the specified label. | For more information, go to node label. |
| LIB_IMAGE | Image used to inject chaos. | Default: harness/chaos-ddcr-faults:1.72.0. For more information, go to image used by the helper pod. |
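As a rough illustration of how a few of the optional tunables might be combined, the fragment below assumes they are passed as entries in the same chaos env list that the Parameters section uses for the mandatory tunables. The container name order-service is hypothetical, and all values are illustrative rather than recommended settings.

```yaml
# Fragment of the experiment's env list (not a complete manifest).
# Limit injection to one container in each target pod (hypothetical name).
- name: TARGET_CONTAINER
  value: "order-service"
# Target half of the matching pods instead of a single replica.
- name: POD_AFFECTED_PERCENTAGE
  value: "50"
# Inject into the selected pods one after another rather than in parallel.
- name: SEQUENCE
  value: "serial"
# Wait 10 seconds before and after injecting chaos.
- name: RAMP_TIME
  value: "10s"
```

Running the pods in serial with a ramp time keeps part of the workload healthy at any moment, which can make it easier to attribute observed slowdowns to the injected latency.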
Parameters
The following YAML snippet illustrates the use of these tunables:
kind: KubernetesChaosExperiment
apiVersion: litmuschaos.io/v1alpha1
metadata:
  name: pod-jvm-kafka-latency
  namespace: hce
spec:
  tasks:
    - definition:
        chaos:
          env:
            # chaos duration in [numeric-hours]h[numeric-minutes]m[numeric-seconds]s format
            - name: TOTAL_CHAOS_DURATION
              value: "60s"
            # Kafka mode: producer or consumer
            - name: KAFKA_MODE
              value: "producer"
            # name of the Kafka topic to be targeted
            - name: KAFKA_TOPIC
              value: "orders"
            # latency in milliseconds
            - name: LATENCY
              value: "2000"
            # proportion of Kafka operations to target, in the (0.00, 1.00] range
            - name: TRANSACTION_PERCENTAGE
              value: "0.5"
            # provide the Byteman port
            - name: BYTEMAN_PORT
              value: "9091"
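
For comparison, a consumer-side variant might look like the sketch below. It reuses the manifest structure from the example above and only the tunables listed in the tables; the topic name and values are illustrative. TRANSACTION_PERCENTAGE is omitted on purpose, so, per the tunables table, all Kafka operations would be targeted.

```yaml
kind: KubernetesChaosExperiment
apiVersion: litmuschaos.io/v1alpha1
metadata:
  name: pod-jvm-kafka-latency
  namespace: hce
spec:
  tasks:
    - definition:
        chaos:
          env:
            # run the fault for 1 minute 30 seconds
            - name: TOTAL_CHAOS_DURATION
              value: "1m30s"
            # delay consumer-side Kafka operations
            - name: KAFKA_MODE
              value: "consumer"
            # illustrative topic name
            - name: KAFKA_TOPIC
              value: "orders"
            # 500 ms of injected latency per targeted operation
            - name: LATENCY
              value: "500"
```

A shorter latency applied to every operation, as sketched here, tends to exercise backpressure and consumer lag handling, whereas a longer latency on a subset of operations is closer to a test of timeouts and retries.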