Get started with HCE
Before you begin, review the following:
Harness CE is available in two ways:
- SaaS (Software-as-a-Service)
- SMP (Self-Managed Platform)
Feature availability on HCE SaaS and SMP is on par, with minor timeline differences in the SMP feature releases.
SaaS
The HCE module is provided as a service that you can use by either signing up or being invited to a specific project. HCE provides the assistance required to manage the cluster. You can also create a project if you have the necessary permissions. The control plane (the set of microservices that help the domain function) is hosted by Harness. For more information on how to use SaaS, go to SaaS documentation. For a video tutorial, go to Get started with HCE.
SMP
You create, manage, and maintain your own clusters. You are responsible for providing permissions to projects and handling the issues associated with them.
The control plane is hosted within your domain, for example, harness.your-domain.io.
For more information, contact Harness Support or go to the SMP documentation.
HCE and LitmusChaos
Common capabilities of HCE and LitmusChaos
The following features are common to Litmus and HCE:
- Scalable platform
- Declarative chaos fault architecture
- Kubernetes chaos faults
- Chaos faults as CRDs
- Chaos metrics
- Chaos hubs
- Chaos infrastructure architecture
- Chaos experiments structure
- Scheduling chaos experiments
- Resilience probes
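To make "declarative chaos fault architecture" and "chaos faults as CRDs" more concrete, the following minimal Python sketch builds a ChaosEngine-style manifest as a plain dictionary and prints it as JSON. The field names follow the open-source LitmusChaos `ChaosEngine` resource as commonly documented; manifests generated by HCE may differ, so verify against the fault reference before use. The function name `build_chaos_engine`, the application label, and the service account name are hypothetical values chosen for illustration.

```python
import json


def build_chaos_engine(name, namespace, app_label, fault,
                       service_account, duration_s, interval_s):
    """Return a ChaosEngine-style manifest as a plain dict.

    Assumption: field names mirror the open-source LitmusChaos
    ChaosEngine custom resource; HCE-generated manifests may differ.
    """
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "engineState": "active",
            # The target application is selected declaratively by label.
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            "chaosServiceAccount": service_account,
            "experiments": [
                {
                    "name": fault,
                    "spec": {
                        "components": {
                            # Tunables are passed as string-valued env entries.
                            "env": [
                                {"name": "TOTAL_CHAOS_DURATION",
                                 "value": str(duration_s)},
                                {"name": "CHAOS_INTERVAL",
                                 "value": str(interval_s)},
                            ]
                        }
                    },
                }
            ],
        },
    }


# Hypothetical example values; replace them with your own target details.
manifest = build_chaos_engine(
    name="nginx-chaos",
    namespace="default",
    app_label="app=nginx",
    fault="pod-delete",
    service_account="pod-delete-sa",
    duration_s=30,
    interval_s=10,
)
print(json.dumps(manifest, indent=2))
```

Because the fault is described as data rather than as imperative steps, the same manifest can be stored in a chaos hub, versioned, and scheduled without modification.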
Additional capabilities of HCE
The HCE module has the following additional capabilities:
- Kubelet density chaos fault
- VMware chaos faults
- AWS chaos faults
- GCP chaos faults
- Azure chaos faults
- Linux chaos faults
- Windows chaos faults
- Cloud Foundry chaos faults
- Load chaos faults
- SSH chaos faults
- Kubernetes chaos faults
- Bring Your Own Chaos
- Teaming around ChaosHubs
- Resilience probes (Dynatrace and Datadog)
- ChaosGuard
Integrations with other Harness modules
In addition to the above features, HCE integrates with these Harness modules:
- Continuous Delivery (CD): Go to Use HCE with CD.
- Feature Flags (FF): Go to Use HCE with FF.
- Service Reliability Management (SRM): Go to Use HCE with SRM.
- Execute experiments as pipelines: Go to Pipelines with Chaos experiments.
HCE free plans
Harness offers free hosted LitmusChaos for anyone getting started with LitmusChaos or chaos engineering in general. The HCE free plan is a free-forever plan that has features equivalent to LitmusChaos and also bundles Harness platform features such as RBAC and hosted logging, all for free.
Sign up at app.harness.io to get started with a free plan for either the hosted HCE or the hosted LitmusChaos.
The HCE free plan replaces cloud.chaosnative.com. New sign-ups are not being accepted at cloud.chaosnative.com. Users on that platform are advised to move to the HCE free plan before August 31, 2023. For free support on migration, contact Harness Support.
HCE versus LitmusChaos
This section describes the differences between Harness Chaos Engineering (HCE) and the open-source CNCF incubation project, LitmusChaos.
Chaos orchestration
Feature | Litmus | HCE (SaaS) |
Centralized chaos portal | ✅ | ✅ |
Support for resilience probes | ✅ | ✅ |
Chaos hubs | Public hub | Enterprise hub |
Chaos metrics to Prometheus | ✅ | ✅ |
Launch chaos experiments directly from the ChaosHub | ✅ | ✅ |
YAML-based support for chaos experiments | ✅ | ✅ |
Run chaos faults in parallel in a chaos experiment | ✅ | ✅ |
Experiment control parameters through the tag in the UI | ❌ | ✅ |
Event driven chaos injection | ✅ | ✅ (via Harness webhooks) |
Ready-to-use chaos experiment templates | ❌ | ✅ (via Harness webhooks) |
Halt an ongoing chaos experiment using the halt button | ❌ | ✅ |
BYOC (Bring Your Own Chaos) | ✅ | ✅ |
Tagging support in the UI for selecting chaos targets | ❌ | ✅ |
Chaos experiment for targeting across Kubernetes clusters | ❌ | ✅ |
Deployment modes and agents
Feature | Litmus | HCE |
SaaS | ❌ | ✅ |
On-Prem (Self-managed platform) | ✅ | ✅ |
Kubernetes native chaos agent | ❌ | ✅ |
Linux native chaos agent | ❌ | ✅ |
Windows native chaos agent | ❌ | ✅ |
Scope-based isolation for Kubernetes (Cluster and namespace modes) | ✅ | ✅ |
Chaos management - Advanced
Feature | Litmus | HCE (SaaS) |
UI support for chaos experiments CRUD | ✅ | ✅ |
Chaos experiments for multiple clusters | ❌ | ✅ (GameDays and pipelines) |
Run chaos experiments in parallel | ❌ | ✅ (GameDays and pipelines) |
Out-of-the-box chaos experiments | ❌ | ✅ |
Ready-to-use chaos experiment templates | ❌ | ✅ |
Export chaos experiments to ChaosHubs | ❌ | ✅ |
Schedule chaos scenarios directly from a chaos hub | ❌ | ✅ |
Chaos GameDay portal | ❌ | ✅ |
ChaosGuard | ❌ | ✅ |
Administration
Feature | Litmus | HCE (SaaS) |
REST/GraphQL APIs | ✅ | ✅ |
Built-in user management and authentication | Basic | ✅ |
Single Sign-On (SSO) with OAuth 2.0 | ❌ | ✅ |
Single Sign-On (SSO) with SAML | ❌ | ✅ |
Provision users with Okta (SCIM) | ❌ | ✅ |
Provision Azure AD Users and Groups (SCIM) | ❌ | ✅ |
Provision users and groups with OneLogin (SCIM) | ❌ | ✅ |
Multiple projects | ❌ | ✅ |
Multiple organizations | ❌ | ✅ |
Authentication and authorization
Feature | Litmus | HCE (SaaS) |
Username-based authentication | ✅ | ✅ |
LDAP provider | ❌ | ✅ |
SAML provider | ❌ | ✅ |
Public OAuth providers | ❌ | ✅ |
RBAC (Role-based access control) | ✅ | ✅ |
Chaos discovery, auto-creation, and recommendations
Feature | Litmus | HCE (SaaS) |
Auto discover the target services with relationship on Kubernetes | ❌ | ✅ |
Auto create the possible chaos experiments | ❌ | ✅ |
Recommend chaos experiments to run - Manual | ❌ | ✅ |
Recommend chaos experiments to run - Based on traffic | ❌ | ✅ |
Chaos governance
Feature | Litmus | HCE (SaaS) |
RBACs around ChaosHub | ✅ | ✅ |
RBACs around Chaos Infrastructure | ✅ | ✅ |
RBACs around Chaos Experiments CRUD | ✅ | ✅ |
RBACs around Chaos GameDays | ❌ | ✅ |
RBACs for running chaos experiments against specific targets | ❌ | ✅ |
RBACs for running chaos experiments with specific faults | ❌ | ✅ |
RBACs for running chaos experiments by specific users | ❌ | ✅ |
RBACs for running chaos experiments in a particular time window | ❌ | ✅ |
RBACs for running chaos experiments with a specific ServiceAccount | ❌ | ✅ |
Security
Feature | Litmus | HCE (SaaS) |
Two-factor authentication | ❌ | ✅ |
Support for Kubernetes local secrets | ✅ | ✅ |
Support for external secrets managers | ✅ | ✅ |
RBAC (Role-based access control) - Built-in roles | Basic | ✅ |
RBAC (Role-based access control) - Custom roles | ❌ | ✅ |
Audit trail (2 year data retention) | ❌ | ✅ |
Integrated secrets management with Harness Secrets Manager | ❌ | ✅ |
IP Address whitelist management | ❌ | ✅ |
Integrations
Feature | Litmus | HCE (SaaS) |
Integration with Harness Continuous Deployment Pipelines | ❌ | ✅ |
Integration with Jenkins Pipelines | ❌ | ✅ |
Integration with GitLab Pipelines | ❌ | ✅ |
Integration with Harness Continuous Verification | ❌ | ✅ |
Integration with Harness Feature Flags | ❌ | ✅ |
Integration with Service Reliability Management | ❌ | ✅ |
Integration with Native Resilience Probe for Harness SRM | ❌ | ✅ |
Create custom faults through SDK | ❌ | ✅ |
Install, create and orchestrate chaos through API | ❌ | ✅ |
Postman Provider for chaos orchestration | ❌ | ✅ |
Terraform Provider to install, create and orchestrate chaos | ❌ | ✅ |
APM integrations
Feature | Litmus | HCE (SaaS) |
Native resilience probe for Prometheus | ❌ | ✅ |
Native resilience probe for Datadog | ❌ | ✅ |
Native resilience probe for Dynatrace | ❌ | ✅ |
Kubernetes pod-level chaos faults
Go to Pod faults for more information.
Feature | Litmus | HCE (SaaS) |
Container kill | ✅ | ✅ |
Disk fill | ✅ | ✅ |
fs fill | ❌ | ✅ |
Pod API block | ❌ | ✅ |
Pod API latency | ❌ | ✅ |
Pod API modify body | ❌ | ✅ |
Pod API modify header | ❌ | ✅ |
Pod API status code | ❌ | ✅ |
Pod autoscaler | ✅ | ✅ |
Pod CPU hog exec | ✅ | ✅ |
Pod CPU hog | ✅ | ✅ |
Pod delete | ✅ | ✅ |
Pod DNS error | ✅ | ✅ |
Pod DNS spoof | ✅ | ✅ |
Pod HTTP reset peer | ✅ | ✅ |
Pod HTTP status code | ✅ | ✅ |
Pod I/O attribute override | ❌ | ✅ |
Pod HTTP modify body | ✅ | ✅ |
Pod HTTP modify header | ✅ | ✅ |
Pod HTTP latency | ✅ | ✅ |
Pod I/O error | ❌ | ✅ |
Pod I/O latency | ❌ | ✅ |
Pod I/O stress | ✅ | ✅ |
Pod I/O mistake | ❌ | ✅ |
Pod memory hog exec | ✅ | ✅ |
Pod memory hog | ✅ | ✅ |
Pod network corruption | ✅ | ✅ |
Pod network duplication | ✅ | ✅ |
Pod network latency | ✅ | ✅ |
Pod network loss | ✅ | ✅ |
Pod network partition | ✅ | ✅ |
Pod network rate limit | ❌ | ✅ |
Time chaos | ❌ | ✅ |
Kubernetes node-level chaos faults
Go to Node faults for more information.
Feature | Litmus | HCE (SaaS) |
Kubelet service kill | ✅ | ✅ |
Node drain | ✅ | ✅ |
Node I/O stress | ✅ | ✅ |
Node CPU hog | ✅ | ✅ |
Node memory hog | ✅ | ✅ |
Node restart | ✅ | ✅ |
Node taint | ✅ | ✅ |
Node network latency | ❌ | ✅ |
Node network loss | ❌ | ✅ |
Kubernetes stress - Kubelet density | ❌ | ✅ |
Kubernetes advanced faults
Feature | Litmus | HCE (SaaS) |
HTTP API faults with URL filters | ❌ | ✅ |
Filesystem IO chaos | ❌ | ✅ |
AWS chaos faults
Go to Chaos faults for AWS for more information.
Feature | Litmus | HCE (SaaS) |
ALB AZ down | ❌ | ✅ |
CLB AZ down | ❌ | ✅ |
NLB AZ down | ❌ | ✅ |
EBS loss by ID | ✅ | ✅ |
EBS loss by tag | ✅ | ✅ |
EC2 DNS chaos | ❌ | ✅ |
EC2 instance stop by ID | ✅ | ✅ |
EC2 instance stop by tag | ✅ | ✅ |
AWS SSM chaos by ID | ✅ | ✅ |
AWS SSM chaos by tag | ✅ | ✅ |
EC2 network loss | ❌ | ✅ |
EC2 process kill | ❌ | ✅ |
EC2 stop by ID | ❌ | ✅ |
EC2 stop by tag | ❌ | ✅ |
EC2 network latency (Jitter/Abort) | ❌ | ✅ |
EC2 CPU hog | ❌ | ✅ |
EC2 memory hog | ❌ | ✅ |
EC2 I/O stress | ❌ | ✅ |
EC2 HTTP latency | ❌ | ✅ |
EC2 HTTP modify body | ❌ | ✅ |
EC2 HTTP modify header | ❌ | ✅ |
EC2 HTTP reset peer | ❌ | ✅ |
EC2 HTTP status code | ❌ | ✅ |
RDS instance delete | ❌ | ✅ |
RDS instance reboot | ❌ | ✅ |
ECS instance kill | ❌ | ✅ |
ECS instance stop | ❌ | ✅ |
ECS task stop | ❌ | ✅ |
ECS task scale | ❌ | ✅ |
ECS invalid container image | ❌ | ✅ |
ECS network restrict | ❌ | ✅ |
ECS container network latency | ❌ | ✅ |
ECS container network loss | ❌ | ✅ |
ECS container volume detach | ❌ | ✅ |
ECS agent stop | ❌ | ✅ |
ECS container CPU hog | ❌ | ✅ |
ECS container HTTP latency | ❌ | ✅ |
ECS container HTTP modify body | ❌ | ✅ |
ECS container HTTP reset peer | ❌ | ✅ |
ECS container HTTP status code | ❌ | ✅ |
ECS container memory hog | ❌ | ✅ |
ECS container I/O stress | ❌ | ✅ |
ECS Fargate CPU hog | ❌ | ✅ |
ECS Fargate memory hog | ❌ | ✅ |
ECS update container resource limit | ❌ | ✅ |
ECS update container timeout | ❌ | ✅ |
ECS update task role | ❌ | ✅ |
Windows EC2 blackhole chaos | ❌ | ✅ |
Windows EC2 CPU hog | ❌ | ✅ |
Windows EC2 memory hog | ❌ | ✅ |
AWS serverless chaos faults
Go to Chaos faults for AWS for more information.
Feature | Litmus | HCE (SaaS) |
Lambda delete function concurrency | ❌ | ✅ |
Lambda toggle event mapping state | ❌ | ✅ |
Lambda delete event source mapping | ❌ | ✅ |
Lambda update function memory | ❌ | ✅ |
Lambda update function timeout | ❌ | ✅ |
Lambda update role permission | ❌ | ✅ |
Resource access restrict | ❌ | ✅ |
DynamoDB replication pause | ❌ | ✅ |
Generic FIS experiment template | ❌ | ✅ |
GCP chaos faults
Go to Chaos faults for GCP for more information.
Feature | Litmus | HCE (SaaS) |
GCP disk loss | ✅ | ✅ |
GCP disk loss by label | ❌ | ✅ |
GCP VM instance stop | ✅ | ✅ |
GCP VM instance stop by label | ❌ | ✅ |
GCP VM service kill | ❌ | ✅ |
Azure chaos faults
Go to Chaos faults for Azure for more information.
Feature | Litmus | HCE (SaaS) |
Azure instance stop | ✅ | ✅ |
Azure disk loss | ✅ | ✅ |
Azure instance CPU hog | ❌ | ✅ |
Azure instance memory hog | ❌ | ✅ |
Azure instance I/O stress | ❌ | ✅ |
Azure web app stop | ❌ | ✅ |
Web app access restriction | ❌ | ✅ |
VMware chaos faults
Go to Chaos faults for VMware for more information.
Feature | Litmus | HCE (SaaS) |
VMware VM power off | ✅ | ✅ |
VMware CPU hog | ❌ | ✅ |
VMware memory hog | ❌ | ✅ |
VMware I/O stress | ❌ | ✅ |
VMware DNS chaos | ❌ | ✅ |
VMware host reboot | ❌ | ✅ |
VMware HTTP latency | ❌ | ✅ |
VMware HTTP reset peer | ❌ | ✅ |
VMware HTTP modify response | ❌ | ✅ |
VMware network loss | ❌ | ✅ |
VMware network rate limit | ❌ | ✅ |
VMware network latency | ❌ | ✅ |
VMware process kill | ❌ | ✅ |
VMware service stop | ❌ | ✅ |
VMware Windows CPU hog | ❌ | ✅ |
VMware Windows memory hog | ❌ | ✅ |
VMware disk loss | ❌ | ✅ |
VMware Windows blackhole chaos | ❌ | ✅ |
VMware Windows disk stress | ❌ | ✅ |
VMware Windows network corruption | ❌ | ✅ |
VMware Windows network duplication | ❌ | ✅ |
VMware Windows network latency | ❌ | ✅ |
VMware Windows network loss | ❌ | ✅ |
VMware Windows process kill | ❌ | ✅ |
VMware Windows service stop | ❌ | ✅ |
VMware Windows time chaos | ❌ | ✅ |
ALFI for Spring Boot
Feature | Litmus | HCE (SaaS) |
Latency | ✅ | ✅ |
Multiple faults injection | ✅ | ✅ |
Exceptions | ✅ | ✅ |
Memory stress | ✅ | ✅ |
CPU stress | ✅ | ✅ |
App kill | ✅ | ✅ |
Load chaos faults
Go to Chaos faults for load for more information.
Feature | Litmus | HCE (SaaS) |
K6 loadgen | ❌ | ✅ |
Locust loadgen | ❌ | ✅ |
SSH chaos faults
Go to Chaos faults for SSH for more information.
Feature | Litmus | HCE (SaaS) |
SSH chaos | ❌ | ✅ |
Linux chaos faults
Go to Chaos faults for Linux for more information.
Feature | Litmus | HCE (SaaS) |
Linux API block | ❌ | ✅ |
Linux API latency | ❌ | ✅ |
Linux API modify body | ❌ | ✅ |
Linux API modify header | ❌ | ✅ |
Linux API status code | ❌ | ✅ |
Linux JVM CPU stress | ❌ | ✅ |
Linux JVM memory stress | ❌ | ✅ |
Linux JVM method exception | ❌ | ✅ |
Linux JVM method latency | ❌ | ✅ |
Linux JVM modify return | ❌ | ✅ |
Linux JVM trigger GC | ❌ | ✅ |
Linux CPU stress | ❌ | ✅ |
Linux disk fill | ❌ | ✅ |
Linux disk I/O stress | ❌ | ✅ |
Linux DNS error | ❌ | ✅ |
Linux DNS spoof | ❌ | ✅ |
Linux memory stress | ❌ | ✅ |
Linux network corruption | ❌ | ✅ |
Linux network duplication | ❌ | ✅ |
Linux network latency | ❌ | ✅ |
Linux network loss | ❌ | ✅ |
Linux network rate limit | ❌ | ✅ |
Linux process kill | ❌ | ✅ |
Linux service restart | ❌ | ✅ |
Linux time chaos | ❌ | ✅ |
Linux fs fill | ❌ | ✅ |
Redis cache expire | ❌ | ✅ |
Redis cache limit | ❌ | ✅ |
Redis cache penetration | ❌ | ✅ |
Redis Sentinel stop | ❌ | ✅ |
Windows chaos faults
Feature | Litmus | HCE (SaaS) |
Windows CPU stress | ❌ | ✅ |
Windows memory stress | ❌ | ✅ |
Windows network blackhole chaos | ❌ | ✅ |
Cloud Foundry chaos faults
Go to Chaos faults for Cloud Foundry for more information.
Feature | Litmus | HCE (SaaS) |
CF app container kill | ❌ | ✅ |
CF app JVM CPU stress | ❌ | ✅ |
CF app JVM memory stress | ❌ | ✅ |
CF app JVM method exception | ❌ | ✅ |
CF app JVM method latency | ❌ | ✅ |
CF app JVM modify return | ❌ | ✅ |
CF app JVM trigger GC | ❌ | ✅ |
CF app network corruption | ❌ | ✅ |
CF app network duplication | ❌ | ✅ |
CF app network latency | ❌ | ✅ |
CF app network loss | ❌ | ✅ |
CF app route unmap | ❌ | ✅ |
CF app stop | ❌ | ✅ |
Onboarding
If you want hands-on experience executing chaos experiments without explicitly fulfilling the prerequisites, automated or guided onboarding is for you. Go to Introduction to Onboarding to learn more.