Get started with HCE

Before you begin, review the following:

Harness Chaos Engineering (HCE) is available in two ways:

  1. SaaS (Software-as-a-Service)
  2. SMP (Self-Managed Platform)
Tip: Feature availability on HCE SaaS and SMP is on par, although SMP feature releases may follow a slightly different timeline.

SaaS

The HCE module is provided as a service that you can use by either signing up or being invited to a specific project. HCE provides the assistance required to manage the cluster. You can also create a project if you have the necessary permissions. The control plane (the set of microservices that help the domain function) is hosted by Harness. For more information on how to use SaaS, go to the SaaS documentation. For a video tutorial, go to Get started with HCE.

SMP

You need to create, manage, and maintain your clusters. You are responsible for granting permissions to projects and handling the issues associated with them. The control plane is hosted within your domain, for example, harness.your-domain.io. For more information, contact Harness Support or go to the SMP documentation.

HCE and LitmusChaos

Common capabilities of HCE and LitmusChaos

The following features are common to Litmus and HCE:

  1. Scalable platform
  2. Declarative chaos fault architecture
  3. Kubernetes chaos faults
  4. Chaos faults as CRDs
  5. Chaos metrics
  6. Chaos hubs
  7. Chaos infrastructure architecture
  8. Chaos experiments structure
  9. Scheduling chaos experiments
  10. Resilience probes
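
To make "declarative chaos fault architecture" and "chaos faults as CRDs" more concrete, the following minimal sketch builds a ChaosEngine custom resource for a pod-delete fault and prints it as JSON. The field names follow the open-source LitmusChaos ChaosEngine CRD as commonly documented; the resource name, namespace, label, and service account are hypothetical placeholders, and the HCE-managed equivalent may differ in detail.

```python
import json


def build_chaos_engine(name, namespace, app_label, service_account, duration_s):
    """Return a ChaosEngine manifest (as a dict) that declares a pod-delete fault."""
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "engineState": "active",
            # Which application the fault targets.
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            "chaosServiceAccount": service_account,
            # The fault itself is only referenced by name; tunables are passed as env vars.
            "experiments": [
                {
                    "name": "pod-delete",
                    "spec": {
                        "components": {
                            "env": [
                                {"name": "TOTAL_CHAOS_DURATION", "value": str(duration_s)},
                                {"name": "CHAOS_INTERVAL", "value": "10"},
                            ]
                        }
                    },
                }
            ],
        },
    }


# All argument values below are hypothetical placeholders.
manifest = build_chaos_engine(
    name="nginx-chaos",
    namespace="default",
    app_label="app=nginx",
    service_account="pod-delete-sa",
    duration_s=30,
)
print(json.dumps(manifest, indent=2))
```

Because the fault is described as data rather than as a script, the same manifest can be stored in a chaos hub, reviewed, and scheduled like any other Kubernetes resource.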

Additional capabilities of HCE

The HCE module has the following additional capabilities:

  1. Kubelet density chaos fault
  2. VMware chaos faults
  3. AWS chaos faults
  4. GCP chaos faults
  5. Azure chaos faults
  6. Linux chaos faults
  7. Windows chaos faults
  8. Cloud Foundry chaos faults
  9. Load chaos faults
  10. SSH chaos faults
  11. Kubernetes chaos faults
  12. Bring Your Own Chaos
  13. Teaming around ChaosHubs
  14. Resilience probes (Dynatrace and Datadog)
  15. ChaosGuard

Integrations with other Harness modules

In addition to the above features, HCE integrates with these Harness modules:

HCE free plans

Harness offers a free hosted LitmusChaos if you are trying to get started with LitmusChaos or Chaos Engineering in general. The HCE free plan is a free-forever plan that has features equivalent to LitmusChaos and also bundles Harness platform features such as RBAC and hosted logging, all at no cost.

Sign up at app.harness.io to get started with a free plan for either the hosted HCE or the hosted LitmusChaos.

Important: The HCE free plan replaces cloud.chaosnative.com. New sign-ups are not being accepted at cloud.chaosnative.com. Users on that platform are advised to move to the HCE free plan before August 31, 2023. For free support on migration, contact Harness Support.

HCE versus LitmusChaos

This section describes the differences between Harness Chaos Engineering (HCE) and the open-source CNCF incubation project, LitmusChaos.

Chaos orchestration

Feature | Litmus | HCE (SaaS)
Centralized chaos portal
Support for resilience probes
Chaos hubs | Public hub | Enterprise hub
Chaos metrics to Prometheus
Launch chaos experiments directly from the ChaosHub
YAML-based support for chaos experiments
Run chaos faults in parallel in a chaos experiment
Experiment control parameters through the tag in the UI
Event driven chaos injection ✅ (via Harness webhooks)
Ready-to-use chaos experiment templates ✅ (via Harness webhooks)
Halt an ongoing chaos experiment using the halt button
BYOC (Bring Your Own Chaos)
Tagging support in the UI for selecting chaos targets
Chaos experiment for targeting across Kubernetes clusters

Deployment modes and agents

Feature | Litmus | HCE
SaaS
On-Prem (Self-managed platform)
Kubernetes native chaos agent
Linux native chaos agent
Windows native chaos agent
Scope-based isolation for Kubernetes (Cluster and namespace modes)

Chaos management - Advanced

Feature | Litmus | HCE (SaaS)
UI support for chaos experiments CRUD
Chaos experiments for multiple clusters ✅ (GameDays and pipelines)
Run chaos experiments in parallel ✅ (GameDays and pipelines)
Out-of-the-box chaos experiments
Ready to use chaos experiment templates
Export chaos experiments to ChaosHubs
Schedule chaos scenarios directly from a chaos hub
Chaos GameDay portal
ChaosGuard

Administration

Feature | Litmus | HCE (SaaS)
REST/GraphQL APIs
Built-in user management and authentication: Basic
Single Sign-On (SSO) with OAuth 2.0
Single Sign-On (SSO) with SAML
Provision users with Okta (SCIM)
Provision Azure AD Users and Groups (SCIM)
Provision users and groups with OneLogin (SCIM)
Multiple projects
Multiple organisations

Authentication and authorization

Feature | Litmus | HCE (SaaS)
Username-based authentication
LDAP provider
SAML provider
Public OAuth providers
RBAC (Role-based access control)

Chaos discovery, auto-creation, and recommendations

Feature | Litmus | HCE (SaaS)
Auto discover the target services with relationship on Kubernetes
Auto create the possible chaos experiments
Recommend chaos experiments to run - Manual
Recommend chaos experiments to run - Based on traffic

Chaos governance

Feature | Litmus | HCE (SaaS)
RBACs around ChaosHub
RBACs around Chaos Infrastructure
RBACs around Chaos Experiments CRUD
RBACs around Chaos GameDays
RBACs for running chaos experiments against specific targets
RBACs for running chaos experiments with specific faults
RBACs for running chaos experiments by specific users
RBACs for running chaos experiments in a particular time window
RBACs for running chaos experiments with a specific ServiceAccount
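
The governance conditions listed above (who runs an experiment, which fault it uses, and when it runs) can be pictured as a single allow/deny check. The sketch below is purely illustrative: the rule dictionary, its keys, and the function names are hypothetical and do not represent the ChaosGuard rule format or any Harness API.

```python
from datetime import datetime, time


def in_allowed_window(now, start, end):
    """Return True if now's time of day lies in [start, end), including windows that cross midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    # Window wraps past midnight, for example 22:00 to 02:00.
    return t >= start or t < end


def may_run(user, fault, now, rule):
    """Allow the run only if the user, the fault, and the time window all match the rule."""
    return (
        user in rule["allowed_users"]
        and fault in rule["allowed_faults"]
        and in_allowed_window(now, rule["window_start"], rule["window_end"])
    )


# Hypothetical rule: two users may run two faults during a nightly window.
rule = {
    "allowed_users": {"alice", "carol"},
    "allowed_faults": {"pod-delete", "pod-network-latency"},
    "window_start": time(22, 0),
    "window_end": time(2, 0),
}

print(may_run("alice", "pod-delete", datetime(2024, 1, 1, 23, 30), rule))
print(may_run("alice", "pod-delete", datetime(2024, 1, 1, 12, 0), rule))
```

The first call falls inside the nightly window and the second does not, so only the first request would be allowed under this hypothetical rule.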

Security

Feature | Litmus | HCE (SaaS)
Two-factor authentication
Support for Kubernetes local secrets
Support for external secrets managers
RBAC (Role-Based Access Control) - Built-in roles: Basic
RBAC (Role-Based Access Control) - Custom roles
Audit trail (2 year data retention)
Integrated secrets management with Harness Secrets Manager
IP Address whitelist management

Integrations

Feature | Litmus | HCE (SaaS)
Integration with Harness Continuous Deployment Pipelines
Integration with Jenkins Pipelines
Integration with GitLab Pipelines
Integration with Harness Continuous Verification
Integration with Harness Feature Flags
Integration with Service Reliability Management
Integration with Native Resilience Probe for Harness SRM
Create custom faults through SDK
Install, create and orchestrate chaos through API
Postman Provider for chaos orchestration
Terraform Provider to install, create and orchestrate chaos

APM integrations

Feature | Litmus | HCE (SaaS)
Native resilience probe for Prometheus
Native resilience probe for Datadog
Native resilience probe for Dynatrace

Kubernetes pod-level chaos faults

Go to Pod faults for more information.

Feature | Litmus | HCE (SaaS)
Container kill
Disk fill
fs fill
Pod API block
Pod API latency
Pod API modify body
Pod API modify header
Pod API status code
Pod autoscaler
Pod CPU hog exec
Pod CPU hog
Pod delete
Pod DNS error
Pod DNS spoof
Pod HTTP reset peer
Pod HTTP status code
Pod I/O attribute override
Pod HTTP modify body
Pod HTTP modify header
Pod HTTP latency
Pod I/O error
Pod I/O latency
Pod I/O stress
Pod I/O mistake
Pod memory hog exec
Pod memory hog
Pod network corruption
Pod network duplication
Pod network latency
Pod network loss
Pod network partition
Pod network rate limit
Time chaos

Kubernetes node-level chaos faults

Go to Node faults for more information.

Feature | Litmus | HCE (SaaS)
Kubelet service kill
Node drain
Node I/O stress
Node CPU hog
Node memory hog
Node restart
Node taint
Node network latency
Node network loss
Kubernetes stress - Kubelet density

Kubernetes advanced faults

Feature | Litmus | HCE (SaaS)
HTTP API faults with URL filters
Filesystem IO chaos

AWS chaos faults

Go to Chaos faults for AWS for more information.

Feature | Litmus | HCE (SaaS)
ALB AZ down
CLB AZ down
NLB AZ down
EBS loss by ID
EBS loss by tag
EC2 DNS chaos
EC2 instance stop by ID
EC2 instance stop by tag
AWS SSM chaos by ID
AWS SSM chaos by tag
EC2 network loss
EC2 process kill
EC2 stop by ID
EC2 stop by tag
EC2 network latency (Jitter/Abort)
EC2 CPU hog
EC2 memory hog
EC2 I/O stress
EC2 HTTP latency
EC2 HTTP modify body
EC2 HTTP modify header
EC2 HTTP reset peer
EC2 HTTP status code
RDS instance delete
RDS instance reboot
ECS instance kill
ECS instance stop
ECS task stop
ECS task scale
ECS invalid container image
ECS network restrict
ECS container network latency
ECS container network loss
ECS container volume detach
ECS agent stop
ECS container CPU hog
ECS container HTTP latency
ECS container HTTP modify body
ECS container HTTP reset peer
ECS container HTTP status code
ECS container memory hog
ECS container I/O stress
ECS Fargate CPU hog
ECS Fargate memory hog
ECS update container resource limit
ECS update container timeout
ECS update task role
Windows EC2 blackhole chaos
Windows EC2 CPU hog
Windows EC2 memory hog

AWS serverless chaos faults

Go to Chaos faults for AWS for more information.

Feature | Litmus | HCE (SaaS)
Lambda delete function concurrency
Lambda toggle event mapping state
Lambda delete event source mapping
Lambda update function memory
Lambda update function timeout
Lambda update role permission
Resource access restrict
DynamoDB replication pause
Generic FIS experiment template

GCP chaos faults

Go to Chaos faults for GCP for more information.

Feature | Litmus | HCE (SaaS)
GCP disk loss
GCP disk loss by label
GCP VM instance stop
GCP VM instance stop by label
GCP VM service kill

Azure chaos faults

Go to Chaos faults for Azure for more information.

Feature | Litmus | HCE (SaaS)
Azure instance stop
Azure disk loss
Azure instance CPU hog
Azure instance memory hog
Azure instance I/O stress
Azure web app stop
Web app access restriction

VMware chaos faults

Go to Chaos faults for VMware for more information.

Feature | Litmus | HCE (SaaS)
VMware VM power off
VMware CPU hog
VMware memory hog
VMware I/O stress
VMware DNS chaos
VMware host reboot
VMware HTTP latency
VMware HTTP reset peer
VMware HTTP modify response
VMware network loss
VMware network rate limit
VMware network latency
VMware process kill
VMware service stop
VMware Windows CPU hog
VMware Windows memory hog
VMware disk loss
VMware Windows blackhole chaos
VMware Windows disk stress
VMware Windows network corruption
VMware Windows network duplication
VMware Windows network latency
VMware Windows network loss
VMware Windows process kill
VMware Windows service stop
VMware Windows time chaos

ALFI for Springboot

Feature | Litmus | HCE (SaaS)
Latency
Multiple faults injection
Exceptions
Memory stress
CPU stress
App kill

Load chaos faults

Go to Chaos faults for load for more information.

Feature | Litmus | HCE (SaaS)
K6 loadgen
Locust loadgen

SSH chaos faults

Go to Chaos faults for SSH for more information.

Feature | Litmus | HCE (SaaS)
SSH chaos

Linux chaos faults

Go to Chaos faults for Linux for more information.

Feature | Litmus | HCE (SaaS)
Linux API block
Linux API latency
Linux API modify body
Linux API modify header
Linux API status code
Linux JVM CPU stress
Linux JVM memory stress
Linux JVM method exception
Linux JVM method latency
Linux JVM modify return
Linux JVM trigger GC
Linux CPU stress
Linux disk fill
Linux disk I/O stress
Linux DNS error
Linux DNS spoof
Linux memory stress
Linux network corruption
Linux network duplication
Linux network latency
Linux network loss
Linux network rate limit
Linux process kill
Linux service restart
Linux time chaos
Linux fs fill
Redis cache expire
Redis cache limit
Redis cache penetration
Redis Sentinel stop

Windows chaos faults

Feature | Litmus | HCE (SaaS)
Windows CPU stress
Windows memory stress
Windows network blackhole chaos

Cloud Foundry chaos faults

Go to Chaos faults for Cloud Foundry for more information.

Feature | Litmus | HCE (SaaS)
CF app container kill
CF app JVM CPU stress
CF app JVM memory stress
CF app JVM method exception
CF app JVM method latency
CF app JVM modify return
CF app JVM trigger GC
CF app network corruption
CF app network duplication
CF app network latency
CF app network loss
CF app route unmap
CF app stop

Onboarding

If you want hands-on experience running chaos experiments without explicitly fulfilling the prerequisites, automated or guided onboarding is for you. Go to Introduction to Onboarding to learn more.