Skip to main content

General Spinnaker Optimization Suggestions

Introduction

The following is a list of optimizations our Armory team has previous suggested as a baseline to a Spinnaker environment.   These are general settings suggested for the majority of our customers. Feel free to adjust these settings to fit the needs of the environment.  

Prerequisites

Administration access to the environment

Instructions

Plugins

Many Armory customers install the following plugins: Armory Observability Plugin, AWS Lambda Plugin, and Policy Engine Plugin as needed.

In Operator

spec.spinnakerConfig.config

In the spec.spinnakerConfig.config section the following is suggested.  There are some notes in this section about some configurations that should be avoided, and what should be done instead.  As a note, the Observability Plugin is suggested in the configuration below.  Please look at https://github.com/armory-plugins/armory-observability-plugin for more information spec: spinnakerConfig:

spec.spinnakerConfig.config - This section contains the contents of a deployment found in a halconfig .deploymentConfigurations[0]

config: stats: enabled: false

This prevents recreating echo on every single deploy of spinnaker

instanceId: disable-echo-always-recreating telemetry: enabled: true endpoint: https://stats.spinnaker.io connectionTimeoutMillis: 3000 readTimeoutMillis: 5000 deploymentEnvironment: size: SMALL type: Distributed updateVersions: true vault: enabled: false livenessProbeConfig: enabled: true

Default to 10 minutes before starting liveness probes. Larger installs may need longer

initialDelaySeconds: 600 location: spinnaker

This is current default we'd PREFER deployed everyplace... can be overridden per cust

#version: 2.26.0

DO NOT turn on the monitoring daemon - it does NASTY things to spinnaker's system, and it uses a lot of resources. USE the observability plugin instead! (Separate files)

metricStores: prometheus: enabled: false enabled: false armory:

send diagnostics to Armory to reduce troubleshooting time

diagnostics: enabled: true logging: enabled: true endpoint: https://debug.armory.io/v1/logs  

spec.spinnakerConfig.profiles

Below are some settings suggested for the spec.spinnakerConfig.profiles section, and notes about the settings spec: spinnakerConfig: profiles:

This is always first otherwise they're in alphabetical order

spinnaker: ok-http-client:

connects should never really be increased beyond this

connectTimeoutMs: 15000

Start at 60 seconds for downloads, but may need to be bigger if using large files/artifacts.

readTimeoutMs: 60000 maxRequests: 256 maxRequestsPerHost: 256 clouddriver: serviceLimits: cloudProviderOverrides: aws: rateLimit: 10.0 implementationLimits:

Examples - default rate limit is 10, reducing these can reduce AWS API calls preventing throttling but it's FAR better to get a support ticket in... and increase the API limits.

AmazonAutoScaling: defaults: rateLimit: 3.0 AmazonElasticLoadBalancing: defaults: rateLimit: 5.0

NOTE we do NOT set redis NOR sql enabled/true/false here. What we DO set are specific settings underneath. This is HIGHLY tied

to the environment deployed to, but defaulting to sql & redis, these should be good common settings. WHEN only redis

is used, should override this as appropriate

sql: cache: enabled: true

These parameters were determined to be optimal via benchmark comparisons

in the Netflix production environment with Aurora. Setting these too low

or high may negatively impact performance. These values may be sub-optimal

in some environments.

readBatchSize: 500 writeBatchSize: 300 unknown-agent-cleanup-agent: enabled: true taskRepository: enabled: true

NOT setting this but for reference, we recommend disabling SQL locks and using redis

#scheduler: #enabled: false redis:

Still COULD redis for the scheduler locks - it changes some things if SQL locks are problematic for some reason.

#scheduler: #enabled: true deck:

These are here for reference BUT should be replaced by end users

settings-local.js: | window.spinnakerSettings.feature.kustomizeEnabled = true; window.spinnakerSettings.feature.artifactsRewrite = true; window.spinnakerSettings.feature.functions = true; window.spinnakerSettings.kubernetesAdHocInfraWritesEnabled = true;

DO NOT go without this... authz/authn enabled turn this on automatically as well.

window.spinnakerSettings.authEnabled = true;

The below SHOULD be overridden per customer. Leaving it here as a default though for reference

##window.spinnakerSettings.feature.terraform = true; echo: resilience4j.circuitbreaker: instances: telemetry:

This needs to stay false, because if the telemetry endpoint goes down, Echo goes unhealthy (no good!)

registerHealthIndicator: false gate: # is the contents of ~/.hal/default/profiles/gate.yml

This enables the X509 Client authentication system

default: apiPort: 8443

Here for reference - we recommend this as allows path routing configs vs separate domains.

servlet:

context-path: /api/v1

server: tomcat:

In 2.2.4+spring boot, needs to include KB for sizing due to large group payload potentials or similar issues

max-http-header-size: 256KB protocolHeader: X-Forwarded-Proto remoteIpHeader: X-Forwarded-For internalProxies: .* httpsServerPort: X-Forwarded-Port redis: configuration: secure: true proxies:

  • id: terraform uri: http://spin-terraformer:7088 methods:
    • GET igor: spinnaker: pollingSafeguard:

    The defaults for polling is 1000 - this 10x that to allow for lots of new images.

    itemUpperThreshold: 10000 orca: executionRepository: sql: enabled: true redis: enabled: false monitor: activeExecutions: redis: false sql:

Some sane defaults for connection sql pools for all customers These are "decent" starting settings.

connectionPools: connectionTimeout: 5000 maxLifetime: 30000 maxPoolSize: 100  

spec.spinnakerConfig.service-settings

Below are some settings suggested for the spec.spinnakerConfig.service-settings section, and notes about the settings.

Note: We GENERALLY recommend NOT using the embedded Redis. POC's can override this. ALSO make sure replicas is ALWAYS at least 2 (not currently set in this file) - for each service.

spec: spinnakerConfig: service-settings:

We GENERALLY recommend NOT using the embedded redis. POC's can override this. ALSO make sure replicas is ALWAYS at least 2 (not currently set in this file) - for each service.

redis: skipLifeCycleManagement: true clouddriver: kubernetes: useExecHealthCheck: false #Examples of common settings can be set/overriden. HIGHLY recommend an APM agent for monitoring various aspects including jvm details not available through other metrics.
#env: #JAVA_OPTS: -Dnewrelic.config.file=/opt/apmagent/config/newrelic.yml -javaagent:/opt/apmagent/newrelic/newrelic.jar -Djavax.net.ssl.trustStore=/etc/ssl/certs/java/cacerts -Djdk.tls.client.protocols=TLSv1.2 -XX:MaxRAMPercentage=50.0 dinghy: kubernetes: useExecHealthCheck: false echo: kubernetes: useExecHealthCheck: false front50: kubernetes: useExecHealthCheck: false fiat: kubernetes: useExecHealthCheck: false gate: kubernetes: useExecHealthCheck: false igor: kubernetes: useExecHealthCheck: false kayenta: kubernetes: useExecHealthCheck: false orca: kubernetes: useExecHealthCheck: false rosco: kubernetes: useExecHealthCheck: false terraformer: kubernetes: useExecHealthCheck: false securityContext: runAsUser: 1000 runAsGroup: 1000 fsGroup: 1000  

In Halyard

halconfig

The following changes will need to be made in the halconfig file.  Most of these settings can be adjusted in the config file directly. stats: enabled: false

This prevents recreating echo on every single deploy of spinnaker

instanceId: disable-echo-always-recreating telemetry:

add to default config

enabled: true endpoint: https://stats.spinnaker.io connectionTimeoutMillis: 3000 readTimeoutMillis: 5000 deploymentEnvironment: size: SMALL type: Distributed updateVersions: true vault: enabled: false livenessProbeConfig: enabled: true

Default to 10 minutes before starting liveness probes. Larger installs may need longer

initialDelaySeconds: 600 location: spinnaker

This is current default we'd PREFER deployed everyplace... can be overridden per cust

DO NOT turn on the monitoring daemon - it does NASTY things to spinnaker's system, and it uses a lot of resources. USE the observability plugin instead! (Separate files)

metricStores: prometheus: enabled: false enabled: false  

Profiles configuration

Each service will already have a -local.yml file existing in hal config profiles directory e.g. (~/.hal/default/profiles/), or users will need to create each file and add the following information to each file spinnaker-local.yml ok-http-client:

connects should never really be increased beyond this

connectTimeoutMs: 15000

Start at 60 seconds for downloads, but may need to be bigger if using large files/artifacts.

readTimeoutMs: 60000 maxRequests: 256 maxRequestsPerHost: 256 clouddriver-local.yml serviceLimits: cloudProviderOverrides: aws: rateLimit: 10.0 implementationLimits:

Examples - default rate limit is 10, reducing these can reduce AWS API calls preventing throttling but it's FAR better to get a support ticket in... and increase the API limits.

AmazonAutoScaling: defaults: rateLimit: 3.0 AmazonElasticLoadBalancing: defaults: rateLimit: 5.0

NOTE we do NOT set redis NOR sql enabled/true/false here. What we DO set are specific settings underneath. This is HIGHLY tied

to the environment deployed to, but defaulting to sql & redis, these should be good common settings. WHEN only redis

is used, should override this as appropriate

settings-local.js window.spinnakerSettings.feature.kustomizeEnabled = true; window.spinnakerSettings.feature.artifactsRewrite = true; window.spinnakerSettings.feature.functions = true; window.spinnakerSettings.kubernetesAdHocInfraWritesEnabled = true; // DO NOT go without this... authz/authn enabled turn this on automatically as well. window.spinnakerSettings.authEnabled = true; // The below SHOULD be overridden per customer. Leaving it here as a default though for reference //window.spinnakerSettings.feature.terraform = true; echo-local.yml resilience4j.circuitbreaker: instances: telemetry:

This needs to stay false, because if the telemetry endpoint goes down, Echo goes unhealthy (no good!)

registerHealthIndicator: false   gate-local.yml

This enables the X509 Client authentication system

default: apiPort: 8443

Here for reference - we recommend this as allows path routing configs vs separate domains.

servlet:

context-path: /api/v1

server: tomcat:

In 2.2.4+spring boot, needs to include KB for sizing due to large group payload potentials or similar issues

max-http-header-size: 256KB protocolHeader: X-Forwarded-Proto remoteIpHeader: X-Forwarded-For internalProxies: .* httpsServerPort: X-Forwarded-Port redis: configuration: secure: true proxies:

  • id: terraform uri: http://spin-terraformer:7088 methods:
    • GET igor-local.yml spinnaker: pollingSafeguard:

    The defaults for polling is 1000 - this 10x that to allow for lots of new images.

    itemUpperThreshold: 10000 orca-local.yml executionRepository: sql: enabled: true redis: enabled: false monitor: activeExecutions: redis: false sql:

Some sane defaults for connection sql pools for all customers These are "decent" starting settings.

connectionPools: connectionTimeout: 5000 maxLifetime: 30000 maxPoolSize: 100  

Service Settings Configuration

Each service will already have a .yml file existing in hal config service-settings directory e.g. (~/.hal/default/service-settings/), or users will need to create each file and add the following information to each file Note: We GENERALLY recommend NOT using the embedded Redis. POC's can override this. ALSO make sure replicas is ALWAYS at least 2 (not currently set in these files) - for each service. redis.yml skipLifeCycleManagement: true clouddriver.yml kubernetes: useExecHealthCheck: false #Examples of common settings can be set/overriden. HIGHLY recommend an APM agent for monitoring various aspects including jvm details not available through other metrics.
#env: #JAVA_OPTS: -Dnewrelic.config.file=/opt/apmagent/config/newrelic.yml -javaagent:/opt/apmagent/newrelic/newrelic.jar -Djavax.net.ssl.trustStore=/etc/ssl/certs/java/cacerts -Djdk.tls.client.protocols=TLSv1.2 -XX:MaxRAMPercentage=50.0 dinghy.ymlecho.yml```, front50.yml, `````````fiat.ymlgate.yml```, ```igor.yml```, kayenta.yml, `````````orca.yml, `````````rosco.ymleach need to have the following lines added into an existing file, or added to a newly created file kubernetes: useExecHealthCheck: falseterraformer.yml``````````````` kubernetes: useExecHealthCheck: false securityContext: runAsUser: 1000 runAsGroup: 1000 fsGroup: 1000