Operator cannot connect to Halyard due to a TCP timeout

Issue

When deploying Spinnaker with the Operator, the deployment may fail, and the Operator's pod logs show the following TCP timeout error:

{"level":"error","ts":1636061223.53345,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"spinnakerservice-controller","request":"spinnaker-service/spinnaker","error":"Post http://localhost:/v1/config/deployments/manifests: dial tcp :: i/o timeout"

Cause

The Spinnaker Operator container is based on Alpine Linux, which does not include /etc/nsswitch.conf by default. Without /etc/nsswitch.conf, Go defaults to a DNS-first lookup order: it tries DNS for every hostname, including localhost, and falls back to /etc/hosts only after the DNS lookups are exhausted.

If the cluster's DNS server has an entry for localhost, the Operator's lookup of localhost at startup matches that DNS entry instead of the /etc/hosts entry. The name then resolves to the IP address shown in the timeout error message rather than to the loopback address. Halyard is not listening at that address, which may not exist at all, so the connection attempt ends in a TCP timeout.
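
The effect of the lookup order can be illustrated with a small model. The Python sketch below is not Go's resolver; it only mimics consulting two name sources in a given order and returning the first answer. The function names, the hosts-file table, and the DNS record are all hypothetical, and the documentation-range address 203.0.113.10 stands in for whatever address a stray localhost DNS entry might return.

```python
import ipaddress


def resolve(name, order, hosts_file, dns_records):
    """Return the first address found by consulting the sources in order.

    `order` is a list such as ["dns", "files"]; "files" stands for
    /etc/hosts and "dns" for the cluster DNS server (both modelled as dicts).
    """
    sources = {"files": hosts_file, "dns": dns_records}
    for source in order:
        address = sources[source].get(name)
        if address is not None:
            return address
    return None


def is_loopback(address):
    """True if the address is a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(address).is_loopback


# Hypothetical data: /etc/hosts maps localhost to loopback, while the
# cluster DNS server (assumed for illustration) also answers for localhost.
hosts_file = {"localhost": "127.0.0.1"}
dns_records = {"localhost": "203.0.113.10"}

# DNS-first order, as described above when /etc/nsswitch.conf is absent.
dns_first = resolve("localhost", ["dns", "files"], hosts_file, dns_records)
# Files-first order, where /etc/hosts is consulted before DNS.
files_first = resolve("localhost", ["files", "dns"], hosts_file, dns_records)

print(dns_first, is_loopback(dns_first))      # 203.0.113.10 False
print(files_first, is_loopback(files_first))  # 127.0.0.1 True
```

With the DNS-first order, the model returns the non-loopback address, which corresponds to the situation in which the Operator tries to reach Halyard at an address where nothing is listening. If the DNS table has no localhost entry, the same order falls through to the hosts-file entry and returns the loopback address.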