clouddriver-caching errors | ClusteredAgentScheduler - Unable to run agents
Issue
When attempting to run agents, the following errors can be found in the clouddriver-caching logs:
Exception:
ERROR 1 --- [ClusteredAgentScheduler-0] c.n.s.c.r.c.ClusteredAgentScheduler : Unable to run agents
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
    redis.clients.jedis.util.Pool.getResource(Pool.java:59) ~[jedis-3.1.0.jar:na]
    redis.clients.jedis.JedisPool.getResource(JedisPool.java:234) ~[jedis-3.1.0.jar:na]
    com.netflix.spinnaker.kork.jedis.telemetry.InstrumentedJedisPool.getResource(InstrumentedJedisPool.java:60) ~[kork-jedis-7.41.0.jar:na]
    com.netflix.spinnaker.kork.jedis.telemetry.InstrumentedJedisPool.getResource(InstrumentedJedisPool.java:26) ~[kork-jedis-7.41.0.jar:na]
    com.netflix.spinnaker.kork.jedis.JedisClientDelegate.withCommandsClient(JedisClientDelegate.java:54) ~[kork-jedis-7.41.0.jar:na]
    com.netflix.spinnaker.cats.redis.cluster.ClusteredAgentScheduler.acquireRunKey(ClusteredAgentScheduler.java:183) ~[cats-redis-GCSFIX.jar:na]
    com.netflix.spinnaker.cats.redis.cluster.ClusteredAgentScheduler.acquire(ClusteredAgentScheduler.java:136) ~[cats-redis-GCSFIX.jar:na]
    com.netflix.spinnaker.cats.redis.cluster.ClusteredAgentScheduler.runAgents(ClusteredAgentScheduler.java:163) ~[cats-redis-GCSFIX.jar:na]
    com.netflix.spinnaker.cats.redis.cluster.ClusteredAgentScheduler.run(ClusteredAgentScheduler.java:156) ~[cats-redis-GCSFIX.jar:na]
    java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
    java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[na:na]
    java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[na:na]
    java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
    java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
    java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: Failed connecting to host redis-clouddriver-master:6379
    redis.clients.jedis.Connection.connect(Connection.java:204) ~[jedis-3.1.0.jar:na]
    redis.clients.jedis.BinaryClient.connect(BinaryClient.java:100) ~[jedis-3.1.0.jar:na]
    redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1866) ~[jedis-3.1.0.jar:na]
    redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:117) ~[jedis-3.1.0.jar:na]
    org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:889) ~[commons-pool2-2.7.0.jar:2.7.0]
    org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:424) ~[commons-pool2-2.7.0.jar:2.7.0]
    org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:349) ~[commons-pool2-2.7.0.jar:2.7.0]
    redis.clients.jedis.util.Pool.getResource(Pool.java:50) ~[jedis-3.1.0.jar:na]
    ... 14 common frames omitted
Caused by: java.net.SocketTimeoutException: connect timed out
    java.base/java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:na]
    java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[na:na]
    java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[na:na]
    java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[na:na]
    java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403) ~[na:na]
    java.base/java.net.Socket.connect(Socket.java:609) ~[na:na]
    redis.clients.jedis.Connection.connect(Connection.java:181) ~[jedis-3.1.0.jar:na]
    ... 21 common frames omitted
Cause
This is a rare Redis issue that occurs when two pods try to start at the same time. Because they start simultaneously, they compete for shared resources, and only one pod ends up starting.
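
The innermost "Caused by" in the trace is a plain TCP connect timeout to redis-clouddriver-master:6379, so one way to confirm whether the Redis endpoint is reachable is a raw socket check with a short timeout. The following is a minimal sketch, assuming Python is available in an environment that resolves the same hostname as the clouddriver-caching pod; the helper name can_connect and the 2-second timeout are illustrative choices, not part of Spinnaker or Jedis.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s.

    Hypothetical helper for illustration only. It checks TCP reachability,
    not Redis health: it sends no Redis commands.
    """
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connect timeouts, refused connections, and DNS failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the "Failed connecting to host" line in the trace.
    host, port = "redis-clouddriver-master", 6379
    status = "reachable" if can_connect(host, port) else "unreachable"
    print(f"{host}:{port} is {status}")
```

If the check reports the endpoint as unreachable, that is consistent with the Redis pod not having started; if it is reachable, the timeout in the log may have been transient.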