Skip to main content

Network latency between Clouddriver and Agent causing Timeout issues

Issue

Users may run into failures because of network latency between Clouddriver and the Agent instances.  Users may observe the following error within the Clouddriver logs when executing a pipeline on a Spinnaker instance that uses Armory Agent for Kubernetes: message="com.netflix.spinnaker.clouddriver.exceptions.OperationTimedOutException: Timeout exceeded for operation 01FHVJ66QXEHX51DMY9PDSTPY0, type: 6, account: xxx, time: 30022 ms, timeout: 30000 ms at io.armory.kubesvc.services.ops.KubesvcOperations.performOperation(KubesvcOperations.java:96) at io.armory.kubesvc.services.ops.cluster.ClusteredKubesvcOperations.performOperation(ClusteredKubesvcOperations.java:70) at io.armory.kubesvc.util.OperationUtils.perform(OperationUtils.java:76) at io.armory.kubesvc.services.ops.executor.KubesvcExecutor.deploy(KubesvcExecutor.java:301) at jdk.internal.reflect.GeneratedMethodAccessor703.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at io.armory.kubesvc.services.registration.clouddriver.KubesvcCredentialsParser.lambda$makeKubesvcJobExecutor$0(KubesvcCredentialsParser.java:72) at com.netflix.spinnaker.clouddriver.kubernetes.op.job.KubectlJobExecutor$$EnhancerByCGLIB$$4f088312.deploy() at com.netflix.spinnaker.clouddriver.kubernetes.security.KubernetesCredentials.lambda$deploy$14(KubernetesCredentials.java:508) at com.netflix.spinnaker.clouddriver.kubernetes.security.KubernetesCredentials.runAndRecordMetrics(KubernetesCredentials.java:618) at com.netflix.spinnaker.clouddriver.kubernetes.security.KubernetesCredentials.runAndRecordMetrics(KubernetesCredentials.java:603) at com.netflix.spinnaker.clouddriver.kubernetes.security.KubernetesCredentials.deploy(KubernetesCredentials.java:504) at com.netflix.spinnaker.clouddriver.kubernetes.op.handler.CanDeploy.deploy(CanDeploy.java:58) at com.netflix.spinnaker.clouddriver.kubernetes.op.manifest.KubernetesDeployManifestOperation.operate(KubernetesDeployManifestOperation.java:209) at com.netflix.spinnaker.clouddriver.kubernetes.op.manifest.KubernetesDeployManifestOperation.operate(KubernetesDeployManifestOperation.java:46) at com.netflix.spinnaker.clouddriver.orchestration.AtomicOperation$operate.call(Unknown Source) at com.netflix.spinnaker.clouddriver.orchestration.DefaultOrchestrationProcessor$_process_closure1$_closure2.doCall(DefaultOrchestrationProcessor.groovy:124) at com.netflix.spinnaker.clouddriver.orchestration.DefaultOrchestrationProcessor$_process_closure1$_closure2.doCall(DefaultOrchestrationProcessor.groovy) at jdk.internal.reflect.GeneratedMethodAccessor543.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041) at groovy.lang.Closure.call(Closure.java:405) at groovy.lang.Closure.call(Closure.java:399) at com.netflix.spinnaker.clouddriver.metrics.TimedCallable$CallableWrapper.call(TimedCallable.java:81) at com.netflix.spinnaker.clouddriver.metrics.TimedCallable.call(TimedCallable.java:45) at java_util_concurrent_Callable$call.call(Unknown Source) at com.netflix.spinnaker.clouddriver.orchestration.DefaultOrchestrationProcessor$_process_closure1.doCall(DefaultOrchestrationProcessor.groovy:123) at com.netflix.spinnaker.clouddriver.orchestration.DefaultOrchestrationProcessor$_process_closure1.doCall(DefaultOrchestrationProcessor.groovy) at jdk.internal.reflect.GeneratedMethodAccessor537.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041) at groovy.lang.Closure.call(Closure.java:405) at groovy.lang.Closure.call(Closure.java:399) at com.netflix.spinnaker.security.AuthenticatedRequest.lambda$wrapCallableForPrincipal$0(AuthenticatedRequest.java:272) at com.netflix.spinnaker.clouddriver.metrics.TimedCallable.call(TimedCallable.java:45) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829)"

Cause

After forwarding the deploy operation to the Agent, Clouddriver waits for the Agent to respond within 30 seconds.  However, if the response from the Agent takes more than 30 sec, then users may find the error above.  Administrators can override the default value of 30 seconds.