Skip to main content

Objects disappearing under the Cluster tab and a slow Deck UI Cluster page

Issue

The objects under Clusters tab disappear after a certain time (3-6 hours). In the duration that the cluster data shown in the UI is absent, the Clouddriver pods do not restart. Another symptom is that the Cluster page is slow at updating the pod being updated. The following errors are seen running the caching agents: WARN 1 --- [utionAction-174] c.n.s.c.cache.LoggingInstrumentation : kubernetes:rr-workflow-.....-3/KubernetesCoreCachingAgent[1/1] completed with one or more failures in 0.378s

Cause

Caching agents being unable to complete will prevent the cache from being up to date as new deployments are not cached. The output of making a request to http://:7002/cache/introspection would be for example like:

[
2 {
3 "details": {},
4 "id": "...../CustomKubernetes(DestinationRule)[1/1]",
5 "lastExecutionDurationMs": 413,
6 "lastExecutionStartDate": "2021-08-17 21:44:26.928",
7 "lastExecutionStartMs": 1629150266928,
8 "provider": "kubernetes",
9 "totalAdditions": 0,
10 "totalEvictions": 0
11 },
12 {
13 "details": {},
14 "id": "...../KubernetesCoreCachingAgent[1/1]",
15 "lastExecutionDurationMs": 1934,
16 "lastExecutionStartDate": "2021-08-17 21:44:21.928",
17 "lastExecutionStartMs": 1629150271928,
18 "provider": "kubernetes",
19 "totalAdditions": 121,
20 "totalEvictions": 0
21 },
22 {
23 "details": {},
24 "id": "......KubernetesUnregisteredCustomResourceCachingAgent[1/1]",
25 "lastExecutionDurationMs": 1,
26 "lastExecutionStartDate": "2021-08-17 21:44:19.927",
27 "lastExecutionStartMs": 1629150249927,
28 "provider": "kubernetes",
29 "totalAdditions": 0,
30 "totalEvictions": 0
31 }

This endpoint contains statistics about how much time it took for each caching agent to complete.  All of them should run under 10 minutes (successfully) or data will disappear from UI.