worklog 06-19
symptoms
the ap1 and us1 clusters are under memory pressure all the time
analysis
us1
- memory pressure is coming from two nodes in the scopedb-ingest node pool (kubectl top output below)
```text
> kubectl top pods -nguanced | grep scopedb-ingest
scopedb-ingest-5bcc5c7c48-fcwz5   1285m   12978Mi
scopedb-ingest-5bcc5c7c48-ftw6b   1516m   1688Mi
scopedb-ingest-5bcc5c7c48-kz4cd   1135m   13144Mi
```
memory usage is clearly unbalanced across the 3 pods. the average across them is still below the hpa threshold, so no scale-out is triggered, even though the two nodes hosting the heavy pods are under memory pressure.
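quick sanity check on the averaging (assuming the hpa object is also named scopedb-ingest in guanced, which is a guess):

```shell
# rough average across the 3 pods: (12978 + 1688 + 13144) / 3 ≈ 9270Mi
# compare against what the hpa reports as current vs target
kubectl get hpa -n guanced
kubectl describe hpa scopedb-ingest -n guanced   # metrics section shows current avg vs target
```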
after restarting the deployment, memory usage on every pod came back down to ~2Gi and the memory pressure cleared
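the restart itself, for reference (deployment/namespace taken from the output above):

```shell
# bounce the deployment and confirm memory settles back down
kubectl rollout restart deployment/scopedb-ingest -n guanced
kubectl rollout status deployment/scopedb-ingest -n guanced
kubectl top pods -n guanced | grep scopedb-ingest
```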
action items
- figure out why memory usage was unbalanced across the pods in the first place.
- why is there no memory limit on this deployment? (sketch of adding one below)
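if we do add one, a minimal sketch; the 2Gi/4Gi values are placeholders, not tuned:

```shell
# set memory request/limit on the deployment's containers; values are placeholders
kubectl set resources deployment/scopedb-ingest -n guanced \
  --requests=memory=2Gi --limits=memory=4Gi
```
note this changes the pod template and triggers a rolling update, which ties into the CI/CD question under ap1.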
ap1
- pods in prd-business are scheduled unevenly across the nodes in the node pool
we figured out which node is under pressure and which ones still have enough memory; we also worked out how to find the pod consuming the most memory on each node (commands below).
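roughly the commands used (node names are placeholders):

```shell
# rank nodes by memory usage to spot the one under pressure
kubectl top nodes --sort-by=memory

# list which pods run on a given node
kubectl get pods -A -o wide --field-selector spec.nodeName=<node-name>

# rank pods by memory usage, then cross-reference with the list above
kubectl top pods -A --sort-by=memory
```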
through cordon, delete, and rollout restart, we rescheduled the workload and it's reasonably balanced now (steps below).
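roughly the reshuffle steps (treating prd-business as the namespace; node/pod/deployment names are placeholders):

```shell
# keep new pods off the node under pressure
kubectl cordon <pressured-node>

# evict the heavy pod / bounce the deployment so replicas land elsewhere
kubectl delete pod <heavy-pod> -n prd-business
kubectl rollout restart deployment/<deploy> -n prd-business

# allow scheduling on the node again once things look even
kubectl uncordon <pressured-node>
```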
action items
- understand how kubernetes decides which node to schedule a pod on. without cordon, pods still get rescheduled onto the node with less free memory, why? (see the note after this list)
- investigate whether we can set limits on the truewatch pods. does setting limits interfere with CI/CD, and does it affect performance?
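note on the first item: the default scheduler scores nodes on the requests declared in pod specs, not on live usage, so pods with no requests can get packed onto a node that is already under real memory pressure. quick way to see what the scheduler has to work with (names are placeholders):

```shell
# requests/limits as declared in the pod spec (what the scheduler scores on)
kubectl get pod <pod-name> -n prd-business -o jsonpath='{.spec.containers[*].resources}'

# per-node view of allocated requests vs capacity (still not live usage)
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
```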