worklog 06-19
symptoms
the ap1 and us1 clusters are under memory pressure all the time
analysis
us1
- memory pressure is coming from two nodes in the scopedb-ingest node pool (kubectl top output below)
```text
> kubectl top pods -nguanced | grep scopedb-ingest
scopedb-ingest-5bcc5c7c48-fcwz5   1285m   12978Mi
scopedb-ingest-5bcc5c7c48-ftw6b   1516m   1688Mi
scopedb-ingest-5bcc5c7c48-kz4cd   1135m   13144Mi
```
memory usage is clearly unbalanced across the 3 pods. the average across them is still below the hpa threshold, so no scale-out is triggered, even though the two nodes hosting the heavy pods are under memory pressure.
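quick sanity check on the averaging (assuming the hpa object is also named scopedb-ingest in guanced, which is a guess):

```shell
# rough average across the 3 pods: (12978 + 1688 + 13144) / 3 ≈ 9270Mi
# compare against what the hpa reports as current vs target
kubectl get hpa -n guanced
kubectl describe hpa scopedb-ingest -n guanced   # metrics section shows current avg vs target
```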
after restarting the deployment, memory usage on every pod came back down to ~2Gi and the memory pressure cleared
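the restart itself, for reference (deployment/namespace taken from the output above):

```shell
# bounce the deployment and confirm memory settles back down
kubectl rollout restart deployment/scopedb-ingest -n guanced
kubectl rollout status deployment/scopedb-ingest -n guanced
kubectl top pods -n guanced | grep scopedb-ingest
```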
action items
- figure out why memory usage was unbalanced across the pods in the first place.
- why is there no memory limit on this deployment? (sketch of adding one below)
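if we do add one, a minimal sketch; the 2Gi/4Gi values are placeholders, not tuned:

```shell
# set memory request/limit on the deployment's containers; values are placeholders
kubectl set resources deployment/scopedb-ingest -n guanced \
  --requests=memory=2Gi --limits=memory=4Gi
```
note this changes the pod template and triggers a rolling update, which ties into the CI/CD question under ap1.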
ap1
- pods in prd-business are scheduled unevenly across the nodes in the node pool
we figured out which node is under pressure and which ones still have enough memory; we also worked out how to find the pod consuming the most memory on each node (commands below).
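roughly the commands used (node names are placeholders):

```shell
# rank nodes by memory usage to spot the one under pressure
kubectl top nodes --sort-by=memory

# list which pods run on a given node
kubectl get pods -A -o wide --field-selector spec.nodeName=<node-name>

# rank pods by memory usage, then cross-reference with the list above
kubectl top pods -A --sort-by=memory
```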
through cordon, delete, and rollout restart, we rescheduled the workload and it's reasonably balanced now (steps below).
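roughly the reshuffle steps (treating prd-business as the namespace; node/pod/deployment names are placeholders):

```shell
# keep new pods off the node under pressure
kubectl cordon <pressured-node>

# evict the heavy pod / bounce the deployment so replicas land elsewhere
kubectl delete pod <heavy-pod> -n prd-business
kubectl rollout restart deployment/<deploy> -n prd-business

# allow scheduling on the node again once things look even
kubectl uncordon <pressured-node>
```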
action items
- understand how kubernetes decides which node to schedule a pod on. without cordon, pods still get rescheduled onto the node with less free memory, why? (see the note after this list)
- investigate whether we can set limits on the truewatch pods. does setting limits interfere with CI/CD, and does it affect performance?
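note on the first item: the default scheduler scores nodes on the requests declared in pod specs, not on live usage, so pods with no requests can get packed onto a node that is already under real memory pressure. quick way to see what the scheduler has to work with (names are placeholders):

```shell
# requests/limits as declared in the pod spec (what the scheduler scores on)
kubectl get pod <pod-name> -n prd-business -o jsonpath='{.spec.containers[*].resources}'

# per-node view of allocated requests vs capacity (still not live usage)
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
```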