kubernetes node disk space pressure
what happened
got an alert that disk usage on one of the nodes was high.
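for reference, the node's DiskPressure condition can be double-checked like this, assuming the account is at least allowed to `get nodes` (that is a separate permission from the `nodes/proxy` one below):

```bash
# quick check of the kubelet-reported DiskPressure condition on the affected node
kubectl get node ip-172-31-77-252.us-west-2.compute.internal \
  -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}'
```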
what did you do to investigate
tried using

```bash
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" \
  | jq -r '.pods[]
      | [.podRef.namespace, .podRef.name,
         (.containers[]?.rootfs.usedBytes // 0)]
      | @tsv' \
  | sort -k3 -nr | head -15
```

but my rancher account doesn't have the right RBAC for this info, i get this error
```text
Error from server (Forbidden): nodes "ip-172-31-77-252.us-west-2.compute.internal" is forbidden: User "u-7vstncbuoi" cannot get resource "nodes/proxy" in API group "" at the cluster scope
```

then i went to look at the metrics stored on self-observe; with this dql i can find the ephemeral disk usage of each pod on that node
```dql
M::`kube_pod`:(`ephemeral_storage_used_bytes` AS `usage`) { `guance_site` = 'us1' and `node_name` = 'ip-172-31-77-252.us-west-2.compute.internal' } BY `pod_name`
```

this narrows the problem down to one pod, worker-9-5d7c9559db-7j2pb in namespace func2
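i didn't dig further before restarting, but to see what is actually filling that pod's filesystem, something like this should work (assumes the image ships `sh` and `du`; container name omitted so kubectl picks the default container):

```bash
# hypothetical follow-up: list the biggest directories inside the suspect container
# note: the kubelet's ephemeral-storage number also counts container logs and emptyDir
# volumes, so this view may not show everything
kubectl exec -n func2 worker-9-5d7c9559db-7j2pb -- \
  sh -c 'du -xh / 2>/dev/null | sort -h | tail -n 20'
```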
i ran `kubectl rollout restart deploy worker-9 -n func2` and that solved the problem
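it's also worth asking the cluster admin to grant read access on `nodes/proxy` so the stats/summary command at the top works next time; the raw RBAC would be roughly this (role/binding names are made up, and in rancher this may be managed through its own role templates instead):

```bash
# sketch: allow this rancher user to call the kubelet stats/summary endpoint via the API server
kubectl create clusterrole node-stats-reader --verb=get --resource=nodes/proxy
kubectl create clusterrolebinding node-stats-reader \
  --clusterrole=node-stats-reader --user=u-7vstncbuoi
```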
what's next
i'm guessing there is some weird stuff happening in worker-9; maybe we need to see if we can limit the amount of ephemeral storage available to it
as for why worker-9 has growing ephemeral storage, chatgpt came up with these hypotheses
next: we can add the ephemeral-storage request/limit (and emptyDir.sizeLimit if relevant)
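a rough sketch of what that could look like, assuming the container in worker-9 is named `worker` and using placeholder sizes that we'd need to tune against real usage:

```bash
# cap worker-9's ephemeral storage so a single pod can't fill the node again;
# once a pod crosses the limit the kubelet evicts it instead of letting the node degrade
kubectl -n func2 patch deploy worker-9 --patch '
spec:
  template:
    spec:
      containers:
      - name: worker                   # assumed container name, check the deployment spec
        resources:
          requests:
            ephemeral-storage: "1Gi"   # placeholder values
          limits:
            ephemeral-storage: "2Gi"
'
# and if worker-9 writes into an emptyDir volume, a sizeLimit on that volume helps too:
#   volumes:
#   - name: scratch                    # hypothetical volume name
#     emptyDir:
#       sizeLimit: 2Gi
```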