kubernetes node disk space pressure
what happened
got an alert that disk usage on one of the nodes was high.
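for reference, the node's DiskPressure condition can be double-checked like this, assuming the account is at least allowed to `get nodes` (that is a separate permission from the `nodes/proxy` one below):

```bash
# quick check of the kubelet-reported DiskPressure condition on the affected node
kubectl get node ip-172-31-77-252.us-west-2.compute.internal \
  -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}'
```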
what did you do to investigate
tried using

```bash
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" \
  | jq -r '.pods[]
      | [.podRef.namespace, .podRef.name,
         (.containers[]?.rootfs.usedBytes // 0)]
      | @tsv' \
  | sort -k3 -nr | head -15
```

but my rancher account doesn't have the right RBAC for this info, i get this error
```text
Error from server (Forbidden): nodes "ip-172-31-77-252.us-west-2.compute.internal" is forbidden: User "u-7vstncbuoi" cannot get resource "nodes/proxy" in API group "" at the cluster scope
```

then i went to look at the metrics stored on self-observe; with this dql i can find the ephemeral disk usage of each pod on that node
```dql
M::`kube_pod`:(`ephemeral_storage_used_bytes` AS `usage`) { `guance_site` = 'us1' and `node_name` = 'ip-172-31-77-252.us-west-2.compute.internal' } BY `pod_name`
```

this narrows the problem down to one pod, worker-9-5d7c9559db-7j2pb in namespace func2
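i didn't dig further before restarting, but to see what is actually filling that pod's filesystem, something like this should work (assumes the image ships `sh` and `du`; container name omitted so kubectl picks the default container):

```bash
# hypothetical follow-up: list the biggest directories inside the suspect container
# note: the kubelet's ephemeral-storage number also counts container logs and emptyDir
# volumes, so this view may not show everything
kubectl exec -n func2 worker-9-5d7c9559db-7j2pb -- \
  sh -c 'du -xh / 2>/dev/null | sort -h | tail -n 20'
```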
i ran `kubectl rollout restart deploy worker-9 -n func2` and that solved the problem
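it's also worth asking the cluster admin to grant read access on `nodes/proxy` so the stats/summary command at the top works next time; the raw RBAC would be roughly this (role/binding names are made up, and in rancher this may be managed through its own role templates instead):

```bash
# sketch: allow this rancher user to call the kubelet stats/summary endpoint via the API server
kubectl create clusterrole node-stats-reader --verb=get --resource=nodes/proxy
kubectl create clusterrolebinding node-stats-reader \
  --clusterrole=node-stats-reader --user=u-7vstncbuoi
```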
what's next
i'm guessing there is some weird stuff happening in worker-9; maybe we need to see if we can limit the amount of ephemeral storage available to it
as for why worker-9 has growing ephemeral storage, chatgpt came up with these hypotheses
next: we can add the ephemeral-storage request/limit (and emptyDir.sizeLimit if relevant)
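a rough sketch of what that could look like, assuming the container in worker-9 is named `worker` and using placeholder sizes that we'd need to tune against real usage:

```bash
# cap worker-9's ephemeral storage so a single pod can't fill the node again;
# once a pod crosses the limit the kubelet evicts it instead of letting the node degrade
kubectl -n func2 patch deploy worker-9 --patch '
spec:
  template:
    spec:
      containers:
      - name: worker                   # assumed container name, check the deployment spec
        resources:
          requests:
            ephemeral-storage: "1Gi"   # placeholder values
          limits:
            ephemeral-storage: "2Gi"
'
# and if worker-9 writes into an emptyDir volume, a sizeLimit on that volume helps too:
#   volumes:
#   - name: scratch                    # hypothetical volume name
#     emptyDir:
#       sizeLimit: 2Gi
```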