The alert

A node has a filesystem that is not read-only and that (a) is above 60% usage and (b) is expected to reach 100% usage within 24h.

alert: NodeFilesystemSpaceFillingUp
expr: |
    (
        node_filesystem_avail_bytes{fstype!="",job="node-exporter"} / node_filesystem_size_bytes{fstype!="",job="node-exporter"} * 100 < 40
        and
        predict_linear(node_filesystem_avail_bytes{fstype!="",job="node-exporter"}[6h], 24 * 60 * 60) < 0
        and
        node_filesystem_readonly{fstype!="",job="node-exporter"} == 0
    )
for: 1h
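
To make the prediction clause concrete, here is a minimal Python sketch of the kind of least-squares extrapolation that predict_linear performs. It is a simplification, not Prometheus's implementation: it anchors the fit at the newest sample rather than at the evaluation time, and the hourly samples (26 GiB available falling to 20 GiB over 6h) are hypothetical values chosen for illustration.

```python
def predict_linear(samples, horizon_s):
    """Fit a least-squares line through (timestamp_s, value) samples and
    extrapolate it horizon_s seconds past the newest sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    t_last = samples[-1][0]
    # Shift time so the newest sample sits at x = 0; the intercept is then
    # the fitted value "now".
    xs = [t - t_last for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * horizon_s


def alert_condition(avail, size, predicted_avail, readonly):
    """Mirror the three clauses of the alert expression."""
    return (avail / size * 100 < 40) and (predicted_avail < 0) and not readonly


# Hypothetical 6h window: one sample per hour, losing 1 GiB per hour.
samples = [(h * 3600, 26.0 - h) for h in range(7)]
predicted = predict_linear(samples, 24 * 60 * 60)

print(f"predicted available space in 24h: {predicted:.1f} GiB")
print("alert condition met:", alert_condition(20.0, 100.0, predicted, False))
```

With a steady loss of 1 GiB per hour the fitted line crosses zero well inside the 24h horizon, so all three clauses hold. The extrapolation only looks at the trend of the last 6h; it knows nothing about anything that might free space later.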

The investigation

The disk is above 80% utilization:

# disk usage above 80%
[root@ip-172-23-91-119 docker]# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  100G   81G   20G  81% /

Docker images take up ~65GB, roughly 65% of the 100G disk:

[root@ip-172-23-91-119 docker]# docker system df
TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
Images              245                 65                  65.34GB             55.75GB (85%)
Containers          247                 163                 16.37MB             2.978MB (18%)
Local Volumes       2                   2                   32.37kB             0B (0%)
Build Cache         0                   0                   0B                  0B

The node is running with kubelet default eviction thresholds:

--eviction-minimum-reclaim="nodefs.available=0,imagefs.available=0"
--eviction-hard="nodefs.available<10%,imagefs.available<15%"
--eviction-soft=""

When eviction thresholds are met, resources will be reclaimed:

  • nodefs threshold (nodefs.available<10%): ~3GB (~3%) of disk will be freed (plus those containers' images).
  • imagefs threshold (imagefs.available<15%): ~55GB (~55%) of disk will be freed.
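
The threshold arithmetic can be sketched in a few lines of Python. This is a rough estimate under stated assumptions: nodefs and imagefs are the same filesystem (/), GB and G are treated as interchangeable, and every image reported as reclaimable by docker system df is actually removed. The constant and function names are illustrative only.

```python
# Figures taken from the investigation above.
DISK_SIZE_GB = 100.0            # `df -h /`
RECLAIMABLE_IMAGES_GB = 55.75   # `docker system df`, Images row

# Hard eviction thresholds, expressed as minimum percent available.
IMAGEFS_AVAILABLE_PCT = 15.0
NODEFS_AVAILABLE_PCT = 10.0


def usage_pct_at_threshold(min_available_pct):
    """Disk usage (percent) at which an 'available < X%' threshold is met."""
    return 100.0 - min_available_pct


def usage_pct_after_reclaim(usage_pct, reclaimed_gb, size_gb=DISK_SIZE_GB):
    """Disk usage (percent) after freeing reclaimed_gb on a size_gb disk."""
    reclaimed_pct = reclaimed_gb / size_gb * 100.0
    return max(0.0, usage_pct - reclaimed_pct)


imagefs_trigger = usage_pct_at_threshold(IMAGEFS_AVAILABLE_PCT)
nodefs_trigger = usage_pct_at_threshold(NODEFS_AVAILABLE_PCT)
after_image_cleanup = usage_pct_after_reclaim(imagefs_trigger, RECLAIMABLE_IMAGES_GB)

print(f"imagefs threshold met at {imagefs_trigger:.0f}% usage")
print(f"nodefs threshold met at {nodefs_trigger:.0f}% usage")
print(f"usage after removing reclaimable images: ~{after_image_cleanup:.0f}%")
```

Under these assumptions the imagefs threshold is reached first, at 85% usage, and removing the reclaimable images would bring usage back down to roughly 29%.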

The resolution

The 24h linear prediction does not take into account the node’s ability to reclaim space once disk usage reaches the eviction thresholds (~85% disk usage, where imagefs.available<15% is met). There is nothing to worry about until the disk is at least 85% full.