Identifying Kubernetes Evictions
November 7, 2018
ops
kubernetes
For personal projects, I run a single-node “cluster” on an OVH machine, set up with kubeadm. Over the last couple of months, I’ve been having repeated issues with pod evictions, which are typically not a problem, except when they affect critical pods such as the kube-apiserver and kube-scheduler. Despite these being marked as critical pods, they were still being evicted due to disk pressure. After fighting with pod evictions for a while, it became clear that I wasn’t actually solving the underlying problem; this machine has 2TB of disk space… disk pressure evictions should not be occurring.
What is going on?
The first step was to determine the reason for the evictions in the first place; i.e., what trigger is being hit to begin the evictions. By describe-ing one of the evicted pods, we can confirm that the reason is disk pressure (a sketch of this check is shown below).
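As a minimal sketch (the pod name is a placeholder for whatever shows up as Evicted; critical pods like these live in kube-system):

# List all pods that have been evicted
kubectl get pods --all-namespaces | grep Evicted

# Inspect one of them; look for Status: Failed, Reason: Evicted,
# and a Message naming the resource that triggered it (disk, in this case)
kubectl describe pod <evicted-pod> -n kube-system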
With the reason confirmed, we can look in the kubelet config (/var/lib/kubelet/config.yml) to find the eviction thresholds:
evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
This tells us the levels at which the kubelet will begin evictions, but we still need to see why we are close to that level.
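It can also be worth confirming that the node itself is reporting the DiskPressure condition. A minimal check, assuming the node is registered under its hostname (ns532161 here):

# The Conditions section should show DiskPressure=True while evictions are firing
kubectl describe node ns532161 | grep -A10 "Conditions:"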
To see why we are close to that level, we can use df -h and df -i to check the disk usage (in human-readable format) and inode usage.
In my case, df -h gave the following (culled for readability):
root@ns532161:/var/lib/docker# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             32G     0   32G   0% /dev
tmpfs           6.3G  1.2G  5.2G  18% /run
/dev/md3         20G   16G  2.9G  85% /
shm              64M  3.5M   61M   6% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/md2        487M   56M  403M  13% /boot
/dev/md4        1.8T  377G  1.4T  22% /home
…
none             20G   16G  2.9G  85% /var/lib/docker/aufs/mnt/27f50....
none             20G   16G  2.9G  85% /var/lib/docker/aufs/mnt/4dc06....
none             20G   16G  2.9G  85% /var/lib/docker/aufs/mnt/acc09....
We can see that the docker AUFS storage driver is running off the main system mount (/dev/md3), and not our 2TB of storage (/dev/md4). That leaves the root filesystem at 85% usage, i.e. only about 15% free, which is right at the imagefs.available: 15% threshold and is what triggers the evictions.
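To double-check that docker's data is what is actually filling the root filesystem, a quick look at the size of its directory (paths as on this machine; adjust if your docker root differs):

# Total size of everything docker keeps under its root directory
du -sh /var/lib/docker
# Broken down by subdirectory (aufs layers, containers, volumes, ...)
du -sh /var/lib/docker/*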
Changing the AUFS root
A quick note on AUFS: it was the (previous) default storage driver for Docker on Ubuntu. You can find more information on the AUFS storage driver in the Docker documentation.
In order to correct this, I took the approach of changing the docker AUFS root directory and simply restarting all of the services. As this was mostly a toy environment, I didn’t have any serious reliability or data-loss concerns. Definitely DO NOT do this in a production environment. Instead, cycle your nodes safely in a controlled fashion (a rough sketch of what that looks like is below).
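For reference, cycling a node safely looks roughly like this (a sketch only, not what I did here; <node-name> is a placeholder):

# Stop new pods from landing on the node and evict its workloads gracefully
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets

# ...perform the storage change, then allow scheduling again
kubectl uncordon <node-name>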
You can see the docker root with docker info:
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 494
Dirperm1 Supported: true
This confirms (again) that we were not properly using our 2TB mount (/dev/md4), which is mounted at /home.
The quick approach here was to simply change the docker root to something under /home; in this case we used /home/docker-aufs. Doing this was fairly straightforward using the instructions below:
This option is preferred, as directly editing .service files should be avoided; they may be overwritten during an update, for example.
1. vi /etc/systemd/system/docker.service.d/docker.root.conf and populate with:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -g /new/docker/root -H fd://
2. systemctl daemon-reload
3. systemctl restart docker
4. docker info, and verify the root dir has updated
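As a quick sanity check after the restart, something like the following should show the new location (the grep patterns simply match lines in the docker info / df output):

# The reported root dir should now point at the new location
docker info | grep -i "root dir"
# The AUFS layer mounts should also now appear under the new path
df -h | grep docker-aufs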
Note - Existing Containers and Images
If you already have containers or images in /var/lib/docker, you may wish to stop docker and back these up before moving them to the new root location. Moving can be done with rsync -a /var/lib/docker/* /path/to/new/root, or, if permissions do not matter, you can simply use mv or cp.
Note that the above instructions were originally taken from here.
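For what it's worth, a minimal sketch of that move, assuming the new root is /home/docker-aufs as above (the old data is left in place as a fallback):

# Stop the engine so nothing writes to the old root during the copy
systemctl stop docker

# Copy the existing data over, preserving permissions and ownership
mkdir -p /home/docker-aufs
rsync -a /var/lib/docker/ /home/docker-aufs/

# Start docker again; with the drop-in above it should use the new root
systemctl start docker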
After restarting the underlying docker engine, and waiting for our cluster to recover (as everything needed to be rescheduled), our cluster was up again and no longer suffering from pod evictions.
Note: the kube-scheduler did have some problems restarting, as it was stuck in CrashLoopBackOff due to a port conflict. I believe this was because the running containers were not properly shut down when docker was restarted, however I am not sure. After a couple of attempts, the container started properly.
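If you hit the same thing, a few checks that would help narrow it down (a sketch only; the pod name suffix and the conflicting port are placeholders):

# Check the scheduler's previous logs for the exact bind/port error
kubectl -n kube-system logs kube-scheduler-<node-name> --previous

# Look for a leftover container still holding the port after the docker restart
docker ps | grep kube-scheduler

# See which process is actually listening on the conflicting port
ss -tlnp | grep <port>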
I hope this was helpful for you. If you have any questions, don’t hesitate to shoot me an email, or follow me on Twitter @nrmitchi.