Identifying Kubernetes Evictions
November 7, 2018
ops
kubernetes
For personal projects, I run a single-node “cluster” on an OVH machine, set up with kubeadm. Over the last couple of months, I’ve been having repeated issues with pod evictions, which are typically not a problem, except when they affect critical pods such as the kube-apiserver and kube-scheduler. Despite these being marked as critical pods, they were still being evicted due to disk pressure. After fighting with pod evictions for a while, it became clear that I wasn’t actually solving the underlying problem; this machine has 2TB of disk space… disk pressure evictions should not be occurring.
What is going on?
The first step was to determine the reason for the evictions in the first place; i.e., what trigger is being hit to begin the evictions. By describe-ing one of the evicted pods, we can confirm that the reason is disk pressure (a sketch of this check is shown below).
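As a minimal sketch (the pod name is a placeholder for whatever shows up as Evicted; critical pods like these live in kube-system):

# List all pods that have been evicted
kubectl get pods --all-namespaces | grep Evicted

# Inspect one of them; look for Status: Failed, Reason: Evicted,
# and a Message naming the resource that triggered it (disk, in this case)
kubectl describe pod <evicted-pod> -n kube-system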
With the reason confirmed, we can look in the kubelet config (/var/lib/kubelet/config.yml) to find the eviction thresholds:
evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
This tells us the levels at which the kubelet will begin evictions, but we still need to see why we are close to that level.
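It can also be worth confirming that the node itself is reporting the DiskPressure condition. A minimal check, assuming the node is registered under its hostname (ns532161 here):

# The Conditions section should show DiskPressure=True while evictions are firing
kubectl describe node ns532161 | grep -A10 "Conditions:"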
To see why we are close to that level, we can use df -h and df -i to check the disk usage (in human-readable format) and inode usage.
In my case, df -h gave the following (culled for readability):
root@ns532161:/var/lib/docker# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             32G     0   32G   0% /dev
tmpfs           6.3G  1.2G  5.2G  18% /run
/dev/md3         20G   16G  2.9G  85% /
shm              64M  3.5M   61M   6% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/md2        487M   56M  403M  13% /boot
/dev/md4        1.8T  377G  1.4T  22% /home
…
none             20G   16G  2.9G  85% /var/lib/docker/aufs/mnt/27f50....
none             20G   16G  2.9G  85% /var/lib/docker/aufs/mnt/4dc06....
none             20G   16G  2.9G  85% /var/lib/docker/aufs/mnt/acc09....
We can see that the docker AUFS storage driver is running off the main system mount (/dev/md3), and not our 2TB of storage (/dev/md4). That leaves the root filesystem at 85% usage, i.e. only about 15% free, which is right at the imagefs.available: 15% threshold and is what triggers the evictions.
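To double-check that docker's data is what is actually filling the root filesystem, a quick look at the size of its directory (paths as on this machine; adjust if your docker root differs):

# Total size of everything docker keeps under its root directory
du -sh /var/lib/docker
# Broken down by subdirectory (aufs layers, containers, volumes, ...)
du -sh /var/lib/docker/*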
Changing the AUFS root
A quick note on AUFS: it was the (previous) default storage driver for Docker on Ubuntu. You can find more information on the AUFS storage driver in the Docker documentation.
In order to correct this, I took the approach of changing the docker AUFS root directory and simply restarting all of the services. As this was mostly a toy environment, I didn’t have any serious reliability or data-loss concerns. Definitely DO NOT do this in a production environment. Instead, cycle your nodes safely in a controlled fashion (a rough sketch of what that looks like is below).
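For reference, cycling a node safely looks roughly like this (a sketch only, not what I did here; <node-name> is a placeholder):

# Stop new pods from landing on the node and evict its workloads gracefully
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets

# ...perform the storage change, then allow scheduling again
kubectl uncordon <node-name>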
You can see the docker root with docker info:
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 494
Dirperm1 Supported: true
This confirms (again) that we were not properly using our 2TB mount (/dev/md4), which is mounted at /home.
The quick approach here was to simply change the docker root to something under /home; in this case we used /home/docker-aufs. Doing this was fairly straightforward using the instructions below:
This option is preferred, as directly editing .service files should be avoided; they may be overwritten during an update, for example.
1. vi /etc/systemd/system/docker.service.d/docker.root.conf and populate with:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -g /new/docker/root -H fd://
2. systemctl daemon-reload
3. systemctl restart docker
4. docker info, and verify the root dir has updated
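As a quick sanity check after the restart, something like the following should show the new location (the grep patterns simply match lines in the docker info / df output):

# The reported root dir should now point at the new location
docker info | grep -i "root dir"
# The AUFS layer mounts should also now appear under the new path
df -h | grep docker-aufs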
Note - Existing Containers and Images
If you already have containers or images in /var/lib/docker, you may wish to stop docker and back these up before moving them to the new root location. Moving can be done with rsync -a /var/lib/docker/* /path/to/new/root, or, if permissions do not matter, you can simply use mv or cp.
Note that the above instructions were originally taken from here.
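For what it's worth, a minimal sketch of that move, assuming the new root is /home/docker-aufs as above (the old data is left in place as a fallback):

# Stop the engine so nothing writes to the old root during the copy
systemctl stop docker

# Copy the existing data over, preserving permissions and ownership
mkdir -p /home/docker-aufs
rsync -a /var/lib/docker/ /home/docker-aufs/

# Start docker again; with the drop-in above it should use the new root
systemctl start docker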
After restarting the underlying docker engine, and waiting for our cluster to recover (as everything needed to be rescheduled), our cluster was up again and no longer suffering from pod evictions.
Note: the kube-scheduler did have some problems restarting, as it was stuck in CrashLoopBackOff due to a port conflict. I believe this was because the running containers were not properly shut down when docker was restarted, however I am not sure. After a couple of attempts, the container started properly.
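If you hit the same thing, a few checks that would help narrow it down (a sketch only; the pod name suffix and the conflicting port are placeholders):

# Check the scheduler's previous logs for the exact bind/port error
kubectl -n kube-system logs kube-scheduler-<node-name> --previous

# Look for a leftover container still holding the port after the docker restart
docker ps | grep kube-scheduler

# See which process is actually listening on the conflicting port
ss -tlnp | grep <port>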
I hope this was helpful for you. If you have any questions, don’t hesitate to shoot me an email, or follow me on Twitter @nrmitchi.