The OpenShift anyuid SCC and its effects on storage
Working for IBM on the Cloud Pak for Integration, I’ve recently been spending a lot of time
with operators, and with how to make running software on OpenShift as easy as possible. I came
across a situation where our pods behave when running under the restricted SCC, but can fail
to access storage under the less restrictive anyuid SCC. This post is an exploration of why
that is, and a suggestion of how to address it.
A lot of my understanding of how SCCs affect UIDs came from the blog post A Guide to OpenShift and UIDs by William Caban Babilonia; it’s a fantastic resource.
The OpenShift documentation on Managing security context constraints is also good for understanding the different SCC settings and strategies.
Using the restricted SCC
Best practice when creating an OpenShift pod is to use the restricted SCC. The restricted SCC
adds a number of security features to your running pods, including running with a user and
group ID taken from a range allocated to your namespace, so that they cannot clash with other
namespaces in your cluster.
To take advantage of those features, do not specify the user and group ID when creating pods,
so that OpenShift can assign them. This all works perfectly when the restricted SCC is in
use, because the assignment is done at pod admission, so the metadata is all there for
Kubernetes to use when mounting storage volumes.
When Kubernetes mounts storage volumes to pods, it will (caveat: not for all provisioners!)
chmod and chown the volume recursively to allow the pod to access the files. It does this
based (partly) on the fsGroup provided in the pod’s securityContext. Under the restricted
SCC, if you don’t specify the fsGroup yourself, one will be provided based on the allocated
range for the namespace.
What’s different with anyuid?
The anyuid SCC enables an important use case for OpenShift: running a pod under the user
and group defined in the image. In a plain Kubernetes distribution, you can do this by not
specifying your own values in the pod spec. In OpenShift, if you don’t specify values in the
pod spec and the restricted SCC is applied, OpenShift will provide defaults, so the image
definition will not be used.
Generally, this is a good thing, and a more secure way to run the containers, but there are
situations where you need to run using the image definition. To achieve this, you can apply
the anyuid SCC to the pod, which has a higher priority, and doesn’t provide defaults to the
pod when it is applied.
Unfortunately, if your pod is relying on defaults to be applied to successfully chmod and
chown your volumes on mount, then applying the anyuid SCC to a pod designed for the
restricted SCC can mean you can no longer access mounted storage.
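
To make the difference concrete, here is a deliberately simplified Python model of how the two SCCs treat the fsGroup at admission. It is not OpenShift’s code: the function name admit_fs_group, the exception type and the example range are all illustrative assumptions. It only models the fsGroup strategies that the default restricted (MustRunAs) and anyuid (RunAsAny) SCCs ship with.

```python
from typing import Optional


class AdmissionError(Exception):
    """Raised when the requested fsGroup is not allowed by the SCC."""


def admit_fs_group(strategy: str, requested: Optional[int],
                   range_start: int, range_size: int) -> Optional[int]:
    """Return the fsGroup a pod ends up with after admission.

    strategy:  "MustRunAs" (as in restricted) or "RunAsAny" (as in anyuid).
    requested: the fsGroup from the pod spec, or None if it was left unset.
    """
    if strategy == "RunAsAny":
        # Nothing is defaulted and nothing is validated.
        return requested
    if strategy == "MustRunAs":
        if requested is None:
            # Default to the first group in the namespace range.
            return range_start
        if range_start <= requested < range_start + range_size:
            return requested
        raise AdmissionError(f"fsGroup {requested} is outside the allowed range")
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical namespace range: 10000 groups starting at 1000670000.
START, SIZE = 1000670000, 10000

print(admit_fs_group("MustRunAs", None, START, SIZE))  # prints 1000670000
print(admit_fs_group("RunAsAny", None, START, SIZE))   # prints None
```

In this model the pod designed for the restricted SCC arrives at the volume mount with an fsGroup under MustRunAs, and with none at all under RunAsAny, which is the gap the rest of this post is about.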
What, as operator developers, can we do about it?
Ideally, we would know in advance which SCC would be applied, and could specify the metadata ourselves whenever OpenShift wasn’t going to do it for us. Unfortunately, there’s no easy way for the operator to know in advance which SCC will be applied to a pod, and it could change over time anyway.
When admitting pods under the restricted SCC, if the fsGroup is not already set, the first
group in the range defined by the openshift.io/sa.scc.supplemental-groups annotation on the
namespace will be inserted into the pod definition. This is the part that doesn’t happen with
the anyuid SCC.
To make our pod compatible with both SCCs, what we can do is examine the annotation ourselves,
and set the fsGroup for our pod to the first group in the range. This will satisfy the
restricted SCC, because the group is in range, and will provide the metadata for Kubernetes
to prepare the mount under the anyuid SCC.
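
As a rough sketch of that approach, the Python below reads the annotation value and sets the fsGroup on a pod spec held as a plain dictionary. The helper names first_group and with_fs_group are hypothetical, and the parsing assumes the annotation holds a comma-separated list of blocks written as start/length or start-end; check the value on your own namespaces before relying on it. A real operator would fetch the namespace through its Kubernetes client rather than take a dictionary.

```python
import copy

ANNOTATION = "openshift.io/sa.scc.supplemental-groups"


def first_group(annotation_value: str) -> int:
    """Return the first group ID of the first block in the annotation value."""
    block = annotation_value.split(",")[0].strip()
    # Assumed block formats: "<start>/<length>" or "<start>-<end>".
    for separator in ("/", "-"):
        if separator in block:
            block = block.split(separator, 1)[0]
            break
    return int(block)


def with_fs_group(pod_spec: dict, namespace_annotations: dict) -> dict:
    """Return a copy of pod_spec with fsGroup set, unless it is already set."""
    spec = copy.deepcopy(pod_spec)
    security_context = spec.setdefault("securityContext", {})
    if "fsGroup" not in security_context:
        security_context["fsGroup"] = first_group(namespace_annotations[ANNOTATION])
    return spec


# Example values only; the image name is a placeholder.
namespace_annotations = {ANNOTATION: "1000670000/10000"}
pod_spec = {"containers": [{"name": "writer", "image": "example/image"}]}

print(with_fs_group(pod_spec, namespace_annotations)["securityContext"])
# prints {'fsGroup': 1000670000}
```

Leaving an explicitly set fsGroup alone is a design choice in this sketch: it lets a user who knows what they are doing override the default, at the cost of the pod being rejected under the restricted SCC if their value is out of range.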
Can you back that up?
I ran some tests, and you can too, using my handy test repo:
https://github.com/Jamstah/write-test-container
I ran these tests on a Red Hat OpenShift on IBM Cloud cluster running OpenShift 4.6.36, using
ibmc-block-gold storage. The important thing to note is that the volume is always writable
where the fsGroup has been defined within the range of groups for the namespace.
The scc-nonroot-no-context job is an example of a pod designed for the restricted SCC only.
The scc-anyuid-no-context job shows what happens when a pod designed for the restricted SCC is
run using the anyuid SCC.
The scc-nonroot-fsgroupproject and scc-anyuid-fsgroupproject jobs show how adding an fsGroup
within the correct range can solve the problem under both SCCs.
Here are my results:
Job | Job spec fsGroup | Pod Admitted | Effective SCC | Pod spec fsGroup | UID | GID | Groups | Volume UID | Volume GID | Volume perms | Writable | Written UID | Written GID | Written perms |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
scc-anyuid-fsgroup0 | 0 | Yes | anyuid | 0 | uid-1001(1001) | gid-0(root) | groups-0(root) | root | root | drwxrwsr-x. | Yes | 1001 | root | -rw-r--r--. |
scc-anyuid-fsgroupproject | 1000670000 | Yes | anyuid | 1000670000 | uid-1001(1001) | gid-0(root) | groups-0(root),1000670000 | root | 1000670000 | drwxrwsr-x. | Yes | 1001 | 1000670000 | -rw-r--r--. |
scc-anyuid-no-context | | Yes | anyuid | | uid-1001(1001) | gid-0(root) | groups-0(root) | root | root | drwxr-xr-x. | No | | | |
scc-default-fsgroup0 | 0 | No | | | | | | | | | | | | |
scc-default-fsgroupproject | 1000670000 | Yes | restricted | 1000670000 | uid-1000670000(1000670000) | gid-0(root) | groups-0(root),1000670000 | root | 1000670000 | drwxrwsr-x. | Yes | 1000670000 | 1000670000 | -rw-rw-r--. |
scc-default-no-context | | Yes | restricted | 1000670000 | uid-1000670000(1000670000) | gid-0(root) | groups-0(root),1000670000 | root | 1000670000 | drwxrwsr-x. | Yes | 1000670000 | 1000670000 | -rw-rw-r--. |
scc-nonroot-fsgroup0 | 0 | Yes | nonroot | 0 | uid-1001(1001) | gid-0(root) | groups-0(root) | root | root | drwxrwsr-x. | Yes | 1001 | root | -rw-r--r--. |
scc-nonroot-fsgroupproject | 1000670000 | Yes | restricted | 1000670000 | uid-1000670000(1000670000) | gid-0(root) | groups-0(root),1000670000 | root | 1000670000 | drwxrwsr-x. | Yes | 1000670000 | 1000670000 | -rw-r--r--. |
scc-nonroot-no-context | | Yes | restricted | 1000670000 | uid-1000670000(1000670000) | gid-0(root) | groups-0(root),1000670000 | root | 1000670000 | drwxrwsr-x. | Yes | 1000670000 | 1000670000 | -rw-r--r--. |
- Job: Name of the job from the kustomize. It includes the name of the SCC that the service account has access to, in addition to the default (restricted).
- Job spec fsGroup: The fsGroup applied in the pod template in the job spec.
- Pod Admitted: Was the pod allowed to run on the cluster?
- Effective SCC: The SCC that OpenShift used to admit the pod.
- Pod spec fsGroup: The fsGroup in the pod spec after admission (to see if OpenShift changed it).
- UID: The UID of the user in the container.
- GID: The GID of the user in the container.
- Groups: The additional groups of the user in the container.
- Volume UID: The UID of the root dir of the volume.
- Volume GID: The GID of the root dir of the volume.
- Volume perms: The permissions applied to the root dir of the volume.
- Writable: Could the container write to storage?
- Written UID: What UID was the file written with?
- Written GID: What GID was the file written with?
- Written perms: What permissions was the file written with? (Controlled by umask.)
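
The Writable column can be reasoned about from the other columns using the classic owner/group/other check. The Python below is a minimal sketch of that reasoning, with the hypothetical helper can_write fed by values copied from the table. It only looks at the write bit, and ignores root’s special privileges, capabilities, SELinux and ACLs, so treat it as an aid to reading the table rather than a full permission model.

```python
def can_write(uid, gids, owner_uid, owner_gid, perms):
    """Classic POSIX write-bit check against an `ls -l` style mode string.

    perms looks like "drwxrwsr-x."; positions 2, 5 and 8 hold the write
    bits for owner, group and other. Exactly one class applies to a process.
    """
    owner_w, group_w, other_w = perms[2], perms[5], perms[8]
    if uid == owner_uid:
        return owner_w == "w"
    if owner_gid in gids:
        return group_w == "w"
    return other_w == "w"


ROOT = 0

# scc-anyuid-no-context: volume left as root:root with drwxr-xr-x.
print(can_write(1001, {ROOT}, ROOT, ROOT, "drwxr-xr-x."))
# prints False

# scc-anyuid-fsgroupproject: volume group and mode set from the fsGroup.
print(can_write(1001, {ROOT, 1000670000}, ROOT, 1000670000, "drwxrwsr-x."))
# prints True
```

In the failing row the container user is in the volume’s group, but the group has no write bit because nothing prepared the volume. In the passing rows the fsGroup has been applied as the volume’s group with group write permission, and the same group has been added to the container user’s groups.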