How to Use Kubernetes Taints and Tolerations to Avoid Undesirable Scheduling

Taints and tolerations are a Kubernetes mechanism for controlling how Pods schedule to the Nodes in your cluster. Taints are applied to Nodes and act as a repelling barrier against new Pods. Tainted Nodes will only accept Pods that have been marked with a corresponding toleration.

Taints are one of the more advanced Kubernetes scheduling mechanisms. They support many use cases where you want to prevent Pods from ending up on undesirable Nodes. In this article, you’ll learn what taints and tolerations are and how you can use them in your own cluster.

How Scheduling Works

Kubernetes is a distributed system where you can deploy containerized applications (Pods) across multiple physical hosts (Nodes). When you create a new Pod, Kubernetes needs to decide which Node will run it. This process is called scheduling.

The scheduler considers many different factors to establish a suitable placement for each Pod. It’ll default to selecting a Node that can provide sufficient resources to satisfy the Pod’s CPU and memory requests.

The selected Node won’t necessarily be appropriate for your deployment, though. It could lack required hardware or be reserved for development use. Node taints are a mechanism for enforcing these constraints by preventing arbitrary assignment of Pods to Nodes.

Taint Use Cases

Tainting a Node means it will start to repel Pods, forcing the scheduler to consider the next candidate Node instead. You can overcome the taint by setting a matching toleration on the Pod. This provides a mechanism for allowing specific Pods onto the Node.

Taints are often used to keep Pods away from Nodes that are reserved for specific purposes. Some Kubernetes clusters might host several environments, such as staging and production. In this situation you’ll want to prevent staging deployments from ending up on the dedicated production hardware.

You can achieve the desired behavior by tainting the production Node and setting a matching toleration on production Pods. Staging Pods will be confined to the other Nodes in your cluster, preventing them from consuming production resources.

Taints can also help distinguish between Nodes with particular hardware. Operators might deploy a subset of Nodes with dedicated GPUs for use with AI workloads. Tainting these Nodes ensures Pods that don’t need the GPU can’t schedule onto them.

Taint Effects

Each Node taint can have one of three different effects on Kubernetes scheduling decisions:

  • NoSchedule – Pods that lack a toleration for the taint won’t be scheduled onto the Node. Pods already scheduled to the Node aren’t affected, even if they don’t tolerate the taint.
  • PreferNoSchedule – Kubernetes will try to avoid scheduling Pods that lack a toleration for the taint, but this isn’t guaranteed. Such a Pod could still be scheduled to the Node as a last resort. This does not affect existing Pods.
  • NoExecute – This functions similarly to NoSchedule except that existing Pods are impacted too. Pods without the toleration will be immediately evicted from the Node, causing them to be rescheduled onto other Nodes in your cluster.

The NoExecute effect is useful when you’re changing the role of a Node that’s already running some workloads. NoSchedule is more appropriate if you want to guard the Node against receiving new Pods, without disrupting existing deployments.

Tainting a Node

Taints are applied to Nodes using the kubectl taint command. It takes the name of the target Node, a key and value for the taint, and an effect.

Here’s an example of tainting a Node to allocate it to a specific environment:

$ kubectl taint nodes demo-node env=production:NoSchedule
node/demo-node tainted

You can apply multiple taints to a Node by repeating the command. The value is optional: you can create binary taints by omitting it and supplying only the key and effect:

$ kubectl taint nodes demo-node has-gpu:NoSchedule

To remove a previously applied taint, repeat the command but append a hyphen (-) to the effect name:

$ kubectl taint nodes demo-node has-gpu:NoSchedule-
node/demo-node untainted

This will delete the matching taint if it exists.

You can retrieve a list of all the taints applied to a Node using the describe command. The taints will be shown near the top of the output, after the Node’s labels and annotations:

$ kubectl describe node demo-node
Name:   demo-node
...
Taints: env=production:NoSchedule
...

Adding Tolerations to Pods

The example above tainted demo-node with the intention of reserving it for production workloads. The next step is to add an equivalent toleration to your production Pods so that they’re permitted to schedule onto the Node.

Pod tolerations are declared in the spec.tolerations manifest field:

apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    - key: env
      operator: Equal
      value: production
      effect: NoSchedule

This toleration allows the api Pod to schedule to Nodes that have an env taint with a value of production and NoSchedule as the effect. The example Pod can now be scheduled to demo-node.

To tolerate taints without a value, use the Exists operator instead:

apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    - key: has-gpu
      operator: Exists
      effect: NoSchedule

The Pod now tolerates the has-gpu taint, whether or not a value has been set.
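The effect field can be left out as well. As a minimal sketch, assuming the has-gpu taint might be applied with more than one effect over time, the manifest below tolerates the has-gpu key whatever its value or effect. The file name api-any-effect.yaml is only illustrative, and the heredoc simply writes the manifest to disk so you can inspect it before applying it:

```shell
# Write a manifest whose toleration matches the has-gpu key with any effect.
cat > api-any-effect.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    # No "effect" field: NoSchedule, PreferNoSchedule and NoExecute are all tolerated.
    - key: has-gpu
      operator: Exists
EOF
```

You could then create the Pod with kubectl apply -f api-any-effect.yaml. Omitting the effect is broader than it looks because it also covers NoExecute, so an explicit effect is usually the safer choice unless you need this behavior.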

Tolerations do not require that the Pod is scheduled to a tainted Node. This is a common misconception around taints and tolerations. The mechanism only says that a Node can’t host a Pod; it does not express the alternative view that a Pod must be placed on a particular Node. Taints are commonly combined with affinities to achieve this bi-directional behavior.
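One way to get that bi-directional behavior is sketched below. It pairs the toleration with a nodeSelector, a simpler relative of Node affinity. The sketch assumes demo-node also carries an env=production label (taints are not labels, so this has to be added separately, for example with kubectl label nodes demo-node env=production), and the file name api-production.yaml is only illustrative:

```shell
# Write a manifest that both tolerates the taint and targets labelled Nodes.
cat > api-production.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  # Attract: only Nodes labelled env=production are candidates (assumed label).
  nodeSelector:
    env: production
  # Permit: the env=production:NoSchedule taint no longer repels this Pod.
  tolerations:
    - key: env
      operator: Equal
      value: production
      effect: NoSchedule
EOF
```

The toleration lets the Pod onto the tainted Node, while the selector keeps it off every other Node. Neither setting achieves both on its own.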

Taint and Toleration Matching Rules

Nodes tainted with NoSchedule or NoExecute only receive Pods that tolerate all of their taints. Kubernetes first discovers the taints on the Node, then ignores any that the Pod tolerates. The effects of the remaining, untolerated taints are then applied to the Pod.
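As an illustration of this rule, suppose demo-node carries both taints used earlier in this article, env=production:NoSchedule and has-gpu:NoSchedule. A Pod that tolerates only one of them is still repelled by the other, so it needs a toleration for each. The following sketch writes such a manifest to a file (api-gpu-production.yaml is a hypothetical name):

```shell
# Write a manifest that tolerates every taint assumed to be on demo-node.
cat > api-gpu-production.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    # Matches the env=production:NoSchedule taint.
    - key: env
      operator: Equal
      value: production
      effect: NoSchedule
    # Matches the valueless has-gpu:NoSchedule taint.
    - key: has-gpu
      operator: Exists
      effect: NoSchedule
EOF
```

Removing either entry from the list would leave one taint untolerated, and its NoSchedule effect would keep the Pod off the Node.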

There’s a special case for the NoExecute effect. Pods that tolerate this kind of taint will usually get to stay on the Node after the taint is applied. You can modify this behavior so that Pods are automatically evicted after a given time, despite tolerating the taint:

apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    - key: env
      operator: Equal
      value: production
      effect: NoExecute
      tolerationSeconds: 900

A Node that’s hosting this Pod but is subsequently tainted with env=production:NoExecute will allow the Pod to remain for up to 15 minutes (900 seconds) after the taint is applied. If the taint is still present when that time expires, the Pod will be evicted despite having the NoExecute toleration.

Automatic Taints

Nodes are automatically tainted by the Kubernetes control plane to evict Pods and prevent scheduling when resource contention occurs. Taints such as node.kubernetes.io/memory-pressure and node.kubernetes.io/disk-pressure mean Kubernetes is blocking the Node from taking new Pods because it lacks sufficient resources.

Other commonly applied taints include node.kubernetes.io/not-ready, when a new Node isn’t accepting Pods, and node.kubernetes.io/unschedulable. The latter is applied to cordoned Nodes to halt all Pod scheduling activity.

These taints implement the Kubernetes eviction and Node management systems. You don’t normally need to think about them and you shouldn’t manage these taints manually. If you see them on a Node, it’s because Kubernetes has applied them in response to changing conditions or another command you’ve issued. It is possible to create Pod tolerations for these taints but doing so could lead to resource exhaustion and unexpected behavior.
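If you do decide to tolerate one of these taints, it’s usually safer to bound the toleration with tolerationSeconds than to tolerate it indefinitely. The sketch below is one hedged example: it lets the Pod stay on a Node for up to 60 seconds after the node.kubernetes.io/not-ready taint appears with the NoExecute effect. The 60 second figure and the file name api-not-ready.yaml are arbitrary choices for illustration:

```shell
# Write a manifest with a time-limited toleration for an automatic taint.
cat > api-not-ready.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    # Stay for at most 60 seconds (illustrative value) once the Node is not ready.
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 60
EOF
```

A short window like this moves the Pod off an unhealthy Node quickly, whereas leaving out tolerationSeconds would keep it bound to the Node for as long as the taint remains.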

Summary

Taints and tolerations are a mechanism for repelling Pods from individual Kubernetes Nodes. They help you avoid undesirable scheduling outcomes by preventing Pods from being automatically assigned to arbitrary Nodes.

Tainting isn’t the only mechanism that provides control over scheduling behavior. Pod affinities and anti-affinities are a related technique for constraining the Nodes that can receive a Pod. Affinity can also be defined at an inter-Pod level, allowing you to make scheduling decisions based on the Pods already running on a Node. You can combine affinity with taints and tolerations to set up advanced scheduling rules.
