We have a couple of pods with pretty intense memory usage. It’s possible there is a memory leak, because a spike in usage OOM’d a node. One of our temporary solutions while we investigate is to add a pod anti-affinity rule to the deployment. This tells the scheduler that we’d rather not have instances of the pod running on the same node, so they get spread out across nodes.
As we may have more pods than nodes, we don’t want to make this a blocking requirement. Therefore we use a preferredDuringSchedulingIgnoredDuringExecution rule:
affinity:
  # prefer scheduling pods on different nodes
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - pod-name
        topologyKey: kubernetes.io/hostname
Here, the label on the pods is app.kubernetes.io/name: pod-name.
The scheduler will now avoid placing these pods on the same node whenever it can.
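For context, here is roughly where the affinity block sits in the Deployment manifest. This is only a sketch: the deployment name, replica count, container, and image are placeholders, and the point is that the anti-affinity lives under spec.template.spec and that the labelSelector matches the labels on the pod template.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-name                        # placeholder name
spec:
  replicas: 3                           # placeholder replica count
  selector:
    matchLabels:
      app.kubernetes.io/name: pod-name
  template:
    metadata:
      labels:
        app.kubernetes.io/name: pod-name
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - pod-name
              topologyKey: kubernetes.io/hostname
      containers:
      - name: app                       # placeholder container
        image: example.com/app:latest   # placeholder image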
You can see this with:
kubectl get pods -o wide
The NODE column will show that pods from the same deployment are scheduled on different nodes.
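If you want to narrow the output to just these pods, you can filter on the same label (adjust the label value to match your deployment):
kubectl get pods -l app.kubernetes.io/name=pod-name -o wide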
For more information, see: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity