Pipeline pods cannot be scheduled in Drone for Kubernetes #72
I opened an issue on github.com/drone/drone and just realized that this one already exists. We are facing the same problem. Here is the text of my issue, since it may add some more info:

**Problem**

When Drone configures the affinity of a pipeline's pods, it sets this chunk of code:

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - ip-10-100-45-161.eu-west-1.compute.internal
```

Correct me if I'm wrong, but this affinity seems to be set from the *name* of the node running the pipeline's pod, while the `kubernetes.io/hostname` label on our nodes holds only the short hostname:

```
kubectl get nodes -L kubernetes.io/hostname
NAME                                          STATUS   ROLES    AGE    VERSION   HOSTNAME
ip-10-100-15-22.eu-west-1.compute.internal    Ready    <none>   12d    v1.12.7   ip-10-100-15-22
ip-10-100-45-161.eu-west-1.compute.internal   Ready    <none>   121m   v1.12.7   ip-10-100-45-161
ip-10-100-63-36.eu-west-1.compute.internal    Ready    <none>   11d    v1.12.7   ip-10-100-63-36
```

So, when the pipeline launches a new namespace to run all of its steps, the child jobs carry the affinity above and never find a node to schedule the pod on, because no node has a `kubernetes.io/hostname` label equal to the full node name.

**Possible solution**

You can set your nodes' `kubernetes.io/hostname` label to match the node name. But maybe Drone should use another type of affinity, or no affinity at all; I assume the affinity setting is there for a reason, though.

Let me know if you need more information. Thank you for everything!
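A minimal sketch of that relabeling for one of the nodes above, assuming `kubectl` access to the affected cluster. The kubelet owns this label and may reset it when the node re-registers, so treat this as a temporary workaround and verify it persists in your environment:

```sh
# Make the label value equal the full node name that Drone's generated
# nodeAffinity asks for. --overwrite is needed because the label exists.
kubectl label node ip-10-100-45-161.eu-west-1.compute.internal \
  kubernetes.io/hostname=ip-10-100-45-161.eu-west-1.compute.internal \
  --overwrite
```

After relabeling, re-running `kubectl get nodes -L kubernetes.io/hostname` should show matching NAME and HOSTNAME columns for that node.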
I've been trying some workarounds and was able to fix this issue without touching the Drone configuration. I'm using the EKS Terraform Module to create our cluster and auto scaling groups. Because the `kubernetes.io/hostname` label is populated from the machine's hostname, I use the `pre_userdata` hook to set the hostname to the full regional DNS name before the kubelet registers the node:

```hcl
# Code from Terraform EKS Module
worker_groups = [
  {
    key_name             = "${aws_key_pair.key.key_name}"
    name                 = "workers-m5"
    # pre_userdata sets hostname equal to dns
    pre_userdata         = "hostnamectl set-hostname $$( cat /etc/hostname ).${var.aws_region}.compute.internal"
    kubelet_extra_args   = "--node-labels=stage=${var.stage}"
    asg_desired_capacity = 2
    asg_max_size         = 2
    asg_min_size         = 1
    instance_type        = "m5.large"
    enable_monitoring    = false
    public_ip            = false
    autoscaling_enabled  = false
  }
]
```

New nodes associated with this auto scaling group will be created with a node name and a `kubernetes.io/hostname` label that match. Hope this will be helpful to someone. :)
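For reference, this is roughly what that `pre_userdata` line runs on a new worker at boot (a sketch; `eu-west-1` stands in for `var.aws_region`, and the `$$` in the Terraform string escapes to a single `$` in the shell):

```sh
# Append the regional suffix so the machine hostname, and therefore the
# kubernetes.io/hostname label the kubelet applies at registration,
# equals the full node name.
hostnamectl set-hostname "$(cat /etc/hostname).eu-west-1.compute.internal"
```

A related kubelet flag, `--hostname-override`, can also change the name a node registers with, though how it interacts with the AWS cloud provider's node naming is worth verifying before relying on it.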
@iron-sam I'm using the workaround you mentioned and it works. But we are using Rancher to manage the Kubernetes cluster, so we might have issues when we scale the cluster nodes up and down.
@totogo I don't use Rancher, but the idea is to set the hostname of the server to the short name assigned by AWS plus the `.<region>.compute.internal` suffix.
Rancher does have a hostname override option.
We are overhauling the Kubernetes runtime and should have an alpha available by the end of next week. The runtime is standalone and is being developed at https://github.com/drone-runners/drone-runner-kube. This new implementation is better aligned with Kubernetes and executes all pipeline steps in a single pod, similar to Tekton, as opposed to executing each step in its own pod. Since all steps are executed in the same pod, we no longer need to rely on affinity, which means this issue goes away :)
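To illustrate the design point (this is only a sketch, not the runner's actual manifest; the pod and container names are hypothetical): when every step is a container in one pod, the scheduler co-locates them by construction, so no `kubernetes.io/hostname` affinity is needed to keep steps on the same node:

```sh
# Two pipeline "steps" as containers in one pod sharing a workspace volume.
# Co-location on a single node is guaranteed by the pod itself.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pipeline-sketch    # hypothetical name
spec:
  restartPolicy: Never
  volumes:
  - name: workspace
    emptyDir: {}
  containers:
  - name: clone
    image: alpine
    command: ["sh", "-c", "echo clone step writes to /workspace"]
    volumeMounts:
    - name: workspace
      mountPath: /workspace
  - name: build
    image: alpine
    command: ["sh", "-c", "echo build step reads from /workspace"]
    volumeMounts:
    - name: workspace
      mountPath: /workspace
EOF
```

Note that plain pod containers start concurrently; the real runner additionally has to sequence the steps, which this sketch does not show.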
Closing this, since docs for the new (still experimental) Kubernetes runner will be posted today for early adopters. This iteration of the runner does not use node affinity and instead runs all pipeline steps in the same pod.
This part of the code generates a node selector that is not fully correct in some environments, so the pod cannot be scheduled and the pipeline always gets stuck on the Clone stage:

drone-runtime/engine/kube/kube.go, line 144 (commit 7884b81)
The issue is also described here:
https://discourse.drone.io/t/drone-on-k8s-failedscheduling/3854/10
According to the Kubernetes documentation, the kubelet populates the `kubernetes.io/hostname` label with the node's hostname, which is not guaranteed to equal the node name reported by the cloud provider.
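To make the failure mode concrete, here is a minimal repro sketch, assuming a cluster exhibiting the label/name mismatch above (the pod name `affinity-repro` and the busybox image are illustrative):

```sh
# Pin a pod to the node NAME (with region suffix). Because the
# kubernetes.io/hostname label holds only the short name, no node
# matches and the pod stays Pending with a FailedScheduling event.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: affinity-repro
spec:
  containers:
  - name: sleep
    image: busybox
    command: ["sleep", "3600"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - ip-10-100-45-161.eu-west-1.compute.internal
EOF

kubectl describe pod affinity-repro   # Events show FailedScheduling
```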