This repository has been archived by the owner on Jun 23, 2020. It is now read-only.

Pipeline pods cannot be scheduled in Drone for Kubernetes #72

Closed
totogo opened this issue May 30, 2019 · 7 comments · May be fixed by #77

Comments

@totogo

totogo commented May 30, 2019

This part of the code generates a node selector that is not correct in some environments, so the pod cannot be scheduled and the pipeline always gets stuck on the Clone stage.

Values: []string{e.node},

The issue is also described here:
https://discourse.drone.io/t/drone-on-k8s-failedscheduling/3854/10

According to the Kubernetes documentation:

Note: The value of these labels is cloud provider specific and is not guaranteed to be reliable. For example, the value of kubernetes.io/hostname may be the same as the Node name in some environments and a different value in other environments.
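To make the failure mode concrete, here is a minimal, self-contained Go sketch (the `node` type and `matches` helper are hypothetical stand-ins, not Drone's or client-go's actual types) of why a selector built from the API node name can fail to match the kubernetes.io/hostname label on EKS:

```go
package main

import "fmt"

// Hypothetical sketch, not Drone's actual types: shows why an
// affinity selector built from the API node name can fail to
// match the kubernetes.io/hostname label on EKS.
type node struct {
	name   string
	labels map[string]string
}

// matches mimics an "In" node-affinity expression on a single key.
func matches(n node, key string, values []string) bool {
	for _, v := range values {
		if n.labels[key] == v {
			return true
		}
	}
	return false
}

func main() {
	n := node{
		name:   "ip-10-100-45-161.eu-west-1.compute.internal",
		labels: map[string]string{"kubernetes.io/hostname": "ip-10-100-45-161"},
	}
	// Drone builds the selector from the node name (Values: []string{e.node}),
	// which does not equal the label value on EKS.
	fmt.Println(matches(n, "kubernetes.io/hostname", []string{n.name}))
	// Matching on the label's actual value would succeed.
	fmt.Println(matches(n, "kubernetes.io/hostname", []string{"ip-10-100-45-161"}))
}
```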

@iron-sam

I opened an issue on github.com/drone/drone and just realized that this one already exists. We are facing the same problem. Here is the text of my issue, since it may add some more info:

Problem

When Drone configures the affinity of pipeline pods, it generates this spec:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - ip-10-100-45-161.eu-west-1.compute.internal

Correct me if I'm wrong, but this affinity seems to be set based on the pipeline pod's spec.nodeName. On AWS EKS, worker node names follow the pattern ${HOST_DNS}.${AWS_REGION}.compute.internal, but the Kubernetes label kubernetes.io/hostname is set to the actual hostname of the EC2 instance, ${HOST_DNS}, e.g.:

kubectl get nodes -L kubernetes.io/hostname
NAME                                          STATUS   ROLES    AGE    VERSION   HOSTNAME
ip-10-100-15-22.eu-west-1.compute.internal    Ready    <none>   12d    v1.12.7   ip-10-100-15-22
ip-10-100-45-161.eu-west-1.compute.internal   Ready    <none>   121m   v1.12.7   ip-10-100-45-161
ip-10-100-63-36.eu-west-1.compute.internal    Ready    <none>   11d    v1.12.7   ip-10-100-63-36

So, when the pipeline creates a new namespace to run all of its steps, the child jobs carry the affinity shown above and never find a node on which the pod can be scheduled.

Possible solution

You can manually set each node's kubernetes.io/hostname label to the node name, and Drone will launch pods without errors.

Alternatively, Drone could use another type of affinity, or no affinity at all, though I assume the affinity setting is there for a reason.
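One such alternative (a sketch only, not something Drone generates today as far as I know) would be to bypass the scheduler's label matching entirely by pinning the pod with spec.nodeName, which is compared against the API node name rather than the kubernetes.io/hostname label:

```yaml
# Illustrative pod fragment, not produced by Drone: spec.nodeName
# bypasses node affinity and matches the API node name directly.
apiVersion: v1
kind: Pod
metadata:
  name: pipeline-step   # hypothetical name
spec:
  nodeName: ip-10-100-45-161.eu-west-1.compute.internal
  containers:
  - name: step
    image: alpine:3.9
```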

Let me know if you need more information.

Thank you for everything!

@iron-sam

I've been trying some workarounds and was able to fix this issue without touching the Drone configuration.

I'm using the EKS Terraform Module to create our cluster and auto scaling groups. Because nodeName != hostname when worker nodes are created, I added one line to user_data to set the hostname equal to the nodeName (e.g. ip-XXX-XXX-XXX-XXX.${REGION}.compute.internal), like this:

# Code from Terraform EKS Module
  worker_groups = [
    {
      key_name             = "${aws_key_pair.key.key_name}"
      name                 = "workers-m5"
      # pre_userdata sets hostname equal to dns
      pre_userdata         = "hostnamectl set-hostname $$( cat /etc/hostname ).${var.aws_region}.compute.internal" 
      kubelet_extra_args   = "--node-labels=stage=${var.stage}"
      asg_desired_capacity = 2
      asg_max_size         = 2
      asg_min_size         = 1
      instance_type        = "m5.large"
      enable_monitoring    = false
      public_ip            = false
      autoscaling_enabled  = false
    }
]

New nodes associated with this auto scaling group will be created with a matching node name and `kubernetes.io/hostname` label.

Hope this will be helpful to someone. :)

@totogo
Author

totogo commented Jun 18, 2019

@iron-sam I'm using the workaround you mentioned and it works:

You can set your nodes label kubernetes.io/hostname to the node name manually, and Drone would launch pods without errors.

But since we are using Rancher to manage the Kubernetes cluster, we might have issues when scaling the cluster nodes up and down.

@iron-sam

@totogo I don't use Rancher, but the idea is to set the server's hostname to the DNS name assigned by AWS, plus the region and the compute.internal suffix. Check whether Rancher allows adding a user data script on instance creation.

@HighwayofLife

Rancher does have a hostname override option.

@bradrydzewski
Member

We are overhauling the Kubernetes runtime and should have an alpha available by the end of next week. The runtime is standalone and is being developed at https://github.com/drone-runners/drone-runner-kube

This new implementation is better aligned with kubernetes and executes all pipeline steps in a single pod, similar to Tekton, as opposed to executing each step in its own pod. Since all steps are executed in the same pod we no longer need to rely on affinity, which means this issue goes away :)
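For illustration only (the actual manifests produced by the new runner may well differ), the single-pod model described above amounts to one pod carrying a container per step, so every step trivially lands on the same node and no node affinity is needed:

```yaml
# Hypothetical sketch of the single-pod pipeline model;
# container names and images are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: drone-pipeline
spec:
  containers:
  - name: clone
    image: drone/git
  - name: build
    image: golang:1.12
```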

@bradrydzewski
Member

Closing this, since docs for the new (still experimental) Kubernetes runner will be posted today for early adopters. This iteration of the runner does not use node affinity and instead runs all pipeline steps in the same pod.
https://github.com/drone-runners/drone-runner-kube
