This repository has been archived by the owner on Jun 23, 2020. It is now read-only.

Pipeline pods cannot be scheduled in Drone for Kubernetes #72

Closed
totogo opened this issue May 30, 2019 · 7 comments · May be fixed by #77

Comments

@totogo

totogo commented May 30, 2019

This part of the code generates a node selector that is not correct in some environments, so the pod cannot be scheduled and the pipeline always gets stuck on the Clone stage.

Values: []string{e.node},

The issue is also described here:
https://discourse.drone.io/t/drone-on-k8s-failedscheduling/3854/10

According to the Kubernetes documentation:

Note: The value of these labels is cloud provider specific and is not guaranteed to be reliable. For example, the value of kubernetes.io/hostname may be the same as the Node name in some environments and a different value in other environments.
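To make the failure mode concrete, here is a minimal, self-contained Go sketch (the `node` type and `matches` helper are hypothetical stand-ins, not Drone's or client-go's actual types) of why a selector built from the API node name can fail to match the kubernetes.io/hostname label on EKS:

```go
package main

import "fmt"

// Hypothetical sketch, not Drone's actual types: shows why an
// affinity selector built from the API node name can fail to
// match the kubernetes.io/hostname label on EKS.
type node struct {
	name   string
	labels map[string]string
}

// matches mimics an "In" node-affinity expression on a single key.
func matches(n node, key string, values []string) bool {
	for _, v := range values {
		if n.labels[key] == v {
			return true
		}
	}
	return false
}

func main() {
	n := node{
		name:   "ip-10-100-45-161.eu-west-1.compute.internal",
		labels: map[string]string{"kubernetes.io/hostname": "ip-10-100-45-161"},
	}
	// Drone builds the selector from the node name (Values: []string{e.node}),
	// which does not equal the label value on EKS.
	fmt.Println(matches(n, "kubernetes.io/hostname", []string{n.name}))
	// Matching on the label's actual value would succeed.
	fmt.Println(matches(n, "kubernetes.io/hostname", []string{"ip-10-100-45-161"}))
}
```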

@iron-sam

I opened an issue on github.com/drone/drone and just realized that this one already exists. We are facing the same problem. Here is the text of my issue, since it may add some more info:

Problem

When Drone configures the affinity of pipeline pods, it generates this spec:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - ip-10-100-45-161.eu-west-1.compute.internal

Correct me if I'm wrong, but this affinity seems to be set based on the pipeline pod's spec.nodeName. On AWS EKS, worker node names follow the pattern ${HOST_DNS}.${AWS_REGION}.compute.internal, but the Kubernetes label kubernetes.io/hostname is set to the actual hostname of the EC2 instance, ${HOST_DNS}, e.g.:

kubectl get nodes -L kubernetes.io/hostname
NAME                                          STATUS   ROLES    AGE    VERSION   HOSTNAME
ip-10-100-15-22.eu-west-1.compute.internal    Ready    <none>   12d    v1.12.7   ip-10-100-15-22
ip-10-100-45-161.eu-west-1.compute.internal   Ready    <none>   121m   v1.12.7   ip-10-100-45-161
ip-10-100-63-36.eu-west-1.compute.internal    Ready    <none>   11d    v1.12.7   ip-10-100-63-36

So, when the pipeline creates a new namespace to run all of its steps, the child jobs carry the affinity shown above and never find a node on which the pod can be scheduled.

Possible solution

You can manually set each node's kubernetes.io/hostname label to the node name, and Drone will launch pods without errors.

Alternatively, Drone could use another type of affinity, or no affinity at all, though I assume the affinity setting is there for a reason.
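One such alternative (a sketch only, not something Drone generates today as far as I know) would be to bypass the scheduler's label matching entirely by pinning the pod with spec.nodeName, which is compared against the API node name rather than the kubernetes.io/hostname label:

```yaml
# Illustrative pod fragment, not produced by Drone: spec.nodeName
# bypasses node affinity and matches the API node name directly.
apiVersion: v1
kind: Pod
metadata:
  name: pipeline-step   # hypothetical name
spec:
  nodeName: ip-10-100-45-161.eu-west-1.compute.internal
  containers:
  - name: step
    image: alpine:3.9
```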

Let me know if you need more information.

Thank you for everything!

@iron-sam

I've been trying some workarounds and was able to fix this issue without touching the Drone configuration.

I'm using the EKS Terraform Module to create our cluster and auto scaling groups. Because nodeName != hostname when worker nodes are created, I added one line to user_data to set the hostname equal to the nodeName (e.g. ip-XXX-XXX-XXX-XXX.${REGION}.compute.internal), like this:

# Code from Terraform EKS Module
  worker_groups = [
    {
      key_name             = "${aws_key_pair.key.key_name}"
      name                 = "workers-m5"
      # pre_userdata sets hostname equal to dns
      pre_userdata         = "hostnamectl set-hostname $$( cat /etc/hostname ).${var.aws_region}.compute.internal" 
      kubelet_extra_args   = "--node-labels=stage=${var.stage}"
      asg_desired_capacity = 2
      asg_max_size         = 2
      asg_min_size         = 1
      instance_type        = "m5.large"
      enable_monitoring    = false
      public_ip            = false
      autoscaling_enabled  = false
    }
]

New nodes associated with this auto scaling group will be created with a matching node name and `kubernetes.io/hostname` label.

Hope this will be helpful to someone. :)

@totogo
Author

totogo commented Jun 18, 2019

@iron-sam I'm using the workaround you mentioned and it works:

You can set your nodes label kubernetes.io/hostname to the node name manually, and Drone would launch pods without errors.

But since we are using Rancher to manage the Kubernetes cluster, we might have issues when scaling the cluster nodes up and down.

@iron-sam

@totogo I don't use Rancher, but the idea is to set the server's hostname to the DNS name assigned by AWS, plus the region and the compute.internal suffix. Check whether Rancher allows adding a user data script on instance creation.

@HighwayofLife

Rancher does have a hostname override option.

@bradrydzewski
Member

We are overhauling the Kubernetes runtime and should have an alpha available by the end of next week. The runtime is standalone and is being developed at https://github.com/drone-runners/drone-runner-kube

This new implementation is better aligned with kubernetes and executes all pipeline steps in a single pod, similar to Tekton, as opposed to executing each step in its own pod. Since all steps are executed in the same pod we no longer need to rely on affinity, which means this issue goes away :)
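For illustration only (the actual manifests produced by the new runner may well differ), the single-pod model described above amounts to one pod carrying a container per step, so every step trivially lands on the same node and no node affinity is needed:

```yaml
# Hypothetical sketch of the single-pod pipeline model;
# container names and images are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: drone-pipeline
spec:
  containers:
  - name: clone
    image: drone/git
  - name: build
    image: golang:1.12
```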

@bradrydzewski
Member

Closing this, since docs for the new (still experimental) Kubernetes runner will be posted today for early adopters. This iteration of the runner does not use node affinity and instead runs all pipeline steps in the same pod.
https://github.com/drone-runners/drone-runner-kube
