Commit cadc24e

Scheduler extension

Author: Ravi Gadde
1 parent ef84c57

20 files changed: +1278 −138 lines changed

docs/design/scheduler_extender.md

+117
@@ -0,0 +1,117 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<img src="http://kubernetes.io/img/warning.png" alt="WARNING" width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" width="25" height="25">

<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>

If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.

<strong>
The latest release of this document can be found
[here](http://releases.k8s.io/release-1.1/docs/design/scheduler_extender.md).

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Scheduler extender

There are three ways to add new scheduling rules (predicates and priority functions) to Kubernetes: (1) adding these rules to the scheduler and recompiling (described here: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/scheduler.md), (2) implementing your own scheduler process that runs instead of, or alongside, the standard Kubernetes scheduler, or (3) implementing a "scheduler extender" process that the standard Kubernetes scheduler calls out to as a final pass when making scheduling decisions.

This document describes the third approach, which is needed when scheduling decisions must take into account resources that the standard Kubernetes scheduler does not manage directly; the extender helps make scheduling decisions based on such resources. (Note that the three approaches are not mutually exclusive.)

When scheduling a pod, the extender allows an external process to filter and prioritize nodes. Two separate http/https calls are issued to the extender, one for the "filter" action and one for the "prioritize" action. To use the extender, you must create a scheduler policy configuration file. The configuration specifies how to reach the extender, whether to use http or https, and the timeout.

```go
// Holds the parameters used to communicate with the extender. If a verb is unspecified/empty,
// it is assumed that the extender chose not to provide that extension.
type ExtenderConfig struct {
	// URLPrefix at which the extender is available
	URLPrefix string `json:"urlPrefix"`
	// Verb for the filter call, empty if not supported. This verb is appended to the URLPrefix when issuing the filter call to the extender.
	FilterVerb string `json:"filterVerb,omitempty"`
	// Verb for the prioritize call, empty if not supported. This verb is appended to the URLPrefix when issuing the prioritize call to the extender.
	PrioritizeVerb string `json:"prioritizeVerb,omitempty"`
	// The numeric multiplier for the node scores that the prioritize call generates.
	// The weight should be a positive integer.
	Weight int `json:"weight,omitempty"`
	// EnableHttps specifies whether https should be used to communicate with the extender
	EnableHttps bool `json:"enableHttps,omitempty"`
	// TLSConfig specifies the transport layer security config
	TLSConfig *client.TLSClientConfig `json:"tlsConfig,omitempty"`
	// HTTPTimeout specifies the timeout duration for a call to the extender. A filter timeout fails the scheduling of the pod.
	// A prioritize timeout is ignored; the k8s/other extenders' priorities are used to select the node.
	HTTPTimeout time.Duration `json:"httpTimeout,omitempty"`
}
```

A sample scheduler policy file with extender configuration:

```json
{
  "predicates": [
    {
      "name": "HostName"
    },
    {
      "name": "MatchNodeSelector"
    },
    {
      "name": "PodFitsResources"
    }
  ],
  "priorities": [
    {
      "name": "LeastRequestedPriority",
      "weight": 1
    }
  ],
  "extenders": [
    {
      "urlPrefix": "http://127.0.0.1:12345/api/scheduler",
      "filterVerb": "filter",
      "enableHttps": false
    }
  ]
}
```

The FilterVerb endpoint receives the pod being scheduled and the set of candidate nodes that passed the k8s predicates. The PrioritizeVerb endpoint receives the pod and the set of nodes that passed both the k8s predicates and the extenders' filters.

```go
// ExtenderArgs represents the arguments needed by the extender to filter/prioritize
// nodes for a pod.
type ExtenderArgs struct {
	// Pod being scheduled
	Pod api.Pod `json:"pod"`
	// List of candidate nodes where the pod can be scheduled
	Nodes api.NodeList `json:"nodes"`
}
```

The "filter" call returns a list of nodes (api.NodeList). The "prioritize" call returns priorities for each node (schedulerapi.HostPriorityList).
110+
111+
The "filter" call may prune the set of nodes based on its predicates. Scores returned by the "prioritize" call are added to the k8s scores (computed through its priority functions) and used for final host selection.
112+
113+
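
To make the contract concrete, here is a minimal sketch of what an extender process could look like. It is only a sketch, not part of this commit: the trimmed-down Metadata, Node, NodeList, ExtenderArgs and HostPriority types (and their JSON field names) are simplified stand-ins assumed for illustration rather than the real Kubernetes API types, and the handler logic (accept every node, score everything 5) is a placeholder for whatever external resource the extender actually tracks.

```go
// A toy scheduler extender. The types below are simplified stand-ins that
// mirror the shapes described above; they are NOT the real k8s API types.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Metadata carries just the object name; the real ObjectMeta has many more fields.
type Metadata struct {
	Name string `json:"name"`
}

// Node and NodeList are trimmed stand-ins for api.Node and api.NodeList.
type Node struct {
	Metadata Metadata `json:"metadata"`
}

type NodeList struct {
	Items []Node `json:"items"`
}

// ExtenderArgs mirrors the {pod, nodes} payload described above.
type ExtenderArgs struct {
	Pod   json.RawMessage `json:"pod"` // the pod is passed through untouched in this sketch
	Nodes NodeList        `json:"nodes"`
}

// HostPriority mirrors schedulerapi.HostPriority: a node name and its score.
type HostPriority struct {
	Host  string `json:"host"`
	Score int    `json:"score"`
}

func filter(w http.ResponseWriter, r *http.Request) {
	var args ExtenderArgs
	if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// Accept every candidate node; a real extender would drop the nodes
	// that cannot satisfy the externally managed resource.
	json.NewEncoder(w).Encode(args.Nodes)
}

func prioritize(w http.ResponseWriter, r *http.Request) {
	var args ExtenderArgs
	if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// Give every node the same mid-range score; the scheduler multiplies these
	// scores by the configured weight before adding them to its own priorities.
	priorities := make([]HostPriority, 0, len(args.Nodes.Items))
	for _, node := range args.Nodes.Items {
		priorities = append(priorities, HostPriority{Host: node.Metadata.Name, Score: 5})
	}
	json.NewEncoder(w).Encode(priorities)
}

func main() {
	// Paths chosen to match a policy whose urlPrefix is
	// "http://127.0.0.1:12345/api/scheduler" with filterVerb "filter"
	// and prioritizeVerb "prioritize".
	http.HandleFunc("/api/scheduler/filter", filter)
	http.HandleFunc("/api/scheduler/prioritize", prioritize)
	log.Fatal(http.ListenAndServe("127.0.0.1:12345", nil))
}
```

With the sample policy shown earlier, the scheduler would then send ExtenderArgs to /api/scheduler/filter (and, if prioritizeVerb is configured, /api/scheduler/prioritize) for each pod it schedules.
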
Multiple extenders can be configured in the scheduler policy.
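
For instance, a policy could hypothetically list two extenders side by side; each extender is called independently and contributes its own weighted scores. The endpoints and weights below are made up for illustration only:

```json
{
  "predicates": [
    {"name": "PodFitsResources"}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1}
  ],
  "extenders": [
    {
      "urlPrefix": "http://127.0.0.1:12345/api/scheduler",
      "filterVerb": "filter",
      "prioritizeVerb": "prioritize",
      "weight": 1,
      "enableHttps": false
    },
    {
      "urlPrefix": "http://127.0.0.1:12346/api/scheduler",
      "filterVerb": "filter",
      "prioritizeVerb": "prioritize",
      "weight": 5,
      "enableHttps": false
    }
  ]
}
```
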
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/scheduler_extender.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

examples/examples_test.go

+3-2
@@ -235,7 +235,8 @@ func TestExampleObjectSchemas(t *testing.T) {
 			"daemon": &extensions.DaemonSet{},
 		},
 		"../examples": {
-			"scheduler-policy-config": &schedulerapi.Policy{},
+			"scheduler-policy-config":               &schedulerapi.Policy{},
+			"scheduler-policy-config-with-extender": &schedulerapi.Policy{},
 		},
 		"../examples/rbd/secret": {
 			"ceph-secret": &api.Secret{},
@@ -409,7 +410,7 @@ func TestExampleObjectSchemas(t *testing.T) {
 			t.Logf("skipping : %s/%s\n", path, name)
 			return
 		}
-		if name == "scheduler-policy-config" {
+		if strings.Contains(name, "scheduler-policy-config") {
 			if err := schedulerapilatest.Codec.DecodeInto(data, expectedType); err != nil {
 				t.Errorf("%s did not decode correctly: %v\n%s", path, err, string(data))
 				return
examples/scheduler-policy-config-with-extender.json
@@ -0,0 +1,25 @@
{
  "kind" : "Policy",
  "apiVersion" : "v1",
  "predicates" : [
    {"name" : "PodFitsPorts"},
    {"name" : "PodFitsResources"},
    {"name" : "NoDiskConflict"},
    {"name" : "MatchNodeSelector"},
    {"name" : "HostName"}
  ],
  "priorities" : [
    {"name" : "LeastRequestedPriority", "weight" : 1},
    {"name" : "BalancedResourceAllocation", "weight" : 1},
    {"name" : "ServiceSpreadingPriority", "weight" : 1},
    {"name" : "EqualPriority", "weight" : 1}
  ],
  "extenders" : [{
    "urlPrefix": "http://127.0.0.1:12346/scheduler",
    "apiVersion": "v1beta1",
    "filterVerb": "filter",
    "prioritizeVerb": "prioritize",
    "weight": 5,
    "enableHttps": false
  }]
}

plugin/pkg/scheduler/algorithm/priorities/priorities.go

+14-13
@@ -25,6 +25,7 @@ import (
 	"k8s.io/kubernetes/pkg/labels"
 	"k8s.io/kubernetes/plugin/pkg/scheduler/algorithm"
 	"k8s.io/kubernetes/plugin/pkg/scheduler/algorithm/predicates"
+	schedulerapi "k8s.io/kubernetes/plugin/pkg/scheduler/api"
 )
 
 // the unused capacity is calculated on a scale of 0-10
@@ -73,7 +74,7 @@ func getNonzeroRequests(requests *api.ResourceList) (int64, int64) {
 
 // Calculate the resource occupancy on a node. 'node' has information about the resources on the node.
 // 'pods' is a list of pods currently scheduled on the node.
-func calculateResourceOccupancy(pod *api.Pod, node api.Node, pods []*api.Pod) algorithm.HostPriority {
+func calculateResourceOccupancy(pod *api.Pod, node api.Node, pods []*api.Pod) schedulerapi.HostPriority {
 	totalMilliCPU := int64(0)
 	totalMemory := int64(0)
 	capacityMilliCPU := node.Status.Capacity.Cpu().MilliValue()
@@ -104,7 +105,7 @@ func calculateResourceOccupancy(pod *api.Pod, node api.Node, pods []*api.Pod) al
 		cpuScore, memoryScore,
 	)
 
-	return algorithm.HostPriority{
+	return schedulerapi.HostPriority{
 		Host:  node.Name,
 		Score: int((cpuScore + memoryScore) / 2),
 	}
@@ -114,14 +115,14 @@ func calculateResourceOccupancy(pod *api.Pod, node api.Node, pods []*api.Pod) al
 // It calculates the percentage of memory and CPU requested by pods scheduled on the node, and prioritizes
 // based on the minimum of the average of the fraction of requested to capacity.
 // Details: cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity) / 2
-func LeastRequestedPriority(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (algorithm.HostPriorityList, error) {
+func LeastRequestedPriority(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (schedulerapi.HostPriorityList, error) {
 	nodes, err := nodeLister.List()
 	if err != nil {
-		return algorithm.HostPriorityList{}, err
+		return schedulerapi.HostPriorityList{}, err
 	}
 	podsToMachines, err := predicates.MapPodsToMachines(podLister)
 
-	list := algorithm.HostPriorityList{}
+	list := schedulerapi.HostPriorityList{}
 	for _, node := range nodes.Items {
 		list = append(list, calculateResourceOccupancy(pod, node, podsToMachines[node.Name]))
 	}
@@ -144,7 +145,7 @@ func NewNodeLabelPriority(label string, presence bool) algorithm.PriorityFunctio
 // CalculateNodeLabelPriority checks whether a particular label exists on a node or not, regardless of its value.
 // If presence is true, prioritizes nodes that have the specified label, regardless of value.
 // If presence is false, prioritizes nodes that do not have the specified label.
-func (n *NodeLabelPrioritizer) CalculateNodeLabelPriority(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (algorithm.HostPriorityList, error) {
+func (n *NodeLabelPrioritizer) CalculateNodeLabelPriority(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (schedulerapi.HostPriorityList, error) {
 	var score int
 	nodes, err := nodeLister.List()
 	if err != nil {
@@ -157,7 +158,7 @@ func (n *NodeLabelPrioritizer) CalculateNodeLabelPriority(pod *api.Pod, podListe
 		labeledNodes[node.Name] = (exists && n.presence) || (!exists && !n.presence)
 	}
 
-	result := []algorithm.HostPriority{}
+	result := []schedulerapi.HostPriority{}
 	//score int - scale of 0-10
 	// 0 being the lowest priority and 10 being the highest
 	for nodeName, success := range labeledNodes {
@@ -166,7 +167,7 @@ func (n *NodeLabelPrioritizer) CalculateNodeLabelPriority(pod *api.Pod, podListe
 		} else {
 			score = 0
 		}
-		result = append(result, algorithm.HostPriority{Host: nodeName, Score: score})
+		result = append(result, schedulerapi.HostPriority{Host: nodeName, Score: score})
 	}
 	return result, nil
 }
@@ -177,21 +178,21 @@ func (n *NodeLabelPrioritizer) CalculateNodeLabelPriority(pod *api.Pod, podListe
 // close the two metrics are to each other.
 // Detail: score = 10 - abs(cpuFraction-memoryFraction)*10. The algorithm is partly inspired by:
 // "Wei Huang et al. An Energy Efficient Virtual Machine Placement Algorithm with Balanced Resource Utilization"
-func BalancedResourceAllocation(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (algorithm.HostPriorityList, error) {
+func BalancedResourceAllocation(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (schedulerapi.HostPriorityList, error) {
 	nodes, err := nodeLister.List()
 	if err != nil {
-		return algorithm.HostPriorityList{}, err
+		return schedulerapi.HostPriorityList{}, err
 	}
 	podsToMachines, err := predicates.MapPodsToMachines(podLister)
 
-	list := algorithm.HostPriorityList{}
+	list := schedulerapi.HostPriorityList{}
 	for _, node := range nodes.Items {
 		list = append(list, calculateBalancedResourceAllocation(pod, node, podsToMachines[node.Name]))
 	}
 	return list, nil
 }
 
-func calculateBalancedResourceAllocation(pod *api.Pod, node api.Node, pods []*api.Pod) algorithm.HostPriority {
+func calculateBalancedResourceAllocation(pod *api.Pod, node api.Node, pods []*api.Pod) schedulerapi.HostPriority {
 	totalMilliCPU := int64(0)
 	totalMemory := int64(0)
 	score := int(0)
@@ -234,7 +235,7 @@ func calculateBalancedResourceAllocation(pod *api.Pod, node api.Node, pods []*ap
 		score,
 	)
 
-	return algorithm.HostPriority{
+	return schedulerapi.HostPriority{
 		Host:  node.Name,
 		Score: score,
 	}
