Merge branch 'master' into 4105-spark-driver-executor-podtemplate
Signed-off-by: machichima <[email protected]>
machichima committed Feb 5, 2025
2 parents a1edbdd + ea00864 commit 167c1e6
Showing 181 changed files with 9,651 additions and 2,099 deletions.
4 changes: 2 additions & 2 deletions README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion boilerplate/flyte/golang_support_tools/go.mod
@@ -246,7 +246,7 @@ require (
golang.org/x/exp v0.0.0-20240904232852-e7e105dedf7e // indirect
golang.org/x/exp/typeparams v0.0.0-20240314144324-c7f7c6466f7f // indirect
golang.org/x/mod v0.21.0 // indirect
-golang.org/x/net v0.28.0 // indirect
+golang.org/x/net v0.33.0 // indirect
golang.org/x/oauth2 v0.22.0 // indirect
golang.org/x/sync v0.10.0 // indirect
golang.org/x/sys v0.28.0 // indirect
4 changes: 2 additions & 2 deletions boilerplate/flyte/golang_support_tools/go.sum
@@ -675,8 +675,8 @@ golang.org/x/net v0.1.0/go.mod h1:Cx3nUiGt4eDBEyega/BKRp+/AlGL8hYe7U9odMt2Cco=
golang.org/x/net v0.2.0/go.mod h1:KqCZLdyyvdV855qA2rE3GC2aiw5xGR5TEjj8smXukLY=
golang.org/x/net v0.5.0/go.mod h1:DivGGAXEgPSlEBzxGzZI+ZLohi+xUj054jfeKui00ws=
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
-golang.org/x/net v0.28.0 h1:a9JDOJc5GMUJ0+UDqmLT86WiEy7iWyIhz8gz8E4e5hE=
-golang.org/x/net v0.28.0/go.mod h1:yqtgsTWOOnlGLG9GFRrK3++bGOUEkNBoHZc8MEDWPNg=
+golang.org/x/net v0.33.0 h1:74SYHlV8BIgHIFC/LrYkOGIwL19eTYXQ5wc6TBuO36I=
+golang.org/x/net v0.33.0/go.mod h1:HXLR5J+9DxmrqMwG9qjGCxZ+zKXxBru04zlTvWlWuN4=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/oauth2 v0.22.0 h1:BzDx2FehcG7jJwgWLELCdmLuxk2i+x9UDpSiss2u0ZA=
golang.org/x/oauth2 v0.22.0/go.mod h1:XYTD2NtWslqkgxebSiOHnXEap4TF09sJSc7H1sXbhtI=
2 changes: 1 addition & 1 deletion charts/flyte-binary/README.md
@@ -22,7 +22,7 @@ Chart for basic single Flyte executable deployment
| commonAnnotations | object | `{}` | |
| commonLabels | object | `{}` | |
| configuration.agentService.defaultAgent.defaultTimeout | string | `"10s"` | |
-| configuration.agentService.defaultAgent.endpoint | string | `"dns:///flyteagent.flyte.svc.cluster.local:8000"` | |
+| configuration.agentService.defaultAgent.endpoint | string | `"k8s://flyteagent.flyte:8000"` | |
| configuration.agentService.defaultAgent.insecure | bool | `true` | |
| configuration.agentService.defaultAgent.timeouts.GetTask | string | `"10s"` | |
| configuration.annotations | object | `{}` | |
2 changes: 1 addition & 1 deletion charts/flyte-binary/values.yaml
@@ -171,7 +171,7 @@ configuration:
# agentService Flyte Agent configuration
agentService:
defaultAgent:
-endpoint: "dns:///flyteagent.flyte.svc.cluster.local:8000"
+endpoint: "k8s://flyteagent.flyte:8000"
insecure: true
timeouts:
GetTask: 10s
7 changes: 4 additions & 3 deletions charts/flyte-core/README.md
@@ -211,9 +211,9 @@ helm install gateway bitnami/contour -n flyte
| flyteadmin.serviceMonitor.scrapeTimeout | string | `"30s"` | Sets the timeout after which request to scrape metrics will time out |
| flyteadmin.tolerations | list | `[]` | tolerations for Flyteadmin deployment |
| flyteagent.enabled | bool | `false` | |
-| flyteagent.plugin_config.plugins.agent-service | object | `{"defaultAgent":{"endpoint":"dns:///flyteagent.flyte.svc.cluster.local:8000","insecure":true},"supportedTaskTypes":[]}` | Agent service configuration for propeller. |
-| flyteagent.plugin_config.plugins.agent-service.defaultAgent | object | `{"endpoint":"dns:///flyteagent.flyte.svc.cluster.local:8000","insecure":true}` | The default agent service to use for plugin tasks. |
-| flyteagent.plugin_config.plugins.agent-service.defaultAgent.endpoint | string | `"dns:///flyteagent.flyte.svc.cluster.local:8000"` | The agent service endpoint propeller should connect to. |
+| flyteagent.plugin_config.plugins.agent-service | object | `{"defaultAgent":{"endpoint":"k8s://flyteagent.flyte:8000","insecure":true},"supportedTaskTypes":[]}` | Agent service configuration for propeller. |
+| flyteagent.plugin_config.plugins.agent-service.defaultAgent | object | `{"endpoint":"k8s://flyteagent.flyte:8000","insecure":true}` | The default agent service to use for plugin tasks. |
+| flyteagent.plugin_config.plugins.agent-service.defaultAgent.endpoint | string | `"k8s://flyteagent.flyte:8000"` | The agent service endpoint propeller should connect to. |
| flyteagent.plugin_config.plugins.agent-service.defaultAgent.insecure | bool | `true` | Whether the connection from propeller to the agent service should use TLS. |
| flyteagent.plugin_config.plugins.agent-service.supportedTaskTypes | list | `[]` | The task types supported by the default agent. As of #5460 these are discovered automatically and don't need to be configured. |
| flyteagent.podLabels | object | `{}` | Labels for flyteagent pods |
@@ -304,6 +304,7 @@ helm install gateway bitnami/contour -n flyte
| secrets.adminOauthClientCredentials.clientId | string | `"flytepropeller"` | |
| secrets.adminOauthClientCredentials.clientSecret | string | `"foobar"` | |
| secrets.adminOauthClientCredentials.enabled | bool | `true` | |
+| secrets.adminOauthClientCredentials.secretName | string | `"flyte-secret-auth"` | |
| sparkoperator | object | `{"enabled":false,"plugin_config":{"plugins":{"spark":{"spark-config-default":[{"spark.hadoop.fs.s3a.aws.credentials.provider":"com.amazonaws.auth.DefaultAWSCredentialsProviderChain"},{"spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version":"2"},{"spark.kubernetes.allocation.batch.size":"50"},{"spark.hadoop.fs.s3a.acl.default":"BucketOwnerFullControl"},{"spark.hadoop.fs.s3n.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3n.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3a.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3a.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3a.multipart.threshold":"536870912"},{"spark.blacklist.enabled":"true"},{"spark.blacklist.timeout":"5m"},{"spark.task.maxfailures":"8"}]}}}}` | Optional: Spark Plugin using the Spark Operator |
| sparkoperator.enabled | bool | `false` | - enable or disable Sparkoperator deployment installation |
| sparkoperator.plugin_config | object | `{"plugins":{"spark":{"spark-config-default":[{"spark.hadoop.fs.s3a.aws.credentials.provider":"com.amazonaws.auth.DefaultAWSCredentialsProviderChain"},{"spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version":"2"},{"spark.kubernetes.allocation.batch.size":"50"},{"spark.hadoop.fs.s3a.acl.default":"BucketOwnerFullControl"},{"spark.hadoop.fs.s3n.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3n.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3a.impl":"org.apache.hadoop.fs.s3a.S3AFileSystem"},{"spark.hadoop.fs.AbstractFileSystem.s3a.impl":"org.apache.hadoop.fs.s3a.S3A"},{"spark.hadoop.fs.s3a.multipart.threshold":"536870912"},{"spark.blacklist.enabled":"true"},{"spark.blacklist.timeout":"5m"},{"spark.task.maxfailures":"8"}]}}}` | Spark plugin configuration |
@@ -83,7 +83,7 @@ spec:
{{- if .Values.secrets.adminOauthClientCredentials.enabled }}
- name: auth
secret:
-secretName: flyte-secret-auth
+secretName: {{ .Values.secrets.adminOauthClientCredentials.secretName }}
{{- end }}
{{- end }}
{{- with .Values.cluster_resource_manager.nodeSelector }}
2 changes: 1 addition & 1 deletion charts/flyte-core/templates/common/secret-auth.yaml
@@ -2,7 +2,7 @@
apiVersion: v1
kind: Secret
metadata:
-name: flyte-secret-auth
+name: {{ .Values.secrets.adminOauthClientCredentials.secretName }}
namespace: {{ template "flyte.namespace" . }}
type: Opaque
stringData:
2 changes: 1 addition & 1 deletion charts/flyte-core/templates/propeller/deployment.yaml
@@ -112,7 +112,7 @@ spec:
{{- if .Values.secrets.adminOauthClientCredentials.enabled }}
- name: auth
secret:
-secretName: flyte-secret-auth
+secretName: {{ .Values.secrets.adminOauthClientCredentials.secretName }}
{{- end }}
{{- with .Values.flytepropeller.additionalVolumes -}}
{{ tpl (toYaml .) $ | nindent 6 }}
3 changes: 2 additions & 1 deletion charts/flyte-core/values.yaml
@@ -288,7 +288,7 @@ flyteagent:
# -- The default agent service to use for plugin tasks.
defaultAgent:
# -- The agent service endpoint propeller should connect to.
-endpoint: "dns:///flyteagent.flyte.svc.cluster.local:8000"
+endpoint: "k8s://flyteagent.flyte:8000"
# -- Whether the connection from propeller to the agent service should use TLS.
insecure: true
# -- The task types supported by the default agent. As of #5460 these are discovered automatically and don't
@@ -487,6 +487,7 @@ secrets:
enabled: true
clientSecret: foobar
clientId: flytepropeller
+secretName: flyte-secret-auth

#
# WEBHOOK SETTINGS
2 changes: 1 addition & 1 deletion datacatalog/go.mod
@@ -119,7 +119,7 @@ require (
go.opentelemetry.io/proto/otlp v1.1.0 // indirect
golang.org/x/crypto v0.31.0 // indirect
golang.org/x/exp v0.0.0-20240325151524-a685a6edb6d8 // indirect
-golang.org/x/net v0.27.0 // indirect
+golang.org/x/net v0.33.0 // indirect
golang.org/x/oauth2 v0.18.0 // indirect
golang.org/x/sync v0.10.0 // indirect
golang.org/x/sys v0.28.0 // indirect
4 changes: 2 additions & 2 deletions datacatalog/go.sum
@@ -496,8 +496,8 @@ golang.org/x/net v0.0.0-20201209123823-ac852fbbde11/go.mod h1:m0MpNAwzfU5UDzcl9v
golang.org/x/net v0.0.0-20201224014010-6772e930b67b/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
-golang.org/x/net v0.27.0 h1:5K3Njcw06/l2y9vpGCSdcxWOYHOUk3dVNGDXN+FvAys=
-golang.org/x/net v0.27.0/go.mod h1:dDi0PyhWNoiUOrAS8uXv/vnScO4wnHQO4mj9fn/RytE=
+golang.org/x/net v0.33.0 h1:74SYHlV8BIgHIFC/LrYkOGIwL19eTYXQ5wc6TBuO36I=
+golang.org/x/net v0.33.0/go.mod h1:HXLR5J+9DxmrqMwG9qjGCxZ+zKXxBru04zlTvWlWuN4=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/oauth2 v0.0.0-20190226205417-e64efc72b421/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
8 changes: 4 additions & 4 deletions docker/sandbox-bundled/manifests/complete-agent.yaml
@@ -486,7 +486,7 @@ data:
agent-service:
defaultAgent:
defaultTimeout: 10s
-endpoint: dns:///flyteagent.flyte.svc.cluster.local:8000
+endpoint: k8s://flyteagent.flyte:8000
insecure: true
timeouts:
GetTask: 10s
@@ -823,7 +823,7 @@ type: Opaque
---
apiVersion: v1
data:
-haSharedSecret: UnZJZHEzUExzbkJsOW1wYw==
+haSharedSecret: QW5aNWlUNmxVWEpxUUV4ZQ==
proxyPassword: ""
proxyUsername: ""
kind: Secret
@@ -1254,7 +1254,7 @@ spec:
metadata:
annotations:
checksum/cluster-resource-templates: 6fd9b172465e3089fcc59f738b92b8dc4d8939360c19de8ee65f68b0e7422035
-checksum/configuration: 5a537c05dbd27a7f2884eb78f4e762205c3bcc3248ab9e509ab7074c7e5f953d
+checksum/configuration: 7841a55b7d0bd6a6d44f37ccf05297fb7c3338c1ebd9c2608d499e4f8c817383
checksum/configuration-secret: 09216ffaa3d29e14f88b1f30af580d02a2a5e014de4d750b7f275cc07ed4e914
labels:
app.kubernetes.io/component: flyte-binary
@@ -1420,7 +1420,7 @@ spec:
metadata:
annotations:
checksum/config: 8f50e768255a87f078ba8b9879a0c174c3e045ffb46ac8723d2eedbe293c8d81
-checksum/secret: ce172103045f4215e361b4c109776a78fe06660a4ade01c7351ea07212e7cfb9
+checksum/secret: bbd233a0f62bc60cc8937296e0ba5a8fd30953cb6a67883ee6f687add14e3de7
labels:
app: docker-registry
release: flyte-sandbox
4 changes: 2 additions & 2 deletions docker/sandbox-bundled/manifests/complete.yaml
@@ -805,7 +805,7 @@ type: Opaque
---
apiVersion: v1
data:
-haSharedSecret: dDFiem04NjFzb29ZWHFtNA==
+haSharedSecret: WXBrdzVRdDlYN3hoQzlrVw==
proxyPassword: ""
proxyUsername: ""
kind: Secret
@@ -1369,7 +1369,7 @@ spec:
metadata:
annotations:
checksum/config: 8f50e768255a87f078ba8b9879a0c174c3e045ffb46ac8723d2eedbe293c8d81
-checksum/secret: 529d34a9c4d3c82b9eec5028fcc30f26e923fa77a57eb29c4705d28c85355963
+checksum/secret: a143b07663973d76476087be99820760b0767450a57fd4a9153fad3df0a49b3a
labels:
app: docker-registry
release: flyte-sandbox
4 changes: 2 additions & 2 deletions docker/sandbox-bundled/manifests/dev.yaml
@@ -499,7 +499,7 @@ metadata:
---
apiVersion: v1
data:
-haSharedSecret: Y1V1RU03eGVhUDFFc1pSdQ==
+haSharedSecret: QkJ4V2NCcnJ0M0tpZEdpRQ==
proxyPassword: ""
proxyUsername: ""
kind: Secret
@@ -934,7 +934,7 @@ spec:
metadata:
annotations:
checksum/config: 8f50e768255a87f078ba8b9879a0c174c3e045ffb46ac8723d2eedbe293c8d81
-checksum/secret: 66507f448be8010226a1ad2c741fb2866ef4372b68e61287c7500b47fae05572
+checksum/secret: 2b3345c5f413de6f29c8f7e884f6d8471c5de1245e6a4f16d7a7f4c536cca2e0
labels:
app: docker-registry
release: flyte-sandbox
3 changes: 3 additions & 0 deletions docs/deployment/agents/index.md
@@ -33,6 +33,8 @@ If you are using a managed deployment of Flyte, you will need to contact your de
- Configuring your Flyte deployment for the SnowFlake agent.
* - {ref}`OpenAI Batch <deployment-agent-setup-openai-batch>`
- Submit requests to OpenAI GPT models for asynchronous batch processing.
+* - {ref}`LinkedIn K8s Service Batch <deployment-agent-setup-k8sservice>`
+- Configuring your Flyte deployment for the K8s service agent.
```

```{toctree}
@@ -49,4 +51,5 @@ sagemaker_inference
sensor
snowflake
openai_batch
+k8sservice
```
179 changes: 179 additions & 0 deletions docs/deployment/agents/k8sservice.rst
@@ -0,0 +1,179 @@
.. _deployment-agent-setup-k8sservice:

Kubernetes (K8s) Service Agent
==============================

The Kubernetes (K8s) Data Service Agent enables machine learning (ML) users to efficiently handle non-training tasks—such as data loading, caching, and processing—concurrently with training jobs in Kubernetes clusters.
This capability is particularly valuable in deep learning applications, such as those in Graph Neural Networks (GNNs).

This guide explains how to set up the K8s Data Service Agent in your Flyte deployment.

Spin up a cluster
-----------------

.. tabs::

  .. group-tab:: Flyte binary

    You can spin up a demo cluster using the following command:

    .. code-block:: bash

      flytectl demo start

    Or install Flyte using the :ref:`flyte-binary helm chart <deployment-deployment-cloud-simple>`.

  .. group-tab:: Flyte core

    If you've installed Flyte using the
    `flyte-core helm chart <https://github.com/flyteorg/flyte/tree/master/charts/flyte-core>`__, please ensure:

    * You have the correct kubeconfig and have selected the correct Kubernetes context.
    * You have configured the correct flytectl settings in ``~/.flyte/config.yaml``.

.. note::

  Add the Flyte chart repo to Helm if you're installing via the Helm charts.

  .. code-block:: bash

    helm repo add flyteorg https://flyteorg.github.io/flyte

Specify agent configuration
----------------------------

Enable the K8s service agent by adding the following config to the relevant YAML file(s):

.. code-block:: yaml

  tasks:
    task-plugins:
      enabled-plugins:
        - agent-service
      default-for-task-types:
        - dataservicetask: agent-service

.. code-block:: yaml

  plugins:
    agent-service:
      agents:
        k8sservice-agent:
          endpoint: <AGENT_ENDPOINT>
          insecure: true
      agentForTaskTypes:
        - dataservicetask: k8sservice-agent
        - sensor: k8sservice-agent

Substitute ``<AGENT_ENDPOINT>`` with the endpoint of your K8s service agent.
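
If you are unsure of the endpoint, the command below is one way to look it up. This is only a sketch: it assumes the agent is exposed by a Service named ``flyteagent`` in the ``flyte`` namespace, which matches the chart default endpoint ``k8s://flyteagent.flyte:8000``. Both names are assumptions and may differ in your cluster.

.. code-block:: bash

  # Assumption: the agent Service is named "flyteagent" and lives in the "flyte" namespace.
  # The PORT(S) column of the output shows the port to use in <AGENT_ENDPOINT>.
  kubectl get service flyteagent -n flyte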

Set up RBAC
-----------

The K8s Data Service Agent will create a StatefulSet and expose the Service endpoint for the StatefulSet pods.
RBAC needs to be set up to allow the K8s Data Service Agent to perform CRUD operations on the StatefulSet and Service.

The ``flyte-flyteagent-role`` Role is defined as follows:

.. code-block:: yaml

  # Example of the role/binding set up for the data service to create/update/delete resources in the sandbox flyte namespace
  apiVersion: rbac.authorization.k8s.io/v1
  kind: Role
  metadata:
    name: flyte-flyteagent-role
    namespace: flyte
    labels:
      app.kubernetes.io/name: flyteagent
      app.kubernetes.io/instance: flyte
  rules:
  - apiGroups:
    - apps
    resources:
    - statefulsets
    - statefulsets/status
    - statefulsets/scale
    - statefulsets/finalizers
    verbs:
    - get
    - list
    - watch
    - create
    - update
    - delete
    - patch
  - apiGroups:
    - ""
    resources:
    - pods
    - configmaps
    - serviceaccounts
    - secrets
    - pods/exec
    - pods/log
    - pods/status
    - services
    verbs:
    - '*'


The ``flyte-flyteagent-rolebinding`` RoleBinding grants ``flyte-flyteagent-role`` to the ``flyteagent`` service account:

.. code-block:: yaml

  # Example of the role/binding set up for the data service to create/update/delete resources in the sandbox flyte namespace
  apiVersion: rbac.authorization.k8s.io/v1
  kind: RoleBinding
  metadata:
    name: flyte-flyteagent-rolebinding
    namespace: flyte
    labels:
      app.kubernetes.io/name: flyteagent
      app.kubernetes.io/instance: flyte
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: Role
    name: flyte-flyteagent-role
  subjects:
  - kind: ServiceAccount
    name: flyteagent
    namespace: flyte

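
To apply and sanity-check the Role and RoleBinding above, something like the following can be used. The file name ``flyteagent-rbac.yaml`` is hypothetical, and ``kubectl auth can-i`` with ``--as`` only works if your own user is allowed to impersonate service accounts.

.. code-block:: bash

  # Hypothetical file containing the Role and RoleBinding manifests shown above.
  kubectl apply -f flyteagent-rbac.yaml

  # Ask the API server whether the flyteagent service account may create
  # the resources the agent manages (StatefulSets and Services).
  kubectl auth can-i create statefulsets.apps \
    --as=system:serviceaccount:flyte:flyteagent -n flyte
  kubectl auth can-i create services \
    --as=system:serviceaccount:flyte:flyteagent -n flyte

Each ``can-i`` call prints ``yes`` or ``no``; a ``no`` usually means the binding's subject or namespace does not match the service account the agent actually runs as.
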
Upgrade the deployment
----------------------

.. tabs::

  .. group-tab:: Flyte binary

    .. tabs::

      .. group-tab:: Demo cluster

        .. code-block:: bash

          kubectl rollout restart deployment flyte-sandbox -n flyte

      .. group-tab:: Helm chart

        .. code-block:: bash

          helm upgrade <RELEASE_NAME> flyteorg/flyte-binary -n <YOUR_NAMESPACE> --values <YOUR_YAML_FILE>

        Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte-backend``),
        ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``),
        and ``<YOUR_YAML_FILE>`` with the name of your YAML file.

  .. group-tab:: Flyte core

    .. code-block:: bash

      helm upgrade <RELEASE_NAME> flyteorg/flyte-core -n <YOUR_NAMESPACE> --values values-override.yaml

    Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte``)
    and ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``).

Wait for the upgrade to complete. You can check the status of the deployment pods by running the following command:

.. code-block:: bash

  kubectl get pods -n flyte
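
As an alternative to polling the pod list, ``kubectl rollout status`` blocks until a rollout finishes or times out. The sketch below assumes the agent runs as a Deployment named ``flyteagent`` in the ``flyte`` namespace; that name is an assumption inferred from the default Service name, so adjust it to your install.

.. code-block:: bash

  # Assumption: the agent Deployment is named "flyteagent" in the "flyte" namespace.
  # Exits non-zero if the rollout has not completed within two minutes.
  kubectl rollout status deployment/flyteagent -n flyte --timeout=120s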