revise documentation for plugins requiring flyte backend setup (#1062)
* revise documentation for plugins requiring flyte backend setup

Signed-off-by: Samhita Alla <[email protected]>

* nit

Signed-off-by: Samhita Alla <[email protected]>

* update docs

Signed-off-by: Samhita Alla <[email protected]>

* update requirements

Signed-off-by: Samhita Alla <[email protected]>

* downgrade pydantic

Signed-off-by: Samhita Alla <[email protected]>

* revert requirements update

Signed-off-by: Samhita Alla <[email protected]>

* revert requirements update

Signed-off-by: Samhita Alla <[email protected]>

* generate requirements in databricks

Signed-off-by: Samhita Alla <[email protected]>

* add envd

Signed-off-by: Samhita Alla <[email protected]>

* ray isort

Signed-off-by: Samhita Alla <[email protected]>

* sort imports

Signed-off-by: Samhita Alla <[email protected]>

* lint

Signed-off-by: Samhita Alla <[email protected]>

* add envd to plugin requirements

Signed-off-by: Samhita Alla <[email protected]>

* lint

Signed-off-by: Samhita Alla <[email protected]>

* add requirements.txt file

Signed-off-by: Samhita Alla <[email protected]>

* fix requirements

Signed-off-by: Samhita Alla <[email protected]>

* modify imagespec registry

Signed-off-by: Samhita Alla <[email protected]>

* lint

Signed-off-by: Samhita Alla <[email protected]>

* tf dependencies

Signed-off-by: Samhita Alla <[email protected]>

* modify deps and add registry

Signed-off-by: Samhita Alla <[email protected]>

* modify registry

Signed-off-by: Samhita Alla <[email protected]>

* mpi protobuf

Signed-off-by: Samhita Alla <[email protected]>

* add placeholder

Signed-off-by: Samhita Alla <[email protected]>

* incorporate suggestions

Signed-off-by: Samhita Alla <[email protected]>

---------

Signed-off-by: Samhita Alla <[email protected]>
samhita-alla authored Aug 18, 2023
1 parent bcd5624 commit 20a7f1f
Showing 41 changed files with 2,019 additions and 2,770 deletions.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -20,7 +20,7 @@
# -- Project information -----------------------------------------------------

project = "Flytesnacks"
copyright = "2022, Flyte"
copyright = "2023, Flyte"
author = "Flyte"

# The full version, including alpha/beta/rc tags
71 changes: 33 additions & 38 deletions docs/index.md
@@ -170,8 +170,8 @@ of data between tasks and, more generally, the dependencies between tasks 🔀.
:animate: fade-in-slide-down

Flyte `@task` and `@workflow` decorators are designed to work seamlessly with
-your code-base, provided that the *decorated function is at the top-level scope
-of the module*.
+your code-base, provided that the _decorated function is at the top-level scope
+of the module_.

This means that you can invoke tasks and workflows as regular Python methods and
even import and use them in other Python modules or scripts.
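
As an editorial aside, the top-level-scope requirement is easy to see in a minimal sketch; the `double` and `my_workflow` names are illustrative, not from this commit:

```python
# A minimal sketch, assuming only flytekit itself: tasks and workflows
# defined at module top level can be imported and called like regular
# Python functions.
from flytekit import task, workflow


@task
def double(x: int) -> int:
    return x * 2


@workflow
def my_workflow(x: int) -> int:
    # Within a workflow, tasks are called with keyword arguments.
    return double(x=double(x=x))


if __name__ == "__main__":
    # Local execution: invoked like an ordinary function call.
    print(my_workflow(x=3))  # prints 12
```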
@@ -189,15 +189,15 @@ only supports a subset of Python's semantics. Learn more in the

## Running Flyte Workflows in Python

-You can run the workflow in ``example.py`` on a local Python by using `pyflyte`,
+You can run the workflow in `example.py` on a local Python by using `pyflyte`,
the CLI that ships with `flytekit`.

```{prompt} bash $
pyflyte run example.py training_workflow \
    --hyperparameters '{"C": 0.1}'
```

-:::::{dropdown} {fa}`info-circle` Running into shell issues? 
+:::::{dropdown} {fa}`info-circle` Running into shell issues?
:title: text-muted
:animate: fade-in-slide-down

@@ -212,32 +212,30 @@ set -gx PATH $PATH ~/.local/bin
:::
:::::

-
-
:::::{dropdown} {fa}`info-circle` Why use `pyflyte run` rather than `python example.py`?
:title: text-muted
:animate: fade-in-slide-down

`pyflyte run` enables you to execute a specific workflow using the syntax
`pyflyte run <path/to/script.py> <workflow_or_task_function_name>`.

-Keyword arguments can be supplied to ``pyflyte run`` by passing in options in
-the format ``--kwarg value``, and in the case of ``snake_case_arg`` argument
-names, you can pass in options in the form of ``--snake-case-arg value``.
+Keyword arguments can be supplied to `pyflyte run` by passing in options in
+the format `--kwarg value`, and in the case of `snake_case_arg` argument
+names, you can pass in options in the form of `--snake-case-arg value`.

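As a brief aside, that argument mapping can be illustrated with a hypothetical task; `plot_histogram`, `num_bins`, and `plot.py` are illustrative names, not part of `example.py`:

```python
# Hypothetical sketch of the snake_case -> --snake-case-arg mapping.
from flytekit import task


@task
def plot_histogram(num_bins: int = 10) -> str:
    return f"histogram with {num_bins} bins"


# Assuming this task lives in plot.py, the CLI invocation would be:
#   pyflyte run plot.py plot_histogram --num-bins 20
```
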
::::{note}
If you want to run a workflow with `python example.py`, you would have to write
a `main` module conditional at the end of the script to actually run the
workflow:

-:::{code-block} python
+```python
if __name__ == "__main__":
    training_workflow(hyperparameters={"C": 0.1})
-:::
+```

This becomes even more verbose if you want to pass in arguments:

-:::{code-block} python
+```python
if __name__ == "__main__":
    import json
    from argparse import ArgumentParser
@@ -248,7 +246,7 @@

    args = parser.parse_args()
    training_workflow(hyperparameters=args.hyperparameters)
-:::
+```

::::

@@ -315,7 +313,6 @@ Where ``<execution_name>`` is a unique identifier for the workflow execution.
````

-
## Inspect the Results

Navigate to the URL produced by `pyflyte run`. This will take you to
@@ -324,7 +321,6 @@ workflows, and executions.

![getting started console](https://github.com/flyteorg/static-resources/raw/main/flytesnacks/getting_started/getting_started_console.gif)

-
```{note}
There are a few features about FlyteConsole worth pointing out in the GIF above:
@@ -337,13 +333,12 @@ There are a few features about FlyteConsole worth pointing out in the GIF above:

## Summary

-🎉 **Congratulations! In this introductory guide, you:** 
+🎉 **Congratulations! In this introductory guide, you:**

1. 📜 Created a Flyte script, which trains a binary classification model.
2. 🚀 Spun up a demo Flyte cluster on your local system.
3. 👟 Ran a workflow locally and on a demo Flyte cluster.

-
## What's Next?

Follow the rest of the sections in the documentation to get a better
@@ -439,33 +434,33 @@ flyte_lab
:hidden:
Integrations <integrations>
auto_examples/sql_plugin/index
auto_examples/greatexpectations_plugin/index
auto_examples/papermill_plugin/index
auto_examples/pandera_plugin/index
auto_examples/modin_plugin/index
auto_examples/dolt_plugin/index
auto_examples/airflow_plugin/index
auto_examples/athena_plugin/index
auto_examples/aws_batch_plugin/index
auto_examples/sagemaker_pytorch_plugin/index
auto_examples/sagemaker_training_plugin/index
auto_examples/bigquery_plugin/index
auto_examples/k8s_dask_plugin/index
auto_examples/databricks_plugin/index
auto_examples/dbt_plugin/index
auto_examples/whylogs_plugin/index
auto_examples/mlflow_plugin/index
auto_examples/onnx_plugin/index
auto_examples/dolt_plugin/index
auto_examples/duckdb_plugin/index
auto_examples/greatexpectations_plugin/index
auto_examples/hive_plugin/index
auto_examples/k8s_pod_plugin/index
auto_examples/k8s_dask_plugin/index
auto_examples/k8s_spark_plugin/index
auto_examples/kfpytorch_plugin/index
auto_examples/kftensorflow_plugin/index
auto_examples/mlflow_plugin/index
auto_examples/modin_plugin/index
auto_examples/kfmpi_plugin/index
auto_examples/onnx_plugin/index
auto_examples/papermill_plugin/index
auto_examples/pandera_plugin/index
auto_examples/kfpytorch_plugin/index
auto_examples/ray_plugin/index
auto_examples/sagemaker_training_plugin/index
auto_examples/sagemaker_pytorch_plugin/index
auto_examples/athena_plugin/index
auto_examples/aws_batch_plugin/index
auto_examples/hive_plugin/index
auto_examples/snowflake_plugin/index
auto_examples/databricks_plugin/index
auto_examples/bigquery_plugin/index
auto_examples/airflow_plugin/index
auto_examples/k8s_spark_plugin/index
auto_examples/sql_plugin/index
auto_examples/kftensorflow_plugin/index
auto_examples/whylogs_plugin/index
```

```{toctree}
1 change: 1 addition & 0 deletions examples/basics/basics/named_outputs.py
@@ -58,6 +58,7 @@ def say_hello() -> hello_output:
# which are tuples that need to be de-referenced.
# :::

+
# %%
@workflow
def my_wf() -> wf_outputs:
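
The de-referencing comment in the diff context above is easier to follow with a minimal sketch of named outputs, loosely mirroring `named_outputs.py`; the exact definitions in that file may differ:

```python
# A minimal sketch, assuming only flytekit; the NamedTuple and field
# names are illustrative approximations of named_outputs.py.
import typing

from flytekit import task, workflow

hello_output = typing.NamedTuple("OP", [("greet", str)])
wf_outputs = typing.NamedTuple("WfOutputs", [("greeting", str)])


@task
def say_hello() -> hello_output:
    return hello_output(greet="hello world")


@workflow
def my_wf() -> wf_outputs:
    # The task call returns a NamedTuple-shaped promise; de-reference
    # the field by attribute before wiring it into the workflow output.
    return wf_outputs(greeting=say_hello().greet)
```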
11 changes: 2 additions & 9 deletions examples/databricks_plugin/Dockerfile
@@ -1,6 +1,5 @@
-FROM databricksruntime/standard:11.3-LTS
+FROM databricksruntime/standard:12.2-LTS
LABEL org.opencontainers.image.source=https://github.com/flyteorg/flytesnacks
-# To build this dockerfile, run "make docker_build".

ENV VENV /opt/venv
ENV LANG C.UTF-8
@@ -11,12 +10,6 @@ USER 0

RUN sudo apt-get update && sudo apt-get install -y make build-essential libssl-dev git

-# Install custom package
-RUN /databricks/python3/bin/pip install awscli
-WORKDIR /opt
-RUN curl https://sdk.cloud.google.com > install.sh
-RUN bash /opt/install.sh --install-dir=/opt
-
# Install Python dependencies
COPY ./requirements.txt /databricks/driver/requirements.txt
RUN /databricks/python3/bin/pip install -r /databricks/driver/requirements.txt
@@ -27,6 +20,6 @@ WORKDIR /databricks/driver
COPY . /databricks/driver/

# This tag is supplied by the build script and will be used to determine the version
-# when registering tasks, workflows, and launch plans
+# when registering tasks, workflows and launch plans.
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
61 changes: 28 additions & 33 deletions examples/databricks_plugin/README.md
@@ -4,52 +4,47 @@
.. tags:: Spark, Integration, DistributedComputing, Data, Advanced
```

-Flyte backend can be connected with Databricks service. Once enabled it can allow you to submit a spark job to Databricks platform.
-This section will provide how to use the Databricks Plugin using flytekit python.
+Flyte can be seamlessly integrated with the [Databricks](https://www.databricks.com/) service,
+enabling you to effortlessly submit Spark jobs to the Databricks platform.

-## Installation
+## Install the plugin

-The flytekit Databricks plugin is bundled into its Spark plugin, so to use, simply run the following:
+The Databricks plugin comes bundled with the Spark plugin.
+To execute it locally, run the following command:

-```{eval-rst}
-.. prompt:: bash
-    pip install flytekitplugins-spark
+```
+pip install flytekitplugins-spark
```

-## How to Build Your Dockerfile for Spark on Databricks
+If you intend to run the plugin on the Flyte cluster, you must first set it up on the backend.
+Please refer to the
+{std:ref}`Databricks plugin setup guide <flyte:deployment-plugin-setup-webapi-databricks>`
+for detailed instructions.

-Using Spark on Databricks is extremely easy and provides full versioning using the custom-built Spark container. The built container can also execute regular Spark tasks.
-For Spark, the image must use a base image built by Databricks and the workflow code must copy to `/databricks/driver`
+## Run the example on the Flyte cluster

-```{literalinclude} ../../../examples/databricks_plugin/Dockerfile
-:emphasize-lines: 20-32
-:language: docker
-:linenos: true
+To run the provided example on the Flyte cluster, use the following command:

```
+pyflyte run --remote \
+  --image ghcr.io/flyteorg/flytecookbook:databricks_plugin-latest \
+  https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/databricks_plugin/databricks_plugin/databricks_job.py \
+  my_databricks_job
+```

-## Configuring the backend to get Databricks plugin working
+:::{note}
+Using Spark on Databricks is incredibly simple and offers comprehensive versioning through a
+custom-built Spark container. This built container also facilitates the execution of standard Spark tasks.

-1. Make sure to add "databricks" in `tasks.task-plugins.enabled-plugin` in [enabled_plugins.yaml](https://github.com/flyteorg/flyte/blob/master/deployment/sandbox/flyte_generated.yaml#L2296)
-2. Add Databricks access token to Flytepropeller. [here](https://docs.databricks.com/administration-guide/access-control/tokens.html#enable-or-disable-token-based-authentication-for-the-workspace) to see more detail to create Databricks access token.
+To utilize Spark, the image should employ a base image provided by Databricks,
+and the workflow code must be copied to `/databricks/driver`.

-```bash
-kubectl edit secret -n flyte flyte-propeller-auth
+```{literalinclude} ../../../examples/databricks_plugin/Dockerfile
+:language: docker
+:emphasize-lines: 1,7-8,20
```

-Configuration will be like below
-
-```bash
-apiVersion: v1
-data:
-  FLYTE_DATABRICKS_API_TOKEN: <ACCESS_TOKEN>
-kind: Secret
-metadata:
-  annotations:
-    meta.helm.sh/release-name: flyte
-    meta.helm.sh/release-namespace: flyte
-...
-```
+:::
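
As an aside for readers of this diff, here is a hedged sketch of what a Databricks-targeted Spark task can look like with `flytekitplugins-spark`; the `Databricks` task config is the plugin's, while the cluster values (`spark_version`, `node_type_id`, worker count) are illustrative assumptions rather than tested settings:

```python
# A sketch under assumptions: submits a trivial Spark job to Databricks
# via the Databricks task config; cluster spec values are placeholders.
import flytekit
from flytekit import task
from flytekitplugins.spark import Databricks


@task(
    task_config=Databricks(
        spark_conf={"spark.driver.memory": "1g"},
        databricks_conf={
            "run_name": "flyte databricks example",
            "new_cluster": {
                "spark_version": "12.2.x-scala2.12",
                "node_type_id": "r5.large",  # placeholder instance type
                "num_workers": 2,
            },
            "timeout_seconds": 3600,
        },
    )
)
def count_fraction() -> float:
    # The Spark plugin injects a session into the task context.
    sess = flytekit.current_context().spark_session
    return sess.sparkContext.parallelize(range(100)).count() / 100.0
```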

```{auto-examples-toc}
databricks_job
