first version connect cloud pages #21297

Closed · wants to merge 18 commits
7 changes: 7 additions & 0 deletions docs/source/_build/API_REFERENCE_LINKS.yml
@@ -181,6 +181,13 @@ python:
write_json: https://docs.pola.rs/api/python/stable/reference/api/polars.DataFrame.write_json.html
write_ndjson: https://docs.pola.rs/api/python/stable/reference/api/polars.DataFrame.write_ndjson.html
write_parquet: https://docs.pola.rs/api/python/stable/reference/api/polars.DataFrame.write_parquet.html
Workspace: https://docs.cloud.pola.rs/reference/workspace/workspace.html
ComputeContext: https://docs.cloud.pola.rs/reference/compute/compute.html
LazyFrameExt: https://docs.cloud.pola.rs/reference/query/lazyframeext.html
QueryResult: https://docs.cloud.pola.rs/reference/query/query_result.html
InteractiveQuery: https://docs.cloud.pola.rs/reference/query/interactive_query.html
BatchQuery: https://docs.cloud.pola.rs/reference/query/batch_query.html
login: https://docs.cloud.pola.rs/reference/auth/api/polars_cloud.login.html

rust:
agg: https://docs.rs/polars/latest/polars/prelude/struct.LazyGroupBy.html#method.agg
4 changes: 1 addition & 3 deletions docs/source/_build/css/extra.css
@@ -1,7 +1,5 @@
:root {
--md-primary-fg-color: #0B7189;
--md-primary-fg-color--light: #C2CCD6;
--md-primary-fg-color--dark: #103547;
--md-primary-fg-color: #0075ff;
--md-text-font: 'Proxima Nova', sans-serif;
}

7 changes: 1 addition & 6 deletions docs/source/api/index.md → docs/source/api/reference.md
@@ -1,9 +1,4 @@
---
hide:
- navigation
---

# API reference
# Reference guide

The API reference contains detailed descriptions of all public functions and objects. It's the best
place to look if you need information on a specific function.
Binary file added docs/source/polars-cloud/assets/aws-infra.png
55 changes: 55 additions & 0 deletions docs/source/polars-cloud/cli.md
@@ -0,0 +1,55 @@
# CLI

Polars Cloud comes with a command line interface (CLI) out of the box. It allows you to interact
with Polars Cloud resources from the terminal.

```bash
pc --help
```

```
usage: pc [-h] [-v] [-V] {login,workspace,compute} ...

positional arguments:
{login,workspace,compute}
login Authenticate with Polars Cloud by logging in through the browser
workspace Manage Polars Cloud workspaces.
compute Manage Polars Cloud compute clusters.

options:
-h, --help show this help message and exit
-v, --verbose Output debug logging messages.
-V, --version Display the version of the Polars Cloud client.
```

### Authentication

You can authenticate with Polars Cloud from the CLI using:

```bash
pc login
```

This refreshes your access token and saves it to disk.

### Workspaces

Create and set up a new workspace:

```bash
pc workspace setup
```

List all workspaces:

```bash
pc workspace list
```

```
NAME ID STATUS
test-workspace 0194ac0e-5122-7a90-af5e-b1f60b1989f4 Active
polars-ci-2025… 0194287a-e0a5-7642-8058-0f79a39f5b98 Uninitialized
```

### Compute
46 changes: 46 additions & 0 deletions docs/source/polars-cloud/connect-cloud.md
@@ -0,0 +1,46 @@
# Connect cloud environment

To use Polars Cloud, you have to connect your workspace to a cloud environment.

When you log in to the Polars Cloud dashboard for the first time, you will notice a blue bar at the top of the screen. A new account that is not yet connected to a cloud environment can explore Polars Cloud, but it cannot execute any queries.

![An overview of the Polars Cloud dashboard showing a button to connect your cloud environment](../assets/connect-cloud/dashboard.png)

When you click the blue bar, you will be redirected to the start of the setup flow.

## 1. Set workspace name

In the first step of the setup flow you give your workspace a name. You can keep the name "Personal Workspace" or use the name of your team or department. This workspace name is required by the compute context to execute a query remotely.

!!! tip "Naming your workspace"
    If you are not sure, use a temporary name. You can change the workspace name under the workspace settings at any time.

![Connect your cloud screen where you can input a workspace name](../assets/connect-cloud/workspace-naming.png)

## 2. Deploy to AWS

When you have entered a name, you can click "Deploy to Amazon". This opens a screen in AWS with the CloudFormation template that installs the required roles in your AWS environment.

![CloudFormation stack image as step of the setupflow](../assets/connect-cloud/cloudformation.png)

If you want to learn more about what Polars Cloud installs in your environment, you can read more on [the AWS Infrastructure page](../providers/aws/infra).

!!! info "No permissions to deploy the stack in AWS"
    If you don't have the required permissions to deploy CloudFormation stacks in your AWS environment, you can copy the URL and share it with your operations team or someone who has the permissions. With the URL they can deploy the stack for you.

## 3. Deploying the environment

After you click "Create stack", the CloudFormation stack will be deployed in your environment. This takes around 5 minutes. You can follow the progress in your AWS environment or in the Polars Cloud setup flow.

![Progress screen in the set up flow](../assets/connect-cloud/progress-page.png)

When the CloudFormation stack is deployed you will see a confirmation message.

![Final screen of the setup flow indicating successful deployment](../assets/connect-cloud/successful-setup.png)

If you click "Start exploring", you will be redirected to the Polars Cloud dashboard.

You can now run your Polars queries remotely in the cloud. Go to the [getting started section](../quickstart) to run your first query in minutes, [learn how to run queries remotely](../run/compute-context), or manage your workspace to invite your team.

!!! info "Only connect a workspace once"
    You only have to connect your workspace once. If you invite your team to a workspace that is connected to a cloud environment, they can immediately run queries remotely.
38 changes: 38 additions & 0 deletions docs/source/polars-cloud/explain/authentication.md
@@ -0,0 +1,38 @@
# Authentication

Polars Cloud allows authentication through short-lived authentication tokens. There are two ways
you can obtain an access token:

- the command line interface (CLI)
- the Python client

After a successful `login`, Polars Cloud stores the token in `$HOME/.polars`. You can alter this
path by setting the environment variable `POLARS_CLOUD_ACCESS_TOKEN_PATH`.
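The lookup described above can be sketched as follows. This mirrors the documented behavior only; it is not Polars Cloud's actual implementation, and the override path used is purely illustrative:

```python
import os
from pathlib import Path


def token_path() -> Path:
    """Return where the access token is stored: an explicit override via
    POLARS_CLOUD_ACCESS_TOKEN_PATH wins, otherwise fall back to $HOME/.polars."""
    override = os.environ.get("POLARS_CLOUD_ACCESS_TOKEN_PATH")
    if override:
        return Path(override)
    return Path.home() / ".polars"


# Illustrative override; any writable path works.
os.environ["POLARS_CLOUD_ACCESS_TOKEN_PATH"] = "/tmp/polars-token"
print(token_path())  # -> /tmp/polars-token
```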

### Command Line Interface (CLI)

Authenticate with the CLI using the following command:

```bash
pc login
```

### Python client

Authenticate with Polars Cloud using:

{{code_block('polars-cloud/authentication','login',['login'])}}

Both methods redirect you to the browser, where you can provide your login credentials and continue
the sign-in process.

## Service accounts

Both flows described above are interactive logins where a person is present in the process. For
non-interactive workflows, such as orchestration tools, there are service accounts. These allow you
to log in programmatically.

To create a service account, go to the Polars Cloud dashboard under Settings and then Service
accounts. Here you can create a new service account for your workspace. To authenticate, set the
`POLARS_CLOUD_CLIENT_ID` and `POLARS_CLOUD_CLIENT_SECRET` environment variables. Polars Cloud
automatically picks these up when no access token is present at the token path.
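In an orchestration job this typically looks like the following sketch. The credential values are placeholders; you would substitute the ones generated in the dashboard:

```shell
# Placeholder credentials, created under Settings -> Service accounts.
export POLARS_CLOUD_CLIENT_ID="my-client-id"
export POLARS_CLOUD_CLIENT_SECRET="my-client-secret"

# Any non-interactive workload started from this shell can now authenticate
# without a browser login, e.g. a scheduled ETL script:
#   python my_etl_job.py   (hypothetical script name)
```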
46 changes: 46 additions & 0 deletions docs/source/polars-cloud/faq.md
@@ -0,0 +1,46 @@
# FAQ

On this page you can find answers to frequently asked questions about Polars Cloud.

## Who is behind Polars Cloud?

Polars Cloud is built by the organization behind the open source Polars project. Polars has grown to more than 10M monthly downloads and more than 125M total downloads since the first commit in 2020. We are building Polars Cloud on top of the open source Polars project to offer a service that is more aligned with the needs of organizations that use, or are looking to use, Polars.

By offering our managed service we get to drive even greater adoption and invest more resources into the open source Polars project, supporting further improvements and long term development.

## Where does the compute run?

All compute runs in your own cloud environment. This ensures that your data never leaves your environment and that the compute is always close to your data.

You can learn more about this setup in [the infrastructure section of the documentation](providers/aws/infra.md).

## Can you run Polars Cloud on-premise?

Currently, Polars Cloud is only available to organizations that are on AWS. Support for on-premise infrastructure is on our roadmap and will become available soon.

## What does Polars Cloud offer me beyond Polars?

Polars Cloud offers a managed service that enables scalable data processing with the flexibility and expressiveness of the Polars API. It extends the open source Polars project with the following capabilities:

- Distributed engine to efficiently handle terabyte to petabyte scale workloads through parallel processing across multiple nodes
- Cost-optimized serverless architecture that automatically scales compute resources
- Built-in fault tolerance mechanisms ensuring query completion even during hardware failures or system interruptions
- Comprehensive monitoring and analytics tools providing detailed insights into query performance and resource utilization

## What are the main use cases for Polars Cloud?

Polars Cloud offers both a batch and an interactive mode. Batch mode can be used for ETL workloads or one-off large-scale analytic jobs. Interactive mode is for users who want to do data exploration at a scale that requires more compute than their own machine can offer.

## How can Polars Cloud integrate with my workflow?

One of our key priorities is ensuring that running remote queries feels as native and seamless as running them locally. Every user should be able to scale their queries effortlessly.

Polars Cloud is completely environment agnostic. This allows you to run your queries from anywhere such as your own machine, Jupyter/Marimo notebooks, Airflow DAGs, AWS Lambda functions, or your servers. By not tying you to a specific platform, Polars Cloud gives you the flexibility to execute your queries wherever it best fits your workflow.

## What is the pricing model of Polars Cloud?

Polars Cloud is available at no additional cost in this early stage. You only pay for the resources you use in your own cloud environment. We are exploring usage based pricing models that are geared towards running queries as quickly and efficiently as possible.

## Will the distributed engine be available in open source?

The distributed engine is only available in Polars Cloud. There are no plans to make it available in the open source project. Open source Polars is focused on single machines, as it makes efficient use of the available resources. Users already report using Polars to process hundreds of gigabytes of data on a single (large) compute instance. The distributed engine is geared towards teams and organizations that are I/O bound or want to scale their Polars queries beyond a single machine, and that require a solution that processes workloads at terabyte and even petabyte scale.
114 changes: 114 additions & 0 deletions docs/source/polars-cloud/glossary.md
@@ -0,0 +1,114 @@
# Concepts

<!-- TODO: Alphabetic order + table of content -->

This section covers the main concepts present in Polars Cloud.

## Workspaces

A workspace is a logical container in which other resources within Polars Cloud live. All resources
(e.g. compute, queries, ...) are bound to a specific workspace.

!!! info "Access Control"

    Everyone within a workspace has the same access; there is no notion of roles (e.g. admin, user)
    in the current version. This means users within the same workspace can view each other's
    clusters and queries. A workspace has a single IAM role in the cloud and runs under the same
    permissions. A user can't send queries to a compute cluster of another user.

{{code_block('polars-cloud/concepts','workspace',['Workspace'])}}

## Compute Context

The compute context describes the underlying hardware. You can start a compute context by
specifying the instance requirements in terms of CPUs and memory, or by directly specifying the AWS
EC2 instance type.

By instance type:

{{code_block('polars-cloud/concepts','compute',['ComputeContext'])}}

By instance requirements:

{{code_block('polars-cloud/concepts','compute2',['ComputeContext'])}}

When you specify instance requirements, Polars Cloud searches for the cheapest available instance
type that meets at least the requested values, since not every combination is available in the
cloud. For example, for `pc.ComputeContext(cpus=1, memory=32)` there is no machine with 1 core and
32 GB of RAM, so Polars Cloud will find the cheapest available machine that has at least 32 GB of RAM.
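As a toy illustration of that selection rule (the instance names and prices below are made up for the example; this is not Polars Cloud's actual catalog or implementation):

```python
# Hypothetical instance catalog; the real selection happens inside Polars Cloud
# against actual AWS instance types and prices.
INSTANCES = [
    {"name": "small", "cpus": 1, "memory": 4, "price": 0.04},
    {"name": "medium", "cpus": 2, "memory": 16, "price": 0.11},
    {"name": "large", "cpus": 4, "memory": 32, "price": 0.21},
    {"name": "xlarge", "cpus": 8, "memory": 64, "price": 0.43},
]


def cheapest_instance(cpus: int, memory: int) -> str:
    """Pick the cheapest instance with at least `cpus` CPUs and `memory` GB RAM."""
    candidates = [i for i in INSTANCES if i["cpus"] >= cpus and i["memory"] >= memory]
    if not candidates:
        raise ValueError("no instance satisfies the requirements")
    return min(candidates, key=lambda i: i["price"])["name"]


# No 1-CPU machine offers 32 GB, so the cheapest machine with >= 32 GB is chosen.
print(cheapest_instance(cpus=1, memory=32))  # -> large
```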

Below are the options you can specify:

| Parameter | Type | Description |
| -------------------------------- | ------------ | ------------------------------------------------------------------------------------------------------- |
| `workspace_name`<img width=100/> | string | The name of the workspace |
| `cpus` | number | The minimum number of CPUs the compute cluster should have access to. |
| `memory` | number | The minimum amount of RAM (in GB) the compute cluster should have access to. |
| `instance_type` | string | The AWS instance type (e.g. `t2.micro`). This parameter cannot be used together with `memory` or `cpus`. |
| `storage` | number | The amount of local disk space (in GB) each node in the compute cluster has access to. Defaults to `16` |
| `cluster_size` | number | The number of machines to spin up in the cluster. Defaults to `1`. |
| `interactive` | bool | Activate interactive mode |
| `labels` | List[string] | Labels of the compute context |
| `log_level` | string | Override the log level of the cluster for debug purposes. One of `"info", "debug", "trace"`. |

!!! warning "Distributed Engine"

    We are currently developing our distributed engine. This engine will run on top of the new open
    source streaming engine and is exclusive to Polars Cloud. It is still in an experimental phase.

### Interactive vs Batch

A compute context runs in either interactive or batch mode.

Batch mode is meant for queries that run periodically. In this mode, clients send their queries to
Polars Cloud, where they are queued. The compute context polls the queue and runs each query.
Metadata about the query (e.g. status, query plan, logs, metrics) is sent back to Polars Cloud for
reporting purposes. The actual result data is not shared, for privacy reasons.

This process of queuing and polling adds a delay of a few seconds. While negligible for periodic
jobs, it can hurt the developer experience when running many queries interactively. Additionally,
for exploratory work it is not always necessary or valuable to save the metadata of the query. In
these cases interactive mode is the better fit. In interactive mode you, as a client, communicate
directly with the compute cluster, so there is no delay and queries run immediately. An additional
benefit is that data can be shared securely, for example to view (a part of) the result.

### Default Context

It is recommended to explicitly specify the compute context when running a query. However, for
convenience it is possible to use a default context. Under `Settings` in your workspace you'll find
the option to specify default parameters for `memory`, `cpus`, `cluster_size`, etc. If you run a
query without a context, Polars Cloud will spin up a compute cluster with these default parameters.

## Queries

A query represents a single `LazyFrame` being executed in Polars Cloud, in either batch or
interactive mode.

{{code_block('polars-cloud/concepts','query',[])}}

Running a query remotely is as simple as calling `remote` and passing the compute context to it.
Depending on the mode of the compute context, this returns either an `InteractiveQuery` or a `BatchQuery`.

### Interactive

In interactive mode you communicate directly with the compute context. The communication between
your client and the compute server is securely encrypted using mTLS. Queries sent to an interactive
compute context return an `InteractiveQuery`, which can be awaited or cancelled. Queries executed
in interactive mode do not show up on the Polars Cloud dashboard.

{{code_block('polars-cloud/concepts','interactive',['QueryResult','InteractiveQuery'])}}

In this example we create a `LazyFrame` called `lf` and execute it on Polars Cloud. We can continue
working with the result by calling `lazy()` on it, which returns a `LazyFrame`.

<!-- dprint-ignore-start -->

!!! info "Interactive mode"
    If you want to continue on an existing query or query result, you must use
    `write_parquet` to S3 as an intermediate storage location. We are adding a `.execute` (or
    similar) method to our API which allows you to skip specifying this location.

<!-- dprint-ignore-end-->

### Batch

Running a query in batch mode gives a `BatchQuery`, which has the same API as its interactive
counterpart. The main differences are that queries go through the control plane and that metadata
about the query is stored in the dashboard for reporting purposes.

{{code_block('polars-cloud/concepts','interactive',['QueryResult','BatchQuery'])}}
3 changes: 3 additions & 0 deletions docs/source/polars-cloud/index.md
@@ -0,0 +1,3 @@
# Introduction

Polars Cloud is a managed service, built by the organization behind the open source Polars project, that lets you run Polars queries remotely in your own cloud environment.