50 commits
1ec16f3
ADH-4718: added spark-connect grcp frontend with thrift backend
Mar 4, 2026
22d5fe4
ADH-4718: set spark connect port before spark session start
Mar 5, 2026
15ce7eb
ADH-4718: set user for grcp userContext
Mar 6, 2026
b5e24be
ADH-4718: added docs/architecture.mmd
Mar 6, 2026
0cb2ff0
ADH-4718: renamed some classes
Mar 11, 2026
7d249f4
ADH-4718: set default 10199 to kyuubi.frontend.spark.connect.bind.port
Mar 11, 2026
69a5bd8
ADH-4718: removed ENGINE_SPARK_CONNECT_ENABLED
Mar 11, 2026
2e2ca11
ADH-4718: fixed saslDisabled
Mar 11, 2026
0551c6d
ADH-4718: added ssl support for SparkConnectFrontendService
Mar 12, 2026
62d78a3
ADH-4718: added kerberos authentication (first step)
Mar 17, 2026
5ab804c
ADH-4718: fixed grpc error for new spark session
Mar 18, 2026
6384c2a
ADH-4718: added debug for SPNEGO tokens
Mar 20, 2026
bb7c339
ADH-4718: support LDAP auth (Bearer Authentication) for spark connect
Mar 20, 2026
8d5fe7a
ADH-4718: added docs/how-to-spark-connect.md
Mar 23, 2026
4c33c9b
ADH-4718: fixed docs/how-to-spark-connect.md
Mar 23, 2026
d49d14c
ADH-4718: fixed docs/how-to-spark-connect.md
Mar 23, 2026
8e3e9bd
ADH-4718: added auth service to get tokens
Mar 26, 2026
dcf0ee0
ADH-4718: fixed session leak in releaseSession; use FRONTEND_SPARK_CO…
Mar 27, 2026
e3744f0
ADH-4718: fixed scalastyle
Mar 27, 2026
0ed2dba
ADH-4718: renew token in each grcp request; removed SPNEGO; added sma…
Mar 30, 2026
f05f392
ADH-4718: added SparkConnectCredentialHandler; use Basic scheme for l…
Mar 31, 2026
60369e7
ADH-4718: revoke token on ReleaseSession, fixed tests
Mar 31, 2026
52c166c
ADH-4718: support NONE auth for spark connect, renamed PlainCredentia…
Apr 1, 2026
bc1a9d2
ADH-4718: added tests for kerberos auth (minikdc), added tests for Ba…
Apr 1, 2026
8093e0c
ADH-4718: renamed SparkConnectFrontendService to KyuubiSparkConnectFr…
Apr 2, 2026
5e64d39
ADH-4718: added tests for session manager, small refactoring
Apr 2, 2026
27168b9
ADH-4718: added integration test for spark-connect; added error handl…
Apr 3, 2026
e26e103
ADH-4718: added docs/spark-connect-auth-simple.mmd
Apr 3, 2026
fa133b8
ADH-4718: moved spark-connect docs to docs/spark_connect dir; added m…
Apr 3, 2026
bba7f92
ADH-4718: fixed docs/spark_connect/spark_connect_auth.mmd
Apr 3, 2026
58dc94e
ADH-4718: renamed docs/spark_connect/spark_connect_flow.mmd
Apr 3, 2026
b93fb67
ADH-4718: small fix in docs/spark_connect/how_to.md
Apr 3, 2026
1a009b6
ADH-4718: renamed docs/spark_connect/spark_connect_flow.mmd to docs/s…
Apr 3, 2026
f455a37
ADH-4718: minor fix in docs/spark_connect/spark_connect_auth_flow.mmd
Apr 3, 2026
e5bb321
ADH-4718: added build/build-python-package; added python/kyuubi-spark…
Apr 8, 2026
80d4d63
ADH-4718: added docs/spark_connect/spark_connect_architecture.mmd, do…
Apr 15, 2026
3c5a56f
small fixes in kyuubi_architecture.mmd and kyuubi_spark_connect_archi…
Apr 16, 2026
f2df80a
ADH-4718: added spark-connect-communication.png
Apr 16, 2026
052ca79
ADH-4718: fixed spark_connect_auth_flow.mmd
Apr 16, 2026
1308e16
ADH-4718: revoke token and close application in ReleaseSession; suppo…
Apr 20, 2026
96e846d
ADH-4718: don't create new session in releaseExecute
Apr 22, 2026
44ebcf9
ADH-8148: prepared jvm client
Apr 24, 2026
51fcb44
ADH-4718: fixed proto file path in build/build-python-package
Apr 24, 2026
6120d1b
ADH-8148: moved extensions/kyuubi-spark-connect-client to extensions/…
Apr 28, 2026
e956710
ADH-8148: fixed pom.xml
Apr 28, 2026
51e2497
removed <createDependencyReducedPom>false</createDependencyReducedPom…
Apr 28, 2026
31d2667
ADH-4718: added spark_connect_auth.proto
May 12, 2026
f6efae4
ADH-4718: fixed docs/spark_connect/how_to.md
May 12, 2026
6faf75b
ADH-4718: fixed docs/spark_connect/how_to.md
May 12, 2026
1fc757e
ADH-4718: fixed docs/spark_connect/how_to.md
May 12, 2026
49 changes: 49 additions & 0 deletions build/build-python-package
@@ -0,0 +1,49 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Script to regenerate Python gRPC stubs and build the kyuubi-spark-connect wheel.
# Run this script when kyuubi-spark-connect-common/src/main/protobuf/kyuubi/spark_connect_auth.proto changes.
#
# Prerequisites: pip install grpcio-tools build

set -e

KYUUBI_HOME="$(cd "$(dirname "$0")/.."; pwd)"

if ! python3 -m grpc_tools.protoc --version &>/dev/null; then
  echo "Error: grpcio-tools is not installed. Run: pip install grpcio-tools"
  exit 1
fi

if ! python3 -m build --version &>/dev/null; then
  echo "Error: build is not installed. Run: pip install build"
  exit 1
fi

python3 -m grpc_tools.protoc \
  -I "$KYUUBI_HOME/kyuubi-spark-connect-common/src/main/protobuf" \
  --python_out="$KYUUBI_HOME/python/kyuubi-spark-connect" \
  --grpc_python_out="$KYUUBI_HOME/python/kyuubi-spark-connect" \
  "$KYUUBI_HOME/kyuubi-spark-connect-common/src/main/protobuf/kyuubi/spark_connect_auth.proto"

echo "Generated stubs in $KYUUBI_HOME/python/kyuubi-spark-connect/kyuubi/"

cd "$KYUUBI_HOME/python/kyuubi-spark-connect"
python3 -m build --wheel --no-isolation
echo "Wheel built in $KYUUBI_HOME/python/kyuubi-spark-connect/dist/"
189 changes: 189 additions & 0 deletions docs/spark_connect/how_to.md
@@ -0,0 +1,189 @@
# Kyuubi Spark Connect

## Connect to Kyuubi using gRPC

Add `SPARK_CONNECT` to `kyuubi.frontend.protocols` and set the other Spark Connect options in the Kyuubi configuration file `/etc/kyuubi/conf/kyuubi-defaults.conf`:

```
kyuubi.frontend.protocols=THRIFT_BINARY,REST,SPARK_CONNECT
kyuubi.frontend.spark.connect.bind.port=10199
kyuubi.frontend.spark.connect.ssl.enabled=true
```
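
As a quick sanity check of the settings above, the following minimal sketch parses `key=value` lines and reports whether `SPARK_CONNECT` is among the enabled frontend protocols. The helper names `parse_kyuubi_conf` and `spark_connect_enabled` are hypothetical and not part of Kyuubi; the sketch assumes the `key=value` form shown above and ignores any other syntax Kyuubi may accept.

```python
def parse_kyuubi_conf(text):
    """Parse `key=value` lines, skipping blank lines and `#` comments (hypothetical helper)."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # keep only lines that actually contain '='
            conf[key.strip()] = value.strip()
    return conf


def spark_connect_enabled(conf):
    """Return True if SPARK_CONNECT is listed in kyuubi.frontend.protocols."""
    protocols = conf.get("kyuubi.frontend.protocols", "").split(",")
    return "SPARK_CONNECT" in [p.strip().upper() for p in protocols]


SAMPLE = """\
kyuubi.frontend.protocols=THRIFT_BINARY,REST,SPARK_CONNECT
kyuubi.frontend.spark.connect.bind.port=10199
kyuubi.frontend.spark.connect.ssl.enabled=true
"""

conf = parse_kyuubi_conf(SAMPLE)
print(spark_connect_enabled(conf))
print(conf["kyuubi.frontend.spark.connect.bind.port"])
```

To check a real deployment, read `/etc/kyuubi/conf/kyuubi-defaults.conf` and pass its text to `parse_kyuubi_conf` instead of `SAMPLE`.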

### Common requirements for the Python client

Define the `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH` environment variable, which points to the file with the trusted SSL root certificates:

```
export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=/etc/ssl/certs/ca-certificates.crt
```
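
Before starting a client, it can help to confirm that the variable is set and points to a readable file. The sketch below is one way to do that; `check_ssl_roots` is a hypothetical helper, not part of gRPC or Kyuubi, and it checks only that the file exists and is readable, not that the certificates are valid.

```python
import os


def check_ssl_roots(env=None):
    """Return None if the CA bundle looks usable, otherwise a short problem description."""
    env = os.environ if env is None else env
    path = env.get("GRPC_DEFAULT_SSL_ROOTS_FILE_PATH")
    if not path:
        return "GRPC_DEFAULT_SSL_ROOTS_FILE_PATH is not set"
    if not os.path.isfile(path):
        return f"{path} is not a file"
    if not os.access(path, os.R_OK):
        return f"{path} is not readable"
    return None


# Report the state of the current environment.
print(check_ssl_roots() or "ok")
```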

**For KERBEROS and LDAP authentication types:**

Build the `kyuubi-spark-connect` Python package:

```
build/build-python-package
...
Successfully built kyuubi_spark_connect-1.0.0-py3-none-any.whl
```
Install this package (wheel) into `/opt/pyspark3-python/lib` or another directory with Python libraries.

Then set `PYTHONPATH` if you run Python code non-interactively:

```
export PYTHONPATH="/opt/pyspark3-python/lib/python3.10/site-packages/:/usr/lib/spark3/python/lib/py4j-0.10.9.7-src.zip:/usr/lib/spark3/python/"

```

Deploy the Spark code from the following PR (the examples below use the `KyuubiSessionBuilder` Python class):
https://github.com/arenadata/spark/pull/27/changes

### Authentication types

#### "NOSASL" or "NONE" authentication type

Set the following in `/etc/kyuubi/conf/kyuubi-defaults.conf`:

```
kyuubi.authentication=NOSASL
```

Run the interactive `pyspark3` shell:

```
pyspark3 --remote 'sc://vdmitriev-adh-orion-hadoop-3.ru-central1.internal:10199/;use_ssl=true;x-user-name=vdmitriev'

Using Python version 3.10.4 (main, Apr 21 2025 10:41:58)
Client connected to the Spark Connect server at vdmitriev-adh-orion-hadoop-3.ru-central1.internal:10199
SparkSession available as 'spark'.
>>> spark.sql("select * from vdmitriev.table1_orc").show()
+-----+------------+
| id| data|
+-----+------------+
| 556|test data556|
...

```
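
The `--remote` value used above is a `sc://host:port/` prefix followed by `;key=value` parameters. A minimal sketch of assembling it in Python is shown below; `spark_connect_url` is a hypothetical helper for illustration, and it does no escaping of parameter values.

```python
def spark_connect_url(host, port, params=None):
    """Build a Spark Connect connection string from host, port, and ;key=value parameters."""
    url = f"sc://{host}:{port}/"
    for key, value in (params or {}).items():
        url += f";{key}={value}"
    return url


url = spark_connect_url(
    "vdmitriev-adh-orion-hadoop-3.ru-central1.internal",
    10199,
    {"use_ssl": "true", "x-user-name": "vdmitriev"},
)
print(url)
```

The resulting string can be passed to `pyspark3 --remote` exactly as in the command above.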

#### KERBEROS authentication (SPNEGO)

Set the following in `/etc/kyuubi/conf/kyuubi-defaults.conf`:

```
kyuubi.authentication=KERBEROS

kyuubi.spnego.keytab=/etc/security/keytabs/HTTP.service.keytab
kyuubi.spnego.principal=HTTP/vdmitriev-adh-orion-hadoop-3.ru-central1.internal@RU-CENTRAL1.INTERNAL
```

##### Requirements

Install the following packages (on Ubuntu):

```
gcc, python3.10-dev, libkrb5-dev
```

Then install the following Python library:

```
gssapi
```

Obtain a Kerberos ticket-granting ticket:

```
kinit vdmitriev
```
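
A script can verify that a ticket is present before it tries to connect. The sketch below assumes an MIT-style `klist` on the `PATH`, where `klist -s` prints nothing and exits with status 0 only when the credentials cache is readable and not expired; `has_kerberos_ticket` is a hypothetical helper name.

```python
import subprocess


def has_kerberos_ticket():
    """Return True if `klist -s` reports a usable credentials cache, False otherwise."""
    try:
        result = subprocess.run(["klist", "-s"], check=False)
    except OSError:
        # klist is not installed or cannot be executed.
        return False
    return result.returncode == 0


if not has_kerberos_ticket():
    print("No valid Kerberos ticket found; run kinit first")
```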

Run the interactive `pyspark3` shell with the `KYUUBI_AUTH=kerberos` environment variable set:

```
KYUUBI_AUTH=kerberos pyspark3 --remote "sc://vdmitriev-adh-orion-hadoop-3.ru-central1.internal:10199/;use_ssl=true"
...
Using Python version 3.10.4 (main, Apr 21 2025 10:41:58)
Client connected to the Spark Connect server at vdmitriev-adh-orion-hadoop-3.ru-central1.internal:10199
SparkSession available as 'spark'.
>>> sql("select current_user()").show()
+--------------+
|current_user()|
+--------------+
| vdmitriev|
+--------------+
```

Alternatively, pass the `auth="kerberos"` parameter to the `KyuubiSessionBuilder` class in Python code:

```
from kyuubi.spark_connect import KyuubiSessionBuilder

HOST = "vdmitriev-adh-orion-hadoop-3.ru-central1.internal"
PORT = 10199

spark = KyuubiSessionBuilder(f"sc://{HOST}:{PORT}/;use_ssl=true", auth="kerberos").getOrCreate()

spark.sql("SELECT current_user()").show()

spark.stop()

```

Save the code as `spark_connect_client_kerberos.py` and run it:

```
$ python3 spark_connect_client_kerberos.py
+--------------+
|current_user()|
+--------------+
| vdmitriev|
+--------------+
```

#### LDAP authentication

Set the following in `/etc/kyuubi/conf/kyuubi-defaults.conf`:

```
kyuubi.authentication=LDAP
```

For more about the LDAP parameters, see https://kyuubi.readthedocs.io/en/master/security/ldap.html

##### Interactive mode (pyspark)

Run the interactive `pyspark3` shell with the `KYUUBI_AUTH=ldap`, `KYUUBI_USERNAME`, and `KYUUBI_PASSWORD` environment variables set:

```
export KYUUBI_PASSWORD=vdmitriev2pass

KYUUBI_AUTH=ldap KYUUBI_USERNAME=vdmitriev2 pyspark3 --remote "sc://vdmitriev-adh-orion-hadoop-3.ru-central1.internal:10199/;use_ssl=true"
...
```

##### Python code

Pass the `auth="ldap"`, `username`, and `password` parameters to the `KyuubiSessionBuilder` class in Python code:

```
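from kyuubi.spark_connect import KyuubiSessionBuilder

HOST = "vdmitriev-adh-orion-hadoop-3.ru-central1.internal"
PORT = 10199
spark = KyuubiSessionBuilder(f"sc://{HOST}:{PORT}/;use_ssl=true", auth="ldap",
                             username="vdmitriev2", password="vdmitriev2pass").getOrCreate()

spark.sql("SELECT current_user()").show()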
```
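
To keep the password out of the source file, the credentials can be read from the same `KYUUBI_USERNAME` and `KYUUBI_PASSWORD` environment variables that the interactive shell uses. This is a minimal sketch: `ldap_builder_kwargs` and `create_ldap_session` are hypothetical helpers, and the keyword arguments are assumed to match the `KyuubiSessionBuilder` call shown above.

```python
import os


def ldap_builder_kwargs(env=None):
    """Collect LDAP keyword arguments for KyuubiSessionBuilder from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in ("KYUUBI_USERNAME", "KYUUBI_PASSWORD") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "auth": "ldap",
        "username": env["KYUUBI_USERNAME"],
        "password": env["KYUUBI_PASSWORD"],
    }


def create_ldap_session(url):
    """Open a session; requires the kyuubi-spark-connect package to be installed."""
    from kyuubi.spark_connect import KyuubiSessionBuilder  # imported lazily on purpose

    return KyuubiSessionBuilder(url, **ldap_builder_kwargs()).getOrCreate()
```

With both variables exported, `create_ldap_session("sc://<host>:10199/;use_ssl=true")` would return the session, where `<host>` stands for the Kyuubi server host.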

## Build Kyuubi

```
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
./build/dist --tgz --spark-provided --flink-provided --hive-provided --web-ui -Pjdbc-shaded -Pjava-8 -Pscala-2.13 -Pspark-3.5 -Pzookeeper-3.6 -Drat.skip=true
```

Run the Spark Connect tests:

```
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
export SPARK_HOME=/home/vdmitriev/git/spark-3.5.4-bin-hadoop3-scala2.13
build/mvn test -pl kyuubi-server -Pjdbc-shaded -Pjava-8 -Pscala-2.13 -Pspark-3.5 -Pzookeeper-3.6 -Dsuites="org.apache.kyuubi.server.grpc.*,*SparkConnect*"
```

92 changes: 92 additions & 0 deletions docs/spark_connect/kyuubi_architecture.mmd
@@ -0,0 +1,92 @@
flowchart TB
%% ── Clients ──────────────────────────────────────────────────────────────
subgraph Clients
C1[JDBC / Beeline / ODBC]
C2[REST Client]
C3[PySpark — Spark Connect]
C4[MySQL / Trino Client]
end

%% ── Kyuubi Server ────────────────────────────────────────────────────────
subgraph KyuubiServer["Kyuubi Server"]
direction TB

subgraph Frontends["Frontend Services"]
F1[Thrift Binary & HTTP<br>:10009]
F2[REST :10099]
F3[SparkConnect gRPC :10199]
F4[MySQL / Trino]
end

subgraph Core["Session & Backend"]
SM[SessionManager<br>+ BatchService]
BS[BackendService<br>KyuubiSyncThriftClient]
end

MB[MetadataStore<br>JDBC / LevelDB]
end

%% ── HA / Discovery ───────────────────────────────────────────────────────
DISC[(ZooKeeper / etcd)]

%% ── Engines ──────────────────────────────────────────────────────────────
subgraph Engines["Compute Engines (separate JVM processes)"]
direction LR

subgraph SparkEngine["Spark SQL Engine"]
SE_T[Thrift Frontend]
SE_G[Spark Connect Service]
SE_B[SparkSQL Backend]
end

OE[Flink / Hive / JDBC /<br>Trino / Chat Engines]
end

%% ── Spark Extensions ─────────────────────────────────────────────────────
subgraph SparkExt["Spark Extensions"]
EX[spark-authz · spark-lineage<br>extension-spark · connectors]
end

%% ── Observability ────────────────────────────────────────────────────────
OB[Metrics · Event Logging · Spark UI]

%% ── Connections ──────────────────────────────────────────────────────────
C1 -->|Thrift| F1
C2 -->|HTTP| F2
C3 -->|gRPC| F3
C4 --> F4

F1 & F2 & F4 --> SM
F3 -->|open session via Thrift,<br>then gRPC proxy| SM

SM --> BS
SM <-->|discover & register| DISC
SM --> MB

BS -->|Thrift| SE_T
BS -->|Thrift / HTTP| OE
F3 -->|gRPC ExecutePlan| SE_G

SE_T -->|register| DISC
SE_B --> EX

KyuubiServer --> OB

%% ── Styles ───────────────────────────────────────────────────────────────
classDef client fill:#dbeafe,stroke:#3b82f6,color:#1e3a5f
classDef frontend fill:#dcfce7,stroke:#16a34a,color:#14532d
classDef core fill:#f3e8ff,stroke:#9333ea,color:#3b0764
classDef engine fill:#e0f2fe,stroke:#0284c7,color:#0c4a6e
classDef discovery fill:#fce7f3,stroke:#db2777,color:#831843
classDef extension fill:#ecfdf5,stroke:#059669,color:#064e3b
classDef observ fill:#f1f5f9,stroke:#64748b,color:#1e293b
classDef store fill:#fef9c3,stroke:#ca8a04,color:#713f12

class C1,C2,C3,C4 client
class F1,F2,F3,F4 frontend
class SM,BS core
class SE_T,SE_G,SE_B,OE engine
class DISC discovery
class EX extension
class OB observ
class MB store
39 changes: 39 additions & 0 deletions docs/spark_connect/kyuubi_spark_connect_architecture.mmd
@@ -0,0 +1,39 @@
graph LR
Client1["User 1 (python)"]
Client2["User 2 (go)"]
Client3["User 3 (java)"]

subgraph Kyuubi["Kyuubi Server"]
Frontend["Spark Connect Frontend<br>(gRPC)"]
end

subgraph App3["YARN Application 3 (User 3)"]
SC3["Spark Connect Service"]
end

subgraph App2["YARN Application 2 (User 2)"]
SC2["Spark Connect Service"]
end

subgraph App1["YARN Application 1 (User 1)"]
SC1["Spark Connect Service"]
end

Client1 -->|"gRPC (Kerberos)"| Frontend
Client2 -->|"gRPC (LDAP)"| Frontend
Client3 -->|"gRPC (Kerberos)"| Frontend
Frontend -->|"gRPC proxy"| SC1
Frontend -->|"gRPC proxy"| SC2
Frontend -->|"gRPC proxy"| SC3

Client1:::client
Client2:::client
Client3:::client
Frontend:::frontend
SC1:::engine
SC2:::engine
SC3:::engine

classDef client fill:#dbeafe,stroke:#3b82f6,color:#1e3a5f
classDef frontend fill:#dcfce7,stroke:#16a34a,color:#14532d
classDef engine fill:#e0f2fe,stroke:#0284c7,color:#0c4a6e
22 changes: 22 additions & 0 deletions docs/spark_connect/spark_connect_architecture.mmd
@@ -0,0 +1,22 @@
graph LR
Client1["User 1 (python)"]
Client2["User 2 (go)"]
Client3["User 3 (java)"]

SparkConnect["Spark Connect Server (standalone)"]
Cluster["YARN Application"]

Client1 -->|"gRPC"| SparkConnect
Client2 -->|"gRPC"| SparkConnect
Client3 -->|"gRPC"| SparkConnect
SparkConnect -->|"submits jobs"| Cluster

Client1:::client
Client2:::client
Client3:::client
SparkConnect:::frontend
Cluster:::engine

classDef client fill:#dbeafe,stroke:#3b82f6,color:#1e3a5f
classDef frontend fill:#dcfce7,stroke:#16a34a,color:#14532d
classDef engine fill:#e0f2fe,stroke:#0284c7,color:#0c4a6e