Commit e0c76ff ([DOP-18232] Add PySpark to usage examples), 1 parent 24e30ad

docs/using_the_dialect.md: 1 file changed, 87 additions, 20 deletions
This section provides instructions on how to configure Apache Spark to use the Spark Dialect Extension, enabling custom handling of JDBC data types.

### Add the JAR to Spark

#### Using release version

##### Using SparkConf

For PySpark:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("My Spark App")
    .config("spark.jars.packages", "io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1")
    .getOrCreate()
)
```
For Spark on Scala:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("My Spark App")
  .config("spark.jars.packages", "io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1")
  .getOrCreate()
```
##### Using Spark Submit

```bash
spark-submit --conf spark.jars.packages=io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1
```
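The `--conf` flag above only wires in the dependency; a complete invocation also names the application to run. A minimal sketch, where `my_app.py` stands in for your PySpark script (the script name is illustrative, not part of this project):

```bash
spark-submit \
  --conf spark.jars.packages=io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1 \
  my_app.py
```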
#### Compile from source

##### Build .jar file

See [CONTRIBUTING.md](../CONTRIBUTING.md) for build instructions.

After the build you'll have the file `/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar`.
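CONTRIBUTING.md is the authoritative build reference; as a rough sketch only, assuming the project uses an sbt-based build (which the `target/scala_2.12` output path suggests), the step looks like:

```bash
# Build from the cloned repository; see CONTRIBUTING.md for the exact commands
cd /path/to/cloned-repo
sbt package
# the .jar should then appear under target/scala_2.12/
```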
##### Using SparkConf

For PySpark:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("My Spark App")
    .config("spark.jars", "/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar")
    .getOrCreate()
)
```
For Spark on Scala:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("My Spark App")
  .config("spark.jars", "/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar")
  .getOrCreate()
```
##### Using Spark Submit

```bash
spark-submit --jars /path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar
```
### Register a dialect

To integrate the Spark Dialect Extension into your Spark application, use the ``<DBMS>DialectRegistry`` classes, which dynamically detect the Spark version and register the corresponding dialect.

For PySpark:

```python
# Register custom Clickhouse dialect via the JVM gateway
ClickhouseDialectRegistry = spark._jvm.io.github.mtsongithub.doetl.sparkdialectextensions.clickhouse.ClickhouseDialectRegistry
ClickhouseDialectRegistry.register()
```
For Spark on Scala:

```scala
// Register custom Clickhouse dialect
import io.github.mtsongithub.doetl.sparkdialectextensions.clickhouse.ClickhouseDialectRegistry

ClickhouseDialectRegistry.register()
```
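Once the dialect is registered, ordinary JDBC reads pick it up automatically. A minimal PySpark sketch of reading a ClickHouse table through the registered dialect; the URL, table name, and credentials below are placeholders, and the ClickHouse JDBC driver itself must also be on the classpath:

```python
# Read a table over JDBC; the registered dialect controls the JDBC type mapping
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:clickhouse://clickhouse-host:8123/default")  # placeholder host/port/db
    .option("dbtable", "some_table")                                  # placeholder table
    .option("user", "reader")                                         # placeholder credentials
    .option("password", "secret")
    .load()
)
df.printSchema()
```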
