Commit e0c76ff — [DOP-18232] Add PySpark to usage examples
dolfinus committed Sep 23, 2024 (parent: 24e30ad)
docs/using_the_dialect.md: 87 additions, 20 deletions

This section provides instructions on how to configure Apache Spark to use the Spark Dialect Extension, enabling custom handling of JDBC data types.

### Add the JAR to Spark

#### Using release version

##### Using SparkConf

For PySpark:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("My Spark App")
    .config("spark.jars.packages", "io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1")
    .getOrCreate()
)
```

For Spark on Scala:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("My Spark App")
  .config("spark.jars.packages", "io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1")
  .getOrCreate()
```

##### Using Spark Submit

```bash
spark-submit --conf spark.jars.packages=io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1
```
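Equivalently, `spark-submit` accepts the `--packages` flag, which sets the same `spark.jars.packages` property:

```shell
spark-submit --packages io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1
```

Either form resolves the artifact and its transitive dependencies from Maven Central at startup, so no local JAR path is needed.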

#### Compile from source

##### Build .jar file

See [CONTRIBUTING.md](../CONTRIBUTING.md) for build instructions.

After the build, the JAR will be available at `/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar`.
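As a sketch only — CONTRIBUTING.md is the authoritative reference — building a standard sbt project typically looks like:

```shell
# Hypothetical invocation, assuming a standard sbt layout; see CONTRIBUTING.md for the real steps.
cd /path/to/cloned-repo
sbt package  # writes the JAR under target/scala_2.12/
```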

##### Using SparkConf

For PySpark:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("My Spark App")
    .config("spark.jars", "/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar")
    .getOrCreate()
)
```

For Spark on Scala:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("My Spark App")
  .config("spark.jars", "/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar")
  .getOrCreate()
```

##### Using Spark Submit

```bash
spark-submit --jars /path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar
```

### Register a dialect

To enable the Spark Dialect Extension in your Spark application, use the `<DBMS>DialectRegistry` classes, which detect the Spark version at runtime and register the corresponding dialect.

For PySpark:

```python
# Register custom Clickhouse dialect
ClickhouseDialectRegistry = spark._jvm.io.github.mtsongithub.doetl.sparkdialectextensions.clickhouse.ClickhouseDialectRegistry
ClickhouseDialectRegistry.register()
```

For Spark on Scala:

```scala
// Register custom Clickhouse dialect
import io.github.mtsongithub.doetl.sparkdialectextensions.clickhouse.ClickhouseDialectRegistry

ClickhouseDialectRegistry.register()
```
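Once the dialect is registered, JDBC reads and writes against ClickHouse pick it up automatically. Below is a minimal PySpark read sketch, assuming the `spark` session from the snippets above; the host, database, table name, and the `com.clickhouse.jdbc.ClickHouseDriver` driver class (from the ClickHouse JDBC driver, which must also be on the classpath) are placeholders to adapt to your environment:

```python
# Hypothetical connection details — replace with your own.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:clickhouse://clickhouse.example.com:8123/default")
    .option("dbtable", "events")
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
    .load()
)
df.printSchema()  # column types are mapped by the registered dialect
```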
