This section provides instructions on how to configure Apache Spark to use the Spark Dialect Extension, enabling custom handling of JDBC data types.

### Add the JAR to Spark

#### Using release version

##### Using SparkConf

For PySpark:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("My Spark App")
    .config("spark.jars.packages", "io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1")
    .getOrCreate()
)
```

For Spark on Scala:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("My Spark App")
  .config("spark.jars.packages", "io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1")
  .getOrCreate()
```

##### Using Spark Submit

```bash
spark-submit --conf spark.jars.packages=io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1
```

#### Compile from source

##### Build .jar file

See [CONTRIBUTING.md](../CONTRIBUTING.md) for build instructions.

After the build you'll have the file `/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar`.

##### Using SparkConf

For PySpark:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("My Spark App")
    .config("spark.jars", "/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar")
    .getOrCreate()
)
```

For Spark on Scala:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("My Spark App")
  .config("spark.jars", "/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar")
  .getOrCreate()
```

##### Using Spark Submit

```bash
spark-submit --jars /path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar
```

### Register a dialect

To integrate the Spark Dialect Extension into your Spark application, use the `<DBMS>DialectRegistry` classes, which detect the Spark version at runtime and register the corresponding dialect.

For PySpark:

```python
# Register custom Clickhouse dialect
ClickhouseDialectRegistry = spark._jvm.io.github.mtsongithub.doetl.sparkdialectextensions.clickhouse.ClickhouseDialectRegistry
ClickhouseDialectRegistry.register()
```

For Spark on Scala:

```scala
// Register custom Clickhouse dialect
import io.github.mtsongithub.doetl.sparkdialectextensions.clickhouse.ClickhouseDialectRegistry

ClickhouseDialectRegistry.register()
```
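
Once the dialect is registered, it is picked up by ordinary JDBC reads and writes; nothing else in the job needs to change. Below is a minimal PySpark sketch of such a read. The ClickHouse URL, table name, credentials, and driver class are illustrative assumptions for this example, not values provided by the extension.

```python
# Hypothetical example: read a ClickHouse table over JDBC after registering the dialect.
# The URL, table, user, password, and driver class below are placeholders for your own setup.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:clickhouse://localhost:8123/default")
    .option("dbtable", "my_table")
    .option("user", "default")
    .option("password", "")
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
    .load()
)

# Column types in the resulting schema are mapped through the registered Clickhouse dialect.
df.printSchema()
```

Note that the ClickHouse JDBC driver itself still has to be available on the classpath (for example via `spark.jars.packages` or `--jars`); the dialect extension only changes how Spark maps the JDBC data types.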