diff --git a/docs/using_the_dialect.md b/docs/using_the_dialect.md
index deeefd4..09e1afd 100644
--- a/docs/using_the_dialect.md
+++ b/docs/using_the_dialect.md
@@ -2,31 +2,98 @@
 This section provides instructions on how to configure Apache Spark to use the Spark Dialect Extension, enabling custom handling of JDBC data types.
 
-### Configuration Steps
+### Add the JAR to Spark
 
-To integrate the Spark Dialect Extension into your Spark application, you need to add the compiled JAR file to the Spark classpath. The extension, via ``DialectRegistry`` classes, will dynamically detect the Spark version and load the corresponding dialect.
+#### Using release version
 
-#### Add the JAR to Spark
+##### Using SparkConf
 
-1. **Locate the Compiled JAR**: Ensure you have built the project and locate the `.jar`: `/path/to/spark-dialect-extension_2.12-0.1.jar` directory.
+For PySpark:
 
-2. **Configure Spark**: Add the JAR to your Spark job's classpath by modifying the `spark.jars` configuration parameter. This can be done in several ways depending on how you are running your Spark application:
+```python
+from pyspark.sql import SparkSession
 
-- **Spark Submit Command**:
-  ```bash
-  spark-submit --jars /path/to/spark-dialect-extension_2.12-0.1.jar --class YourMainClass your-application.jar
-  ```
+spark = (
+    SparkSession.builder
+    .appName("My Spark App")
+    .config("spark.jars.packages", "io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1")
+    .getOrCreate()
+)
+```
+
+For Spark on Scala:
+
+```scala
+import org.apache.spark.sql.SparkSession
+
+val spark = SparkSession.builder()
+  .appName("My Spark App")
+  .config("spark.jars.packages", "io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1")
+  .getOrCreate()
+```
+
+##### Using Spark Submit
+
+```bash
+spark-submit --conf spark.jars.packages=io.github.mtsongithub.doetl:spark-dialect-extension_2.12:0.0.1
+```
+
+#### Compile from source
+
+##### Build .jar file
+
+See [CONTRIBUTING.md](../CONTRIBUTING.md) for build instructions.
+
+After the build, you'll have the file `/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar`.
+
+##### Using SparkConf
+
+For PySpark:
 
-- **Programmatically** (within your Spark application):
-  ```scala
-  import io.github.mtsongithub.doetl.sparkdialectextensions.clickhouse.ClickhouseDialectRegistry
-  import org.apache.spark.sql.SparkSession
-
-  val spark = SparkSession.builder()
+```python
+from pyspark.sql import SparkSession
+
+spark = (
+    SparkSession.builder
     .appName("My Spark App")
-    .config("spark.jars", "/path/to/spark-dialect-extension_2.12-0.1.jar")
+    .config("spark.jars", "/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar")
     .getOrCreate()
-
-  // Register custom Clickhouse dialect based on Spark version
-  ClickhouseDialectRegistry.register()
-  ```
+)
+```
+
+For Spark on Scala:
+
+```scala
+import org.apache.spark.sql.SparkSession
+
+val spark = SparkSession.builder()
+  .appName("My Spark App")
+  .config("spark.jars", "/path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar")
+  .getOrCreate()
+```
+
+##### Using Spark Submit
+
+```bash
+spark-submit --jars /path/to/cloned-repo/target/scala_2.12/spark-dialect-extension_2.12-0.0.1.jar
+```
+
+### Register a dialect
+
+To integrate the Spark Dialect Extension into your Spark application, use the ``DialectRegistry`` classes, which dynamically detect the Spark version and register the corresponding dialect.
+
+For PySpark:
+
+```python
+# Register custom Clickhouse dialect
+ClickhouseDialectRegistry = spark._jvm.io.github.mtsongithub.doetl.sparkdialectextensions.clickhouse.ClickhouseDialectRegistry
+ClickhouseDialectRegistry.register()
+```
+
+For Spark on Scala:
+
+```scala
+// Register custom Clickhouse dialect
+import io.github.mtsongithub.doetl.sparkdialectextensions.clickhouse.ClickhouseDialectRegistry
+
+ClickhouseDialectRegistry.register()
+```
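+
+After registration, ClickHouse tables are accessed through Spark's regular JDBC data source. The snippet below is a minimal usage sketch for Scala; the connection URL, credentials, table name, and the ClickHouse JDBC driver class are placeholders to adapt to your environment:
+
+```scala
+// Read a ClickHouse table over JDBC; the registered dialect customizes
+// how ClickHouse column types are handled when mapped to Spark SQL types.
+val df = spark.read
+  .format("jdbc")
+  .option("url", "jdbc:clickhouse://localhost:8123/default") // placeholder host and database
+  .option("dbtable", "some_table")                           // placeholder table
+  .option("user", "default")                                 // placeholder credentials
+  .option("password", "")
+  .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")  // assumes the ClickHouse JDBC driver is on the classpath
+  .load()
+
+df.printSchema()
+```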