@@ -90,13 +90,13 @@ necessary.
The fat jar is near 5MB, so the size should not be a problem.

As you probably know, Spark is based on Scala. Different Spark distributions use different Scala versions.
- This is the Spark/Scala version combination available for latest release v1.0.10:
+ This is the Spark/Scala version combination available for latest release v1.0.11:

| Spark Branch | Scala | Packages |
| :------------:| :------:| :---------|
- | 2.4 | 2.11 | [`com.acervera.osm4scala:osm4scala-spark2-shaded_2.11:1.0.10`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark2-shaded_2.11/1.0.10/jar)
- | 2.4 | 2.12 | [`com.acervera.osm4scala:osm4scala-spark2-shaded_2.12:1.0.10`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark2-shaded_2.12/1.0.10/jar)
- | 3.0 / 3.1 | 2.12 | [`com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.10`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark3-shaded_2.12/1.0.10/jar)
+ | 2.4 | 2.11 | [`com.acervera.osm4scala:osm4scala-spark2-shaded_2.11:1.0.11`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark2-shaded_2.11/1.0.11/jar)
+ | 2.4 | 2.12 | [`com.acervera.osm4scala:osm4scala-spark2-shaded_2.12:1.0.11`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark2-shaded_2.12/1.0.11/jar)
+ | 3.0 / 3.1 | 2.12 | [`com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark3-shaded_2.12/1.0.11/jar)

Although the following sections focus on Spark Shell and Notebooks, you can use the same technique in other situations where
you want to use the shaded version.
@@ -115,13 +115,13 @@ To solve the conflict, I published the library in two fashion:
1. Start the Spark shell as usual, using the `--packages` option to add the right dependency. The dependency depends on
the Spark version that you are using. Please check the reference table in the previous section.
```shell title="Scala"
- bin/spark-shell --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.10'
+ bin/spark-shell --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11'
```
```scala title="PySpark"
- bin/pyspark --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.10'
+ bin/pyspark --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11'
```
```scala title="SQL"
- bin/spark-sql --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.10'
+ bin/spark-sql --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11'
```

2. Create the DataFrame using the `osm.pbf` format, pointing to the pbf file or a folder containing pbf files.
@@ -258,7 +258,7 @@ If you prefer an online option, you can try [MyBinder](https://mybinder.org/v2/g
of the [All in one jar](#all-in-one-jar) section. In our case, the version used is `Spark v3.1.1` with `Scala 2.12`.
```jypiter
%%init_spark
- launcher.packages = ["com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.10"]
+ launcher.packages = ["com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11"]
```
If you did not execute anything before, running the cell will start the Spark session. Sometimes, depending on the
Notebook used, **you will need to restart the Spark session (or Kernel session)**.
@@ -304,13 +304,13 @@ When we need to write more complex analysis, data extractions, ETLs, etc, it is
can not *easily* import and use facilities from the Scala library. So in this case, you can jump to the next step.

```sbt title="Sbt"
- libraryDependencies += "com.acervera.osm4scala" % "osm4scala-spark3-shaded_2.12" % "1.0.10"
+ libraryDependencies += "com.acervera.osm4scala" % "osm4scala-spark3-shaded_2.12" % "1.0.11"
```
```xml title="Maven"
<dependency>
<groupId>com.acervera.osm4scala</groupId>
<artifactId>osm4scala-spark3-shaded_2.12</artifactId>
- <version>1.0.10</version>
+ <version>1.0.11</version>
</dependency>
```
:::tip Reduce artifact size.
@@ -364,20 +364,20 @@ When we need to write more complex analysis, data extractions, ETLs, etc, it is
3. Submit the application to your Spark cluster.
```shell title="Scala"
bin/spark-submit \
- --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.10' \
- examples/spark-documentation/target/scala-2.12/osm4scala-examples-spark-documentation_2.12-1.0.10-SNAPSHOT.jar \
+ --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11' \
+ examples/spark-documentation/target/scala-2.12/osm4scala-examples-spark-documentation_2.12-1.0.11.jar \
/tmp/osm/monaco-anonymized.osm.pbf
```
```shell title="PySpark"
bin/spark-submit \
- --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.10' \
+ --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11' \
examples/spark-documentation/src/main/scala/com/acervera/osm4scala/examples/spark/documentation/PrimiriveCounter.py \
/tmp/osm/monaco-anonymized.osm.pbf
```

:::note Optional --packages.

- You will not need to add `--packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.10'` if it is part of the
+ You will not need to add `--packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11'` if it is part of the
deployed artifact.

:::
@@ -545,13 +545,13 @@ In that case, the best practice is to manage dependencies using `sbt` or `maven`
OSM Pbf files are based on [Protocol Buffer](https://developers.google.com/protocol-buffers), so [Scalapb](https://scalapb.github.io) is
used as the deserializer, and it is the only transitive dependency.

- This is the Spark/Scala version combination available for latest release v1.0.10:
+ This is the Spark/Scala version combination available for latest release v1.0.11:

| Spark branch | Scalapb | Scala | Packages |
| :------------:| :-------:| :------:| :---------|
- | 2.4 | 0.9.7 | 2.11 | [`com.acervera.osm4scala:osm4scala-spark2_2.11:1.0.10`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark2_2.11/1.0.10/jar)
- | 2.4 | 0.10.2 | 2.12 | [`com.acervera.osm4scala:osm4scala-spark2_2.12:1.0.10`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark2_2.12/1.0.10/jar)
- | 3.0 / 3.1 | 0.10.2 | 2.12 | [`com.acervera.osm4scala:osm4scala-spark3_2.12:1.0.10`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark3_2.12/1.0.10/jar)
+ | 2.4 | 0.9.7 | 2.11 | [`com.acervera.osm4scala:osm4scala-spark2_2.11:1.0.11`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark2_2.11/1.0.11/jar)
+ | 2.4 | 0.10.2 | 2.12 | [`com.acervera.osm4scala:osm4scala-spark2_2.12:1.0.11`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark2_2.12/1.0.11/jar)
+ | 3.0 / 3.1 | 0.10.2 | 2.12 | [`com.acervera.osm4scala:osm4scala-spark3_2.12:1.0.11`](https://search.maven.org/artifact/com.acervera.osm4scala/osm4scala-spark3_2.12/1.0.11/jar)

After importing the connector, you can use it as we explained in the [All in one section](#all-in-one-jar). So let's see
how to import the library in our project and a few examples.
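
Every changed line above follows one coordinate pattern, `group:artifact_scalaVersion:version`. As a rough sketch only, the Python helper below rebuilds that coordinate from the Spark branch and Scala version listed in the two tables. The names `package_coordinate`, `SUPPORTED` and `GROUP_ID` are hypothetical and not part of osm4scala; the supported combinations are copied from the tables for v1.0.11 and are assumed to need updating for other releases.

```python
# Hypothetical helper, not part of osm4scala: it only formats the Maven
# coordinate that the documentation passes to --packages.
GROUP_ID = "com.acervera.osm4scala"

# Spark branch -> (artifact prefix, Scala versions listed for that branch),
# copied from the tables in the documentation for release 1.0.11.
SUPPORTED = {
    "2.4": ("osm4scala-spark2", ("2.11", "2.12")),
    "3.0": ("osm4scala-spark3", ("2.12",)),
    "3.1": ("osm4scala-spark3", ("2.12",)),
}


def package_coordinate(spark_branch, scala, version="1.0.11", shaded=True):
    """Return 'group:artifact_scala:version' for a listed Spark/Scala pair."""
    if spark_branch not in SUPPORTED:
        raise ValueError(f"no artifact listed for Spark {spark_branch}")
    prefix, scala_versions = SUPPORTED[spark_branch]
    if scala not in scala_versions:
        raise ValueError(f"Spark {spark_branch} is not listed with Scala {scala}")
    # The all-in-one jar carries the "-shaded" suffix; the plain connector does not.
    artifact = f"{prefix}-shaded" if shaded else prefix
    return f"{GROUP_ID}:{artifact}_{scala}:{version}"


if __name__ == "__main__":
    print(package_coordinate("3.1", "2.12"))
```

The returned string is the value given to `--packages` in the commands above; rejecting unlisted pairs, such as Spark 3.1 with Scala 2.11, keeps the helper from producing a coordinate that the tables do not mention.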