Commit 175815d (1 parent: 034a0a8)
docs: ✏️updates readme

File tree: 1 file changed (+89 / -59 lines)
Diff for: README.md

# Kotlin for Apache® Spark™

Your next API to work with [Apache Spark](https://spark.apache.org/).

This project adds a missing layer of compatibility between [Kotlin](https://kotlinlang.org/) and [Apache Spark](https://spark.apache.org/).
It allows Kotlin developers to use familiar language features such as data classes, and lambda expressions written as simple expressions in curly braces or method references.

We have opened a Spark Project Improvement Proposal, [Kotlin support for Apache Spark](http://issues.apache.org/jira/browse/SPARK-32530#), to work with the community towards getting Kotlin support as a first-class citizen in Apache Spark. We encourage you to voice your opinions and participate in the discussion.

## Table of Contents

- [Supported versions of Apache Spark](#supported-versions-of-apache-spark)
- [Releases](#releases)
- [How to configure Kotlin for Apache Spark in your project](#how-to-configure-kotlin-for-apache-spark-in-your-project)
- [Kotlin for Apache Spark features](#kotlin-for-apache-spark-features)
  - [Creating a SparkSession in Kotlin](#creating-a-sparksession-in-kotlin)
  - [Creating a Dataset in Kotlin](#creating-a-dataset-in-kotlin)
  - [Null safety](#null-safety)
  - [withSpark function](#withspark-function)
  - [withCached function](#withcached-function)
  - [toList and toArray](#tolist-and-toarray-methods)
- [Examples](#examples)
- [Reporting issues/Support](#reporting-issuessupport)
- [Code of Conduct](#code-of-conduct)
- [License](#license)

## Supported versions of Apache Spark

<table>
<thead>
<tr>
<th>Apache Spark</th>
<th>Kotlin for Apache Spark</th>
</tr>
</thead>
<tbody align="center">
<tr>
<td>3.0.0</td>
<td>0.3+</td>
</tr>
</tbody>
</table>

## Releases

The list of Kotlin for Apache Spark releases is available [here](https://github.com/JetBrains/kotlin-spark-api/releases/).
The `kotlin-spark-api` artifact can be obtained from [JitPack](https://jitpack.io/#JetBrains/kotlin-spark-api).

[![](https://jitpack.io/v/JetBrains/kotlin-spark-api.svg)](https://jitpack.io/#JetBrains/kotlin-spark-api)

## How to configure Kotlin for Apache Spark in your project

You can add Kotlin for Apache Spark as a dependency to your project: `Maven`, `Gradle`, `SBT`, and `Leiningen` are supported.

Here's an example `pom.xml`:

```xml
<repositories>
<!-- … -->
</dependency>
```
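If you use Gradle instead, an equivalent sketch follows JitPack's `com.github.User:Repo:Tag` coordinate convention; the exact artifact coordinates and version tag below are illustrative, so check the JitPack badge above for the real values:

```groovy
// build.gradle (sketch only; coordinates and version are illustrative)
repositories {
    maven { url 'https://jitpack.io' }
}

dependencies {
    // JitPack serves GitHub projects under the com.github.<user> group
    implementation 'com.github.JetBrains:kotlin-spark-api:<version>'
}
```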

Note that `core` is compiled against Scala version `2.12`.
You can find a complete example with `pom.xml` and `build.gradle` in the [Quick Start Guide](docs/quick-start-guide.md).

Once you have configured the dependency, you only need to add the following import to your Kotlin file:
```kotlin
import org.jetbrains.spark.api.*
```

## Kotlin for Apache Spark features

### Creating a SparkSession in Kotlin

```kotlin
val spark = SparkSession
.builder()
    // …
```
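For reference, a complete builder chain might look like the sketch below; the master URL and app name are illustrative placeholders, not values prescribed by this project:

```kotlin
// Sketch: standard SparkSession builder options; values are illustrative.
val spark = SparkSession
    .builder()
    .master("local[2]")            // run locally with 2 threads
    .appName("Simple Application") // any application name
    .getOrCreate()
```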

### Creating a Dataset in Kotlin

```kotlin
spark.toDS("a" to 1, "b" to 2)
```

The example above produces `Dataset<Pair<String, Int>>`.

### Null safety

There are several aliases in the API, like `leftJoin`, `rightJoin`, etc. These are null-safe by design.
For example, `leftJoin` is aware of nullability and returns `Dataset<Pair<LEFT, RIGHT?>>`.
Note that we are forcing `RIGHT` to be nullable so that you, as a developer, are able to handle this situation.
`NullPointerException`s are hard to debug in Spark, and we are doing our best to make them as rare as possible.
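As an illustration, a hypothetical join might look like the sketch below. The data classes, column names, and join condition are invented for this example, and the exact `leftJoin` signature may differ; only the `Dataset<Pair<LEFT, RIGHT?>>` result shape comes from the description above:

```kotlin
// Hypothetical domain types for the sketch
data class Customer(val id: Int, val name: String)
data class Order(val customerId: Int, val total: Double)

withSpark {
    val customers = dsOf(Customer(1, "Ada"), Customer(2, "Brandon"))
    val orders = dsOf(Order(1, 100.0))

    // leftJoin keeps every Customer; the Order side is nullable for
    // unmatched rows, so the result is Dataset<Pair<Customer, Order?>>
    customers.leftJoin(orders, customers.col("id").equalTo(orders.col("customerId")))
        .map { (customer, order) -> customer.name to (order?.total ?: 0.0) }
        .show()
}
```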

### withSpark function

We provide you with the useful `withSpark` function, which accepts everything that may be needed to run Spark: properties, name, master location, and so on. It also accepts a block of code to execute inside the Spark context.
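For example, a minimal sketch; the named parameters (`props`, `appName`) and the property key are illustrative of the options described above and may differ from the actual signature:

```kotlin
withSpark(props = mapOf("spark.sql.shuffle.partitions" to 1), appName = "demo") {
    // dsOf creates a Dataset<Int> from varargs inside the Spark context
    dsOf(1, 2, 3)
        .map { it to it * it }
        .show()
}
```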
`dsOf` is just one more way to create a `Dataset` (`Dataset<Int>`) from varargs.

### withCached function

It can easily happen that we need to fork our computation into several paths. To compute things only once, we should call the `cache`
method. However, it becomes difficult to control when we're using the cached `Dataset` and when not.
It is also easy to forget to unpersist cached data, which can break things unexpectedly or take up more memory
than intended.

To solve these problems we've added the `withCached` function:

```kotlin
withSpark {
    // …
}
```

Here we're showing the cached `Dataset` for debugging purposes, then filtering it.
The `filter` method returns a filtered `Dataset`, and then the cached `Dataset` is unpersisted, so we have more memory to call the `map` method and collect the resulting `Dataset`.

### toList and toArray methods

For more idiomatic Kotlin code, we've added `toList` and `toArray` methods to this API. You can still use the `collect` method as in the Scala API; however, the result should be cast to `Array`.
This is because `collect` returns a Scala array, which is not the same as a Java/Kotlin one.

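For instance, a sketch assuming a session created with `withSpark` as above; the dataset contents are illustrative:

```kotlin
withSpark {
    // toList returns a Kotlin List<Int> directly, with no casting,
    // whereas collect() would return a Scala array.
    val doubled: List<Int> = dsOf(1, 2, 3).map { it * 2 }.toList()
    println(doubled)
}
```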
## Examples

For more, check out the [examples](https://github.com/JetBrains/kotlin-spark-api/tree/master/examples/src/main/kotlin/org/jetbrains/spark/api/examples) module.
To get up and running quickly, check out this [tutorial](docs/quick-start-guide.md).

## Reporting issues/Support

Please use [GitHub issues](https://github.com/JetBrains/kotlin-spark-api/issues) for filing feature requests and bug reports.
You are also welcome to join the [kotlin-spark channel](https://kotlinlang.slack.com/archives/C015B9ZRGJF) in the Kotlin Slack.

## Code of Conduct

This project and the corresponding community are governed by the [JetBrains Open Source and Community Code of Conduct](https://confluence.jetbrains.com/display/ALL/JetBrains+Open+Source+and+Community+Code+of+Conduct). Please make sure you read it.

## License

Kotlin for Apache Spark is licensed under the [Apache 2.0 License](LICENSE).