Commit b170122

Update docs about supported formats
1 parent 05af7e8 commit b170122

6 files changed, +52 -6 lines changed

docs/StardustDocs/topics/io.md

+1 -1
@@ -1,4 +1,4 @@
 [//]: # (title: Input/output)

 When you work with data, you have to [read](read.md) it from disk or from remote URLs and [write](write.md) it on disk.
-This section describes how to do it. For now, only CSV, TSV, JSON, XLS and XLSX formats are supported.
+This section describes how to do it. For now, CSV, TSV, JSON, XLS, XLSX, Apache Arrow formats are supported.

docs/StardustDocs/topics/read.md

+31 -4
@@ -1,7 +1,7 @@
 [//]: # (title: Read)
 <!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Read-->

-`DataFrame` supports CSV, TSV, JSON, XLS and XLSX input formats.
+`DataFrame` supports CSV, TSV, JSON, XLS and XLSX, Apache Arrow input formats.

 `read` method automatically detects input format based on file extension and content

@@ -136,17 +136,24 @@ D: Boolean?

 Column A has `String` type because all values are string literals, no implicit conversion is performed. Column C has `Number` type because it's the least common type for `Int` and `Double`.

-### Reading spreadsheets
+### Reading Excel

-Right now DataFrame only supports reading Excel formats: xls, xlsx.
+Add dependency:
+
+```kotlin
+implementation("org.jetbrains.kotlinx:dataframe-excel:$dataframe_version")
+```
+
+Right now DataFrame supports reading Excel spreadsheet formats: xls, xlsx.

 You can read from file or URL.

 Cells representing dates will be read as `kotlinx.datetime.LocalDateTime`.
 Cells with number values, including whole numbers such as "100", or calculated formulas will be read as `Double`

 Sometimes cells can have wrong format in Excel file, for example you expect to read column of String:
-```
+
+```text
 IDS
 100 <-- Intended to be String, but has wrong cell format in original .xlsx file
 A100
@@ -173,3 +180,23 @@ df1["IDS"].type() shouldBe typeOf<String>()
 ```

 <!---END-->
+
+### Reading Apache Arrow formats
+
+Add dependency:
+
+```kotlin
+implementation("org.jetbrains.kotlinx:dataframe-arrow:$dataframe_version")
+```
+
+Dataframe supports reading from [Arrow interprocess streaming format](https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-streaming-format) and [Arrow random access format](https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-random-access-files)
+
+<!---FUN readArrowFeather-->
+
+```kotlin
+val df = DataFrame.readArrowFeather(file)
+```
+
+<!---END-->
+
+

docs/StardustDocs/topics/write.md

+7 -1
@@ -51,7 +51,13 @@ val jsonStr = df.toJson(prettyPrint = true)

 ### Writing spreadsheets

-You can write your dataframe in XLS, XLSX format to a file or `OutputStream`
+Add dependency:
+
+```kotlin
+implementation("org.jetbrains.kotlinx:dataframe-excel:$dataframe_version")
+```
+
+You can write your dataframe in XLS, XLSX format to a file, `OutputStream` or Workbook object.

 <!---FUN writeXls-->

tests/build.gradle.kts

+1
@@ -16,6 +16,7 @@ repositories {
 dependencies {
     implementation(project(":"))
     implementation(project(":dataframe-excel"))
+    implementation(project(":dataframe-arrow"))
     testImplementation(libs.junit)
     testImplementation(libs.kotestAssertions) {
         exclude("org.jetbrains.kotlin", "kotlin-stdlib-jdk8")

tests/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/Read.kt

+12
@@ -10,8 +10,10 @@ import org.jetbrains.kotlinx.dataframe.api.columnTypes
 import org.jetbrains.kotlinx.dataframe.api.convert
 import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
 import org.jetbrains.kotlinx.dataframe.api.with
+import org.jetbrains.kotlinx.dataframe.io.readArrowFeather
 import org.jetbrains.kotlinx.dataframe.io.readCSV
 import org.jetbrains.kotlinx.dataframe.io.readJson
+import org.jetbrains.kotlinx.dataframe.testArrowFeather
 import org.jetbrains.kotlinx.dataframe.testCsv
 import org.jetbrains.kotlinx.dataframe.testJson
 import org.junit.Test
@@ -72,4 +74,14 @@ class Read : TestBase() {
         df1["IDS"].type() shouldBe typeOf<String>()
         // SampleEnd
     }
+
+    @Test
+    fun readArrowFeather() {
+        val file = testArrowFeather("data-arrow_2.0.0_uncompressed")
+        // SampleStart
+        val df = DataFrame.readArrowFeather(file)
+        // SampleEnd
+        df.rowsCount() shouldBe 1
+        df.columnsCount() shouldBe 4
+    }
 }
Binary file not shown.
