
Commit 7de6022

Updated mentions of DataFrame to represent objects (#664)
* Update mentions of DataFrame to represent objects
* Improve DataColumn.md documentation clarity
1 parent 41577df commit 7de6022

22 files changed: +74 -64 lines changed

docs/StardustDocs/topics/DataColumn.md

+9-6
Original file line number | Diff line number | Diff line change
@@ -1,13 +1,15 @@
11
[//]: # (title: DataColumn)
22
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Create-->
33

4-
[`DataColumn`](DataColumn.md) represents a column of values. It can store objects of primitive or reference types, or other [`DataFrames`](DataFrame.md).
4+
[`DataColumn`](DataColumn.md) represents a column of values.
5+
It can store objects of primitive or reference types,
6+
or other [`DataFrame`](DataFrame.md) objects.
57

68
See [how to create columns](createColumn.md)
79

810
### Properties
9-
* `name: String` — name of the column, should be unique within containing dataframe
10-
* `path: ColumnPath` — path to the column, depends on the way column was retrieved from dataframe
11+
* `name: String` — name of the column; should be unique within containing dataframe
12+
* `path: ColumnPath` — path to the column; depends on the way column was retrieved from dataframe
1113
* `type: KType` — type of elements in the column
1214
* `hasNulls: Boolean` — flag indicating whether column contains `null` values
1315
* `values: Iterable<T>` — column data
@@ -20,17 +22,18 @@ See [how to create columns](createColumn.md)
2022

2123
Represents a sequence of values.
2224

23-
It can store values of primitive (integers, strings, decimals etc.) or reference types. Currently, it uses [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/) as underlying data storage.
25+
It can store values of primitive (integers, strings, decimals, etc.) or reference types.
26+
Currently, it uses [`List`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/-list/) as underlying data storage.
2427

2528
#### ColumnGroup
2629

2730
Container for nested columns. Is used to create column hierarchy.
2831

2932
#### FrameColumn
3033

31-
Special case of [`ValueColumn`](#valuecolumn) that stores other [`DataFrames`](DataFrame.md) as elements.
34+
Special case of [`ValueColumn`](#valuecolumn) that stores another [`DataFrame`](DataFrame.md) objects as elements.
3235

33-
[`DataFrames`](DataFrame.md) stored in [`FrameColumn`](DataColumn.md#framecolumn) may have different schemas.
36+
[`DataFrame`](DataFrame.md) stored in [`FrameColumn`](DataColumn.md#framecolumn) may have different schemas.
3437

3538
[`FrameColumn`](DataColumn.md#framecolumn) may appear after [reading](read.md) from JSON or other hierarchical data structures, or after grouping operations such as [groupBy](groupBy.md) or [pivot](pivot.md).
3639

docs/StardustDocs/topics/DataFrame.md

+8-8
@@ -2,13 +2,13 @@
22

33
[`DataFrame`](DataFrame.md) represents a list of [`DataColumn`](DataColumn.md).
44

5-
Columns in dataframe must have equal size and unique names.
5+
Columns in [`DataFrame`](DataFrame.md) must have equal size and unique names.
66

77
**Learn how to:**
8-
- [Create dataframe](createDataFrame.md)
9-
- [Read dataframe](read.md)
10-
- [Get an overview of dataframe](info.md)
11-
- [Access data in dataframe](access.md)
12-
- [Modify data in dataframe](modify.md)
13-
- [Compute statistics for dataframe](summaryStatistics.md)
14-
- [Combine several dataframes](multipleDataFrames.md)
8+
- [Create DataFrame](createDataFrame.md)
9+
- [Read DataFrame](read.md)
10+
- [Get an overview of DataFrame](info.md)
11+
- [Access data in DataFrame](access.md)
12+
- [Modify data in DataFrame](modify.md)
13+
- [Compute statistics for DataFrame](summaryStatistics.md)
14+
- [Combine several DataFrame objects](multipleDataFrames.md)

docs/StardustDocs/topics/DataRow.md

+5-5
@@ -11,19 +11,19 @@
1111
* `prev(): DataRow?` — previous row (`null` for the first row)
1212
* `next(): DataRow?` — next row (`null` for the last row)
1313
* `diff(T) { rowExpression }: T / diffOrNull { rowExpression }: T?` — difference between the results of a [row expression](DataRow.md#row-expressions) calculated for current and previous rows
14-
* `explode(columns): DataFrame<T>` — spread lists and [`DataFrames`](DataFrame.md) vertically into new rows
14+
* `explode(columns): DataFrame<T>` — spread lists and [`DataFrame`](DataFrame.md) objects vertically into new rows
1515
* `values(): List<Any?>` — list of all cell values from the current row
1616
* `valuesOf<T>(): List<T>` — list of values of the given type
1717
* `columnsCount(): Int` — number of columns
1818
* `columnNames(): List<String>` — list of all column names
1919
* `columnTypes(): List<KType>` — list of all column types
2020
* `namedValues(): List<NameValuePair<Any?>>` — list of name-value pairs where `name` is a column name and `value` is cell value
2121
* `namedValuesOf<T>(): List<NameValuePair<T>>` — list of name-value pairs where value has given type
22-
* `transpose(): DataFrame<NameValuePair<*>>`dataframe of two columns: `name: String` is column names and `value: Any?` is cell values
23-
* `transposeTo<T>(): DataFrame<NameValuePair<T>>`dataframe of two columns: `name: String` is column names and `value: T` is cell values
22+
* `transpose(): DataFrame<NameValuePair<*>>`[`DataFrame`](DataFrame.md) of two columns: `name: String` is column names and `value: Any?` is cell values
23+
* `transposeTo<T>(): DataFrame<NameValuePair<T>>`[`DataFrame`](DataFrame.md) of two columns: `name: String` is column names and `value: T` is cell values
2424
* `getRow(Int): DataRow` — row from [`DataFrame`](DataFrame.md) by row index
25-
* `getRows(Iterable<Int>): DataFrame`dataframe with subset of rows selected by absolute row index.
26-
* `relative(Iterable<Int>): DataFrame`dataframe with subset of rows selected by relative row index: `relative(-1..1)` will return previous, current and next row. Requested indices will be coerced to the valid range and invalid indices will be skipped
25+
* `getRows(Iterable<Int>): DataFrame`[`DataFrame`](DataFrame.md) with subset of rows selected by absolute row index.
26+
* `relative(Iterable<Int>): DataFrame`[`DataFrame`](DataFrame.md) with subset of rows selected by relative row index: `relative(-1..1)` will return previous, current and next row. Requested indices will be coerced to the valid range and invalid indices will be skipped
2727
* `getValue<T>(columnName)` — cell value of type `T` by this row and given `columnName`
2828
* `getValueOrNull<T>(columnName)` — cell value of type `T?` by this row and given `columnName` or `null` if there's no such column
2929
* `get(column): T` — cell value by this row and given `column`
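
The `relative` entry in this file says that requested indices are coerced to the valid range and invalid ones are skipped. A minimal plain-Kotlin sketch of that behaviour, using an ordinary list in place of a `DataFrame`; `relativeRows` is a hypothetical helper written for illustration, not the library's function:

```kotlin
// Hypothetical helper: models `relative(-1..1)` on a plain list of rows.
fun <T> relativeRows(rows: List<T>, current: Int, offsets: IntRange): List<T> =
    offsets
        .map { current + it }            // turn relative offsets into absolute indices
        .filter { it in rows.indices }   // skip indices that fall outside the valid range
        .map { rows[it] }

fun main() {
    val rows = listOf("r0", "r1", "r2", "r3")
    println(relativeRows(rows, 0, -1..1)) // [r0, r1]: there is no previous row
    println(relativeRows(rows, 2, -1..1)) // [r1, r2, r3]
}
```

For the first row the result is one element shorter, because the offset `-1` has no row to point at.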

docs/StardustDocs/topics/addDf.md

+1-1
@@ -2,7 +2,7 @@
22

33
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
44

5-
Returns [`DataFrame`](DataFrame.md) with union of columns from several given [`DataFrames`](DataFrame.md).
5+
Returns [`DataFrame`](DataFrame.md) with union of columns from several given [`DataFrame`](DataFrame.md) objects.
66

77
<!---FUN addDataFrames-->
88

docs/StardustDocs/topics/concat.md

+5-5
@@ -2,7 +2,7 @@
22

33
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
44

5-
Returns a [`DataFrame`](DataFrame.md) with the union of rows from several given [`DataFrames`](DataFrame.md).
5+
Returns a [`DataFrame`](DataFrame.md) with the union of rows from several given [`DataFrame`](DataFrame.md) objects.
66

77
`concat` is available for:
88

@@ -91,14 +91,14 @@ frameColumn.concat()
9191

9292
<!---END-->
9393

94-
If you want to take the union of columns (not rows) from several [`DataFrames`](DataFrame.md), see [`add`](add.md).
94+
If you want to take the union of columns (not rows) from several [`DataFrame`](DataFrame.md) objects, see [`add`](add.md).
9595

9696
## Schema unification
9797

98-
If input [`DataFrames`](DataFrame.md) have different schemas, every column in the resulting [`DataFrames`](DataFrame.md)
98+
If input [`DataFrame`](DataFrame.md) objects have different schemas, every column in the resulting [`DataFrame`](DataFrame.md)
9999
will get the lowest common type of the original columns with the same name.
100100

101101
For example, if one [`DataFrame`](DataFrame.md) has a column `A: Int` and another [`DataFrame`](DataFrame.md) has a column `A: Double`,
102-
the resulting ` DataFrame ` will have a column `A: Number`.
102+
the resulting [`DataFrame`](DataFrame.md) will have a column `A: Number`.
103103

104-
Missing columns in dataframes will be filled with `null`.
104+
Missing columns in [`DataFrame`](DataFrame.md) objects will be filled with `null`.
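
The schema unification rule in this file (missing columns filled with `null`, columns widened to a common type) can be illustrated with a plain-Kotlin sketch. It assumes a row is modelled as a map from column name to value; `Row` and `concatRows` are hypothetical names for this example and do not come from the library:

```kotlin
// A row is modelled as a map from column name to cell value (an assumption of this sketch).
typealias Row = Map<String, Any?>

// Hypothetical helper: union of rows, with every row widened to the union of all column names.
fun concatRows(vararg frames: List<Row>): List<Row> {
    val allColumns = frames.flatMap { frame -> frame.flatMap { it.keys } }.distinct()
    return frames.flatMap { frame ->
        frame.map { row -> allColumns.associateWith { row[it] } } // missing columns become null
    }
}

fun main() {
    val first: List<Row> = listOf(mapOf("A" to 1, "B" to "x"))
    val second: List<Row> = listOf(mapOf("A" to 2.5))
    val merged = concatRows(first, second)
    println(merged) // [{A=1, B=x}, {A=2.5, B=null}]
    // Column A now holds an Int and a Double, so the narrowest type covering both is Number.
    println(merged.all { it["A"] is Number }) // true
}
```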

docs/StardustDocs/topics/concatDf.md

+1-1
@@ -2,7 +2,7 @@
22

33
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Modify-->
44

5-
Returns [`DataFrame`](DataFrame.md) with the union of rows from several given [`DataFrames`](DataFrame.md).
5+
Returns [`DataFrame`](DataFrame.md) with the union of rows from several given [`DataFrame`](DataFrame.md) objects.
66

77
<!---FUN concatDataFrames-->
88

docs/StardustDocs/topics/create.md

+2-2
@@ -2,9 +2,9 @@
22
<show-structure depth="3"/>
33
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Create-->
44

5-
There are several ways to create [`dataframes`](DataFrame.md) from data that is already loaded into memory:
5+
There are several ways to create [`DataFrame`](DataFrame.md) objects from data that is already loaded into memory:
66
* [create columns with data](createColumn.md) and then [bundle them](createDataFrame.md) into a [`DataFrame`](DataFrame.md)
77
* create and initialize [`DataFrame`](DataFrame.md) directly from values using `vararg` variants of the [corresponding functions](createDataFrame.md).
88
* [convert Kotlin objects](createDataFrame.md#todataframe) into [`DataFrame`](DataFrame.md)
99

10-
To learn how to read [`dataframes`](DataFrame.md) from files and URLs, go to the [next section](read.md).
10+
To learn how to read dataframes from files and URLs, go to the [next section](read.md).

docs/StardustDocs/topics/createColumn.md

+1-1
@@ -42,7 +42,7 @@ val fullName by columnOf(firstName, lastName)
4242

4343
<!---END-->
4444

45-
When column elements are [`DataFrames`](DataFrame.md) it returns a [`FrameColumn`](DataColumn.md#framecolumn):
45+
When column elements are [`DataFrame`](DataFrame.md) objects it returns a [`FrameColumn`](DataColumn.md#framecolumn):
4646

4747
<!---FUN createFrameColumn-->
4848

docs/StardustDocs/topics/createDataFrame.md

+3-2
@@ -218,8 +218,9 @@ val df = students.toDataFrame {
218218

219219
### DynamicDataFrameBuilder
220220

221-
Previously mentioned dataframe constructors throw an exception when column names are duplicated.
222-
When implementing a custom operation involving multiple dataframes, or computed columns or when parsing some third-party data,
221+
Previously mentioned [`DataFrame`](DataFrame.md) constructors throw an exception when column names are duplicated.
222+
When implementing a custom operation involving multiple [`DataFrame`](DataFrame.md) objects,
223+
or computed columns or when parsing some third-party data,
223224
it might be desirable to disambiguate column names instead of throwing an exception.
224225

225226
<!---FUN duplicatedColumns-->

docs/StardustDocs/topics/explode.md

+1-1
@@ -9,7 +9,7 @@ explode(dropEmpty = true) [ { columns } ]
99
```
1010

1111
**Parameters:**
12-
* `dropEmpty` — if `true`, removes rows with empty lists or dataframes. Otherwise, they will be exploded into `null`.
12+
* `dropEmpty` — if `true`, removes rows with empty lists or [`DataFrame`](DataFrame.md) objects. Otherwise, they will be exploded into `null`.
1313

1414
**Available for:**
1515
* [`DataFrame`](DataFrame.md)
+2-2
@@ -1,4 +1,4 @@
11
[//]: # (title: Explode / implode columns)
22

3-
* [`explode`](explode.md) — distributes lists of values or [`DataFrames`](DataFrame.md) in given columns vertically, replicating data in other columns
4-
* [`implode`](implode.md) — collects column values in given columns into lists or [`DataFrames`](DataFrame.md), grouping by other columns
3+
* [`explode`](explode.md) — distributes lists of values or [`DataFrame`](DataFrame.md) object in given columns vertically, replicating data in other columns
4+
* [`implode`](implode.md) — collects column values in given columns into lists or [`DataFrame`](DataFrame.md) objects, grouping by other columns
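
As a rough illustration of the two operations described here, the following plain-Kotlin sketch models `explode` and `implode` on a single list-valued column, including the `dropEmpty` parameter from `explode.md`. The `Nested` and `Flat` classes and both functions are hypothetical stand-ins, not the library's API:

```kotlin
// Hypothetical model: each row pairs a key with a list-valued cell.
data class Nested(val key: String, val values: List<Int>)
data class Flat(val key: String, val value: Int?)

// explode: one output row per list element, replicating the key.
// An empty list is dropped when dropEmpty is true, otherwise it becomes one row holding null.
fun explode(rows: List<Nested>, dropEmpty: Boolean = true): List<Flat> =
    rows.flatMap { row ->
        when {
            row.values.isNotEmpty() -> row.values.map { Flat(row.key, it) }
            dropEmpty -> emptyList()
            else -> listOf(Flat(row.key, null))
        }
    }

// implode: collect values back into lists, grouping by the key.
fun implode(rows: List<Flat>): List<Nested> =
    rows.groupBy { it.key }
        .map { (key, group) -> Nested(key, group.mapNotNull { it.value }) }

fun main() {
    val nested = listOf(Nested("a", listOf(1, 2)), Nested("b", emptyList()))
    println(explode(nested).size)                    // 2
    println(explode(nested, dropEmpty = false).size) // 3
    println(implode(explode(nested)))                // [Nested(key=a, values=[1, 2])]
}
```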

docs/StardustDocs/topics/extensionPropertiesApi.md

+1-1
@@ -32,7 +32,7 @@ In notebooks, extension properties are generated for [`DataSchema`](schemas.md)
3232
instance after REPL line execution.
3333
After that [`DataFrame`](DataFrame.md) variable is typed with its own [`DataSchema`](schemas.md), so only valid extension properties corresponding to actual columns in DataFrame will be allowed by the compiler and suggested by completion.
3434

35-
Extension properties can be generated in IntelliJ IDEA using the [Kotlin Dataframe Gradle plugin](schemasGradle.md#configuration).
35+
Extension properties can be generated in IntelliJ IDEA using the [Kotlin DataFrame Gradle plugin](schemasGradle.md#configuration).
3636

3737
<warning>
3838
In notebooks generated properties won't appear and be updated until the cell has been executed. It often means that you have to introduce new variable frequently to sync extension properties with actual schema
+1-1
@@ -1,4 +1,4 @@
11
[//]: # (title: GroupBy / concat rows)
22

33
* [`groupBy`](groupBy.md) — groups rows of [`DataFrame`](DataFrame.md) by given key columns.
4-
* [`concat`](concat.md) — concatenates rows from several [`DataFrames`](DataFrame.md) into single [`DataFrame`](DataFrame.md).
4+
* [`concat`](concat.md) — concatenates rows from several [`DataFrame`](DataFrame.md) objects into single [`DataFrame`](DataFrame.md).

docs/StardustDocs/topics/join.md

+5-5
@@ -2,7 +2,7 @@
22

33
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Join-->
44

5-
Joins two [`DataFrames`](DataFrame.md) by join columns.
5+
Joins two [`DataFrame`](DataFrame.md) object by join columns.
66

77
```kotlin
88
join(otherDf, type = JoinType.Inner) [ { joinColumns } ]
@@ -79,7 +79,7 @@ df.join(other, "name", "city")
7979
<dataFrame src="org.jetbrains.kotlinx.dataframe.samples.api.Join.join.html"/>
8080
<!---END-->
8181

82-
If `joinColumns` is not specified, columns with the same name from both [`DataFrames`](DataFrame.md) will be used as join columns:
82+
If `joinColumns` is not specified, columns with the same name from both [`DataFrame`](DataFrame.md) objects will be used as join columns:
8383

8484
<!---FUN joinDefault-->
8585

@@ -93,12 +93,12 @@ df.join(other)
9393
### Join types
9494

9595
Supported join types:
96-
* `Inner` (default) — only matched rows from left and right [`DataFrames`](DataFrame.md)
96+
* `Inner` (default) — only matched rows from left and right [`DataFrame`](DataFrame.md) objects
9797
* `Filter` — only matched rows from left [`DataFrame`](DataFrame.md)
9898
* `Left` — all rows from left [`DataFrame`](DataFrame.md), mismatches from right [`DataFrame`](DataFrame.md) filled with `null`
9999
* `Right` — all rows from right [`DataFrame`](DataFrame.md), mismatches from left [`DataFrame`](DataFrame.md) filled with `null`
100-
* `Full` — all rows from left and right [`DataFrames`](DataFrame.md), any mismatches filled with `null`
101-
* `Exclude` — only mismatched rows from left
100+
* `Full` — all rows from left and right [`DataFrame`](DataFrame.md) objects, any mismatches filled with `null`
101+
* `Exclude` — only mismatched rows from left [`DataFrame`](DataFrame.md)
102102

103103
For every join type there is a shortcut operation:
104104

docs/StardustDocs/topics/joinWith.md

+8-6
@@ -2,7 +2,7 @@
22

33
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.JoinWith-->
44

5-
Joins two [`DataFrames`](DataFrame.md) by a join expression.
5+
Joins two [`DataFrame`](DataFrame.md) objects by a join expression.
66

77
```kotlin
88
joinWith(otherDf, type = JoinType.Inner) { joinExpression }
@@ -29,11 +29,11 @@ For example, you can match rows based on:
2929
### Join types with examples
3030

3131
Supported join types:
32-
* `Inner` (default) — only matched rows from left and right [`DataFrames`](DataFrame.md)
32+
* `Inner` (default) — only matched rows from left and right [`DataFrame`](DataFrame.md) objects
3333
* `Filter` — only matched rows from left [`DataFrame`](DataFrame.md)
3434
* `Left` — all rows from left [`DataFrame`](DataFrame.md), mismatches from right [`DataFrame`](DataFrame.md) filled with `null`
3535
* `Right` — all rows from right [`DataFrame`](DataFrame.md), mismatches from left [`DataFrame`](DataFrame.md) filled with `null`
36-
* `Full` — all rows from left and right [`DataFrames`](DataFrame.md), any mismatches filled with `null`
36+
* `Full` — all rows from left and right [`DataFrame`](DataFrame.md) objects, any mismatches filled with `null`
3737
* `Exclude` — only mismatched rows from left
3838

3939
For every join type there is a shortcut operation:
@@ -272,7 +272,7 @@ campaigns.excludeJoinWith(visits) {
272272

273273
#### Cross join
274274

275-
Can also be called cross product of two dataframes
275+
It can also be called cross product of two [`DataFrame`](DataFrame.md) objects.
276276

277277
<!---FUN crossProduct-->
278278

@@ -308,8 +308,10 @@ df1.innerJoinWith(df2) { it["index"] == right["index"] && it["age"] == right["ag
308308
<dataFrame src="org.jetbrains.kotlinx.dataframe.samples.api.JoinWith.compareInnerValues.html"/>
309309
<!---END-->
310310

311-
Here columns from both dataframes are presented as is. So [join](join.md) is better suited for `equals` relation, and joinWith is for everything else.
312-
Below are two more examples with join types that allow mismatches. Note the difference in `null` values
311+
Here columns from both [`DataFrame`](DataFrame.md) objects are presented as is.
312+
So [join](join.md) is better suited for `equals` relation, and joinWith is for everything else.
313+
Below are two more examples with join types that allow mismatches.
314+
Note the difference in `null` values
313315

314316
<!---FUN compareLeft-->
315317

docs/StardustDocs/topics/modify.md

+2-2
@@ -42,11 +42,11 @@ as [`DataFrame`](DataFrame.md) can be interpreted as a [`Collection`](https://ko
4242

4343
**Vertical (row) operations:**
4444
* [append](append.md) — add rows
45-
* [concat](concat.md) — union rows from several [`DataFrames`](DataFrame.md)
45+
* [concat](concat.md) — union rows from several [`DataFrame`](DataFrame.md) objects
4646
* [distinct](distinct.md) / [distinctBy](distinct.md#distinctby) — remove duplicated rows
4747
* [drop](drop.md) / [dropLast](sliceRows.md#droplast) / [dropWhile](sliceRows.md#dropwhile) / [dropNulls](drop.md#dropnulls) / [dropNA](drop.md#dropna) — remove rows by condition
4848
* [duplicate](duplicate.md) — duplicate rows
49-
* [explode](explode.md) — spread lists and [`DataFrames`](DataFrame.md) vertically into new rows
49+
* [explode](explode.md) — spread lists and [`DataFrame`](DataFrame.md) objects vertically into new rows
5050
* [filter](filter.md) / [filterBy](filter.md#filterby) — filter rows
5151
* [implode](implode.md) — merge column values into lists grouping by other columns
5252
* [reverse](reverse.md) — reverse rows
@@ -1,7 +1,7 @@
11
[//]: # (title: Multiple DataFrames)
22
<show-structure depth="3"/>
33

4-
* [`add`](add.md) — union of columns from several [`DataFrames`](DataFrame.md)
5-
* [`concat`](concat.md) — union of rows from several [`DataFrames`](DataFrame.md)
6-
* [`join`](join.md) — sql-like join of two [`DataFrames`](DataFrame.md) by key columns
7-
* [`joinWith`](joinWith.md) — join of two [`DataFrames`](DataFrame.md) by an expression that evaluates joined [DataRows](DataRow.md) to Boolean
4+
* [`add`](add.md) — union of columns from several [`DataFrame`](DataFrame.md) objects
5+
* [`concat`](concat.md) — union of rows from several [`DataFrame`](DataFrame.md) objects
6+
* [`join`](join.md) — sql-like join of two [`DataFrame`](DataFrame.md) objects by key columns
7+
* [`joinWith`](joinWith.md) — join of two [`DataFrame`](DataFrame.md) objects by an expression that evaluates joined [DataRows](DataRow.md) to Boolean
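
The join types listed in `join.md` and `joinWith.md` can be compared side by side with a plain-Kotlin sketch over two lists. The `Person` and `Home` classes and the four functions are hypothetical and only model the row-matching rules; they are not how the library implements joins:

```kotlin
data class Person(val name: String, val age: Int)
data class Home(val name: String, val city: String)

// Inner: only matched rows from left and right.
fun innerJoin(left: List<Person>, right: List<Home>): List<Pair<Person, Home>> =
    left.flatMap { l -> right.filter { it.name == l.name }.map { r -> l to r } }

// Left: all rows from left; mismatches from right are filled with null.
fun leftJoin(left: List<Person>, right: List<Home>): List<Pair<Person, Home?>> =
    left.flatMap { l ->
        val matches = right.filter { it.name == l.name }
        if (matches.isEmpty()) listOf<Pair<Person, Home?>>(l to null) else matches.map { r -> l to r }
    }

// Filter: only matched rows from left, without any columns from right.
fun filterJoin(left: List<Person>, right: List<Home>): List<Person> =
    left.filter { l -> right.any { it.name == l.name } }

// Exclude: only mismatched rows from left.
fun excludeJoin(left: List<Person>, right: List<Home>): List<Person> =
    left.filterNot { l -> right.any { it.name == l.name } }

fun main() {
    val people = listOf(Person("Alice", 30), Person("Bob", 25))
    val homes = listOf(Home("Alice", "London"))
    println(innerJoin(people, homes).size) // 1
    println(leftJoin(people, homes).size)  // 2
    println(filterJoin(people, homes))     // [Person(name=Alice, age=30)]
    println(excludeJoin(people, homes))    // [Person(name=Bob, age=25)]
}
```

`Right` and `Full` follow the same pattern with the roles of the two sides swapped or combined.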
