Commit 9a7fb0f

data schemas docs restructure
1 parent aef7afb commit 9a7fb0f

16 files changed

+324
-255
lines changed

docs/StardustDocs/d.tree

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -35,17 +35,16 @@
3535
</toc-element>
3636
<toc-element topic="extensionPropertiesApi.md"/>
3737
<toc-element topic="schemas.md">
38-
<toc-element topic="schemasGradle.md"/>
39-
<toc-element topic="schemasJupyter.md"/>
38+
<toc-element topic="DataSchemaGenerationMethods.md"/>
4039
<toc-element topic="schemasInheritance.md"/>
41-
<toc-element topic="schemasCustom.md"/>
42-
<toc-element topic="schemasExternalJupyter.md"/>
43-
<toc-element topic="schemasImportSqlGradle.md"/>
44-
<toc-element topic="schemasImportOpenApiGradle.md"/>
40+
<toc-element topic="Data-Schemas-In-Kotlin-Notebook.md"/>
4541
<toc-element topic="schemasImportOpenApiJupyter.md"/>
46-
<toc-element topic="DataSchemaGenerationGradle.md"/>
42+
<toc-element topic="Gradle-Plugin.md">
43+
<toc-element topic="schemasGradle.md"/>
44+
<toc-element topic="schemasImportSqlGradle.md"/>
45+
<toc-element topic="schemasImportOpenApiGradle.md"/>
46+
</toc-element>
4747
</toc-element>
48-
<toc-element topic="DataSchemaGenerationMethods.md"/>
4948
<toc-element topic="Compiler-Plugin.md">
5049
<toc-element topic="staticInterpretation.md"/>
5150
<toc-element topic="dataSchema.md"/>
@@ -194,4 +193,5 @@
194193
<toc-element topic="_shadow_resources.md" hidden="true"/>
195194
<toc-element topic="Support.md"/>
196195
<toc-element topic="FAQ.md"/>
196+
<toc-element topic="Migration-From-Plugins.md"/>
197197
</instance-profile>

docs/StardustDocs/topics/FAQ.md

Lines changed: 12 additions & 9 deletions
@@ -174,15 +174,17 @@ and examples with beautiful [Kandy](https://kotlin.github.io/kandy) geo visualiz
174174

175175
> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated
176176
> in future releases.
177-
> The KSP plugin doesn't work for now.
177+
>
178+
> The KSP plugin is **not compatible with Kotlin 2.1 or newer**.
178179
>
179180
> At the moment, **[data schema generation is handled via dedicated methods](DataSchemaGenerationMethods.md)** instead
180-
> of relying on the plugins.
181+
> of relying on the plugins.
182+
> See [](Migration-From-Plugins.md).
181183
{style="warning"}
182184

183185
All these plugins relate to working with [dataframe schemas](schemas.md), but they serve different purposes:
184186

185-
- **[Gradle Plugin](DataSchemaGenerationGradle.md)** and **[KSP Plugin](https://github.com/Kotlin/dataframe/tree/master/plugins/symbol-processor)**
187+
- **[Gradle Plugin](Gradle-Plugin.md)** and **[KSP Plugin](https://github.com/Kotlin/dataframe/tree/master/plugins/symbol-processor)**
186188
are used to **generate data schemas** from external sources as part of the Gradle build process.
187189

188190
- **Gradle Plugin**: You declare the data source in your `build.gradle.kts` file
@@ -193,12 +195,13 @@ All these plugins relate to working with [dataframe schemas](schemas.md), but th
193195

194196
See [Data Schemas in Gradle Projects](https://kotlin.github.io/dataframe/schemasgradle.html) for more.
195197

196-
- **[Compiler Plugin](Compiler-Plugin.md)**
197-
provides **on-the-fly generation** of [extension properties](extensionPropertiesApi.md)
198-
based on an existing schema **during compilation**.
199-
However, when reading data from files or external sources (like SQL),
200-
the schema cannot be inferred automatically — you need to
201-
specify it manually or use the Gradle or KSP plugin to generate it.
198+
- **[Compiler Plugin](Compiler-Plugin.md)** provides **on-the-fly generation** of
199+
[extension properties](extensionPropertiesApi.md)
200+
based on an existing schema **during compilation**, and updates the [`DataFrame`](DataFrame.md)
201+
schema seamlessly after operations.
202+
However, when reading data from files or external sources (like SQL),
203+
the initial `DataFrame` schema cannot be inferred automatically —
204+
you need to specify it manually or generate it using the [`generate..()` methods](DataSchemaGenerationMethods.md).
202205

203206
## How do I contribute or report an issue?
204207

docs/StardustDocs/topics/gettingStarted/Modules.md

Lines changed: 1 addition & 1 deletion
@@ -517,7 +517,7 @@ The Gradle plugin allows generating [data schemas](schemas.md) from samples of d
517517
(of supported formats) like JSON, CSV, Excel files, or URLs, as well as from data fetched from SQL databases
518518
using Gradle.
519519

520-
See the [Gradle Plugin Reference](DataSchemaGenerationGradle.md) for installation
520+
See the [Gradle Plugin Reference](Gradle-Plugin.md) for installation
521521
and usage instructions in Gradle projects.
522522

523523
> By default, the Gradle plugin also applies the [KSP plugin](#ksp-plugin).

docs/StardustDocs/topics/readSqlFromCustomDatabase.md

Lines changed: 1 addition & 1 deletion
@@ -129,7 +129,7 @@ fun createAndPopulateTable(con: Connection) {
129129

130130
**Define the Table Schema**
131131

132-
Use the `@DataSchema` annotation to define a [**custom data schema**](schemasCustom.md) for the `orders` table.
132+
Use the `@DataSchema` annotation to define a [**custom data schema**](schemas.md) for the `orders` table.
133133

134134
```kotlin
135135
@DataSchema
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
1+
[//]: # (title: Data Schemas in Kotlin Notebook)
2+
3+
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
4+
5+
After execution of a cell
6+
7+
<!---FUN createDfNullable-->
8+
9+
```kotlin
10+
val df = dataFrameOf("name", "age")(
11+
"Alice", 15,
12+
"Bob", null,
13+
)
14+
```
15+
16+
<!---END-->
17+
18+
the following actions take place:
19+
20+
1. Columns in `df` are analyzed to extract the data schema.
21+
2. An empty interface with the [`DataSchema`](schema.md) annotation is generated:
22+
23+
```kotlin
24+
@DataSchema
25+
interface DataFrameType
26+
```
27+
28+
3. Extension properties for this [`DataSchema`](schema.md) are generated:
29+
30+
```kotlin
31+
val ColumnsContainer<DataFrameType>.age: DataColumn<Int?> @JvmName("DataFrameType_age") get() = this["age"] as DataColumn<Int?>
32+
val DataRow<DataFrameType>.age: Int? @JvmName("DataFrameType_age") get() = this["age"] as Int?
33+
val ColumnsContainer<DataFrameType>.name: DataColumn<String> @JvmName("DataFrameType_name") get() = this["name"] as DataColumn<String>
34+
val DataRow<DataFrameType>.name: String @JvmName("DataFrameType_name") get() = this["name"] as String
35+
```
36+
37+
Every column produces two extension properties:
38+
39+
* The property for `ColumnsContainer<DataFrameType>` returns the column
40+
* The property for `DataRow<DataFrameType>` returns the cell value
41+
42+
4. The `df` variable is typed by the schema interface:
43+
44+
```kotlin
45+
val temp = df
46+
```
47+
48+
```kotlin
49+
val df = temp.cast<DataFrameType>()
50+
```
51+
52+
> Note that the object instance remains the same after casting. See [cast](cast.md).
53+
54+
To log all these additional code executions, use the cell magic:
55+
56+
```
57+
%trackExecution -all
58+
```
59+
60+
## Custom Data Schemas
61+
62+
You can define your own [`DataSchema`](schema.md) interfaces and use them in functions and classes to represent [`DataFrame`](DataFrame.md) with
63+
a specific set of columns:
64+
65+
```kotlin
66+
@DataSchema
67+
interface Person {
68+
val name: String
69+
val age: Int
70+
}
71+
```
72+
73+
After this cell is executed in a notebook, or after annotation processing in IDEA, extension properties for data access are
74+
generated. These properties can then be used to create functions for a typed [`DataFrame`](DataFrame.md):
75+
76+
```kotlin
77+
fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")
78+
fun DataFrame<Person>.adults() = filter { age > 18 }
79+
```
80+
81+
In Kotlin Notebook these functions will work automatically for any [`DataFrame`](DataFrame.md) that matches `Person` schema:
82+
83+
<!---FUN extendedDf-->
84+
85+
```kotlin
86+
val df = dataFrameOf("name", "age", "weight")(
87+
"Merton, Alice", 15, 60.0,
88+
"Marley, Bob", 20, 73.5,
89+
)
90+
```
91+
92+
<!---END-->
93+
94+
The schema of `df` is compatible with `Person`, so the auto-generated schema interface inherits from it:
95+
96+
```kotlin
97+
@DataSchema(isOpen = false)
98+
interface DataFrameType : Person
99+
100+
val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double>
101+
val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double
102+
```
103+
104+
Although `df` has an additional column `weight`, the previously defined functions for `DataFrame<Person>` work for it:
105+
106+
<!---FUN splitNameWorks-->
107+
108+
```kotlin
109+
df.splitName()
110+
```
111+
112+
<!---END-->
113+
114+
```text
115+
firstName lastName age weight
116+
Merton Alice 15 60.000
117+
Marley Bob 20 73.500
118+
```
119+
120+
<!---FUN adultsWorks-->
121+
122+
```kotlin
123+
df.adults()
124+
```
125+
126+
<!---END-->
127+
128+
```text
129+
name age weight
130+
Marley, Bob 20 73.5
131+
```
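
Why a function declared for the base schema also accepts a frame typed by a derived schema can be sketched in plain Kotlin. This is a minimal illustration, assuming a hypothetical stand-in class `MiniFrame` that is covariant in its schema parameter (an assumption meant to mirror how a typed frame behaves, not the library's actual `DataFrame` class).

```kotlin
// Hypothetical stand-in for illustration; NOT the library's DataFrame class.
// `out T` makes the schema parameter covariant (an assumption of this sketch).
class MiniFrame<out T>(val rows: List<Map<String, Any?>>)

interface Person                  // hand-written schema
interface DataFrameType : Person  // plays the role of the auto-generated schema

// Written once against the base schema.
fun MiniFrame<Person>.adults(): MiniFrame<Person> =
    MiniFrame(rows.filter { (it["age"] as Int) > 18 })

fun main() {
    val df = MiniFrame<DataFrameType>(
        listOf(
            mapOf("name" to "Merton, Alice", "age" to 15, "weight" to 60.0),
            mapOf("name" to "Marley, Bob", "age" to 20, "weight" to 73.5),
        ),
    )
    // Compiles because MiniFrame<DataFrameType> is a subtype of MiniFrame<Person>.
    val adults = df.adults()
    println(adults.rows.map { it["name"] }) // prints [Marley, Bob]
}
```

The extra `weight` column is simply carried along, since the function only relies on what the `Person` schema declares.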
132+
133+
## Use external Data Schemas
134+
135+
Sometimes it is convenient to extract reusable code from Kotlin Notebook into a Kotlin JVM library.
136+
Schema interfaces should also be extracted if this code uses [Custom Data Schemas](#custom-data-schemas).
137+
138+
To enable support for them in Kotlin, you should register them in
139+
library [integration class](https://github.com/Kotlin/kotlin-jupyter/blob/master/docs/libraries.md) with `useSchema`
140+
function:
141+
142+
```kotlin
143+
@DataSchema
144+
interface Person {
145+
val name: String
146+
val age: Int
147+
}
148+
149+
fun DataFrame<Person>.countAdults() = count { it[Person::age] > 18 }
150+
151+
@JupyterLibrary
152+
internal class Integration : JupyterIntegration() {
153+
154+
override fun Builder.onLoaded() {
155+
onLoaded {
156+
useSchema<Person>()
157+
}
158+
}
159+
}
160+
```
161+
162+
After loading this library into the notebook, schema interfaces for all [`DataFrame`](DataFrame.md) variables that match `Person`
163+
schema will derive from `Person`:
164+
165+
<!---FUN createDf-->
166+
167+
```kotlin
168+
val df = dataFrameOf("name", "age")(
169+
"Alice", 15,
170+
"Bob", 20,
171+
)
172+
```
173+
174+
<!---END-->
175+
176+
Now `df` is assignable to `DataFrame<Person>` and `countAdults` is available:
177+
178+
```kotlin
179+
df.countAdults()
180+
```

docs/StardustDocs/topics/schemas/DataSchemaGenerationMethods.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
1-
# Data Schemas Generation From Existing DataFrame
1+
# Data Schemas Generation Methods
22

33
<web-summary>
44
Generate useful Kotlin definitions based on your DataFrame structure.
@@ -164,7 +164,7 @@ val customers: List<Customer> = df.cast<Customer>().toList()
164164

165165
<!---END-->
166166

167-
## generateCode
167+
## generateCode {id="generate-code"}
168168

169169
```kotlin
170170
inline fun <reified T> DataFrame<T>.generateCode(
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
1+
# Migration from Gradle/KSP Plugin
2+
3+
Gradle and KSP plugins were useful tools in earlier versions of Kotlin DataFrame.
4+
However, they are now being phased out. This section provides an overview of their current state and migration guidance.
5+
6+
## Gradle Plugin
7+
8+
> Do not confuse this with the [](Compiler-Plugin.md), which is a Kotlin compiler plugin
9+
> and has a different plugin ID.
10+
> {style="note"}
11+
12+
1. **Generation of [data schemas](schemas.md)** from data sources
13+
(files, databases, or external URLs).
14+
- You can copy the already generated schemas from `build/generate` into your project sources.
15+
- To generate a `DataSchema` for a [`DataFrame`](DataFrame.md) now, use
16+
the [`generate..()` methods](DataSchemaGenerationMethods.md).
17+
18+
2. **Generation of [extension properties](extensionPropertiesApi.md)** from data schemas.
19+
This is now handled by the [](Compiler-Plugin.md), which:
20+
- Generates extension properties for declared data schemas.
21+
- Automatically updates the schema and regenerates properties after structural DataFrame operations.
22+
23+
> The Gradle plugin still works and may be helpful for generating schemas from data sources.
24+
> However, it is planned for deprecation, and **we do not recommend using it going forward**.
25+
> {style="warning"}
26+
27+
If you still choose to use it, make sure to disable the automatic KSP dependency
28+
to avoid compatibility issues with Kotlin 2.1+ by adding this line to `gradle.properties`:
29+
30+
```properties
31+
kotlin.dataframe.add.ksp=false
32+
```
33+
34+
## KSP Plugin
35+
36+
> The KSP plugin is **not compatible with Kotlin 2.1 or newer**.
37+
> It is planned for deprecation or major changes, and **we do not recommend using it at this time**.
38+
> {style="warning"}
39+
40+
- **Generation of [data schemas](schemas.md)** from data sources
41+
(files, databases, or external URLs).
42+
- You can copy the already generated schemas from `build/generate/ksp` into your project sources.
43+
- To generate a `DataSchema` for a [`DataFrame`](DataFrame.md) now, use the
44+
[`generate..()` methods](DataSchemaGenerationMethods.md) instead.

docs/StardustDocs/topics/schemas/DataSchemaGenerationGradle.md renamed to docs/StardustDocs/topics/schemas/gradle/Gradle-Plugin.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
[//]: # (title: Data Shemas Generation in Gradle)
1+
[//]: # (title: Gradle Plugin (deprecated))
22

33
> The current Gradle plugin is **under consideration for deprecation** and may be officially marked as deprecated in future releases.
44
>

docs/StardustDocs/topics/schemas/schemasGradle.md renamed to docs/StardustDocs/topics/schemas/gradle/schemasGradle.md

Lines changed: 1 addition & 1 deletion
@@ -136,7 +136,7 @@ dataframes {
136136
}
137137
```
138138

139-
See [reference](DataSchemaGenerationGradle.md) and [examples](DataSchemaGenerationGradle.md#examples) for more details.
139+
See [reference](Gradle-Plugin.md) and [examples](Gradle-Plugin.md#examples) for more details.
140140

141141
</tab>
142142
</tabs>
