Jupyter: Data schema generation bug with mixed nullability of similarly named column #1222

Open
Jolanrensen opened this issue May 27, 2025 · 0 comments · May be fixed by #1252 or #1223
Jolanrensen (Collaborator) commented May 27, 2025

Can be reproduced on 1.0.0-Beta2:

val df1 = dataFrameOf(
    "group" to columnOf(
        "a" to columnOf(1, null, 3),
    )
)
val df2 = dataFrameOf(
    "group" to columnOf(
        "a" to columnOf(1, 2, 3),
    )
)

This produces the following generated schemas:

// for df1

@DataSchema(isOpen = false)
interface _DataFrameType1 {
    val a: Int?
}

@DataSchema
interface _DataFrameType {
    val group: _DataFrameType1
}

// for df2

@DataSchema(isOpen = false)
interface _DataFrameType3 {
    val a: Int
}

@DataSchema
interface _DataFrameType2 : _DataFrameType {
    override val group: _DataFrameType3 // Type of 'group' is not a subtype of overridden property 'val group: _DataFrameType1' defined in '_DataFrameType'
}

What I suspect was meant to be generated is something like this:

// for df1

@DataSchema(isOpen = true)
interface _DataFrameType1 {
    val a: Int?
}

@DataSchema
interface _DataFrameType {
    val group: _DataFrameType1
}

// for df2

@DataSchema(isOpen = true)
interface _DataFrameType3 : _DataFrameType1 { // now the non-nullable variant extends the nullable variant
    override val a: Int // requires override
}

@DataSchema
interface _DataFrameType2 : _DataFrameType {
    override val group: _DataFrameType3
}

Or, if the two schemas are kept disconnected (no inheritance between them):

// for df1

@DataSchema(isOpen = false)
interface _DataFrameType1 {
    val a: Int?
}

@DataSchema
interface _DataFrameType {
    val group: _DataFrameType1
}

// for df2

@DataSchema(isOpen = false)
interface _DataFrameType3 {
    val a: Int
}

@DataSchema
interface _DataFrameType2 {
    val group: _DataFrameType3
}
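
The compiler error quoted above follows from Kotlin's rule that an overriding property must have a type that is a subtype of the overridden property's type. As a minimal sketch of why the first proposed variant would compile, here is the same shape with plain Kotlin interfaces. The names `NullableA`, `NonNullA`, `Outer`, `OuterNonNull` and `describe` are hypothetical stand-ins, no `@DataSchema` annotations or DataFrame APIs are involved, and this says nothing about how the code generator itself should be changed:

```kotlin
// Hypothetical stand-in for _DataFrameType1 (nullable column).
interface NullableA {
    val a: Int?
}

// Hypothetical stand-in for _DataFrameType3.
// Int is a subtype of Int?, so narrowing the property type is a legal override.
interface NonNullA : NullableA {
    override val a: Int
}

// Hypothetical stand-in for _DataFrameType.
interface Outer {
    val group: NullableA
}

// Hypothetical stand-in for _DataFrameType2.
// This override is accepted only because NonNullA extends NullableA.
// If NonNullA were a standalone interface, the compiler would reject it
// with a "not a subtype of overridden property" error.
interface OuterNonNull : Outer {
    override val group: NonNullA
}

// Works for any Outer, including the narrowed OuterNonNull.
fun describe(outer: Outer): String = "a = ${outer.group.a}"

fun main() {
    val row = object : OuterNonNull {
        override val group: NonNullA = object : NonNullA {
            override val a: Int = 1
        }
    }
    val asParent: Outer = row
    val viaParent: Int? = asParent.group.a // nullable when read through the parent type
    val viaChild: Int = row.group.a        // non-null when read through the child type
    println(describe(row))          // prints "a = 1"
    println(viaParent == viaChild)  // prints "true"
}
```

The disconnected variant avoids the problem differently: with no inheritance between the outer interfaces there is no overridden property, so no subtype relationship is required.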

Jolanrensen added the bug label May 27, 2025
Jolanrensen added this to the 1.0.0-Beta3 milestone May 27, 2025
Jolanrensen added a commit that referenced this issue Jun 13, 2025
Jolanrensen added a commit that referenced this issue Jun 13, 2025
…king. Changed behavior for nested schema comparison when strictlyEqualNestedSchemas == true
Jolanrensen linked a pull request Jun 13, 2025 that will close this issue