New CSV implementation #903
@@ -64,8 +67,17 @@ public data class ParserOptions(
    val dateTimeFormatter: DateTimeFormatter? = null,
    val dateTimePattern: String? = null,
    val nullStrings: Set<String>? = null,
    val skipTypes: Set<KType> = emptySet(),
Maybe `skipTypes` would be better inverted to `checkTypes`. But that would mean exposing all `Parsers` so users/CSV readers can filter which types to consider when parsing and which to skip. WDYT?
Really weird feature. Let's live with this solution for a while and revisit it later in the code.
It's mostly a performance thing, but the problem is that we don't have a user-facing way to refer to parsers, so we cannot say: "please parse this string column and try this type but not this type". I'm not sure how the API should look here
I managed to leverage `convert` to at least avoid `skipTypes = allTypesExcept(...)`. Let's see if I can remove it altogether.
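To make the `skipTypes` vs. `checkTypes` trade-off in this thread concrete, here is a minimal, self-contained sketch. It does not use the library's internals: `SketchParser`, `sketchParsers`, `inferType`, and `inferTypeChecking` are hypothetical names invented for illustration, and the three-parser registry is an assumption, not the actual parser list.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical stand-in for an internal parser: a target type plus a conversion
// that returns null when the string does not fit that type.
class SketchParser(val type: KType, val parse: (String) -> Any?)

// Illustrative registry, tried in order. NOT the library's real parser list.
val sketchParsers: List<SketchParser> = listOf(
    SketchParser(typeOf<Int>()) { it.toIntOrNull() },
    SketchParser(typeOf<Double>()) { it.toDoubleOrNull() },
    SketchParser(typeOf<Boolean>()) { it.toBooleanStrictOrNull() },
)

// "skipTypes" shape: try every parser except the skipped ones and return the
// first type that all values parse to; fall back to String otherwise.
fun inferType(values: List<String>, skipTypes: Set<KType> = emptySet()): KType =
    sketchParsers
        .filter { it.type !in skipTypes }
        .firstOrNull { parser -> values.all { parser.parse(it) != null } }
        ?.type
        ?: typeOf<String>()

// "checkTypes" shape: the caller lists the types to try, which requires knowing
// the full set of parser types in order to compute the complement.
fun inferTypeChecking(values: List<String>, checkTypes: Set<KType>): KType =
    inferType(values, skipTypes = sketchParsers.map { it.type }.toSet() - checkTypes)

fun main() {
    val column = listOf("1", "2", "3")
    println(inferType(column))
    println(inferType(column, skipTypes = setOf(typeOf<Int>())))
    println(inferTypeChecking(column, checkTypes = setOf(typeOf<Boolean>())))
}
```

In this sketch the inverted form is only expressible because the registry is visible to the caller, which mirrors the concern raised above about having no user-facing way to refer to parsers.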
Released dev version to test:
…pted both csv implementations to use convertTo. Addition of DataColumn<String>.convertTo overloads to allow for ParserOptions (for nullStrings etc.) Moved DataColumn<String>.convertToDouble to impl. Fixed nullstrings support for it, cleaned the parsers. Added tests for Issue #921
…e global setting `DataFrame.parser`.
# Conflicts: # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/util/deprecationMessages.kt
@zaleslaw @koperagen My two latest commits (https://github.com/Kotlin/dataframe/compare/703338a7042c30f8629bfc9e5979c0a6560a067f..53e2f64daa1b8167c56d5c4a4fd33a66cf78bfe2) contain a lot of new changes because I figured out how
Could you re-review if you like?
…ting the global setting `DataFrame.parser`.
… when no global locale was set.
Is this the first global setting in KDF, or did we have something earlier? Should the user set it up manually, and if it is replaced, is it global for the whole project using our library? In that sense it's closer to a visible CONFIG file or a Spring Bean. I see that it's inherited from an external component, isn't it? I'm OK with it, but it looks like a temporary solution. Could you create a ticket with the research tag for that?
In all cases, the global default can also be overridden by providing a
See the issue for the progress: #827
TODO:
Some high level overview of the changes:
`:dataframe-csv` module
`@OptIn(ExperimentalCsv::class)`: `read(Csv|Tsv|Delim)(Str)` based on Deephaven
`write(Csv|Tsv|Delim)`, and `to(Csv|Tsv|Delim)Str` based on Apache commons csv
`ParserOptions.skipTypes`
`parse` can now recognize `Char`
`ColTypes` to all the types we can parse
`readCSV` function informing the user about our new experimental functions :)
Fixes #787
Fixes #508
Fixes #589
Fixes #469
Fixes #921
Example of the new KDocs and all supported parameters:


