New CSV implementation #903

Merged on Nov 25, 2024 (17 commits)

Commits
e26e51a
Parser can now parse Chars. Parser can now skipTypes, extra arg in Pa…
Jolanrensen Nov 1, 2024
21028f3
introducing new dataframe-csv module; preprocess KDocs is enabled. in…
Jolanrensen Nov 1, 2024
16c60b5
changed additionalCsv arguments to a lambda with the builder instance…
Jolanrensen Nov 5, 2024
d436086
updated fastDoubleParser to 2.0.1, removed minusSignIsFormatSymbol wo…
Jolanrensen Nov 11, 2024
7feac5a
updated deephavenCsv to 0.15.0, added hasFixedWidthColumns and fixedC…
Jolanrensen Nov 11, 2024
8c9c56f
simplified ColType.DEFAULT implementation by a lot
Jolanrensen Nov 11, 2024
6e34b73
removed @ExperimentalCsv annotation, as manually having to add a depe…
Jolanrensen Nov 12, 2024
766cad1
added Path support for new csv reader
Jolanrensen Nov 12, 2024
f751b0d
Merge branch 'master' into new-csv-implementation
Jolanrensen Nov 18, 2024
50f4f33
working on review feedback
Jolanrensen Nov 18, 2024
1cfd7cd
disabling FastDoubleParser logs except for DelimCsvTsvTests
Jolanrensen Nov 18, 2024
6bb3502
working on review feedback
Jolanrensen Nov 18, 2024
4ebb5a0
removed ParserOptions.allTypesExcept; use convertTo in this case. Ada…
Jolanrensen Nov 20, 2024
53e2f64
fixed my critical misuse of ParserOptions by completely neglecting th…
Jolanrensen Nov 22, 2024
cdef312
Merge branch 'master' into new-csv-implementation
Jolanrensen Nov 22, 2024
860a5c2
fixup! fixed my critical misuse of ParserOptions by completely neglec…
Jolanrensen Nov 22, 2024
a02b9e6
fixed arrow test by making Parsers.locale requery Locale.getDefault()…
Jolanrensen Nov 22, 2024
2 changes: 1 addition & 1 deletion .github/workflows/generated-sources-master.yml
@@ -26,7 +26,7 @@ jobs:
run: |
git config --global user.name 'github-actions[bot]'
git config --global user.email 'github-actions[bot]@users.noreply.github.com'
-git add './core/generated-sources' './docs/StardustDocs/snippets' './docs/StardustDocs/topics'
+git add './core/generated-sources' './dataframe-csv/generated-sources' './docs/StardustDocs/snippets' './docs/StardustDocs/topics'
git diff --staged --quiet || git commit -m "Automated commit of generated code"
git push
env:
6 changes: 3 additions & 3 deletions .github/workflows/generated-sources.yml
@@ -38,18 +38,18 @@ jobs:
git config --global user.name "GitHub Actions"

- name: Run Gradle task
-run: ./gradlew :core:processKDocsMain korro
+run: ./gradlew processKDocsMain korro

- name: Check for changes in generated sources
id: git-diff
-run: echo "changed=$(if git diff --quiet './core/generated-sources' './docs/StardustDocs/snippets' './docs/StardustDocs/topics'; then echo 'false'; else echo 'true'; fi)" >> $GITHUB_OUTPUT
+run: echo "changed=$(if git diff --quiet './core/generated-sources' './dataframe-csv/generated-sources' './docs/StardustDocs/snippets' './docs/StardustDocs/topics'; then echo 'false'; else echo 'true'; fi)" >> $GITHUB_OUTPUT

- name: Commit and push if changes
id: git-commit
if: steps.git-diff.outputs.changed == 'true'
run: |
git checkout -b generated-sources/docs-update-${{ github.run_number }}
-git add './core/generated-sources' './docs/StardustDocs/snippets' './docs/StardustDocs/topics'
+git add './core/generated-sources' './dataframe-csv/generated-sources' './docs/StardustDocs/snippets' './docs/StardustDocs/topics'
git commit -m "Update generated sources with recent changes"
git push origin generated-sources/docs-update-${{ github.run_number }}
echo "commit=$(git rev-parse HEAD)" >> $GITHUB_OUTPUT
3 changes: 3 additions & 0 deletions build.gradle.kts
@@ -55,12 +55,15 @@ dependencies {
api(project(":dataframe-excel"))
api(project(":dataframe-openapi"))
api(project(":dataframe-jdbc"))
+// TODO enable when it leaves the experimental phase
+// api(project(":dataframe-csv"))

kover(project(":core"))
kover(project(":dataframe-arrow"))
kover(project(":dataframe-excel"))
kover(project(":dataframe-openapi"))
kover(project(":dataframe-jdbc"))
+kover(project(":dataframe-csv"))
kover(project(":plugins:kotlin-dataframe"))
}

59 changes: 45 additions & 14 deletions core/api/core.api
@@ -3809,6 +3809,8 @@ public final class org/jetbrains/kotlinx/dataframe/api/ConvertKt {
public static final fun convert (Lorg/jetbrains/kotlinx/dataframe/DataFrame;[Lkotlin/reflect/KProperty;)Lorg/jetbrains/kotlinx/dataframe/api/Convert;
public static final fun convert (Lorg/jetbrains/kotlinx/dataframe/DataFrame;[Lorg/jetbrains/kotlinx/dataframe/columns/ColumnReference;)Lorg/jetbrains/kotlinx/dataframe/api/Convert;
public static final fun convertTo (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Lkotlin/reflect/KType;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
+public static final fun convertTo (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Lkotlin/reflect/KType;Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
+public static synthetic fun convertTo$default (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Lkotlin/reflect/KType;Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;ILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
public static final fun convertToBigDecimal (Lorg/jetbrains/kotlinx/dataframe/DataColumn;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
public static final fun convertToBigDecimalFromT (Lorg/jetbrains/kotlinx/dataframe/DataColumn;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
public static final fun convertToBoolean (Lorg/jetbrains/kotlinx/dataframe/DataColumn;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
@@ -3817,13 +3819,13 @@ public final class org/jetbrains/kotlinx/dataframe/api/ConvertKt {
public static final fun convertToByteFromT (Lorg/jetbrains/kotlinx/dataframe/DataColumn;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
public static final fun convertToDouble (Lorg/jetbrains/kotlinx/dataframe/DataColumn;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
public static final fun convertToDoubleFromString (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
-public static final fun convertToDoubleFromString (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;Z)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
+public static final fun convertToDoubleFromString (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;Ljava/util/Set;Ljava/lang/Boolean;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
public static synthetic fun convertToDoubleFromString$default (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;ILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
-public static synthetic fun convertToDoubleFromString$default (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;ZILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
+public static synthetic fun convertToDoubleFromString$default (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;Ljava/util/Set;Ljava/lang/Boolean;ILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
public static final fun convertToDoubleFromStringNullable (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
-public static final fun convertToDoubleFromStringNullable (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;Z)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
+public static final fun convertToDoubleFromStringNullable (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;Ljava/util/Set;Ljava/lang/Boolean;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
public static synthetic fun convertToDoubleFromStringNullable$default (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;ILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
-public static synthetic fun convertToDoubleFromStringNullable$default (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;ZILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
+public static synthetic fun convertToDoubleFromStringNullable$default (Lorg/jetbrains/kotlinx/dataframe/DataColumn;Ljava/util/Locale;Ljava/util/Set;Ljava/lang/Boolean;ILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
public static final fun convertToDoubleFromT (Lorg/jetbrains/kotlinx/dataframe/DataColumn;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
public static final fun convertToFloat (Lorg/jetbrains/kotlinx/dataframe/DataColumn;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
public static final fun convertToFloatFromT (Lorg/jetbrains/kotlinx/dataframe/DataColumn;)Lorg/jetbrains/kotlinx/dataframe/DataColumn;
@@ -4813,9 +4815,14 @@ public final class org/jetbrains/kotlinx/dataframe/api/GenerateCodeKt {
public abstract interface class org/jetbrains/kotlinx/dataframe/api/GlobalParserOptions {
public abstract fun addDateTimePattern (Ljava/lang/String;)V
public abstract fun addNullString (Ljava/lang/String;)V
+public abstract fun addSkipType (Lkotlin/reflect/KType;)V
public abstract fun getLocale ()Ljava/util/Locale;
+public abstract fun getNulls ()Ljava/util/Set;
+public abstract fun getSkipTypes ()Ljava/util/Set;
+public abstract fun getUseFastDoubleParser ()Z
public abstract fun resetToDefault ()V
public abstract fun setLocale (Ljava/util/Locale;)V
+public abstract fun setUseFastDoubleParser (Z)V
}

public abstract interface class org/jetbrains/kotlinx/dataframe/api/GroupBy : org/jetbrains/kotlinx/dataframe/api/Grouped {
@@ -6490,23 +6497,19 @@ public final class org/jetbrains/kotlinx/dataframe/api/ParserOptions {
public fun <init> ()V
public synthetic fun <init> (Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;)V
public synthetic fun <init> (Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;ILkotlin/jvm/internal/DefaultConstructorMarker;)V
-public fun <init> (Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;Z)V
-public synthetic fun <init> (Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;ZILkotlin/jvm/internal/DefaultConstructorMarker;)V
-public final fun component1 ()Ljava/util/Locale;
-public final fun component2 ()Ljava/time/format/DateTimeFormatter;
-public final fun component3 ()Ljava/lang/String;
-public final fun component4 ()Ljava/util/Set;
-public final fun component5 ()Z
+public fun <init> (Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;Ljava/util/Set;Ljava/lang/Boolean;)V
+public synthetic fun <init> (Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;Ljava/util/Set;Ljava/lang/Boolean;ILkotlin/jvm/internal/DefaultConstructorMarker;)V
public final synthetic fun copy (Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;)Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;
-public final fun copy (Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;Z)Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;
+public final fun copy (Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;Ljava/util/Set;Ljava/lang/Boolean;)Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;
public static synthetic fun copy$default (Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;ILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;
-public static synthetic fun copy$default (Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;ZILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;
+public static synthetic fun copy$default (Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;Ljava/util/Locale;Ljava/time/format/DateTimeFormatter;Ljava/lang/String;Ljava/util/Set;Ljava/util/Set;Ljava/lang/Boolean;ILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;
public fun equals (Ljava/lang/Object;)Z
public final fun getDateTimeFormatter ()Ljava/time/format/DateTimeFormatter;
public final fun getDateTimePattern ()Ljava/lang/String;
public final fun getLocale ()Ljava/util/Locale;
public final fun getNullStrings ()Ljava/util/Set;
-public final fun getUseFastDoubleParser ()Z
+public final fun getSkipTypes ()Ljava/util/Set;
+public final fun getUseFastDoubleParser ()Ljava/lang/Boolean;
public fun hashCode ()I
public fun toString ()Ljava/lang/String;
}
@@ -9948,6 +9951,16 @@ public final class org/jetbrains/kotlinx/dataframe/impl/ColumnAccessTrackerKt {
public static final fun trackColumnAccess (Lkotlin/jvm/functions/Function0;)Ljava/util/List;
}

+public final class org/jetbrains/kotlinx/dataframe/impl/ColumnNameGenerator {
+public fun <init> ()V
+public fun <init> (Ljava/util/List;)V
+public synthetic fun <init> (Ljava/util/List;ILkotlin/jvm/internal/DefaultConstructorMarker;)V
+public final fun addIfAbsent (Ljava/lang/String;)V
+public final fun addUnique (Ljava/lang/String;)Ljava/lang/String;
+public final fun contains (Ljava/lang/String;)Z
+public final fun getNames ()Ljava/util/List;
+}
+
public final class org/jetbrains/kotlinx/dataframe/impl/DataFrameSize {
public fun <init> (II)V
public final fun component1 ()I
@@ -10211,7 +10224,9 @@ public final class org/jetbrains/kotlinx/dataframe/impl/columns/UtilsKt {
}

public final class org/jetbrains/kotlinx/dataframe/impl/io/FastDoubleParser {
+public fun <init> ()V
public fun <init> (Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;)V
+public synthetic fun <init> (Lorg/jetbrains/kotlinx/dataframe/api/ParserOptions;ILkotlin/jvm/internal/DefaultConstructorMarker;)V
public final fun parseOrNull (Ljava/lang/CharSequence;)Ljava/lang/Double;
public final fun parseOrNull ([BIILjava/nio/charset/Charset;)Ljava/lang/Double;
public final fun parseOrNull ([CII)Ljava/lang/Double;
@@ -10273,23 +10288,38 @@ public final class org/jetbrains/kotlinx/dataframe/io/CSVType : java/lang/Enum {
public final class org/jetbrains/kotlinx/dataframe/io/ColType : java/lang/Enum {
public static final field BigDecimal Lorg/jetbrains/kotlinx/dataframe/io/ColType;
public static final field Boolean Lorg/jetbrains/kotlinx/dataframe/io/ColType;
+public static final field Char Lorg/jetbrains/kotlinx/dataframe/io/ColType;
+public static final field Companion Lorg/jetbrains/kotlinx/dataframe/io/ColType$Companion;
+public static final field DEFAULT Ljava/lang/String;
public static final field Double Lorg/jetbrains/kotlinx/dataframe/io/ColType;
+public static final field Duration Lorg/jetbrains/kotlinx/dataframe/io/ColType;
+public static final field Instant Lorg/jetbrains/kotlinx/dataframe/io/ColType;
public static final field Int Lorg/jetbrains/kotlinx/dataframe/io/ColType;
+public static final field JsonArray Lorg/jetbrains/kotlinx/dataframe/io/ColType;
+public static final field JsonObject Lorg/jetbrains/kotlinx/dataframe/io/ColType;
public static final field LocalDate Lorg/jetbrains/kotlinx/dataframe/io/ColType;
public static final field LocalDateTime Lorg/jetbrains/kotlinx/dataframe/io/ColType;
public static final field LocalTime Lorg/jetbrains/kotlinx/dataframe/io/ColType;
public static final field Long Lorg/jetbrains/kotlinx/dataframe/io/ColType;
public static final field String Lorg/jetbrains/kotlinx/dataframe/io/ColType;
+public static final field Url Lorg/jetbrains/kotlinx/dataframe/io/ColType;
public static fun getEntries ()Lkotlin/enums/EnumEntries;
public static fun valueOf (Ljava/lang/String;)Lorg/jetbrains/kotlinx/dataframe/io/ColType;
public static fun values ()[Lorg/jetbrains/kotlinx/dataframe/io/ColType;
}

+public final class org/jetbrains/kotlinx/dataframe/io/ColType$Companion {
+}
+
public final class org/jetbrains/kotlinx/dataframe/io/CommonKt {
+public static final fun asFileOrNull (Ljava/net/URL;)Ljava/io/File;
+public static final fun asUrl (Ljava/lang/String;)Ljava/net/URL;
public static final fun catchHttpResponse (Ljava/net/URL;Lkotlin/jvm/functions/Function1;)Lorg/jetbrains/kotlinx/dataframe/DataFrame;
public static final fun isFile (Ljava/net/URL;)Z
public static final fun isProtocolSupported (Ljava/net/URL;)Z
public static final fun isURL (Ljava/lang/String;)Z
+public static final fun isUrl (Ljava/lang/String;)Z
+public static final fun skippingBomCharacters (Ljava/io/InputStream;)Ljava/io/InputStream;
public static final fun toDataFrame (Ljava/util/List;Z)Lorg/jetbrains/kotlinx/dataframe/DataFrame;
public static synthetic fun toDataFrame$default (Ljava/util/List;ZILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/DataFrame;
public static final fun urlAsFile (Ljava/net/URL;)Ljava/io/File;
@@ -10315,6 +10345,7 @@ public final class org/jetbrains/kotlinx/dataframe/io/CsvKt {
public static synthetic fun readDelimStr$default (Lorg/jetbrains/kotlinx/dataframe/DataFrame$Companion;Ljava/lang/String;CLjava/util/Map;ILjava/lang/Integer;ILjava/lang/Object;)Lorg/jetbrains/kotlinx/dataframe/DataFrame;
public static final fun toCsv (Lorg/jetbrains/kotlinx/dataframe/DataFrame;Lorg/apache/commons/csv/CSVFormat;)Ljava/lang/String;
public static synthetic fun toCsv$default (Lorg/jetbrains/kotlinx/dataframe/DataFrame;Lorg/apache/commons/csv/CSVFormat;ILjava/lang/Object;)Ljava/lang/String;
+public static final fun toKType (Lorg/jetbrains/kotlinx/dataframe/io/ColType;)Lkotlin/reflect/KType;
public static final fun toType (Lorg/jetbrains/kotlinx/dataframe/io/ColType;)Lkotlin/reflect/KClass;
public static final fun writeCSV (Lorg/jetbrains/kotlinx/dataframe/DataFrame;Ljava/io/File;Lorg/apache/commons/csv/CSVFormat;)V
public static final fun writeCSV (Lorg/jetbrains/kotlinx/dataframe/DataFrame;Ljava/lang/Appendable;Lorg/apache/commons/csv/CSVFormat;)V
@@ -47,6 +47,7 @@ public enum class DataSchemaVisibility {
EXPLICIT_PUBLIC,
}

+// TODO add more options

Review comment (Collaborator): TODO: convert to issues or fix

Reply (Collaborator Author): It's already part of the umbrella issue todo-list: #827

public annotation class CsvOptions(public val delimiter: Char)

/**