8 files changed (+55 -3 lines)
@@ -32,10 +32,10 @@ are followed by a list of options common to every Flux command.
 You can specify a command name without entering its full name, as long as you enter a sufficient number of characters
 such that Flux can uniquely identify the command name.
 
-For example, instead of entering `import-aggregate-xml-files`, you can enter `import-ag` as it is the only command in
-Flux with that sequence of letters:
+For example, instead of entering `import-parquet-files`, you can enter `import-p` as it is the only command in
+Flux beginning with that sequence of letters:
 
-    ./bin/flux import-ag --path path/to/data etc...
+    ./bin/flux import-p --path path/to/data etc...
 
 If Flux cannot uniquely identify the command name, it will print an error and list the command names that match what
 you entered.
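As a sketch of that ambiguous case (the abbreviation is hypothetical, and the exact error wording is not shown in these docs), `import-a` matches at least three commands documented elsewhere in this diff:

    # Flux prints an error and lists the matching command names, e.g.
    # import-aggregate-json-files, import-aggregate-xml-files, and
    # import-avro-files, rather than guessing which one was meant.
    ./bin/flux import-a --path path/to/data etc...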

@@ -86,6 +86,12 @@ it may be important to query for documents that have a particular field with a v
 The `import-avro-files` command supports aggregating related rows together to produce hierarchical documents. See
 [Aggregating rows](../aggregating-rows.md) for more information.
 
+## Reading compressed files
+
+Flux will automatically read files compressed with GZIP when they have a filename ending in `.gz`; you do not need to
+specify a compression option. As noted in the "Advanced options" section below, you can use `-Pcompression=` to
+explicitly specify a compression algorithm if Flux is not able to read your compressed files automatically.
+
 ## Advanced options
 
 The `import-avro-files` command reuses Spark's support for reading Avro files. You can include any of
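A minimal sketch of both behaviors the new section describes, following the docs' `etc...` convention; the file names are hypothetical, and `gzip` as a `-Pcompression=` value is taken from the new test at the end of this diff:

    # Automatic: the .gz suffix alone triggers decompression.
    ./bin/flux import-avro-files --path path/to/data.avro.gz etc...

    # Explicit: name the codec when the suffix does not identify it.
    ./bin/flux import-avro-files --path path/to/data -Pcompression=gzip etc...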

@@ -106,6 +106,12 @@ the content can be correctly translated to UTF-8 when written to MarkLogic - e.g
 The `import-delimited-files` command supports aggregating related rows together to produce hierarchical documents. See
 [Aggregating rows](../aggregating-rows.md) for more information.
 
+## Reading compressed files
+
+Flux will automatically read files compressed with GZIP when they have a filename ending in `.gz`; you do not need to
+specify a compression option. As noted in the "Advanced options" section below, you can use `-Pcompression=` to
+explicitly specify a compression algorithm if Flux is not able to read your compressed files automatically.
+
 ## Advanced options
 
 The `import-delimited-files` command reuses Spark's support for reading delimited text data. You can include any of
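The same pattern holds for delimited text; a sketch with a hypothetical gzipped CSV path:

    # Read and decompressed automatically because of the .gz suffix.
    ./bin/flux import-delimited-files --path path/to/data.csv.gz etc...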

@@ -83,6 +83,12 @@ the content can be correctly translated to UTF-8 when written to MarkLogic:
 etc...
 ```
 
+## Reading compressed files
+
+Flux will automatically read files compressed with GZIP when they have a filename ending in `.gz`; you do not need to
+specify a compression option. As noted in the "Advanced options" section below, you can use `-Pcompression=` to
+explicitly specify a compression algorithm if Flux is not able to read your compressed files automatically.
+
 ## Advanced options
 
 The `import-aggregate-json-files` command reuses Spark's support for reading JSON files. You can include any of
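This is the case the new `gzippedJsonLines` test below exercises; a sketch of the equivalent CLI call, reusing the gzipped JSON Lines path from that test:

    # No -Pcompression=gzip needed; the .gz suffix is handled automatically.
    ./bin/flux import-aggregate-json-files --path src/test/resources/delimited-files/line-delimited-json.txt.gz --json-lines etc...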

@@ -86,6 +86,12 @@ it may be important to query for documents that have a particular field with a v
 The `import-orc-files` command supports aggregating related rows together to produce hierarchical documents. See
 [Aggregating rows](../aggregating-rows.md) for more information.
 
+## Reading compressed files
+
+Flux will automatically read files compressed with GZIP when they have a filename ending in `.gz`; you do not need to
+specify a compression option. As noted in the "Advanced options" section below, you can use `-Pcompression=` to
+explicitly specify a compression algorithm if Flux is not able to read your compressed files automatically.
+
 ## Advanced options
 
 The `import-orc-files` command reuses Spark's support for reading ORC files. You can include any of

@@ -86,6 +86,12 @@ it may be important to query for documents that have a particular field with a v
 The `import-parquet-files` command supports aggregating related rows together to produce hierarchical documents. See
 [Aggregating rows](../aggregating-rows.md) for more information.
 
+## Reading compressed files
+
+Flux will automatically read files compressed with GZIP when they have a filename ending in `.gz`; you do not need to
+specify a compression option. As noted in the "Advanced options" section below, you can use `-Pcompression=` to
+explicitly specify a compression algorithm if Flux is not able to read your compressed files automatically.
+
 ## Advanced options
 
 The `import-parquet-files` command reuses Spark's support for reading Parquet files. You can include any of

@@ -124,6 +124,28 @@ void jsonLines() {
     verifyDoc("/delimited/lastName-3.json", "firstName-3", "lastName-3");
 }
 
+@Test
+void gzippedJsonLines() {
+    run(
+        "import-aggregate-json-files",
+        "--path", "src/test/resources/delimited-files/line-delimited-json.txt.gz",
+        "--json-lines",
+        "--connection-string", makeConnectionString(),
+        "--permissions", DEFAULT_PERMISSIONS,
+        "--collections", "delimited-json-test",
+        "--uri-template", "/delimited/{lastName}.json"
+    );
+
+    assertCollectionSize(
+        "Spark data sources will automatically handle .gz files without -Pcompression=gzip being specified.",
+        "delimited-json-test", 3
+    );
+    verifyDoc("/delimited/lastName-1.json", "firstName-1", "lastName-1");
+    verifyDoc("/delimited/lastName-2.json", "firstName-2", "lastName-2");
+    verifyDoc("/delimited/lastName-3.json", "firstName-3", "lastName-3");
+}
+
+
 @Test
 void jsonRootName() {
     run(