
Commit 8916c8d

Merge pull request #279 from marklogic/release/1.1.2
Merge 1.1.2 into master
2 parents e0796d0 + 1be7440 commit 8916c8d

29 files changed, with 321 additions and 58 deletions.

docs/api.md

+4 -4

@@ -22,15 +22,15 @@ To add Flux as a dependency to your application, add the following to your Maven
 <dependency>
     <groupId>com.marklogic</groupId>
     <artifactId>flux-api</artifactId>
-    <version>1.1.0</version>
+    <version>1.1.2</version>
 </dependency>
 ```
 
 Or if you are using Gradle, add the following to your `build.gradle` file:
 
 ```
 dependencies {
-    implementation "com.marklogic:flux-api:1.1.0"
+    implementation "com.marklogic:flux-api:1.1.2"
 }
 ```
 
@@ -97,7 +97,7 @@ buildscript {
         mavenCentral()
     }
     dependencies {
-        classpath "com.marklogic:flux-api:1.1.0"
+        classpath "com.marklogic:flux-api:1.1.2"
     }
 }
 ```
@@ -139,7 +139,7 @@ buildscript {
         mavenCentral()
     }
     dependencies {
-        classpath "com.marklogic:flux-api:1.1.0"
+        classpath "com.marklogic:flux-api:1.1.2"
         classpath("com.marklogic:ml-gradle:4.8.0") {
             exclude group: "com.fasterxml.jackson.databind"
             exclude group: "com.fasterxml.jackson.core"
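The version bump above is the only change to these snippets; consuming code is unaffected. For context, a minimal sketch of what using `flux-api` looks like once the dependency resolves, following the fluent pattern in docs/api.md. The connection string, path, and collection name below are illustrative placeholders, not part of this commit:

```
import com.marklogic.flux.api.Flux;

public class ImportFilesExample {
    public static void main(String[] args) {
        // Hypothetical usage sketch: import a directory of files into
        // MarkLogic via the fluent flux-api entry class. Adjust the
        // connection string and paths for your environment.
        Flux.importGenericFiles()
            .connectionString("flux-example-user:password@localhost:8004")
            .from(options -> options.paths("data/employees"))
            .to(options -> options.collections("employee"))
            .execute();
    }
}
```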

docs/export/export-archives.md

+12 -4

@@ -82,9 +82,9 @@ included with the `--categories` option. This option accepts a comma-delimited s
 
 If the option is not included, all metadata will be included.
 
-## Enabling point-in-time queries
+## Exporting consistent results
 
-Flux depends on MarkLogic's support for
+By default, Flux uses MarkLogic's support for
 [point-in-time queries](https://docs.marklogic.com/11.0/guide/app-dev/point_in_time#id_47946) when querying for
 documents, thus ensuring a [consistent snapshot of data](https://docs.marklogic.com/guide/java/data-movement#id_18227).
 Point-in-time queries depend on the same MarkLogic system timestamp being used for each query. Because system timestamps
@@ -102,8 +102,16 @@ by configuring the `merge timestamp` setting. The recommended practice is to
 that exceeds the expected duration of the export operation. For example, a value of `-864,000,000,000` for the merge
 timestamp would give the export operation 24 hours to complete.
 
-Flux will soon include an option to not use a snapshot for queries for when the risk of inconsistent results is deemed
-to be acceptable.
+Alternatively, you can disable the use of point-in-time queries by including the following option:
+
+```
+--no-snapshot
+```
+
+The above option will not use a snapshot for queries but instead will query for data at multiple points in time. As
+noted above in the guide for [consistent snapshots](https://docs.marklogic.com/guide/java/data-movement#id_18227), you
+may get unpredictable results if your query matches on data that changes during the export operation. If your data is
+not changing, this approach is recommended as it avoids the need to configure merge timestamp.
 
 ## Transforming document content

docs/export/export-documents.md

+13 -4

@@ -122,9 +122,9 @@ you use for running Flux to break the value into multiple lines:
 For queries expressed in XML, you may find it easier to use single quotes instead of double quotes, as single quotes
 do not require any escaping.
 
-## Enabling point-in-time queries
+## Exporting consistent results
 
-Flux depends on MarkLogic's support for
+By default, Flux uses MarkLogic's support for
 [point-in-time queries](https://docs.marklogic.com/11.0/guide/app-dev/point_in_time#id_47946) when querying for
 documents, thus ensuring a [consistent snapshot of data](https://docs.marklogic.com/guide/java/data-movement#id_18227).
 Point-in-time queries depend on the same MarkLogic system timestamp being used for each query. Because system timestamps
@@ -142,8 +142,17 @@ by configuring the `merge timestamp` setting. The recommended practice is to
 that exceeds the expected duration of the export operation. For example, a value of `-864,000,000,000` for the merge
 timestamp would give the export operation 24 hours to complete.
 
-Flux will soon include an option to not use a snapshot for queries for when the risk of inconsistent results is deemed
-to be acceptable.
+Alternatively, you can disable the use of point-in-time queries by including the following option:
+
+```
+--no-snapshot
+```
+
+The above option will not use a snapshot for queries but instead will query for data at multiple points in time. As
+noted above in the guide for [consistent snapshots](https://docs.marklogic.com/guide/java/data-movement#id_18227), you
+may get unpredictable results if your query matches on data that changes during the export operation. If your data is
+not changing, this approach is recommended as it avoids the need to configure merge timestamp.
+
 ## Transforming document content
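The `--no-snapshot` option documented above also has an API counterpart in this release: the `noSnapshot()` method added to `ReadDocumentsOptions` (see the Java changes further down). A minimal sketch of the API form, assuming the fluent `Flux` entry class; the entry-point name, connection string, and option values are illustrative, with only `noSnapshot()` coming from this commit:

```
import com.marklogic.flux.api.Flux;

public class ExportWithoutSnapshot {
    public static void main(String[] args) {
        // Hypothetical sketch: export documents while reading at multiple
        // points in time instead of via a consistent snapshot. Only do this
        // when the matched data is not changing during the export.
        Flux.exportGenericFiles()
            .connectionString("flux-example-user:password@localhost:8004")
            .from(options -> options
                .collections("employee")
                .noSnapshot())
            .to(options -> options.path("export"))
            .execute();
    }
}
```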

docs/export/export-rdf.md

+14 -6

@@ -90,11 +90,11 @@ graph value that will then be associated with every triple that Flux writes to a
 
 To compress each file written by Flux using gzip, simply include `--gzip` as an option.
 
-## Enabling point-in-time queries
+## Exporting consistent results
 
-Flux depends on MarkLogic's support for
+By default, Flux uses MarkLogic's support for
 [point-in-time queries](https://docs.marklogic.com/11.0/guide/app-dev/point_in_time#id_47946) when querying for
-documents containing RDF data, thus ensuring a [consistent snapshot of data](https://docs.marklogic.com/guide/java/data-movement#id_18227).
+documents, thus ensuring a [consistent snapshot of data](https://docs.marklogic.com/guide/java/data-movement#id_18227).
 Point-in-time queries depend on the same MarkLogic system timestamp being used for each query. Because system timestamps
 can be deleted when MarkLogic [merges data](https://docs.marklogic.com/11.0/guide/admin-guide/en/understanding-and-controlling-database-merges.html),
 you may encounter the following error that causes an export command to fail:
@@ -108,7 +108,15 @@ To resolve this issue, you must
 by configuring the `merge timestamp` setting. The recommended practice is to
 [use a negative value](https://docs.marklogic.com/11.0/guide/admin-guide/en/understanding-and-controlling-database-merges/setting-a-negative-merge-timestamp-to-preserve-fragments-for-a-rolling-window-of-time.html)
 that exceeds the expected duration of the export operation. For example, a value of `-864,000,000,000` for the merge
-timestamp would give the export operation 24 hours to complete.
+timestamp would give the export operation 24 hours to complete.
 
-Flux will soon include an option to not use a snapshot for queries for when the risk of inconsistent results is deemed
-to be acceptable.
+Alternatively, you can disable the use of point-in-time queries by including the following option:
+
+```
+--no-snapshot
+```
+
+The above option will not use a snapshot for queries but instead will query for data at multiple points in time. As
+noted above in the guide for [consistent snapshots](https://docs.marklogic.com/guide/java/data-movement#id_18227), you
+may get unpredictable results if your query matches on data that changes during the export operation. If your data is
+not changing, this approach is recommended as it avoids the need to configure merge timestamp.

docs/getting-started.md

+8 -8

@@ -15,18 +15,18 @@ This guide describes how to get started with Flux with some examples demonstrati
 ## Setup
 
 You can download the latest release of the Flux application zip from [the latest Flux release page](https://github.com/marklogic/flux/releases).
-The Flux application zip is titled `marklogic-flux-1.1.0.zip`. You can extract this zip to any location on your
+The Flux application zip is titled `marklogic-flux-1.1.2.zip`. You can extract this zip to any location on your
 filesystem that you prefer.
 
 ### Deploying the example application
 
 The examples in this guide, along with examples found throughout this documentation, depend on a small MarkLogic
 application that can be deployed to your own instance of MarkLogic server. The application can be downloaded from
 [the latest Flux release page](https://github.com/marklogic/flux/releases) in a zip titled
-`marklogic-flux-getting-started-1.1.0.zip`. To use Flux with this example application, perform the following steps:
+`marklogic-flux-getting-started-1.1.2.zip`. To use Flux with this example application, perform the following steps:
 
-1. Extract the `marklogic-flux-getting-started-1.1.0.zip` file to any location on your local filesystem.
-2. Run `cd marklogic-flux-getting-started-1.1.0` to change to the directory created by extracting the ZIP file.
+1. Extract the `marklogic-flux-getting-started-1.1.2.zip` file to any location on your local filesystem.
+2. Run `cd marklogic-flux-getting-started-1.1.2` to change to the directory created by extracting the ZIP file.
 3. Create a file named `gradle-local.properties` and add `mlPassword=your MarkLogic admin user password` to it.
 4. Examine the contents of the `gradle.properties` file to ensure that the value of `mlHost` points to your MarkLogic
    server and that the value of `mlRestPort` is a port available for a new MarkLogic app server to use.
@@ -38,15 +38,15 @@ privileges for running the examples in this guide. Finally, the application incl
 [MarkLogic TDE template](https://docs.marklogic.com/guide/app-dev/TDE) that creates a view in MarkLogic for the purpose
 of demonstrating commands that utilize a [MarkLogic Optic query](https://docs.marklogic.com/guide/app-dev/OpticAPI).
 
-It is recommended to extract the Flux application zip into the `marklogic-flux-getting-started-1.1.0` directory so that
+It is recommended to extract the Flux application zip into the `marklogic-flux-getting-started-1.1.2` directory so that
 you can easily execute the examples in this guide. After extracting the application zip, the directory should have a
 structure similar to this (not all files may be shown):
 
 ```
-./marklogic-flux-getting-started-1.1.0
+./marklogic-flux-getting-started-1.1.2
     build.gradle
     ./data
-    ./marklogic-flux-1.1.0
+    ./marklogic-flux-1.1.2
     ./gradle
     gradle.properties
     gradlew
@@ -59,7 +59,7 @@ structure similar to this (not all files may be shown):
 You can run Flux without any options to see the list of available commands. If you are using Flux to run these examples,
 first change your current directory to where you extract Flux:
 
-    cd marklogic-flux-1.1.0
+    cd marklogic-flux-1.1.2
 
 And then run the Flux executable without any options:

docs/import/import-files/json.md

+21 -1

@@ -91,6 +91,25 @@ Flux will write two separate JSON documents, each with a completely different sc
 The JSON Lines format is often useful for exporting data from MarkLogic as well. Please see
 [this guide](../../export/export-rows.md) for more information on exporting data to JSON Lines files.
 
+### Importing JSON Lines files as is
+
+When importing JSON Lines files, Flux uses the
+[Spark JSON data source](https://spark.apache.org/docs/latest/sql-data-sources-json.html) to read each line and conform
+the JSON objects to a common schema across the entire set of lines. As noted in the Advanced Options section below,
+Spark JSON provides a number of configuration options for controlling how the lines are read. These features can result
+in changes to the JSON objects, such as the keys being reordered and fields being added to match the common schema.
+
+For some use cases, you may wish to read each line "as is" without any modification to it. To do so, use the
+`--json-lines-raw` option instead of `--json-lines`. With the `--json-lines-raw` option, Flux will read each line as
+a JSON document and will not attempt to enforce any commonality across the lines. This option also has the following
+effects on the `import-aggregate-json-files` command:
+
+1. You cannot use any `-P` options as described in the "Advanced Options" section below.
+2. The `--uri-include-file-path` option has no effect as each JSON document will default to a URI including the file path.
+3. The following options also have no effect as each JSON document is intentionally left as is: `--json-root-name`, `--xml-root-name`,
+`--xml-namespace`, and `--ignore-null-fields`.
+4. You can still read a gzipped file if its filename ends in `.gz`.
+
 ## Specifying a JSON root name
 
 It is often useful to have a single "root" field in a JSON document so that it is more self-describing. It
@@ -130,7 +149,8 @@ bin\flux import-aggregate-json-files ^
 
 Flux will automatically read files compressed with gzip when they have a filename ending in `.gz`; you do not need to
 specify a compression option. As noted in the "Advanced options" section below, you can use `-Pcompression=` to
-explicitly specify a compression algorithm if Flux is not able to read your compressed files automatically.
+explicitly specify a compression algorithm if Flux is not able to read your compressed files automatically. Note
+that the use of `-Pcompression=` is only supported if the `--json-lines-raw` option is not used.
 
 ## Advanced options
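The new `--json-lines-raw` CLI option likewise has an API counterpart: the `jsonLinesRaw()` method added to `AggregateJsonFilesImporter` later in this commit. A minimal sketch, again assuming the fluent `Flux` entry class with placeholder connection details and paths:

```
import com.marklogic.flux.api.Flux;

public class ImportJsonLinesRaw {
    public static void main(String[] args) {
        // Hypothetical sketch: import a JSON Lines file with each line
        // preserved exactly as written; no common schema is enforced
        // across the lines.
        Flux.importAggregateJsonFiles()
            .connectionString("flux-example-user:password@localhost:8004")
            .from(options -> options
                .paths("data/records.jsonl")
                .jsonLinesRaw())
            .to(options -> options.collections("raw-json"))
            .execute();
    }
}
```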

docs/spark-integration.md

+4 -4

@@ -35,8 +35,8 @@ Flux integrates with [spark-submit](https://spark.apache.org/docs/latest/submitt
 submit a Flux command invocation to a remote Spark cluster. Every Flux command is a Spark application, and thus every
 Flux command, along with all of its option, can be invoked via `spark-submit`.
 
-To use Flux with `spark-submit`, first download the `marklogic-flux-1.1.0-all.jar` file from the
-[GitHub release page](https://github.com/marklogic/flux/releases/tag/1.1.0). This jar file includes Flux and all of
+To use Flux with `spark-submit`, first download the `marklogic-flux-1.1.2-all.jar` file from the
+[GitHub release page](https://github.com/marklogic/flux/releases/tag/1.1.2). This jar file includes Flux and all of
 its dependencies, excluding those of Spark itself, which will be provided via the Spark cluster that you connect to
 via `spark-submit`.
 
@@ -48,7 +48,7 @@ The following shows a notional example of running the Flux `import-files` comman
 ```
 $SPARK_HOME/bin/spark-submit --class com.marklogic.flux.spark.Submit \
     --master spark://changeme:7077 \
-    marklogic-flux-1.1.0-all.jar \
+    marklogic-flux-1.1.2-all.jar \
     import-files \
     --path path/to/data \
     --connection-string user:password@host:8000 \
@@ -59,7 +59,7 @@ $SPARK_HOME/bin/spark-submit --class com.marklogic.flux.spark.Submit \
 ```
 $SPARK_HOME\bin\spark-submit --class com.marklogic.flux.spark.Submit ^
     --master spark://changeme:7077 ^
-    marklogic-flux-1.1.0-all.jar ^
+    marklogic-flux-1.1.2-all.jar ^
     import-files ^
     --path path/to/data ^
     --connection-string user:password@host:8000 ^

examples/client-project/build.gradle

+2 -2

@@ -6,7 +6,7 @@ buildscript {
         mavenLocal()
     }
     dependencies {
-        classpath "com.marklogic:flux-api:1.1.0"
+        classpath "com.marklogic:flux-api:1.1.2"
 
         // Demonstrates removing the Jackson libraries that otherwise cause a conflict with
         // Spark, which requires Jackson >= 2.14.0 and < 2.15.0.
@@ -28,7 +28,7 @@ repositories {
 }
 
 dependencies {
-    implementation "com.marklogic:flux-api:1.1.0"
+    implementation "com.marklogic:flux-api:1.1.2"
 }
 
 tasks.register("runApp", JavaExec) {

flux-cli/build.gradle

+1 -1

@@ -17,7 +17,7 @@ dependencies {
         // The rocksdbjni dependency weighs in at 50mb and so far does not appear necessary for our use of Spark.
         exclude module: "rocksdbjni"
     }
-    implementation "com.marklogic:marklogic-spark-connector:2.4.1"
+    implementation "com.marklogic:marklogic-spark-connector:2.4.2"
     implementation "info.picocli:picocli:4.7.6"
 
     // Spark 3.4.3 depends on Hadoop 3.3.4, which depends on AWS SDK 1.12.262. As of August 2024, all public releases of

flux-cli/src/main/java/com/marklogic/flux/api/AggregateJsonFilesImporter.java

+19

@@ -17,9 +17,28 @@ interface ReadJsonFilesOptions extends ReadFilesOptions<ReadJsonFilesOptions> {
     /**
      * @param value set to true to read JSON Lines files. Defaults to reading files that either contain an array
      * of JSON objects or a single JSON object.
+     * @deprecated since 1.1.2; use {@code jsonLines()} instead.
      */
+    @SuppressWarnings("java:S1133") // Telling Sonar we don't need a reminder to remove this some day.
+    @Deprecated(since = "1.1.2", forRemoval = true)
     ReadJsonFilesOptions jsonLines(boolean value);
 
+    /**
+     * Call this to read JSON Lines files. Otherwise, defaults to reading files that either contain an array of
+     * JSON objects or a single JSON object.
+     *
+     * @since 1.1.2
+     */
+    ReadJsonFilesOptions jsonLines();
+
+    /**
+     * Call this to read JSON Lines files "as is", without any alteration to the documents associated with each
+     * line.
+     *
+     * @since 1.1.2
+     */
+    ReadJsonFilesOptions jsonLinesRaw();
+
     ReadJsonFilesOptions encoding(String encoding);
 
     ReadJsonFilesOptions uriIncludeFilePath(boolean value);
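For callers migrating off the deprecated boolean setter, the change is mechanical; `options` below stands for any `ReadJsonFilesOptions` instance:

```
// Before (deprecated as of 1.1.2, scheduled for removal):
options.jsonLines(true);

// After, using the new no-arg method:
options.jsonLines();

// Or, to preserve each line exactly as written:
options.jsonLinesRaw();
```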

flux-cli/src/main/java/com/marklogic/flux/api/RdfFilesExporter.java

+7

@@ -33,6 +33,13 @@ interface ReadTriplesDocumentsOptions {
     ReadTriplesDocumentsOptions partitionsPerForest(int partitionsPerForest);
 
     ReadTriplesDocumentsOptions logProgress(int interval);
+
+    /**
+     * Read documents at multiple points in time, as opposed to using a consistent snapshot.
+     *
+     * @since 1.1.2
+     */
+    ReadTriplesDocumentsOptions noSnapshot();
 }
 
 interface WriteRdfFilesOptions extends WriteFilesOptions<WriteRdfFilesOptions> {

flux-cli/src/main/java/com/marklogic/flux/api/ReadDocumentsOptions.java

+7

@@ -28,4 +28,11 @@ public interface ReadDocumentsOptions<T extends ReadDocumentsOptions> {
     T batchSize(int batchSize);
 
     T partitionsPerForest(int partitionsPerForest);
+
+    /**
+     * Read documents at multiple points in time, as opposed to using a consistent snapshot.
+     *
+     * @since 1.1.2
+     */
+    T noSnapshot();
 }

flux-cli/src/main/java/com/marklogic/flux/cli/Main.java

+26 -6

@@ -16,6 +16,8 @@
 import org.slf4j.LoggerFactory;
 import picocli.CommandLine;
 
+import java.io.PrintWriter;
+
 @CommandLine.Command(
     name = "./bin/flux",
 
@@ -98,12 +100,7 @@ private int executeCommand(CommandLine.ParseResult parseResult) {
             }
             command.execute(session);
         } catch (Exception ex) {
-            if (parseResult.subcommand().hasMatchedOption("--stacktrace")) {
-                logger.error("Displaying stacktrace due to use of --stacktrace option", ex);
-            }
-            String message = removeStacktraceFromExceptionMessage(ex);
-            parseResult.commandSpec().commandLine().getErr()
-                .println(String.format("%nCommand failed, cause: %s", message));
+            printException(parseResult, ex);
             return CommandLine.ExitCode.SOFTWARE;
         }
         return CommandLine.ExitCode.OK;
@@ -121,6 +118,18 @@ protected SparkSession buildSparkSession(Command selectedCommand) {
             SparkUtil.buildSparkSession();
     }
 
+    private void printException(CommandLine.ParseResult parseResult, Exception ex) {
+        if (parseResult.subcommand().hasMatchedOption("--stacktrace")) {
+            logger.error("Displaying stacktrace due to use of --stacktrace option", ex);
+        }
+        String message = removeStacktraceFromExceptionMessage(ex);
+        PrintWriter stderr = parseResult.commandSpec().commandLine().getErr();
+        stderr.println(String.format("%nCommand failed, cause: %s", message));
+        if (message != null && message.contains("XDMP-OLDSTAMP")) {
+            printMessageForTimestampError(stderr);
+        }
+    }
+
     /**
      * In some errors from our connector, such as when the custom code reader invokes invalid code,
      * Spark will oddly put the entire stacktrace into the exception message. Showing that stacktrace isn't a
@@ -148,4 +157,15 @@ private String removeStacktraceFromExceptionMessage(Exception ex) {
     private boolean isStacktraceLine(String line) {
         return line != null && line.trim().startsWith("at ");
     }
+
+    /**
+     * A user can encounter an OLDSTAMP error when exporting data with a consistent snapshot, but it can be difficult
+     * to know how to resolve the error. Thus, additional information is printed to help the user with resolving this
+     * error.
+     */
+    private void printMessageForTimestampError(PrintWriter stderr) {
+        stderr.println(String.format("To resolve an XDMP-OLDSTAMP error, consider using the --no-snapshot option " +
+            "or consult the Flux documentation at https://marklogic.github.io/flux/ for " +
+            "information on configuring your database to support point-in-time queries."));
+    }
 }
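With these changes, a failed export whose root cause is a merged-away timestamp now receives an actionable hint on stderr. Roughly, the output takes the following shape; the first line's XDMP-OLDSTAMP detail comes from the MarkLogic server and will vary, while the second line is the new hard-coded hint:

```
Command failed, cause: ... XDMP-OLDSTAMP: Timestamp too old ...
To resolve an XDMP-OLDSTAMP error, consider using the --no-snapshot option or consult the Flux documentation at https://marklogic.github.io/flux/ for information on configuring your database to support point-in-time queries.
```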
