Commit 956ca3d

Merge pull request #202 from marklogic/feature/path-docs
Path option now properly supports wildcards
2 parents af6cec2 + 3827de4 commit 956ca3d

7 files changed

+111
-40
lines changed

docs/common-options.md

Lines changed: 45 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,33 @@ Flux beginning with that sequence of letters:
4040
If Flux cannot uniquely identify the command name, it will print an error and list the command names that match what
4141
you entered.
4242

43+
## Reading options from a file
44+
45+
Flux supports reading options from a file. In a text file, you can either add each option name and value to a separate
46+
line:
47+
48+
```
49+
--host
50+
localhost
51+
--port
52+
8000
53+
etc...
54+
```
55+
56+
Or you can put one or more options on the same line:
57+
58+
```
59+
--host localhost
60+
--port 8000
61+
etc...
62+
```
63+
64+
You then reference the file via the `@` symbol followed by a filename:
65+
66+
./bin/flux import-files @my-options.txt
67+
68+
You can reference multiple files this way and also include additional options on the command line.
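
Reviewer note: the two options-file layouts described above should yield the same sequence of arguments. The sketch below is a simplified illustration, not Flux's or picocli's actual parser. It assumes tokens are separated only by whitespace or newlines and ignores quoting and comment handling; the class name `OptionsFileSketch` and its `expand` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OptionsFileSketch {

    // Hypothetical helper: splits options-file content into command-line tokens.
    // Assumption: whitespace and newlines are the only separators (no quoting, no comments).
    static List<String> expand(String fileContent) {
        List<String> tokens = new ArrayList<>();
        for (String line : fileContent.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            tokens.addAll(Arrays.asList(trimmed.split("\\s+")));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String onePerLine = "--host\nlocalhost\n--port\n8000\n";
        String paired = "--host localhost\n--port 8000\n";
        // Both layouts produce the tokens: --host, localhost, --port, 8000
        System.out.println(expand(onePerLine));
        System.out.println(expand(onePerLine).equals(expand(paired)));
    }
}
```

Under these assumptions, choosing one layout over the other is purely a readability decision.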
69+
4370
## Connecting to MarkLogic
4471

4572
Every command in Flux will need to connect to a MarkLogic database, either for reading data or writing data or both.
@@ -49,6 +76,8 @@ Generally, you must include at least the following information for each command:
4976
- The port of a [MarkLogic REST API app server](https://docs.marklogic.com/guide/rest-dev) connected to the database you wish to interact with.
5077
- Authentication information.
5178

79+
### Using a connection string
80+
5281
For the common use case of using digest or basic authentication with a MarkLogic app server, you can use the
5382
`--connection-string` option to specify the host, port, username, and password in a single concise option:
5483

@@ -68,6 +97,22 @@ password of `sp@r:k`, you would use the following string:
6897
For other authentication mechanisms, you must use the `--host` and `--port` options to define the host and port for
6998
your MarkLogic app server.
7099

100+
### Determining the connection type
101+
102+
The `--connection-type` option determines which of the following approaches Flux uses for connecting to MarkLogic:
103+
104+
- `GATEWAY` = the default value; Flux assumes that it cannot directly connect to each host in the MarkLogic cluster, most
105+
likely due to the value of `--host` or the host value found in `--connection-string` being that of a load balancer that
106+
controls access to MarkLogic.
107+
- `DIRECT` = Flux will try to connect to each host in the MarkLogic cluster.
108+
109+
If you do not have a load balancer in front of MarkLogic, and if Flux is able to connect to each host that hosts one
110+
or more forests for the database you wish to access, then you can set `--connection-type` to a value of `DIRECT`. This
111+
will often improve performance as Flux will be able to both connect to multiple hosts, thereby utilizing the app server
112+
threads available on each host, and also write directly to a forest on the host that it connects to.
113+
114+
### Connection options
115+
71116
All available connection options are shown in the table below:
72117

73118
| Option | Description |
@@ -99,33 +144,6 @@ All available connection options are shown in the table below:
99144
| `--username` | Username when using `DIGEST` or `BASIC` authentication. |
100145

101146

102-
## Reading options from a file
103-
104-
Flux supports reading options from a file. In a text file, you can either add each option name and value to a separate
105-
line:
106-
107-
```
108-
--host
109-
localhost
110-
--port
111-
8000
112-
etc...
113-
```
114-
115-
Or you can put one or more options on the same line:
116-
117-
```
118-
--host localhost
119-
--port 8000
120-
etc...
121-
```
122-
123-
You then reference the file via the `@` symbol followed by a filename:
124-
125-
./bin/flux import-files @my-options.txt
126-
127-
You can reference multiple files this way and also include additional options on the command line.
128-
129147
## Previewing data
130148

131149
The `--preview` option works on every command in Flux. For example, given a set of Parquet files in a directory,

docs/import/import-files/selecting-files.md

Lines changed: 18 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -17,17 +17,32 @@ of specifying paths are described below.
1717

1818
## Specifying paths
1919

20-
The `--path` option controls where files are read from. You can specify multiple occurrences of `--path`, each with a
21-
different value, to import files from many sources in a single command invocation:
20+
Each command that imports from files requires the use of the `--path` option with at least one value - for example:
21+
22+
--path path/to/files
23+
24+
You can include multiple values for the `--path` option, which can utilize both relative and absolute file paths:
25+
26+
--path relative/path/to/files /absolute/path/to/files
2227

23-
--path relative/path/to/files --path /absolute/path/to/files
2428

2529
The value of the `--path` option can be any directory or file path. You can use wildcards in any part of the path. For
2630
example, the following would select every file starting with `example` in any child directory of the root `/data`
2731
directory:
2832

2933
--path /data/*/example*
3034

35+
## Filtering files
36+
37+
You can restrict which files are read from a directory by specifying a standard
38+
[glob expression](https://en.wikipedia.org/wiki/Glob_(programming)) via the `--filter` option:
39+
40+
--path /data/examples --filter "example*.json"
41+
42+
Depending on your shell environment, you may need to include the value of `--filter` in double quotes as shown above to
43+
ensure that each asterisk is interpreted correctly. However, if you include `--filter` in an options file as
44+
described in [Common Options](../../common-options.md), you do not need double quotes around the value.
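
Reviewer note: to make the glob behaviour concrete, the sketch below applies a pattern like the one passed to `--filter` to a handful of file names. It uses Java's built-in `PathMatcher` purely for illustration; the matcher Flux uses may differ in edge cases, and the class name `GlobFilterSketch` and its `filter` method are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class GlobFilterSketch {

    // Hypothetical helper: returns the file names that match the given glob expression.
    // Assumption: Java NIO glob semantics stand in for whatever matcher Flux applies.
    static List<String> filter(String glob, List<String> fileNames) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matches = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> names = List.of("example1.json", "example2.json", "example.xml", "other.json");
        // prints [example1.json, example2.json]
        System.out.println(filter("example*.json", names));
    }
}
```

Here the program receives the pattern with its asterisk intact, which is the situation the double quotes on the command line are meant to preserve.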
45+
3146
## Reading from S3
3247

3348
Flux can read files from S3 via a path expression of the form `s3a://bucket-name/optional/path`.
@@ -56,10 +71,3 @@ By default, child directories of each directory specified by `--path` are includ
5671
option:
5772

5873
--recursive-file-lookup false
59-
60-
## Filtering files
61-
62-
You can restrict which files are read from a directory by specifying a standard
63-
[glob expression](https://en.wikipedia.org/wiki/Glob_(programming)) via the `--filter` option:
64-
65-
--path /data/examples --filter some*.json

flux-cli/src/main/java/com/marklogic/flux/impl/SparkUtil.java

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,6 @@ public static SparkSession buildSparkSession() {
2222
public static SparkSession buildSparkSession(String masterUrl) {
2323
return SparkSession.builder()
2424
.master(masterUrl)
25-
// These can be overridden via the "-C" CLI option.
2625
.config("spark.ui.showConsoleProgress", "true")
2726
.config("spark.sql.session.timeZone", "UTC")
2827
.getOrCreate();

flux-cli/src/main/java/com/marklogic/flux/impl/importdata/ReadFilesParams.java

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,15 @@
1616
public class ReadFilesParams<T extends ReadFilesOptions> implements ReadFilesOptions<T> {
1717

1818
// "path" is the name so that picocli shows "--path <path>" instead of "--path <paths>".
19-
@CommandLine.Option(required = true, names = "--path", description = "Specify one or more path expressions for selecting files to import.")
19+
@CommandLine.Option(
20+
required = true,
21+
names = "--path",
22+
description = "One or more path expressions for selecting files to import.",
23+
// For a path like "/opt/data*.xml", the user's shell may resolve the asterisk and determine a list of all the
24+
// files that match. Each file then becomes a value passed to this List. In order for that to work properly,
25+
// we need arity set to allow for unlimited values. See https://picocli.info/#_arity for more information.
26+
arity = "1..*"
27+
)
2028
private List<String> path = new ArrayList<>();
2129

2230
@CommandLine.Option(names = "--abort-on-read-failure", description = "Causes the command to abort when it fails to read a file.")
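
Reviewer note: the comment on `arity` above can be illustrated without picocli. When a shell expands an unquoted wildcard, the program receives several values after `--path`, so the option must accept all of them up to the next option. The sketch below is a hand-rolled approximation of that behaviour, not picocli's implementation; the class name `PathAritySketch` and its `collectPathValues` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PathAritySketch {

    // Hypothetical helper: collects every value following "--path" until the next
    // token that looks like an option (assumed here to start with "--").
    static List<String> collectPathValues(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("--path".equals(args[i])) {
                while (i + 1 < args.length && !args[i + 1].startsWith("--")) {
                    paths.add(args[++i]);
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // What a shell might pass after expanding "--path /opt/data*.xml" (file names are made up).
        String[] expanded = {
            "import-files", "--path", "/opt/data1.xml", "/opt/data2.xml", "--collections", "people-files"
        };
        // prints [/opt/data1.xml, /opt/data2.xml]
        System.out.println(collectPathValues(expanded));
    }
}
```

With a single-value option, the second expanded file name would instead be treated as an unexpected positional argument, which is presumably the failure this change avoids.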

flux-cli/src/test/java/com/marklogic/flux/impl/importdata/ImportAggregateXmlFilesTest.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ void elementAndPathAreRequired() {
3232
assertStderrContains(() -> run(
3333
"import-aggregate-xml-files",
3434
"--connection-string", makeConnectionString()
35-
), "Missing required options: '--path <path>', '--element <element>'");
35+
), "Missing required options: '--element <element>', '--path <path>'");
3636
}
3737

3838
@Test

flux-cli/src/test/java/com/marklogic/flux/impl/importdata/ImportFilesTest.java

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,42 @@ void multiplePaths() {
4242
verifyDocsWereWritten(uris.length, uris);
4343
}
4444

45+
@Test
46+
void pathAndFilterWithWildcardInOptionsFile() {
47+
run(
48+
"import-files",
49+
"@src/test/resources/options-files/import-people-files.txt",
50+
"--connection-string", makeConnectionString(),
51+
"--permissions", DEFAULT_PERMISSIONS,
52+
"--collections", "people-files",
53+
"--uri-replace", ".*/xml-file,''"
54+
);
55+
56+
assertCollectionSize(
57+
"Verifying that when --filter is included in an options file, double quotes should not be used for " +
58+
"the value of --filter.",
59+
"people-files", 2
60+
);
61+
}
62+
63+
@Test
64+
void pathWithArityofTwo() {
65+
run(
66+
"import-files",
67+
"--path", "src/test/resources/mixed-files/hello.txt", "src/test/resources/mixed-files/hello.xml",
68+
"--connection-string", makeConnectionString(),
69+
"--permissions", DEFAULT_PERMISSIONS,
70+
"--collections", "people-files",
71+
"--uri-replace", ".*/xml-file,''"
72+
);
73+
74+
assertCollectionSize(
75+
"Verifying that --path accepts multiple values, which is needed in order for an asterisk to work " +
76+
"in some shell environments.",
77+
"people-files", 2
78+
);
79+
}
80+
4581
@Test
4682
void withUsernameAndPasswordAndAuthType() {
4783
run(
Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
--path src/test/resources/xml-file/temp
2+
--filter people*.xml
