Releases · marklogic/marklogic-spark-connector

22 Aug 10:10

rjrudin

2.3.1

0235348

2.3.1

This patch release addresses the following issues:

Can now read document URIs that include non-US-ASCII characters. This was fixed by upgrading the Java Client to its 7.0.0 release, whose breaking changes do not affect this connector release.
Registered collatedString as a known TDE type, thereby avoiding warnings when reading rows from a TDE that uses that type.
Significantly improved performance when reading aggregate XML files and extracting a URI value from an element.
Fixed a bug where a message of "Wrote failed documents to archive file at" was logged even when no documents failed.


26 Jul 17:55

rjrudin

2.3.0

0eaf6b1

2.3.0

This minor release provides significant new functionality in support of the 1.0.0 release of the new MarkLogic Flux data movement tool. Much of this functionality is documented in the Flux documentation. We will soon have complete documentation of all the new options in this repository's documentation as well.

In the meantime, the new options in this release are listed below.

Read Options

spark.marklogic.read.javascriptFile and spark.marklogic.read.xqueryFile allow for custom code to be read from a file path.
spark.marklogic.read.partitions.javascriptFile and spark.marklogic.read.partitions.xqueryFile allow for custom code that defines partitions to be read from a file path.
Can now read document rows by specifying a list of newline-delimited URIs via the spark.marklogic.read.documents.uris option.
Can now read rows containing semantic triples in MarkLogic via spark.marklogic.read.triples.graphs, spark.marklogic.read.triples.collections, spark.marklogic.read.triples.query, spark.marklogic.read.triples.stringQuery, spark.marklogic.read.triples.uris, spark.marklogic.read.triples.directory, spark.marklogic.read.triples.options, spark.marklogic.read.triples.filtered, and spark.marklogic.read.triples.baseIri.
Can now read Flux and MLCP archives by setting spark.marklogic.read.files.type to archive or mlcp_archive.
Can control which categories of metadata are read from Flux archives via spark.marklogic.read.archives.categories.
Can now specify the encoding of a file to read via spark.marklogic.read.files.encoding.
Can now log progress as data is read from MarkLogic via spark.marklogic.read.logProgress.
Can specify whether to fail on a file read error via spark.marklogic.read.files.abortOnFailure.
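
As a rough illustration of how a few of the read options above might be assembled, the following Python sketch builds plain option dictionaries. The option names are taken from the list above; the helper read_options, the example URIs, and all values are assumptions made for illustration, and the commented usage assumes a data source format name of "marklogic" plus connection options that are not shown here.

```python
# Sketch only: option names come from the release notes above; every value is
# an illustrative assumption, not a documented default.

READ_PREFIX = "spark.marklogic.read."


def read_options(settings):
    """Expand short keys into full connector option names with string values."""
    options = {}
    for key, value in settings.items():
        # Spark data source options are strings; render booleans as true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        options[READ_PREFIX + key] = str(value)
    return options


# Read specific documents: the URIs are joined with newlines, since the option
# takes a newline-delimited list.
uris_options = read_options({
    "documents.uris": "\n".join(["/example/a.json", "/example/b.json"]),
})

# Read a Flux archive without failing the job on a single file read error.
archive_options = read_options({
    "files.type": "archive",
    "files.abortOnFailure": False,
})

# Hypothetical usage, assuming a SparkSession named `spark`, the connector on
# the classpath, and connection options that are not shown:
#   spark.read.format("marklogic").options(**uris_options).load()
```

Building the options as a dictionary first keeps the long option names in one place and makes them easy to inspect before a Spark job is started.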

Write Options

spark.marklogic.write.threadCount has been changed to match the common user understanding of "number of threads used to connect to MarkLogic". If you need to specify a thread count per partition, use spark.marklogic.write.threadCountPerPartition.
Can now log progress as data is written to MarkLogic via spark.marklogic.write.logProgress.
spark.marklogic.write.javascriptFile and spark.marklogic.write.xqueryFile allow for custom code to be read from a file path.
Setting spark.marklogic.write.archivePathForFailedDocuments to a file path will result in any failed documents being added to an archive zip file at that file path.
spark.marklogic.write.jsonRootName allows for a root field to be added to a JSON document constructed from an arbitrary row.
spark.marklogic.write.xmlRootName and spark.marklogic.write.xmlNamespace allow for an XML document to be constructed from an arbitrary row.
Options starting with spark.marklogic.write.json. will be used to configure how the connector serializes a Spark row into a JSON object.
Can use spark.marklogic.write.graph and spark.marklogic.write.graphOverride to specify the graph when writing RDF triples to MarkLogic.
Deprecated spark.marklogic.write.fileRows.documentType in favor of using spark.marklogic.write.documentType to force a document type on documents written to MarkLogic with an extension unrecognized by MarkLogic.
Can use spark.marklogic.write.files.prettyPrint to pretty-print JSON and XML files written by the connector.
Can use spark.marklogic.write.files.encoding to write files in a different encoding.
Can use spark.marklogic.write.files.rdf.format to specify an RDF type when writing triples to RDF files.
Can use spark.marklogic.write.files.rdf.graph to specify a graph when writing RDF files.
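
Because spark.marklogic.write.fileRows.documentType is deprecated in favor of spark.marklogic.write.documentType, existing job configurations may need a small rename. The sketch below shows one possible way to do that on a plain dictionary of options before passing it to Spark. The function migrate_write_options, the sample values ("XML", the thread count, the archive path), and the commented usage with a "marklogic" format name are all assumptions for illustration.

```python
# Sketch only: the two option names in the mapping come from the release notes
# above; the helper and all sample values are hypothetical.

DEPRECATED_WRITE_OPTIONS = {
    "spark.marklogic.write.fileRows.documentType": "spark.marklogic.write.documentType",
}


def migrate_write_options(options):
    """Return a copy of options with deprecated keys renamed.

    If both the deprecated key and its replacement are present, the value
    already set on the replacement key is kept.
    """
    migrated = {}
    for key, value in options.items():
        new_key = DEPRECATED_WRITE_OPTIONS.get(key)
        if new_key is None:
            migrated[key] = value
        elif new_key not in options:
            migrated[new_key] = value
    return migrated


old_options = {
    "spark.marklogic.write.fileRows.documentType": "XML",
    "spark.marklogic.write.threadCount": "16",
    "spark.marklogic.write.archivePathForFailedDocuments": "/tmp/failed-docs",
}
new_options = migrate_write_options(old_options)

# Hypothetical usage, assuming a DataFrame named `df` and connection options
# that are not shown:
#   df.write.format("marklogic").options(**new_options).mode("append").save()
```

Keeping the rename in a mapping means any further deprecations can be handled by adding one entry rather than editing each job.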


22 Feb 21:10

rjrudin

2.2.0

f1bbf9c

2.2.0

This minor release provides the following enhancements:

Document rows can now be read via MarkLogic search queries.
Document rows can also be written to MarkLogic, thereby allowing for copy operations that read document rows from one database and write them to another database.
Now depends on the MarkLogic Java Client 6.5.0 release, which eliminates some security vulnerabilities via upgrades to OkHttp and Jackson.


17 Nov 15:03

rjrudin

2.1.0

9546666

2.1.0

This minor release provides two significant new enhancements:

Rows can now be read from MarkLogic via custom code.
Rows can now be processed via custom code in MarkLogic.

These capabilities can be mixed with the existing capabilities for reading rows via Optic and writing rows as documents.

Please see the user guide for more information.


21 Jun 14:10

rjrudin

2.0.0

0fc8206

2.0.0

Initial release of the MarkLogic connector for Apache Spark 3. The previous MarkLogic connector was designed for Apache Spark 2 and required use of the MarkLogic Data Hub Framework. This connector requires Apache Spark 3 and does not depend on the Data Hub Framework.

Please see the user guide for further information.
