Data Services
Important Note: Data Services is now documented with the MarkLogic Java API at:
http://docs.marklogic.com/guide/java/DataServices
This page is retained as a convenience but not maintained. Please consult the link above.
The MarkLogic Java Client API includes development tooling and runtime proxies so a Java application can access custom data services in a MarkLogic cluster. The Java application calls strongly typed services running in the databases as if they were “plain old” Java methods. The API handles the underlying network protocol and data marshalling.
- Avoids unnecessary round-trips by encapsulating the data logic, ensuring that service implementations run close to the data
- Reduces custom plumbing code by handling network and data marshalling transparently
- Reduces the potential for API drift as requirements and implementations change by enforcing strongly typed interfaces
MarkLogic is designed to integrate into an enterprise environment via data services. A data service is a fixed interface over the data managed in MarkLogic expressed in terms of the consuming application. Data services can run queries (“Find eligible insurance plans for an applicant”), updates (“Flag this claim as fraudulent”), or both (“Adjust the rates of plans that haven’t made claims in the last year”). A MarkLogic cluster can support dozens or even hundreds of different data services operating over the data and metadata managed in a data hub.
A data service is different from a generic query interface, like JDBC or ODBC, that typically operates at the physical layer of the database. Architecturally, a data service is more like a remote procedure call or a stored procedure. The data service allows the service developer to encapsulate the physical layout of the data and constrain or enhance queries and updates with business logic.
MarkLogic provides a rich scripting environment as part of the DBMS. The developer implements data services using an imperative programming language in addition to a declarative query language. MarkLogic supports JavaScript and XQuery runtimes. MarkLogic optimizes this code to run close to the data, minimizing data transfer and leveraging cluster-wide indexes and caches.
Enterprise middle-tier business logic generally integrates many services: data services from a MarkLogic cluster as well as services from other providers. This service orchestration and business logic happen at a layer above the data infrastructure, outside of a particular service provider. The flexibility to mix and match services and decouple providers and consumers is one of the benefits of a service-oriented architecture.
You declare a function signature for each endpoint that implements a data service.
From a set of such declarations, the development tools generate a Java proxy service class that encapsulates the execution of the endpoints including the marshalling and transport of the request and response data. The middle-tier business logic can then call the methods of the generated class.
That is, a MarkLogic data service is made up of three main components:
- Endpoint Declaration: A JSON document that specifies the name of the service as well as the names and data types of the inputs and outputs.
- Endpoint Proxy: Code that exposes the service definition in Java, automatically invoking the services remotely against a MarkLogic cluster for the caller
- Endpoint Module: The implementation of a data service in MarkLogic as a JavaScript or XQuery module.
By declaring the data tier functions needed by the middle-tier business logic, the endpoint declaration establishes a division of responsibility between the Java middle-tier developer and the data service developer. The endpoint declaration acts as a contract for collaboration between the two roles.
To create a proxy service, you need a Java JDK environment with Gradle and the following MarkLogic software components:
- MarkLogic Server 9.0-6 or later
- MarkLogic Client Java API version 4.1.0 or later
- ml-gradle version 3.8.2 or later
The MarkLogic Java development tools are provided as a Gradle plugin.
This document assumes that you are familiar with Java and Gradle.
If you're unfamiliar with Gradle, the ml-gradle project lists some
resources for getting started:
Installing and learning Gradle
Typically, you create one Gradle project directory for all of the work on proxy services for one content database.
The Java Client API supports physical operations on the database.
In particular, the Java Client API provides DocumentManager
(and its derivations) and QueryManager to write, read, or
query for documents and their metadata at the URIs identifying
the documents in the database. Where a transaction must span
multiple requests, the client uses a physical Transaction object.
Proxy services complement these physical operations with logical
operations. The Java middle tier invokes endpoints, passing
and receiving values. How the operation is implemented against
the database including how values are written or read is
encapsulated entirely within the endpoint. Where an operation
must interleave middle-tier and enode tasks, the client uses
a logical session represented by a SessionState object
(as described later).
The Java Client API and proxy services connect with the database
in the same way. Both use the DatabaseClientFactory class
to instantiate a DatabaseClient object for use in requests.
A REST server used for the Java Client API cannot, however, be used for proxy services. Similarly, an appserver used for proxy services cannot be used by the Java Client API.
As a result, a DatabaseClient object used with the endpoint
proxy classes generated for proxy services cannot also be used
with the DocumentManager or QueryManager classes or with the
other predefined classes provided by the Java API.
Note: The middle-tier client cannot specify the database
explicitly when creating a DatabaseClient but, instead,
must use the default database associated with the appserver.
From the proxy service source files, you generate Java methods that call endpoint modules deployed to the modules database.
The development process consists of the following steps:
- Set up a MarkLogic App Server
- Create a proxy service directory within the Gradle project directory
- Create a file to declare the service
- Create files to declare one or more endpoint proxies for the service
- Implement the module for each endpoint proxy
- Deploy the proxy service directory to the modules database of the App Server
- Generate the Java Class from the proxy service declaration
Typically, you set up a single App Server for all of the proxy services for a content database.
The App Server configuration must have the following characteristics:
- Must have a modules database.
- Must have a root of /.
- Must not have a rewriter.
You cannot use the following App Servers that are automatically created when you install MarkLogic:
- The REST/HTTP/XDBC App Server on port 8000
- The Admin API App Server on port 8001
- The REST Management API App Server on port 8002
As noted above, you also cannot use a REST server (that is, an appserver created for the Client REST API).
To make creation and configuration of the App Server and its modules
database a repeatable operation that can be managed in a version control system,
you can put resources in the Gradle project directory and use ml-gradle to operate
on those resources. This approach is described in the
Getting started page. You must set the mlNoRestServer property to true.
As an easy expedient when learning about MarkLogic, you can instead configure the App Server and modules database manually. The repeatable approach using Gradle, however, is recommended as a long term practice.
For each proxy service, you create a separate subdirectory under the Gradle project directory.
Each proxy service directory holds all of the resources required to support the proxy service, including:
- The service declaration
- The endpoint proxy declarations
- The module called by each endpoint proxy
- Any server-side libraries to support the endpoint modules
For easier deployment to the modules database using ml-gradle,
you should create the proxy service directory under the src/main/ml-modules/root
project subdirectory. If necessary, you can specify a different
parent directory for the root directory with the mlModulePaths
property.
For instance, a project might choose to provide the priceDynamically
service in the following proxy service directory:
src/main/ml-modules/root/inventory/priceDynamically
The proxy service directory must contain exactly one service declaration
file. The service declaration file must be named service.json.
The service declaration consists of a JSON object with the following properties:
Property | Declares |
---|---|
endpointDirectory | The directory path for the installed endpoint modules within the modules database. |
$javaClass | The full name of the generated service class including the package qualification. |
desc | Optional; plain text documentation for the service (which is emitted as JavaDoc by the generated class). |
$comment | Optional; can contain an object, array, or value with developer comments about the declaration. |
The following example declares the /inventory/priceDynamically/
directory as the address of the endpoints in the modules database and
declares com.some.business.inventory.DynamicPricer
as the generated
Java class:
{
"endpointDirectory" : "/inventory/priceDynamically/",
"$javaClass" : "com.some.business.inventory.DynamicPricer"
}
Conventionally, the value of the endpointDirectory property should be
the same as the path of the proxy service directory under the special
ml-gradle src/main/ml-modules/root directory (so, the service
directory for this service.json file would conventionally be
src/main/ml-modules/root/inventory/priceDynamically).
The endpoint directory value should include the leading / and should
resemble a Linux path.
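The convention above can be sketched in Java. This is only an illustration: the class name `EndpointDirectoryConvention` and the helper `endpointDirectoryFor` are hypothetical and are not part of ml-gradle or the MarkLogic Java API. The sketch derives the conventional `endpointDirectory` value from a proxy service directory located under `src/main/ml-modules/root`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class EndpointDirectoryConvention {
    // Conventional ml-gradle modules root, as named in the text above.
    static final Path MODULES_ROOT = Paths.get("src", "main", "ml-modules", "root");

    // Hypothetical helper: builds a Linux-style path with a leading and a trailing slash
    // from the part of the service directory that lies below the modules root.
    static String endpointDirectoryFor(Path serviceDirectory) {
        Path relative = MODULES_ROOT.relativize(serviceDirectory);
        StringBuilder directory = new StringBuilder("/");
        for (Path segment : relative) {
            directory.append(segment.toString()).append('/');
        }
        return directory.toString();
    }

    public static void main(String[] args) {
        Path serviceDirectory = MODULES_ROOT.resolve("inventory").resolve("priceDynamically");
        // Prints the value that service.json would conventionally declare.
        System.out.println(endpointDirectoryFor(serviceDirectory));
    }
}
```

For the priceDynamically service, the helper yields `/inventory/priceDynamically/`, matching the `endpointDirectory` value in the service declaration shown earlier.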
After declaring the service, you populate it with endpoint proxy declarations.
The name, parameters, and return value for each endpoint are declared in a file with
the .api extension in the service directory. The file contains a JSON data structure
with the following properties:
Property | Declares |
---|---|
functionName | The name used to call the endpoint, which must match the name (without the .api extension) of the declaration file. |
desc | Optional; plain text documentation for the endpoint (emitted as JavaDoc). |
params | Optional; an array specifying the parameters of the endpoint; omitted for endpoints with no parameters. Parameter objects have name , desc , datatype , nullable , and multiple properties. |
return | Optional; an object specifying the endpoint return value; omitted for endpoints with no return value. The child object has desc , datatype , nullable , and multiple properties. |
errorDetail | Optional; specifies a value from the following enumeration to control whether error responses include stack traces:
|
The endpoint declaration is used both to generate a method in a Java class to call on the middle tier and to unmarshall the request and marshall the response when the appserver executes the endpoint module.
Note: The .api file for a proxy endpoint must be loaded into the modules database
with the endpoint module.
The following sections provide more detail about the params
and return
declarations.
A parameter definition in the params
property is an object with the following properties:
Property | Declares |
---|---|
name | The name of the parameter |
desc | Optional; a description of the parameter to include in JavaDoc. |
datatype | The datatype of the parameter (see data types). |
nullable | Optional; whether the parameter can be null (defaulting to false). |
multiple | Optional; whether the parameter can have more than one value (defaulting to false). |
The return
property of an endpoint declaration is an object with the following properties:
Property | Declares |
---|---|
desc | Optional; a description of the return value to include in JavaDoc. |
datatype | The datatype of the return value (see data types). |
nullable | Optional; whether the return value can be null (defaulting to false). |
multiple | Optional; whether the endpoint can return more than one value (defaulting to false). |
The following example declares that the lookupPricingFactors
endpoint
has two required parameters as well as a required return value:
{
"functionName" : "lookupPricingFactors",
"params" : [ {
"name" : "productCode",
"datatype" : "string"
}, {
"name" : "customerId",
"datatype" : "unsignedLong"
} ],
"return" : {
"datatype" : "jsonDocument"
}
}
You can specify atomic or node server data types for parameters and return values:
Category | Data types |
---|---|
atomics | boolean, date, dateTime, dayTimeDuration, decimal, double, float, int, long, string, time, unsignedInt, unsignedLong |
nodes | array, object, binaryDocument, jsonDocument, textDocument, xmlDocument |
The data types with direct equivalents in the Java language atomics are represented with those Java classes by default. These data types include boolean, double, float, int, long, string, unsignedInt, and unsignedLong. For instance, an int is represented with a Java Integer. The unsignedInt and unsignedLong types can be manipulated using the unsigned methods of the Java Integer and Long classes.
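As a minimal sketch of the unsigned methods mentioned above, the following uses only standard `java.lang.Long` and `java.lang.Integer` calls. The class name `UnsignedValues` and its two helper methods are hypothetical names chosen for illustration.

```java
public class UnsignedValues {
    // Converts the decimal text of an unsignedLong into the bit pattern a Java long can hold.
    static long fromUnsignedText(String text) {
        return Long.parseUnsignedLong(text);
    }

    // Renders a Java long as unsigned decimal text, for example before logging it.
    static String toUnsignedText(long value) {
        return Long.toUnsignedString(value);
    }

    public static void main(String[] args) {
        // The largest unsignedLong does not fit in a signed long, so the signed view is negative.
        long customerId = fromUnsignedText("18446744073709551615");
        System.out.println(customerId);
        System.out.println(toUnsignedText(customerId));

        // Compare and divide with the unsigned helpers rather than with <, >, or /.
        System.out.println(Long.compareUnsigned(customerId, 1L) > 0);
        System.out.println(Long.divideUnsigned(customerId, 2L));

        // The same idea applies to unsignedInt values held in an int.
        int quantity = Integer.parseUnsignedInt("4294967295");
        System.out.println(Integer.toUnsignedLong(quantity));
    }
}
```

The point of the sketch is that ordinary signed operators give misleading results once a value exceeds the signed range, so middle-tier code that handles large unsigned identifiers should stay with the unsigned helper methods.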
Other atomic types (including date, dateTime, dayTimeDuration, decimal and time) are represented as a Java String by default.
Other server atomic data types can be passed as a string and cast using the appropriate constructor on the server.
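When these atomic values arrive as Java strings, the middle tier can parse them with standard JDK classes. The following is a sketch under the assumption that the strings carry no timezone suffix and use the usual lexical forms (for instance `2019-03-15` for a date and `P1DT2H30M` for a dayTimeDuration). The class name `AtomicStringValues` and the sample values are hypothetical.

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalTime;

public class AtomicStringValues {
    // Parses a date string without a timezone suffix (assumption).
    static LocalDate toDate(String dateText) {
        return LocalDate.parse(dateText);
    }

    // Parses a dayTimeDuration string such as P1DT2H30M.
    static Duration toDuration(String durationText) {
        return Duration.parse(durationText);
    }

    // Parses a decimal string without losing precision.
    static BigDecimal toDecimal(String decimalText) {
        return new BigDecimal(decimalText);
    }

    // Parses a time string without a timezone suffix (assumption).
    static LocalTime toTime(String timeText) {
        return LocalTime.parse(timeText);
    }

    public static void main(String[] args) {
        System.out.println(toDate("2019-03-15").getDayOfWeek());
        System.out.println(toDuration("P1DT2H30M").toMinutes());
        System.out.println(toDecimal("19.99").add(BigDecimal.ONE));
        System.out.println(toTime("10:15:30").plusHours(1));
    }
}
```

Values that include a timezone would need a different parser (for example `java.time.OffsetTime`), so treat the helpers as a starting point rather than a complete mapping.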
A binaryDocument value is represented as an InputStream by default. All other node data types are represented as a Reader by default.
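A caller that receives a node value as a `Reader` typically has to drain and close it. The following sketch shows one way to do that with the standard library only. The class name `NodeValues`, the helper `readAll`, and the sample JSON text are hypothetical, and a `StringReader` stands in for the value an endpoint would return.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class NodeValues {
    // Drains the reader into a String and closes it when done.
    static String readAll(Reader reader) {
        try (Reader in = reader) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int count;
            while ((count = in.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a jsonDocument value returned by an endpoint.
        Reader returned = new StringReader("{\"discount\":0.1}");
        System.out.println(readAll(returned));
    }
}
```

Reading the whole value into memory is convenient for small documents; for large documents, passing the `Reader` directly to a streaming parser avoids the intermediate `String`.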
The array
and object
data types differ from the jsonDocument
data type
in not having a document node at the root, which can provide a more natural
and efficient JSON value for manipulating in SJS (Server-Side JavaScript).
Some server data types can be represented with an alternative Java class
instead of the default Java representation. For example, date is represented
by default as a String, but you can choose to use java.time.LocalDate instead.
To specify an alternative Java class, supply the fully qualified class name
in the $javaClass
property of a parameter or return type. You must still specify
the server data type in the datatype
property.
The following table lists server data types with their available alternative representations:
Server Data Type | Mappable Java Classes |
---|---|
date | java.time.LocalDate |
dateTime | java.util.Date, java.time.LocalDateTime, java.time.OffsetDateTime |
dayTimeDuration | java.time.Duration |
decimal | java.math.BigDecimal |
time | java.time.LocalTime, java.time.OffsetTime |
array | java.io.InputStream, java.io.Reader, java.lang.String, com.fasterxml.jackson.databind.node.ArrayNode, com.fasterxml.jackson.core.JsonParser |
object | java.io.InputStream, java.io.Reader, java.lang.String, com.fasterxml.jackson.databind.node.ObjectNode, com.fasterxml.jackson.core.JsonParser |
binaryDocument | java.io.InputStream |
jsonDocument | java.io.InputStream, java.io.Reader, java.lang.String, com.fasterxml.jackson.databind.JsonNode, com.fasterxml.jackson.core.JsonParser |
textDocument | java.io.InputStream, java.io.Reader, java.lang.String |
xmlDocument | java.io.InputStream, java.io.Reader, java.lang.String, org.w3c.dom.Document, org.xml.sax.InputSource, javax.xml.transform.Source, javax.xml.stream.XMLEventReader, javax.xml.stream.XMLStreamReader |
The following example represents the occurred
date parameter
as a Java LocalDate and represents the returned JSON document
as a Jackson JsonNode.
{
"functionName" : "produceReport",
"params":[ {
"name":"id", "datatype":"int"
}, {
"name":"occurred", "datatype":"date",
"$javaClass":"java.time.LocalDate"
} ],
"return" : {
"datatype":"jsonDocument",
"$javaClass":"com.fasterxml.jackson.databind.JsonNode"}
}
Ordinarily, the database server doesn't keep any state associated with a call to an endpoint (with the obvious but important exception of documents persisted in the database). When the middle tier sends all of the input needed for a data tier operation, the operation can be completed in a single request. This approach typically maximizes performance and minimizes load.
Some operations, however, require sessions that coordinate multiple requests. Examples of such operations include:
- Interleaving middle tier and data tier operations (such as multistatement transactions in which the middle tier logic must be inserted between the initial database change and a subsequent database change)
- Host affinity with an enode when working with a load balancer to exploit query caches on the enode.
You can handle these edge cases by calling the endpoints in
a session. If an endpoint needs to participate in a session,
its declaration must include exactly one parameter with the session
data type. The session parameter may be nullable but not multiple
(and may never be a return value).
If at least one endpoint has a session parameter, the generated class
provides a newSessionState()
factory that returns a SessionState
object. The expected pattern of use:
- Construct a session object when a new session is needed.
- Pass the same session object on each call that should execute in the same session.
Where endpoint modules need to participate in the same session, you must declare a session parameter for each of the corresponding endpoint proxies and document the expectations for coordination in the middle tier consumer code. For instance, if one session endpoint starts a multistatement transaction, another continues work in the same multistatement transaction, and a third commits the transaction, the documentation should explain that the same session should be used with each call and the sequence in which the calls should be made.
The proxy service doesn't end the session explicitly. Instead, the session eventually times out (as controlled by the configuration of the App Server). The middle tier code is responsible for calling an endpoint module to commit a multistatement transaction before the session times out.
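The calling pattern for sessions can be sketched without a server. Everything in the following example is a hypothetical stand-in: the nested `SessionState` and `ClaimUpdater` interfaces only imitate the shape of a generated proxy with three session endpoints, and `FakeClaimUpdater` is an in-memory substitute for the data tier. The part that carries over to real code is `reviewClaim`, which creates one session and passes it on every call in the sequence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class SessionPattern {
    // Hypothetical stand-in for the session object described above.
    interface SessionState {
        String getSessionId();
    }

    // Hypothetical stand-in for a generated proxy whose endpoints each take a session parameter.
    interface ClaimUpdater {
        SessionState newSessionState();
        void openClaim(SessionState session, String claimId);
        void flagClaim(SessionState session, String reason);
        List<String> commit(SessionState session);
    }

    // In-memory fake that groups pending work by session id, imitating a multistatement transaction.
    static class FakeClaimUpdater implements ClaimUpdater {
        private final Map<String, List<String>> pending = new HashMap<>();

        public SessionState newSessionState() {
            String id = UUID.randomUUID().toString();
            return () -> id;
        }

        public void openClaim(SessionState session, String claimId) {
            pending.computeIfAbsent(session.getSessionId(), k -> new ArrayList<>()).add("open " + claimId);
        }

        public void flagClaim(SessionState session, String reason) {
            pending.computeIfAbsent(session.getSessionId(), k -> new ArrayList<>()).add("flag " + reason);
        }

        public List<String> commit(SessionState session) {
            List<String> work = pending.remove(session.getSessionId());
            return work == null ? new ArrayList<>() : work;
        }
    }

    // Middle-tier sequence: one session object, reused for start, continue, and commit.
    static List<String> reviewClaim(ClaimUpdater updater, String claimId) {
        SessionState session = updater.newSessionState();
        updater.openClaim(session, claimId);
        // Middle-tier logic would run here, between the two database changes.
        updater.flagClaim(session, "duplicate");
        return updater.commit(session);
    }

    public static void main(String[] args) {
        System.out.println(reviewClaim(new FakeClaimUpdater(), "claim-17"));
    }
}
```

Because the session is never ended explicitly, the commit call at the end of `reviewClaim` is the step that must happen before the App Server times the session out.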
You implement the data operations for an endpoint proxy in an XQuery or Server-Side JavaScript endpoint module. The proxy service directory of your project must contain exactly one endpoint module for each endpoint declaration in your service.
An endpoint module must have the same base name as the endpoint
declaration, and either a .xqy
(XQuery) or .sjs
(JavaScript)
extension, depending on the implementation language.
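The naming rule lends itself to a simple pre-deployment check. The following is a sketch only: the class `EndpointModuleCheck` and its method are hypothetical and not part of the MarkLogic tooling. Given the file names in a proxy service directory, it reports every `.api` declaration that does not have exactly one module with the same base name and a `.sjs` or `.xqy` extension.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EndpointModuleCheck {
    // Returns the .api files that have no matching module or that have both an .sjs and an .xqy module.
    static List<String> declarationsWithoutSingleModule(List<String> fileNames) {
        Set<String> names = new HashSet<>(fileNames);
        List<String> problems = new ArrayList<>();
        for (String fileName : fileNames) {
            if (!fileName.endsWith(".api")) {
                continue;
            }
            String baseName = fileName.substring(0, fileName.length() - ".api".length());
            boolean hasSjs = names.contains(baseName + ".sjs");
            boolean hasXqy = names.contains(baseName + ".xqy");
            // Equal flags mean either no module at all or two competing modules.
            if (hasSjs == hasXqy) {
                problems.add(fileName);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> directory = Arrays.asList(
            "service.json",
            "lookupPricingFactors.api", "lookupPricingFactors.sjs",
            "produceReport.api");
        // Reports produceReport.api, which has no module yet.
        System.out.println(declarationsWithoutSingleModule(directory));
    }
}
```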
The App Server handles marshalling and unmarshalling for the endpoint. That is, the endpoint doesn't interact directly with the transport layer (which, internally, is currently HTTP).
The endpoint module must define an external variable for each parameter
in the endpoint declaration. In an SJS endpoint, use a var
statement at the top of the module with no initialization of the
variable. In an XQuery endpoint, use an external variable with
the server data type corresponding to the parameter data type.
The endpoint module must also return a value with the appropriate data type.
For the lookupPricingFactors endpoint whose declaration was shown earlier, the SJS endpoint module would resemble the following fragment:
'use strict';
var productCode; // an xs.string value
var customerId; // an xs.unsignedLong value
... /* the code that produces a JSON document as output */
The equivalent XQuery endpoint module would resemble the following fragment:
xquery version "1.0-ml";
declare variable $productCode as xs:string external;
declare variable $customerId as xs:unsignedLong external;
declare option xdmp:mapping "false";
... (: the code that produces a JSON document as output :)
As a convenience, you can use the initializeModule Gradle task
to create the skeleton for an endpoint module from an endpoint
declaration. You specify the path (relative to the project directory)
for the endpoint declaration with the endpointDeclarationFile property and
the module extension (which can be either sjs or xqy) with
the moduleExtension property.
Your Gradle build script should apply the com.marklogic.ml-development-tools
plugin. You can execute the Gradle task using any of the following
techniques:
- By setting the properties in the gradle.properties file and specifying the initializeModule task on the gradle command line
- By specifying the properties with the -P option as well as the initializeModule task on the gradle command line
- By supplying a build script with a custom task of the com.marklogic.client.tools.gradle.ModuleInitTask type
For the command-line approach, the Gradle build script would resemble the following example:
plugins {
id 'com.marklogic.ml-development-tools' version '4.1.1'
}
On Linux, the command-line for initializing the lookupPricingFactors.sjs
SJS endpoint module from the lookupPricingFactors.api
endpoint
declaration might resemble the following example:
gradle \
-PendpointDeclarationFile=src/main/ml-modules/root/inventory/priceDynamically/lookupPricingFactors.api \
-PmoduleExtension=sjs \
initializeModule
Once each .api
endpoint declaration file has an equivalent
endpoint module to implement the endpoint, you can load the
proxy service directory into the modules database and
generate the proxy service Java class. (The Java code
generation checks the endpoint module in the service
directory to determine how to invoke the endpoint.)
You must load the resources from the proxy service directory into
the modules database of the App Server. Your resources must be
deployed to the same database directory as the value of the endpointDirectory
property of the service declaration file (service.json).
To load a directory into the modules database, you can use either of the
mlLoadModules
or mlReloadModules
tasks provided by ml-gradle. You supply
the properties required for deployment including the following:
- mlHost - required
- mlAppServicesUsername - required if not admin and mlPassword not set
- mlAppServicesPassword - required if not admin and mlUsername not set
- mlAppServicesPort - required if not 8000
- mlModulesDatabaseName - required
- mlModulePermissions - required
- mlNoRestServer - required to be true
- mlReplaceTokensInModules - typically false
If you didn't create the proxy service directory under the src/main/ml-modules/root
project subdirectory, you must specify the parent directory
for the root directory with the mlModulePaths
property.
You can supply properties using a gradle.properties file or a task.
After you have configured the properties, the command to load
the modules would resemble the following example (or the
equivalent with mlReloadModules):
gradle mlLoadModules
For more information, see:
A proxy service class is a Java interface for calling the endpoint modules for your service on the MarkLogic enode. You generate the proxy service class from the resources in the proxy service directory.
The proxy service class has the name specified by the $javaClass
property of the service declaration file (service.json). The class has
one method for each endpoint declaration with an associated
endpoint module in the proxy service directory.
To generate the class, you use the generateEndpointProxies Gradle task.
You specify the path (relative to the project directory) of the
service declaration file (service.json) with the
serviceDeclarationFile property. You can also specify the output
directory with the javaBaseDirectory property or omit the
property to use the default (which is the src/main/java
subdirectory of the project directory).
Your Gradle build script should apply the com.marklogic.ml-development-tools
plugin. You can execute the task using any of the following techniques:
- By setting the properties in the gradle.properties file and specifying the generateEndpointProxies task on the gradle command line
- By specifying the properties with the -P option as well as the generateEndpointProxies task on the gradle command line
- By supplying a build script with a custom task of the com.marklogic.client.tools.gradle.EndpointProxiesGenTask type
- By supplying a build script with the endpointProxiesConfig extension configuration and specifying the generateEndpointProxies task on the gradle command line
For the custom task approach, the Gradle build script for generating a class with a method for each endpoint in the priceDynamically service might resemble the following example:
plugins {
id 'com.marklogic.ml-development-tools' version '4.1.1'
}
task generateDynamicPricer(type: com.marklogic.client.tools.gradle.EndpointProxiesGenTask) {
serviceDeclarationFile = 'src/main/ml-modules/root/inventory/priceDynamically/service.json'
}
The command-line to execute the custom task would resemble the following example:
gradle generateDynamicPricer
You only need to regenerate the proxy service class when the list of endpoints or the name, parameters, or return value for an endpoint changes. You don't need to regenerate the proxy service class after changing the module that implements the endpoint.
In general, you can work with your generated proxy service Java class in the same way as with manually written Java source files.
The generated class has an on() static method that is a
factory for constructing the class. The on() method
requires a DatabaseClient for the App Server. The database
client can be constructed using the DatabaseClientFactory
class of the Java API.
Note: You cannot specify the database explicitly when
creating the DatabaseClient
but, instead, must use the
default database associated with the appserver.
After generating the proxy service class, you compile it
in the usual way. In particular, by generating the proxy
service class in the conventional directory for Gradle
(which is src/main/java
) and declaring a dependency
on the MarkLogic Java API in the build script, you
can use Gradle to compile the generated class without
other configuration.
Once your proxy service is deployed to the MarkLogic modules database, you can test your proxy service Java class in the same way as other Java classes.
To write functional tests that confirm the endpoint modules work correctly, you can use any general-purpose test framework (for instance, JUnit). The test framework should
- Call the on() static factory method to construct an instance.
- Call the appropriate method to invoke the endpoint module.
- Inspect the returned value to confirm the operation of the endpoint module.
Because the generated proxy service class is provided as a Java interface, you can replace the implementation with a mock implementation of the interface for testing a middle-tier consumer.
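As a sketch of that mocking approach, the following example tests a middle-tier consumer without a MarkLogic connection. The nested `DynamicPricer` interface is a hand-written stand-in whose method signature is an assumption about what the tooling would generate for the lookupPricingFactors declaration. `QuoteBuilder` and the returned JSON text are hypothetical as well.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class MockedPricerExample {
    // Hand-written stand-in for the generated interface (assumed signature).
    interface DynamicPricer {
        Reader lookupPricingFactors(String productCode, Long customerId);
    }

    // Hypothetical middle-tier consumer that depends only on the interface.
    static class QuoteBuilder {
        private final DynamicPricer pricer;

        QuoteBuilder(DynamicPricer pricer) {
            this.pricer = pricer;
        }

        // Calls the service and returns the pricing factors as JSON text.
        String factorsFor(String productCode, long customerId) {
            try (Reader factors = pricer.lookupPricingFactors(productCode, customerId)) {
                StringBuilder text = new StringBuilder();
                char[] buffer = new char[1024];
                int count;
                while ((count = factors.read(buffer)) != -1) {
                    text.append(buffer, 0, count);
                }
                return text.toString();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        // Mock implementation: returns canned JSON instead of calling an endpoint module.
        DynamicPricer mock = (productCode, customerId) ->
            new StringReader("{\"product\":\"" + productCode + "\",\"factor\":0.9}");
        System.out.println(new QuoteBuilder(mock).factorsFor("widget-1", 42L));
    }
}
```

In a real project the consumer would receive the instance returned by the generated `on()` factory in production and a mock like the one above in unit tests, so the consumer code itself does not change.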
The generated class has JavaDoc comments based on the desc
properties from the service declaration and endpoint
declarations. You can generate JavaDoc for the middle tier
consumer of the proxy service class in the usual way.
Finally, you can create a jar file with the compiled executable proxy service class in the usual way.