Erik Hennum edited this page Jun 5, 2019 · 5 revisions

Important Note: Data Services is now documented with the MarkLogic Java API at:

http://docs.marklogic.com/guide/java/DataServices

This page is retained as a convenience but not maintained. Please consult the link above.


Creating Data Services Using the MarkLogic Java Development Tools

The MarkLogic Java Client API includes development tooling and runtime proxies so a Java application can access custom data services in a MarkLogic cluster. The Java application calls strongly typed services running in the databases as if they were “plain old” Java methods. The API handles the underlying network protocol and data marshalling.

Benefits

  • Avoids unnecessary round-trips by encapsulating the data logic, ensuring that service implementations run close to the data
  • Reduces custom plumbing code by handling network and data marshalling transparently
  • Reduces the potential for API drift as requirements and implementations change by enforcing strongly typed interfaces

Motivation

MarkLogic is designed to integrate into an enterprise environment via data services. A data service is a fixed interface over the data managed in MarkLogic expressed in terms of the consuming application. Data services can run queries (“Find eligible insurance plans for an applicant”), updates (“Flag this claim as fraudulent”), or both (“Adjust the rates of plans that haven’t made claims in the last year”). A MarkLogic cluster can support dozens or even hundreds of different data services operating over the data and metadata managed in a data hub.

A data service is different from a generic query interface, like JDBC or ODBC, that typically operates at the physical layer of the database. Architecturally, a data service is more like a remote procedure call or a stored procedure. The data service allows the service developer to encapsulate the physical layout of the data and constrain or enhance queries and updates with business logic.

MarkLogic provides a rich scripting environment as part of the DBMS. The developer implements data services using an imperative programming language in addition to a declarative query language. MarkLogic supports JavaScript and XQuery runtimes. MarkLogic optimizes this code to run close to the data, minimizing data transfer and leveraging cluster-wide indexes and caches.

Enterprise middle-tier business logic generally integrates many services: data services from a MarkLogic cluster as well as services from other providers. This service orchestration and business logic happen at a layer above the data infrastructure, outside of a particular service provider. The flexibility to mix and match services and decouple providers and consumers is one of the benefits of a service-oriented architecture.

How it Works

You declare a function signature for each endpoint that implements a data service.

From a set of such declarations, the development tools generate a Java proxy service class that encapsulates the execution of the endpoints including the marshalling and transport of the request and response data. The middle-tier business logic can then call the methods of the generated class.

That is, a MarkLogic data service is made up of three main components:

  • Endpoint Declaration: A JSON document that specifies the name of the service as well as the names and data types of the inputs and outputs.
  • Endpoint Proxy: Code that exposes the service definition in Java, automatically invoking the services remotely against a MarkLogic cluster for the caller.
  • Endpoint Module: The implementation of a data service in MarkLogic as a JavaScript or XQuery module.

By declaring the data tier functions needed by the middle-tier business logic, the endpoint declaration establishes a division of responsibility between the Java middle-tier developer and the data service developer. The endpoint declaration acts as a contract for collaboration between the two roles.

Prerequisites

To create a proxy service, you need a Java JDK environment with Gradle and the following MarkLogic software components:

The MarkLogic Java development tools are provided as a Gradle plugin.

This document assumes that you are familiar with Java and Gradle.
If you're unfamiliar with Gradle, the ml-gradle project lists some resources for getting started:

Installing and learning Gradle

Typically, you create one Gradle project directory for all of the work on proxy services for one content database.

Relation to the Java Client API

The Java Client API supports physical operations on the database. In particular, the Java Client API provides DocumentManager (and its derivations) and QueryManager to write, read, or query for documents and their metadata at the URIs identifying the documents in the database. Where a transaction must span multiple requests, the client uses a physical Transaction object.

Proxy services complement these physical operations with logical operations. The Java middle tier invokes endpoints, passing and receiving values. How the operation is implemented against the database including how values are written or read is encapsulated entirely within the endpoint. Where an operation must interleave middle-tier and enode tasks, the client uses a logical session represented by a SessionState object (as described later).

The Java Client API and proxy services connect with the database in the same way. Both use the DatabaseClientFactory class to instantiate a DatabaseClient object for use in requests.

A REST server used for the Java Client API cannot, however, be used for proxy services. Similarly, an appserver used for proxy services cannot be used by the Java Client API.

As a result, a DatabaseClient object used with the endpoint proxy classes generated for proxy services cannot also be used with the DocumentManager or QueryManager classes or with the other predefined classes provided by the Java API.

Note: The middle-tier client cannot specify the database explicitly when creating a DatabaseClient but, instead, must use the default database associated with the appserver.

Overview of Creating a Proxy Service

From the proxy service source files, you generate Java methods that call endpoint modules deployed to the modules database:

[Figure: Calls from the generated class to the deployed endpoint]

The development process consists of the following steps:

  1. Set up a MarkLogic App Server
  2. Create a proxy service directory within the Gradle project directory
  3. Create a file to declare the service
  4. Create files to declare one or more endpoint proxies for the service
  5. Implement the module for each endpoint proxy
  6. Deploy the proxy service directory to the modules database of the App Server
  7. Generate the Java class from the proxy service declaration

Setting Up an App Server for the Proxy Service

Typically, you set up a single App Server for all of the proxy services for a content database.

The App Server configuration must have the following characteristics:

  • Must have a modules database.
  • Must have a root of /.
  • Must not have a rewriter.

You cannot use the following App Servers that are automatically created when you install MarkLogic:

  • The REST/HTTP/XDBC App Server on port 8000
  • The Admin API App Server on port 8001
  • The REST Management API App Server on port 8002

As noted above, you also cannot use a REST server (that is, an appserver created for the Client REST API).

To make creation and configuration of the App Server and its modules database a repeatable operation that can be managed in a version control system, you can put resources in the Gradle project directory and use ml-gradle to operate on those resources. This approach is described in the Getting started page. You must set the mlNoRestServer property to true.

As an easy expedient when learning about MarkLogic, you can instead configure the App Server and modules database manually. The repeatable approach using Gradle, however, is recommended as a long term practice.

Creating the Proxy Service Directory

For each proxy service, you create a separate subdirectory under the Gradle project directory.

Each proxy service directory holds all of the resources required to support the proxy service, including:

  • The service declaration
  • The endpoint proxy declarations
  • The module called by each endpoint proxy
  • Any server-side libraries to support the endpoint modules

For easier deployment to the modules database using ml-gradle, you should create the proxy service directory under the src/main/ml-modules/root project subdirectory. If necessary, you can specify a different parent directory for the root directory with the mlModulePaths property.

For instance, a project might choose to provide the priceDynamically service in the following proxy service directory:

src/main/ml-modules/root/inventory/priceDynamically

Declaring the Proxy Service

The proxy service directory must contain exactly one service declaration file. The service declaration file must be named service.json.

The service declaration consists of a JSON object with the following properties:

Property | Declares
endpointDirectory | The directory path for the installed endpoint modules within the modules database.
$javaClass | The full name of the generated service class including the package qualification.
desc | Optional; plain text documentation for the service (which is emitted as JavaDoc by the generated class).
$comment | Optional; can contain an object, array, or value with developer comments about the declaration.

The following example declares the /inventory/priceDynamically/ directory as the address of the endpoints in the modules database and declares com.some.business.inventory.DynamicPricer as the generated Java class:

{
  "endpointDirectory" : "/inventory/priceDynamically/",
  "$javaClass"        : "com.some.business.inventory.DynamicPricer"
}

Conventionally, the value of the endpointDirectory property should be the same as the path of the proxy service directory under the special ml-gradle src/main/ml-modules/root directory (so, the service directory for this service.json file would conventionally be src/main/ml-modules/root/inventory/priceDynamically).

The endpoint directory value should include the leading / and should resemble a Linux path.
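
The convention above can be checked mechanically. The following is a minimal Java sketch, not part of the MarkLogic tools: the EndpointDirectory class and its fromServiceDir method are hypothetical names, and the trailing / is assumed from the service.json example above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: derives the conventional endpointDirectory value
// from the location of a proxy service directory.
class EndpointDirectory {
    // Builds "/a/b/" from the part of serviceDir that lies below modulesRoot.
    static String fromServiceDir(Path modulesRoot, Path serviceDir) {
        Path relative = modulesRoot.relativize(serviceDir);
        StringBuilder directory = new StringBuilder("/");
        for (Path part : relative) {
            // Always join with "/" so the result resembles a Linux path on any OS.
            directory.append(part.toString()).append('/');
        }
        return directory.toString();
    }

    public static void main(String[] args) {
        Path root = Paths.get("src/main/ml-modules/root");
        Path service = Paths.get("src/main/ml-modules/root/inventory/priceDynamically");
        System.out.println(fromServiceDir(root, service)); // /inventory/priceDynamically/
    }
}
```

A build or test step could compare the derived value with the endpointDirectory property read from service.json and fail early when the two have drifted apart.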

After declaring the service, you populate it with endpoint proxy declarations.

Declaring the Endpoint

The name, parameters, and return value for each endpoint are declared in a file with the .api extension in the service directory. The file contains a JSON data structure with the following properties:

Property | Declares
functionName | The name used to call the endpoint, which must match the name (without the .api extension) of the declaration file.
desc | Optional; plain text documentation for the endpoint (emitted as JavaDoc).
params | Optional; an array specifying the parameters of the endpoint; omitted for endpoints with no parameters. Parameter objects have name, desc, datatype, nullable, and multiple properties.
return | Optional; an object specifying the endpoint return value; omitted for endpoints with no return value. The child object has desc, datatype, nullable, and multiple properties.
errorDetail | Optional; specifies a value from the following enumeration to control whether error responses include stack traces:
  • log (the default) to log the stack trace on the server but not return the stack trace to the middle tier.
  • return to include the stack trace in the exception on the middle tier as well as log it on the server.

The endpoint declaration is used both to generate a method in a Java class to call on the middle tier and to unmarshal the request and marshal the response when the App Server executes the endpoint module.

Note: The .api file for a proxy endpoint must be loaded into the modules database with the endpoint module.

The following sections provide more detail about the params and return declarations.

Structure of a Parameter Definition

A parameter definition in the params property is an object with the following properties:

Property | Declares
name | The name of the parameter.
desc | Optional; a description of the parameter to include in JavaDoc.
datatype | The datatype of the parameter (see data types).
nullable | Optional; whether the parameter can be null (defaulting to false).
multiple | Optional; whether the parameter can have more than one value (defaulting to false).

Structure of the Return Type Definition

The return property of an endpoint declaration is an object with the following properties:

Property | Declares
desc | Optional; a description of the return value to include in JavaDoc.
datatype | The datatype of the return value (see data types).
nullable | Optional; whether the return value can be null (defaulting to false).
multiple | Optional; whether the endpoint can return more than one value (defaulting to false).

Example of an Endpoint Proxy

The following example declares that the lookupPricingFactors endpoint has two required parameters as well as a required return value:

{
  "functionName" : "lookupPricingFactors",
  "params" : [ {
    "name" : "productCode",
    "datatype" : "string"
  }, {
    "name" : "customerId",
    "datatype" : "unsignedLong"
  } ],
  "return" : {
    "datatype" : "jsonDocument"
  }
}

Server Data Types for Values

You can specify atomic or node server data types for parameters and return values:

Category | Data types
atomics | boolean, date, dateTime, dayTimeDuration, decimal, double, float, int, long, string, time, unsignedInt, unsignedLong
nodes | array, object, binaryDocument, jsonDocument, textDocument, xmlDocument

Server data types with direct equivalents among the Java atomic types are represented with the corresponding Java classes by default. These data types include boolean, double, float, int, long, string, unsignedInt, and unsignedLong. For instance, an int is represented with a Java Integer. The unsignedInt and unsignedLong types can be manipulated using the unsigned methods of the Java Integer and Long classes.
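
As an illustration of the unsigned methods mentioned above, here is a small sketch that uses only the standard Integer and Long classes. The UnsignedValues class and its method names are hypothetical; the point is that the same bits can look negative when read as a signed value.

```java
// Sketch: working with unsignedInt / unsignedLong values held in Java int / long.
class UnsignedValues {
    // Parses unsigned decimal text into the 64-bit pattern for that value.
    static long parseCustomerId(String text) {
        return Long.parseUnsignedLong(text);
    }

    // Renders the value as unsigned decimal text.
    static String format(long unsignedValue) {
        return Long.toUnsignedString(unsignedValue);
    }

    // True when a is less than b under unsigned ordering.
    static boolean lessThan(long a, long b) {
        return Long.compareUnsigned(a, b) < 0;
    }

    // Widens an unsignedInt held in an int to a non-negative long.
    static long widen(int unsignedValue) {
        return Integer.toUnsignedLong(unsignedValue);
    }

    public static void main(String[] args) {
        long max = parseCustomerId("18446744073709551615"); // 2^64 - 1
        System.out.println(max);                  // -1 (signed view of the same bits)
        System.out.println(format(max));          // 18446744073709551615
        System.out.println(lessThan(1L, max));    // true
        System.out.println(widen(Integer.parseUnsignedInt("4294967295"))); // 4294967295
    }
}
```

Ordinary comparison operators and toString() treat these values as signed, so the unsigned helpers are the safer choice whenever a value may exceed the signed range.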

Other atomic types (including date, dateTime, dayTimeDuration, decimal and time) are represented as a Java String by default.
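
When a declaration keeps the default String representation, the middle tier can still convert the values itself. The sketch below uses only standard Java classes. The AtomicStrings class is a hypothetical helper, and the conversions assume the strings carry no timezone suffix; values with a timezone would need the Offset variants of the java.time classes.

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper: converts String-represented atomics to richer Java types.
// Assumes lexical forms without a timezone suffix.
class AtomicStrings {
    static LocalDate toDate(String value)         { return LocalDate.parse(value); }
    static LocalDateTime toDateTime(String value) { return LocalDateTime.parse(value); }
    static LocalTime toTime(String value)         { return LocalTime.parse(value); }
    static Duration toDuration(String value)      { return Duration.parse(value); }
    static BigDecimal toDecimal(String value)     { return new BigDecimal(value); }

    public static void main(String[] args) {
        System.out.println(toDate("2019-06-05").getYear());                // 2019
        System.out.println(toDateTime("2019-06-05T13:45:00").getMinute()); // 45
        System.out.println(toTime("13:45:00").getHour());                  // 13
        System.out.println(toDuration("P1DT2H30M").toMinutes());           // 1590
        System.out.println(toDecimal("19.99").scale());                    // 2
    }
}
```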

Other server atomic data types can be passed as a string and cast using the appropriate constructor on the server.

A binaryDocument value is represented as an InputStream by default. All other node data types are represented as a Reader by default.

The array and object data types differ from the jsonDocument data type in not having a document node at the root, which can provide a more natural and efficient JSON value for manipulating in SJS (Server-Side JavaScript).

Mapping Values to Alternative Java Classes

Some server data types can be represented with an alternative Java class instead of the default Java representation. For example, date is represented by default as a String, but you can choose to use java.time.LocalDate instead.

To specify an alternative Java class, supply the fully qualified class name in the $javaClass property of a parameter or return type. You must still specify the server data type in the datatype property.

The following table lists server data types with their available alternative representations:

Server Data Type | Mappable Java Classes
date | java.time.LocalDate
dateTime | java.util.Date, java.time.LocalDateTime, java.time.OffsetDateTime
dayTimeDuration | java.time.Duration
decimal | java.math.BigDecimal
time | java.time.LocalTime, java.time.OffsetTime
array | java.io.InputStream, java.io.Reader, java.lang.String, com.fasterxml.jackson.databind.node.ArrayNode, com.fasterxml.jackson.core.JsonParser
object | java.io.InputStream, java.io.Reader, java.lang.String, com.fasterxml.jackson.databind.node.ObjectNode, com.fasterxml.jackson.core.JsonParser
binaryDocument | java.io.InputStream
jsonDocument | java.io.InputStream, java.io.Reader, java.lang.String, com.fasterxml.jackson.databind.JsonNode, com.fasterxml.jackson.core.JsonParser
textDocument | java.io.InputStream, java.io.Reader, java.lang.String
xmlDocument | java.io.InputStream, java.io.Reader, java.lang.String, org.w3c.dom.Document, org.xml.sax.InputSource, javax.xml.transform.Source, javax.xml.stream.XMLEventReader, javax.xml.stream.XMLStreamReader

The following example represents the occurred date parameter as a Java LocalDate and represents the returned JSON document as a Jackson JsonNode.

{
  "functionName" : "produceReport",
  "params":[ {
    "name":"id", "datatype":"int"
  }, {
    "name":"occurred", "datatype":"date",
      "$javaClass":"java.time.LocalDate"
  } ],
  "return" : {
    "datatype":"jsonDocument",
      "$javaClass":"com.fasterxml.jackson.databind.JsonNode"
  }
}

Calling Endpoints in a Session

Ordinarily, the database server doesn't keep any state associated with a call to an endpoint (with the obvious but important exception of documents persisted in the database). When the middle tier sends all of the input needed for a data tier operation, the operation can be completed in a single request. This approach typically maximizes performance and minimizes load.

Some operations, however, require sessions that coordinate multiple requests. Examples of such operations include:

  • Interleaving middle tier and data tier operations (such as multistatement transactions in which the middle tier logic must be inserted between the initial database change and a subsequent database change).
  • Host affinity with an enode when working with a load balancer to exploit query caches on the enode.

You can handle these edge cases by calling the endpoints in a session. If an endpoint needs to participate in a session, its declaration must include exactly one parameter with the session data type. The session parameter may be nullable but not multiple (and may never be a return value).

If at least one endpoint has a session parameter, the generated class provides a newSessionState() factory that returns a SessionState object. The expected pattern of use:

  • Construct a session object when a new session is needed.

  • Pass the same session object on each call that should execute in the same session.

Where endpoint modules need to participate in the same session, you must declare a session parameter for each of the corresponding endpoint proxies and document the expectations for coordination in the middle tier consumer code. For instance, if one session endpoint starts a multistatement transaction, another continues work in the same multistatement transaction, and a third commits the transaction, the documentation should explain that the same session should be used with each call and the sequence in which the calls should be made.

The proxy service doesn't end the session explicitly. Instead, the session eventually times out (as controlled by the configuration of the App Server). The middle tier code is responsible for calling an endpoint module to commit a multistatement transaction before the session times out.
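
The calling pattern described above can be sketched without a server. Everything in the following Java example is a local stand-in: SessionState here is an empty placeholder interface rather than the MarkLogic type, OrderWriter is a hypothetical generated interface whose three endpoints each declare a session parameter, and FakeOrderWriter keeps per-session state in memory where a real endpoint module would rely on a multistatement transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SessionPatternDemo {
    // Runs three calls in one session and returns what the fake committed.
    static List<String> runOrder() {
        OrderWriter writer = new FakeOrderWriter();
        SessionState session = writer.newSessionState(); // one object per logical session
        writer.startOrder(session, "order-1");
        // ... middle-tier logic can run here between the calls ...
        writer.addItem(session, "widget");
        return writer.commitOrder(session);              // same session on every call
    }

    public static void main(String[] args) {
        System.out.println(runOrder()); // [order-1, widget]
    }
}

// Local stand-in for the session handle; NOT the MarkLogic SessionState type.
interface SessionState { }

// Hypothetical generated interface: each endpoint declares one session parameter.
interface OrderWriter {
    SessionState newSessionState();
    void startOrder(SessionState session, String orderId);
    void addItem(SessionState session, String item);
    List<String> commitOrder(SessionState session);
}

// In-memory fake that keeps per-session state, standing in for the server side.
class FakeOrderWriter implements OrderWriter {
    private final Map<SessionState, List<String>> pending = new HashMap<>();

    public SessionState newSessionState() {
        return new SessionState() { };
    }

    public void startOrder(SessionState session, String orderId) {
        List<String> work = new ArrayList<>();
        work.add(orderId);
        pending.put(session, work);
    }

    public void addItem(SessionState session, String item) {
        List<String> work = pending.get(session);
        if (work == null) {
            throw new IllegalStateException("no order started in this session");
        }
        work.add(item);
    }

    public List<String> commitOrder(SessionState session) {
        List<String> work = pending.remove(session);
        if (work == null) {
            throw new IllegalStateException("nothing to commit in this session");
        }
        return work;
    }
}
```

Passing a different session object to addItem would fail in this fake, which mirrors why the consumer documentation should spell out that all three calls share one session and in which order they run.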

Providing the Module for an Endpoint Proxy

You implement the data operations for an endpoint proxy in an XQuery or Server-Side JavaScript endpoint module. The proxy service directory of your project must contain exactly one endpoint module for each endpoint declaration in your service.

An endpoint module must have the same base name as the endpoint declaration, and either a .xqy (XQuery) or .sjs (JavaScript) extension, depending on the implementation language.
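
The naming rule above lends itself to a quick pre-deployment check. The following Java sketch is a hypothetical helper, not something the MarkLogic tools provide: it scans a proxy service directory and reports each .api declaration that lacks a module or has both an .sjs and an .xqy module. The adjustRates endpoint in the demo is invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical check: every NAME.api needs exactly one of NAME.sjs or NAME.xqy.
class EndpointModuleCheck {
    static List<String> findProblems(Path serviceDir) {
        List<String> problems = new ArrayList<>();
        try (DirectoryStream<Path> declarations =
                 Files.newDirectoryStream(serviceDir, "*.api")) {
            for (Path declaration : declarations) {
                String fileName = declaration.getFileName().toString();
                String baseName = fileName.substring(0, fileName.length() - ".api".length());
                boolean hasSjs = Files.exists(serviceDir.resolve(baseName + ".sjs"));
                boolean hasXqy = Files.exists(serviceDir.resolve(baseName + ".xqy"));
                if (!hasSjs && !hasXqy) {
                    problems.add(baseName + ": no .sjs or .xqy module");
                } else if (hasSjs && hasXqy) {
                    problems.add(baseName + ": both .sjs and .xqy modules");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(problems); // directory iteration order is unspecified
        return problems;
    }

    // Builds a throwaway service directory and checks it.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("priceDynamically");
            Files.createFile(dir.resolve("lookupPricingFactors.api"));
            Files.createFile(dir.resolve("lookupPricingFactors.sjs"));
            Files.createFile(dir.resolve("adjustRates.api")); // invented endpoint, no module yet
            return findProblems(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [adjustRates: no .sjs or .xqy module]
    }
}
```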

The App Server handles marshalling and unmarshalling for the endpoint. That is, the endpoint doesn't interact directly with the transport layer (which, internally, is currently HTTP).

The endpoint module must define an external variable for each parameter in the endpoint declaration. In an SJS endpoint, use a var statement at the top of the module with no initialization of the variable. In an XQuery endpoint, use an external variable with the server data type corresponding to the parameter data type.

The endpoint module must also return a value with the appropriate data type.

For the lookupPricingFactors endpoint whose declaration was shown earlier, the SJS endpoint module would resemble the following fragment:

'use strict';      
var productCode; // an xs.string value
var customerId;  // an xs.unsignedLong value
... /* the code that produces a JSON document as output */

The equivalent XQuery endpoint module would resemble the following fragment:

 xquery version "1.0-ml";
 declare variable $productCode as xs:string       external;
 declare variable $customerId  as xs:unsignedLong external;
 declare option xdmp:mapping "false";
 ... (: the code that produces a JSON document as output :)

As a convenience, you can use the initializeModule Gradle task to create the skeleton for an endpoint module from an endpoint declaration. You specify the path (relative to the project directory) for the endpoint declaration with the endpointDeclarationFile property and the module extension (which can be either sjs or xqy) with the moduleExtension property.

Your Gradle build script should apply the com.marklogic.ml-development-tools plugin. You can execute the Gradle task using any of the following techniques:

  • By setting the properties in the gradle.properties file and specifying the initializeModule task on the gradle command line
  • By specifying the properties with the -P option as well as the initializeModule task on the gradle command line
  • By supplying a build script with a custom task of the com.marklogic.client.tools.gradle.ModuleInitTask type

For the command-line approach, the Gradle build script would resemble the following example:

plugins {
    id 'com.marklogic.ml-development-tools' version '4.1.1'
}

On Linux, the command line for initializing the lookupPricingFactors.sjs SJS endpoint module from the lookupPricingFactors.api endpoint declaration might resemble the following example:

gradle \
    -PendpointDeclarationFile=src/main/ml-modules/root/inventory/priceDynamically/lookupPricingFactors.api \
    -PmoduleExtension=sjs \
    initializeModule

Once each .api endpoint declaration file has an equivalent endpoint module to implement the endpoint, you can load the proxy service directory into the modules database and generate the proxy service Java class. (The Java code generation checks the endpoint module in the service directory to determine how to invoke the endpoint.)

Deploying a Proxy Service

You must load the resources from the proxy service directory into the modules database of the App Server. Your resources must be deployed to the same database directory as the value of the endpointDirectory property of the service declaration file (service.json).

To load a directory into the modules database, you can use either of the mlLoadModules or mlReloadModules tasks provided by ml-gradle. You supply the properties required for deployment including the following:

  • mlHost - required
  • mlAppServicesUsername - required if not admin and mlUsername not set
  • mlAppServicesPassword - required if not admin and mlPassword not set
  • mlAppServicesPort - required if not 8000
  • mlModulesDatabaseName - required
  • mlModulePermissions - required
  • mlNoRestServer - required to be true
  • mlReplaceTokensInModules - typically false

If you didn't create the proxy service directory under the src/main/ml-modules/root project subdirectory, you must specify the parent directory for the root directory with the mlModulePaths property.

You can supply properties using a gradle.properties file or a task.

After you have configured the properties, the command to load the modules would resemble the following example (or the equivalent with mlReloadModules):

gradle mlLoadModules

For more information, see:

How modules are loaded

Generating the Proxy Service Class

A proxy service class is a Java interface for calling the endpoint modules for your service on the MarkLogic enode. You generate the proxy service class from the resources in the proxy service directory.

The proxy service class has the name specified by the $javaClass property of the service declaration file (service.json). The class has one method for each endpoint declaration with an associated endpoint module in the proxy service directory.

To generate the class, you use the generateEndpointProxies Gradle task. You specify the path (relative to the project directory) of the service declaration file (service.json) with the serviceDeclarationFile property. You can also specify the output directory with the javaBaseDirectory property or omit the property to use the default (which is the src/main/java subdirectory of the project directory).

Your Gradle build script should apply the com.marklogic.ml-development-tools plugin. You can execute the task using any of the following techniques:

  • By setting the properties in the gradle.properties file and specifying the generateEndpointProxies task on the gradle command line
  • By specifying the properties with the -P option as well as the generateEndpointProxies task on the gradle command line
  • By supplying a build script with a custom task of the com.marklogic.client.tools.gradle.EndpointProxiesGenTask type
  • By supplying a build script with the endpointProxiesConfig extension configuration and specifying the generateEndpointProxies task on the gradle command line

For the custom task approach, the Gradle build script for generating a class with a method for each endpoint in the priceDynamically service might resemble the following example:

plugins {
    id 'com.marklogic.ml-development-tools' version '4.1.1'
}
task generateDynamicPricer(type: com.marklogic.client.tools.gradle.EndpointProxiesGenTask) {
    serviceDeclarationFile = 'src/main/ml-modules/root/inventory/priceDynamically/service.json'
}

The command line to execute the custom task would resemble the following example:

gradle generateDynamicPricer

You only need to regenerate the proxy service class when the list of endpoints or the name, parameters, or return value for an endpoint changes. You don't need to regenerate the proxy service class after changing the module that implements the endpoint.

Using a Proxy Service Class

In general, you can work with your generated proxy service Java class in the same way as with manually written Java source files.

The generated class has an on() static method that is a factory for constructing the class. The on() method requires a DatabaseClient for the App Server. The database client can be constructed using the DatabaseClientFactory class of the Java API.

Note: You cannot specify the database explicitly when creating the DatabaseClient but, instead, must use the default database associated with the appserver.

Compiling a Proxy Service Class

After generating the proxy service class, you compile it in the usual way. In particular, by generating the proxy service class in the conventional directory for Gradle (which is src/main/java) and declaring a dependency on the MarkLogic Java API in the build script, you can use Gradle to compile the generated class without other configuration.

Testing a Proxy Service Class

Once your proxy service is deployed to the MarkLogic modules database, you can test your proxy service Java class in the same way as other Java classes.

To write functional tests that confirm the endpoint modules work correctly, you can use any general-purpose test framework (for instance, JUnit). Each test should:

  • Call the on() static factory method to construct an instance.
  • Call the appropriate method to invoke the endpoint module.
  • Inspect the returned value to confirm the operation of the endpoint module.

Because the generated proxy service class is provided as a Java interface, you can replace the implementation with a mock implementation of the interface for testing a middle-tier consumer.
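
A minimal sketch of that mocking approach follows. The DynamicPricer interface below is a hand-written stand-in for the generated one, and its method signature (a String, a Long, and a Reader result) is an assumption based on the default type mappings rather than actual generator output. PriceQuoter is a hypothetical middle-tier consumer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class PriceQuoterDemo {
    // Wires a mock into the consumer and returns what the consumer read.
    static String quoteWithMock(String productCode, long customerId) {
        // The mock is a lambda, so no MarkLogic server is involved.
        DynamicPricer mock = (code, id) ->
            new StringReader("{\"product\":\"" + code + "\",\"discount\":0.1}");
        return new PriceQuoter(mock).factorsFor(productCode, customerId);
    }

    public static void main(String[] args) {
        System.out.println(quoteWithMock("widget", 42L));
        // {"product":"widget","discount":0.1}
    }
}

// Stand-in for the generated interface; the method shape is an assumption.
interface DynamicPricer {
    Reader lookupPricingFactors(String productCode, Long customerId);
}

// Hypothetical middle-tier consumer that depends only on the interface.
class PriceQuoter {
    private final DynamicPricer pricer;

    PriceQuoter(DynamicPricer pricer) {
        this.pricer = pricer;
    }

    // Calls the endpoint and drains the returned Reader into a String.
    String factorsFor(String productCode, long customerId) {
        try (Reader reader = pricer.lookupPricingFactors(productCode, customerId)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the consumer receives the interface through its constructor, a unit test can pass the mock while production code passes the instance returned by the on() factory.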

Documenting a Proxy Service Class

The generated class has JavaDoc comments based on the desc properties from the service declaration and endpoint declarations. You can generate JavaDoc for the middle tier consumer of the proxy service class in the usual way.

Packaging a Proxy Service

Finally, you can create a jar file with the compiled executable proxy service class in the usual way.
