---
layout: default
title: Searching documents
parent: Managing documents
nav_order: 3
permalink: /documents/searching
---
The `POST /v1/search` endpoint in the MarkLogic REST API supports returning content and metadata for each matching document. As with reading multiple documents via the `GET /v1/documents` endpoint, the data is returned in a multipart HTTP response. The MarkLogic Python client simplifies use of this operation by returning a list of `Document` instances via the `client.documents.search` method.
{: .no_toc .text-delta }
- TOC {:toc}
The examples below all assume that you have created a new MarkLogic user named "python-user" as described in the setup guide. To run these examples, first run the following script, which creates a `Client` instance that interacts with the out-of-the-box "Documents" database in MarkLogic and writes two sample documents to it:
from marklogic import Client
from marklogic.documents import Document, DefaultMetadata
client = Client('http://localhost:8000', digest=('python-user', 'pyth0n'))
client.documents.write([
    DefaultMetadata(permissions={"rest-reader": ["read", "update"]}, collections=["python-search-example"]),
    Document("/search/doc1.json", {"text": "hello world"}),
    Document("/search/doc2.json", {"text": "hello again"})
])
The search endpoint in the REST API provides several ways of submitting a query. The simplest approach is to submit a search string that uses the MarkLogic search grammar:
# Find documents with the term "hello" in them.
docs = client.documents.search("hello")
assert len(docs) == 2
# Find documents with the term "world" in them.
docs = client.documents.search("world")
assert len(docs) == 1
The search string in the example corresponds to the `q` argument, which is the first argument of the method and thus does not need to be named.
More complex queries can be submitted via the `query` parameter. The value of this parameter must be one of the following:
- A structured query.
- A serialized CTS query.
- A combined query.
For each of the above approaches, the query can be either a dictionary (when defining the query via JSON) or a string of XML. Based on the type, the client sets the appropriate `Content-Type` header.
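
The type-based choice of header can be pictured with a small sketch. The helper `content_type_for_query` below is hypothetical and is not part of the MarkLogic Python client. It assumes that JSON queries are sent as `application/json` and XML queries as `application/xml`, and only illustrates how a dictionary and a string might be told apart:

```python
import json


def content_type_for_query(query):
    """Return a (content_type, body) pair for a query (hypothetical helper)."""
    if isinstance(query, dict):
        # Dictionaries are treated as JSON and serialized before sending.
        return "application/json", json.dumps(query)
    if isinstance(query, str):
        # Strings are assumed to already contain serialized XML.
        return "application/xml", query
    raise TypeError("query must be a dict (JSON) or a str (XML)")


json_type, json_body = content_type_for_query(
    {"query": {"term-query": {"text": "hello"}}}
)
xml_type, xml_body = content_type_for_query(
    "<query xmlns='http://marklogic.com/appservices/search'>"
    "<term-query><text>hello</text></term-query></query>"
)
print(json_type)  # application/json
print(xml_type)   # application/xml
```

Rejecting any other type early keeps a malformed query from being sent with a misleading header.
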
Examples of a structured query:
# JSON
docs = client.documents.search(query={"query": {"term-query": {"text": "hello"}}})
assert len(docs) == 2
# XML
query = "<query xmlns='http://marklogic.com/appservices/search'>\
<term-query><text>hello</text></term-query></query>"
docs = client.documents.search(query=query)
assert len(docs) == 2
Examples of a serialized CTS query:
# JSON
query = {"ctsquery": {"wordQuery": {"text": "hello"}}}
docs = client.documents.search(query=query)
assert len(docs) == 2
# XML
query = "<word-query xmlns='http://marklogic.com/cts'><text>hello</text></word-query>"
docs = client.documents.search(query=query)
assert len(docs) == 2
Examples of a combined query:
# JSON
options = {"constraint": {"name": "c1", "word": {"element": {"name": "text"}}}}
# As in the XML example below, "qtext" is nested inside "search".
query = {
    "search": {"options": options, "qtext": "c1:hello"},
}
docs = client.documents.search(query=query)
assert len(docs) == 2
# XML
query = "<search xmlns='http://marklogic.com/appservices/search'><options>\
<constraint name='c1'><word><element name='text'/></word></constraint>\
</options><qtext>c1:hello</qtext></search>"
docs = client.documents.search(query=query)
assert len(docs) == 2
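
When the search text comes from user input, concatenating XML strings by hand risks producing malformed markup. As a minimal sketch, the hypothetical helper `build_combined_query` below (not part of the client) builds the same combined query with the standard library's `xml.etree.ElementTree`, which escapes special characters automatically:

```python
import xml.etree.ElementTree as ET

SEARCH_NS = "http://marklogic.com/appservices/search"
# Serialize the search namespace as the default namespace (no prefix).
ET.register_namespace("", SEARCH_NS)


def build_combined_query(constraint_name, element_name, qtext):
    """Build a combined query as an XML string (hypothetical helper)."""

    def tag(name):
        # ElementTree expects namespaced tags in "{uri}local" form.
        return f"{{{SEARCH_NS}}}{name}"

    search = ET.Element(tag("search"))
    options = ET.SubElement(search, tag("options"))
    constraint = ET.SubElement(options, tag("constraint"), {"name": constraint_name})
    word = ET.SubElement(constraint, tag("word"))
    ET.SubElement(word, tag("element"), {"name": element_name})
    # Special characters in qtext are escaped during serialization.
    ET.SubElement(search, tag("qtext")).text = qtext
    return ET.tostring(search, encoding="unicode")


query = build_combined_query("c1", "text", "c1:hello")
print(query)
```

The resulting string could then be passed as the `query` argument in the same way as the hand-written XML above.
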
The search endpoint supports a variety of parameters for controlling the search request. For convenience, several of the more commonly used parameters are available as arguments in the `client.documents.search` method:
# Specify the starting point and page length.
docs = client.documents.search("hello", start=2, page_length=5)
assert len(docs) == 1
# Search via a collection without any search string.
docs = client.documents.search(collections=["python-search-example"])
assert len(docs) == 2
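
The first example returns a single document because `start` is a 1-based position in the ordered list of matches, so `start=2` skips the first of the two matches. The sketch below models that arithmetic in plain Python. The function `select_page` is hypothetical, and its defaults of 1 and 10 are assumed to mirror the REST API defaults:

```python
def select_page(matches, start=1, page_length=10):
    """Return the matches that a 1-based start and a page length would select."""
    if start < 1:
        raise ValueError("start is 1-based and must be at least 1")
    if page_length < 0:
        raise ValueError("page_length must not be negative")
    first = start - 1  # Convert the 1-based position to a 0-based index.
    return matches[first:first + page_length]


# Stand-in for the ordered list of matches on the server.
matches = ["first match", "second match"]
page = select_page(matches, start=2, page_length=5)
print(page)  # ['second match']
```
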
Metadata for each document can be retrieved via the `categories` argument. The acceptable values for this argument match those of the `category` parameter in the search endpoint documentation: `content`, `metadata`, `metadata-values`, `collections`, `permissions`, `properties`, and `quality`.
The following examples show different ways of configuring the `categories` argument:
# Retrieve all content and metadata for each matching document.
docs = client.documents.search("hello", categories=["content", "metadata"])
assert "python-search-example" in docs[0].collections
assert "python-search-example" in docs[1].collections
# Retrieve only permissions for each matching document.
docs = client.documents.search("hello", categories=["permissions"])
assert docs[0].content is None
assert docs[1].content is None
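
To show how a list of categories could translate into the endpoint's `category` parameter, the sketch below validates the values and encodes them as repeated query parameters. The helper `category_params` is hypothetical, and the mapping to one `category` parameter per value is an assumption about the request, not a description of the client's internals:

```python
from urllib.parse import urlencode

# The values listed above for the `category` parameter.
ALLOWED_CATEGORIES = {
    "content", "metadata", "metadata-values", "collections",
    "permissions", "properties", "quality",
}


def category_params(categories):
    """Encode categories as repeated `category` query parameters."""
    unknown = [c for c in categories if c not in ALLOWED_CATEGORIES]
    if unknown:
        raise ValueError(f"unsupported categories: {unknown}")
    return urlencode([("category", c) for c in categories])


encoded = category_params(["content", "metadata"])
print(encoded)  # category=content&category=metadata
```
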
The `client.documents.search` method accepts `**kwargs`, so you can pass in any other arguments you would normally pass to `requests`. For example:
docs = client.documents.search("hello", params={"database": "Documents"})
assert len(docs) == 2
Please see the application developer's guide for more information on searching documents.
Starting in the 1.1.0 release, the `client.documents.search` method accepts a `return_response` argument. When that argument is set to `True`, the original response is returned. This can be useful for custom processing of the response or debugging requests.
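
As one illustration of custom processing, a multipart body can be split into its parts with the standard library's `email` package. The sketch below is self-contained and runs against a hand-written sample body. The boundary, headers, and part layout in the sample are assumptions for illustration, not captured MarkLogic output, and `split_multipart` is a hypothetical helper:

```python
import json
from email import message_from_bytes


def split_multipart(content_type, body):
    """Split a multipart body into (filename, payload bytes) pairs."""
    # The email parser needs the Content-Type header to learn the boundary.
    header = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
    message = message_from_bytes(header + body)
    if not message.is_multipart():
        raise ValueError("body is not multipart")
    return [
        (part.get_filename(), part.get_payload(decode=True))
        for part in message.get_payload()
    ]


# Hand-written sample standing in for a real response body.
boundary = "SAMPLE_BOUNDARY"
sample_body = (
    f"--{boundary}\r\n"
    "Content-Type: application/json\r\n"
    'Content-Disposition: attachment; filename="/search/doc1.json"\r\n'
    "\r\n"
    '{"text": "hello world"}\r\n'
    f"--{boundary}\r\n"
    "Content-Type: application/json\r\n"
    'Content-Disposition: attachment; filename="/search/doc2.json"\r\n'
    "\r\n"
    '{"text": "hello again"}\r\n'
    f"--{boundary}--\r\n"
).encode("utf-8")

parts = split_multipart(f"multipart/mixed; boundary={boundary}", sample_body)
for filename, payload in parts:
    print(filename, json.loads(payload))
```

With a real `requests` response, the two inputs would presumably be `response.headers["Content-Type"]` and `response.content`.
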
Starting in the 1.1.0 release, you can reference a REST API transaction via the `tx` argument. See the guide on transactions for further information.
If the `client.documents.search` method receives an HTTP response with a status code of 200, then the client will return a list of `Document` instances. For any other status code, the client will return the `requests` `Response` object, providing access to the error details returned by the MarkLogic REST API.
The `status_code` and `text` fields in the `Response` object will typically be of the most interest when debugging a problem. Please see the `Response` API documentation for complete information on what's available in this object.
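
Because the return value is either a list or a response object, calling code typically needs to check which one it received. The sketch below shows one way to do that. The function `summarize_search_result` is hypothetical, and a `SimpleNamespace` stands in for the `requests` `Response` object so that the example runs without a server:

```python
from types import SimpleNamespace


def summarize_search_result(result):
    """Describe a search result that is either a list or a response object."""
    if isinstance(result, list):
        # A list means the request succeeded and documents were parsed.
        return f"{len(result)} matching document(s)"
    # Otherwise, assume a response object exposing status_code and text.
    return f"search failed with HTTP {result.status_code}: {result.text}"


# Stand-ins for a successful result and for an error response.
ok_result = ["doc1", "doc2"]
error_result = SimpleNamespace(status_code=400, text="Bad Request")

print(summarize_search_result(ok_result))     # 2 matching document(s)
print(summarize_search_result(error_result))  # search failed with HTTP 400: Bad Request
```
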