Commit de2994c

feat(sru): enhance CQL search capabilities and performance

* SRU CQL Parser (cql_parser.py):
  - Add 4 new Dublin Core search indexes:
    - dc.note → note.label (MARC 500)
    - dc.tableofcontents → tableOfContents (MARC 505)
    - dc.abstract → summary.label (MARC 520)
    - dc.dissertation → dissertation.label (MARC 502)
  - Add ES_SORT_MODIFIERS for sort order mapping (ascending/descending)
  - Add ES_SORT_INDEX_MAPPINGS for sortable field mappings
  - Add SUPPORTED_RELATION_MODIFIERS and UNSUPPORTED_RELATION_MODIFIERS
  - Fix typos: SERVER_CHOISE → SERVER_CHOICE
* SRU Explain (explaine.py):
  - Add _ES_MAPPINGS_CACHE module-level dictionary for caching
  - Implement ES mappings caching in __init__ to avoid repeated file I/O
  - Fix unsafe open() call by using context manager
* SRU Views (views.py):
  - Add EXPLAIN_CACHE_MAX_AGE (3600s) and SEARCH_CACHE_MAX_AGE (60s) constants
  - Add hashlib import for cache key generation
* MARC 21 Serializer (marc.py):
  - Add authority control numbers ($0) to MARC 650__ and 655__ fields
  - Extract $ref from TOPIC and genreForm entities for authority links

Co-Authored-By: Peter Weber <[email protected]>
1 parent ca206fa commit de2994c
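
The explaine.py change described in the commit message (a module-level cache plus a context manager around the file read) is not rendered on this page. A minimal sketch of that pattern, assuming a JSON mappings file and a hypothetical helper `load_es_mappings` (only the name `_ES_MAPPINGS_CACHE` comes from the commit message; its real structure may differ):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical module-level cache keyed by file path. The name is taken from
# the commit message; the actual layout in explaine.py is not shown here.
_ES_MAPPINGS_CACHE = {}


def load_es_mappings(path):
    """Return the parsed mappings file, reading it from disk at most once."""
    key = str(path)
    if key not in _ES_MAPPINGS_CACHE:
        # The context manager closes the file handle even if parsing fails.
        with open(path, encoding="utf-8") as mapping_file:
            _ES_MAPPINGS_CACHE[key] = json.load(mapping_file)
    return _ES_MAPPINGS_CACHE[key]


# Usage: the second call is answered from the cache, not from disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    mappings_path = Path(tmp_dir) / "mappings.json"
    mappings_path.write_text(
        '{"properties": {"title": {"type": "text"}}}', encoding="utf-8"
    )
    first = load_es_mappings(mappings_path)
    second = load_es_mappings(mappings_path)
```

A process-wide dictionary like this never expires, so it only suits files that do not change while the application is running.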

10 files changed: +2305 -287 lines changed

rero_ils/modules/documents/dojson/contrib/jsontodc/model.py

Lines changed: 491 additions & 29 deletions
Large diffs are not rendered by default.

rero_ils/modules/documents/dojson/contrib/jsontomarc21/model.py

Lines changed: 775 additions & 110 deletions
Large diffs are not rendered by default.

rero_ils/modules/documents/loaders/marcxml.py

Lines changed: 59 additions & 4 deletions
@@ -25,18 +25,73 @@
 
 
 def marcxml_marshmallow_loader():
-    """Marshmallow loader for MARCXML requests.
+    """Load and convert MARCXML from HTTP request to RERO ILS JSON format.
 
-    The method convert only one record, otherwise will return a bad request.
-    :return: converted marc21 json record.
+    This loader processes MARCXML data from the request body and converts it
+    to RERO ILS internal JSON format using DoJSON transformations. It is designed
+    for single-record imports via the REST API.
+
+    The conversion process:
+    1. Extracts raw MARCXML from request.data
+    2. Splits the XML stream into individual records
+    3. Parses each MARCXML record into MARC 21 structure
+    4. Transforms MARC 21 to RERO ILS JSON using DoJSON
+    5. Marks the record as draft (_draft=True)
+
+    The loader enforces single-record processing:
+    - If multiple MARCXML records are detected, returns HTTP 400 error
+    - This ensures controlled imports and proper validation
+
+    Returns:
+        dict: RERO ILS JSON document with the following structure:
+            - Standard RERO ILS document fields (title, contributions, etc.)
+            - _draft=True: Marks the record as a draft for review
+
+    Raises:
+        werkzeug.exceptions.BadRequest: When multiple MARCXML records are
+            detected in the request data (HTTP 400).
+
+    Example:
+        POST request with MARCXML body::
+
+            <?xml version="1.0" encoding="UTF-8"?>
+            <record xmlns="http://www.loc.gov/MARC21/slim">
+              <leader>00000nam a2200000 c 4500</leader>
+              <datafield tag="245" ind1="1" ind2="0">
+                <subfield code="a">Sample Title</subfield>
+              </datafield>
+            </record>
+
+        Returns::
+
+            {
+                "title": [{"type": "bf:Title", "_text": "Sample Title", ...}],
+                "_draft": True,
+                ...
+            }
+
+    Note:
+        Records imported via this loader are automatically marked as drafts
+        to prevent accidental publication of unvalidated data.
     """
+    # Split the XML stream into individual MARCXML records
     marcxml_records = split_stream(BytesIO(request.data))
     json_record = {}
+
+    # Process each MARCXML record (should be only one)
     for number_of_xml_records, marcxml_record in enumerate(marcxml_records):
+        # Parse MARCXML into MARC 21 dictionary structure
         marc21json_record = create_record(marcxml_record)
+
+        # Transform MARC 21 to RERO ILS JSON using DoJSON rules
         json_record = marc21.do(marc21json_record)
-        # converted records are considered as draft
+
+        # Mark imported records as drafts to prevent immediate publication
+        # and ensure they go through the validation workflow
         json_record["_draft"] = True
+
+        # Reject requests with multiple records (only single-record import allowed)
         if number_of_xml_records > 0:
             abort(400)
+
     return json_record
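
The single-record rule in the loader above can be tried without Flask or DoJSON. The following is a sketch only: `load_single_record`, `TooManyRecordsError`, and `to_json` are hypothetical names, the exception stands in for `abort(400)`, and this variant rejects the request before converting the second record rather than after, which is a small deviation from the diff.

```python
class TooManyRecordsError(ValueError):
    """Hypothetical stand-in for the HTTP 400 raised via abort(400)."""


def load_single_record(raw_records, convert):
    """Convert at most one record and flag the result as a draft.

    ``raw_records`` is any iterable of raw records and ``convert`` is a
    callable returning a dict; both are assumptions made for this sketch.
    """
    json_record = {}
    for index, raw_record in enumerate(raw_records):
        if index > 0:
            # A second record means the request is rejected outright.
            raise TooManyRecordsError("only one record per request is accepted")
        json_record = convert(raw_record)
        json_record["_draft"] = True
    return json_record


def to_json(raw_record):
    """Toy converter used only for the usage example below."""
    return {"title": raw_record}


draft = load_single_record(["Sample Title"], to_json)
```

As in the loader, an empty input yields an empty dict rather than an error, so callers that need at least one record have to check for that separately.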

rero_ils/modules/documents/serializers/dc.py

Lines changed: 92 additions & 12 deletions
@@ -37,10 +37,26 @@
 
 
 class DublinCoreSerializer(_DublinCoreSerializer):
-    """Dublin Core serializer for records.
-
-    Note: This serializer is not suitable for serializing large number of
-    records.
+    """Dublin Core serializer for document records.
+
+    This serializer transforms RERO ILS document records into Dublin Core XML format,
+    following the Dublin Core Metadata Element Set specifications. It handles both
+    individual record serialization and search result serialization with SRU support.
+
+    The serializer performs the following transformations:
+    - Converts RERO ILS JSON documents to Dublin Core XML
+    - Resolves record references and processes internationalized fields
+    - Handles contribution data with i18n support
+    - Generates SRU-compliant search responses
+
+    Note:
+        This serializer loads complete records into memory and is not suitable
+        for serializing large numbers of records (>1000). For bulk exports,
+        consider using streaming serialization.
+
+    See Also:
+        - Dublin Core Metadata Initiative: https://dublincore.org/
+        - SRU Protocol: https://www.loc.gov/standards/sru/
     """
 
     # Default namespace mapping.
@@ -58,14 +74,50 @@ class DublinCoreSerializer(_DublinCoreSerializer):
     container_element = "record"
 
     def transform_record(self, pid, record, links_factory=None, language=DEFAULT_LANGUAGE, **kwargs):
-        """Transform record into an intermediate representation."""
+        """Transform a document record into Dublin Core intermediate representation.
+
+        This method converts a RERO ILS document record into a Dublin Core
+        compatible dictionary format by:
+        1. Dumping the record with resolved references
+        2. Processing contribution fields with i18n support
+        3. Applying Dublin Core transformation rules
+
+        Args:
+            pid (str): Persistent identifier of the record.
+            record (Document): The document record instance to transform.
+            links_factory (callable, optional): Factory function for generating
+                record links. Defaults to None.
+            language (str, optional): Target language for i18n fields.
+                Defaults to application's BABEL_DEFAULT_LANGUAGE.
+            **kwargs: Additional keyword arguments passed to the transformation.
+
+        Returns:
+            dict: Dublin Core representation of the record with standard DC elements
+                (dc:title, dc:creator, dc:date, dc:identifier, etc.).
+        """
         record = record.dumps(document_replace_refs_dumper)
         if contributions := record.pop("contribution", []):
             record["contribution"] = process_i18n_literal_fields(contributions)
         return dublincore.do(record, language=language)
 
     def transform_search_hit(self, pid, record, links_factory=None, language=DEFAULT_LANGUAGE, **kwargs):
-        """Transform search result hit into an intermediate representation."""
+        """Transform a search hit into Dublin Core intermediate representation.
+
+        Retrieves the complete document record by PID and delegates to
+        :meth:`transform_record` for the actual transformation.
+
+        Args:
+            pid (str): Persistent identifier of the document.
+            record (dict): Elasticsearch hit source (minimal record data).
+            links_factory (callable, optional): Factory function for generating
+                record links. Defaults to None.
+            language (str, optional): Target language for i18n fields.
+                Defaults to application's BABEL_DEFAULT_LANGUAGE.
+            **kwargs: Additional keyword arguments passed to transform_record.
+
+        Returns:
+            dict: Dublin Core representation of the record.
+        """
         record = Document.get_record_by_pid(pid)
         return self.transform_record(
             pid=pid,
@@ -76,12 +128,40 @@ def transform_search_hit(self, pid, record, links_factory=None, language=DEFAULT
         )
 
     def serialize_search(self, pid_fetcher, search_result, links=None, item_links_factory=None, **kwargs):
-        """Serialize a search result.
-
-        :param pid_fetcher: Persistent identifier fetcher.
-        :param search_result: Elasticsearch search result.
-        :param links: Dictionary of links to add to response.
-        :param item_links_factory: Factory function for record links.
+        """Serialize Elasticsearch search results into Dublin Core XML.
+
+        Generates an SRU-compliant searchRetrieveResponse containing Dublin Core
+        records for all search hits. The response includes:
+        - Total number of matching records
+        - Dublin Core XML records for each hit
+        - SRU metadata (query echo, pagination info)
+        - Next record position for pagination
+
+        The method processes search results in the following order:
+        1. Extract SRU metadata and pagination info from search results
+        2. Transform each search hit to Dublin Core format
+        3. Build XML structure with SRU envelope
+        4. Add echoed search parameters for SRU compliance
+
+        Args:
+            pid_fetcher (callable): Function to extract persistent identifier from hits.
+            search_result (dict): Elasticsearch search response containing:
+                - hits.total.value: Total number of matching documents
+                - hits.hits: List of search result hits
+                - hits.sru: SRU-specific metadata (optional)
+            links (dict, optional): Additional links to include in response.
+                Currently unused. Defaults to None.
+            item_links_factory (callable, optional): Factory function for generating
+                per-record links. Defaults to None.
+            **kwargs: Additional keyword arguments passed to transformation methods.
+
+        Returns:
+            bytes: UTF-8 encoded XML string containing the complete SRU response
+                with Dublin Core records.
+
+        Note:
+            The 'ln' query parameter from the request controls the output language
+            for internationalized fields.
         """
         total = search_result["hits"]["total"]["value"]
         sru = search_result["hits"].get("sru", {})
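
The docstring above mentions a next record position for pagination, but the code that computes it falls outside the rendered hunk. A minimal sketch of how such a value is commonly derived for SRU, assuming 1-based record positions; `next_record_position` and its parameters are hypothetical and not taken from dc.py:

```python
def next_record_position(start_record, returned_count, total):
    """Return the 1-based position of the next record, or None if exhausted.

    Assumes SRU-style numbering, where the first record of a result set
    has position 1.
    """
    if returned_count <= 0:
        # Nothing was returned, so there is no meaningful next page.
        return None
    candidate = start_record + returned_count
    # Only advertise a next position that still lies inside the result set.
    return candidate if candidate <= total else None


# Usage: 25 hits in total, served in pages of 10.
second_page_start = next_record_position(1, 10, 25)
third_page_start = next_record_position(11, 10, 25)
after_last_page = next_record_position(21, 5, 25)
```

Returning None on the last page lets the caller leave the element out of the response entirely instead of emitting a position past the end.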
