diff --git a/docs/strawberryrunners.md b/docs/strawberryrunners.md
index a1d9ebbf..ad1c17a9 100644
--- a/docs/strawberryrunners.md
+++ b/docs/strawberryrunners.md
@@ -13,6 +13,7 @@ tags:
 Archipelago's [Strawberry Runners (SBR)](https://github.com/esmero/strawberry_runners) module provides provides a set of post-processing capabilities for the JSON based metadata, files and entities that comprise your Archipelago Digital Objects (ADOs). These post-processing actions are based on dispatched events, direct http calls, and invoked webhooks from partner services (such as Min.io, AWS S3 or self-invoked).
 
 The default Archipelago SBR post-processor configurations include operations that:
+
 - perform page-based HOCR/OCR for image and pdf-based ADOs, send the output to the Search API, and use Natural Language Processing to extract entities from the output
 - extract text from pages within a Webarchives File and send the output to the Search API
 - convert WARC format Webarchives Files into WACZ format and attach the new WACZ file to the original source ADO to complement the WARC original