
IBM Deep Search

Developer tools for IBM Deep Search

Welcome to our OSS organization for document processing

The DS4SD organization is the home of the open-source projects of the AI for Knowledge group at IBM Research Europe - Zurich.

Docling

Docling is our main open-source package. It is a powerful library that simplifies document processing: it parses diverse formats, including advanced PDF understanding, and provides seamless integrations with the gen AI ecosystem.

We support an amazing community that helps us drive the adoption of Docling forward. Give it a try and join the community!



The key repositories of Docling are:

  • docling - The home of the main docling package.
  • docling-core - The definition of types, transforms, serializers, etc. If it has to do with the DoclingDocument, you will find it here.
  • docling-parse - The backend PDF parser used by Docling.
  • docling-serve - The FastAPI wrappers for running Docling as a REST API and distributing large jobs.
  • docling-ibm-models - The AI models powering Docling.

Deep Search

Deep Search leverages the output of Docling to Interpret, Index and Integrate the knowledge encoded in your documents. It offers a seamless chat interface for interacting with its RAG backend and navigating your data collections.

Deep Search is a service that provides programmatic access, for easy integration with other tools or for bulk conversion. Our Python toolkit provides this functionality both as a client and as a library. Our examples repository is a good place to get started.

PatCID

PatCID is a collection of chemical structures found in patent documents, intended to facilitate the search of patent documents in the organic-chemistry domain. Programmatic access to PatCID can facilitate the discovery of molecules. The collection was created by processing molecular-structure images in patent documents from the United States Patent and Trademark Office, the Japan Patent Office, the European Patent Office, the Korean Intellectual Property Office, and the China National Intellectual Property Administration.

The key repositories of the PatCID tools are:

  • PatCID - Examples and demonstrators of PatCID.
  • MolGrapher - The graph-based visual recognition of chemical structures leveraged when building the PatCID database.
  • deepsearch-toolkit - The programmatic toolkit for interacting with the database and performing chemistry searches.

Publications

Find our extensive list of publications here!

IBM ❤️ Open Source AI

All our projects are brought to you by IBM.

Pinned

  1. deepsearch-toolkit Public

    Interact with the Deep Search platform for new knowledge explorations and discoveries

    Python 176 25

  2. deepsearch-examples Public

    Examples using the Deep Search functionalities

    Python 68 21

  3. DocLayNet Public

    DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis

    324 17

Repositories

Showing 10 of 21 repositories
  • docling-eval Public
    Python 10 MIT 2 2 5 Updated Mar 13, 2025
  • MolGrapher Public
    MolGrapher: Graph-based Visual Recognition of Chemical Structures
    Python 67 MIT 4 2 0 Updated Mar 13, 2025
  • .github Public
    1 0 0 1 Updated Mar 10, 2025
  • docling-ibm-models
    Python 82 MIT 12 11 3 Updated Mar 10, 2025
  • PatCID Public
    Python 46 MIT 2 1 0 Updated Feb 26, 2025
  • DS4SD.github.io
    CSS 10 MIT 1 0 0 Updated Feb 3, 2025
  • ragnardoc Public
    Python 16 MIT 1 0 0 Updated Feb 1, 2025
  • deepsearch-examples Public
    Examples using the Deep Search functionalities
    Python 68 MIT 21 0 4 Updated Jan 29, 2025
  • deepsearch-glm Public
    Create fast graph language models from converted PDF documents for knowledge extraction and Q&A.
    C++ 48 MIT 8 2 2 Updated Jan 27, 2025
  • deepsearch-toolkit Public
    Interact with the Deep Search platform for new knowledge explorations and discoveries
    Python 176 MIT 25 9 11 Updated Jan 24, 2025


Top languages

Python C++ CSS
