    Repositories list

    • datalab-on-prem

      Public
      Scripts to run Datalab's self-service on-prem container
      Shell
      1400 Updated Feb 12, 2026
    • sdk

      Public
      Python
      78311 Updated Feb 10, 2026
    • marker

      Public
      Convert PDF to markdown + JSON quickly with high accuracy
      Python
      2.2k32k32253 Updated Feb 9, 2026
    • pykatex

      Public
      Python
      0200 Updated Feb 5, 2026
    • surya

      Public
      OCR, layout analysis, reading order, table recognition in 90+ languages
      Python
      1.3k19k13513 Updated Feb 4, 2026
    • chandra

      Public
      OCR model that handles complex tables, forms, handwriting with full layout.
      Python
      5434.8k225 Updated Jan 13, 2026
    • oss_container

      Public
      Python
      1100 Updated Oct 2, 2025
    • Python
      1301 Updated Aug 13, 2025
    • docext

      Public
      An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
      Python
      4900 Updated Jun 18, 2025
    • pdftext

      Public
      Extract structured text from PDFs quickly
      Python
      62661126 Updated Jun 11, 2025