Feature Request: Check / improve integration as backend for Nextcloud Context Chat Backend #2255

@ga-it

Summary

Please help validate and (where useful) refine R2R’s use as a pluggable retrieval backend for the Nextcloud Context Chat Backend (CCBE). The goal is smooth, upstream‑friendly interoperability: stable request/response shapes, predictable ingestion lifecycle, robust deletion semantics (including optional blob cleanup), and clear guidance for auth, collections/permissions, and performance tuning.

I’ve piloted R2R as the external backend via a thin adapter in CCBE (default remains builtin). This issue summarizes what works well and proposes a few focused enhancements and docs to make R2R a first‑class backend for CCBE deployments.

Links and Context

Current Integration Status (pilot)

  • CCBE → R2R integration works for ingest (file/text/chunks), list/retrieve, chunks listing, RAG search, and document delete.
  • S3 file store and Postgres/pgvector work well operationally; Hatchet orchestration enables scalable ingestion (also tested the “simple” orchestration path).
  • Collections map cleanly to Nextcloud access scopes (owner and shared collections).

What Works Well

  • Clear v3 API: /v3/documents (create/list/retrieve/chunks/delete), /v3/retrieval (search/rag), /v3/graphs/* (optional Graph‑RAG).
  • Ingestion status surfaced via documents.retrieve (PENDING → PARSING → EMBEDDING → STORING → SUCCESS/FAILED).
  • S3 and Postgres providers are easy to configure; Hatchet‑based concurrency is effective at scale.
  • Metadata filters are powerful (metadata.* with $eq/$ilike/$in/...) and fit CCBE’s needs.
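For reference, this is roughly how the CCBE adapter waits on the ingestion lifecycle above. It is a minimal sketch, not R2R client code: `wait_for_ingestion` and the `fetch_status` callable are hypothetical names, and the status strings are the ones listed in this issue (the server may report them in a different case, so the sketch normalizes to upper case).

```python
import time
from typing import Callable

# Terminal states as listed in this issue; the real server set may differ.
TERMINAL_STATES = {"SUCCESS", "FAILED"}


def wait_for_ingestion(
    fetch_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the document reaches SUCCESS or FAILED, or the timeout expires.

    fetch_status is a hypothetical callable supplied by the adapter; it would
    wrap a document retrieve call and return the ingestion status string.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status().upper()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"ingestion still {status} after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated status sequence standing in for repeated retrieve calls.
    states = iter(["PENDING", "PARSING", "EMBEDDING", "STORING", "SUCCESS"])
    print(wait_for_ingestion(lambda: next(states), interval_s=0.0))
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays; a cancellation endpoint would let the adapter stop the ingestion itself rather than only give up on polling.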

Requested Improvements (to make R2R “drop‑in excellent” for CCBE)

  1. Ingestion cancellation API
  • Need: Cancel/abort long‑running ingestions by document_id, and ideally in bulk by filter (e.g., metadata path prefix).
  • Proposal:
    • POST /v3/documents/{id}/cancel-ingestion (idempotent; marks FAILED, stops Hatchet run if present).
    • POST /v3/documents/cancel-by-filter with the same filter syntax as delete‑by‑filter, applying cancellation to matching pending/running docs.
  • Rationale: CCBE admins often need to stop accidental/bad batch ingests (e.g., a Nextcloud folder that shouldn’t be imported).
  2. Delete cascade should optionally include file blobs (S3/Postgres)
  • Observed: delete_documents_and_chunks_by_filter() removes DB rows and graph data, but file blobs in S3/Postgres storage aren’t deleted.
  • Proposal:
    • Add an optional cascade setting (e.g., include_files=true or server config) to call providers.file.delete_file(doc_id) for each deleted document.
    • Document default behavior and risks (some deployments may prefer to retain original files).
  • Rationale: Keeps storage consistent, simplifies cleanup.
  3. Response shape parity across orchestration modes
  • Ask: Ensure identical response shapes (fields) for ingestion operations whether using Hatchet or “simple” mode, especially the { message, task_id, document_id } contract.
  • Rationale: CCBE can treat both modes uniformly (queue vs. inline).
  4. Auth guidance and headers
  • Ask: Document supported auth methods and headers (e.g., X-API-Key and/or Authorization: Bearer), plus recommended patterns for service‑to‑service integration.
  • Rationale: Clear guidance reduces misconfiguration.
  5. Collections/permissions helpers
  • Ask: Document or expose helper endpoints/workflows to:
    • Ensure default per‑user collection exists (idempotent).
    • Bulk assign/unassign documents to collections.
  • Rationale: CCBE maps Nextcloud users/groups to collections; helpers simplify setup.
  6. Conformance tests and examples
  • Proposal: A minimal “CCBE integration” conformance script (or Postman collection) that:
    • Ingests file/text/chunks, polls status, searches, lists chunks, deletes by id/filter, and (optionally) cancels ingestion.
    • Verifies shapes/status codes match docs for both Hatchet and “simple” modes.
  • Rationale: Locks in backend contract for external integrators.
  7. Operational docs (CCBE‑oriented)
  • Ask: A short “Using R2R behind CCBE” guide:
    • Required endpoints and typical headers.
    • Recommended config snippets (Postgres/pgvector, S3/MinIO, LiteLLM).
    • Tuning knobs for batch sizes, concurrency, index creation.
    • Deletion/cancellation behavior and caveats.
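
To make the response-shape parity and conformance asks concrete, here is a minimal sketch of the kind of check a conformance script could run on ingestion responses. The field names come from the { message, task_id, document_id } contract mentioned above; the helper names, the sample payloads, and the idea that `task_id` may be null in “simple” mode (while the key is still present) are assumptions for illustration, not documented R2R behavior.

```python
from typing import Any, List, Mapping

# Field names taken from the contract quoted in this issue.
REQUIRED_FIELDS = ("message", "task_id", "document_id")


def shape_problems(response: Mapping[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the shape conforms."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in response
    ]
    # Assumption: document_id is always a string identifier.
    if "document_id" in response and not isinstance(response["document_id"], str):
        problems.append("document_id should be a string")
    return problems


def same_shape(hatchet: Mapping[str, Any], simple: Mapping[str, Any]) -> bool:
    """True when both orchestration modes return the same top-level field names."""
    return set(hatchet) == set(simple)


if __name__ == "__main__":
    # Hypothetical payloads, one per orchestration mode.
    hatchet_resp = {"message": "queued", "task_id": "t-1", "document_id": "d-1"}
    simple_resp = {"message": "done", "task_id": None, "document_id": "d-1"}
    print(shape_problems(hatchet_resp), shape_problems(simple_resp))
    print(same_shape(hatchet_resp, simple_resp))
```

A check like this only compares top-level keys; status codes and nested fields would need their own assertions in a fuller script.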

Thanks for R2R — it’s been a strong fit operationally and conceptually for this use‑case. Happy to adapt to preferred naming/structures and to split work into reviewable chunks.
