Is your feature request related to a problem? Please describe.
I have put R2R behind Nextcloud as the RAG engine (https://github.com/ga-it/context_chat_backend).
This results in massive indexing runs (hundreds of thousands of documents; the backlog will take months to ingest).
These runs lock up the completions endpoints, because ingestion work and interactive queries are governed by the same concurrency settings.
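For context, as far as I can tell the only knob available today is a single global limit per provider, roughly like this in my r2r.toml (values abbreviated, and the exact keys may differ between versions):

```toml
# Current situation (abbreviated from my config): one shared limit
# that both bulk ingestion and interactive completions draw from.
[completion]
provider = "litellm"
concurrent_request_limit = 64
```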
Describe the solution you'd like
Split out the completions (concurrency) configuration per LLM definition (e.g. fast_llm, quality_llm) and per task (e.g. database.graph_creation_settings, database.graph_entity_deduplication_settings), and extend the configuration settings for RAG queries, as sketched below.
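For illustration only, a rough sketch of what this could look like in r2r.toml. None of the keys below exist today and the model names are placeholders; the point is per-LLM and per-task concurrency pools rather than one shared limit:

```toml
# Hypothetical sketch: none of these keys exist in R2R today.
# The idea is separate concurrency pools per LLM role and per task,
# so bulk ingestion/graph work cannot starve interactive RAG queries.

[app]
fast_llm = "openai/gpt-4o-mini"      # placeholder model names
quality_llm = "openai/gpt-4o"

[completion.fast_llm]
concurrent_request_limit = 64        # background ingestion and graph tasks

[completion.quality_llm]
concurrent_request_limit = 16        # reserved for user-facing completions

[database.graph_creation_settings]
concurrent_request_limit = 32        # per-task override

[database.graph_entity_deduplication_settings]
concurrent_request_limit = 8         # per-task override

[rag]
concurrent_request_limit = 16        # extended settings for RAG queries
min_reserved_slots = 4               # always keep some capacity for queries
```

The exact shape matters less than the outcome: ingestion-side tasks and query-side completions should not compete for the same concurrency budget.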
Describe alternatives you've considered
As far as I can tell, there is no way to manage this at present. The global concurrency settings choke queries too.