Fix multiple issues (limit inconsistency, invisible documents) #2260

rexjohannes · 2025-09-17T22:58:03Z

Fixes #2238 (limit inconsistency) and #2222 (invisible documents), and also addresses "Documents disappear on delete but after a delay appear again".

This pull request primarily increases the maximum limit parameter for various API endpoints from 100 to 1000, allowing clients to request larger result sets. It also improves how collection IDs are handled during document ingestion and updates the database transaction isolation for document upserts to increase reliability. Below are the most important changes grouped by theme:

API Parameter Updates

Increased the maximum allowed value for the limit parameter from 100 to 1000 across multiple endpoints in documents_router.py, collections_router.py, users_router.py, and chunks_router.py, as well as in the corresponding OpenAPI documentation (llms.txt). This change enables clients to retrieve up to 1000 objects per request instead of 100. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
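To illustrate what the higher cap means for a client, here is a minimal sketch that pages through a result set until a short page comes back. Only the cap of 1000 comes from this PR. `fetch_page`, the `offset` parameter, the lower bound of 1, and the in-memory backend are hypothetical stand-ins, not the actual client API.

```python
from typing import Callable, Dict, List, Tuple

MAX_LIMIT = 1000  # new cap from this PR (previously 100)


def fetch_all(
    fetch_page: Callable[[int, int], List[dict]], limit: int = MAX_LIMIT
) -> List[dict]:
    """Collect every item by paging with offset/limit until a short page arrives."""
    # The lower bound of 1 is an assumption; the PR only states the upper bound.
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    items: List[dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items.extend(page)
        if len(page) < limit:  # a short page means nothing is left
            return items
        offset += limit


def make_fake_backend(total: int) -> Tuple[Callable[[int, int], List[dict]], Dict[str, int]]:
    """Hypothetical in-memory backend that counts how many requests were made."""
    calls = {"count": 0}
    data = [{"id": i} for i in range(total)]

    def fetch_page(offset: int, limit: int) -> List[dict]:
        calls["count"] += 1
        return data[offset : offset + limit]

    return fetch_page, calls
```

With 2500 stored objects, this loop makes 3 requests at `limit=1000` versus 26 at `limit=100` (25 full pages plus one empty page that ends the loop).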

Collection ID Handling in Document Ingestion

Updated ingestion workflows and services to consistently use document_info.collection_ids for assigning and propagating collection IDs, ensuring documents and chunks are correctly associated with collections. This includes changes in ingestion_service.py, documents_router.py, and orchestration workflows. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

Database Reliability Improvements

Changed transaction isolation level to serializable in upsert_documents_overview to reduce race conditions and added handling for SerializationFailureError to improve retry logic during concurrent document upserts. [1] [2]
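Under serializable isolation, a conflicting transaction is aborted and is expected to be retried by the caller. The sketch below shows that retry shape in isolation. The `SerializationFailureError` class here is a local stand-in for the driver's exception, and `run_with_retry` and `make_flaky_txn` are hypothetical helpers, not the code in `upsert_documents_overview`.

```python
import asyncio


class SerializationFailureError(Exception):
    """Local stand-in for the database driver's serialization-failure error."""


async def run_with_retry(txn, max_retries: int = 3, base_delay: float = 0.1):
    """Run the coroutine function `txn`, retrying when it fails to serialize."""
    retries = 0
    while True:
        try:
            return await txn()
        except SerializationFailureError:
            retries += 1
            if retries > max_retries:
                raise  # give up and surface the conflict to the caller
            # Exponential backoff: base_delay * 2**retries seconds.
            await asyncio.sleep(base_delay * (2 ** retries))


def make_flaky_txn(failures: int):
    """Hypothetical transaction that conflicts `failures` times, then commits."""
    state = {"attempts": 0}

    async def txn():
        state["attempts"] += 1
        if state["attempts"] <= failures:
            raise SerializationFailureError("could not serialize access")
        return "committed"

    return txn, state
```

The whole transaction body has to be re-run on each attempt, since the aborted attempt's reads may be stale.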

Document Status Update Safeguards

Added a check in ingestion_service.py to ensure a document still exists before updating its status, preventing accidental recreation of deleted documents during ingestion.
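The failure mode this guards against can be sketched with an in-memory store: if a status update is implemented as an upsert, a late update from a still-running ingestion recreates a document that was deleted in the meantime. `InMemoryDocumentStore` and `update_status_if_exists` are hypothetical names for illustration, not the ones used in `ingestion_service.py`.

```python
from typing import Dict, Optional


class InMemoryDocumentStore:
    """Hypothetical store whose writes are upserts, like a documents overview table."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def upsert(self, doc_id: str, status: str) -> None:
        # An upsert creates the row when it is missing, which is the hazard here.
        self._docs[doc_id] = {"id": doc_id, "status": status}

    def get(self, doc_id: str) -> Optional[dict]:
        return self._docs.get(doc_id)

    def delete(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)


def update_status_if_exists(store: InMemoryDocumentStore, doc_id: str, status: str) -> bool:
    """Update the status only when the document is still present."""
    if store.get(doc_id) is None:
        return False  # deleted during ingestion: skip instead of recreating it
    store.upsert(doc_id, status)
    return True
```

In a real database the existence check and the write would need to happen in the same transaction; otherwise a delete can still slip in between the two steps.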

Minor Query Construction Fix

Improved query construction in get_documents_overview to ensure conditions are properly combined.

Important

Increased API limit parameter to 1000, improved collection ID handling, and enhanced database reliability and document status updates.

API Parameter Updates
- Increased limit parameter from 100 to 1000 in documents_router.py, collections_router.py, and users_router.py to allow larger result sets.
Collection ID Handling in Document Ingestion
- Updated ingestion_service.py and documents_router.py to use document_info.collection_ids for consistent collection ID assignment.
Database Reliability Improvements
- Changed transaction isolation to serializable in upsert_documents_overview to reduce race conditions.
- Added SerializationFailureError handling for retries during document upserts.
Document Status Update Safeguards
- Added checks in ingestion_service.py to ensure document existence before status updates, preventing accidental recreation of deleted documents.

This description was created by Ellipsis for 3f97b08. You can customize this summary. It will automatically update as commits are pushed.

ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to 3f97b08 in 1 minute and 20 seconds.

Reviewed 325 lines of code in 9 files
Skipped 0 files when reviewing.
Skipped posting 4 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. py/core/providers/database/documents.py:387

Draft comment:
The _get_ids_from_table query uses 'AND $2 = ANY(collection_ids)' without checking if collection_id is provided. If collection_id is None, this condition may behave unexpectedly. Consider adding explicit logic to handle a missing collection constraint.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
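One way to handle the missing constraint is to append the collection filter only when a collection ID is supplied. In PostgreSQL, a NULL compared with `ANY(...)` yields NULL rather than true, so binding NULL to `$2` would filter out every row. The sketch below assumes an `id` column and a trusted table name; `build_ids_query` is a hypothetical helper, not the existing `_get_ids_from_table`.

```python
from typing import List, Optional, Tuple


def build_ids_query(
    table: str, ids: List[str], collection_id: Optional[str] = None
) -> Tuple[str, list]:
    """Build the ID lookup, adding the collection filter only when one is given."""
    # `table` must be a trusted identifier: it is interpolated, not bound.
    query = f"SELECT id FROM {table} WHERE id = ANY($1)"
    params: list = [ids]
    if collection_id is not None:
        query += " AND $2 = ANY(collection_ids)"
        params.append(collection_id)
    return query, params
```

Calling it without a collection returns a one-parameter query, so no NULL is ever bound to the `ANY` comparison.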

2. py/core/providers/database/documents.py:327

Draft comment:
In upsert_documents_overview, exponential backoff is used (wait_time = 0.1 * (2**retries)). Consider adding jitter to this formula to avoid thundering herd issues when many concurrent updates occur.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
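A minimal sketch of the suggested jitter, using the "full jitter" variant: the wait is drawn uniformly between zero and the exponential ceiling from the existing formula. `backoff_with_jitter` and its parameters are hypothetical and not part of the codebase.

```python
import random
from typing import Optional


def backoff_with_jitter(
    retries: int, base: float = 0.1, rng: Optional[random.Random] = None
) -> float:
    """Return a wait time drawn uniformly from [0, base * 2**retries]."""
    source = rng if rng is not None else random
    ceiling = base * (2 ** retries)  # same ceiling as 0.1 * (2**retries)
    return source.uniform(0, ceiling)
```

Because each worker draws a different wait, concurrent upserts that conflicted at the same moment are less likely to retry in lockstep.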

3. py/core/providers/database/documents.py:667

Draft comment:
Parsing 'summary_embedding' by slicing the string (using [1:-1] and splitting by commas) can be brittle. Consider storing embeddings in a structured format (e.g., as JSON) so you can reliably convert them without manual string manipulation.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
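As a sketch of a less brittle conversion, and assuming the stored text looks like `[0.1,0.2,0.3]` (which the slice-and-split approach implies), the value can be parsed as a JSON array. `parse_embedding` is a hypothetical helper name.

```python
import json
from typing import List, Optional


def parse_embedding(raw: Optional[str]) -> Optional[List[float]]:
    """Parse a bracketed, comma-separated vector string into a list of floats."""
    if raw is None:
        return None
    values = json.loads(raw)  # tolerates whitespace and exponent notation
    if not isinstance(values, list):
        raise ValueError("expected a JSON array of numbers")
    return [float(v) for v in values]
```

This also handles the empty vector: `"[]"[1:-1].split(",")` produces `['']`, and `float('')` raises `ValueError`, whereas `json.loads("[]")` returns an empty list.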

4. py/core/providers/database/documents.py:775

Draft comment:
In semantic_document_search, the SQL query dynamically builds parameter placeholders using the length of the params list (e.g., LIMIT ${len(params) + 1}). This approach can be error-prone. Consider using a query builder or a more explicit parameter numbering strategy.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
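One explicit numbering strategy is a small helper that appends a value and returns its placeholder in the same step, so the number can never drift from the parameter list. This is a sketch only: `ParamList` and `build_search_query` are hypothetical, and the table and column names are illustrative rather than taken from `semantic_document_search`.

```python
from typing import Any, List, Tuple


class ParamList:
    """Collects positional parameters and hands back their $n placeholders."""

    def __init__(self) -> None:
        self.values: List[Any] = []

    def add(self, value: Any) -> str:
        self.values.append(value)
        return f"${len(self.values)}"


def build_search_query(query_embedding: str, limit: int, offset: int) -> Tuple[str, List[Any]]:
    """Build a paginated similarity query with explicitly numbered placeholders."""
    params = ParamList()
    embedding_ph = params.add(query_embedding)
    limit_ph = params.add(limit)
    offset_ph = params.add(offset)
    sql = (
        "SELECT id FROM documents "
        f"ORDER BY summary_embedding <=> {embedding_ph} "
        f"LIMIT {limit_ph} OFFSET {offset_ph}"
    )
    return sql, params.values
```

Each placeholder is created by the same call that stores its value, so adding or removing a filter cannot leave a stale `len(params) + 1` behind.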

Workflow ID: wflow_ad1NTX5viMYMheWa

rexjohannes added 3 commits September 17, 2025 22:20

fix multiple issues

2731e4e

revert counter changes

80e5153

revert document list changes

3f97b08

ellipsis-dev bot reviewed Sep 17, 2025

rexjohannes mentioned this pull request Sep 19, 2025

Fix #2257 #2259

Open
