bulk_* ignores errors and can cause missing data #346

Open
Zaczero opened this issue Mar 30, 2025 · 1 comment

@Zaczero
Contributor

Zaczero commented Mar 30, 2025

async def bulk_async(
    self, collection_id: str, processed_items: List[Item], refresh: bool = False
) -> None:
    """Perform a bulk insert of items into the database asynchronously.
    Args:
        self: The instance of the object calling this function.
        collection_id (str): The ID of the collection to which the items belong.
        processed_items (List[Item]): A list of `Item` objects to be inserted into the database.
        refresh (bool): Whether to refresh the index after the bulk insert (default: False).
    Notes:
        This function performs a bulk insert of `processed_items` into the database using the specified `collection_id`. The
        insert is performed asynchronously, and the event loop is used to run the operation in a separate executor. The
        `mk_actions` function is called to generate a list of actions for the bulk insert. If `refresh` is set to True, the
        index is refreshed after the bulk insert. The function does not return any value.
    """
    await helpers.async_bulk(
        self.client,
        mk_actions(collection_id, processed_items),
        refresh=refresh,
        raise_on_error=False,
    )

def bulk_sync(
    self, collection_id: str, processed_items: List[Item], refresh: bool = False
) -> None:
    """Perform a bulk insert of items into the database synchronously.
    Args:
        self: The instance of the object calling this function.
        collection_id (str): The ID of the collection to which the items belong.
        processed_items (List[Item]): A list of `Item` objects to be inserted into the database.
        refresh (bool): Whether to refresh the index after the bulk insert (default: False).
    Notes:
        This function performs a bulk insert of `processed_items` into the database using the specified `collection_id`. The
        insert is performed synchronously and blocking, meaning that the function does not return until the insert has
        completed. The `mk_actions` function is called to generate a list of actions for the bulk insert. If `refresh` is set to
        True, the index is refreshed after the bulk insert. The function does not return any value.
    """
    helpers.bulk(
        self.sync_client,
        mk_actions(collection_id, processed_items),
        refresh=refresh,
        raise_on_error=False,
    )

The bulk methods set raise_on_error=False but then discard the value the helper returns, which includes the per-item errors, so failed items are silently dropped. That makes these methods unsafe to use in a production environment. If you don't want to deal with the errors here, let them raise so they can be handled in the client code.
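
One possible shape for a fix, as a minimal sketch rather than the project's actual code: keep raise_on_error=False so the whole batch is attempted, then inspect the `(success_count, errors)` tuple that `helpers.bulk` returns and raise if anything failed. `BulkInsertError`, `checked_bulk`, and `fake_bulk` are hypothetical names introduced only for this illustration; `fake_bulk` stands in for `helpers.bulk` so the example runs without a cluster.

```python
from typing import Any, Callable, Iterable, List, Tuple


class BulkInsertError(RuntimeError):
    """Hypothetical exception carrying the per-item errors of a bulk call."""

    def __init__(self, success_count: int, errors: List[Any]) -> None:
        super().__init__(
            f"{len(errors)} bulk action(s) failed, {success_count} succeeded"
        )
        self.success_count = success_count
        self.errors = errors


def checked_bulk(
    bulk_fn: Callable[..., Tuple[int, List[Any]]],
    client: Any,
    actions: Iterable[dict],
    refresh: bool = False,
) -> int:
    """Run a bulk helper over every action, then surface any failures.

    Assumes `bulk_fn` behaves like `helpers.bulk`: called with
    raise_on_error=False it returns `(success_count, errors)`.
    """
    success_count, errors = bulk_fn(
        client, actions, refresh=refresh, raise_on_error=False
    )
    if errors:
        # Do not swallow failures: hand them to the caller.
        raise BulkInsertError(success_count, errors)
    return success_count


def fake_bulk(client, actions, refresh=False, raise_on_error=True):
    """Stand-in for helpers.bulk: actions flagged "bad" are reported as errors."""
    actions = list(actions)
    failed = [
        {"index": {"_id": a["_id"], "status": 400}}
        for a in actions
        if a.get("bad")
    ]
    return len(actions) - len(failed), failed
```

In bulk_sync this would be called roughly as `checked_bulk(helpers.bulk, self.sync_client, mk_actions(collection_id, processed_items), refresh=refresh)`; bulk_async would need an awaitable variant of the same check around `helpers.async_bulk`.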

Zaczero changed the title from "bulk_async ignores errors and can cause missing data" to "bulk_* ignores errors and can cause missing data" on Mar 30, 2025

@jonhealy1
Collaborator

@Zaczero Good point. Would you like to add this?
