Use array "current domain" for vector dimension #564

Open
mlin opened this issue Jan 28, 2025 · 1 comment · May be fixed by #565
mlin commented Jan 28, 2025

On ingestion, the vector dimension is read from the array's dimension 1 domain:

```python
if source_type == "TILEDB_SPARSE_ARRAY":
    schema = tiledb.ArraySchema.load(source_uri)
    size = np.int64(schema.domain.dim(0).domain[1]) + 1
    dimensions = np.int64(schema.domain.dim(1).domain[1]) + 1
    return size, dimensions, schema.attr(0).dtype
```

However, the array's "current domain" (a new feature) should probably be used instead when it is set.
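A minimal sketch of the proposed fallback, kept independent of tiledb-py. `dim_extent` is a hypothetical helper, not part of TileDB-Vector-Search: it takes a dimension's full `(lo, hi)` domain and, if one is set, its current-domain range, and prefers the latter. How the current-domain range is read from the schema depends on the tiledb-py version and is deliberately left out here (assumption).

```python
from typing import Optional, Tuple

# Inclusive (lo, hi) bounds of a dimension, as in schema.domain.dim(i).domain
Range = Tuple[int, int]


def dim_extent(full_domain: Range, current_domain: Optional[Range] = None) -> int:
    """Hypothetical helper: extent of one dimension for ingestion.

    Prefers the current-domain range when it is set, otherwise falls back
    to the full domain (today's behavior). Like the existing code, it
    returns the inclusive upper bound plus one and ignores the lower bound.
    """
    _lo, hi = current_domain if current_domain is not None else full_domain
    return int(hi) + 1


# Illustrative values only: a huge full domain with a small current domain.
print(dim_extent((0, 127)))               # no current domain set -> 128
print(dim_extent((0, 2**62), (0, 99)))    # current domain wins -> 100
```

Falling back to the full domain keeps arrays created without a current domain working exactly as they do now.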

Alternatively, the `ingest` method could take an optional `dimensions` argument to use instead of the value detected by `read_source_metadata`, complementing the existing `size` argument.
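A rough sketch of how such an override might be resolved. `resolve_dimensions` is a hypothetical helper, not an existing TileDB-Vector-Search function. Rejecting a value larger than the detected one is a design choice assumed here: the schema domain cannot hold coordinates beyond it.

```python
from typing import Optional


def resolve_dimensions(detected: int, dimensions: Optional[int] = None) -> int:
    """Hypothetical helper: choose the vector dimension for ingestion.

    An explicit ``dimensions`` argument from the caller wins over the value
    detected from the array schema; otherwise the detected value is used.
    """
    if dimensions is None:
        return detected
    if dimensions <= 0:
        raise ValueError(f"dimensions must be positive, got {dimensions}")
    if dimensions > detected:
        # Coordinates beyond the schema's domain cannot exist in the array.
        raise ValueError(
            f"dimensions={dimensions} exceeds the array's domain ({detected})"
        )
    return dimensions


# Illustrative: a huge detected value overridden by the caller.
print(resolve_dimensions(2**63 - 1, 128))  # 128
```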


[sc-62701]


mlin commented Jan 28, 2025

Context: with the "new shape" feature in tiledbsoma 1.15, the dimension domains are set to roughly 2^63 and the "current domain" holds the desired shape. TileDB-Vector-Search therefore reads `dimensions` as roughly 2^63, which (for good reason!) causes problems downstream.
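For a sense of scale, a quick back-of-the-envelope check. It assumes float32 vectors and uses exactly 2^63 as an illustrative stand-in for "~2^63". A single vector of that length would need 2^65 bytes:

```python
# Illustrative arithmetic only: 2**63 stands in for "~2**63" dimensions,
# and float32 (4 bytes per element) is an assumed vector dtype.
dimensions = 2**63
bytes_per_float32 = 4

bytes_per_vector = dimensions * bytes_per_float32
eib = bytes_per_vector / 2**60  # exbibytes

print(bytes_per_vector == 2**65)  # True
print(eib)                        # 32.0
```

That is about 32 EiB per vector, so any code that sizes buffers from this value cannot proceed.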
