Describe the bug

I'm loading large h5ad files via `scanpy.read_h5ad` and then creating/appending via `tiledbsoma.io.from_anndata`. The relevant code:
```python
def append_to_database(db_uri: str, adata: sc.AnnData) -> None:
    """
    Append an AnnData object to the TileDB database.

    Args:
        db_uri: URI of the TileDB database
        adata: AnnData object to append
    """
    logging.info("Appending data to TileDB...")
    # Register AnnData objects
    rd = tiledbsoma.io.register_anndatas(
        db_uri,
        [adata],
        measurement_name="RNA",
        obs_field_name="obs_id",
        var_field_name="var_id",
    )
    # Apply resize
    with tiledbsoma.Experiment.open(db_uri) as exp:
        tiledbsoma.io.resize_experiment(
            exp.uri,
            nobs=rd.get_obs_shape(),
            nvars=rd.get_var_shapes(),
        )
    # Ingest new data into the db
    tiledbsoma.io.from_anndata(
        db_uri,
        adata,
        measurement_name="RNA",
        registration_mapping=rd,
    )

def create_tiledb(db_uri: str, adata: sc.AnnData) -> None:
    """
    Create a new tiledb database.

    Args:
        db_uri: URI of the TileDB database
        adata: AnnData object to append
    """
    logging.info("Creating new database...")
    tiledbsoma.io.from_anndata(
        db_uri,
        adata,
        measurement_name="RNA",
    )

def load_tiledb(h5ad_files: List[str], db_uri: str, batch_size: int = 8) -> None:
    for infile in h5ad_files:
        logging.info(f"Processing {infile}...")
        # load anndata object
        adata = sc.read_h5ad(infile)
        # add to database
        if not os.path.exists(db_uri):
            create_tiledb(db_uri, adata)
        else:
            append_to_database(db_uri, adata)
        # clear memory
        del adata
        gc.collect()
```
The error that occurs on the append:
```
[2025-02-03 19:29:47.413] [tiledbsoma] [Process: 848304] [Thread: 848304] [warning] [TileDB-SOMA::ManagedQuery] [unnamed] Invalid column selected: obs_id
Traceback (most recent call last):
  File "/home/nickyoungblut/dev/nextflow/scRecounter/./scripts/tiledb-loader-tahoe.py", line 190, in <module>
    #load_tiledb(h5ad_files, args.db_uri, batch_size=args.threads)
    ^^^^^^
  File "/home/nickyoungblut/dev/nextflow/scRecounter/./scripts/tiledb-loader-tahoe.py", line 184, in main
    #print(h5ad_files); exit();
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nickyoungblut/dev/nextflow/scRecounter/./scripts/tiledb-loader-tahoe.py", line 160, in load_tiledb
    append_to_database(db_uri, adata)
  File "/home/nickyoungblut/dev/nextflow/scRecounter/./scripts/tiledb-loader-tahoe.py", line 113, in append_to_database
    rd = tiledbsoma.io.register_anndatas(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nickyoungblut/miniforge3/envs/tiledb/lib/python3.12/site-packages/tiledbsoma/io/ingest.py", line 225, in register_anndatas
    return ExperimentAmbientLabelMapping.from_anndata_appends_on_experiment(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nickyoungblut/miniforge3/envs/tiledb/lib/python3.12/site-packages/tiledbsoma/io/_registration/ambient_label_mappings.py", line 419, in from_anndata_appends_on_experiment
    registration_data = cls._acquire_experiment_mappings(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nickyoungblut/miniforge3/envs/tiledb/lib/python3.12/site-packages/tiledbsoma/io/_registration/ambient_label_mappings.py", line 376, in _acquire_experiment_mappings
    registration_data = cls.from_isolated_soma_experiment(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nickyoungblut/miniforge3/envs/tiledb/lib/python3.12/site-packages/tiledbsoma/io/_registration/ambient_label_mappings.py", line 242, in from_isolated_soma_experiment
    obs_ids = [e.as_py() for e in batch[1]]
                                  ~~~~~^^^
  File "pyarrow/table.pxi", line 1693, in pyarrow.lib._Tabular.__getitem__
  File "pyarrow/table.pxi", line 1779, in pyarrow.lib._Tabular.column
  File "pyarrow/table.pxi", line 5175, in pyarrow.lib.Table._column
  File "pyarrow/array.pxi", line 598, in pyarrow.lib._normalize_index
IndexError: index out of bounds
```
The error appears to occur because the `obs_id` and/or `var_id` columns do not exist in these h5ad files. However, when I added:

... I just get a seg-fault during the first append (i.e., the 2nd h5ad file) after the initial creation of the database from the first h5ad file. I'm using 512 GB of memory, so a lack of memory should not be the issue.
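For reference, adding join-id columns amounts to materializing the index, since `adata.obs` and `adata.var` are pandas DataFrames indexed by cell/gene IDs. A minimal sketch with stand-in DataFrames (names are illustrative):

```python
import pandas as pd

# Stand-ins for adata.obs / adata.var, which are indexed by
# cell barcodes and gene IDs respectively
obs = pd.DataFrame(index=["cell_1", "cell_2"])
var = pd.DataFrame(index=["gene_1", "gene_2"])

# Materialize the index as explicit columns so register_anndatas can
# find the fields named by obs_field_name / var_field_name
obs["obs_id"] = obs.index
var["var_id"] = var.index
```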
To Reproduce
Provide a code example and any sample input data (e.g. an H5AD) as an attachment to reproduce this behavior.
Versions (please complete the following information):
TileDB-SOMA version: 1.15.4
Language and language version (e.g. Python 3.9, R 4.3.2): Python 3.12.8
OS (e.g. MacOS, Ubuntu Linux): Linux
Note: you can use `tiledbsoma.show_package_versions()` (Python) or `tiledbsoma::show_package_versions()` (R)