Image based QC prior to aggregate_profiles #215

kvshams · 2022-07-20T20:06:38Z

Is there any QC procedure that can be run before aggregating wells?

In my case, images with sparse (low cell density) regions create an artifact, because the cells in them grow larger and larger. I want to exclude those images from the aggregation step.

Alternatively, is there a way to convert the entire db into one data frame that includes all features and metadata? That would be more usable for QC: I could identify outliers, exclude them from the db, and then perform the downstream aggregation and analysis.

gwaybio · 2022-07-20T20:48:10Z

Pycytominer doesn't perform any QC at the moment.

You might consider looking into bioprofiling.jl. IIRC they have some QC ability.

You can also look into this paper, which proposes some QC ideas (not yet implemented in pycytominer, see rohban-lab/Image-based-cell-profiling-enhancement-via-data-cleaning-methods#1), including one which may be helpful for adjusting for cell density.

Pycytominer does have functionality to acquire full db (SQLite) here:

pycytominer/pycytominer/cyto_utils/cells.py

Line 25 in b4d32d3

class SingleCells(object):

kvshams · 2022-07-21T18:16:28Z

@gwaybio Thank you for pointing out this insightful method. This may be a naive request, but how do I create a single-cell data frame after loading the db?
sc = SingleCells('sqlite:///Data/database.sqlite') # by default this uses strata=['Metadata_Plate', 'Metadata_Well']
That is, how can I create a data frame containing the raw single-cell-level data for QC from the SQLite output created by ingest (I used the ingest function to combine data that was processed in parallel)?

gwaybio · 2022-07-21T23:34:47Z

We have a function inside the SingleCells class to merge single cells. See https://pycytominer.readthedocs.io/en/latest/pycytominer.cyto_utils.html#pycytominer.cyto_utils.cells.SingleCells.merge_single_cells

However, we recently recognized some memory issues in this function (see #195), which we're working to solve by moving away from SQLite to parquet (#213).

We'd welcome any insights and experience you have with this method.
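Given the memory concern above, one option for a density-based QC is to count cells per image directly in SQLite, without loading every single-cell row into a data frame. The following is only a minimal sketch using the standard-library `sqlite3` module, not pycytominer functionality. It assumes the database has a `Cells` table keyed by `TableNumber` and `ImageNumber`; that schema, the helper names `count_cells_per_image` and `flag_sparse_images`, and the `min_cells` threshold are assumptions for illustration, and the toy in-memory database stands in for a real ingest output.

```python
import sqlite3


def count_cells_per_image(conn):
    """Return {(TableNumber, ImageNumber): n_cells} via a GROUP BY in SQLite."""
    query = """
        SELECT TableNumber, ImageNumber, COUNT(*) AS n_cells
        FROM Cells
        GROUP BY TableNumber, ImageNumber
    """
    return {(table, image): n for table, image, n in conn.execute(query)}


def flag_sparse_images(counts, min_cells):
    """Return the sorted image keys whose cell count is below min_cells."""
    return sorted(key for key, n in counts.items() if n < min_cells)


# Toy database standing in for the ingest output (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Cells ("
    "TableNumber TEXT, ImageNumber INTEGER, ObjectNumber INTEGER, "
    "Cells_AreaShape_Area REAL)"
)
rows = [("t1", 1, k, 100.0) for k in range(1, 6)]   # image 1: five cells
rows += [("t1", 2, k, 400.0) for k in range(1, 3)]  # image 2: two large cells
conn.executemany("INSERT INTO Cells VALUES (?, ?, ?, ?)", rows)

counts = count_cells_per_image(conn)
sparse = flag_sparse_images(counts, min_cells=3)
```

Images with no segmented cells at all would not appear in `Cells`, so they would need to be found by comparing against the image-level table instead.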

gwaybio · 2022-08-17T16:52:11Z

@kvshams - I wanted to provide an update that the merge_single_cells() functionality is now working well. It now takes 15 minutes to merge whereas previously it was taking several hours.

This might help you to design methods for image QC prior to aggregating. Thanks!
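As a rough illustration of what such a QC step could look like once the single cells are in one data frame, the sketch below drops images with too few cells and only then aggregates per well. It is plain pandas on a toy data frame, not a pycytominer feature. The column names (`Metadata_TableNumber`, `Metadata_ImageNumber`, `Metadata_Plate`, `Metadata_Well`, `Cells_AreaShape_Area`), the helper `drop_sparse_images`, and the `min_cells` threshold are assumptions; check them against the columns your merged data frame actually contains.

```python
import pandas as pd


def drop_sparse_images(
    sc_df,
    min_cells,
    image_cols=("Metadata_TableNumber", "Metadata_ImageNumber"),
):
    """Keep only cells from images that contain at least min_cells cells."""
    keys = list(image_cols)
    # Per-row count of how many cells share the same image key.
    n_cells = sc_df.groupby(keys)[keys[0]].transform("size")
    return sc_df[n_cells >= min_cells].reset_index(drop=True)


# Toy stand-in for a merged single-cell data frame (assumed column names).
sc_df = pd.DataFrame(
    {
        "Metadata_Plate": ["p1"] * 6,
        "Metadata_Well": ["A01", "A01", "A01", "A01", "A02", "A02"],
        "Metadata_TableNumber": ["t1"] * 6,
        "Metadata_ImageNumber": [1, 1, 1, 2, 3, 3],
        "Cells_AreaShape_Area": [100.0, 110.0, 120.0, 500.0, 90.0, 110.0],
    }
)

# Image 2 holds a single, very large cell, so it is excluded.
filtered = drop_sparse_images(sc_df, min_cells=2)

# Median per well on the filtered cells, as a stand-in for the aggregation step.
profiles = (
    filtered.groupby(["Metadata_Plate", "Metadata_Well"])["Cells_AreaShape_Area"]
    .median()
    .reset_index()
)
```

Filtering before aggregating matters here because the lone large cell in image 2 would otherwise pull the A01 median upward.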

bunnech mentioned this issue Aug 16, 2022

Fix of .merge_single_cells() to Load Single-Cell Data into Dataframes #219

Merged

13 tasks
