cell-level doublet status for Baysor segmentation #40

Open
j-bac opened this issue Jan 23, 2025 · 2 comments
Comments

@j-bac

j-bac commented Jan 23, 2025

The current tutorial for Xenium seems to return "doublets" as x,y pixel locations, independent of the provided cell ids. How were the VSI scores computed for Baysor in the paper? Is any cell containing at least one "pixel doublet" considered a doublet?

Many thanks

@sebastiantiesmeyer
Collaborator

Thank you for your comment! When known segments or cell locations are available, the most straightforward approach is to sample the signal integrity map at these specific locations and test whether the integrity value falls below a designated "doublet" threshold. This can often be done without explicitly generating a doublet_df.
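
As a rough illustration of sampling the map at known cell locations, here is a minimal sketch on toy data. The table `cells_df`, its column names, and the threshold value are assumptions made up for the example, not part of the package; the `[x, y]` index order follows the filtering snippet in this thread and may need swapping if your map is stored as `[y, x]`.

```python
import numpy as np
import pandas as pd

# Toy integrity map; in practice this would be your computed signal integrity array.
signal_integrity = np.ones((100, 100))
signal_integrity[40:60, 40:60] = 0.2  # simulated low-integrity region

# Hypothetical per-cell table with centroid coordinates in pixel units.
cells_df = pd.DataFrame({
    "cell_id": ["a", "b", "c"],
    "x_pixel": [10.2, 50.7, 80.0],
    "y_pixel": [12.9, 49.1, 45.5],
})

doublet_threshold = 0.5  # assumed cutoff; tune for your data

# Round centroids to the nearest pixel and clip them to the map bounds.
x = np.clip(np.rint(cells_df["x_pixel"].to_numpy()).astype(int),
            0, signal_integrity.shape[0] - 1)
y = np.clip(np.rint(cells_df["y_pixel"].to_numpy()).astype(int),
            0, signal_integrity.shape[1] - 1)

# Index order [x, y] is an assumption; swap to [y, x] if that is how your map is laid out.
cells_df["integrity"] = signal_integrity[x, y]
cells_df["is_doublet"] = cells_df["integrity"] < doublet_threshold
```

Only cells whose sampled pixel falls below the threshold are flagged, so the result depends on a single pixel per cell.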

In the Baysor data provided to us, each cell had been assigned a centroid location, which we used to sample the signal integrity map and obtain a cell-wise integrity score (the point of that analysis was, after all, to recover large-scale signal artefacts from tissue folds).
In other cases, computing the mean integrity value over each cell's area and discarding cells below a threshold of 0.5 has been promising in our latest analyses. Another approach is to clean your raw data based on the integrity map, e.g. by doing

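integrity_filtering_threshold = 0.5
# Look up the integrity value at each transcript's pixel, then keep transcripts above the threshold
pixel_integrity = signal_integrity[coordinate_df.x_pixel.to_numpy(), coordinate_df.y_pixel.to_numpy()]
coordinate_df_cleaned = coordinate_df[pixel_integrity > integrity_filtering_threshold]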

which removes all transcripts at pixels whose integrity does not exceed the threshold.
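
The per-cell mean mentioned above could be sketched along the same lines. This is only an approximation under stated assumptions: it averages the integrity at each cell's transcript positions (so it is weighted by transcript density rather than a true mean over the cell area), and the `cell` column name and toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy integrity map with one simulated low-integrity half.
signal_integrity = np.ones((50, 50))
signal_integrity[:, 25:] = 0.1

# Hypothetical transcript table: pixel coordinates plus a segmentation cell id.
coordinate_df = pd.DataFrame({
    "x_pixel": [5, 6, 7, 20, 21, 22, 30, 31],
    "y_pixel": [5, 6, 7, 24, 30, 31, 40, 41],
    "cell": ["c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3"],
})

# Integrity value at each transcript's pixel ([x, y] order is an assumption).
coordinate_df["integrity"] = signal_integrity[
    coordinate_df["x_pixel"].to_numpy(), coordinate_df["y_pixel"].to_numpy()
]

# Average over each cell's transcripts as a proxy for the mean over the cell area.
cell_integrity = coordinate_df.groupby("cell")["integrity"].mean()

integrity_filtering_threshold = 0.5
low_integrity_cells = cell_integrity.index[cell_integrity < integrity_filtering_threshold]

# Drop every transcript that belongs to a low-integrity cell.
coordinate_df_cleaned = coordinate_df[~coordinate_df["cell"].isin(low_integrity_cells)]
```

If segmentation masks or polygons are available, averaging the map over the mask pixels would be closer to a true area mean.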
I think the proper approach very much depends on the nature of your data set, your effect sizes and the amount of noise/artefacts that's acceptable for your analysis. For example, if you discover large regions of low vertical signal integrity in your data but still have a fairly dense signal, it might help to perform the analysis on individual Z-stack levels or virtual horizontal subslices. This can reduce the need for data exclusion, but may result in lower signal density and potentially less statistical power for each analysis.
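
One way to form such virtual horizontal subslices is to bin transcripts by their z coordinate and analyse each bin separately. The sketch below assumes a transcript table with a `z` column and an arbitrary choice of three equal-width slabs; both are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical transcript table with 3D coordinates.
coordinate_df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "z": [0.5, 1.5, 3.2, 4.8, 7.9, 9.0],
    "gene": ["A", "B", "A", "C", "B", "A"],
})

n_slabs = 3  # assumed number of subslices; fewer slabs keep more signal per slab

# Equal-width z bins spanning the observed z range.
edges = np.linspace(coordinate_df["z"].min(), coordinate_df["z"].max(), n_slabs + 1)
coordinate_df["z_slab"] = pd.cut(
    coordinate_df["z"], bins=edges, labels=False, include_lowest=True
)

# Each slab can then be passed to the downstream analysis on its own.
slab_sizes = {slab_id: len(slab_df) for slab_id, slab_df in coordinate_df.groupby("z_slab")}
```

Checking `slab_sizes` before going further gives a quick sense of how much signal density each slab retains.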
I hope this helps & good luck with your project!

@j-bac
Author

j-bac commented Jan 27, 2025

Thanks a lot, that's very helpful!

If I understand correctly, in the paper you only had Baysor centroid locations, and thus each cell's integrity is based on this one pixel. But in general, you'd recommend taking the mean integrity over each cell's area as a doublet score?
