passing your own cluster labels: VIA performance with different clustering methods. #29
Hi Adi, thanks for bringing this up. Yes, for the paper we did test VIA with separate cluster labels and it works fine, though it is usually best to use a fairly granular (not too coarse) clustering. The current pip version of VIA doesn't yet allow a different clustering, but I can easily add that for you if you give me a day to make sure it runs without any glitches. We would effectively just bypass the PARC clustering stage and use your own clusters. Alternatively, while you wait for me to work on this, you can pass your own cluster labels in the true_label parameter and let VIA run its built-in PARC clustering. Shobi
Looking forward to this feature!
Hi Shobi, yes sure, let me know when you fix it; it would be a great feature to have. As a suggestion, one enhancement would be to make both PARC and VIA single-cell agnostic. To give you context, I also work on patient electronic health records, where one analyses baseline patient characteristics (like a snapshot of single-cell readouts) as well as longitudinal data. There aren't many methods that allow this, except for ClinTrajan (ref: https://academic.oup.com/gigascience/article/9/11/giaa128/6006352). I think making your methods independent of single cells would be pretty great. I have tried running PARC on the PCs of patient data and it runs well. You could test this using the two open datasets in the ClinTrajan paper above and see how your methods do. I am happy to jump on a call to discuss this further if you wish. It could result in a nice publication as well. I work at Novartis and am reachable at [email protected]. Hope this helps,
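For non-single-cell tabular data such as baseline patient characteristics, the input to PARC/VIA would be a samples x features matrix of principal components, just as with single-cell PCs. Below is a minimal numpy sketch of that preprocessing step. It assumes a complete numeric matrix (no missing values, categorical variables already encoded) and that z-scoring every variable suits your data. `principal_components` is a hypothetical helper, not part of PARC or VIA.

```python
import numpy as np

def principal_components(X, n_components=10):
    """Hypothetical helper: top principal-component scores of a samples x features matrix.

    Assumes X is complete and numeric. Each column is z-scored first so that
    variables on different scales contribute comparably.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
    Z = (X - mean) / std
    # Thin SVD of the standardized matrix; rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    return Z @ Vt[:k].T  # project samples onto the first k axes

# Illustrative random data standing in for 200 patients x 25 baseline variables
rng = np.random.default_rng(0)
patients = rng.normal(size=(200, 25))
pcs = principal_components(patients, n_components=10)
print(pcs.shape)  # (200, 10)
```

Z-scoring is a judgment call. It stops large-scale variables (e.g. counts or ages) from dominating the components, but if your variables already share a common scale you may prefer to only center them.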
Hi @barveaditya Adi, thank you for sharing the paper. Let me read through it, and yes, of course I'm happy to discuss further! What I meant by passing your precomputed clustering into the "true_label" parameter is that, in the VIA graph, milestone and other plots, you will be able to see the composition of your clusters within each of the VIA clusters in the cluster-graph plot. As you said, for exploratory data the true labels are often just a "best guess annotation", based on the DEGs of a particular clustering output, that gives some indication of the cell types in the dataset. Shobi
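To look at the same comparison as numbers rather than in the cluster-graph plot, a cross-tabulation of VIA cluster versus true label is enough. Here is a rough numpy sketch. It assumes you already have one VIA cluster id per cell from your run (how that is exposed depends on your pyVIA version) plus the label array you passed as `true_label`. `cluster_composition` is a hypothetical helper.

```python
import numpy as np

def cluster_composition(via_clusters, true_labels):
    """Hypothetical helper: share of each true label inside each VIA cluster.

    Returns (cluster_ids, label_ids, table), where table[i, j] is the fraction
    of cells in cluster_ids[i] that carry label_ids[j]. Each row sums to 1.
    """
    via_clusters = np.asarray(via_clusters)
    true_labels = np.asarray(true_labels)
    # Map both label sets to 0-based codes so they can index a count matrix
    cluster_ids, c_idx = np.unique(via_clusters, return_inverse=True)
    label_ids, l_idx = np.unique(true_labels, return_inverse=True)
    counts = np.zeros((len(cluster_ids), len(label_ids)))
    np.add.at(counts, (c_idx, l_idx), 1)  # count each (cluster, label) pair
    table = counts / counts.sum(axis=1, keepdims=True)
    return cluster_ids, label_ids, table

# Toy example: two VIA clusters, annotations "T" and "B"
cluster_ids, label_ids, table = cluster_composition([0, 0, 0, 1, 1], ["T", "T", "B", "B", "B"])
print(label_ids)  # ['B' 'T']
print(table)
```

Rows dominated by one label suggest that the VIA cluster matches your annotation. Rows split across several labels point to places where the two clusterings disagree.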
Hi @ShobiStassen,
TypeError Traceback (most recent call last)
~/programs/miniconda3/envs/py37/lib/python3.7/site-packages/pyVIA/core.py in run_VIA(self)
~/programs/miniconda3/envs/py37/lib/python3.7/site-packages/pyVIA/core.py in run_subPARC(self)
~/programs/miniconda3/envs/py37/lib/python3.7/site-packages/pyVIA/core.py in project_branch_probability_sc(self, bp_array_clus, pt)
TypeError: only integer scalar arrays can be converted to a scalar index
Does this link help:
This might be caused by indexing a Python list with an array of indices: a list accepts only a single integer (or a slice) as an index, so that lookup fails. You can convert the list to a numpy array and pass the array to labels.
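A small numpy-only reproduction of the error in the traceback: a Python list accepts a single integer index but not an array of indices, and the attempt raises a TypeError with the same message. `labels_list` and `idx` are illustrative stand-ins, not pyVIA internals.

```python
import numpy as np

labels_list = [3, 1, 2, 1]   # cluster labels held in a plain Python list
idx = np.array([0, 2])       # an array of positions, e.g. the cells of one cluster

error_message = None
try:
    labels_list[idx]          # lists cannot be indexed by an array of positions
except TypeError as err:
    error_message = str(err)
print("TypeError:", error_message)

labels_arr = np.asarray(labels_list)  # convert once, before passing the labels on
print(labels_arr[idx])                # [3 2]
```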
Converting the list to a numpy array solved the problem. This time I tried a set of k-means labels and VIA ran successfully, but when I used other labels VIA still failed. I think VIA has some requirements on the labels.
2022-12-21 13:56:11.494170 Running VIA over input data of 564 (samples) x 30 (features)
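One possible explanation, not confirmed in this thread: k-means returns integer labels numbered 0..K-1, whereas other clusterings may produce strings, categories, or integer ids with gaps. If VIA expects integer-coded labels (an assumption here; check the docstring of your pyVIA version), re-coding them to 0..K-1 is a cheap thing to try. The labels also need exactly one entry per sample, 564 in the run above. `to_integer_labels` is a hypothetical helper.

```python
import numpy as np

def to_integer_labels(labels):
    """Hypothetical helper: map arbitrary labels (strings, gapped ints) to 0..K-1.

    Returns (codes, categories), where categories[codes[i]] is the original label.
    """
    categories, codes = np.unique(np.asarray(labels), return_inverse=True)
    return codes.astype(int), categories

codes, categories = to_integer_labels(["T cell", "B cell", "T cell", "NK"])
print(codes)       # [2 0 2 1]
print(categories)  # ['B cell' 'NK' 'T cell']

gapped, _ = to_integer_labels([5, 9, 5, 12])  # integer ids with gaps
print(gapped)      # [0 1 0 2]
```

Keeping `categories` lets you translate VIA's output back to your original cluster names afterwards.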
I have an AnnData object, and I want to subset some of its clusters to do trajectory inference.
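A minimal sketch of the subsetting step. It assumes the PCs live in `adata.obsm["X_pca"]` and the clusters in `adata.obs["leiden"]` (the usual Scanpy keys; substitute your own). It works on plain numpy arrays, and the same boolean mask can subset the AnnData object itself with `adata[mask].copy()`. `subset_by_clusters` is a hypothetical helper.

```python
import numpy as np

def subset_by_clusters(X, labels, keep):
    """Hypothetical helper: keep the rows of X whose cluster label is in `keep`.

    In an AnnData workflow X might be adata.obsm["X_pca"] and labels
    adata.obs["leiden"].to_numpy() (assumed key names).
    Returns (X_sub, sub_labels, mask); sub_labels is re-coded to 0..K-1.
    """
    labels = np.asarray(labels)
    mask = np.isin(labels, list(keep))
    X_sub = np.asarray(X)[mask]
    # Re-code surviving labels so no empty cluster ids remain after subsetting
    _, sub_labels = np.unique(labels[mask], return_inverse=True)
    return X_sub, sub_labels, mask

# Toy stand-ins: 6 cells with 2 PCs each and string cluster ids
X = np.arange(12, dtype=float).reshape(6, 2)
labels = ["0", "1", "2", "1", "0", "2"]
X_sub, sub_labels, mask = subset_by_clusters(X, labels, keep={"1", "2"})
print(X_sub.shape)  # (4, 2)
print(sub_labels)   # [0 1 0 1]
```

Reusing the global PCs or recomputing PCA on the subset is a design choice. Recomputing lets the components focus on variation within the selected clusters.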
Hi Shobi,
Excellent work here! I have a question rather than an issue. I want to pass cluster labels from a separate clustering; how do I do that? This is also referenced in the article's supplementary material (Supplementary Note 6: VIA performance with different clustering methods).
I am using VIA for exploratory purposes on entirely new kinds of cell types, so I do not know much about the population. I would like to understand it first and then pass that on to VIA. Is this documented anywhere, or is there an example?
Thank you
Adi