Add uncertainty score in KNN label_transfer in PerturbationSpace #658
Certainty is quantified as the fraction of nearest neighbors belonging to the classified (i.e. the most abundant) label compared to the total number of nearest neighbors.
@Zethson The Norman et al. analysis has to be updated with this. I can do that if needed, just let me know.
Great, thank you!
I was wondering whether it's more useful to report a certainty score or an uncertainty score. Don't people usually look at the latter? They carry the same information, of course, but maybe reporting 1 - certainty is the way to go?
Also, I see that this follows Occam's razor. But I think we can make a better case if we implement something slightly more sophisticated that actually isn't much more work. The issues are:
- Dependence on n_neighbors: The certainty measure is highly dependent on the choice of n_neighbors. A small number might lead to overfitting, while a large number might smooth out important local structures.
- Lack of distance weighting: All neighbors are treated equally, regardless of their distance in the embedding space. This might not accurately represent the data structure, especially in areas of varying density.
- Binary treatment of labels: The method doesn't consider similarities between different labels. For example, if there are labels A, B, and C, and B is more similar to A than C is, this information is not used.
There are more issues, but those we can't easily tackle. I'm copying and pasting a solution by Claude here, not because we should adopt it verbatim but so that you can see how we could potentially do this without a lot of work.
import numpy as np
from pynndescent import NNDescent
from sklearn.neighbors import KernelDensity


def improved_label_transfer(
    adata,
    column="perturbation",
    column_certainty_score="perturbation_transfer_certainty",
    target_val="unknown",
    max_neighbors=50,
    use_rep="X_umap",
    label_similarity_matrix=None,
):
    """Improved imputation of missing values using adaptive KNN and weighted voting.

    Args:
        adata: AnnData object containing single-cell data.
        column: Column name in AnnData object to perform imputation on.
        column_certainty_score: Column name to store the certainty score.
        target_val: The target value to impute.
        max_neighbors: Maximum number of neighbors to consider.
        use_rep: Key in adata.obsm where the embedding is stored.
        label_similarity_matrix: Dictionary of label similarities. If None, treats all labels as equally dissimilar.
    """
    if use_rep not in adata.obsm:
        raise ValueError(f"Representation {use_rep} not found in the AnnData object.")
    embedding = adata.obsm[use_rep]

    # Estimate local density; rescale to [0, 1] because a KDE density is not bounded by 1
    kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(embedding)
    local_density = np.exp(kde.score_samples(embedding))
    local_density = local_density / local_density.max()

    # Adaptive neighbor selection: fewer neighbors in dense regions, but at least 5
    adaptive_n_neighbors = np.clip((max_neighbors * (1 - local_density)).astype(int), 5, max_neighbors)

    nnd = NNDescent(embedding, n_neighbors=max_neighbors)
    indices, distances = nnd.query(embedding, k=max_neighbors)

    # Vote on the original labels so that imputed labels do not influence later cells
    original = np.array(adata.obs[column], dtype=object)
    perturbations = original.copy()
    certainty = np.ones(adata.n_obs)
    missing_mask = original == target_val

    # Default similarity: every label is similar only to itself
    if label_similarity_matrix is None:
        unique_labels = np.unique(original[~missing_mask])
        label_similarity_matrix = {l1: {l2: 1.0 if l1 == l2 else 0.0 for l2 in unique_labels} for l1 in unique_labels}

    for idx in np.where(missing_mask)[0]:
        n_neighbors = adaptive_n_neighbors[idx]
        neighbor_indices = indices[idx][:n_neighbors]
        neighbor_distances = distances[idx][:n_neighbors]

        # Unlabeled neighbors (including the cell itself) cannot vote
        labeled = ~missing_mask[neighbor_indices]
        neighbor_categories = original[neighbor_indices][labeled]

        # Distance-weighted voting; the small epsilon avoids division by zero
        weights = 1 / (neighbor_distances[labeled] + 1e-8)

        # Each neighbor also votes for labels similar to its own
        vote_dict = {}
        for cat, weight in zip(neighbor_categories, weights):
            for possible_cat, similarity in label_similarity_matrix[cat].items():
                vote_dict[possible_cat] = vote_dict.get(possible_cat, 0) + weight * similarity

        total_votes = sum(vote_dict.values())
        if total_votes == 0:
            # No labeled neighbor: keep the cell unlabeled with zero certainty
            certainty[idx] = 0
            continue

        most_common = max(vote_dict, key=vote_dict.get)
        perturbations[idx] = most_common
        # Certainty is the winning label's share of the weighted votes
        certainty[idx] = vote_dict[most_common] / total_votes

    adata.obs[column] = perturbations
    adata.obs[column_certainty_score] = certainty


# Example usage:
# improved_label_transfer(
#     adata,
#     label_similarity_matrix={
#         "A": {"A": 1.0, "B": 0.5, "C": 0.1},
#         "B": {"A": 0.5, "B": 1.0, "C": 0.3},
#         "C": {"A": 0.1, "B": 0.3, "C": 1.0},
#     },
# )
What do you think? We can also keep your simple implementation as a flavor, but I think we should do this in a better way so that we can make a stronger argument in the manuscript.
The test failed, by the way.
Great suggestions @Zethson. I tried to keep the (un)certainty measure as simple as the label transfer method (which is too simplistic). Let's improve the label transfer itself first, then? My two cents:
…lab/pertpy into feature/label_transfer_uncertainty
Yes! +1
Replaces yanked dependency of mypy "types-pkg-resources" with "types-setuptools" as recommended: https://pypi.org/project/types-pkg-resources/
…se/pertpy into feature/label_transfer_uncertainty
Key changes:
- Now uses KNN graph in adata: saves cost and increases consistency
- Vectorized operations instead of expensive for loop
- Distance weighting for KNN imputation
- Quantifies uncertainty as local KNN label entropy
I reworked the function; I think it's much better now. The key changes are the ones listed in the commit message above.
Also, I changed the example from random labels to Louvain clusters with randomly dropped labels. And somewhat hidden in the commits: I removed a faulty dependency from the pre-commit config that was preventing me from using it. See the commit message: "Replaces yanked dependency of mypy "types-pkg-resources" with "types-setuptools" as recommended: https://pypi.org/project/types-pkg-resources/".
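To make the key changes concrete, here is a minimal sketch of vectorized, distance-weighted label transfer with an entropy-based uncertainty. It is not the code from this PR: the function name `transfer_labels`, its signature, and the toy data are hypothetical, and a small dense NumPy array stands in for the sparse KNN distance graph that would normally be stored in adata.

```python
import numpy as np


def transfer_labels(distances, labels, target_val="unknown"):
    """Impute `target_val` entries of `labels` from a KNN distance graph.

    distances: (n_cells, n_cells) array; nonzero entries are neighbor distances.
    labels: 1-D sequence of strings.
    Returns (imputed_labels, uncertainty). Hypothetical helper for illustration.
    """
    labels = np.asarray(labels).astype(str)
    missing = labels == target_val

    # One-hot encode known labels; unlabeled cells keep an all-zero row and cannot vote
    categories, known_codes = np.unique(labels[~missing], return_inverse=True)
    one_hot = np.zeros((labels.size, categories.size))
    one_hot[np.flatnonzero(~missing), known_codes] = 1.0

    # Inverse-distance weights on existing edges only (0 means "no edge")
    weights = np.zeros(distances.shape, dtype=float)
    edge = distances > 0
    weights[edge] = 1.0 / distances[edge]

    # One matrix product sums the weighted votes of all neighbors for every cell
    votes = weights @ one_hot
    totals = votes.sum(axis=1, keepdims=True)
    probs = np.divide(votes, totals, out=np.zeros_like(votes), where=totals > 0)

    # Shannon entropy (natural log) of each cell's local label distribution
    safe = np.where(probs > 0, probs, 1.0)
    entropy = -(probs * np.log(safe)).sum(axis=1)

    imputed = labels.copy()
    can_impute = missing & (totals[:, 0] > 0)
    imputed[can_impute] = categories[votes[can_impute].argmax(axis=1)]

    # Originally labeled cells get zero uncertainty; cells without any
    # labeled neighbor stay unlabeled and get NaN
    uncertainty = np.where(missing, entropy, 0.0)
    uncertainty[missing & ~can_impute] = np.nan
    return imputed, uncertainty


# Toy graph: cell 3 is unlabeled; its neighbors are two "A" cells and one closer "B" cell
D = np.zeros((4, 4))
D[3, 0], D[3, 1], D[3, 2] = 1.0, 2.0, 0.5
D = D + D.T
imputed, uncertainty = transfer_labels(D, ["A", "A", "B", "unknown"])
print(imputed, uncertainty)
```

In this toy example the single "B" neighbor outvotes the two "A" neighbors because it is closer, which is exactly the behavior that distance weighting adds over a plain majority vote.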
Thanks! I'll have a look soon. But always use Leiden instead of Louvain, please :)
I'm using the Louvain clusters because they came with the preprocessed dataset. Otherwise I'd 100% agree with always using Leiden!
Uncertainty can go above 1, as entropy is not bounded by 1.
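A small sketch of why this happens, assuming the entropy is computed with the natural logarithm: for k equally likely labels it equals log(k), which exceeds 1 as soon as k is 3 or more. The helper `label_entropy` and its `normalize` option are hypothetical; dividing by log(k) is one possible way to bound the score by 1 and is not necessarily what this PR does.

```python
import numpy as np


def label_entropy(probs, normalize=False):
    """Shannon entropy (natural log) of a label distribution."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))
    if normalize and probs.size > 1:
        # Dividing by the maximum possible entropy log(k) bounds the result by 1
        entropy /= np.log(probs.size)
    return entropy


uniform = np.full(4, 0.25)
raw = label_entropy(uniform)  # equals log(4), about 1.386, i.e. above 1
scaled = label_entropy(uniform, normalize=True)  # 1 up to floating-point error
print(raw, scaled)
```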
Thank you very much! This is much better.
- The test still fails. The only source of truth is the CI and "it works on my machine" is sadly not good enough ^_^
FAILED tests/tools/_perturbation_space/test_simple_perturbation_space.py::test_label_transfer - assert False
+ where False = all(0 0.000000\n1 0.000000\n2 0.000000\n3 0.000000\n4 0.000000\n ... \n64 0.000000\n65 0.000000\n66 0.812785\n67 0.933854\n68 0.981642\nName: perturbation_transfer_uncertainty, Length: 69, dtype: float64 == 0)
- Could you please always resolve my comments? Then I know what you've already done and what not.
- I stopped commenting on those at some point, but there are many lines of code where you added a single comment above that only describes what the following line does. This is not very useful and adds noise. Either the line of code is well written with clear variable names and it's obvious what it does, or it needs to be improved. Always document WHY you're doing certain things if it matters, but don't duplicate the "what".
For real this time :(
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #658      +/-   ##
==========================================
+ Coverage   65.55%   65.56%   +0.01%
==========================================
  Files          47       47
  Lines        6105     6105
==========================================
+ Hits         4002     4003       +1
+ Misses       2103     2102       -1
Great! I think the function docstring would benefit from a two-sentence description of how the uncertainty is calculated.
But this is ready to go now!
Checks if error is correctly raised when no KNN is present in adata.
* Set legend anchor as parameter (#660) * Fix missing space * Remove explicit anndata in dependencies (#666) * Incorporate use case tutorials (#665) * Fixed DEG layer retrieval * Use-case tutorial icons * Restructure tutorial page * Subgroup tutorials * Improve KNN label_transfer in PerturbationSpace (#658) * Add uncertainty score in KNN label_transfer in PerturbationSpace Certainty is quantified as the fraction of nearest neighbors belonging to the classified (i.e. the most abundant) label compared to the total number of nearest neighbors. * Update pre-commit-config.yaml Replaces yanked dependency of mypy "types-pkg-resources" with "types-setuptools" as recommended: https://pypi.org/project/types-pkg-resources/ * Improve label imputation in PerturbationSpace class Key changes: - Now uses KNN graph in adata: saves cost and increases consistency - Vectorized operations instead of expensive for loop - Distance weighting for KNN imputation - Quantifies uncertainty as local KNN label entropy * Fixed plotting for mixscape.plot_barplot and sccoda.plot_effects_barplot (#667) * Augur scsim warnings (#670) * Augur scsim warnings Signed-off-by: zethson <[email protected]> * Submodules Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> * Add PerturbationDataValidator (#672) * Augur scsim warnings Signed-off-by: zethson <[email protected]> * Submodules Signed-off-by: zethson <[email protected]> * Add super draft of pertpy validator Signed-off-by: zethson <[email protected]> * Polish Signed-off-by: zethson <[email protected]> * Polish Signed-off-by: zethson <[email protected]> * Nested try Signed-off-by: zethson <[email protected]> * validator in test Signed-off-by: zethson <[email protected]> * try uv for rtd Signed-off-by: zethson <[email protected]> * rtd uv Signed-off-by: zethson <[email protected]> * rtd uv Signed-off-by: zethson <[email protected]> * rtd uv fix Signed-off-by: zethson <[email protected]> * mb sphinx fix for 
validator Signed-off-by: zethson <[email protected]> * docs Signed-off-by: zethson <[email protected]> * remove PerturbationValidator from docs Signed-off-by: zethson <[email protected]> * remove PerturbationValidator from docs Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> * Latest OS for RTD * Remove curator again Signed-off-by: zethson <[email protected]> * Fix jax random array (#686) * Fix jax random array Signed-off-by: zethson <[email protected]> * Fix further jax warnings Signed-off-by: zethson <[email protected]> * Fix edger Signed-off-by: zethson <[email protected]> * Fix choice Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> * Switch to formulaic-contrasts (#682) * Switch to formulaic-contrasts * Cleanup * removing design matrix workaround (#691) Co-authored-by: Emma Dann <[email protected]> * Fix PyDESeq2 * Update tests * fix typo in gitignore * Remove contrast dataclass, which isnt used anywhere * Fix edgeR rpy2 tests (#692) * fix broken rpy2 edger tests * updated edger tests * Fix tests (scipy) Signed-off-by: zethson <[email protected]> * submodule Signed-off-by: zethson <[email protected]> * Remove unused code Signed-off-by: zethson <[email protected]> * type hints Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> Co-authored-by: Emma Dann <[email protected]> Co-authored-by: Emma Dann <[email protected]> Co-authored-by: zethson <[email protected]> * Release 0.9.5 Signed-off-by: zethson <[email protected]> * Prepare 0.10.0 Signed-off-by: zethson <[email protected]> * Added Mixscape seeds and test (#683) Co-authored-by: Lukas Heumos <[email protected]> * Fix probability data type (#696) Signed-off-by: Lukas Heumos <[email protected]> * Optimize MeanVarDistributionDistance (#697) * Fix probability data type Signed-off-by: Lukas Heumos <[email protected]> * Optimize mean_var distance Signed-off-by: Lukas Heumos 
<[email protected]> --------- Signed-off-by: Lukas Heumos <[email protected]> * Optimize test speed (#699) * Try buildjet Signed-off-by: Lukas Heumos <[email protected]> * Try buildjet large Signed-off-by: Lukas Heumos <[email protected]> * speed up predict_differential_prioritization Signed-off-by: Lukas Heumos <[email protected]> * speed up tests Signed-off-by: Lukas Heumos <[email protected]> --------- Signed-off-by: Lukas Heumos <[email protected]> * Lower bound for scikit-learn (#701) Signed-off-by: Lukas Heumos <[email protected]> * Fix type annotation Signed-off-by: Lukas Heumos <[email protected]> * Fix empty figure returns when show=True in plotting functions (#703) * Removed show parameter * Adapt plotting API for Augur, Coda, Dialogue * Adapted plotting API for Milo, Mixscape, scgen * Add joblib * Remove joblib --------- Co-authored-by: Lukas Heumos <[email protected]> * Fix scikit-learn intendation Signed-off-by: Lukas Heumos <[email protected]> --------- Signed-off-by: zethson <[email protected]> Signed-off-by: Lukas Heumos <[email protected]> Co-authored-by: Lilly May <[email protected]> Co-authored-by: Lukas Heumos <[email protected]> Co-authored-by: Gregor Sturm <[email protected]> Co-authored-by: Emma Dann <[email protected]> Co-authored-by: Emma Dann <[email protected]>
* Implement mixture models for guide assignment Key additions: - Added a base abstract class "MixtureModel" with numpyro - Added a first mixture model "Poisson_Gauss_Mixture" - New function "assign_mixture_model" in GuideAssignment class * Merge main into branch (#705) * Set legend anchor as parameter (#660) * Fix missing space * Remove explicit anndata in dependencies (#666) * Incorporate use case tutorials (#665) * Fixed DEG layer retrieval * Use-case tutorial icons * Restructure tutorial page * Subgroup tutorials * Improve KNN label_transfer in PerturbationSpace (#658) * Add uncertainty score in KNN label_transfer in PerturbationSpace Certainty is quantified as the fraction of nearest neighbors belonging to the classified (i.e. the most abundant) label compared to the total number of nearest neighbors. * Update pre-commit-config.yaml Replaces yanked dependency of mypy "types-pkg-resources" with "types-setuptools" as recommended: https://pypi.org/project/types-pkg-resources/ * Improve label imputation in PerturbationSpace class Key changes: - Now uses KNN graph in adata: saves cost and increases consistency - Vectorized operations instead of expensive for loop - Distance weighting for KNN imputation - Quantifies uncertainty as local KNN label entropy * Fixed plotting for mixscape.plot_barplot and sccoda.plot_effects_barplot (#667) * Augur scsim warnings (#670) * Augur scsim warnings Signed-off-by: zethson <[email protected]> * Submodules Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> * Add PerturbationDataValidator (#672) * Augur scsim warnings Signed-off-by: zethson <[email protected]> * Submodules Signed-off-by: zethson <[email protected]> * Add super draft of pertpy validator Signed-off-by: zethson <[email protected]> * Polish Signed-off-by: zethson <[email protected]> * Polish Signed-off-by: zethson <[email protected]> * Nested try Signed-off-by: zethson <[email protected]> * validator in test Signed-off-by: 
zethson <[email protected]> * try uv for rtd Signed-off-by: zethson <[email protected]> * rtd uv Signed-off-by: zethson <[email protected]> * rtd uv Signed-off-by: zethson <[email protected]> * rtd uv fix Signed-off-by: zethson <[email protected]> * mb sphinx fix for validator Signed-off-by: zethson <[email protected]> * docs Signed-off-by: zethson <[email protected]> * remove PerturbationValidator from docs Signed-off-by: zethson <[email protected]> * remove PerturbationValidator from docs Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> * Latest OS for RTD * Remove curator again Signed-off-by: zethson <[email protected]> * Fix jax random array (#686) * Fix jax random array Signed-off-by: zethson <[email protected]> * Fix further jax warnings Signed-off-by: zethson <[email protected]> * Fix edger Signed-off-by: zethson <[email protected]> * Fix choice Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> * Switch to formulaic-contrasts (#682) * Switch to formulaic-contrasts * Cleanup * removing design matrix workaround (#691) Co-authored-by: Emma Dann <[email protected]> * Fix PyDESeq2 * Update tests * fix typo in gitignore * Remove contrast dataclass, which isnt used anywhere * Fix edgeR rpy2 tests (#692) * fix broken rpy2 edger tests * updated edger tests * Fix tests (scipy) Signed-off-by: zethson <[email protected]> * submodule Signed-off-by: zethson <[email protected]> * Remove unused code Signed-off-by: zethson <[email protected]> * type hints Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> Co-authored-by: Emma Dann <[email protected]> Co-authored-by: Emma Dann <[email protected]> Co-authored-by: zethson <[email protected]> * Release 0.9.5 Signed-off-by: zethson <[email protected]> * Prepare 0.10.0 Signed-off-by: zethson <[email protected]> * Added Mixscape seeds and test (#683) Co-authored-by: Lukas Heumos <[email 
protected]> * Fix probability data type (#696) Signed-off-by: Lukas Heumos <[email protected]> * Optimize MeanVarDistributionDistance (#697) * Fix probability data type Signed-off-by: Lukas Heumos <[email protected]> * Optimize mean_var distance Signed-off-by: Lukas Heumos <[email protected]> --------- Signed-off-by: Lukas Heumos <[email protected]> * Optimize test speed (#699) * Try buildjet Signed-off-by: Lukas Heumos <[email protected]> * Try buildjet large Signed-off-by: Lukas Heumos <[email protected]> * speed up predict_differential_prioritization Signed-off-by: Lukas Heumos <[email protected]> * speed up tests Signed-off-by: Lukas Heumos <[email protected]> --------- Signed-off-by: Lukas Heumos <[email protected]> * Lower bound for scikit-learn (#701) Signed-off-by: Lukas Heumos <[email protected]> * Fix type annotation Signed-off-by: Lukas Heumos <[email protected]> * Fix empty figure returns when show=True in plotting functions (#703) * Removed show parameter * Adapt plotting API for Augur, Coda, Dialogue * Adapted plotting API for Milo, Mixscape, scgen * Add joblib * Remove joblib --------- Co-authored-by: Lukas Heumos <[email protected]> * Fix scikit-learn intendation Signed-off-by: Lukas Heumos <[email protected]> --------- Signed-off-by: zethson <[email protected]> Signed-off-by: Lukas Heumos <[email protected]> Co-authored-by: Lilly May <[email protected]> Co-authored-by: Lukas Heumos <[email protected]> Co-authored-by: Gregor Sturm <[email protected]> Co-authored-by: Emma Dann <[email protected]> Co-authored-by: Emma Dann <[email protected]> * Merge main into branch (#706) * Set legend anchor as parameter (#660) * Fix missing space * Remove explicit anndata in dependencies (#666) * Incorporate use case tutorials (#665) * Fixed DEG layer retrieval * Use-case tutorial icons * Restructure tutorial page * Subgroup tutorials * Improve KNN label_transfer in PerturbationSpace (#658) * Add uncertainty score in KNN label_transfer in PerturbationSpace 
Certainty is quantified as the fraction of nearest neighbors belonging to the classified (i.e. the most abundant) label compared to the total number of nearest neighbors. * Update pre-commit-config.yaml Replaces yanked dependency of mypy "types-pkg-resources" with "types-setuptools" as recommended: https://pypi.org/project/types-pkg-resources/ * Improve label imputation in PerturbationSpace class Key changes: - Now uses KNN graph in adata: saves cost and increases consistency - Vectorized operations instead of expensive for loop - Distance weighting for KNN imputation - Quantifies uncertainty as local KNN label entropy * Fixed plotting for mixscape.plot_barplot and sccoda.plot_effects_barplot (#667) * Augur scsim warnings (#670) * Augur scsim warnings Signed-off-by: zethson <[email protected]> * Submodules Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> * Add PerturbationDataValidator (#672) * Augur scsim warnings Signed-off-by: zethson <[email protected]> * Submodules Signed-off-by: zethson <[email protected]> * Add super draft of pertpy validator Signed-off-by: zethson <[email protected]> * Polish Signed-off-by: zethson <[email protected]> * Polish Signed-off-by: zethson <[email protected]> * Nested try Signed-off-by: zethson <[email protected]> * validator in test Signed-off-by: zethson <[email protected]> * try uv for rtd Signed-off-by: zethson <[email protected]> * rtd uv Signed-off-by: zethson <[email protected]> * rtd uv Signed-off-by: zethson <[email protected]> * rtd uv fix Signed-off-by: zethson <[email protected]> * mb sphinx fix for validator Signed-off-by: zethson <[email protected]> * docs Signed-off-by: zethson <[email protected]> * remove PerturbationValidator from docs Signed-off-by: zethson <[email protected]> * remove PerturbationValidator from docs Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> * Latest OS for RTD * Remove curator again 
Signed-off-by: zethson <[email protected]> * Fix jax random array (#686) * Fix jax random array Signed-off-by: zethson <[email protected]> * Fix further jax warnings Signed-off-by: zethson <[email protected]> * Fix edger Signed-off-by: zethson <[email protected]> * Fix choice Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> * Switch to formulaic-contrasts (#682) * Switch to formulaic-contrasts * Cleanup * removing design matrix workaround (#691) Co-authored-by: Emma Dann <[email protected]> * Fix PyDESeq2 * Update tests * fix typo in gitignore * Remove contrast dataclass, which isnt used anywhere * Fix edgeR rpy2 tests (#692) * fix broken rpy2 edger tests * updated edger tests * Fix tests (scipy) Signed-off-by: zethson <[email protected]> * submodule Signed-off-by: zethson <[email protected]> * Remove unused code Signed-off-by: zethson <[email protected]> * type hints Signed-off-by: zethson <[email protected]> --------- Signed-off-by: zethson <[email protected]> Co-authored-by: Emma Dann <[email protected]> Co-authored-by: Emma Dann <[email protected]> Co-authored-by: zethson <[email protected]> * Release 0.9.5 Signed-off-by: zethson <[email protected]> * Prepare 0.10.0 Signed-off-by: zethson <[email protected]> * Added Mixscape seeds and test (#683) Co-authored-by: Lukas Heumos <[email protected]> * Fix probability data type (#696) Signed-off-by: Lukas Heumos <[email protected]> * Optimize MeanVarDistributionDistance (#697) * Fix probability data type Signed-off-by: Lukas Heumos <[email protected]> * Optimize mean_var distance Signed-off-by: Lukas Heumos <[email protected]> --------- Signed-off-by: Lukas Heumos <[email protected]> * Optimize test speed (#699) * Try buildjet Signed-off-by: Lukas Heumos <[email protected]> * Try buildjet large Signed-off-by: Lukas Heumos <[email protected]> * speed up predict_differential_prioritization Signed-off-by: Lukas Heumos <[email protected]> * speed up tests Signed-off-by: 
Lukas Heumos <[email protected]> --------- Signed-off-by: Lukas Heumos <[email protected]> * Lower bound for scikit-learn (#701) Signed-off-by: Lukas Heumos <[email protected]> * Fix type annotation Signed-off-by: Lukas Heumos <[email protected]> * Fix empty figure returns when show=True in plotting functions (#703) * Removed show parameter * Adapt plotting API for Augur, Coda, Dialogue * Adapted plotting API for Milo, Mixscape, scgen * Add joblib * Remove joblib --------- Co-authored-by: Lukas Heumos <[email protected]> * Fix scikit-learn intendation Signed-off-by: Lukas Heumos <[email protected]> --------- Signed-off-by: zethson <[email protected]> Signed-off-by: Lukas Heumos <[email protected]> Co-authored-by: Lilly May <[email protected]> Co-authored-by: Lukas Heumos <[email protected]> Co-authored-by: Gregor Sturm <[email protected]> Co-authored-by: Emma Dann <[email protected]> Co-authored-by: Emma Dann <[email protected]> * Refactor guide assignment logic and enhance mixture model parameters * Cleanup MixtureModel class * Enhance guide assignment validation and error handling in GuideAssignment class * Update dev nb * Add test for grna_mixture_model * Remove dev nb * Update notebook for guide assignment * Update guide assignment notebooks * Apply suggestions from code review Review comments by @Zethson Co-authored-by: Lukas Heumos <[email protected]> * Improve code to fit review suggestions - Added lots of type hints and return types - Improved naming of variables - Added and removed a few comments - Added user warnings if a guide is not expressed at all * Fix sloppy data dimensions for numpyro Previously data was (N,1) dim. Now applying ravel, and changed numpyro plates accordingly for correct batching. 
* Update test_grna_assignment.py We changed "Negative" to "negative" :) * Polish Signed-off-by: Lukas Heumos <[email protected]> * Polish Signed-off-by: Lukas Heumos <[email protected]> --------- Signed-off-by: zethson <[email protected]> Signed-off-by: Lukas Heumos <[email protected]> Co-authored-by: Lilly May <[email protected]> Co-authored-by: Lukas Heumos <[email protected]> Co-authored-by: Gregor Sturm <[email protected]> Co-authored-by: Emma Dann <[email protected]> Co-authored-by: Emma Dann <[email protected]>
PR Checklist
`docs` is updated (NA)
Description of changes
Certainty is quantified as the fraction of nearest neighbors that carry the assigned (i.e. the most abundant) label, out of the total number of nearest neighbors. It is saved in a new, user-specified column in adata.obs as a score between 0 and 1. Cells that had a label to begin with are assigned the highest certainty.
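As a minimal illustration of this definition (a sketch only, with a hypothetical helper name and toy labels, not the code in this PR), the certainty for a single cell can be computed from the labels of its nearest neighbors:

```python
import numpy as np


def majority_label_certainty(neighbor_labels):
    """Return the most abundant neighbor label and the fraction of neighbors carrying it."""
    labels, counts = np.unique(neighbor_labels, return_counts=True)
    winner = np.argmax(counts)
    return labels[winner], counts[winner] / counts.sum()


# Hypothetical neighborhood of one unlabeled cell (5 nearest neighbors)
label, certainty = majority_label_certainty(np.array(["A", "A", "B", "A", "C"]))
print(label, certainty)  # A 0.6
```

Three of the five neighbors carry the label "A", so the cell is assigned "A" with a certainty of 0.6; a cell that already had a label would simply keep it with a certainty of 1.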
Additional context