Add uncertainty score in KNN label_transfer in PerturbationSpace #658

stefanpeidli · 2024-09-19T13:43:37Z

PR Checklist

Referenced issue is linked (NA; addresses reviewer comment)
If you've fixed a bug or added code that should be tested, add tests!
Documentation in docs is updated (NA)

Description of changes

Certainty is quantified as the fraction of nearest neighbors belonging to the classified (i.e. the most abundant) label compared to the total number of nearest neighbors. This is saved in a new specified column in adata.obs as a score between 0 and 1. Cells that had a label to begin with will get highest certainty assigned.
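The fraction described above can be illustrated for a single cell with a minimal sketch. The helper name `majority_certainty` and the toy neighborhood are assumptions made for illustration; this is not the code from the PR.

```python
import numpy as np

def majority_certainty(neighbor_labels):
    """Return (winning label, fraction of neighbors carrying it)."""
    labels, counts = np.unique(neighbor_labels, return_counts=True)
    winner = counts.argmax()
    return labels[winner], counts[winner] / counts.sum()

# Hypothetical neighborhood of one unlabeled cell: 5 nearest neighbors.
label, certainty = majority_certainty(np.array(["A", "A", "B", "A", "C"]))
print(label, certainty)  # A 0.6
```

Three of the five neighbors carry label A, so the cell is assigned A with a certainty of 0.6.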

Additional context

Certainty is quantified as the fraction of nearest neighbors belonging to the classified (i.e. the most abundant) label compared to the total number of nearest neighbors.

stefanpeidli · 2024-09-19T13:44:56Z

@Zethson Norman et al. analysis has to be updated with this. I can do that if needed, just LMK.

Zethson

Great, thank you!

I was wondering whether it's more useful to model a certainty score or an uncertainty score. Don't people usually look at the latter? They carry the same information, of course, but maybe reporting 1 - certainty is the way to go?

Also, I think that this is Occam's razor. But I think we can make a better case if we implement something slightly more sophisticated that actually isn't much more work. Issues are:

Dependence on n_neighbors: The certainty measure is highly dependent on the choice of n_neighbors. A small number might lead to overfitting, while a large number might smooth out important local structures.
Lack of distance weighting: All neighbors are treated equally, regardless of their distance in the embedding space. This might not accurately represent the data structure, especially in areas of varying density.
Binary treatment of labels: The method doesn't consider similarities between different labels. For example, if there are labels A, B, and C, and B is more similar to A than C is, this information is not used.

There are more issues, but those we can't easily tackle. I'm copying and pasting a solution by Claude here, not because we should adopt it verbatim but so that you can see how we could do this without a lot of work.

import numpy as np
from pynndescent import NNDescent
from sklearn.neighbors import KernelDensity

def improved_label_transfer(
    adata,
    column="perturbation",
    column_certainty_score="perturbation_transfer_certainty",
    target_val="unknown",
    max_neighbors=50,
    use_rep="X_umap",
    label_similarity_matrix=None
):
    """Improved imputation of missing values using adaptive KNN and weighted voting.
    
    Args:
        adata: AnnData object containing single-cell data.
        column: Column name in AnnData object to perform imputation on.
        column_certainty_score: Column name to store the certainty score.
        target_val: The target value to impute.
        max_neighbors: Maximum number of neighbors to consider.
        use_rep: Key in adata.obsm where the embedding is stored.
        label_similarity_matrix: Dictionary of label similarities. If None, treats all labels as equally dissimilar.
    """
    if use_rep not in adata.obsm:
        raise ValueError(f"Representation {use_rep} not found in the AnnData object.")
    
    embedding = adata.obsm[use_rep]
    
    # Estimate local density
    kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(embedding)
    local_density = np.exp(kde.score_samples(embedding))
    
    # Adaptive neighbor selection
    adaptive_n_neighbors = np.clip(np.int32(max_neighbors * (1 - local_density)), 5, max_neighbors)
    
    # Find neighbors
    nnd = NNDescent(embedding, n_neighbors=max_neighbors)
    indices, distances = nnd.query(embedding, k=max_neighbors)
    
    perturbations = np.array(adata.obs[column])
    certainty = np.ones(adata.n_obs)
    missing_mask = perturbations == target_val
    
    # Create label similarity matrix if not provided
    if label_similarity_matrix is None:
        unique_labels = np.unique(perturbations[~missing_mask])
        label_similarity_matrix = {l1: {l2: 1.0 if l1 == l2 else 0.0 for l2 in unique_labels} for l1 in unique_labels}
    
    # Vote only with the original labels so results do not depend on cell order
    known_labels = perturbations.copy()

    for idx in np.where(missing_mask)[0]:
        n_neighbors = adaptive_n_neighbors[idx]
        neighbor_indices = indices[idx][:n_neighbors]
        neighbor_distances = distances[idx][:n_neighbors]
        neighbor_categories = known_labels[neighbor_indices]

        # Distance-weighted voting
        weights = 1 / (neighbor_distances + 1e-8)  # Add small epsilon to avoid division by zero

        # Consider label similarities
        vote_dict = {}
        for cat, weight in zip(neighbor_categories, weights):
            # Unlabeled neighbors (including the query cell itself) have no
            # entry in the similarity matrix and cannot vote
            if cat == target_val:
                continue
            for possible_cat in label_similarity_matrix[cat]:
                similarity = label_similarity_matrix[cat][possible_cat]
                vote_dict[possible_cat] = vote_dict.get(possible_cat, 0) + weight * similarity

        # No labeled neighbor at all: leave the cell unlabeled with zero certainty
        if not vote_dict:
            certainty[idx] = 0
            continue

        most_common = max(vote_dict, key=vote_dict.get)
        perturbations[idx] = most_common

        # Calculate certainty based on weighted votes
        total_votes = sum(vote_dict.values())
        certainty[idx] = vote_dict[most_common] / total_votes if total_votes > 0 else 0

    adata.obs[column] = perturbations
    adata.obs[column_certainty_score] = certainty

# Example usage:
# improved_label_transfer(adata, label_similarity_matrix={'A': {'A': 1.0, 'B': 0.5, 'C': 0.1}, 'B': {'A': 0.5, 'B': 1.0, 'C': 0.3}, 'C': {'A': 0.1, 'B': 0.3, 'C': 1.0}})

What do you think? We can also keep your simple implementation as a flavor, but I think we should do this in a better way so that we can make a stronger argument in the manuscript.

pertpy/tools/_perturbation_space/_perturbation_space.py

Zethson · 2024-09-19T15:32:40Z

FAILED tests/tools/_perturbation_space/test_simple_perturbation_space.py::test_label_transfer - AttributeError: 'AnnData' object has no attribute 'loc'

The test failed by the way

stefanpeidli · 2024-09-20T09:01:48Z

Great suggestions @Zethson. I tried to keep the (un)certainty measure as simple as the label transfer method (which is too simplistic). Let's improve the label transfer itself then first?

My two cents:

Yes let's make this an uncertainty score instead.
If we use distance weighting (which I agree we should), we can increase the number of neighbors more safely so the choices for this parameter have less impact in general. I think this alone should suffice to improve the label transfer and uncertainty score.
Instead of label similarity (which uses global information that might not necessarily hold locally; I do not like this), I propose we use the entropy of the local class distribution for the uncertainty. If two classes are very similar (locally!), this case should still get a higher certainty score than a case where every class has the same number of neighbors to the query node. Is that understandable?
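The entropy argument can be checked with a small numeric sketch. The helper `label_entropy`, the natural logarithm, and the example vote weights are assumptions chosen for illustration.

```python
import numpy as np

def label_entropy(weights_per_class):
    """Shannon entropy (natural log) of a normalized class-weight vector."""
    p = np.asarray(weights_per_class, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # 0 * log(0) is treated as 0
    return float(-(p * np.log(p)).sum())

# Neighbors split between two locally similar classes out of four
two_way_split = label_entropy([5, 5, 0, 0])
# Neighbors spread evenly over all four classes
even_spread = label_entropy([2.5, 2.5, 2.5, 2.5])

print(two_way_split < even_spread)  # True
```

A query cell whose neighbors are split between two classes has lower entropy (log 2) than one whose neighbors are spread evenly over four classes (log 4), so it is scored as less uncertain without any global label similarity matrix.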


review-notebook-app · 2024-09-20T09:09:08Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

Zethson · 2024-09-20T11:49:43Z

Instead of label similarity (which uses global information that might not necessarily hold locally; I do not like this), I propose we use the entropy of the local class distribution for the uncertainty. If two classes are very similar (locally!), this case should still get a higher certainty score than a case where every class has the same number of neighbors to the query node. Is that understandable?

Yes! +1


stefanpeidli · 2024-09-25T11:53:00Z

I reworked the function, it's now much better I think. Key changes:

Now uses KNN graph in adata: saves cost and increases consistency e.g. with UMAPs generated from same KNN.
Now vectorized operations instead of expensive for loop, MUCH faster
Using distance weighting for KNN imputation --> more robust
Quantifies uncertainty as local KNN label entropy. Basically measures how sure the classifier is with its prediction. We do not account for global similarity between classes.
(tests passed at least locally :))))
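The key changes listed above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the code merged in this PR: the function name `transfer_labels`, the dense distance matrix, and the inverse-distance weights are all hypothetical choices. In an AnnData object the neighbor distances would typically be a sparse matrix (scanpy stores them in `adata.obsp["distances"]`).

```python
import numpy as np

def transfer_labels(distances, labels, unknown="unknown"):
    """Distance-weighted KNN label transfer with entropy as uncertainty.

    distances: (n, n) array, entry [i, j] > 0 if j is a neighbor of i, else 0.
    labels:    length-n array of strings; `unknown` marks cells to impute.
    """
    labels = np.asarray(labels)
    missing = labels == unknown
    classes = np.unique(labels[~missing])

    # One-hot encode known labels; unlabeled cells get an all-zero row and cannot vote.
    one_hot = (labels[:, None] == classes[None, :]).astype(float)

    # Inverse-distance weights on existing edges only.
    weights = np.zeros_like(distances, dtype=float)
    edges = distances > 0
    weights[edges] = 1.0 / distances[edges]

    # A single matrix product accumulates the weighted votes for all cells at once.
    votes = weights[missing] @ one_hot
    totals = votes.sum(axis=1, keepdims=True)
    probs = np.divide(votes, totals, out=np.zeros_like(votes), where=totals > 0)

    # Shannon entropy of each cell's local label distribution.
    logp = np.log(probs, out=np.zeros_like(probs), where=probs > 0)
    entropy = -(probs * logp).sum(axis=1)

    # Cells without any labeled neighbor stay unlabeled.
    imputed = labels.copy()
    imputed[missing] = np.where(totals[:, 0] > 0, classes[votes.argmax(axis=1)], unknown)
    uncertainty = np.zeros(len(labels))
    uncertainty[missing] = entropy
    return imputed, uncertainty

# Hypothetical 4-cell graph: cell 3 is unlabeled; cells 0 and 1 are close to it, cell 2 is far.
distances = np.array([
    [0.0, 1.0, 2.0, 0.5],
    [1.0, 0.0, 2.0, 0.5],
    [2.0, 2.0, 0.0, 4.0],
    [0.5, 0.5, 4.0, 0.0],
])
labels = np.array(["A", "A", "B", "unknown"])
imputed, uncertainty = transfer_labels(distances, labels)
print(imputed[3], uncertainty[3])
```

Cell 3 receives label A because its two close neighbors outweigh the distant B neighbor, and its uncertainty is small but nonzero. Cells that already had a label keep it and get an uncertainty of 0.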

Also, I changed the example from random labels to louvain with randomly dropped labels.

Here is an example result (figure not included here):

And somewhat hidden in the commits: I removed a faulty dependency in pre-commit config that was preventing me from using it. See commit message: "Replaces yanked dependency of mypy "types-pkg-resources" with "types-setuptools" as recommended: https://pypi.org/project/types-pkg-resources/".

Zethson · 2024-09-25T11:54:30Z

Thanks! I'll have a look soon. But always use Leiden instead of Louvain, please :)

stefanpeidli · 2024-09-25T11:56:20Z

I'm using the louvain clusters because they came with the preprocessed dataset. Otherwise I'd 100% agree with always using leiden!

Uncertainty can go above 1, as entropy is not bounded by 1.
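A short numeric check shows why: the entropy of a distribution over K labels is at most log K, which exceeds 1 as soon as K is large enough. The sketch below assumes the natural logarithm; the helper `uniform_entropy` and the division by log K are illustrative and not part of the PR.

```python
import numpy as np

def uniform_entropy(n_classes):
    """Entropy (natural log) of a uniform distribution over n_classes labels."""
    p = np.full(n_classes, 1.0 / n_classes)
    return float(-(p * np.log(p)).sum())

for k in (2, 3, 4):
    h = uniform_entropy(k)
    # Dividing by log(k) rescales the maximum to 1 (an optional normalization).
    print(k, round(h, 3), round(h / np.log(k), 3))
```

With the natural logarithm the maximum already exceeds 1 for three classes (about 1.099). A test that asserts an upper bound of 1 on the raw entropy would therefore be too strict.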

Zethson

Thank you very much! This is much better.

The test still fails. The only source of truth is the CI and "it works on my machine" is sadly not good enough ^_^

FAILED tests/tools/_perturbation_space/test_simple_perturbation_space.py::test_label_transfer - assert False
 +  where False = all(0     0.000000\n1     0.000000\n2     0.000000\n3     0.000000\n4     0.000000\n        ...   \n64    0.000000\n65    0.000000\n66    0.812785\n67    0.933854\n68    0.981642\nName: perturbation_transfer_uncertainty, Length: 69, dtype: float64 == 0)

Could you please always resolve my comments? Then I know what you've already done and what not.
I stopped commenting on those at some point, but you have many lines of code where you added a single comment above outlining what the following line of code does. This is not so useful and is noisy. Either the line of code is well written with clear variable names and it's obvious what it does, or it needs to be improved. Always document WHY you're doing certain things if it matters, but don't duplicate the "what".

pertpy/tools/_perturbation_space/_perturbation_space.py

For real this time :(

codecov-commenter · 2024-09-27T16:24:48Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.56%. Comparing base (99dcd18) to head (8660301).
Report is 4 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #658      +/-   ##
==========================================
+ Coverage   65.55%   65.56%   +0.01%     
==========================================
  Files          47       47              
  Lines        6105     6105              
==========================================
+ Hits         4002     4003       +1     
+ Misses       2103     2102       -1

Files with missing lines	Coverage Δ
...y/tools/_perturbation_space/_perturbation_space.py	`87.11% <100.00%> (+0.51%)`	⬆️

... and 1 file with indirect coverage changes

Zethson

Great! I think that the function description might benefit from a 2 sentence description on how the uncertainty is calculated.

But this is ready to go now!

Checks if error is correctly raised when no KNN is present in adata.


Add uncertainty score in KNN label_transfer in PerturbationSpace

77c72f5

Certainty is quantified as the fraction of nearest neighbors belonging to the classified (i.e. the most abundant) label compared to the total number of nearest neighbors.

stefanpeidli requested a review from Zethson September 19, 2024 13:43

github-actions bot added the enhancement New feature or request label Sep 19, 2024

stefanpeidli self-assigned this Sep 19, 2024

Zethson reviewed Sep 19, 2024


stefanpeidli added 2 commits September 20, 2024 11:08

Fix test_label_transfer in test_simple_perturbation_space

6e0df72

Merge branch 'feature/label_transfer_uncertainty' of github.com:theis…

0bf9186

…lab/pertpy into feature/label_transfer_uncertainty

stefanpeidli added 6 commits September 24, 2024 14:50

Update pre-commit-config.yaml

5cc16fa

Replaces yanked dependency of mypy "types-pkg-resources" with "types-setuptools" as recommended: https://pypi.org/project/types-pkg-resources/

Merge branch 'feature/label_transfer_uncertainty' of github.com:scver…

07f7488

…se/pertpy into feature/label_transfer_uncertainty

Fix assertion in test_label_transfer

3316fca

Remove dev notebooks

d81230d

Improve label imputation in PerturbationSpace class

5354f3f

Key changes: - Now uses KNN graph in adata: saves cost and increases consistency - Vectorized operations instead of expensive for loop - Distance weighting for KNN imputation - Quantifies uncertainty as local KNN label entropy

Update test_simple_perturbation_space.py

c4927eb

stefanpeidli requested a review from Zethson September 25, 2024 11:54

Fix test_label_transfer

f3fa186

Uncertainty can go above 1, as entropy is not bounded by 1.

Zethson requested changes Sep 26, 2024


stefanpeidli added 2 commits September 27, 2024 16:43

Small name changes in label_transfer

88096ce

Fix label transfer test

217154a

For real this time :(

Zethson approved these changes Sep 28, 2024


stefanpeidli added 2 commits September 30, 2024 17:19

Add uncertainty calculation description to label_transfer

4d1bf06

Improve code coverage by label_transfer tests

db24e81

Checks if error is correctly raised when no KNN is present in adata.

Fix missing call of umap calculation in test label transfer

8660301

stefanpeidli merged commit 98e2bdb into main Sep 30, 2024
5 checks passed

stefanpeidli deleted the feature/label_transfer_uncertainty branch February 7, 2025 12:30
