Add uncertainty score in KNN label_transfer in PerturbationSpace #658

Merged: 15 commits, Sep 30, 2024

stefanpeidli
Collaborator

PR Checklist

  • Referenced issue is linked (NA; addresses reviewer comment)
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated (NA)

Description of changes

Certainty is quantified as the fraction of a cell's nearest neighbors that carry the assigned (i.e. the most abundant) label. It is saved in a user-specified column in adata.obs as a score between 0 and 1. Cells that already had a label are assigned the highest certainty (1).
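
As a minimal sketch of the score described above (not the code from this PR; the function name and the toy labels are hypothetical), the certainty of a transferred label can be computed as the share of neighbors voting for the winning label:

```python
from collections import Counter


def majority_label_with_certainty(neighbor_labels):
    """Return the most abundant neighbor label and its share of all neighbors."""
    counts = Counter(neighbor_labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(neighbor_labels)


# Toy example: 3 of 5 neighbors carry label "A"
label, certainty = majority_label_with_certainty(["A", "A", "B", "A", "C"])
print(label, certainty)
```

With this definition, a cell whose neighbors all agree gets a certainty of 1, and the score drops as the neighborhood becomes more mixed.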

Additional context

image
image

@github-actions github-actions bot added the enhancement label Sep 19, 2024
@stefanpeidli stefanpeidli self-assigned this Sep 19, 2024
@stefanpeidli
Collaborator Author

@Zethson The Norman et al. analysis has to be updated with this. I can do that if needed; just let me know.

Member

@Zethson Zethson left a comment

Great, thank you!

I was wondering whether it's more useful to model a certainty score or an uncertainty score. Don't people usually look at the latter? They carry the same information, of course, but maybe reporting 1 - certainty is the way to go?

Also, I see that this follows Occam's razor. But I think we can make a better case if we implement something slightly more sophisticated that isn't actually much more work. The issues are:

  1. Dependence on n_neighbors: The certainty measure is highly dependent on the choice of n_neighbors. A small number might lead to overfitting, while a large number might smooth out important local structures.
  2. Lack of distance weighting: All neighbors are treated equally, regardless of their distance in the embedding space. This might not accurately represent the data structure, especially in areas of varying density.
  3. Binary treatment of labels: The method doesn't consider similarities between different labels. For example, if there are labels A, B, and C, and B is more similar to A than C is, this information is not used.

There are more issues, but those we can't easily tackle. I'm copying and pasting a solution by Claude here, not because we should adopt it verbatim, but so that you can see how we could potentially do this without a lot of work.

import numpy as np
from pynndescent import NNDescent
from sklearn.neighbors import KernelDensity

def improved_label_transfer(
    adata,
    column="perturbation",
    column_certainty_score="perturbation_transfer_certainty",
    target_val="unknown",
    max_neighbors=50,
    use_rep="X_umap",
    label_similarity_matrix=None
):
    """Improved imputation of missing values using adaptive KNN and weighted voting.
    
    Args:
        adata: AnnData object containing single-cell data.
        column: Column name in AnnData object to perform imputation on.
        column_certainty_score: Column name to store the certainty score.
        target_val: The target value to impute.
        max_neighbors: Maximum number of neighbors to consider.
        use_rep: Key in adata.obsm where the embedding is stored.
        label_similarity_matrix: Dictionary of label similarities. If None, treats all labels as equally dissimilar.
    """
    if use_rep not in adata.obsm:
        raise ValueError(f"Representation {use_rep} not found in the AnnData object.")
    
    embedding = adata.obsm[use_rep]
    
    # Estimate local density
    kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(embedding)
    local_density = np.exp(kde.score_samples(embedding))
    
    # Adaptive neighbor selection
    adaptive_n_neighbors = np.clip(np.int32(max_neighbors * (1 - local_density)), 5, max_neighbors)
    
    # Find neighbors
    nnd = NNDescent(embedding, n_neighbors=max_neighbors)
    indices, distances = nnd.query(embedding, k=max_neighbors)
    
    perturbations = np.array(adata.obs[column])
    certainty = np.ones(adata.n_obs)
    missing_mask = perturbations == target_val
    
    # Create label similarity matrix if not provided
    if label_similarity_matrix is None:
        unique_labels = np.unique(perturbations[~missing_mask])
        label_similarity_matrix = {l1: {l2: 1.0 if l1 == l2 else 0.0 for l2 in unique_labels} for l1 in unique_labels}
    
    # Vote on the original labels only, so results do not depend on iteration order
    imputed = perturbations.copy()
    for idx in np.where(missing_mask)[0]:
        n_neighbors = adaptive_n_neighbors[idx]
        neighbor_indices = indices[idx][:n_neighbors]
        neighbor_distances = distances[idx][:n_neighbors]
        neighbor_categories = perturbations[neighbor_indices]
        
        # Closer neighbors get larger weights; the epsilon avoids division by zero
        weights = 1 / (neighbor_distances + 1e-8)
        
        # Accumulate similarity-weighted votes, skipping unlabeled neighbors (including the cell itself)
        vote_dict = {}
        for cat, weight in zip(neighbor_categories, weights):
            if cat not in label_similarity_matrix:
                continue
            for possible_cat, similarity in label_similarity_matrix[cat].items():
                vote_dict[possible_cat] = vote_dict.get(possible_cat, 0) + weight * similarity
        
        total_votes = sum(vote_dict.values())
        if total_votes == 0:
            # No labeled neighbor: leave the cell unlabeled with zero certainty
            certainty[idx] = 0
            continue
        
        most_common = max(vote_dict, key=vote_dict.get)
        imputed[idx] = most_common
        certainty[idx] = vote_dict[most_common] / total_votes

    adata.obs[column] = imputed
    adata.obs[column_certainty_score] = certainty

# Example usage:
# improved_label_transfer(adata, label_similarity_matrix={'A': {'A': 1.0, 'B': 0.5, 'C': 0.1}, 'B': {'A': 0.5, 'B': 1.0, 'C': 0.3}, 'C': {'A': 0.1, 'B': 0.3, 'C': 1.0}})

What do you think? We can also keep your simple implementation as a flavor, but I think a more robust approach would let us make a stronger argument in the manuscript.

@Zethson
Member

Zethson commented Sep 19, 2024

FAILED tests/tools/_perturbation_space/test_simple_perturbation_space.py::test_label_transfer - AttributeError: 'AnnData' object has no attribute 'loc'

The test failed, by the way.

@stefanpeidli
Collaborator Author

Great suggestions @Zethson. I tried to keep the (un)certainty measure as simple as the label transfer method (which is too simplistic). Let's improve the label transfer itself then first?

My two cents:

  • Yes let's make this an uncertainty score instead.
  • If we use distance weighting (which I agree we should), we can increase the number of neighbors more safely so the choices for this parameter have less impact in general. I think this alone should suffice to improve the label transfer and uncertainty score.
  • Instead of label similarity (which uses global information that might not necessarily hold locally; I do not like this), I propose we use the entropy of the local class distribution for the uncertainty. If two classes are very similar (locally!), this case should still get a higher certainty score than a case where every class has the same number of neighbors to the query node. Is that understandable?
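
To illustrate the entropy proposal, here is a minimal sketch (hypothetical function name and toy neighborhoods, not the PR's implementation) that computes the Shannon entropy of the label distribution among a cell's neighbors. A neighborhood split between two classes yields a lower entropy, i.e. less uncertainty, than one split evenly across four classes:

```python
import math
from collections import Counter


def label_entropy(neighbor_labels):
    """Shannon entropy (natural log) of the label distribution among neighbors."""
    counts = Counter(neighbor_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


pure = ["A"] * 8                        # all neighbors agree
two_way_tie = ["A"] * 4 + ["B"] * 4     # two locally similar classes
four_way_tie = ["A", "B", "C", "D"] * 2  # every class equally represented

for name, labels in [("pure", pure), ("two-way", two_way_tie), ("four-way", four_way_tie)]:
    print(name, round(label_entropy(labels), 3))
```

Unlike the majority fraction, the entropy uses the whole local distribution, so it distinguishes a tie between two classes from a tie among many.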

@Zethson
Member

Zethson commented Sep 20, 2024

Instead of label similarity (which uses global information that might not necessarily hold locally; I do not like this), I propose we use the entropy of the local class distribution for the uncertainty. If two classes are very similar (locally!), this case should still get a higher certainty score than a case where every class has the same number of neighbors to the query node. Is that understandable?

Yes! +1

Replaces yanked dependency of mypy "types-pkg-resources" with "types-setuptools" as recommended: https://pypi.org/project/types-pkg-resources/
…se/pertpy into feature/label_transfer_uncertainty
Key changes:
- Now uses KNN graph in adata: saves cost and increases consistency
- Vectorized operations instead of expensive for loop
- Distance weighting for KNN imputation
- Quantifies uncertainty as local KNN label entropy
@stefanpeidli
Collaborator Author

I reworked the function; I think it's now much better. Key changes:

  • Now uses the KNN graph in adata: saves cost and increases consistency, e.g. with UMAPs generated from the same KNN graph.
  • Now uses vectorized operations instead of an expensive for loop, which is MUCH faster.
  • Uses distance weighting for KNN imputation --> more robust.
  • Quantifies uncertainty as local KNN label entropy. This basically measures how sure the classifier is about its prediction. We do not account for global similarity between classes.
  • (tests passed at least locally :))))
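
As a rough sketch of what distance weighting changes (the inverse-distance weight and all names here are assumptions for illustration; the thread does not state which weighting function the PR uses), each neighbor's vote can be scaled by its closeness and the votes normalized into label probabilities:

```python
from collections import defaultdict


def weighted_vote(neighbor_labels, neighbor_distances, eps=1e-8):
    """Return label probabilities from inverse-distance weighted votes."""
    votes = defaultdict(float)
    for label, dist in zip(neighbor_labels, neighbor_distances):
        # Assumed weighting: closer neighbors count more; eps avoids division by zero
        votes[label] += 1.0 / (dist + eps)
    total = sum(votes.values())
    return {label: v / total for label, v in votes.items()}


# One very close "A" neighbor outweighs two distant "B" neighbors
probs = weighted_vote(["A", "B", "B"], [0.1, 1.0, 1.0])
predicted = max(probs, key=probs.get)
print(predicted, probs)
```

An unweighted majority vote would pick "B" here. Because distant neighbors contribute little, the result is less sensitive to the exact number of neighbors, which is the robustness argument made above.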

Also, I changed the example from random labels to louvain with randomly dropped labels.
image

Here is an example result:
image

And somewhat hidden in the commits: I removed a faulty dependency in pre-commit config that was preventing me from using it. See commit message: "Replaces yanked dependency of mypy "types-pkg-resources" with "types-setuptools" as recommended: https://pypi.org/project/types-pkg-resources/".

@Zethson
Member

Zethson commented Sep 25, 2024

Thanks! I'll have a look soon. But always use Leiden instead of Louvain, please :)

@stefanpeidli
Collaborator Author

I'm using the louvain clusters because they came with the preprocessed dataset. Otherwise I'd 100% agree with always using leiden!

Uncertainty can go above 1, as entropy is not bounded by 1.
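
A short sketch of why the entropy-based score is not bounded by 1 (helper names are hypothetical; whether the merged code normalizes the entropy is not stated in this thread): with natural logarithms, a uniform distribution over K labels has entropy ln K, which exceeds 1 for K >= 3. Dividing by ln K is one possible way to map the score to [0, 1]:

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability labels contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def normalized_entropy(probs):
    """Entropy divided by its maximum ln(K), so the result lies in [0, 1]."""
    k = len(probs)
    return entropy(probs) / math.log(k) if k > 1 else 0.0


uniform3 = [1 / 3, 1 / 3, 1 / 3]
print(entropy(uniform3))             # ln(3), which is greater than 1
print(normalized_entropy(uniform3))  # approximately 1.0
```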
Member

@Zethson Zethson left a comment

Thank you very much! This is much better.

  1. The test still fails. The only source of truth is the CI and "it works on my machine" is sadly not good enough ^_^
     FAILED tests/tools/_perturbation_space/test_simple_perturbation_space.py::test_label_transfer - assert False
      +  where False = all(0     0.000000\n1     0.000000\n2     0.000000\n3     0.000000\n4     0.000000\n        ...   \n64    0.000000\n65    0.000000\n66    0.812785\n67    0.933854\n68    0.981642\nName: perturbation_transfer_uncertainty, Length: 69, dtype: float64 == 0)
  2. Could you please always resolve my comments? Then I know what you've already done and what not.
  3. I stopped commenting on those at some point, but you have many lines of code where you added a single comment above outlining what the following line of code does. This is not so useful and is noisy. Either the line of code is well written with clear variable names and it's obvious what it does, or it needs to be improved. Always document WHY you're doing certain things if it matters, but don't duplicate the "what".

@codecov-commenter

codecov-commenter commented Sep 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.56%. Comparing base (99dcd18) to head (8660301).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #658      +/-   ##
==========================================
+ Coverage   65.55%   65.56%   +0.01%     
==========================================
  Files          47       47              
  Lines        6105     6105              
==========================================
+ Hits         4002     4003       +1     
+ Misses       2103     2102       -1     
Files with missing lines Coverage Δ
...y/tools/_perturbation_space/_perturbation_space.py 87.11% <100.00%> (+0.51%) ⬆️

... and 1 file with indirect coverage changes

Member

@Zethson Zethson left a comment

Great! I think that the function description might benefit from a two-sentence description of how the uncertainty is calculated.

But this is ready to go now!

@stefanpeidli stefanpeidli merged commit 98e2bdb into main Sep 30, 2024
5 checks passed
@stefanpeidli stefanpeidli deleted the feature/label_transfer_uncertainty branch February 7, 2025 12:30
stefanpeidli added a commit that referenced this pull request Feb 7, 2025
* Set legend anchor as parameter (#660)

* Fix missing space

* Remove explicit anndata in dependencies (#666)

* Incorporate use case tutorials (#665)

* Fixed DEG layer retrieval

* Use-case tutorial icons

* Restructure tutorial page

* Subgroup tutorials

* Improve KNN label_transfer in PerturbationSpace (#658)

* Add uncertainty score in KNN label_transfer in PerturbationSpace
Certainty is quantified as the fraction of nearest neighbors belonging to the classified (i.e. the most abundant) label compared to the total number of nearest neighbors.

* Update pre-commit-config.yaml
Replaces yanked dependency of mypy "types-pkg-resources" with "types-setuptools" as recommended: https://pypi.org/project/types-pkg-resources/

* Improve label imputation in PerturbationSpace class
Key changes:
- Now uses KNN graph in adata: saves cost and increases consistency
- Vectorized operations instead of expensive for loop
- Distance weighting for KNN imputation
- Quantifies uncertainty as local KNN label entropy

* Fixed plotting for mixscape.plot_barplot and sccoda.plot_effects_barplot (#667)

* Augur scsim warnings (#670)

* Augur scsim warnings

Signed-off-by: zethson <[email protected]>

* Submodules

Signed-off-by: zethson <[email protected]>

---------

Signed-off-by: zethson <[email protected]>

* Add PerturbationDataValidator (#672)

* Augur scsim warnings

Signed-off-by: zethson <[email protected]>

* Submodules

Signed-off-by: zethson <[email protected]>

* Add super draft of pertpy validator

Signed-off-by: zethson <[email protected]>

* Polish

Signed-off-by: zethson <[email protected]>

* Polish

Signed-off-by: zethson <[email protected]>

* Nested try

Signed-off-by: zethson <[email protected]>

* validator in test

Signed-off-by: zethson <[email protected]>

* try uv for rtd

Signed-off-by: zethson <[email protected]>

* rtd uv

Signed-off-by: zethson <[email protected]>

* rtd uv

Signed-off-by: zethson <[email protected]>

* rtd uv fix

Signed-off-by: zethson <[email protected]>

* mb sphinx fix for validator

Signed-off-by: zethson <[email protected]>

* docs

Signed-off-by: zethson <[email protected]>

* remove PerturbationValidator from docs

Signed-off-by: zethson <[email protected]>

* remove PerturbationValidator from docs

Signed-off-by: zethson <[email protected]>

---------

Signed-off-by: zethson <[email protected]>

* Latest OS for RTD

* Remove curator again

Signed-off-by: zethson <[email protected]>

* Fix jax random array (#686)

* Fix jax random array

Signed-off-by: zethson <[email protected]>

* Fix further jax warnings

Signed-off-by: zethson <[email protected]>

* Fix edger

Signed-off-by: zethson <[email protected]>

* Fix choice

Signed-off-by: zethson <[email protected]>

---------

Signed-off-by: zethson <[email protected]>

* Switch to formulaic-contrasts (#682)

* Switch to formulaic-contrasts

* Cleanup

* removing design matrix workaround (#691)

Co-authored-by: Emma Dann <[email protected]>

* Fix PyDESeq2

* Update tests

* fix typo in gitignore

* Remove contrast dataclass, which isnt used anywhere

* Fix edgeR rpy2 tests (#692)

* fix broken rpy2 edger tests

* updated edger tests

* Fix tests (scipy)

Signed-off-by: zethson <[email protected]>

* submodule

Signed-off-by: zethson <[email protected]>

* Remove unused code

Signed-off-by: zethson <[email protected]>

* type hints

Signed-off-by: zethson <[email protected]>

---------

Signed-off-by: zethson <[email protected]>
Co-authored-by: Emma Dann <[email protected]>
Co-authored-by: Emma Dann <[email protected]>
Co-authored-by: zethson <[email protected]>

* Release 0.9.5

Signed-off-by: zethson <[email protected]>

* Prepare 0.10.0

Signed-off-by: zethson <[email protected]>

* Added Mixscape seeds and test (#683)

Co-authored-by: Lukas Heumos <[email protected]>

* Fix probability data type (#696)

Signed-off-by: Lukas Heumos <[email protected]>

* Optimize MeanVarDistributionDistance (#697)

* Fix probability data type

Signed-off-by: Lukas Heumos <[email protected]>

* Optimize mean_var distance

Signed-off-by: Lukas Heumos <[email protected]>

---------

Signed-off-by: Lukas Heumos <[email protected]>

* Optimize test speed (#699)

* Try buildjet

Signed-off-by: Lukas Heumos <[email protected]>

* Try buildjet large

Signed-off-by: Lukas Heumos <[email protected]>

* speed up predict_differential_prioritization

Signed-off-by: Lukas Heumos <[email protected]>

* speed up tests

Signed-off-by: Lukas Heumos <[email protected]>

---------

Signed-off-by: Lukas Heumos <[email protected]>

* Lower bound for scikit-learn (#701)

Signed-off-by: Lukas Heumos <[email protected]>

* Fix type annotation

Signed-off-by: Lukas Heumos <[email protected]>

* Fix empty figure returns when show=True in plotting functions (#703)

* Removed show parameter

* Adapt plotting API for Augur, Coda, Dialogue

* Adapted plotting API for Milo, Mixscape, scgen

* Add joblib

* Remove joblib

---------

Co-authored-by: Lukas Heumos <[email protected]>

* Fix scikit-learn intendation

Signed-off-by: Lukas Heumos <[email protected]>

---------

Signed-off-by: zethson <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Co-authored-by: Lilly May <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
Co-authored-by: Gregor Sturm <[email protected]>
Co-authored-by: Emma Dann <[email protected]>
Co-authored-by: Emma Dann <[email protected]>
Zethson added a commit that referenced this pull request Feb 20, 2025
* Implement mixture models for guide assignment

Key additions:
- Added a base abstract class "MixtureModel" with numpyro
- Added a first mixture model "Poisson_Gauss_Mixture"
- New function "assign_mixture_model" in GuideAssignment class

* Merge main into branch (#705)

* Merge main into branch (#706)

* Set legend anchor as parameter (#660)

* Fix missing space

* Remove explicit anndata in dependencies (#666)

* Incorporate use case tutorials (#665)

* Fixed DEG layer retrieval

* Use-case tutorial icons

* Restructure tutorial page

* Subgroup tutorials

* Improve KNN label_transfer in PerturbationSpace (#658)

* Add uncertainty score in KNN label_transfer in PerturbationSpace
Certainty is quantified as the fraction of nearest neighbors belonging to the classified (i.e. the most abundant) label compared to the total number of nearest neighbors.

* Update pre-commit-config.yaml
Replaces yanked dependency of mypy "types-pkg-resources" with "types-setuptools" as recommended: https://pypi.org/project/types-pkg-resources/

* Improve label imputation in PerturbationSpace class
Key changes:
- Now uses KNN graph in adata: saves cost and increases consistency
- Vectorized operations instead of expensive for loop
- Distance weighting for KNN imputation
- Quantifies uncertainty as local KNN label entropy

* Fixed plotting for mixscape.plot_barplot and sccoda.plot_effects_barplot (#667)

* Augur scsim warnings (#670)

* Augur scsim warnings

Signed-off-by: zethson <[email protected]>

* Submodules

Signed-off-by: zethson <[email protected]>

---------

Signed-off-by: zethson <[email protected]>

* Add PerturbationDataValidator (#672)

* Augur scsim warnings

Signed-off-by: zethson <[email protected]>

* Submodules

Signed-off-by: zethson <[email protected]>

* Add super draft of pertpy validator

Signed-off-by: zethson <[email protected]>

* Polish

Signed-off-by: zethson <[email protected]>

* Polish

Signed-off-by: zethson <[email protected]>

* Nested try

Signed-off-by: zethson <[email protected]>

* validator in test

Signed-off-by: zethson <[email protected]>

* try uv for rtd

Signed-off-by: zethson <[email protected]>

* rtd uv

Signed-off-by: zethson <[email protected]>

* rtd uv

Signed-off-by: zethson <[email protected]>

* rtd uv fix

Signed-off-by: zethson <[email protected]>

* mb sphinx fix for validator

Signed-off-by: zethson <[email protected]>

* docs

Signed-off-by: zethson <[email protected]>

* remove PerturbationValidator from docs

Signed-off-by: zethson <[email protected]>

* remove PerturbationValidator from docs

Signed-off-by: zethson <[email protected]>

---------

Signed-off-by: zethson <[email protected]>

* Latest OS for RTD

* Remove curator again

Signed-off-by: zethson <[email protected]>

* Fix jax random array (#686)

* Fix jax random array

Signed-off-by: zethson <[email protected]>

* Fix further jax warnings

Signed-off-by: zethson <[email protected]>

* Fix edger

Signed-off-by: zethson <[email protected]>

* Fix choice

Signed-off-by: zethson <[email protected]>

---------

Signed-off-by: zethson <[email protected]>

* Switch to formulaic-contrasts (#682)

* Switch to formulaic-contrasts

* Cleanup

* removing design matrix workaround (#691)

Co-authored-by: Emma Dann <[email protected]>

* Fix PyDESeq2

* Update tests

* fix typo in gitignore

* Remove contrast dataclass, which isn't used anywhere

* Fix edgeR rpy2 tests (#692)

* fix broken rpy2 edger tests

* updated edger tests

* Fix tests (scipy)

Signed-off-by: zethson <[email protected]>

* submodule

Signed-off-by: zethson <[email protected]>

* Remove unused code

Signed-off-by: zethson <[email protected]>

* type hints

Signed-off-by: zethson <[email protected]>

---------

Signed-off-by: zethson <[email protected]>
Co-authored-by: Emma Dann <[email protected]>
Co-authored-by: Emma Dann <[email protected]>
Co-authored-by: zethson <[email protected]>

* Release 0.9.5

Signed-off-by: zethson <[email protected]>

* Prepare 0.10.0

Signed-off-by: zethson <[email protected]>

* Added Mixscape seeds and test (#683)

Co-authored-by: Lukas Heumos <[email protected]>

* Fix probability data type (#696)

Signed-off-by: Lukas Heumos <[email protected]>

* Optimize MeanVarDistributionDistance (#697)

* Fix probability data type

Signed-off-by: Lukas Heumos <[email protected]>

* Optimize mean_var distance

Signed-off-by: Lukas Heumos <[email protected]>

---------

Signed-off-by: Lukas Heumos <[email protected]>

* Optimize test speed (#699)

* Try buildjet

Signed-off-by: Lukas Heumos <[email protected]>

* Try buildjet large

Signed-off-by: Lukas Heumos <[email protected]>

* speed up predict_differential_prioritization

Signed-off-by: Lukas Heumos <[email protected]>

* speed up tests

Signed-off-by: Lukas Heumos <[email protected]>

---------

Signed-off-by: Lukas Heumos <[email protected]>

* Lower bound for scikit-learn (#701)

Signed-off-by: Lukas Heumos <[email protected]>

* Fix type annotation

Signed-off-by: Lukas Heumos <[email protected]>

* Fix empty figure returns when show=True in plotting functions (#703)

* Removed show parameter

* Adapt plotting API for Augur, Coda, Dialogue

* Adapted plotting API for Milo, Mixscape, scgen

* Add joblib

* Remove joblib

---------

Co-authored-by: Lukas Heumos <[email protected]>

* Fix scikit-learn indentation

Signed-off-by: Lukas Heumos <[email protected]>

---------

Signed-off-by: zethson <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Co-authored-by: Lilly May <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
Co-authored-by: Gregor Sturm <[email protected]>
Co-authored-by: Emma Dann <[email protected]>
Co-authored-by: Emma Dann <[email protected]>

* Refactor guide assignment logic and enhance mixture model parameters

* Cleanup MixtureModel class

* Enhance guide assignment validation and error handling in GuideAssignment class

* Update dev nb

* Add test for grna_mixture_model

* Remove dev nb

* Update notebook for guide assignment

* Update guide assignment notebooks

* Apply suggestions from code review

Review comments by @Zethson

Co-authored-by: Lukas Heumos <[email protected]>

* Improve code to fit review suggestions
- Added lots of type hints and return types
- Improved naming of variables
- Added and removed a few comments
- Added user warnings if a guide is not expressed at all

* Fix sloppy data dimensions for numpyro
Previously the data had shape (N, 1). It is now flattened with ravel, and the numpyro plates were changed accordingly for correct batching.
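
As a NumPy-only illustration of why a trailing singleton dimension can be a problem (the commit does not state the exact failure, so this is an assumption about the kind of issue involved): combining an `(N, 1)` array with an `(N,)` array broadcasts to `(N, N)` instead of staying element-wise.

```python
import numpy as np

data = np.arange(4.0).reshape(-1, 1)  # shape (4, 1), like a single-column matrix
per_obs = np.zeros(4)                 # shape (4,), one value per observation

# (4, 1) against (4,) broadcasts to a full (4, 4) grid
broadcast_shape = (data - per_obs).shape
# After ravel both operands are (4,), so the result stays element-wise
flat_shape = (data.ravel() - per_obs).shape

print(broadcast_shape, flat_shape)  # (4, 4) (4,)
```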

* Update test_grna_assignment.py
We changed "Negative" to "negative" :)

* Polish

Signed-off-by: Lukas Heumos <[email protected]>

* Polish

Signed-off-by: Lukas Heumos <[email protected]>

---------

Signed-off-by: zethson <[email protected]>
Signed-off-by: Lukas Heumos <[email protected]>
Co-authored-by: Lilly May <[email protected]>
Co-authored-by: Lukas Heumos <[email protected]>
Co-authored-by: Gregor Sturm <[email protected]>
Co-authored-by: Emma Dann <[email protected]>
Co-authored-by: Emma Dann <[email protected]>