Commit ec4fb3a

Add DE set comparisons (#489)
* Add DE set comparisons

  Signed-off-by: zethson <[email protected]>

* Add docs

  Signed-off-by: zethson <[email protected]>

---------

Signed-off-by: zethson <[email protected]>
1 parent 83c0350 commit ec4fb3a

3 files changed: +145 -49 lines changed
docs/usage/usage.md

Lines changed: 14 additions & 48 deletions
@@ -80,10 +80,6 @@ Simple functions for:
 
 Assigning guides based on thresholds. Each cell is assigned to the most expressed gRNA if it has at least the specified number of counts.
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: preprocessing
@@ -111,6 +107,20 @@ ga.plot_heatmap(gdo, layer="assigned_guides")
 
 ## Tools
 
+### Differential gene expression
+
+Differential gene expression involves the quantitative comparison of gene expression levels between two or more groups,
+such as different cell types, tissues, or conditions to discern genes that are significantly up- or downregulated in response to specific biological contexts or stimuli.
+Pertpy provides utilities to conduct differential gene expression tests through a common interface that supports complex designs and methods.
+
+```{eval-rst}
+.. autosummary::
+    :toctree: preprocessing
+    :nosignatures:
+
+    tools.DifferentialGeneExpression
+```
+
 ### Pooled CRISPR screens
 
 #### Mixscape
@@ -122,10 +132,6 @@ Mixscape first tries to remove confounding sources of variation such as cell cyc
 Next, it determines which targeted cells were affected by the genetic perturbation (=KO) and which targeted cells were not (=NP) with the use of mixture models.
 Finally, it visualizes similarities and differences across different perturbations.
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: tools
@@ -155,10 +161,6 @@ See [mixscape tutorial](https://pertpy.readthedocs.io/en/latest/tutorials/notebo
 A Python implementation of Milo for differential abundance testing on KNN graphs, to ease interoperability with scverse pipelines for single-cell analysis.
 See [Differential abundance testing on single-cell data using k-nearest neighbor graphs](https://www.nature.com/articles/s41587-021-01033-z) for details on the statistical framework.
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: tools
@@ -195,10 +197,6 @@ milo.da_nhoods(mdata, design="~Status")
 Reimplementation of scCODA for identification of compositional changes in high-throughput sequencing count data and tascCODA for sparse, tree-aggregated modeling of high-throughput sequencing data.
 See [scCODA is a Bayesian model for compositional single-cell data analysis](https://www.nature.com/articles/s41467-021-27150-6) for statistical methodology and benchmarking performance of scCODA and [tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data](https://www.frontiersin.org/articles/10.3389/fgene.2021.766405/full) for statistical methodology and benchmarking performance of tascCODA.
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: tools
@@ -246,10 +244,6 @@ sccoda.plot_effects_barplot(
 A **work in progress (!)** Python implementation of DIALOGUE for the discovery of multicellular programs.
 See [DIALOGUE maps multicellular programs in tissue from single-cell or spatial transcriptomics data](https://www.nature.com/articles/s41587-022-01288-0) for more details on the methodology.
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: tools
@@ -286,10 +280,6 @@ all_results, new_mcps = dl.multilevel_modeling(
 
 #### Enrichment
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: tools
@@ -312,10 +302,6 @@ pt_enricher.score(adata)
 General purpose functions for distances and permutation tests.
 Reimplements functions from [scperturb](http://projects.sanderlab.org/scperturb/) package.
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: tools
@@ -375,10 +361,6 @@ results["summary_metrics"]
 
 See [augur tutorial](https://pertpy.readthedocs.io/en/latest/tutorials/notebooks/augur.html) for a more elaborate tutorial.
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: tools
@@ -392,10 +374,6 @@ See [augur tutorial](https://pertpy.readthedocs.io/en/latest/tutorials/notebooks
 Reimplementation of scGen for perturbation response prediction of scRNA-seq data in Jax.
 See [scGen predicts single-cell perturbation responses](https://www.nature.com/articles/s41592-019-0494-8) for more details.
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: tools
@@ -435,10 +413,6 @@ CINEMA-OT separates confounding sources of variation from perturbation effects t
 These cell pairs represent causal perturbation responses permitting a number of novel analyses, such as individual treatment effect analysis, response clustering, attribution analysis, and synergy analysis.
 See [Causal identification of single-cell experimental perturbation effects with CINEMA-OT](https://www.biorxiv.org/content/10.1101/2022.07.31.502173v3.abstract) for more details.
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: tools
@@ -473,10 +447,6 @@ See [CINEMA-OT tutorial](https://pertpy.readthedocs.io/en/latest/tutorials/noteb
 
 Various modules for calculating and evaluating perturbation spaces.
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: tools
@@ -533,10 +503,6 @@ Available databases for mechanism of action metadata:
 
 - [CLUE](https://clue.io/)
 
-```{eval-rst}
-.. currentmodule:: pertpy
-```
-
 ```{eval-rst}
 .. autosummary::
     :toctree: metadata

pertpy/tools/_differential_gene_expression.py

Lines changed: 74 additions & 1 deletion
@@ -5,9 +5,10 @@
 import decoupler as dc
 import numpy as np
 import numpy.typing as npt
+import pandas as pd
+from scipy.stats import kendalltau, pearsonr, spearmanr
 
 if TYPE_CHECKING:
-    import pandas as pd
     from anndata import AnnData
 
 
@@ -143,6 +144,78 @@ def filter_by_prop(self, adata: AnnData, min_prop: float = 0.2, min_samples: int
 
         return filtered_adata
 
+    def calculate_correlation(
+        self,
+        de_res_1: pd.DataFrame,
+        de_res_2: pd.DataFrame,
+        method: Literal["spearman", "pearson", "kendall-tau"] = "spearman",
+    ) -> pd.DataFrame:
+        """Calculate the correlation coefficient for the 'pvals_adj' and 'logfoldchanges' columns.
+
+        Args:
+            de_res_1: A DataFrame with DE result columns.
+            de_res_2: Another DataFrame with the same DE result columns.
+            method: The correlation method to apply. One of `spearman`, `pearson`, `kendall-tau`.
+                Defaults to `spearman`.
+
+        Returns:
+            A one-row DataFrame with the correlation coefficients for 'pvals_adj' and 'logfoldchanges'.
+        """
+        columns_of_interest = ["pvals_adj", "logfoldchanges"]
+        correlation_data = {}
+        for col in columns_of_interest:
+            match method:
+                case "spearman":
+                    correlation, _ = spearmanr(de_res_1[col], de_res_2[col])
+                case "pearson":
+                    correlation, _ = pearsonr(de_res_1[col], de_res_2[col])
+                case "kendall-tau":
+                    correlation, _ = kendalltau(de_res_1[col], de_res_2[col])
+                case _:
+                    raise ValueError("Unknown correlation method.")
+            correlation_data[col] = correlation
+
+        return pd.DataFrame([correlation_data], columns=columns_of_interest)
+
+    def calculate_jaccard_index(self, de_res_1: pd.DataFrame, de_res_2: pd.DataFrame, threshold: float = 0.05) -> float:
+        """Calculate the Jaccard index for sets of significantly expressed genes/features based on a p-value threshold.
+
+        Args:
+            de_res_1: A DataFrame with DE result columns, including 'pvals'.
+            de_res_2: Another DataFrame with the same DE result columns.
+            threshold: A threshold for determining significant expression (default is 0.05).
+
+        Returns:
+            The Jaccard index, or 0.0 if neither result contains a significant gene/feature.
+        """
+        significant_set_1 = set(de_res_1[de_res_1["pvals"] <= threshold].index)
+        significant_set_2 = set(de_res_2[de_res_2["pvals"] <= threshold].index)
+
+        intersection = significant_set_1.intersection(significant_set_2)
+        union = significant_set_1.union(significant_set_2)
+
+        return len(intersection) / len(union) if union else 0.0
+
+    def calculate_cohens_d(self, de_res_1: pd.DataFrame, de_res_2: pd.DataFrame) -> float:
+        """Calculate Cohen's D for the logfoldchanges.
+
+        Args:
+            de_res_1: A DataFrame with DE result columns, including 'logfoldchanges'.
+            de_res_2: Another DataFrame with the same DE result columns.
+
+        Returns:
+            Cohen's D between the two sets of logfoldchanges, as a single value across all genes/features.
+        """
+        means_1 = de_res_1["logfoldchanges"].mean()
+        means_2 = de_res_2["logfoldchanges"].mean()
+        sd_1 = de_res_1["logfoldchanges"].std()
+        sd_2 = de_res_2["logfoldchanges"].std()
+
+        pooled_sd = np.sqrt((sd_1**2 + sd_2**2) / 2)
+        cohens_d = (means_1 - means_2) / pooled_sd
+
+        return cohens_d
+
     def de_analysis(
         self,
         adata: AnnData,
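
The set-overlap and effect-size comparisons added in this file can be tried without pertpy installed. The following is a minimal sketch under stated assumptions, not the pertpy API: the free functions `jaccard_index` and `cohens_d`, the gene labels, and the toy values are hypothetical and exist only for illustration. Only the column names `pvals` and `logfoldchanges` and the two formulas are taken from the diff.

```python
import numpy as np
import pandas as pd


def jaccard_index(de_res_1: pd.DataFrame, de_res_2: pd.DataFrame, threshold: float = 0.05) -> float:
    """Overlap of the index labels whose 'pvals' are at or below the threshold."""
    set_1 = set(de_res_1.index[de_res_1["pvals"] <= threshold])
    set_2 = set(de_res_2.index[de_res_2["pvals"] <= threshold])
    union = set_1 | set_2
    # An empty union means neither result has a significant feature.
    return len(set_1 & set_2) / len(union) if union else 0.0


def cohens_d(de_res_1: pd.DataFrame, de_res_2: pd.DataFrame) -> float:
    """Difference of mean log fold changes divided by the pooled standard deviation."""
    lfc_1 = de_res_1["logfoldchanges"]
    lfc_2 = de_res_2["logfoldchanges"]
    # pandas' std() uses the sample standard deviation (ddof=1).
    pooled_sd = np.sqrt((lfc_1.std() ** 2 + lfc_2.std() ** 2) / 2)
    return float((lfc_1.mean() - lfc_2.mean()) / pooled_sd)


# Hypothetical DE results indexed by gene name.
genes = ["g1", "g2", "g3", "g4"]
res_1 = pd.DataFrame({"pvals": [0.01, 0.03, 0.20, 0.60], "logfoldchanges": [1.0, 2.0, 3.0, 4.0]}, index=genes)
res_2 = pd.DataFrame({"pvals": [0.02, 0.30, 0.04, 0.70], "logfoldchanges": [2.0, 3.0, 4.0, 5.0]}, index=genes)

jaccard = jaccard_index(res_1, res_2)  # {g1, g2} vs {g1, g3}: one shared label out of three
d = cohens_d(res_1, res_2)
print(jaccard, d)
```

Two points are worth keeping in mind when reading the diff: the Jaccard index thresholds the raw `pvals` column rather than `pvals_adj`, and Cohen's D is a single summary value over all rows, not a per-gene statistic.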
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+import pandas as pd
+import pertpy as pt
+import pytest
+
+
+@pytest.fixture
+def dummy_de_results():
+    data1 = {"pvals": [0.1, 0.2, 0.3, 0.4], "pvals_adj": [0.1, 0.25, 0.35, 0.45], "logfoldchanges": [1, 2, 3, 4]}
+    data2 = {"pvals": [0.1, 0.2, 0.3, 0.4], "pvals_adj": [0.15, 0.2, 0.35, 0.5], "logfoldchanges": [2, 3, 4, 5]}
+    de_res_1 = pd.DataFrame(data1)
+    de_res_2 = pd.DataFrame(data2)
+
+    return de_res_1, de_res_2
+
+
+@pytest.fixture
+def pt_de():
+    pt_de = pt.tl.DifferentialGeneExpression()
+    return pt_de
+
+
+def test_calculate_spearman_correlation(dummy_de_results, pt_de):
+    de_res_1, de_res_2 = dummy_de_results
+
+    result = pt_de.calculate_correlation(de_res_1, de_res_2, method="spearman")
+    assert result.shape == (1, 2)
+    assert all(column in result for column in ["pvals_adj", "logfoldchanges"])
+
+
+def test_calculate_pearson_correlation(dummy_de_results, pt_de):
+    de_res_1, de_res_2 = dummy_de_results
+
+    result = pt_de.calculate_correlation(de_res_1, de_res_2, method="pearson")
+    assert result.shape == (1, 2)
+    assert all(column in result for column in ["pvals_adj", "logfoldchanges"])
+
+
+def test_calculate_kendall_tau_correlation(dummy_de_results, pt_de):
+    de_res_1, de_res_2 = dummy_de_results
+
+    result = pt_de.calculate_correlation(de_res_1, de_res_2, method="kendall-tau")
+    assert result.shape == (1, 2)
+    assert all(column in result for column in ["pvals_adj", "logfoldchanges"])
+
+
+def test_jaccard_index(dummy_de_results, pt_de):
+    de_res_1, de_res_2 = dummy_de_results
+
+    jaccard_index = pt_de.calculate_jaccard_index(de_res_1, de_res_2)
+    assert 0 <= jaccard_index <= 1
+
+
+def test_calculate_cohens_d(dummy_de_results, pt_de):
+    de_res_1, de_res_2 = dummy_de_results
+
+    cohens_d = pt_de.calculate_cohens_d(de_res_1, de_res_2)
+    assert isinstance(cohens_d, float)
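
The tests in this commit check only the shape and column names of the correlation result. A standalone sketch of the same comparison, with values that can be checked by hand, might look like the following. It assumes only pandas and SciPy; the helper `correlate_de_results` and the lookup table `CORRELATION_FUNCS` are hypothetical names for this illustration and are not part of pertpy. The input frames reuse the values from the test fixture.

```python
import pandas as pd
from scipy.stats import kendalltau, pearsonr, spearmanr

# Maps the method names used in the commit to the SciPy functions it imports.
CORRELATION_FUNCS = {"spearman": spearmanr, "pearson": pearsonr, "kendall-tau": kendalltau}


def correlate_de_results(de_res_1: pd.DataFrame, de_res_2: pd.DataFrame, method: str = "spearman") -> pd.DataFrame:
    """Correlate 'pvals_adj' and 'logfoldchanges' between two DE result tables."""
    if method not in CORRELATION_FUNCS:
        raise ValueError(f"Unknown correlation method: {method!r}")
    corr_func = CORRELATION_FUNCS[method]

    columns = ["pvals_adj", "logfoldchanges"]
    row = {}
    for col in columns:
        # Each SciPy function returns (statistic, p-value); only the statistic is kept.
        statistic, _ = corr_func(de_res_1[col], de_res_2[col])
        row[col] = float(statistic)
    return pd.DataFrame([row], columns=columns)


# Same values as the dummy_de_results fixture.
res_1 = pd.DataFrame({"pvals_adj": [0.1, 0.25, 0.35, 0.45], "logfoldchanges": [1, 2, 3, 4]})
res_2 = pd.DataFrame({"pvals_adj": [0.15, 0.2, 0.35, 0.5], "logfoldchanges": [2, 3, 4, 5]})

result = correlate_de_results(res_1, res_2, method="spearman")
print(result)
```

Both columns are strictly increasing in both frames, so the rank correlation is 1.0 for each. Note that the correlation is computed position by position, so the two tables are expected to list the same genes in the same order.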
