dataset: synthetic from PANTHER #25

tristan-f-r · 2025-07-01T18:44:17Z

This does not add anything to config/*.yaml.

Co-Authored-By: Neha Talluri [email protected]
Co-Authored-By: Oliver Faulkner Anderson [email protected]
Co-Authored-By: Altaf Barelvi [email protected]


ntalluri · 2025-07-01T20:18:37Z

datasets/synthetic-data/Snakefile

@@ -0,0 +1,100 @@
+pathways = ["Apoptosis_signaling", "B_cell_activation",


Each of the files has this variable. I think we should define it only in the Snakefile and pass the list to each of the scripts that use it.

ntalluri · 2025-07-01T20:18:44Z

datasets/synthetic-data/Snakefile

+            "FGF_signaling", "Interferon_gamma_signaling",
+            "JAK_STAT_signaling", "VEGF_signaling"]
+# TODO: deduplicate this from thresholding scripts by passing it in?
+thresholds = [1, 100, 200, 300, 400, 500, 600, 700, 800, 900]


Similar thing for the thresholds: define them once in the Snakefile and pass them to the thresholding scripts.
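A minimal sketch of what the receiving side could look like, assuming the scripts are plain Python entry points that the Snakefile invokes with the lists on the command line. The flag names `--pathways` and `--thresholds` and the `parse_args` helper are hypothetical and not taken from this PR.

```python
import argparse


def parse_args(argv=None):
    """Parse the pathway and threshold lists handed over by the Snakefile.

    Hypothetical interface: the lists live only in the Snakefile and each
    script receives them as arguments instead of redefining them.
    """
    parser = argparse.ArgumentParser(description="Hypothetical thresholding script")
    # nargs="+" collects one or more space-separated values into a list.
    parser.add_argument("--pathways", nargs="+", required=True)
    parser.add_argument("--thresholds", nargs="+", type=int, required=True)
    return parser.parse_args(argv)


# Example invocation with an explicit argument list (values from the diff above).
args = parse_args([
    "--pathways", "Apoptosis_signaling", "B_cell_activation",
    "--thresholds", "1", "100", "200",
])
print(args.pathways, args.thresholds)
```

With something like this in place, the `pathways` and `thresholds` lists would only need to exist in the Snakefile, which would also resolve the `TODO` in the hunk above.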

ntalluri · 2025-07-03T18:16:43Z

A question I’d appreciate feedback on: currently, we generate separate source, target, and prize files for each pathway, but we combine all pathways into each thresholded interactome. Should we also create a combined list of sources, targets, and prizes? Should we combine the gold standard as well? Or would it be better to keep separate interactomes for each individual pathway (i.e., keep it the way it is)?

tristan-f-r · 2025-07-28T18:11:41Z

We should have separate gold standards.

ntalluri · 2025-07-28T18:33:18Z

When this is reviewed (or before), we should run tests to see how connected the networks are after thresholding, adding back the pathway data, and removing proteins that don't have UniProt IDs.
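One possible shape for that connectivity test, as a rough sketch: it assumes the processed network is available as an undirected edge list of node-ID pairs and reports the connected components and the share of nodes in the largest one. The function names are hypothetical and nothing here is taken from the PR's code.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected edge list as sets."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from each node that has not been visited yet.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def largest_component_fraction(edges):
    """Fraction of nodes that fall in the largest connected component."""
    components = connected_components(edges)
    if not components:
        return 0.0
    total_nodes = sum(len(c) for c in components)
    return max(len(c) for c in components) / total_nodes


# Toy network: one component of three nodes and one of two.
toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]
print(len(connected_components(toy_edges)), largest_component_fraction(toy_edges))
```

Running this after each processing step (thresholding, adding pathway edges back, dropping unmapped proteins) would show where the network fragments.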

ntalluri · 2025-07-28T18:52:38Z

Also, there is a chance we can use more PANTHER pathways; we should look to see what else we can use from Pathway Commons.

ntalluri · 2025-07-31T16:35:50Z

@oliverfanderson @ctrlaltaf For the gold standard nodes (and potentially the edges), should we exclude source, target, and prize nodes when defining it? Currently, it looks like we’re including these nodes in the gold standard for each pathway. These nodes overlap with the gold standard, but that overlap should happen naturally, not by construction/being predefined. I’m concerned this could inflate our precision and recall metrics, because of a form of data leakage.

ntalluri · 2025-08-11T21:25:55Z

> @oliverfanderson @ctrlaltaf For the gold standard nodes (and potentially the edges), should we exclude source, target, and prize nodes when defining it? Currently, it looks like we’re including these nodes in the gold standard for each pathway. These nodes overlap with the gold standard, but that overlap should happen naturally, not by construction/being predefined. I’m concerned this could inflate our precision and recall metrics, because of a form of data leakage.

Plan: keep all of them in the gold standard, but update the evaluation code to handle the sources/targets/prizes being in the gold standard, and report them as a separate baseline in which those nodes are all assigned a frequency of 1.0.
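A minimal sketch of how such a baseline might be computed, assuming node-level evaluation against a set of gold standard nodes and a per-node frequency map. The function names and the cutoff handling are hypothetical; the actual evaluation code may be structured differently.

```python
def precision_recall(frequencies, gold_nodes, cutoff=1.0):
    """Node-level precision and recall for nodes with frequency >= cutoff."""
    predicted = {node for node, freq in frequencies.items() if freq >= cutoff}
    true_positives = predicted & set(gold_nodes)
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(gold_nodes) if gold_nodes else 0.0
    return precision, recall


def input_baseline(input_nodes, gold_nodes):
    """Baseline that 'predicts' only the sources/targets/prizes.

    Every input node is assigned frequency 1.0 and all other nodes are absent,
    so the result shows how much precision/recall comes from the inputs alone.
    """
    frequencies = {node: 1.0 for node in input_nodes}
    return precision_recall(frequencies, gold_nodes)


# Toy example: two of the four gold standard nodes are also inputs.
gold = {"A", "B", "C", "D"}
inputs = {"A", "B"}
print(input_baseline(inputs, gold))
```

Plotting this point next to each algorithm's results would make it visible how much of the score is explained by the predefined overlap rather than by the reconstruction itself.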

ntalluri · 2025-08-13T19:33:24Z

Should we also consider how sparse an interactome becomes after applying a threshold to the STRING interactome? When we filter by size, we are implicitly accounting for the decrease in graph density as well. Would it make more sense to treat size and density as separate variables when evaluating performance? Then again, does testing for density even matter in this context? Are there any interactomes that aren’t already highly connected?

I’m thinking we should first threshold the interactomes, then select only those that are highly connected (e.g., density ≥ 0.85). From that subset, we could choose a few to represent different size scales.
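As a rough sketch of that selection step, assuming each thresholded interactome is an undirected edge list and using the standard density of a simple undirected graph, 2E / (V(V - 1)), computed over the nodes that still have at least one edge. The helper names and the dictionary layout are hypothetical.

```python
def graph_density(edges):
    """Density 2E / (V(V - 1)) of a simple undirected graph given as an edge list."""
    # Deduplicate undirected edges and ignore self-loops.
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}
    nodes = {node for edge in unique_edges for node in edge}
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(unique_edges) / (n * (n - 1))


def select_dense(interactomes, min_density=0.85):
    """Return the thresholds whose interactome meets the density cutoff.

    `interactomes` maps a score threshold to its edge list (assumed layout).
    """
    return sorted(
        threshold
        for threshold, edges in interactomes.items()
        if graph_density(edges) >= min_density
    )


# Toy example: a triangle (density 1.0) and a three-node path (density 2/3).
toy = {
    1: [("A", "B"), ("B", "C"), ("A", "C")],
    100: [("A", "B"), ("B", "C")],
}
print(select_dense(toy))
```

Reporting node count and density side by side for every threshold would also answer whether any of the interactomes fall below the cutoff at all.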

feat: synthetic pathways

20b1580


tristan-f-r added the dataset label ("Mutating datasets in any way.") Jul 1, 2025

ntalluri reviewed Jul 1, 2025


Merge branch 'main' into synthetic

fc12b4e

tristan-f-r mentioned this pull request Jul 30, 2025

dataset: DepMap #41

Open
