Commit b86f3b4: "simulation code upload"

1 parent 7b04992

File tree

4 files changed: +606 -1 lines

README.md

Lines changed: 12 additions & 1 deletion

# cwas_simulation

Simulate a connectome wide association study (CWAS) with real data.

This repo contains the code used to run a CWAS simulation for the paper "Investigating the convergence of resting-state functional connectivity profiles in Alzheimer’s disease with neuropsychiatric symptoms and schizophrenia".
The CWAS contrasts cases and controls using whole-brain connectomes, with functional connectivity (FC) standardised (z-scored) against the variance of the control group. The CWAS is run as a linear regression per connection (here, 2080 connections), with z-scored FC as the dependent variable and the case/control label as the explanatory variable, adjusted for site of data collection. The resulting β values form the case FC profile. Multiple testing is handled with Benjamini-Hochberg correction, controlling the false discovery rate (FDR) at a threshold of q < 0.1. The top-10% effect size of the group label on FC is the mean of the top decile of the absolute β values.
The simulation uses real connectomes from neurotypical control participants. Participants are first randomly split into two equal-sized groups. An effect of disease on FC is modelled by altering pi% of connections for one group ("cases"), following connection<sub>i</sub> = connection<sub>i</sub> + *d* × std, where *d* is a previously published effect size (Cohen’s *d*) for the disease, and std is the standard deviation of the connectivity values combined across both groups. The process is repeated (100 repetitions by default), each repetition is FDR-corrected at a threshold of *q*, and the average sensitivity and specificity are calculated.
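One repetition of the split-and-inject step can be sketched like this. `simulate_once` is a hypothetical name, since the actual implementation is in `simulation_tools.py` and not shown in this diff; the sketch only illustrates the equation above.

```python
import numpy as np

def simulate_once(conn, pi=0.20, d=0.3, rng=None):
    """One repetition of the simulation (hypothetical helper).

    conn : (n_subjects, n_connections) array of real control connectomes
    pi   : fraction of connections altered in the "case" group
    d    : injected effect size (Cohen's d)
    """
    if rng is None:
        rng = np.random.default_rng()
    n = conn.shape[0]

    # Random split into two equal-sized groups
    idx = rng.permutation(n)
    cases = conn[idx[: n // 2]].copy()
    controls = conn[idx[n // 2:]]

    # Alter pi% of connections for the cases:
    # connection_i = connection_i + d * std
    n_conn = conn.shape[1]
    altered = rng.choice(n_conn, size=int(pi * n_conn), replace=False)
    std = conn[:, altered].std(axis=0)   # SD of values combined across groups
    cases[:, altered] += d * std

    # Ground-truth mask of altered connections, for scoring the CWAS later
    true_pos = np.zeros(n_conn, dtype=bool)
    true_pos[altered] = True
    return cases, controls, true_pos
```

Each repetition's simulated cases and controls would then be fed to the CWAS: sensitivity is the fraction of altered connections that survive FDR correction, and specificity is the fraction of unaltered connections that do not.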
The notebook `run_simulation.ipynb` runs the simulation to estimate sensitivity and specificity for a range of sample sizes, with *d* = 0.3, pi = 20% and *q* = 0.1. The connectomes used in this simulation were derived from the Autism Brain Imaging Data Exchange (ABIDE-1 and -2) initiative.

requirements.txt

Lines changed: 4 additions & 0 deletions

numpy>=1.23.1
pandas>=1.4.3
scikit-learn>=1.2.2
statsmodels>=0.14.0

run_simulation.ipynb

Lines changed: 145 additions & 0 deletions

The notebook (kernel: Python 3 (ipykernel), Python 3.10.10) contains the following cells:

```python
import simulation_tools as sim
import pandas as pd

# Path to a .csv file with connectomes in upper triangular form
path_conn = "/home/neuromod/ad_sz/data/abide/abide1_2_controls_concat.csv"

# Load control connectomes from ABIDE
conn_df = pd.read_csv(path_conn)

# Create a range of N values
N_values = range(300, 951, 50)

result_list = []
# Loop through the values of N and run the simulation with specified parameters
for N in N_values:
    result = sim.run_multiple_simulation(conn_df, N=N, pi=0.20, d=0.3, q=0.1, num_sample=100)
    print(f"Simulation ran for N={N}.")
    result_list.append(result)

for result in result_list:
    print(result)
```

The final cell prints the stored results:

```
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=300: 0.52, with mean specificity of 0.99.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=350: 0.62, with mean specificity of 0.99.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=400: 0.72, with mean specificity of 0.99.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=450: 0.78, with mean specificity of 0.98.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=500: 0.84, with mean specificity of 0.98.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=550: 0.88, with mean specificity of 0.98.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=600: 0.91, with mean specificity of 0.98.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=650: 0.94, with mean specificity of 0.98.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=700: 0.95, with mean specificity of 0.98.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=750: 0.96, with mean specificity of 0.98.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=800: 0.97, with mean specificity of 0.98.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=850: 0.98, with mean specificity of 0.98.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=900: 0.99, with mean specificity of 0.98.
Estimated mean sensitivity to detect d=0.3, with pi=0.2%, q=0.1 and N=950: 0.99, with mean specificity of 0.98.
```
