Input data

Requirement on #SNPs and #individuals

For the desirable properties of the h2-GRE estimator to hold, it requires individual-level genotype and phenotype data in which the number of individuals (N) is larger than the number of genotyped SNPs on each chromosome (m_k for the k-th chromosome).

As a reference point, in our analyses of the UK Biobank genotyped SNPs (UK Biobank Axiom Array), chromosome 1 (the largest chromosome) had about 45-55K SNPs, depending on how we QC-ed the data, while the number of individuals was about 300K.
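The N > m_k requirement can be checked before running anything else. The following is a minimal sketch, not part of h2-GRE itself: it assumes the standard PLINK `.bim` layout, where each line describes one variant and the first whitespace-separated column is the chromosome code. The function names `count_snps_per_chrom` and `chroms_violating_requirement` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def count_snps_per_chrom(bim_lines: Iterable[str]) -> Dict[str, int]:
    """Count variants per chromosome from the lines of a PLINK .bim file."""
    counts: Counter = Counter()
    for line in bim_lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        counts[fields[0]] += 1  # first .bim column is the chromosome code
    return dict(counts)


def chroms_violating_requirement(
    n_individuals: int, snp_counts: Dict[str, int]
) -> Dict[str, int]:
    """Return the chromosomes whose SNP count m_k is not smaller than N."""
    return {
        chrom: m_k for chrom, m_k in snp_counts.items() if m_k >= n_individuals
    }


if __name__ == "__main__":
    # Toy .bim content (hypothetical variants), used only to show the call pattern.
    toy_bim = [
        "1 rs1 0 100 A G",
        "1 rs2 0 200 C T",
        "2 rs3 0 300 A C",
    ]
    counts = count_snps_per_chrom(toy_bim)
    print(counts)
    print(chroms_violating_requirement(n_individuals=2, snp_counts=counts))
```

With real data, the lines would come from an open `.bim` file handle, and N can be taken as the number of lines in the matching `.fam` file (one line per individual). An empty result from `chroms_violating_requirement` means every chromosome satisfies N > m_k.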

Prepare the data

The genotype and phenotype data should be in PLINK format. We will also use PLINK to perform OLS regression.

!!! note Here we assume that you know the basics of PLINK, e.g., its basic commands and data formats. You can read more on the PLINK website about covariate files, phenotype files, and association analysis.

We will use the following data.

  • all_bfile: the genotypes of the unrelated individuals in UK Biobank after quality control. This contains genome-wide data in one file.
  • pheno_file: the phenotype file.
  • covar_file: the covariate file, containing the covariates that will be used in the association analyses.
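Before going further, it can help to confirm that all of these inputs are in place. The sketch below assumes that `all_bfile` is the prefix of a PLINK binary fileset (`.bed`, `.bim`, `.fam`) and that `pheno_file` and `covar_file` are plain file paths. Neither assumption is stated above, and `missing_inputs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def missing_inputs(bfile_prefix: str, pheno_file: str, covar_file: str) -> List[str]:
    """Return the expected input paths that do not exist as regular files."""
    # Assumption: the genotypes are a PLINK binary fileset sharing one prefix.
    expected = [f"{bfile_prefix}{ext}" for ext in (".bed", ".bim", ".fam")]
    expected += [pheno_file, covar_file]
    return [path for path in expected if not Path(path).is_file()]


if __name__ == "__main__":
    # Placeholder paths; replace them with the actual locations of your data.
    missing = missing_inputs("all_bfile", "pheno_file", "covar_file")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```

An empty list means every expected file was found. The check only tests for existence; it does not validate the file contents.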