Because SNPs that are close to each other tend to be inherited together (linkage disequilibrium), patterns of genetic variation can distinguish populations from different countries and geographic locations to a certain extent (see links). It would be cool to know whether a given genomic region shows genetic variation that differs between ethnicities or geographic locations. That could suggest the region has different health effects in different populations. This could be done in a supervised or unsupervised way.
Supervised: Get a large amount of genetic data along with personal data (including ethnicity). For each ethnicity, correlate genetic variation with membership in that ethnicity. Use those correlation values to score each region by how much its variation differs between that ethnicity and all others (shown as a bar plot or heatmap).
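A minimal sketch of the supervised score, assuming genotypes are already loaded as a samples × SNPs matrix of allele dosages (0/1/2) and that each SNP has been assigned to a region index beforehand (for example from a BED file). The function name `region_scores_supervised` and the choice to aggregate with the mean squared Pearson correlation are illustrative assumptions, not an established method.

```python
import numpy as np

def region_scores_supervised(genotypes, labels, group, snp_region):
    """Hypothetical helper: score regions by association with one group.

    genotypes  : (n_samples, n_snps) allele dosages 0/1/2 (assumed input format)
    labels     : length n_samples array of ethnicity/population labels
    group      : label to contrast against all other samples
    snp_region : length n_snps array of region indices (-1 = SNP outside all regions)
    Returns {region_index: mean squared correlation of its SNPs with group membership}.
    """
    g = np.asarray(genotypes, dtype=float)
    y = (np.asarray(labels) == group).astype(float)  # 1 = in group, 0 = all others

    # Pearson correlation of every SNP with the membership indicator, vectorized.
    gc = g - g.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((gc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (gc * yc[:, None]).sum(axis=0) / denom
    r = np.nan_to_num(r)  # monomorphic SNPs (zero variance) get correlation 0

    snp_region = np.asarray(snp_region)
    scores = {}
    for reg in np.unique(snp_region):
        if reg < 0:
            continue  # skip SNPs that fall outside every region
        scores[int(reg)] = float(np.mean(r[snp_region == reg] ** 2))
    return scores
```

Calling this once per ethnicity and stacking the resulting dictionaries gives a regions × ethnicities matrix that can be drawn directly as the heatmap.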
Unsupervised: Run PCA on a large amount of genetic data, as has been done before (refs). The PC loadings would score each region by how much it contributes to inter-ethnic/geographic variability along the largest axes of genetic variation.
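A minimal sketch of the unsupervised version, under the same assumed inputs (a dosage matrix plus a precomputed SNP-to-region index). It standardizes each SNP, runs PCA via `numpy.linalg.svd`, and scores each region by the mean absolute SNP weight (right singular vector entry) on each of the top PCs. The function name and the mean-absolute-weight aggregation are hypothetical choices for illustration.

```python
import numpy as np

def region_scores_pca(genotypes, snp_region, n_components=2):
    """Hypothetical helper: score regions by their weight on the top PCs.

    genotypes  : (n_samples, n_snps) allele dosages 0/1/2 (assumed input format)
    snp_region : length n_snps array of region indices (-1 = SNP outside all regions)
    Returns (region_ids, scores) where scores has shape (n_regions, n_components).
    """
    g = np.asarray(genotypes, dtype=float)
    sd = g.std(axis=0)
    sd[sd == 0] = 1.0                   # avoid dividing by zero for monomorphic SNPs
    z = (g - g.mean(axis=0)) / sd       # standardize each SNP column

    # Rows of vt are the principal axes expressed in SNP space (per-SNP weights).
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    weights = vt[:n_components]         # (n_components, n_snps), PC1 first

    snp_region = np.asarray(snp_region)
    region_ids = [int(r) for r in np.unique(snp_region) if r >= 0]
    scores = np.zeros((len(region_ids), n_components))
    for i, reg in enumerate(region_ids):
        scores[i] = np.mean(np.abs(weights[:, snp_region == reg]), axis=1)
    return region_ids, scores
```

For real cohorts, genotypes are usually LD-pruned before PCA so that one large haplotype block does not dominate a component; that step is left out of this sketch.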
Note: people in our department might have easy access to/familiarity with this type of genetic data.
Ok, so the idea here is that, given a BED file, I would give you a plot showing the degree of genetic variation for each of a series of ethnicities, aggregated across the regions that you've provided -- right?
I was referring more to quantifying variation between different ethnicities. That way you could infer whether different ethnicities might regulate that region set differently.
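One way to sketch the "given a BED file, compare ethnicities" idea: parse the BED intervals, keep only SNPs that fall inside them, and compare per-group alternate-allele frequencies pairwise. The mean absolute allele-frequency difference used here is a deliberately simple stand-in for standard differentiation statistics such as F_ST. All function names are hypothetical, and SNP positions are assumed to be 1-based (as in VCF) while BED intervals are 0-based, half-open.

```python
from itertools import combinations

import numpy as np

def parse_bed(lines):
    """Hypothetical helper: return (chrom, start, end) tuples from BED lines (0-based, half-open)."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions

def in_region_set(snp_chroms, snp_pos_1based, regions):
    """Boolean mask of SNPs falling inside any BED interval."""
    chroms = np.asarray(snp_chroms)
    pos0 = np.asarray(snp_pos_1based) - 1  # convert 1-based positions to 0-based
    mask = np.zeros(len(pos0), dtype=bool)
    for chrom, start, end in regions:
        mask |= (chroms == chrom) & (pos0 >= start) & (pos0 < end)
    return mask

def pairwise_af_difference(genotypes, labels, mask):
    """Mean |allele-frequency difference| over masked SNPs for every pair of groups.

    Assumes the mask selects at least one SNP; an empty mask would give NaN.
    """
    g = np.asarray(genotypes, dtype=float)[:, mask]
    labels = np.asarray(labels)
    groups = sorted(set(labels.tolist()))
    # Alternate-allele frequency per SNP within each group (dosage / 2).
    af = {grp: g[labels == grp].mean(axis=0) / 2.0 for grp in groups}
    return {(a, b): float(np.mean(np.abs(af[a] - af[b])))
            for a, b in combinations(groups, 2)}
```

With a real file, `parse_bed(open("regions.bed"))` would supply the intervals; the resulting pair-to-score dictionary maps naturally onto a groups × groups heatmap.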
Links:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735096/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5644186/