From 83f61d4d7866e420c5a1ae1718d075ef0e9b4f06 Mon Sep 17 00:00:00 2001
From: Jana Ebler <47976081+eblerjana@users.noreply.github.com>
Date: Tue, 5 Jul 2022 09:29:14 +0200
Subject: [PATCH] Update README.md

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index 5489328..3214a20 100644
--- a/README.md
+++ b/README.md
@@ -133,6 +133,15 @@ With the data described here: https://doi.org/10.1038/s41588-022-01043-w, PanGen
 The largest dataset that we have tested contained around 16M variants, 64 haplotypes and around 30x read coverage. Using 24 cores, PanGenie run in 1 hour and 46 minutes (24 CPU hours) and used 120 GB of RAM.
+## Notes
+
+The largest panel we have run PanGenie on so far consisted of 44 samples (88 haplotypes). On this data, PanGenie needed 53 CPU hours (3 h 15 min wall-clock time using 24 cores) and 153 GB of memory to genotype 20,661,169 variants.
+
+## Limitations
+
+The runtime of PanGenie increases as the number of haplotype paths grows. For technical reasons, the current implementation of PanGenie cannot handle more than 254 input haplotypes (127 diploid samples).
+To handle panels of this size and larger efficiently, the underlying model needs to be optimized.
+
 ## Demo