Simulate test sets using Arabidopsis genome and chloroplast sequence #6
Available data from F1. Simulating data with ART: http://www.niehs.nih.gov/research/resources/software/biostatistics/art/
(only 16x coverage, because the reads of the chloroplast and A. thaliana were used 6x)
I don't hate the idea. However, one thing to consider: if we target 200x chloroplast coverage, the last dataset would require a genomic coverage of 200,000x.
100% agree, but I would like to know what happens if we only provide rare chloroplast sequences. Wrong assemblies? Error messages? Anything else? In any case, we define a ratio of 1:1 as one complete host genome to one complete chloroplast genome. Another definition is also possible: 1:1 could mean that for every read that belongs to the host genome there is one read that belongs to the chloroplast genome. (I just wanted to state that here so that we remember our definition later.) :)
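To make the difference between the two definitions concrete, here is a minimal sketch that converts a genome-copy ratio into the fraction of reads that would come from the chloroplast. The sequence lengths are rough assumptions for illustration (about 120 Mb nuclear, about 154 kb chloroplast) and should be replaced by the lengths of the GenBank records actually used; the function name is hypothetical.

```python
# Approximate sizes, assumed for illustration only; use the lengths of the
# actual GenBank records when generating data.
NUCLEAR_BP = 120_000_000  # assumption: roughly 120 Mb nuclear genome
CHLORO_BP = 154_478       # assumption: roughly 154 kb chloroplast genome


def chloroplast_read_fraction(genome_copies: float, chloro_copies: float,
                              nuclear_bp: int = NUCLEAR_BP,
                              chloro_bp: int = CHLORO_BP) -> float:
    """Fraction of reads from the chloroplast under the genome-copy definition.

    Assumes reads are sampled uniformly and all have the same length, so each
    sequence contributes reads in proportion to copies * length.
    """
    nuclear_bases = genome_copies * nuclear_bp
    chloro_bases = chloro_copies * chloro_bp
    return chloro_bases / (nuclear_bases + chloro_bases)


if __name__ == "__main__":
    # 1:1 in genome copies is far from 1:1 in reads.
    print(f"{chloroplast_read_fraction(1, 1):.4%}")
```

Under these assumed sizes, a 1:1 genome-copy ratio puts only about 0.13% of the reads on the chloroplast, whereas the read-based definition of 1:1 would put 50% there.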
Yeah, good point. I'd suggest we first try it with 10:1 then. We could use a genome covered 500x (so the chloroplast will be covered 50x). With default parameters, I expect ChloroExtractor to fail when it tries to scale reads to 200x coverage. We can then re-run ChloroExtractor with a target coverage of 40x to see what happens. I'm also curious how the other tools behave.
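The coverage arithmetic in this thread can be sketched as follows, assuming the genome-copy definition of the ratio (genome : chloroplast). The function names are hypothetical, and the last helper only checks whether a target coverage is reachable by downsampling; it does not model what ChloroExtractor does internally.

```python
def chloroplast_coverage(genome_coverage: float, ratio: float) -> float:
    """Chloroplast coverage for a genome:chloroplast copy ratio of ratio:1."""
    return genome_coverage / ratio


def required_genome_coverage(target_chloro_coverage: float, ratio: float) -> float:
    """Genome coverage needed to reach a target chloroplast coverage."""
    return target_chloro_coverage * ratio


def can_downsample_to(available_coverage: float, target_coverage: float) -> bool:
    """Downsampling can only reduce coverage, never increase it."""
    return available_coverage >= target_coverage


if __name__ == "__main__":
    chloro = chloroplast_coverage(genome_coverage=500, ratio=10)
    print(chloro)                          # chloroplast coverage at 10:1
    print(can_downsample_to(chloro, 200))  # default 200x target
    print(can_downsample_to(chloro, 40))   # reduced 40x target
```

By the same arithmetic, the 200,000x figure mentioned above corresponds to a 200x chloroplast target at a ratio of 1000:1.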
Generate test sets to evaluate assembler performance.
To do so, use the Arabidopsis genome and chloroplast sequences from GenBank and simulate short-read libraries fulfilling these characteristics:
Take care of the circular sequence of the chloroplast genome!
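One common way to handle the circular sequence with a simulator that only knows linear references is to append the start of the sequence to its end, so that fragments can span the junction. This is a minimal sketch of that idea, not necessarily what we will end up using; the function name and the choice of overlap length are assumptions.

```python
def linearize_circular(seq: str, max_fragment_len: int) -> str:
    """Return seq extended so that linear fragments can cross the origin.

    Appends the first (max_fragment_len - 1) bases to the end. With a fixed
    fragment length this gives every circular start position exactly one
    corresponding linear start position.
    """
    if max_fragment_len < 1:
        raise ValueError("max_fragment_len must be at least 1")
    # Never append more than one full copy of the sequence.
    overlap = min(max_fragment_len - 1, len(seq))
    return seq + seq[:overlap]


if __name__ == "__main__":
    print(linearize_circular("ACGTAC", 3))
```

With variable fragment lengths the coverage near the junction is only approximately uniform, and read positions reported against the extended sequence need to be taken modulo the original length.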
Use simulation software that accepts a random seed to ensure reproducibility. Maybe this paper gives some ideas about which tool to use.
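As one possible option, ART's `art_illumina` accepts a random seed via `-rs`. The sketch below only assembles a command line for a paired-end run; the file names, read length, fragment size, and the `HS25` profile are placeholder assumptions, and the flags should be checked against `art_illumina --help` for the installed version before use.

```python
import shlex


def art_command(reference: str, out_prefix: str, coverage: float, seed: int,
                read_len: int = 150, mean_frag: int = 300, sd_frag: int = 30,
                profile: str = "HS25") -> list[str]:
    """Build an art_illumina paired-end command with a fixed random seed."""
    return [
        "art_illumina",
        "-ss", profile,         # built-in sequencing system profile
        "-i", reference,        # input reference FASTA
        "-p",                   # paired-end simulation
        "-l", str(read_len),    # read length
        "-f", str(coverage),    # fold coverage
        "-m", str(mean_frag),   # mean fragment size
        "-s", str(sd_frag),     # fragment size standard deviation
        "-rs", str(seed),       # random seed for reproducibility
        "-o", out_prefix,       # output file prefix
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = art_command("chloroplast_extended.fa", "sim_chloro", coverage=50, seed=42)
    print(shlex.join(cmd))
```

Keeping the seed in the command and recording the command alongside the output should make each test set reproducible, provided the same ART version is used.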