<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="author" content="Married with Scaffolds" />
<title>PacBio &lt;3 Illumina</title>
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
<link rel="stylesheet" href="reveal.js/css/reveal.min.css"/>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet" href="reveal.js/css/theme/simple.css" id="theme">
<link rel="stylesheet" media="print" href="reveal.js/css/print/pdf.css" />
<!--[if lt IE 9]>
<script src="reveal.js/lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<div class="slides">
<section>
<h1 class="title">PacBio &lt;3 Illumina</h1>
<h2 class="author">Married with Scaffolds</h2>
<h3 class="date">Heinz Ekker (CSF.NGS) 2014-03-12</h3>
</section>
<section class="slide level1">
<h2 id="pacbio-3-illumina">PacBio &lt;3 Illumina</h2>
<h3 id="a-short-introduction-to-hybrid-de-novo-genome-assembly-combining-illumina-short-reads-with-pacbio-long-reads.">A short introduction to hybrid <em>de novo</em> genome assembly combining Illumina short reads with PacBio long reads.</h3>
<p>Pipeline scripts, markdown source code and data for assembly, analysis and presentation available at <a href="https://github.com/h3kker/assemblyTalk">https://github.com/h3kker/assemblyTalk</a></p>
</section>
<section class="slide level1">
<h2 id="content">Content</h2>
<ol type="1">
<li>An Idiot's Guide to Assembly &amp; PacBio Technology</li>
<li>Error Correction and Hybrid Assembly Strategies</li>
<li>Results</li>
<li>Assembly Assessment</li>
<li>Elsewhere</li>
</ol>
</section>
<section id="assembly-basics" class="slide level1">
<h1>Assembly Basics</h1>
</section>
<section class="slide level1">
<h3 id="standard-assembly">Standard Assembly</h3>
<figure>
<img src="figure/assemblySteps1.svg" alt="Assembly Steps" /><figcaption>Assembly Steps</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="error-correction">Error Correction</h3>
<figure>
<img src="figure/assemblySteps2.svg" alt="Assembly Steps" /><figcaption>Assembly Steps</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="preprocessing">Preprocessing</h3>
<ul>
<li>Illumina Error Correction</li>
<li>Error Correction of PacBio reads using Short Reads (Illumina)</li>
<li>Adapter trimming</li>
<li>Quality trimming</li>
<li>Deduplication</li>
</ul>
<p>Some assemblers rely on existing external tools for these steps; others perform one or more of them as part of their own pipeline.</p>
</section>
<section class="slide level1">
<h3 id="graph-construction">Graph Construction</h3>
<figure>
<img src="figure/drinks.png" alt="Assembling Hemingway" /><figcaption>Assembling Hemingway</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="graph-construction-1">Graph Construction</h3>
<figure>
<img src="figure/assemblyGraph.jpg" alt="Suffix-to-Prefix Graph" /><figcaption>Suffix-to-Prefix Graph</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="graph-construction---path-enumeration">Graph Construction - Path Enumeration</h3>
<blockquote>
<p>They stared at the drinks were gone</p>
</blockquote>
<blockquote>
<p>They stared at the drinks went gone</p>
</blockquote>
<blockquote>
<p>They stared at the drinks the drinks were gone</p>
</blockquote>
<blockquote>
<p>....</p>
</blockquote>
<blockquote>
<p>They stared at the drinks. The drinks went warm. They drank.</p>
</blockquote>
<figure>
<img src="figure/assemblyPath.jpg" alt="Longest Path" /><figcaption>Longest Path</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="graph-construction---strategies">Graph Construction - Strategies</h3>
<p><strong>Overlap-Layout-Consensus, e.g. Celera, SGA</strong></p>
<p>Each read is represented by one node. Nodes 1 and 2 are connected if the end of read 1 matches the start of read 2 with a minimum overlap of <em>k</em>. The parameter <em>k</em> determines how complex the graph will be: the lower it is, the more nodes are connected. The choice is limited by the data itself (polyploidy, sequencing errors). The ideal assembly visits every node exactly once (Hamiltonian path).</p>
<p>String graphs are a special variant in which transitive edges are removed: in <code>((1,2), (1,3), (2,3))</code> the edge <code>(1,3)</code> is implied by the other two and is dropped, leaving only the <em>irreducible edges</em> <code>((1,2), (2,3))</code>.</p>
</section>
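<section class="slide level1">
<h3 id="sketch-overlap-graph">Sketch: Building an Overlap Graph</h3>
<p>The suffix-to-prefix rule can be sketched in a few lines of Python. This is a naive all-against-all comparison for illustration only, not how Celera or SGA do it; the function names and the toy reads are made up, and <code>min_overlap</code> plays the role of <em>k</em>.</p>

```python
from itertools import permutations


def overlap_length(a, b, min_overlap):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the longest match is shorter than `min_overlap`.
    """
    longest = min(len(a), len(b))
    for length in range(longest, min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads, min_overlap):
    """Map each directed edge (i, j) to its suffix-to-prefix overlap length."""
    graph = {}
    for i, j in permutations(range(len(reads)), 2):
        length = overlap_length(reads[i], reads[j], min_overlap)
        if length:
            graph[(i, j)] = length
    return graph


# Toy "reads" (hypothetical data): words instead of bases.
reads = ["they_stared", "stared_at_the", "at_the_drinks"]
graph = build_overlap_graph(reads, min_overlap=4)
print(graph)
```

<p>With <code>min_overlap=4</code> only the two 6-character overlaps become edges; lowering it to 3 would also connect read 1 back to read 0 through the shared "the".</p>
</section>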
<section class="slide level1">
<h3 id="graph-construction---strategies-1">Graph Construction - Strategies</h3>
<p><strong>K-mer based, e.g. Abyss, SOAPdenovo</strong></p>
<p>All reads are chopped into k-mers, and each k-mer is represented by one node. Two k-mers are connected if they overlap by <code>k-1</code> bases (de Bruijn graph). The ideal assembly visits each edge exactly once (Eulerian path).</p>
<p>K-mer size (parameter <em>k</em>) should be chosen large enough to reduce the number of wrong connections between contigs, but small enough to allow for errors.</p>
<p><em>Hybrid strategies have also been proposed: combine contig and graph output from both types of assemblers.</em></p>
</section>
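<section class="slide level1">
<h3 id="sketch-de-bruijn-graph">Sketch: Building a de Bruijn Graph</h3>
<p>A minimal Python sketch of the k-mer approach, assuming the common practical variant in which two k-mers are linked only when they are adjacent in some read (and therefore overlap by <code>k-1</code> bases). Function names and reads are hypothetical; real assemblers use far more compact data structures.</p>

```python
def kmers(read, k):
    """Yield every k-mer of `read` from left to right."""
    for start in range(len(read) - k + 1):
        yield read[start:start + k]


def build_de_bruijn(reads, k):
    """Connect k-mers that follow each other in some read.

    Consecutive k-mers in a read overlap by k - 1 characters, so every
    edge links a node to one whose prefix equals its (k - 1)-suffix.
    """
    graph = {}
    for read in reads:
        previous = None
        for kmer in kmers(read, k):
            graph.setdefault(kmer, set())  # register the node even without edges
            if previous is not None:
                graph[previous].add(kmer)
            previous = kmer
    return graph


# Two toy reads (hypothetical data) that share the k-mer "GTC".
reads = ["ACGTC", "GTCA"]
graph = build_de_bruijn(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

<p>The shared k-mer joins both reads into one path; reads shorter than <em>k</em> contribute nothing, which is one reason <em>k</em> cannot be chosen arbitrarily large.</p>
</section>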
<section class="slide level1">
<h3 id="graph-simplification">Graph Simplification</h3>
<p>Graph structure is very complex due to</p>
<ul>
<li>transitive edges like <code>((1,2), (1,3), (2,3))</code></li>
<li>consecutive nodes like <code>((1,2), (2,3), (3,4))</code></li>
<li>error reads (branches that converge again later)</li>
<li>spurious branch points on repeat edges</li>
<li>dead ends (tips)</li>
</ul>
</section>
<section class="slide level1">
<h3 id="graph-simplification---remove-transitive-edges">Graph Simplification - Remove Transitive Edges</h3>
<p>Transitive edges do not add information, so they can be removed.</p>
<figure>
<img src="figure/assemblyTransitive.jpg" alt="1. Merge Transitive Edges" /><figcaption>1. Merge Transitive Edges</figcaption>
</figure>
</section>
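<section class="slide level1">
<h3 id="sketch-transitive-edges">Sketch: Removing Transitive Edges</h3>
<p>A minimal Python sketch of this step, assuming the graph is given as a plain list of directed edges. It only looks for two-step detours (<code>a -> b -> c</code> making <code>a -> c</code> redundant); the function name is made up for illustration.</p>

```python
def remove_transitive_edges(edges):
    """Drop every edge (a, c) for which some b gives edges (a, b) and (b, c)."""
    edge_set = set(edges)
    successors = {}
    for source, target in edge_set:
        successors.setdefault(source, set()).add(target)

    reduced = set()
    for source, target in edge_set:
        via_middle = any(
            target in successors.get(middle, ())
            for middle in successors[source]
            if middle != target
        )
        if not via_middle:
            reduced.add((source, target))
    return reduced


edges = [(1, 2), (1, 3), (2, 3)]
reduced = remove_transitive_edges(edges)
print(sorted(reduced))
```

<p>The edge <code>(1, 3)</code> is dropped because the path through node 2 already implies it.</p>
</section>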
<section class="slide level1">
<h3 id="graph-simplification---node-merging">Graph Simplification - Node Merging</h3>
<p>Collapse nodes that connect unambiguously (without branching) into one node representing the merged sequence.</p>
<figure>
<img src="figure/assemblyConsecutive.jpg" alt="2. Merge Consecutive Nodes" /><figcaption>2. Merge Consecutive Nodes</figcaption>
</figure>
</section>
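<section class="slide level1">
<h3 id="sketch-node-merging">Sketch: Merging Consecutive Nodes</h3>
<p>A minimal Python sketch of collapsing unbranched chains, assuming a list of directed edges without duplicates. It returns the chains as lists of node ids rather than merged sequences, and it ignores pure cycles; all names are hypothetical.</p>

```python
def merge_unbranched_paths(edges):
    """Return maximal paths whose inner links are unambiguous.

    A link (a, b) is unambiguous when `a` has exactly one outgoing edge and
    `b` has exactly one incoming edge.
    """
    successors, predecessors, nodes = {}, {}, set()
    for source, target in edges:
        successors.setdefault(source, []).append(target)
        predecessors.setdefault(target, []).append(source)
        nodes.update((source, target))

    def sole_successor(node):
        """Return the next node if the link to it is unambiguous, else None."""
        targets = successors.get(node, [])
        if len(targets) == 1 and len(predecessors[targets[0]]) == 1:
            return targets[0]
        return None

    def starts_path(node):
        """A node starts a path unless an unambiguous link leads into it."""
        sources = predecessors.get(node, [])
        return not (len(sources) == 1 and sole_successor(sources[0]) == node)

    paths = []
    for node in sorted(nodes):
        if not starts_path(node):
            continue
        path = [node]
        following = sole_successor(node)
        while following is not None and following not in path:
            path.append(following)
            following = sole_successor(following)
        paths.append(path)
    return paths


# Chain 1 -> 2 -> 3 -> 4 that branches into 5 and 6 at node 4.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (4, 6)]
paths = merge_unbranched_paths(edges)
print(paths)
```

<p>Nodes 1 to 4 collapse into a single node, while the two branches after node 4 stay separate.</p>
</section>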
<section class="slide level1">
<h3 id="graph-simplification---dead-end-removal">Graph Simplification - Dead End Removal</h3>
<p>Also called tip erosion. Remove all nodes with connections in only one direction. Such dead ends can be caused by low-coverage regions and read errors. Note that this can also shorten valid contigs!</p>
<figure>
<img src="figure/assemblyDeadEnd.jpg" alt="Dead End" /><figcaption>Dead End</figcaption>
</figure>
</section>
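<section class="slide level1">
<h3 id="sketch-dead-end-removal">Sketch: Dead End Removal</h3>
<p>A literal Python sketch of the rule above: every node that has edges in only one direction is removed, optionally for several rounds. Real assemblers typically restrict this to short, poorly covered branches; the function name and the <code>rounds</code> parameter are assumptions for illustration.</p>

```python
def erode_tips(edges, rounds=1):
    """Remove nodes that have edges in only one direction, `rounds` times."""
    edge_set = set(edges)
    for _ in range(rounds):
        sources = {source for source, _target in edge_set}
        targets = {target for _source, target in edge_set}
        tips = sources ^ targets  # nodes seen only as source or only as target
        if not tips:
            break
        edge_set = {
            (source, target)
            for source, target in edge_set
            if source not in tips and target not in tips
        }
    return edge_set


# Main path 1 -> 2 -> 3 -> 4 -> 5 with a one-node dead end 2 -> 9.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 9)]
eroded = erode_tips(edges)
print(sorted(eroded))
```

<p>The dead end at node 9 disappears, but so do the true contig ends 1 and 5: the unrestricted rule shortens valid contigs.</p>
</section>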
<section class="slide level1">
<h3 id="graph-simplification---bubble-popping">Graph Simplification - Bubble Popping</h3>
<p>Bubbles arise from sequencing errors, polyploid genomes or heterozygosity. One branch is selected based on criteria such as coverage or quality.</p>
<figure>
<img src="figure/assemblyBubble.jpg" alt="Bubble Popping" /><figcaption>Bubble Popping</figcaption>
</figure>
</section>
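<section class="slide level1">
<h3 id="sketch-bubble-popping">Sketch: Bubble Popping by Coverage</h3>
<p>A minimal Python sketch of one possible selection criterion, mean coverage. It assumes the bubble has already been detected and that per-base coverage is known for each branch; sequences, coverage values and the function name are made up.</p>

```python
def pop_bubble(branches):
    """Pick the branch with the highest mean coverage from one bubble.

    `branches` maps a branch sequence to the per-base coverage values
    observed along it; all branches share the same start and end node.
    """
    def mean_coverage(item):
        _sequence, coverage = item
        return sum(coverage) / len(coverage)

    best_sequence, _coverage = max(branches.items(), key=mean_coverage)
    return best_sequence


bubble = {
    "ACGTTGCA": [31, 30, 29, 32, 30, 31, 30, 29],  # well supported branch
    "ACGATGCA": [2, 1, 1, 2, 1, 1, 2, 2],          # likely a sequencing error
}
kept = pop_bubble(bubble)
print(kept)
```

<p>Coverage alone cannot tell an error from a real heterozygous variant: two branches with similar coverage may both be genuine, and this rule would silently discard one of them.</p>
</section>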
<section class="slide level1">
<h3 id="graph-simplification---repeat-tangles">Graph Simplification - Repeat tangles</h3>
<p>Formed in repeated regions, where many reconstructions are possible. Resolved by forming parallel paths. Paired-end constraints can be used to discard invalid edges (reconstruction too short or too long).</p>
<figure>
<img src="figure/assemblyUntangle.jpg" alt="Create parallel paths" /><figcaption>Create parallel paths</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="postprocessing">Postprocessing</h3>
<p><strong>Contigs</strong>: Build contiguous stretches of sequence, filter and correct (consensus)</p>
<p><strong>Scaffolds</strong>: Either with a built-in scaffolder or an external program. Most assemblers come with their own scaffolder for PE or mate pair library information. Using PacBio CLRs (continuous long reads) for scaffolding is not yet popular.</p>
<p>Missing sequence information is filled with N (assembly gaps).</p>
</section>
<section class="slide level1">
<h3 id="postprocessing---scaffolding">Postprocessing - Scaffolding</h3>
<p>Use paired-end information to join and orient contigs. It can also be used to detect and filter misjoined contigs.</p>
<figure>
<img src="figure/assemblyScaffolding2.jpg" alt="Scaffolding" /><figcaption>Scaffolding</figcaption>
</figure>
</section>
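<section class="slide level1">
<h3 id="sketch-scaffold-sequence">Sketch: Writing a Scaffold Sequence</h3>
<p>A minimal Python sketch of the last step of scaffolding. It assumes the order, orientation and gap estimates have already been derived from the read pairs, and only shows how oriented contigs are joined with runs of N; all names and sequences are hypothetical.</p>

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def build_scaffold(placements, gaps):
    """Join oriented contigs, writing each estimated gap as a run of N.

    `placements` is a list of (sequence, strand) with strand "+" or "-";
    `gaps` holds one estimated gap length between each neighbouring pair.
    """
    if len(gaps) != len(placements) - 1:
        raise ValueError("need exactly one gap between neighbouring contigs")
    parts = []
    for index, (sequence, strand) in enumerate(placements):
        if index:  # every contig except the first is preceded by a gap
            parts.append("N" * gaps[index - 1])
        parts.append(sequence if strand == "+" else reverse_complement(sequence))
    return "".join(parts)


scaffold = build_scaffold([("ACGT", "+"), ("GGTA", "-"), ("TTC", "+")], gaps=[3, 2])
print(scaffold)
```

<p>The second contig is placed on the reverse strand, so it enters the scaffold as its reverse complement between the two N runs.</p>
</section>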
<section class="slide level1">
<h3 id="choosing-your-assembler">Choosing your Assembler</h3>
<p>They all follow the same principles! The main "unique selling points" seem to be algorithms and data structures. The strategies and heuristics employed in graph simplification and postprocessing make the difference in results.</p>
</section>
<section class="slide level1">
<h3 id="differences-between-assemblers-...-and-datasets">Differences between Assemblers ... and datasets</h3>
<figure>
<img src="figure/assemblathon2Cmp.jpg" alt="Bradnam KR et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience. 2013;2(1):10." /><figcaption>Bradnam KR et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience. 2013;2(1):10.</figcaption>
</figure>
</section>
<section id="pacbio-basics" class="slide level1">
<h1>PacBio Basics</h1>
<figure>
<img src="figure/pacbioSequencing.jpg" alt="Library Preparation + Sequencing" /><figcaption>Library Preparation + Sequencing</figcaption>
</figure>
</section>
<section class="slide level1">
<h2 id="pacbio-basics-1">PacBio Basics</h2>
<figure>
<img src="figure/subreadFiltering.jpg" alt="Subread Filtering" /><figcaption>Subread Filtering</figcaption>
</figure>
</section>
<section id="pacbio-error-correction" class="slide level1">
<h1>PacBio Error Correction</h1>
</section>
<section class="slide level1">
<h2 id="preassembly-and-pacbiotoca">PreAssembly and pacBioToCA</h2>
<figure>
<img src="figure/errorCorrection.jpg" alt="Error Correction Workflows" /><figcaption>Error Correction Workflows</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="pacbiotoca-error-correction">pacBioToCA Error Correction</h3>
<p>see <a href="report.html#toc_17">report.html</a></p>
<ul>
<li>Corrected reads are actually shorter than before.</li>
<li>Computationally very intensive (good for keeping clusters busy)</li>
<li>The reduction in depth makes assembly seem infeasible</li>
</ul>
</section>
<section class="slide level1">
<h3 id="smrtanalysis-preassembler-workflow">SMRTanalysis PreAssembler Workflow</h3>
<p>see <a href="report.html#toc_32">report.html</a></p>
<ul>
<li>Fewer, even shorter reads</li>
<li>Poor results; even a minimal relaxation of the alignment criteria produced ~200 GB of alignment files, which could then not be read</li>
<li>Very sensitive to the parameters for alignment between PacBio and Illumina reads</li>
<li>Provides mapping information between corrected and original reads, and better diagnostics</li>
</ul>
</section>
<section class="slide level1">
<h3 id="resulting-reads">Resulting Reads</h3>
<ul>
<li>Filtered Subreads: 40000 reads with 150 Mbp (~1.5x, mean length: 3750 bp)</li>
<li>pacBioToCA: 55000 reads with 125 Mbp (~1.25x, mean length: 2300 bp)</li>
<li>PreAssembly: 18000 reads with 15 Mbp (...)</li>
</ul>
<figure>
<img src="figure/preAssemblyCumLength.png" alt="cumulative read lengths" /><figcaption>cumulative read lengths</figcaption>
</figure>
</section>
<section id="hierarchical-assembly" class="slide level1">
<h1>Hierarchical Assembly</h1>
<p>Compensate for the short read length by first assembling high-fidelity Illumina reads (with high coverage) and then resolving repeats and gaps using long PacBio reads.</p>
<ol type="1">
<li>Run standard assembler</li>
<li>Use Cerulean or PBJelly to scaffold and fill gaps</li>
</ol>
<p>Few assemblers have native support for including PacBio CLRs (in contrast to mate pair and Sanger reads).</p>
</section>
<section class="slide level1">
<h3 id="short-read-assembly">Short-Read Assembly</h3>
<p>Using 45M Illumina PE100 reads (~9 Gbp, 450x Coverage)</p>
<table>
<thead>
<tr class="header">
<th style="text-align: left;">Set</th>
<th style="text-align: right;"># &gt; 2 kb</th>
<th style="text-align: right;">N50 (bp)</th>
<th style="text-align: right;">max (bp)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">SOAP</td>
<td style="text-align: right;">521</td>
<td style="text-align: right;">78347</td>
<td style="text-align: right;">280862</td>
</tr>
<tr class="even">
<td style="text-align: left;">SGA</td>
<td style="text-align: right;">467</td>
<td style="text-align: right;">57237</td>
<td style="text-align: right;">199401</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Abyss</td>
<td style="text-align: right;">698</td>
<td style="text-align: right;">51236</td>
<td style="text-align: right;">200210</td>
</tr>
</tbody>
</table>
<figure>
<img src="figure/n50.jpg" alt="N50 explained" /><figcaption>N50 explained</figcaption>
</figure>
</section>
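<section class="slide level1">
<h3 id="sketch-n50">Sketch: Computing N50</h3>
<p>A minimal Python sketch of the N50 statistic, using the common definition: the length of the contig at which the running total, counted from the longest contig down, first reaches half of all assembled bases. The example lengths are made up.</p>

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


lengths = [80, 70, 50, 40, 30, 20, 10]  # 300 bases in total
print(n50(lengths))
```

<p>Here the two longest contigs (80 + 70) already hold half of the 300 bases, so the N50 is 70. A few long contigs can raise N50 without making the assembly more correct.</p>
</section>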
<section class="slide level1">
<h2 id="improvement-with-scaffolding-using-pacbio-reads">Improvement with Scaffolding Using PacBio Reads</h2>
<ul>
<li>Abyss + Longscaff</li>
<li>Abyss + Cerulean</li>
<li>any assembly + PBJelly</li>
</ul>
</section>
<section class="slide level1">
<h3 id="abyss-cerulean-or-longscaff">Abyss + Cerulean or Longscaff</h3>
<table>
<thead>
<tr class="header">
<th style="text-align: left;">set</th>
<th style="text-align: right;"># &gt; 2 kb</th>
<th style="text-align: right;">N50 (bp)</th>
<th style="text-align: right;">max (bp)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">scaffolds</td>
<td style="text-align: right;">698</td>
<td style="text-align: right;">52136</td>
<td style="text-align: right;">200210</td>
</tr>
<tr class="even">
<td style="text-align: left;">longscaff</td>
<td style="text-align: right;">475</td>
<td style="text-align: right;">81601</td>
<td style="text-align: right;">435667</td>
</tr>
<tr class="odd">
<td style="text-align: left;">cerulean</td>
<td style="text-align: right;">310</td>
<td style="text-align: right;">106883</td>
<td style="text-align: right;">366413</td>
</tr>
</tbody>
</table>
<p>see <a href="report.html#toc_42">report.html</a></p>
</section>
<section class="slide level1">
<h3 id="pbjelly">PBJelly</h3>
<table>
<thead>
<tr class="header">
<th style="text-align: left;">set</th>
<th style="text-align: right;"># &gt; 2 kb</th>
<th style="text-align: right;">N50 (bp)</th>
<th style="text-align: right;">max (bp)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">longscaff</td>
<td style="text-align: right;">475</td>
<td style="text-align: right;">81601</td>
<td style="text-align: right;">435667</td>
</tr>
<tr class="even">
<td style="text-align: left;">cerulean</td>
<td style="text-align: right;">310</td>
<td style="text-align: right;">106883</td>
<td style="text-align: right;">366413</td>
</tr>
<tr class="odd">
<td style="text-align: left;">SOAP</td>
<td style="text-align: right;">521</td>
<td style="text-align: right;">78347</td>
<td style="text-align: right;">280862</td>
</tr>
<tr class="even">
<td style="text-align: left;">SGA</td>
<td style="text-align: right;">467</td>
<td style="text-align: right;">57237</td>
<td style="text-align: right;">199401</td>
</tr>
</tbody>
</table>
<p><em>Before</em></p>
<table>
<thead>
<tr class="header">
<th style="text-align: left;">set</th>
<th style="text-align: right;"># &gt; 2 kb</th>
<th style="text-align: right;">N50 (bp)</th>
<th style="text-align: right;">max (bp)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">SGA</td>
<td style="text-align: right;">183</td>
<td style="text-align: right;">234931</td>
<td style="text-align: right;">767671</td>
</tr>
<tr class="even">
<td style="text-align: left;">SOAP</td>
<td style="text-align: right;">174</td>
<td style="text-align: right;">201830</td>
<td style="text-align: right;">541843</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Cerulean</td>
<td style="text-align: right;">238</td>
<td style="text-align: right;">159023</td>
<td style="text-align: right;">489237</td>
</tr>
</tbody>
</table>
<p><em>After</em></p>
</section>
<section class="slide level1">
<h3 id="all-contig-stats">All Contig Stats</h3>
<figure>
<img src="figure/pbjContigs.png" alt="Contig lengths and N50" /><figcaption>Contig lengths and N50</figcaption>
</figure>
</section>
<section class="slide level1">
<h2 id="all-contig-stats-1">All Contig Stats</h2>
<figure>
<img src="figure/allContigStat.png" alt="Contig lengths and counts" /><figcaption>Contig lengths and counts</figcaption>
</figure>
</section>
<section id="quality-checks-for-assembly-assessment" class="slide level1">
<h1>Quality Checks For Assembly Assessment</h1>
</section>
<section class="slide level1">
<h2 id="size-is-not-everything">Size Is Not Everything</h2>
<h2 id="quality-assessment-needed."><strong>Quality Assessment needed.</strong></h2>
<p>But we do not have the luxury of <strong>Assemblathon</strong> or <strong>GAGE</strong> to have a reference to compare to!</p>
<figure>
<img src="figure/sizeVsQuality.png" alt="N50 vs sum of z-scores from different evaluations (Assemblathon 2)" /><figcaption>N50 vs sum of z-scores from different evaluations (Assemblathon 2)</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="alignment-to-close-relative-u.-hordei">Alignment to Close Relative (<em>U. hordei</em>)</h3>
<figure>
<img src="figure/wgsAlignment.png" alt="Nucmer Alignment of SGA Scaffolds to U. hordei assembly" /><figcaption>Nucmer Alignment of SGA Scaffolds to U. hordei assembly</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="alignment-of-pacbio-reads">Alignment of PacBio reads</h3>
<p>Aligned with <code>bwa mem -a -T 60 -k 16 -A 2 -L 4 -t 8 -S -P -k 32</code></p>
<figure>
<img src="figure/bwaMappingRatio.png" alt="matching contigs" /><figcaption>matching contigs</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="alignment-of-pacbio-reads-1">Alignment of PacBio reads</h3>
<figure>
<img src="figure/bwaDepth.png" alt="2D density plot of depth vs contig length" /><figcaption>2D density plot of depth vs contig length</figcaption>
</figure>
<p>A number of contigs with very high depth (&gt;300) were found; a BLAST search of a randomly chosen one matched rDNA.</p>
</section>
<section class="slide level1">
<h3 id="cegma---presence-of-core-genes">CEGMA - Presence Of Core Genes</h3>
<figure>
<img src="figure/cegmaPlot.png" alt="CEGMA results" /><figcaption>CEGMA results</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="reapr">REAPR</h3>
<blockquote>
<p>Hunt M, et al. Genome Biol. 2013</p>
</blockquote>
<figure>
<img src="figure/reaprOutput.png" alt="REAPR output" /><figcaption>REAPR output</figcaption>
</figure>
</section>
<section id="elsewhere..." class="slide level1">
<h1>Elsewhere...</h1>
</section>
<section class="slide level1">
<h3 id="pacbio-only-assembly">PacBio-Only Assembly</h3>
<p><strong>Arabidopsis Ler-0 using P4 enzyme, C2 chemistry</strong></p>
<ul>
<li>Genome size: 124.6 Mb, GC content: 33.92%</li>
<li>Raw data: 11 Gb, Assembly coverage: 15.37x</li>
<li>Polished Contigs: 540</li>
<li>Max Contig Length: 12.98 Mb</li>
<li>N50 Contig Length: 6.19 Mb</li>
<li>Sum of Contig Lengths: 124.57 Mb</li>
</ul>
<p><a href="http://blog.pacificbiosciences.com/2013/08/new-data-release-arabidopsis-assembly.html">http://blog.pacificbiosciences.com/2013/08/new-data-release-arabidopsis-assembly.html</a></p>
</section>
<section class="slide level1">
<h3 id="pacbio-only-assembly-1">PacBio-Only Assembly</h3>
<p><strong>Drosophila using P5 enzyme, C3 chemistry</strong></p>
<ul>
<li>Total number of bases: 15,208,567,933 bp, Total number of reads: 1,514,730</li>
<li>Average read length: 10,040 bp, Half of sequenced bases in reads greater than: 14,214 bp</li>
<li><p>PacBio RS II instrument time for sequencing: 6 days, Number of SMRT® Cells: 42</p></li>
<li>Contigs: 128</li>
<li>Max Contig Length: 24.6 Mbp</li>
<li>N50: 15.3 Mb</li>
<li><p>Sum of Contig Lengths: 138.4 Mbp</p></li>
</ul>
<p><a href="http://blog.pacificbiosciences.com/2014/01/data-release-preliminary-de-novo.html">http://blog.pacificbiosciences.com/2014/01/data-release-preliminary-de-novo.html</a></p>
</section>
<section class="slide level1">
<h3 id="correction-free-assembly">Correction-Free Assembly</h3>
<p>Cod and salmon. Shown: salmon (3 Gbp)</p>
<figure>
<img src="figure/correctionFreeSalmonPoster.png" alt="Towards correction-free assembly of raw PacBio reads, Nederbragt et al. 2014" /><figcaption>Towards correction-free assembly of raw PacBio reads, Nederbragt et al. 2014</figcaption>
</figure>
</section>
<section class="slide level1">
<h3 id="correction-free-assembly-1">Correction-Free Assembly</h3>
<p><strong>Their Conclusions</strong></p>
<blockquote>
<p>10-20x raw PacBio assemblies can yield 2-5x larger contig NG50 compared to short-read assemblies</p>
<p>10-20x raw PacBio assembly are not a finished product, but a good tool to have for improving short-read assemblies:</p>
<ul>
<li><p>provides great amount of contiguity</p></li>
<li><p>useful for evaluation, gap closing, repeat resolution, scaffold joining</p></li>
</ul>
<p>10-20x raw PacBio assemblies are a valuable alternative</p>
<p>obtaining 100x coverage in raw PacBio reads is still too expensive for large genomes -> combine with short-read datasets and assemblies</p>
</blockquote>
<p><strong>Outlook</strong></p>
<blockquote>
<p>Raw PacBio read error rates are not expected to improve</p>
<p>PacBio read lengths are getting longer</p>
<p>Throughput is going up, hopefully reducing cost</p>
</blockquote>
</section>
<section class="slide level1">
<h3 id="pacbio-only">PacBio Only</h3>
<figure>
<img src="figure/pacbioOnlyChemistries.png" alt="from a presentation at PAG XXII" /><figcaption>from a presentation at PAG XXII</figcaption>
</figure>
</section>
<section class="slide level1">
<h4 id="literature">Literature</h4>
<ol type="1">
<li>Simpson JT, Wong K, Jackman SD, et al. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009;19(6):1117–23.</li>
<li>Bradnam KR, Fass JN, Alexandrov A, et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience. 2013;2(1):10.</li>
<li>Deshpande V, Fung E, Pham S, Bafna V. Cerulean: A hybrid assembly using high throughput short and long reads. Algorithms Bioinforma. 2013;8126:349–363.</li>
<li>Simpson JT, Durbin R. Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 2012;22(3):549–56.</li>
<li>Simpson J. Exploring Genome Characteristics and Sequence Quality Without a Reference. arXiv Prepr. 2013:1–29.</li>
<li>Salzberg SL, Phillippy AM, Zimin A, et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res. 2012;22(3):557–67.</li>
<li>English AC, Richards S, Han Y, et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One. 2012;7(11):e47768.</li>
<li>El-Metwally S, Hamza T, Zakaria M, Helmy M. Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges. PLoS Comput Biol. 2013;9(12):e1003345.</li>
<li>Hunt M, Kikuchi T, Sanders M, et al. REAPR: a universal tool for genome assembly evaluation. Genome Biol. 2013;14(5):R47.</li>
<li>Luo R, Liu B, Xie Y, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1(1):18.</li>
<li>Boetzer M, Pirovano W. Toward almost closed genomes with GapFiller. Genome Biol. 2012;13(6):R56.</li>
<li>Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23(9):1061–7.</li>
</ol>
</section>
<section id="thanks." class="slide level1">
<h1>Thanks.</h1>
<p>Slides, scripts, markdown source code and data for assembly, analysis and presentation available at <a href="https://github.com/h3kker/assemblyTalk">https://github.com/h3kker/assemblyTalk</a></p>
</section>
</div>
</div>
<script src="reveal.js/lib/js/head.min.js"></script>
<script src="reveal.js/js/reveal.min.js"></script>
<script>
// Full list of configuration options available here:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: true,
progress: true,
history: true,
center: true,
theme: 'simple', // available themes are in /css/theme
transition: 'linear', // default/cube/page/concave/zoom/linear/fade/none
// Optional libraries used to extend on reveal.js
dependencies: [
{ src: 'reveal.js/lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: 'reveal.js/plugin/zoom-js/zoom.js', async: true, condition: function() { return !!document.body.classList; } },
{ src: 'reveal.js/plugin/notes/notes.js', async: true, condition: function() { return !!document.body.classList; } },
// { src: 'reveal.js/plugin/search/search.js', async: true, condition: function() { return !!document.body.classList; }, }
// { src: 'reveal.js/plugin/remotes/remotes.js', async: true, condition: function() { return !!document.body.classList; } }
]});
</script>
</body>
</html>