Commit e7ec82f

update readme

zhangrengang committed
1 parent 13aaaa2 commit e7ec82f

4 files changed: +64 -46 lines changed

README.md

Lines changed: 63 additions & 45 deletions
@@ -57,14 +57,14 @@ The below is an example of output figures of wheat (ABD, 1n=3x=21):
  ![wheat](example_data/wheat_figures.png)
  **Figure. Phased subgenomes of allohexaploid bread wheat genome.** Colors are unified with each subgenome in subplots `B-F`, i.e. the same color means the same subgenome.
  * (**A**) The histogram of differential k-mers among homoeologous chromosome sets.
- * (**B**) Heatmap and clustering of differential k-mers. The x-axis, k-mers; y-axis, chromosomes.
+ * (**B**) Heatmap and clustering of differential k-mers. The x-axis, differential k-mers; y-axis, chromosomes. The vertical color bar, each chromosome is assigned to which subgenome; the horizontal color bar, each k-mer is specific to which subgenome (blank for non-specific kmers).
  * (**C**) Principal component analysis (PCA) of differential k-mers.
- * (**D**) Chromosomal characteristics. Rings from outer to inner:
- - (**1**) Karyotypes of subgenome assignments by a k-Means algorithm.
- - (**2**) Significant enrichment of subgenome-specific k-mers.
+ * (**D**) Chromosomal characteristics (window size: 1 Mb). Rings from outer to inner:
+ - (**1**) Subgenome assignments by a k-Means algorithm.
+ - (**2**) Significant enrichment of subgenome-specific k-mers (blank for non-enriched windows).
  - (**3**) Normalized proportion of subgenome-specific k-mers.
- - (**4-6**) Density distribution of each subgenome-specific k-mer set.
- - (**7**) Density distribution of subgenome-specific LTR-RTs and other LTR-RTs (the most outer, in grey color).
+ - (**4-6**) Density distribution (count) of each subgenome-specific k-mer set.
+ - (**7**) Density distribution (count) of subgenome-specific LTR-RTs and other LTR-RTs (the most outer, in grey color).
  - (**8**) Homoeologous blocks of each homoeologous chromosome set.
  * (**E**) Insertion time of subgenome-specific LTR-RTs.
  * (**F**) A phylogenetic tree of 1,000 randomly subsampled LTR/Gypsy elements.
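The updated legend for panel D describes per-window counts with a window size of 1 Mb. The following is a minimal sketch of that kind of binning, not SubPhaser's own code: it assumes feature start positions are available as plain integers, and the function name `count_per_window` is hypothetical.

```python
from collections import Counter


def count_per_window(positions, chrom_length, window_size=1_000_000):
    """Count features per fixed-size window along one chromosome.

    `positions` are 0-based start coordinates (an assumption for this sketch).
    Positions outside the chromosome are ignored.
    """
    # Ceiling division so that a trailing partial window is included.
    n_windows = -(-chrom_length // window_size)
    counts = Counter(
        pos // window_size for pos in positions if 0 <= pos < chrom_length
    )
    # Windows with no feature get an explicit zero.
    return [counts.get(i, 0) for i in range(n_windows)]


example = count_per_window(
    [10, 999_999, 1_000_000, 2_500_000], chrom_length=3_200_000
)
print(example)
```

Returning a dense list with explicit zeros keeps empty windows visible, which matches the legend's distinction between counted and blank windows.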
@@ -137,6 +137,7 @@ phase-results/
  ├── k15_q200_f2.chrom-subgenome.tsv # subgenome assignments and bootstrap values
  ├── k15_q200_f2.sig.kmer-subgenome.tsv # subgenome-specific kmers
  ├── k15_q200_f2.bin.enrich # subgenome-specific enrichments by genome window/bin
+ ├── k15_q200_f2.bin.group # grouped bins by potential exchanges based on enrichments
  ├── k15_q200_f2.ltr.enrich # subgenome-specific LTR-RTs
  ├── k15_q200_f2.ltr.insert.pdf # density plot of insertion age of subgenome-specific LTR-RTs
  ├── k15_q200_f2.ltr.insert.R # R script for the density plot
@@ -164,28 +165,32 @@ tmp/
  ```
  usage: subphaser [-h] -i GENOME [GENOME ...] -c CFGFILE [CFGFILE ...]
  [-labels LABEL [LABEL ...]] [-no_label]
- [-target FILE] [-sep STR]
+ [-target FILE] [-sg_assigned FILE] [-sep STR]
  [-custom_features FASTA [FASTA ...]] [-pre STR]
  [-o DIR] [-tmpdir DIR] [-k INT] [-f FLOAT] [-q INT]
  [-baseline BASELINE] [-lower_count INT]
  [-min_prop FLOAT] [-max_freq INT] [-max_prop FLOAT]
  [-low_mem] [-by_count] [-re_filter] [-nsg INT]
  [-replicates INT] [-jackknife FLOAT]
- [-max_pval FLOAT] [-figfmt {pdf,png}]
+ [-max_pval FLOAT]
+ [-test_method {ttest_ind,kruskal,wilcoxon,mannwhitneyu}]
+ [-figfmt {pdf,png}]
  [-heatmap_colors COLOR [COLOR ...]]
- [-heatmap_options STR] [-disable_ltr]
+ [-heatmap_options STR] [-just_core] [-disable_ltr]
  [-ltr_detectors {ltr_finder,ltr_harvest} [{ltr_finder,ltr_harvest} ...]]
  [-ltr_finder_options STR] [-ltr_harvest_options STR]
  [-tesorter_options STR] [-all_ltr] [-intact_ltr]
- [-shared_ltr] [-mu FLOAT] [-disable_ltrtree]
- [-subsample INT]
+ [-exclude_exchanges] [-shared_ltr] [-mu FLOAT]
+ [-disable_ltrtree] [-subsample INT]
  [-ltr_domains {GAG,PROT,INT,RT,RH,AP,RNaseH} [{GAG,PROT,INT,RT,RH,AP,RNaseH} ...]]
  [-trimal_options STR]
  [-tree_method {iqtree,FastTree}] [-tree_options STR]
  [-ggtree_options STR] [-disable_circos]
  [-window_size INT] [-disable_blocks] [-aligner PROG]
- [-aligner_options STR] [-min_block INT] [-p INT]
- [-max_memory MEM] [-cleanup] [-overwrite] [-v]
+ [-aligner_options STR] [-min_block INT]
+ [-alt_cfgs CFGFILE [CFGFILE ...]] [-chr_ordered FILE]
+ [-p INT] [-max_memory MEM] [-cleanup] [-overwrite]
+ [-v]

  Phase and visualize subgenomes of an allopolyploid or hybrid based on the repetitive kmers.

@@ -198,19 +203,22 @@ Input:
  -i GENOME [GENOME ...], -genomes GENOME [GENOME ...]
  Input genome sequences in fasta format [required]
  -c CFGFILE [CFGFILE ...], -sg_cfgs CFGFILE [CFGFILE ...]
- Subgenomes config file (one homoeologous group per
+ Subgenomes config file (one homologous group per
  line); this chromosome set is for identifying
  differential kmers [required]
  -labels LABEL [LABEL ...]
  For multiple genomes, provide prefix labels for each
  genome sequence to avoid conficts among chromosome id
  [default: '1-, 2-, ..., n-']
  -no_label Do not use default prefix labels for genome sequences
- as there is no confict among chromosome id [default:
- False]
+ as there is no confict among chromosome id
+ [default=False]
  -target FILE Target chromosomes to output; id mapping is allowed;
  this chromosome set is for cluster and phase [default:
  the same chromosome set as `-sg_cfgs`]
+ -sg_assigned FILE Provide subgenome assignments to skip k-means
+ clustering and to identify subgenome-specific features
+ [default=None]
  -sep STR Seperator for chromosome ID [default="|"]
  -custom_features FASTA [FASTA ...]
  Custom features in fasta format to enrich subgenome-
@@ -243,7 +251,7 @@ Kmer:
  [default=None]
  -low_mem Low MEMory but slower [default: True if genome size >
  3G, else False]
- -by_count Calculate fold by count instead of by propor
+ -by_count Calculate fold by count instead of by proportion
  [default=False]
  -re_filter Re-filter with subset of chromosomes (subgenome
  assignments are expected to change) [default=False]
@@ -253,68 +261,73 @@ Cluster:

  -nsg INT Number of subgenomes (>1) [default: auto]
  -replicates INT Number of replicates for bootstrap [default=1000]
- -jackknife FLOAT Percent of kmers to resample for bootstrap
+ -jackknife FLOAT Percent of kmers to resample for each bootstrap
  [default=50]
  -max_pval FLOAT Maximum P value for all hypothesis tests
  [default=0.05]
+ -test_method {ttest_ind,kruskal,wilcoxon,mannwhitneyu}
+ The test method to identify differiential
+ kmers[default=ttest_ind]
  -figfmt {pdf,png} Format of figures [default=pdf]
  -heatmap_colors COLOR [COLOR ...]
- Color panel (2 or 3 colors) for heatmap plot
- [default=('green', 'black', 'red')]
+ Color panel (2 or 3 colors) for heatmap plot [default:
+ ('green', 'black', 'red')]
  -heatmap_options STR Options for heatmap plot (see more in R shell with
  `?heatmap.2` of `gplots` package) [default="Rowv=T,Col
  v=T,scale='col',dendrogram='row',labCol=F,trace='none'
  ,key=T,key.title=NA,density.info='density',main=NA,xla
- b=NA,margins=c(4,8)"]
+ b='Differential kmers',margins=c(2.5,12)"]
+ -just_core Exit after the core phasing module
+ [default=False]

  LTR:
  Options for LTR analyses

  -disable_ltr Disable this step (this step is time-consuming for
  large genome) [default=False]
  -ltr_detectors {ltr_finder,ltr_harvest} [{ltr_finder,ltr_harvest} ...]
- Programs to detect LTR-RTs [default=['ltr_harvest',
- 'ltr_finder']]
+ Programs to detect LTR-RTs [default=['ltr_harvest']]
  -ltr_finder_options STR
  Options for `ltr_finder` to identify LTR-RTs (see more
- with `ltr_finder -h`) [default="-w 2 -D 20000 -d 1000
- -L 7000 -l 100 -p 20 -C -M 0.6"]
+ with `ltr_finder -h`) [default="-w 2 -D 15000 -d 1000
+ -L 7000 -l 100 -p 20 -C -M 0.8"]
  -ltr_harvest_options STR
  Options for `gt ltrharvest` to identify LTR-RTs (see
  more with `gt ltrharvest -help`) [default="-seqids yes
- -similar 60 -vic 10 -seed 20 -minlenltr 100 -maxlenltr
- 7000 -maxdistltr 20000 -mindistltr 1000 -mintsd 4
- -maxtsd 20"]
+ -similar 80 -vic 10 -seed 20 -minlenltr 100 -maxlenltr
+ 7000 -mintsd 4 -maxtsd 6"]
  -tesorter_options STR
  Options for `TEsorter` to classify LTR-RTs (see more
- with `TEsorter -h`) [default="-db rexdb-plant -dp2"]
- -all_ltr Use all LTR identified by `-ltr_detectors` (more LTRs
- but slower) [default: only use LTR as classified by
- `TEsorter`]
- -intact_ltr Use completed LTR as classified by `TEsorter` (less
- LTRs but faster) [default: the same as `-all_ltr`]
- -shared_ltr Identify shared LTRs among subgenomes (experimental)
- [default=False]
+ with `TEsorter -h`) [default="-db rexdb -dp2"]
+ -all_ltr Use all LTR-RTs identified by `-ltr_detectors` (more
+ LTR-RTs but slower) [default: only use LTR as
+ classified by `TEsorter`]
+ -intact_ltr Use completed LTR-RTs classified by `TEsorter` (less
+ LTR-RTs but faster) [default: the same as `-all_ltr`]
+ -exclude_exchanges Exclude potential exchanged LTRs for insertion age
+ estimation and phylogenetic trees [default=False]
+ -shared_ltr Identify shared LTR-RTs among subgenomes
+ (experimental) [default=False]
  -mu FLOAT Substitution rate per year in the intergenic region,
  for estimating age of LTR insertion [default=1.3e-08]
  -disable_ltrtree Disable subgenome-specific LTR tree (this step is
- time-consuming when subgenome-specific LTRs are too
+ time-consuming when subgenome-specific LTR-RTs are too
  many, so `-subsample` is enabled by defualt)
  [default=False]
- -subsample INT Subsample LTRs to avoid too many to construct a tree
- [default=1000] (0 to disable)
+ -subsample INT Subsample LTR-RTs to avoid too many to construct a
+ tree [default=1000] (0 to disable)
  -ltr_domains {GAG,PROT,INT,RT,RH,AP,RNaseH} [{GAG,PROT,INT,RT,RH,AP,RNaseH} ...]
  Domains for LTR tree (Note: for domains identified by
  `TEsorter`, PROT (rexdb) = AP (gydb), RH (rexdb) =
- RNaseH (gydb)) [default=['INT', 'RT', 'RH']]
+ RNaseH (gydb)) [default: ['INT', 'RT', 'RH']]
  -trimal_options STR Options for `trimal` to trim alignment (see more with
  `trimal -h`) [default="-automated1"]
  -tree_method {iqtree,FastTree}
  Programs to construct phylogenetic trees
- [default=iqtree]
+ [default=FastTree]
  -tree_options STR Options for `-tree_method` to construct phylogenetic
  trees (see more with `iqtree -h` or `FastTree
- -expert`) [default="-mset JTT"]
+ -expert`) [default=""]
  -ggtree_options STR Options for `ggtree` to show phylogenetic trees (see
  more from `https://yulab-smu.top/treedata-book`)
  [default="branch.length='none', layout='circular'"]
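The `-mu` option above supplies a substitution rate for estimating LTR insertion age, but the help text does not state the formula. A widely used estimate is T = K / (2μ), where K is the sequence divergence between the two LTRs of one element. The sketch below assumes that formula together with a Jukes-Cantor correction. Both are assumptions for illustration, and the function names are hypothetical rather than SubPhaser's.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance from the observed proportion of
    differing sites `p` between the 5' and 3' LTRs (assumed model)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age(k, mu=1.3e-8):
    """Estimate insertion age in years as T = K / (2 * mu).

    `mu` defaults to the value shown for `-mu` in the help text above.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return k / (2 * mu)


# Example: 2% observed divergence between the two LTRs of one element.
k = jc69_distance(0.02)
print(f"corrected distance: {k:.5f}, age: {insertion_age(k):.0f} years")
```

The factor of two reflects that both LTRs accumulate substitutions independently after insertion.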
@@ -324,17 +337,22 @@ Circos:

  -disable_circos Disable this step [default=False]
  -window_size INT Window size (bp) for circos plot [default=1000000]
- -disable_blocks Disable to plot homoeologous blocks [default=False]
- -aligner PROG Programs to identify homoeologous blocks
+ -disable_blocks Disable to plot homologous blocks [default=False]
+ -aligner PROG Programs to identify homologous blocks
  [default=minimap2]
  -aligner_options STR Options for `-aligner` to align chromosome sequences
  [default="-x asm20 -n 10"]
  -min_block INT Minimum block size (bp) to show [default=100000]
+ -alt_cfgs CFGFILE [CFGFILE ...]
+ An alternative config file for identifying homologous
+ blocks [default=None]
+ -chr_ordered FILE Provide a chromosome order to plot circos
+ [default=None]

  Other options:
  -p INT, -ncpu INT Maximum number of processors to use [default=32]
  -max_memory MEM Maximum memory to use where limiting can be enabled.
- [default=65.1G]
+ [default=65.2G]
  -cleanup Remove the temporary directory [default=False]
  -overwrite Overwrite even if check point files existed
  [default=False]

example_data/wheat_figures.png

5.17 KB

example_data/wheat_figures.v1.1.png

379 KB

subphaser/__main__.py

Lines changed: 1 addition & 1 deletion
@@ -121,7 +121,7 @@ def makeArgparse():
  help='Options for heatmap plot (see more in R shell with `?heatmap.2` \
  of `gplots` package) [default="%(default)s"]')
  group_clst.add_argument('-just_core', action="store_true", default=False,
- help="Exit after after the core phasing module [default=%(default)s]")
+ help="Exit after the core phasing module [default=%(default)s]")

  # LTR
  group_ltr = parser.add_argument_group('LTR', 'Options for LTR analyses')
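The changed line shows how the `[default=False]` text in the README's usage output is produced: `argparse` substitutes `%(default)s` in the help string with the argument's default. The following is a self-contained sketch of the same pattern, not the project's `makeArgparse()`. The program name `demo` and the group title and description are made up for illustration.

```python
import argparse


def make_parser():
    # Hypothetical stand-in for the real parser; only -just_core mirrors the diff.
    parser = argparse.ArgumentParser(prog="demo")
    group = parser.add_argument_group("Cluster", "Illustrative option group")
    group.add_argument(
        "-just_core", action="store_true", default=False,
        help="Exit after the core phasing module [default=%(default)s]")
    return parser


parser = make_parser()
help_text = parser.format_help()           # %(default)s is expanded here
args_default = parser.parse_args([])       # flag absent, so False
args_set = parser.parse_args(["-just_core"])  # flag present, so True
print(args_default.just_core, args_set.just_core)
```

Because the help string is interpolated at format time, a typo fix such as this one only has to be made in the `help=` argument, and the README's usage block can then be regenerated from the `-h` output.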
