The aim of the R script is to exclude extremely long branches from a phylogenetic tree. Such branches may represent misclassified taxa or sequences containing undetected recombinant regions.
Usage:
Rscript pruning_tree.R --original_tree tree.nwk
Required R packages: optparse, tidyverse, magrittr, ape, caper, treeio, MASS, phytools, viridis, ggtree
Arguments:
- required:
  --original_tree: a NEWICK tree file
- optional:
  --tipprop: the proportion of all tips used as the cut-off for excluding long branches. A branch longer than the upper fence of the branch-length IQR is excluded only if it carries fewer than this proportion of the tips. The default value is 0.05 (5%).
Outputs:
- pruned.nwk: the pruned NEWICK tree file
- branch_length_nr_of_tips.png & pdf: a scatter plot of branch length versus number of tips, marking the IQR upper fence and the tip proportion given by the --tipprop argument
- branch_length_distribution.png & pdf: a histogram of branch lengths showing the counts of included and excluded tips
- original_vs_pruned_tree.png & pdf: a tree plot highlighting the pruned tip branches
- tree_pruning.log: a text file summarizing the comparison between the original and the pruned tree
Abbreviations:
- IQR: here, the upper fence based on the interquartile range of a vector, Q3 + 3 * (Q3 - Q1), i.e. the extreme outlier threshold
- R2T: root-to-tip distance on a midpoint-rooted tree
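The IQR upper fence defined above can be sketched in a few lines. This is an illustrative Python sketch, not the R script's code: the function names are hypothetical, quartiles are linearly interpolated (the same rule as R's default quantile(), type 7), and treating only values strictly above the fence as outliers is an assumption.

```python
import statistics


def upper_fence(values, k=3.0):
    """Extreme-outlier threshold Q3 + k * (Q3 - Q1).

    Quartiles are linearly interpolated ("inclusive" method), the same rule
    as R's default quantile() (type 7); that the R script uses this exact
    quartile definition is an assumption.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def extreme_outliers(values, k=3.0):
    """Values strictly above the upper fence (strictness is an assumption)."""
    fence = upper_fence(values, k)
    return [v for v in values if v > fence]


print(upper_fence([1, 2, 3, 4, 20]))       # 10.0  (Q1 = 2, Q3 = 4)
print(extreme_outliers([1, 2, 3, 4, 20]))  # [20]
```

The factor 3 (rather than the 1.5 used for ordinary boxplot whiskers) keeps only extreme outliers, so moderately long branches survive.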
1st part: Pruning the unrooted tree based on the upper fence of the branch-length IQR.
1. Calculating the minimum number of tips for each branch. In an unrooted bifurcating tree a branch splits the tips into two sets; both directions are counted, then the smaller tip count is chosen to represent the branch.
2. Calculating the IQR upper fence for branch lengths.
3. Excluding the extreme outlier branches (longer than the fence) that carry fewer than 0.05 (or the --tipprop value) of all tips.
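The three steps of the 1st part can be sketched as follows. This is a hypothetical Python illustration, not the R script: the tree is assumed to be an edge list of (node_a, node_b, length) tuples in which degree-1 nodes are tips, and the strict comparisons (length above the fence, tip count below tipprop × number of tips) are assumptions about how the script breaks ties.

```python
import statistics
from collections import defaultdict


def upper_fence(values, k=3.0):
    """Extreme-outlier threshold Q3 + k * (Q3 - Q1), R type-7 quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def long_branch_tips(edges, tipprop=0.05, k=3.0):
    """Return the tips to exclude from an unrooted tree (hypothetical helper).

    edges: list of (node_a, node_b, length); degree-1 nodes are tips.
    """
    adj = defaultdict(list)
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
    tips = {n for n, nbrs in adj.items() if len(nbrs) == 1}

    # Traverse from an arbitrary internal node so every other node has
    # exactly one parent edge; that edge is the branch being evaluated.
    start = next(n for n in adj if n not in tips)
    parent, branch_len, order = {start: None}, {}, [start]
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                branch_len[nbr] = length
                order.append(nbr)
                stack.append(nbr)

    # Tips below each node, accumulated children-first (reverse discovery order).
    below = {n: ({n} if n in tips else set()) for n in order}
    for node in reversed(order):
        if parent[node] is not None:
            below[parent[node]] |= below[node]

    fence = upper_fence(list(branch_len.values()), k)  # step 2
    max_tips = tipprop * len(tips)
    excluded = set()
    for node, length in branch_len.items():
        side, other = below[node], tips - below[node]
        # Step 1: both directions are counted; the smaller set represents the branch.
        smaller = side if len(side) <= len(other) else other
        # Step 3: long branch that carries only a small share of the tips.
        if length > fence and len(smaller) < max_tips:
            excluded |= smaller
    return excluded


edges = [("n1", "t1", 1), ("n1", "t2", 1), ("n1", "n2", 1),
         ("n2", "t3", 1), ("n2", "n3", 1), ("n3", "t4", 1),
         ("n3", "n4", 1), ("n4", "t5", 1), ("n4", "t6", 50)]
print(long_branch_tips(edges, tipprop=0.2))  # {'t6'}
print(long_branch_tips(edges))               # set(): 1 tip is not < 0.05 * 6
```

With only six tips, the default proportion of 0.05 allows fewer than 0.3 tips per branch, so nothing is excluded; the tipprop cut-off only starts to bite on larger trees or with a larger value.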
2nd part: Pruning the midpoint-rooted tree based on root-to-tip distances.
Tips are excluded based on the R2T IQR upper fence by iteratively midpoint rooting the tree and excluding the single greatest extreme outlier.
4. Midpoint rooting the tree (after pruning with the method described in the 1st part).
5. Calculating the IQR upper fence for root-to-tip distances.
6. Excluding the most extreme outlier tip based on the IQR upper fence for root-to-tip distances.
7. Repeating steps 4 to 6 until no extreme outlier tip is found.
8. Unrooting the pruned tree.
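The loop of steps 4 to 7 can be sketched in Python as below. This is a hypothetical illustration under stated assumptions, not the R script: the tree is an edge list of (node_a, node_b, length) tuples, ties between equally distant tips go to the first tip in sorted order, and the loop stops below three tips. Instead of rerooting explicitly, it uses the identity that the distance from the midpoint of the longest tip-to-tip path (a, b) of length D to any tip t is max(d(a,t), d(b,t)) - D/2. Because no root is ever stored, step 8 is implicit.

```python
import statistics
from collections import defaultdict
from itertools import combinations


def upper_fence(values, k=3.0):
    """Extreme-outlier threshold Q3 + k * (Q3 - Q1), R type-7 quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def tip_distances(edges):
    """Path lengths between all tips of a tree given as (a, b, length) edges."""
    adj = defaultdict(list)
    for a, b, length in edges:
        adj[a].append((b, length))
        adj[b].append((a, length))
    tips = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    dist = {}
    for src in tips:
        seen = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nbr, length in adj[node]:
                if nbr not in seen:
                    seen[nbr] = seen[node] + length
                    stack.append(nbr)
        dist[src] = {t: seen[t] for t in tips}
    return tips, dist


def prune_r2t_outliers(edges, k=3.0):
    """Drop, one at a time, the tip with the largest midpoint-rooted R2T
    distance while it lies above the upper fence; return tips in drop order."""
    # Dropping a tip leaves path lengths between the remaining tips
    # unchanged, so the distance table is computed only once.
    tips, dist = tip_distances(edges)
    remaining, dropped = list(tips), []
    while len(remaining) >= 3:
        # Step 4: midpoint of the longest tip-to-tip path (a, b).
        a, b = max(combinations(remaining, 2), key=lambda p: dist[p[0]][p[1]])
        half = dist[a][b] / 2
        r2t = {t: max(dist[a][t], dist[b][t]) - half for t in remaining}
        # Steps 5 and 6: fence on R2T distances, drop only the top outlier.
        worst = max(remaining, key=r2t.get)
        if r2t[worst] <= upper_fence(list(r2t.values()), k):
            break  # step 7: no extreme outlier left
        remaining.remove(worst)
        dropped.append(worst)
    return dropped


edges = [("c", f"t{i}", 1) for i in range(1, 9)]
edges += [("c", "long_a", 12), ("c", "long_b", 10)]
print(prune_r2t_outliers(edges))  # ['long_a']
```

The two endpoints of the longest path always share the largest R2T distance (D/2), so each iteration removes one of them. Once a single long branch remains, the midpoint root moves onto that branch and all R2T distances become equal. This is why isolated long tip branches are handled by the 1st part rather than by this loop.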