VSEARCH 2.11.0: Added ability to filter paired reads + xee option
torognes committed Feb 13, 2019
1 parent 6f6f30e commit 97c8924
Showing 94 changed files with 818 additions and 466 deletions.
34 changes: 17 additions & 17 deletions README.md
@@ -24,7 +24,7 @@ Most of the nucleotide based commands and options in USEARCH version 7 are suppo

## Getting Help

If you can't find an answer in the [VSEARCH documentation](https://github.com/torognes/vsearch/releases/download/v2.10.4/vsearch_manual.pdf), please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.
If you can't find an answer in the [VSEARCH documentation](https://github.com/torognes/vsearch/releases/download/v2.11.0/vsearch_manual.pdf), please visit the [VSEARCH Web Forum](https://groups.google.com/forum/#!forum/vsearch-forum) to post a question or start a discussion.

## Example

@@ -37,9 +37,9 @@ In the example below, VSEARCH will identify sequences in the file database.fsa t
**Source distribution** To download the source distribution from a [release](https://github.com/torognes/vsearch/releases) and build the executable and the documentation, use the following commands:

```
wget https://github.com/torognes/vsearch/archive/v2.10.4.tar.gz
tar xzf v2.10.4.tar.gz
cd vsearch-2.10.4
wget https://github.com/torognes/vsearch/archive/v2.11.0.tar.gz
tar xzf v2.11.0.tar.gz
cd vsearch-2.11.0
./autogen.sh
./configure
make
@@ -68,43 +68,43 @@ Binary distributions are provided for x86-64 systems running GNU/Linux, macOS (v
Download the appropriate executable for your system using the following commands if you are using a Linux x86_64 system:

```sh
wget https://github.com/torognes/vsearch/releases/download/v2.10.4/vsearch-2.10.4-linux-x86_64.tar.gz
tar xzf vsearch-2.10.4-linux-x86_64.tar.gz
wget https://github.com/torognes/vsearch/releases/download/v2.11.0/vsearch-2.11.0-linux-x86_64.tar.gz
tar xzf vsearch-2.11.0-linux-x86_64.tar.gz
```

Or these commands if you are using a Linux ppc64le system:

```sh
wget https://github.com/torognes/vsearch/releases/download/v2.10.4/vsearch-2.10.4-linux-ppc64le.tar.gz
tar xzf vsearch-2.10.4-linux-ppc64le.tar.gz
wget https://github.com/torognes/vsearch/releases/download/v2.11.0/vsearch-2.11.0-linux-ppc64le.tar.gz
tar xzf vsearch-2.11.0-linux-ppc64le.tar.gz
```

Or these commands if you are using a Linux aarch64 system:

```sh
wget https://github.com/torognes/vsearch/releases/download/v2.10.4/vsearch-2.10.4-linux-aarch64.tar.gz
tar xzf vsearch-2.10.4-linux-aarch64.tar.gz
wget https://github.com/torognes/vsearch/releases/download/v2.11.0/vsearch-2.11.0-linux-aarch64.tar.gz
tar xzf vsearch-2.11.0-linux-aarch64.tar.gz
```

Or these commands if you are using a Mac:

```sh
wget https://github.com/torognes/vsearch/releases/download/v2.10.4/vsearch-2.10.4-macos-x86_64.tar.gz
tar xzf vsearch-2.10.4-macos-x86_64.tar.gz
wget https://github.com/torognes/vsearch/releases/download/v2.11.0/vsearch-2.11.0-macos-x86_64.tar.gz
tar xzf vsearch-2.11.0-macos-x86_64.tar.gz
```

Or if you are using Windows, download and extract (unzip) the contents of this file:

```
https://github.com/torognes/vsearch/releases/download/v2.10.4/vsearch-2.10.4-win-x86_64.zip
https://github.com/torognes/vsearch/releases/download/v2.11.0/vsearch-2.11.0-win-x86_64.zip
```

Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.10.4-linux-x86_64` or `vsearch-2.10.4-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`.
Linux and Mac: You will now have the binary distribution in a folder called `vsearch-2.11.0-linux-x86_64` or `vsearch-2.11.0-macos-x86_64` in which you will find three subfolders `bin`, `man` and `doc`. We recommend making a copy or a symbolic link to the vsearch binary `bin/vsearch` in a folder included in your `$PATH`, and a copy or a symbolic link to the vsearch man page `man/vsearch.1` in a folder included in your `$MANPATH`. The PDF version of the manual is available in `doc/vsearch_manual.pdf`.

Windows: You will now have the binary distribution in a folder called `vsearch-2.10.4-win-x86_64`. The vsearch executable is called `vsearch.exe`. The manual in PDF format is called `vsearch_manual.pdf`.
Windows: You will now have the binary distribution in a folder called `vsearch-2.11.0-win-x86_64`. The vsearch executable is called `vsearch.exe`. The manual in PDF format is called `vsearch_manual.pdf`.


**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/man/vsearch.1). A pdf version ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.10.4/vsearch_manual.pdf)) will be generated by `make`. To install the manpage manually, copy the `vsearch.1` file or a create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.10.4/vsearch_manual.pdf)) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).
**Documentation** The VSEARCH user's manual is available in the `man` folder in the form of a [man page](https://github.com/torognes/vsearch/blob/master/man/vsearch.1). A PDF version ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.11.0/vsearch_manual.pdf)) will be generated by `make`. To install the man page manually, copy the `vsearch.1` file or create a symbolic link to `vsearch.1` in a folder included in your `$MANPATH`. The manual in both formats is also available with the binary distribution. The manual in PDF form ([vsearch_manual.pdf](https://github.com/torognes/vsearch/releases/download/v2.11.0/vsearch_manual.pdf)) is also attached to the latest [release](https://github.com/torognes/vsearch/releases).


## Plugins, packages, and wrappers
@@ -176,11 +176,11 @@ The code is written in C++ but most of it is actually mostly C with some C++ syn

File | Description
---|---
**abundance.cc** | Code for extracting and printing abundance information from FASTA headers
**align.cc** | New Needleman-Wunsch global alignment, serial. Only for testing.
**align_simd.cc** | SIMD parallel global alignment of 1 query with 8 database sequences
**allpairs.cc** | All-vs-all optimal global pairwise alignment (no heuristics)
**arch.cc** | Architecture specific code (Mac/Linux)
**attributes.cc** | Extraction and printing of attributes in FASTA headers
**bitmap.cc** | Implementation of bitmaps
**chimera.cc** | Chimera detection
**city.cc** | CityHash code
2 changes: 1 addition & 1 deletion configure.ac
@@ -2,7 +2,7 @@
# Process this file with autoconf to produce a configure script.

AC_PREREQ([2.63])
AC_INIT([vsearch], [2.10.4], [[email protected]])
AC_INIT([vsearch], [2.11.0], [[email protected]])
AC_CANONICAL_TARGET
AM_INIT_AUTOMAKE([subdir-objects])
AC_LANG([C++])
111 changes: 79 additions & 32 deletions man/vsearch.1
@@ -1,5 +1,5 @@
.\" ============================================================================
.TH vsearch 1 "January 10, 2019" "version 2.10.4" "USER COMMANDS"
.TH vsearch 1 "February 13, 2019" "version 2.11.0" "USER COMMANDS"
.\" ============================================================================
.SH NAME
vsearch \(em chimera detection, clustering, dereplication and
@@ -51,9 +51,9 @@ FASTA/FASTQ file processing:
\fBvsearch\fR (\-\-fastq_eestats | \-\-fastq_eestats2) \fIfastqfile\fR
\-\-output \fIoutputfile\fR [\fIoptions\fR]
.PP
\fBvsearch\fR \-\-fastq_filter \fIfastqfile\fR (\-\-fastaout |
\-\-fastaout_discarded | \-\-fastqout | \-\-fastqout_discarded)
\fIoutputfile\fR [\fIoptions\fR]
\fBvsearch\fR \-\-fastq_filter \fIfastqfile\fR [\-\-reverse
\fIfastqfile\fR] (\-\-fastaout | \-\-fastaout_discarded | \-\-fastqout |
\-\-fastqout_discarded | \-\-fastaout_rev | \-\-fastaout_discarded_rev |
\-\-fastqout_rev | \-\-fastqout_discarded_rev) \fIoutputfile\fR
[\fIoptions\fR]
.PP
\fBvsearch\fR \-\-fastq_join \fIfastqfile\fR \-\-reverse
\fIfastqfile\fR (\-\-fastaout | \-\-fastqout) \fIoutputfile\fR
@@ -68,7 +68,11 @@ FASTA/FASTQ file processing:
\fBvsearch\fR \-\-fastq_stats \fIfastqfile\fR
[\-\-log \fIlogfile\fR] [\fIoptions\fR]
.PP
\fBvsearch\fR \-\-fastx_revcomp \fIfastxfile\fR (\-\-fastaout |
\fBvsearch\fR \-\-fastx_filter \fIinputfile\fR [\-\-reverse
\fIinputfile\fR] (\-\-fastaout | \-\-fastaout_discarded | \-\-fastqout |
\-\-fastqout_discarded | \-\-fastaout_rev | \-\-fastaout_discarded_rev |
\-\-fastqout_rev | \-\-fastqout_discarded_rev) \fIoutputfile\fR
[\fIoptions\fR]
.PP
\fBvsearch\fR \-\-fastx_revcomp \fIinputfile\fR (\-\-fastaout |
\-\-fastqout) \fIoutputfile\fR [\fIoptions\fR]
.PP
\fBvsearch\fR \-\-sff_convert \fIsff-file\fR \-\-fastqout
@@ -957,15 +961,15 @@ file.
FASTA/FASTQ file processing options:
.RS
.PP
Analyse, shorten, filter, convert or merge sequences in FASTQ files,
or reverse complement sequences in FASTA or FASTQ files. The
Analyse, trim, filter, convert or merge sequences in FASTQ files, or
reverse complement sequences in FASTA or FASTQ files. The
\-\-fastq_chars command can be used to analyse FASTQ files to identify
the quality encoding and the range of quality score values used. To
convert between different FASTQ file variants, use the
\-\-fastq_convert command. Statistical analysis of the quality and
length of the sequences in a FASTQ file may be performed with the
\-\-fastq_stats, \-\-fastq_eestats, and \-\-fastq_eestats2
commands. Sequences may be shortened, filtered and converted by the
commands. Sequences may be trimmed, filtered and converted by the
\-\-fastq_filter or \-\-fastx_filter commands. Paired-end reads can be
merged using the \-\-fastq_mergepairs command. The \-\-fastx_revcomp
command reverse-complements sequences. Finally, the \-\-sff_convert
@@ -975,7 +979,9 @@ command can be used to convert SFF files to FASTQ.
.B \-\-eeout
When using \-\-fastq_filter or \-\-fastq_mergepairs, include the
number of expected errors (ee) in the sequence header of FASTQ and
FASTA files. This option is a synonym of the \-\-fastq_eeout option.
FASTA files. This option is a synonym of the \-\-fastq_eeout
option. Use the \-\-xee option to remove this information from
headers.
.TP
.BI \-\-eetabbedout \0filename
When specified with the \-\-fastq_mergepairs command, write statistics
@@ -992,6 +998,11 @@ When using \-\-fastq_filter, \-\-fastq_mergepairs or \-\-fastx_filter,
write to the given FASTA-formatted file the sequences passing the
filter, or the merged sequences.
.TP
.BI \-\-fastaout_rev \0filename
When using \-\-fastq_filter or \-\-fastx_filter,
write to the given FASTA-formatted file the reverse reads passing the
filter.
.TP
.BI \-\-fastaout_notmerged_fwd \0filename
When using \-\-fastq_mergepairs, write forward reads not merged to the
specified FASTA file.
@@ -1004,6 +1015,11 @@ specified FASTA file.
Write sequences that do not pass the filter of the \-\-fastq_filter or
\-\-fastx_filter command to the given FASTA-formatted file.
.TP
.BI \-\-fastaout_discarded_rev \0filename
Write reverse reads that do not pass the filter of the
\-\-fastq_filter or \-\-fastx_filter command to the given
FASTA-formatted file.
.TP
.B \-\-fastq_allowmergestagger
When using \-\-fastq_mergepairs, allow to merge staggered read
pairs. Staggered pairs are pairs where the 3' end of the reverse read
@@ -1051,9 +1067,11 @@ be limited using the \-\-fastq_qminout and \-\-fastq_qmaxout
options. The output file is specified with the \-\-fastqout option.
.TP
.B \-\-fastq_eeout
When using \-\-fastq_filter or \-\-fastq_mergepairs, include the
number of expected errors (ee) in the sequence header of FASTQ and
FASTA files. This option is a synonym of the \-\-eeout option.
When using \-\-fastq_filter, \-\-fastx_filter or \-\-fastq_mergepairs,
include the number of expected errors (ee) in the sequence header of
FASTQ and FASTA files. This option is a synonym of the \-\-eeout
option. Use the \-\-xee option to remove this information from
headers.
.TP
.BI \-\-fastq_eestats \0filename
Analyze a FASTQ file and report statistics on the distributions of
@@ -1098,7 +1116,7 @@ as its argument. The default setting is "0.5,1.0,2.0" that indicates
that expected error levels of 0.5, 1.0 and 2.0 should be used.
.TP
.BI \-\-fastq_filter \0filename
Shorten and/or filter sequences in the given FASTQ file. Similar to
Trim and/or filter sequences in the given FASTQ file. Similar to
the \-\-fastx_filter command, but works only on FASTQ files. See
\-\-fastx_filter for details.
.TP
@@ -1341,10 +1359,19 @@ When using \-\-fastq_filter, \-\-fastq_mergepairs or \-\-fastx_filter,
write to the given FASTQ-formatted file the sequences passing the
filter, or the merged sequences.
.TP
.BI \-\-fastqout_rev \0filename
When using \-\-fastq_filter or \-\-fastx_filter,
write to the given FASTQ-formatted file the reverse reads passing the
filter.
.TP
.BI \-\-fastqout_discarded \0filename
When using \-\-fastq_filter or \-\-fastx_filter, write sequences that
do not pass the filter to the given FASTQ-formatted file.
.TP
.BI \-\-fastqout_discarded_rev \0filename
When using \-\-fastq_filter or \-\-fastx_filter, write reverse reads that
do not pass the filter to the given FASTQ-formatted file.
.TP
.BI \-\-fastqout_notmerged_fwd \0filename
When using \-\-fastq_mergepairs, write forward reads not merged to the
specified FASTQ file.
@@ -1354,26 +1381,34 @@ When using \-\-fastq_mergepairs, write reverse reads not merged to the
specified FASTQ file.
.TP
.BI \-\-fastx_filter \0filename
Shorten and/or filter the sequences in the given FASTA or FASTQ file
and output the remaining sequences to the FASTQ file specified with
the \-\-fastqout option and to the FASTA file specified with the
\-\-fastaout option. The discarded sequences are written to the files
Trim and/or filter the sequences in the given FASTA or FASTQ file and
output the remaining sequences to the FASTQ file specified with the
\-\-fastqout option and/or to the FASTA file specified with the
\-\-fastaout option. Discarded sequences are written to the files
specified with the \-\-fastaout_discarded and \-\-fastqout_discarded
options. The input format (FASTA or FASTQ) is automatically
detected. Output can not be written to FASTQ files if the input is in
FASTA format. Sequences may be shortened using the options
\-\-fastq_stripleft, \-\-fastq_stripright, \-\-fastq_truncee,
\-\-fastq_trunclen, \-\-fastq_trunclen_keep and
\-\-fastq_truncqual. The sequences may be filtered using the options
detected. If the input consists of paired sequences, an input file
with reverse reads may be specified with the \-\-reverse option, and
corresponding output will be written to the files specified with the
\-\-fastqout_rev, \-\-fastaout_rev, \-\-fastqout_discarded_rev, and
\-\-fastaout_discarded_rev options. Output can not be written to FASTQ files
if the input is in FASTA format. The sequences are first trimmed and
then filtered based on the remaining bases. Sequences may be trimmed
using the options \-\-fastq_stripleft, \-\-fastq_stripright,
\-\-fastq_truncee, \-\-fastq_trunclen, \-\-fastq_trunclen_keep and
\-\-fastq_truncqual. The sequences may be filtered using the options
\-\-fastq_maxee, \-\-fastq_maxee_rate, \-\-fastq_maxlen,
\-\-fastq_maxns, \-\-fastq_minlen, \-\-fastq_trunclen, \-\-maxsize,
and \-\-minsize. If shortening results in an empty sequence, it is
discarded. The sequences are first shortened and then filtered based
on the remaining bases. If no shortening or filtering options are
given, all sequences are written to the output files, possibly after
conversion from FASTQ to FASTA format. The \-\-relabel option may be
used to relabel the output sequences. The \-\-eeout may be used to
output the expected number of errors in each sequence.
\-\-fastq_maxns, \-\-fastq_minlen (default 1), \-\-fastq_trunclen,
\-\-maxsize, and \-\-minsize. Sequences not satisfying the
requirements are discarded. For pairs of sequences, both sequences in
a pair must satisfy the requirements, otherwise both are
discarded. If no shortening or filtering options are given, all
sequences are written to the output files, possibly after conversion
from FASTQ to FASTA format. The \-\-relabel option may be used to
relabel the output sequences. The \-\-eeout option may be used to output the
expected number of errors in each sequence. After all sequences have
been processed, the number of kept and discarded sequences will be
shown, as well as how many of the kept sequences were trimmed.
.TP
.BI \-\-fastx_revcomp \0filename
Reverse-complement the sequences in the given FASTA or FASTQ file to a
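
The paired-read rule described in the \-\-fastx_filter hunk above (both reads in a pair must satisfy the requirements, otherwise both are discarded) can be illustrated with a small Python sketch. This is not the VSEARCH implementation: the `Read` class, the function names, and the restriction to a maximum expected error and a minimum length are assumptions made for illustration. Expected errors are computed with the usual definition, the sum of per-base error probabilities 10^(-Q/10), assuming Phred+33 quality encoding.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Read:
    """Hypothetical in-memory FASTQ record (not a VSEARCH type)."""
    header: str
    sequence: str
    quality: str  # assumed Phred+33 encoded


def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities, p = 10^(-Q/10)."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes(read: Read, max_ee: float, min_len: int = 1) -> bool:
    """Apply two illustrative criteria: minimum length and maximum expected errors."""
    return len(read.sequence) >= min_len and expected_errors(read.quality) <= max_ee


def filter_pairs(
    pairs: Iterable[Tuple[Read, Read]], max_ee: float, min_len: int = 1
) -> Tuple[List[Tuple[Read, Read]], List[Tuple[Read, Read]]]:
    """Keep a pair only if both reads pass; otherwise discard both reads."""
    kept, discarded = [], []
    for fwd, rev in pairs:
        if passes(fwd, max_ee, min_len) and passes(rev, max_ee, min_len):
            kept.append((fwd, rev))
        else:
            discarded.append((fwd, rev))
    return kept, discarded


if __name__ == "__main__":
    good = Read("r1", "ACGT", "IIII")      # Q40 at every base
    bad = Read("r2", "ACGT", "!!!!")       # Q0 at every base
    kept, discarded = filter_pairs([(good, good), (good, bad)], max_ee=1.0)
    print(len(kept), "pair(s) kept,", len(discarded), "pair(s) discarded")
```

In this sketch a pair with one low-quality read is discarded as a whole, which keeps the forward and reverse output files in step with each other.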
@@ -1426,8 +1461,9 @@ Please see the description of the same option under Chimera detection
for details.
.TP
.BI \-\-reverse \0filename
When using \-\-fastq_mergepairs or \-\-fastq_join, specify the FASTQ
file containing containing the reverse reads.
When using \-\-fastq_filter, \-\-fastx_filter, \-\-fastq_mergepairs or
\-\-fastq_join, specify the FASTQ file containing the
reverse reads.
.TP
.BI \-\-sff_convert \0filename
Convert the given SFF file to FASTQ. The FASTQ output file is
@@ -1447,6 +1483,11 @@ default no clipping is performed.
.B \-\-xsize
Strip abundance information from the headers when writing the output
file.
.TP
.B \-\-xee
Strip information about expected errors (ee) from the output file
headers. This information is added by the \-\-fastq_eeout and
\-\-eeout options.
.RE
.PP
.\" ----------------------------------------------------------------------------
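
The effect of the new \-\-xee option on a header can be sketched in Python. This is an illustration only, not the VSEARCH code: it assumes the expected-error annotation appears as a semicolon-separated `ee=<value>` field in the header (for example `seq1;size=5;ee=0.25;`), and `strip_ee` is a hypothetical helper name.

```python
def strip_ee(header: str) -> str:
    """Remove every 'ee=...' field from a semicolon-separated header.

    The first field is treated as the sequence identifier and is always kept.
    The field layout is an assumption made for this sketch.
    """
    identifier, *fields = header.split(";")
    kept = [field for field in fields if not field.startswith("ee=")]
    return ";".join([identifier] + kept)


if __name__ == "__main__":
    for header in ("seq1;size=5;ee=0.25;", "seq1;ee=0.25", "seq1"):
        print(header, "->", strip_ee(header))
```

Other attributes such as `size=` are left untouched, mirroring how \-\-xsize and \-\-xee are described as separate options.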
@@ -3508,6 +3549,12 @@ Fixed serious bug in x86_64 SIMD alignment code introduced in version
2.10.3. Added link to BioConda in README. Fixed bug in fastq_stats
with sequence length 1. Fixed use of equals symbol in UC files for
identical sequences with cluster_fast.
.TP
.BR v2.11.0\~ "released February 13th, 2019"
Added ability to trim and filter paired-end reads using the reverse
option with the fastx_filter and fastq_filter commands. Added \-\-xee
option to remove ee attributes from FASTA headers. Minor invisible
improvement to the progress indicator.
.RE
.LP
.\" ============================================================================
4 changes: 2 additions & 2 deletions src/Makefile.am
@@ -15,11 +15,11 @@ AM_CFLAGS=$(AM_CXXFLAGS)
export MACOSX_DEPLOYMENT_TARGET

VSEARCHHEADERS=\
abundance.h \
align.h \
align_simd.h \
allpairs.h \
arch.h \
attributes.h \
bitmap.h \
chimera.h \
city.h \
@@ -108,11 +108,11 @@ endif
endif

__top_builddir__bin_vsearch_SOURCES = $(VSEARCHHEADERS) \
abundance.cc \
align.cc \
align_simd.cc \
allpairs.cc \
arch.cc \
attributes.cc \
bitmap.cc \
chimera.cc \
cluster.cc \
2 changes: 1 addition & 1 deletion src/align.cc
@@ -2,7 +2,7 @@
VSEARCH: a versatile open source tool for metagenomics
Copyright (C) 2014-2018, Torbjorn Rognes, Frederic Mahe and Tomas Flouri
Copyright (C) 2014-2019, Torbjorn Rognes, Frederic Mahe and Tomas Flouri
All rights reserved.
Contact: Torbjorn Rognes <[email protected]>,
2 changes: 1 addition & 1 deletion src/align.h
@@ -2,7 +2,7 @@
VSEARCH: a versatile open source tool for metagenomics
Copyright (C) 2014-2018, Torbjorn Rognes, Frederic Mahe and Tomas Flouri
Copyright (C) 2014-2019, Torbjorn Rognes, Frederic Mahe and Tomas Flouri
All rights reserved.
Contact: Torbjorn Rognes <[email protected]>,
2 changes: 1 addition & 1 deletion src/align_simd.h
@@ -2,7 +2,7 @@
VSEARCH: a versatile open source tool for metagenomics
Copyright (C) 2014-2018, Torbjorn Rognes, Frederic Mahe and Tomas Flouri
Copyright (C) 2014-2019, Torbjorn Rognes, Frederic Mahe and Tomas Flouri
All rights reserved.
Contact: Torbjorn Rognes <[email protected]>,