Commit f0a76a5

Skola, Dylan authored and GitHub Enterprise committed
Clean merge (#115)
* fix bold GFM rendering
  Without a space between the *** and the : , Github won't render *** as bold.
* typos?
* Clarify that INFO.VQSLOD is required for --roc
* Update read me with GCC/G++ 4.9.2+ requirement
  See #66
* Remove regex dependency + fix test error with gcc 7.3
* Fix Docker build
* Update rtg-tools to work with new Java
* Documentation updates for 0.3.12 release (#110)
* Update RELEASES.md (#87)
* Updated RELASES.md and normalization.md (#109)
* Merge doc changes from dev (#112)
* Update RELEASES.md (#87)
* Updated RELASES.md and normalization.md (#109)
* Update more docs (#111)
* Update happy.md
* Cleaned up normalisation.md
* Update happy.md
* Created a new section for working with genome VCFs
* Added description of --filter-nonref
* Updated description of --convert-gvcf-xxxx options
* Added note toward beginning of document calling attention to genome VCF section
* Fixed broken links in TOC
* Fixed inconsistent case in headings
* Resolved merge conflict
* Remove seemingly-unused ENV command
* Restore ENV DEBIAN_FRONTEND=noninteractive
1 parent 6542350 commit f0a76a5

9 files changed, +38 -19 lines changed

.dockerignore

+4 -1
@@ -24,4 +24,7 @@ compile_commands.json
 tags
 tags_sorted_by_file
 .tags
-.tags_sorted_by_file
+.tags_sorted_by_file
+cmake-build-debug
+cmake-build-release
+build

Dockerfile

+3 -2
@@ -1,4 +1,6 @@
-FROM ubuntu:16.04
+FROM ubuntu:18.04
+
+ENV DEBIAN_FRONTEND=noninteractive

 RUN apt-get update && \
     apt-get install -y \
@@ -29,7 +31,6 @@ RUN apt-get update && \
     zlib1g-dev && \
     apt-get clean -y

-
 RUN pip install bx-python

 # copy git repository into the image

Dockerfile.ubuntu-with-tests

+1 -1
@@ -1,4 +1,4 @@
-FROM ubuntu:16.04
+FROM ubuntu:18.04

 RUN apt-get update && \
     apt-get install -y \

README.md

+3 -3
@@ -105,7 +105,7 @@ when using variant calling methods that produce many complex variant calls,
 these corner cases can become relevant. Moreover, when benchmarking against
 gold-standard datasets that cover difficult regions of the genome (e.g.
 [Platinum Genomes](http://www.illumina.com/platinumgenomes/)), the more complicated
-subsets of the genome will be respnsible for most of the difference between
+subsets of the genome will be responsible for most of the difference between
 methods.

 ### Variant preprocessing
@@ -164,7 +164,7 @@ prefer to combine local haplotypes in the same variant records
 different variant calling methods.

 ```
-chr1 201586350 . CTCTCTCTCT C
+chr1 201586350 . CTCTCTCTC C
 chr1 201586359 . T A
 ```
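One way to read the corrected record in this README.md hunk: a VCF record's REF allele starts at POS (1-based) and covers `POS + len(REF) - 1` bases. With the 10-base REF the deletion record would extend to 201586359 and overlap the following SNP record; with the 9-base REF it ends at 201586358. The sketch below only checks that arithmetic. The `Record` struct and the `ref_end` / `overlaps` helpers are hypothetical names for illustration and are not part of hap.py.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical minimal record: 1-based POS and the REF allele string.
struct Record
{
    std::int64_t pos;
    std::string ref;
};

// Last reference position covered by the REF allele (inclusive).
std::int64_t ref_end(Record const& r)
{
    return r.pos + static_cast<std::int64_t>(r.ref.size()) - 1;
}

// True if the reference spans of the two records share at least one base.
bool overlaps(Record const& a, Record const& b)
{
    return a.pos <= ref_end(b) && b.pos <= ref_end(a);
}
```

Under these assumptions, the old deletion record overlaps the SNP at 201586359 and the corrected one does not.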

@@ -351,7 +351,7 @@ docker build -f Dockerfile.centos6 .
 You will need these tools / libraries on your system to compile the code:

 * CMake > 2.8
-* GCC/G++ 4.8+ for compiling
+* GCC/G++ 4.9.2+ for compiling
 * Boost 1.55+
 * Python 2, version 2.7.8 or greater
 * Python packages: Pandas, Numpy, Scipy, pysam, bx-python

doc/happy.md

+5 -5
@@ -48,13 +48,13 @@ are not supported, all input bed or bed.gz files must only contain bed records).

 Hap.py will report counts of

-* ***true-positives (TP)***: variants/genotypes that match in truth and query.
-* ***false-positives (FP)***: variants that have mismatching genotypes or alt
+* ***true-positives (TP)*** : variants/genotypes that match in truth and query.
+* ***false-positives (FP)*** : variants that have mismatching genotypes or alt
 alleles, as well as query variant calls in regions a truth set would call
 confident hom-ref regions.
 * ***false-negatives (FN)*** : variants present in the truth set, but missed
 in the query.
-* ***non-assessed calls (UNK)***: variants outside the truth set regions
+* ***non-assessed calls (UNK)*** : variants outside the truth set regions

 From these counts, we are able to calculate

@@ -488,8 +488,8 @@ a ROC curve based on the query GQX field:
 The `--roc` switch specifies the feature to filter on. Hap.py translates the
 truth and query GQ(X) fields into the INFO fields T_GQ and Q_GQ, it tries to
 use GQX first, if this is not present, it will use GQ. When run without
-internal preprocessing any other input INFO field can be used (e.g. VQSLOD for
-GATK).
+internal preprocessing any other input INFO field can be used (e.g.
+--roc INFO.VQSLOD for GATK).

 The `--roc-filter` switch may be used to specify the particular VCF filter
 which implements a threshold on the quality score. When calculating filtered

example/happy/microbenchmark.sh

+1 -1
@@ -44,7 +44,7 @@ REF=$DIR/hg38.chr21.fa
 # -------
 #
 # To make ROCs for GATK, we discard the LowQual filter and use QUAL
-# For VQSR ROCs, we would use VQSLOD and discard the VQSR Tranche filters
+# For VQSR ROCs, we would use INFO.VQSLOD and discard the VQSR Tranche filters

 f=GATK3
 g=${DIR}/NA12878-GATK3-chr21.vcf.gz

src/c++/lib/diploidgraphs/DiploidReference.cpp

+5
@@ -230,6 +230,11 @@ void DiploidReference::setRegion(
         if(opposite_path != nu_haps.end() && opposite_path != nu_haps.begin())
         {
             size_t p2 = opposite_path->second;
+            // make order reproducible since map is not ordered
+            if(p2 > p1)
+            {
+                std::swap(p1, p2);
+            }

             nu_haps.erase(nu_haps.begin());
             nu_haps.erase(opposite_path);
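
The comment added in this hunk says the swap makes the order reproducible because the map is not ordered. A minimal sketch of that idea in isolation, assuming a hypothetical helper `canonical_pair` that is not part of hap.py: whichever order the two path indices were found in, the larger one is placed first.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: return the two indices with the larger one first,
// so the result does not depend on the order in which they were discovered.
std::pair<std::size_t, std::size_t> canonical_pair(std::size_t p1, std::size_t p2)
{
    if (p2 > p1)
    {
        std::swap(p1, p2);
    }
    return {p1, p2};
}
```

With this, `canonical_pair(1, 5)` and `canonical_pair(5, 1)` return the same pair, which is the property the swap in the hunk relies on.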

src/c++/lib/quantify/QuantifyRegions.cpp

+15 -5
@@ -41,7 +41,6 @@
 #include "helpers/BCFHelpers.hh"

 #include <map>
-#include <regex>
 #include <unordered_map>
 #include <htslib/vcf.h>

@@ -102,7 +101,6 @@ namespace variant
     void QuantifyRegions::load(std::vector<std::string> const &rnames, bool fixchr)
     {
         std::unordered_map<std::string, size_t> label_map;
-        const std::regex trailing_number_regex ("(.+)_([0-9]+)$");
         for (std::string const &f : rnames)
         {
             std::vector<std::string> v;
@@ -238,10 +236,22 @@ namespace variant

                 if (!fixed_label && v.size() > 3)
                 {
-                    std::smatch string_matches;
-                    if(std::regex_match(v[3], string_matches, trailing_number_regex))
+                    size_t split = v[3].size();
+                    for (size_t pos = v[3].size(); pos != 0; --pos)
                     {
-                        label_ids.insert(getLabelId(label + "_" + string_matches.str(1), 1));
+                        if (v[3][pos] < '0' || v[3][pos] > '9')
+                        {
+                            break;
+                        }
+                        else
+                        {
+                            split = pos;
+                        }
+                    }
+
+                    if(split < v[3].size() && v[3][split] == '_')
+                    {
+                        label_ids.insert(getLabelId(label + "_" + v[3].substr(0, split), 1));
                         label_ids.insert(getLabelId(label + "_" + v[3], 2));
                     }
                     else
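
The removed pattern `(.+)_([0-9]+)$` accepts a label of the form `<prefix>_<digits>` and captures the prefix. A minimal regex-free sketch of that same split is given below as a standalone, hypothetical function `split_trailing_number`; it is not the code from the commit. In the hunk the scan starts at `pos = v[3].size()`, where `v[3][pos]` is the string's terminating null character, so this sketch looks at `label[digits_start - 1]` instead and starts from the last real character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of hap.py): if `label` has the form
// "<prefix>_<digits>" with a non-empty prefix, store the prefix in `prefix`
// and return true; otherwise return false and leave `prefix` untouched.
bool split_trailing_number(std::string const& label, std::string& prefix)
{
    // Walk backwards over the trailing run of ASCII digits.
    std::size_t digits_start = label.size();
    while (digits_start > 0 &&
           label[digits_start - 1] >= '0' && label[digits_start - 1] <= '9')
    {
        --digits_start;
    }

    bool const has_digits = digits_start < label.size();
    // Require at least one prefix character plus the '_' before the digits.
    if (!has_digits || digits_start < 2 || label[digits_start - 1] != '_')
    {
        return false;
    }

    prefix = label.substr(0, digits_start - 1);
    return true;
}
```

For a label such as `exon_12` this yields the prefix `exon`; labels without a trailing `_<digits>` suffix, or with an empty prefix, are rejected, which mirrors what `std::regex_match` would do with the removed pattern.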

src/sh/run_happy_pg_test.sh

+1 -1
@@ -95,7 +95,7 @@ ${PYTHON} ${HCDIR}/hap.py \
     -r ${DIR}/../../example/chr21.fa \
     -o ${TMP_OUT}.unhappy \
     -X --unhappy \
-    --roc VQSLOD \
+    --roc INFO.VQSLOD \
     --force-interactive

 if [[ $? != 0 ]]; then
