Commit 9d4204b

Squashed 'lib/foldseek/' changes from e00a3dc1..15c0516f

15c0516f Remove --tar-include and --tar-exclude from createdb as they were never used
e1394aac Rework createdb to correctly allow for only one directory or (new) tsv input, in additonal to loose files
0c3b7f23 scorecomplex minor update
c4a4b6a4 scorecomplex redundant complex alignment re-solve
e77b6431 scorecomplex complex alignment redundancy solved
d725f0e3 scorecomplex rbh filtering with 0.7 as the margin
d785457c change names for tm based rbh filtering
405e64c2 scorecomplex: rbh filtering with query tm score
e5c52dda scorecomplex commit for benchmark test
537c6160 scorecomplex commit for benchmark test
4f3bc395 Merge branch 'master' of https://github.com/steineggerlab/foldseek
5b789cfd scorecomplex rollback to DBSCAN
493cefe7 Add mode to compute exact (slow) tmscore.
75a50f7c Fix steineggerlab/foldseek#214
38e5e93f Merge pull request #244 from steineggerlab/test
6b1dd706 fix typo
c388d483 confilcts solved
ecf85daf final update scorecomplex with nearest neighbors
802235db backtrace related issue detour
f629bbe1 Merge commit '8faebba3f96210242892943d37e4fe9e8a5eed8d'
8faebba3 Squashed 'lib/mmseqs/' changes from 22a77eeb..950342d9
dc272d56 complexsearch with DBSCAN commit for benchmark
a7fefa22 bitscore margine change
a6b1928e DBSCAN update retry
25031967 foldseek DBSCAN with RBH filter and NN rescue
ac2b1dcf nearest neighbors update
41c7f9c9 nearest neighbors update
e4079c49 new nearest neighbors
2742f469 new nearest neighbors
4d426356 nearest neighbors new
6cb3ea6c revoke nearest neighbors only
093af914 test nearest neighbors only2
1dbaac36 test nearest neighbors only
3bf3cdf5 Fix this in a different way
f4a1a527 Fix compile on older compilers
d3fca9e8 Update citations for databases
39ade546 Update README.md
87caae8e rbh filter margine improvement
6e4184a1 implement getting neareast neighbors in sorecomplex
6e632c30 eps related update revoke
65550247 scorecomplex learningRate=0 issue solved
095102ff infinite loop bug fix scorecomplex
9ca20244 Merge branch 'master' of https://github.com/steineggerlab/foldseek
096613dc commit for pull
0eff0231 Update scorecomplex.cpp: eps related update rollback
f690b9d5 Merge branch 'master' of https://github.com/steineggerlab/foldseek
da825d55 rbh filtering with bitscore & clustering eps update
852434a4 Add --input-format to createdb to force an input structure format
00ab450f scorecomplex rbh filtering implement
6893dcc5 Add CATH50 steineggerlab/foldseek#232
bb090174 [expandcomplex] eased e-value for the 2nd alignment
e9f76df6 update scorecomplex dbscan error fixed
1cb3a80d scorecomplex alignment clustering algorithm update
5433d6db Update README.md
1bc8d2e5 update to latest MMseqs2-App master
c6f4f2a6 fix regression fail
10289c64 scorecomplex DBSCAN impropvement
6816a641 skip single chained complex for scorecomplex
2a187342 complexsearch initial search parameter adjustment
49dabe0d scorecomplex many against many bug fixed real final
1c4fdfa6 scorecomplex many against many bug fix final
9f8a2ef9 scorecomplex many against many bug fixed
3fe1f9e4 Merge branch 'master' of https://github.com/steineggerlab/foldseek
c28e7938 fix wrong parameters in easy-complexsearch
035edc18 expandcomplex should now work correctly with both cluster and non-cluster dbs
e396ca4d Carry extended dbtype for complexsearch to work with clustered dbs
f05703dd Remove std::cout in structurerescorediagonal
7b68363d Merge branch 'master' of https://github.com/steineggerlab/foldseek
886021d2 Fix issue steineggerlab/foldseek#205
592ffa80 fix explanation of complex related tools
a6a712c1 fix easycomplexsearch.sh wrong param
799d42ca update EasyComplexSearch; improve expandcomplex stability
258be0fc Update README.md
c382b8fa Update README.md
d3f4980d Update README.md
a417633d Update README.md
b156e065 implement complexsearch and sanitizing expandcomplex
b220b5a9 Update README.md
ec32bee1 update expandcomplex
76ffa031 update expandcomplex
10ba8f53 Fix easy-complexsearch workflow shell scripting issues
75cc763a fix regression failed
206f600a fix regression test failed
9cc02eb7 expandcomplex
a695211e expandcomplex
dffdf788 Increase buffer size
79f865d6 Add --complex-report-mode to allow disabling report in easy-scorecomplex
38290cf7 Cleanup easy-complexsearch
08f1db5e Expose --db-output for createcomplexreport
bdeb0024 Update README.md
629de617 Merge branch 'master' of https://github.com/steineggerlab/foldseek
f7857793 Fix sameDB (clusterDB) issue in structurealign
7180ed43 Merge branch 'master' of https://github.com/steineggerlab/foldseek
28a4a7f5 update createcomplexreport with multithreading issue fixed

git-subtree-dir: lib/foldseek
git-subtree-split: 15c0516fbae0d7e0903ee80f14cb927782b394d0

1 parent 711aad8 commit 9d4204b

41 files changed: +1439 -523 lines changed

README.md

Lines changed: 40 additions & 47 deletions
Large diffs are not rendered by default.

data/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ set(COMPILED_RESOURCES
         evalue_nn.kerasify
         main.js
         vendor.js.zst
+        complexsearch.sh
         easycomplexsearch.sh
         )


data/complexsearch.sh

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+#!/bin/sh -e
+fail() {
+    echo "Error: $1"
+    exit 1
+}
+
+notExists() {
+    [ ! -f "$1" ]
+}
+
+if notExists "${TMP_PATH}/result.dbtype"; then
+    # shellcheck disable=SC2086
+    "$MMSEQS" search "${QUERYDB}" "${TARGETDB}" "${TMP_PATH}/result" "${TMP_PATH}/search_tmp" ${SEARCH_PAR} \
+        || fail "Search died"
+fi
+
+RESULT="${TMP_PATH}/result"
+if [ "$PREFMODE" != "EXHAUSTIVE" ]; then
+    if notExists "${TMP_PATH}/result_expand_pref.dbtype"; then
+        # shellcheck disable=SC2086
+        "$MMSEQS" expandcomplex "${QUERYDB}" "${TARGETDB}" "${RESULT}" "${TMP_PATH}/result_expand_pref" ${THREADS_PAR} \
+            || fail "Expandcomplex died"
+    fi
+    if notExists "${TMP_PATH}/result_expand_aligned.dbtype"; then
+        # shellcheck disable=SC2086
+        "$MMSEQS" $COMPLEX_ALIGNMENT_ALGO "${QUERYDB}" "${TARGETDB}" "${TMP_PATH}/result_expand_pref" "${TMP_PATH}/result_expand_aligned" ${COMPLEX_ALIGN_PAR} \
+            || fail $COMPLEX_ALIGNMENT_ALGO "died"
+    fi
+    RESULT="${TMP_PATH}/result_expand_aligned"
+fi
+if notExists "${TMP_PATH}/complex_result.dbtype"; then
+    # shellcheck disable=SC2086
+    $MMSEQS scorecomplex "${QUERYDB}" "${TARGETDB}" "${RESULT}" "${OUTPUT}" ${SCORECOMPLEX_PAR} \
+        || fail "ScoreComplex died"
+fi
+
+if [ -n "${REMOVE_TMP}" ]; then
+    # shellcheck disable=SC2086
+    "$MMSEQS" rmdb "${TMP_PATH}/result" ${VERBOSITY}
+    if [ "$PREFMODE" != "EXHAUSTIVE" ]; then
+        # shellcheck disable=SC2086
+        "$MMSEQS" rmdb "${TMP_PATH}/result_expand_aligned" ${VERBOSITY}
+    fi
+    rm -rf "${TMP_PATH}/search_tmp"
+    rm -f "${TMP_PATH}/complexsearch.sh"
+fi
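
Note on the new workflow script: every stage is guarded by a check for its `.dbtype` marker, so a rerun skips stages that already finished. Also, `fail` prints only `$1`, which means the call `fail $COMPLEX_ALIGNMENT_ALGO "died"` reports just the algorithm name without the word "died". The following is a minimal sketch of both points, not the script's actual code: `run_step` and `fake_module` are hypothetical helpers introduced only for illustration, and the `$*` variant of `fail` is one possible way to keep all arguments in the message.

```shell
#!/bin/sh
# Hypothetical stand-ins; run_step and fake_module are not part of foldseek.
fail() {
    # Join all arguments so that a call like: fail "$ALGO" "died"
    # keeps both words in the message.
    echo "Error: $*"
    exit 1
}

notExists() {
    [ ! -f "$1" ]
}

# run_step MARKER CMD [ARGS...]: run CMD only when the MARKER file is missing.
run_step() {
    marker="$1"
    shift
    if notExists "$marker"; then
        "$@" || fail "$1" "died"
    fi
}

# Stand-in for a module that writes a result database plus its .dbtype marker.
fake_module() {
    echo "ran" >> "$1.log"
    : > "$1.dbtype"
}

WORK="$(mktemp -d)"
run_step "$WORK/result.dbtype" fake_module "$WORK/result"
run_step "$WORK/result.dbtype" fake_module "$WORK/result"   # skipped: marker exists
RUNS="$(wc -l < "$WORK/result.log" | tr -d ' ')"
echo "module ran ${RUNS} time(s)"
```

The second `run_step` call does nothing because the marker file already exists, which is the same resume behaviour the `notExists "...dbtype"` guards give the workflow.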

data/easycomplexsearch.sh

Lines changed: 18 additions & 26 deletions
@@ -26,39 +26,31 @@ if notExists "${TARGET}.dbtype"; then
     TARGET="${TMP_PATH}/target"
 fi

-
-SEARCH_RESULT="${TMP_PATH}/result"
-if notExists "${SEARCH_RESULT}.dbtype"; then
+if notExists "${TMP_PATH}/complex_result.dbtype"; then
     # shellcheck disable=SC2086
-
-    "$MMSEQS" search "${QUERY}" "${TARGET}" "${SEARCH_RESULT}" "${TMP_PATH}/search_tmp" ${SEARCH_PAR} \
-        || fail "Search died"
+    "$MMSEQS" complexsearch "${QUERY}" "${TARGET}" "${TMP_PATH}/complex_result" "${TMP_PATH}/complexsearch_tmp" ${COMPLEXSEARCH_PAR} \
+        || fail "ComplexSearch died"
 fi

-SCORECOMPLEX_RESULT="${TMP_PATH}/result2"
-if notExists "${SCORECOMPLEX_RESULT}/.dbtype"; then
-    # shellcheck disable=SC2086
-    $MMSEQS scorecomplex "${QUERY}" "${TARGET}" "${SEARCH_RESULT}" ${SCORECOMPLEX_RESULT} ${SCORECOMPLEX_PAR} \
-        || fail "ScoreComplex died"
-fi
+# shellcheck disable=SC2086
+"$MMSEQS" convertalis "${QUERY}" "${TARGET}" "${TMP_PATH}/complex_result" "${OUTPUT}" ${CONVERT_PAR} \
+    || fail "Convert Alignments died"

-if notExists "${TMP_PATH}/alis.dbtype"; then
+if [ -z "${NO_REPORT}" ]; then
     # shellcheck disable=SC2086
-    "$MMSEQS" convertalis "${QUERY}" "${TARGET}" "${SCORECOMPLEX_RESULT}" "${OUTPUT}" ${CONVERT_PAR} \
-        || fail "Convert Alignments died"
+    "$MMSEQS" createcomplexreport "${QUERY}" "${TARGET}" "${TMP_PATH}/complex_result" "${OUTPUT}_report" ${REPORT_PAR} \
+        || fail "createcomplexreport died"
 fi
-# shellcheck disable=SC2086
-"$MMSEQS" createcomplexreport "${QUERY}" "${TARGET}" "${SCORECOMPLEX_RESULT}" "${REPORT}" ${REPORT_PAR}\
-    || fail "Createcomplexreport dies"
-
-
-
-
-

 if [ -n "${REMOVE_TMP}" ]; then
     # shellcheck disable=SC2086
     "$MMSEQS" rmdb "${TMP_PATH}/result" ${VERBOSITY}
+    if [ "$PREFMODE" != "EXHAUSTIVE" ]; then
+        # shellcheck disable=SC2086
+        "$MMSEQS" rmdb "${TMP_PATH}/result_expand_aligned" ${VERBOSITY}
+    fi
+    # shellcheck disable=SC2086
+    "$MMSEQS" rmdb "${TMP_PATH}/complex_result" ${VERBOSITY}
     if [ -z "${LEAVE_INPUT}" ]; then
         if [ -f "${TMP_PATH}/target" ]; then
             # shellcheck disable=SC2086
@@ -79,6 +71,6 @@ if [ -n "${REMOVE_TMP}" ]; then
         # shellcheck disable=SC2086
         "$MMSEQS" rmdb "${TMP_PATH}/query_ss" ${VERBOSITY}
     fi
-    rm -rf "${TMP_PATH}/search_tmp"
-    rm -f "${TMP_PATH}/easyscorecomplex.sh"
-fi
+    rm -rf "${TMP_PATH}/complexsearch_tmp"
+    rm -f "${TMP_PATH}/easycomplexsearch.sh"
+fi
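
Note on the report toggle: after this change the report step runs only when `NO_REPORT` is empty or unset (`[ -z "${NO_REPORT}" ]`), whereas the old script always ran `createcomplexreport`. The snippet below is a small sketch of that test in isolation. `report_enabled` is a hypothetical helper name, and the `:-` default is added here only so the sketch also works under `set -u`; the workflow script itself does not use it.

```shell
#!/bin/sh
# report_enabled: succeeds when NO_REPORT is unset or empty.
# Hypothetical helper, shown only to isolate the -z test used by the workflow.
report_enabled() {
    [ -z "${NO_REPORT:-}" ]
}

unset NO_REPORT
if report_enabled; then FIRST="report"; else FIRST="skip"; fi

NO_REPORT="TRUE"
if report_enabled; then SECOND="report"; else SECOND="skip"; fi

echo "unset: ${FIRST}, set: ${SECOND}"
```

Any non-empty value disables the report in this sketch; how the calling workflow sets `NO_REPORT` is not visible in this diff.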

data/main.js

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

data/structdatabases.sh

Lines changed: 9 additions & 1 deletion
@@ -150,9 +150,17 @@ case "${SELECTION}" in
         push_back "${TMP_PATH}/pdb"
         INPUT_TYPE="FOLDSEEK_DB"
     ;;
+    "CATH50")
+        if notExists "${TMP_PATH}/cath50.tar.gz"; then
+            downloadFile "https://foldseek.steineggerlab.workers.dev/cath50.tar.gz" "${TMP_PATH}/cath50.tar.gz"
+            downloadFile "https://foldseek.steineggerlab.workers.dev/cath50.version" "${TMP_PATH}/version"
+        fi
+        tar xvfz "${TMP_PATH}/cath50.tar.gz" -C "${TMP_PATH}"
+        push_back "${TMP_PATH}/cath50"
+        INPUT_TYPE="FOLDSEEK_DB"
+    ;;
 esac

-
 if notExists "${OUTDB}.dbtype"; then
     case "${INPUT_TYPE}" in
         "FOLDSEEK_DB")
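
Note on the CATH50 branch: the archive and its version file are fetched only when `cath50.tar.gz` is not already in `TMP_PATH`, while the `tar` extraction runs on every invocation. The download-once guard can be sketched as follows. This is an illustration under stated assumptions: `downloadFile` here is a local stub that counts calls instead of using the network, `fetch_once` is a hypothetical wrapper, and the `example.invalid` URL is a placeholder.

```shell
#!/bin/sh
notExists() {
    [ ! -f "$1" ]
}

# Stub standing in for the script's downloadFile helper: instead of touching
# the network it writes a placeholder file and counts how often it was called.
DOWNLOADS=0
downloadFile() {
    DOWNLOADS=$((DOWNLOADS + 1))
    echo "placeholder for $1" > "$2"
}

# fetch_once URL DEST: download only when DEST does not exist yet.
fetch_once() {
    if notExists "$2"; then
        downloadFile "$1" "$2"
    fi
}

CACHE="$(mktemp -d)"
fetch_once "https://example.invalid/db.tar.gz" "$CACHE/db.tar.gz"
fetch_once "https://example.invalid/db.tar.gz" "$CACHE/db.tar.gz"   # cached, no second download
echo "downloads: ${DOWNLOADS}"
```

One consequence of guarding on the archive alone, as the diff does, is that a missing `version` file is not re-fetched when the archive is already present.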

data/vendor.js.zst

33 Bytes
Binary file not shown.

lib/mmseqs/src/CommandDeclarations.h

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ extern int convertkb(int argc, const char **argv, const Command& command);
 extern int convertmsa(int argc, const char **argv, const Command& command);
 extern int convertprofiledb(int argc, const char **argv, const Command& command);
 extern int createdb(int argc, const char **argv, const Command& command);
+extern int makepaddedseqdb(int argc, const char **argv, const Command& command);
 extern int createindex(int argc, const char **argv, const Command& command);
 extern int createlinindex(int argc, const char **argv, const Command& command);
 extern int createseqfiledb(int argc, const char **argv, const Command& command);

lib/mmseqs/src/MMseqsBase.cpp

Lines changed: 14 additions & 7 deletions
@@ -39,9 +39,9 @@ std::vector<Command> baseCommands = {
                 "Slower, sensitive clustering",
                 "mmseqs easy-cluster examples/DB.fasta result tmp\n"
                 "# Cluster output\n"
-                "# - result_rep_seq.fasta: Representatives\n"
-                "# - result_all_seq.fasta: FASTA-like per cluster\n"
-                "# - result_cluster.tsv: Adjacency list\n\n"
+                "# - result_rep_seq.fasta: Representatives\n"
+                "# - result_all_seqs.fasta: FASTA-like per cluster\n"
+                "# - result_cluster.tsv: Adjacency list\n\n"
                 "# Important parameter: --min-seq-id, --cov-mode and -c \n"
                 "# --cov-mode \n"
                 "# 0 1 2\n"
@@ -62,9 +62,9 @@ std::vector<Command> baseCommands = {
                 "Fast linear time cluster, less sensitive clustering",
                 "mmseqs easy-linclust examples/DB.fasta result tmp\n\n"
                 "# Linclust output\n"
-                "# - result_rep_seq.fasta: Representatives\n"
-                "# - result_all_seq.fasta: FASTA-like per cluster\n"
-                "# - result_cluster.tsv: Adjecency list\n\n"
+                "# - result_rep_seq.fasta: Representatives\n"
+                "# - result_all_seqs.fasta: FASTA-like per cluster\n"
+                "# - result_cluster.tsv: Adjecency list\n\n"
                 "# Important parameter: --min-seq-id, --cov-mode and -c \n"
                 "# --cov-mode \n"
                 "# 0 1 2\n"
@@ -130,14 +130,21 @@ std::vector<Command> baseCommands = {
                 "<i:fastaFile1[.gz|.bz2]> ... <i:fastaFileN[.gz|.bz2]>|<i:stdin> <o:sequenceDB>",
                 CITATION_MMSEQS2, {{"fast[a|q]File[.gz|bz2]|stdin", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA | DbType::VARIADIC, &DbValidator::flatfileStdinAndGeneric },
                                    {"sequenceDB", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::flatfile }}},
+        {"makepaddedseqdb", makepaddedseqdb, &par.onlyverbosity, COMMAND_HIDDEN,
+                "Generate a padded sequence DB",
+                "Generate a padded sequence DB",
+                "Martin Steinegger <[email protected]>",
+                "<i:sequenceDB> <o:sequenceDB>",
+                CITATION_MMSEQS2, {{"sequenceDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA|DbType::NEED_HEADER, &DbValidator::sequenceDb },
+                                   {"sequenceIndexDB", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::sequenceDb }}},
         {"appenddbtoindex", appenddbtoindex, &par.appenddbtoindex, COMMAND_HIDDEN,
                 NULL,
                 NULL,
                 "Milot Mirdita <[email protected]>",
                 "<i:DB1> ... <i:DBN> <o:DB>",
                 CITATION_MMSEQS2, {{"DB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA | DbType::VARIADIC, &DbValidator::allDb },
                                    {"DB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::allDb }}},
-        {"indexdb", indexdb, &par.indexdb, COMMAND_HIDDEN,
+        {"indexdb", indexdb, &par.indexdb, COMMAND_HIDDEN,
                 NULL,
                 NULL,
                 "Martin Steinegger <[email protected]>",

lib/mmseqs/src/commons/Parameters.cpp

Lines changed: 3 additions & 0 deletions
@@ -2573,6 +2573,9 @@ void Parameters::setDefaults() {
     taxonomySearchMode = Parameters::TAXONOMY_APPROX_2BLCA;
     taxonomyOutputMode = Parameters::TAXONOMY_OUTPUT_LCA;

+    // help
+    help = 0;
+
     // substituion matrix
     substitutionMatrices = {
             {"nucleotide.out", nucleotide_out, nucleotide_out_len },

lib/mmseqs/src/prefiltering/CacheFriendlyOperations.cpp

Lines changed: 47 additions & 8 deletions
@@ -36,7 +36,7 @@ CacheFriendlyOperations<BINSIZE>::~CacheFriendlyOperations<BINSIZE>(){

 template<unsigned int BINSIZE>
 size_t CacheFriendlyOperations<BINSIZE>::findDuplicates(IndexEntryLocal **input, CounterResult *output,
-    size_t outputSize, unsigned short indexFrom, unsigned short indexTo, bool computeTotalScore) {
+    size_t outputSize, unsigned short indexFrom, unsigned short indexTo, bool computeTotalScore) {
     do {
         setupBinPointer();
         CounterResult *lastPosition = (binDataFrame + BINCOUNT * binSize) - 1;
@@ -58,12 +58,16 @@ size_t CacheFriendlyOperations<BINSIZE>::mergeElementsByScore(CounterResult *inp
 }

 template<unsigned int BINSIZE>
-size_t CacheFriendlyOperations<BINSIZE>::mergeElementsByDiagonal(CounterResult *inputOutputArray, const size_t N) {
+size_t CacheFriendlyOperations<BINSIZE>::mergeElementsByDiagonal(CounterResult *inputOutputArray, const size_t N, const bool keepScoredHits) {
     do {
         setupBinPointer();
         hashElements(inputOutputArray, N);
     } while(checkForOverflowAndResizeArray(false) == true); // overflowed occurred
-    return mergeDiagonalDuplicates(inputOutputArray);
+    if(keepScoredHits){
+        return mergeDiagonalKeepScoredHitsDuplicates(inputOutputArray);
+    }else{
+        return mergeDiagonalDuplicates(inputOutputArray);
+    }
 }

 template<unsigned int BINSIZE>
@@ -93,6 +97,7 @@ size_t CacheFriendlyOperations<BINSIZE>::mergeDiagonalDuplicates(CounterResult *
             --n;
         }
         // combine diagonals
+        // we keep only the last diagonal element
         for (size_t n = 0; n < currBinSize; n++) {
             const CounterResult &element = binStartPos[n];
             const unsigned int hashBinElement = element.id >> (MASK_0_5_BIT);
@@ -109,6 +114,40 @@ size_t CacheFriendlyOperations<BINSIZE>::mergeDiagonalDuplicates(CounterResult *
     return doubleElementCount;
 }

+
+template<unsigned int BINSIZE>
+size_t CacheFriendlyOperations<BINSIZE>::mergeDiagonalKeepScoredHitsDuplicates(CounterResult *output) {
+    size_t doubleElementCount = 0;
+    const CounterResult *bin_ref_pointer = binDataFrame;
+    // duplicateBitArray is already zero'd from findDuplicates
+
+    for (size_t bin = 0; bin < BINCOUNT; bin++) {
+        const CounterResult *binStartPos = (bin_ref_pointer + bin * binSize);
+        const size_t currBinSize = (bins[bin] - binStartPos);
+        // write diagonals + 1 in reverse order in the byte array
+        for (size_t n = 0; n < currBinSize; n++) {
+            const unsigned int element = binStartPos[n].id >> (MASK_0_5_BIT);
+            duplicateBitArray[element] = static_cast<unsigned char>(binStartPos[n].diagonal) + 1;
+        }
+        // combine diagonals
+        // we keep only the last diagonal element
+        size_t n = currBinSize - 1;
+        while (n != static_cast<size_t>(-1)) {
+            const CounterResult &element = binStartPos[n];
+            const unsigned int hashBinElement = element.id >> (MASK_0_5_BIT);
+            output[doubleElementCount].id = element.id;
+            output[doubleElementCount].count = element.count;
+            output[doubleElementCount].diagonal = element.diagonal;
+            // std::cout << output[doubleElementCount].id << " " << (int)output[doubleElementCount].count << " " << (int)static_cast<unsigned char>(output[doubleElementCount].diagonal) << std::endl;
+            // memory overflow can not happen since input array = output array
+            doubleElementCount += (output[doubleElementCount].count != 0 || duplicateBitArray[hashBinElement] != static_cast<unsigned char>(binStartPos[n].diagonal)) ? 1 : 0;
+            duplicateBitArray[hashBinElement] = static_cast<unsigned char>(element.diagonal);
+            --n;
+        }
+    }
+    return doubleElementCount;
+}
+
 template<unsigned int BINSIZE>
 size_t CacheFriendlyOperations<BINSIZE>::mergeScoreDuplicates(CounterResult *output) {
     size_t doubleElementCount = 0;
@@ -211,12 +250,12 @@ size_t CacheFriendlyOperations<BINSIZE>::findDuplicates(CounterResult *output, s
             output[doubleElementCount].id = element;
             output[doubleElementCount].count = 0;
             output[doubleElementCount].diagonal = tmpElementBuffer[n].diagonal;
-            // const unsigned char diagonal = static_cast<unsigned char>(tmpElementBuffer[n].diagonal);
+            // const unsigned char diagonal = static_cast<unsigned char>(tmpElementBuffer[n].diagonal);
             // memory overflow can not happen since input array = output array
-            // if(duplicateBitArray[hashBinElement] != tmpElementBuffer[n].diagonal){
-            // std::cout << "seq="<< output[doubleElementCount].id << "\tDiag=" << (int) output[doubleElementCount].diagonal
-            // << " dup.Array=" << (int)duplicateBitArray[hashBinElement] << " tmp.Arr="<< (int)tmpElementBuffer[n].diagonal << std::endl;
-            // }
+            // if(duplicateBitArray[hashBinElement] != tmpElementBuffer[n].diagonal){
+            // std::cout << "seq="<< output[doubleElementCount].id << "\tDiag=" << (int) output[doubleElementCount].diagonal
+            // << " dup.Array=" << (int)duplicateBitArray[hashBinElement] << " tmp.Arr="<< (int)tmpElementBuffer[n].diagonal << std::endl;
+            // }
             doubleElementCount += (duplicateBitArray[hashBinElement] != static_cast<unsigned char>(tmpElementBuffer[n].diagonal)) ? 1 : 0;
             duplicateBitArray[hashBinElement] = static_cast<unsigned char>(tmpElementBuffer[n].diagonal);
         }

lib/mmseqs/src/prefiltering/CacheFriendlyOperations.h

Lines changed: 3 additions & 1 deletion
@@ -81,7 +81,7 @@ class CacheFriendlyOperations {
     size_t mergeElementsByScore(CounterResult *inputOutputArray, const size_t N);

     // merge elements in CounterResult by diagonal, combines elements with same ids that occur after each other
-    size_t mergeElementsByDiagonal(CounterResult *inputOutputArray, const size_t N);
+    size_t mergeElementsByDiagonal(CounterResult *inputOutputArray, const size_t N, const bool keepScoredHits = false);

     size_t keepMaxScoreElementOnly(CounterResult *inputOutputArray, const size_t N);

@@ -124,6 +124,8 @@ class CacheFriendlyOperations {

     size_t mergeDiagonalDuplicates(CounterResult *output);

+    size_t mergeDiagonalKeepScoredHitsDuplicates(CounterResult *output);
+
     size_t keepMaxElement(CounterResult *output);
 };

