Skip to content

Commit

Permalink
Remove lddt-html parameter, add msa2lddtreport, make lddt optional in…
Browse files Browse the repository at this point in the history
… easy-msa
  • Loading branch information
gamcil committed Mar 5, 2024
1 parent e472460 commit 4999357
Show file tree
Hide file tree
Showing 8 changed files with 62 additions and 31 deletions.
42 changes: 29 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,20 +57,19 @@ The `easy-msa` module allows you to align multiple query structures formatted in

#### Output
##### FASTA alignment
Foldmason generates alignments in FASTA-format.
Alignments using the 3Di alphabet can be generated by specifying the output mode `--output-mode 1`.
Foldmason generates alignments in FASTA-format, with both amino acid and 3Di alphabets (`_aa.fa` and `_3di.fa` suffixes, respectively).

##### Interactive HTML
Foldmason can generate a HTML MSA visualisation by specifying `--lddt-html` in the `msa2lddt` module.
Foldmason can generate a HTML MSA visualisation using the `msa2lddtreport` module.

```
foldmason msa2lddt myDb result.fasta --lddt-html result.html
foldmason msa2lddtreport myDb result.fa result.html
```

This is generated automatically when using the `easy-msa`. The following will produce `result.fasta` and `result.html`.

```
foldmason easy-msa <PDB/mmCIF files> result.fasta tmpFolder
foldmason easy-msa <PDB/mmCIF files> result tmpFolder
```

<p align="center"><img src="./.github/html.gif" height="400"/></p>
Expand Down Expand Up @@ -103,33 +102,50 @@ foldmason createdb example/ structureDB
The easiest way to use Foldmason is to use the `easy-msa` workflow like so:

```
foldmason easy-msa <PDB/mmCIF files> result.fasta tmpFolder
foldmason easy-msa <PDB/mmCIF files> result tmpFolder
```

By default, `easy-msa` produces multiple alignments in FASTA format (`result_aa.fa` and `result_3di.fa` for amino acid and 3Di alphabets, respectively),
as well as a Newick format tree (`result.nw`). This is equivalent to the following sequence of commands:

```
foldmason createdb <PDB/mmCIF files> myDb
foldmason structuremsa myDb result
```

`easy-msa` can also compute the average LDDT score of the alignment and generate the interactive HTML visualisation by specifying
`--report-mode 1`, like so:

```
foldmason easy-msa <PDB/mmCIF files> result tmpFolder --report-mode 1
```

By default, `easy-msa` produces the multiple alignment in FASTA format (`result.fasta`), as well as the interactive HTML visualisation (`result.html`).
This is the equivalent of the following sequence of commands:

```
foldmason createdb <PDB/mmCIF files> myDb
foldmason structuremsa myDb result.fasta
foldmason msa2lddt myDb result.fasta --lddt-html result.html
foldmason structuremsa myDb result
foldmason msa2lddtreport myDb result_aa.fa result.html --guide-tree result.nw
```

Note: the generated guide tree is passed to `msa2lddtreport` to display it inside the HTML report.

### Aligning large data sets
Foldmason can use the clustering capabilities of Foldseek to pre-cluster input structures before alignment by specifying `--precluster`,
allowing for alignments of large sets of proteins.

```
foldmason easy-msa <PDB/mmCIF files> result.fasta tmpFolder --precluster
foldmason easy-msa <PDB/mmCIF files> result tmpFolder --precluster
```

### Computing LDDT of an externally created MSA
The `msa2lddt` module computes an average [Local Distance Difference Test (LDDT) score](https://doi.org/10.1093/bioinformatics/btt473)
over the length of an MSA. This is done automatically in the `easy-msa` workflow, but `msa2lddt` can be called separately to compute
the LDDT of any given alignment, so long as the structures in the MSA are present in the given structure database.
over the length of an MSA. This can be done automatically in the `easy-msa` workflow by specifying `--report-mode 1`, but
`msa2lddt` can be called separately to compute the LDDT of any given alignment, so long as the structures in the MSA are present in the given structure database:

```
foldmason msa2lddt myDb result.fasta --lddt-html result.html
foldmason msa2lddt myDb result.fa
foldmason msa2lddtreport myDb result.fa result.html
```

Average MSA LDDT is calculated by computing per-column LDDT scores of every pair of sequences in the MSA, and then averaging them over the length of the MSA.
Expand Down
2 changes: 1 addition & 1 deletion regression/regression/run_easymsa.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
#!/bin/sh -ex
"$FOLDMASON" easy-msa --threads 1 $(find "$DATADIR" -type f -name 'd*') "${RESULTS}/easymsa" "${RESULTS}/tmp" > "${RESULTS}/easymsa.log"
"$FOLDMASON" easy-msa --report-mode 1 --threads 1 $(find "$DATADIR" -type f -name 'd*') "${RESULTS}/easymsa" "${RESULTS}/tmp" > "${RESULTS}/easymsa.log"
TARGET="0.698404"
awk -v target="$TARGET" \
'/Average MSA LDDT/ {print ($4 == target) ? "GOOD" : "BAD"; print "Expected: ", target; print "Actual: ", $4; }' \
Expand Down
1 change: 1 addition & 0 deletions src/LocalCommandDeclarations.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ extern int structureungappedalign(int argc, const char** argv, const Command &co
extern int structuremsa(int argc, const char** argv, const Command &command);
extern int structuremsacluster(int argc, const char** argv, const Command &command);
extern int msa2lddt(int argc, const char** argv, const Command &command);
extern int msa2lddtreport(int argc, const char** argv, const Command &command);
extern int refinemsa(int argc, const char** argv, const Command &command);
extern int convert2pdb(int argc, const char** argv, const Command &command);
extern int compressca(int argc, const char** argv, const Command &command);
Expand Down
5 changes: 3 additions & 2 deletions src/commons/FoldmasonParameters.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,6 @@ FoldmasonParameters::FoldmasonParameters() :
PARAM_REFINE_ITERS(PARAM_REFINE_ITERS_ID, "--refine-iters", "Total refinement iterations", "Number of alignment refinement iterations", typeid(int), (void *) &refineIters, "[0-9]{1}[0-9]*$"),
PARAM_BITFACTOR_AA(PARAM_BITFACTOR_AA_ID, "--bitfactor-aa", "AA matrix bit factor", "AA matrix bit factor", typeid(float), (void *) &bitFactorAa, "^([0-9]*\\.[0-9]*)$"),
PARAM_BITFACTOR_3DI(PARAM_BITFACTOR_3DI_ID, "--bitfactor-3di", "3Di matrix bit factor", "3Di matrix bit factor", typeid(float), (void *) &bitFactor3Di, "^([0-9]*\\.[0-9]*)$"),
PARAM_LDDT_HTML(PARAM_LDDT_HTML_ID, "--lddt-html", "LDDT HTML file", "File to write LDDT MSA HTML visualisation to", typeid(std::string), (void *) &lddtHtml, ""),
PARAM_PAIR_THRESHOLD(PARAM_PAIR_THRESHOLD_ID, "--pair-threshold", "LDDT pair threshold", "% of pair subalignments with LDDT information [0.0,1.0]",typeid(float), (void *) &pairThreshold, "^0(\\.[0-9]+)?|1(\\.0+)?$"),
PARAM_REPORT_COMMAND(PARAM_REPORT_COMMAND_ID, "--report-command", "", "", typeid(std::string), (void *) &reportCommand, "")
{
Expand Down Expand Up @@ -53,7 +52,6 @@ FoldmasonParameters::FoldmasonParameters() :
structuremsacluster = combineList(structuremsacluster, structuremsa);
// msa2lddt
msa2lddt.push_back(&PARAM_LDDT_HTML);
msa2lddt.push_back(&PARAM_PAIR_THRESHOLD);
msa2lddt.push_back(&PARAM_THREADS);
msa2lddt.push_back(&PARAM_GUIDE_TREE);
Expand All @@ -63,10 +61,13 @@ FoldmasonParameters::FoldmasonParameters() :
// refinemsa
refinemsa = combineList(refinemsa, structuremsa);
// easymsa
PARAM_REPORT_MODE.description = "MSA report mode 0: AA/3Di FASTA files only, 1: Compute LDDT and generate HTML report";
easymsaworkflow = combineList(easymsaworkflow, structurecreatedb);
easymsaworkflow = combineList(easymsaworkflow, structuremsa);
easymsaworkflow = combineList(easymsaworkflow, msa2lddt);
easymsaworkflow.push_back(&PARAM_PRECLUSTER);
easymsaworkflow.push_back(&PARAM_REPORT_MODE);
pcaAa = 1.1;
pcbAa = 4.1;
Expand Down
2 changes: 0 additions & 2 deletions src/commons/FoldmasonParameters.h
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,6 @@ class FoldmasonParameters : public LocalParameters {
PARAMETER(PARAM_REFINE_ITERS)
PARAMETER(PARAM_BITFACTOR_AA)
PARAMETER(PARAM_BITFACTOR_3DI)
PARAMETER(PARAM_LDDT_HTML)
PARAMETER(PARAM_PAIR_THRESHOLD)
PARAMETER(PARAM_REPORT_COMMAND)

Expand All @@ -61,7 +60,6 @@ class FoldmasonParameters : public LocalParameters {
int refineIters;
float bitFactorAa;
float bitFactor3Di;
std::string lddtHtml;
std::string reportCommand;
float pairThreshold;
};
Expand Down
8 changes: 8 additions & 0 deletions src/foldmason.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,14 @@ std::vector<struct Command> foldmasonCommands = {
"<i:queryDB> <i:msaFile>",
CITATION_FOLDMASON, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
{"msaFile", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::flatfileAndStdin },}},
{"msa2lddtreport", msa2lddtreport, &foldmasonPar.msa2lddt, COMMAND_ALIGNMENT,
"Calculate LDDT of a MSA and generate HTML report",
NULL,
"Cameron Gilchrist <[email protected]> & Martin Steinegger <[email protected]>",
"<i:queryDB> <i:msaFile> <o:htmlFile>",
CITATION_FOLDMASON, {{"queryDB", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::sequenceDb },
{"msaFile", DbType::ACCESS_MODE_INPUT, DbType::NEED_DATA, &DbValidator::flatfileAndStdin },
{"htmlFile", DbType::ACCESS_MODE_OUTPUT, DbType::NEED_DATA, &DbValidator::flatfile }}},
{"refinemsa", refinemsa, &foldmasonPar.refinemsa, COMMAND_ALIGNMENT,
"Iteratively refine a MSA using structural information",
NULL,
Expand Down
18 changes: 12 additions & 6 deletions src/strucclustutils/msa2lddt.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -359,7 +359,7 @@ float getLDDTScore(
return lddtScore;
}

int msa2lddt(int argc, const char **argv, const Command& command) {
int msa2lddt(int argc, const char **argv, const Command& command, bool makeReport) {
FoldmasonParameters &par = FoldmasonParameters::getFoldmasonInstance();

const bool touch = (par.preloadMode != Parameters::PRELOAD_MODE_MMAP);
Expand Down Expand Up @@ -399,10 +399,8 @@ int msa2lddt(int argc, const char **argv, const Command& command) {
std::cout << "Average MSA LDDT: " << lddtScore << std::endl;

// Write clustal format MSA HTML
// TODO: make optional
if (par.lddtHtml != "") {
std::string lddtHtmlIdx = par.lddtHtml + ".index";
DBWriter resultWriter(par.lddtHtml.c_str(), lddtHtmlIdx.c_str(), static_cast<unsigned int>(par.threads), par.compressed, Parameters::DBTYPE_OMIT_FILE);
if (makeReport) {
DBWriter resultWriter(par.db3.c_str(), (par.db3 + ".index").c_str(), static_cast<unsigned int>(par.threads), par.compressed, Parameters::DBTYPE_OMIT_FILE);
resultWriter.open();

/*
Expand Down Expand Up @@ -518,7 +516,7 @@ R"html(<!DOCTYPE html>
resultWriter.writeData(end.c_str(), end.length(), 0, 0, false, false);
resultWriter.writeEnd(0, 0, false, 0);
resultWriter.close(true);
FileUtil::remove(lddtHtmlIdx.c_str());
FileUtil::remove((par.db3 + ".index").c_str());
}

if (par.reportCommand != "") {
Expand All @@ -530,4 +528,12 @@ R"html(<!DOCTYPE html>
seqDbr3Di.close();

return EXIT_SUCCESS;
}


int msa2lddt(int argc, const char **argv, const Command& command) {
return msa2lddt(argc, argv, command, false);
}
int msa2lddtreport(int argc, const char **argv, const Command& command) {
return msa2lddt(argc, argv, command, true);
}
15 changes: 8 additions & 7 deletions src/workflow/EasyMSA.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,7 @@ int easymsa(int argc, const char **argv, const Command &command) {
cmd.addVariable("THREADS_PAR", par.createParameterString(par.onlythreads).c_str());
cmd.addVariable("VERBOSITY", par.createParameterString(par.onlyverbosity).c_str());
cmd.addVariable("PRECLUSTER", par.precluster ? "TRUE" : NULL);
cmd.addVariable("MAKE_REPORT", (par.reportMode == 1) ? "TRUE" : NULL);
cmd.addVariable("CREATEDB_PAR", par.createParameterString(par.structurecreatedb).c_str());

// needs to be last
Expand All @@ -101,13 +102,13 @@ int easymsa(int argc, const char **argv, const Command &command) {
par.PARAM_MATCH_RATIO.wasSet = true;
par.PARAM_FILTER_MSA.wasSet = true;
par.reportCommand = par.createParameterString(par.easymsaworkflow, true);
for (size_t i = 0; i < par.msa2lddt.size(); i++) {
if (par.msa2lddt[i]->uniqid != par.PARAM_LDDT_HTML.uniqid &&
par.msa2lddt[i]->uniqid != par.PARAM_GUIDE_TREE.uniqid) {
msa2lddtWithoutHtml.push_back(par.msa2lddt[i]);
}
}
cmd.addVariable("MSA2LDDT_PAR", par.createParameterString(msa2lddtWithoutHtml).c_str());
// for (size_t i = 0; i < par.msa2lddt.size(); i++) {
// if (par.msa2lddt[i]->uniqid != par.PARAM_LDDT_HTML.uniqid &&
// par.msa2lddt[i]->uniqid != par.PARAM_GUIDE_TREE.uniqid) {
// msa2lddtWithoutHtml.push_back(par.msa2lddt[i]);
// }
// }
// cmd.addVariable("MSA2LDDT_PAR", par.createParameterString(msa2lddtWithoutHtml).c_str());

std::string program = tmpDir + "/easymsa.sh";
FileUtil::writeFile(program, easymsa_sh, easymsa_sh_len);
Expand Down

0 comments on commit 4999357

Please sign in to comment.