Expected Behavior
createtsv on a cluster db should return clusters whose representatives (column 1) are more representative of the full cluster (column 2).
This could be a centroid, or at least a sequence with bidirectional hits to relatively many different cluster members.
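To make the "bidirectional hits" idea concrete, here is a minimal Python sketch of one way such a representative could be picked. It is not how foldseek selects representatives. The function name `pick_representative` and the input format (a list of member ids plus a list of `(query, target)` hit pairs taken from some all-vs-all search of the cluster) are assumptions made for illustration only.

```python
from collections import defaultdict


def pick_representative(members, hits):
    """Return the member with the most reciprocal (bidirectional) hits.

    members: iterable of sequence ids belonging to one cluster.
    hits: iterable of (query, target) pairs; pairs that involve ids
          outside the cluster, and self hits, are ignored.
    Hypothetical helper for illustration, not part of foldseek.
    """
    members = set(members)
    hit_set = {
        (q, t) for q, t in hits
        if q in members and t in members and q != t
    }

    # Count, for each member, how many partners hit it back.
    reciprocal = defaultdict(int)
    for q, t in hit_set:
        if (t, q) in hit_set:
            reciprocal[q] += 1

    # Iterate in sorted order so that ties are broken deterministically
    # (max returns the first maximal element it sees).
    return max(sorted(members), key=lambda m: reciprocal[m])
```

A true centroid would need pairwise scores or distances rather than plain hit pairs; counting reciprocal hits is only the cheaper of the two options mentioned above.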
Current Behavior
The tsv file written by createtsv often lists outliers as representatives.
Steps to Reproduce (for bugs)
After a successful foldseek cluster run, execute:
foldseek createtsv DB DB DB_C DB_C.tsv
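To check which members end up under which representative, the resulting tsv can be grouped by its first column. The sketch below assumes only what is described in this issue, namely that column 1 holds the representative and column 2 a cluster member, separated by tabs. The helper name `read_clusters` is hypothetical.

```python
import csv
from collections import defaultdict


def read_clusters(lines):
    """Group cluster members (column 2) by representative (column 1).

    lines: any iterable of tab-separated text lines, e.g. an open file.
    Rows with fewer than two columns are skipped.
    """
    clusters = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue
        representative, member = row[0], row[1]
        clusters[representative].append(member)
    return dict(clusters)


# Example use on the file from the command above:
# with open("DB_C.tsv", newline="") as handle:
#     clusters = read_clusters(handle)
```

The returned dict maps each representative to its list of members, which makes it easy to pull out one cluster and inspect whether its representative looks like an outlier.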
Context
Current behaviour is problematic for two reasons:
1. When clustering is used to reduce redundancy, the non-redundant set represents the full set poorly.
2. When outputting cluster MSAs with result2msa, the MSAs are of poor quality because an outlier serves as the a3m reference.
Both of the above lead to excessive information-loss as a result of clustering, which can be avoided by selecting a more appropriate cluster representative.
shiraz-shah changed the title (Dec 17, 2024)
from: foldseek createtsv for a cluster db outputs a tsv file where cluster "representatives" are often outliers instead of centroids
to: foldseek createtsv on a clustering result outputs a tsv file where "representatives" are outliers, not centroids
shiraz-shah changed the title (Dec 17, 2024)
from: foldseek createtsv on a clustering result outputs a tsv file where "representatives" are outliers, not centroids
to: foldseek createtsv on a clustering result outputs a tsv file where representatives are outliers, not centroids