TODO.md

In Progress

  • Tests
    • Workflow test with example data
    • Trivial examples for each function
    • Unit tests for SSI
    • Unit tests for density features
  • Integrate DiffNets.
    • Lay out module structure in separate branch.
    • Copy core network from DiffNets repo.
    • Try to use existing featurization.
    • Include existing DiffNets featurization and compare.
  • Exploratory analysis via correlation coefficients of the features
    • First tests --> not very promising.
    • Try a different metric.
    • Find useful application or leave it out.
  • Unified tutorial in documentation. Make one page for each subpackage:
    • preprocessing
      • coordinates
      • densities
    • featurization
      • structure features
      • water features
      • atom features
    • comparison
    • dimensionality reduction
    • clusters (show how to cluster on PCs)
    • SSI
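
For the exploratory correlation analysis listed above, a minimal sketch of two candidate metrics could look like this. It assumes the features are available as a NumPy array of shape (n_frames, n_features). The function names `feature_correlations` and `rank_correlations` are hypothetical and not part of the package. Pearson correlation only captures linear relationships, so rank (Spearman) correlation is shown as one possible alternative metric.

```python
import numpy as np


def feature_correlations(data):
    """Pearson correlation matrix between feature columns.

    data: array of shape (n_frames, n_features). Hypothetical helper.
    """
    data = np.asarray(data, dtype=float)
    return np.corrcoef(data, rowvar=False)


def rank_correlations(data):
    """Spearman correlation as the Pearson correlation of per-feature ranks.

    Ties are not averaged here, which is fine for continuous features.
    """
    data = np.asarray(data, dtype=float)
    # argsort of argsort turns each column into ranks 0..n_frames-1
    ranks = data.argsort(axis=0).argsort(axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


# Synthetic stand-in for a feature matrix (not real simulation data):
# feature 1 depends monotonically but non-linearly on feature 0,
# feature 2 is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
features = np.column_stack([x, x**3, rng.normal(size=500)])

pearson = feature_correlations(features)
spearman = rank_correlations(features)
```

In this toy case the rank correlation between the first two features is exactly one, while the Pearson coefficient is noticeably lower. That difference is the kind of effect a change of metric might reveal.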

Plans

  • More example tcl scripts for VMD
  • Facilitate calculation of JSD etc. on principal components
  • Facilitate calculation of SSI on results of joint clustering.
  • Feature comparison of more than two ensembles
    • with respect to the joint ensemble (all metrics)
    • with respect to a reference ensemble (will not always work for KLD, which diverges when the second distribution has zero probability in a populated bin)
  • Use MDAnalysis instead of biotite for water featurization
  • Use MDAnalysis instead of PyEMMA to read features (to avoid mmshare dependency).
  • Use scikit-learn or Deeptime instead of PyEMMA for clustering.
  • Use scikit-learn or Deeptime instead of PyEMMA for dimensionality reduction.
  • Put shared functionality of PCA and TICA into shared functions.
  • Weighted PCA/tICA? (to account for varying simulation lengths or uncertainty)
  • Implement T-distributed Stochastic Neighbor Embedding (t-SNE)
    • Read up on t-SNE for molecular trajectories
    • See if we can import or adapt existing code.
    • First tests with (regular) t-SNE
    • Test time-lagged t-SNE. How to handle time-dependence across simulations/ensembles?
    • Write module
    • Write unit tests
  • Implement a clustering algorithm designed for structural ensembles
    • Read up about CLoNe
    • First tests
    • Write module
    • Write unit tests
  • Make file format (png/pdf?) for matplotlib optional.
  • Implement Linear Discriminant Analysis.
  • Implement Non-Negative Matrix Factorization.
  • Implement nucleic acid torsions and pseudo-torsions, as reviewed by Keating et al. and as used in x3DNA or Barnaba (Barnaba code is on GitHub)
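
As a rough illustration of the planned JSD calculation on principal components, the sketch below projects two ensembles onto the principal axes of their joint data and computes the Jensen-Shannon divergence for each component. It uses plain NumPy (SVD-based PCA and histograms on shared bins) instead of PyEMMA. The names `joint_pca` and `jsd_1d`, the bin count, and the synthetic data are all assumptions for illustration, not the package's API.

```python
import numpy as np


def joint_pca(data_a, data_b, n_components=2):
    """Project two ensembles onto the principal axes of their joint data."""
    joint = np.vstack([data_a, data_b])
    mean = joint.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(joint - mean, full_matrices=False)
    axes = vt[:n_components]
    return (data_a - mean) @ axes.T, (data_b - mean) @ axes.T


def jsd_1d(x_a, x_b, bins=30):
    """Jensen-Shannon divergence (in bits) of two samples on shared bins."""
    lo = min(x_a.min(), x_b.min())
    hi = max(x_a.max(), x_b.max())
    edges = np.linspace(lo, hi, bins + 1)
    p = np.histogram(x_a, bins=edges)[0].astype(float)
    q = np.histogram(x_b, bins=edges)[0].astype(float)
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)

    def kld(u, v):
        # Only bins with u > 0 contribute; v = m is positive there.
        mask = u > 0
        return np.sum(u[mask] * np.log2(u[mask] / v[mask]))

    return 0.5 * kld(p, m) + 0.5 * kld(q, m)


# Synthetic stand-ins for two ensembles (n_frames x n_features);
# the second is shifted along the first feature only.
rng = np.random.default_rng(1)
ens_a = rng.normal(size=(400, 5))
ens_b = rng.normal(size=(400, 5))
ens_b[:, 0] += 4.0

proj_a, proj_b = joint_pca(ens_a, ens_b, n_components=2)
jsd_per_pc = [jsd_1d(proj_a[:, i], proj_b[:, i]) for i in range(2)]
```

Using the mixture distribution as the reference keeps the divergence finite and bounded between 0 and 1 bit, even when one ensemble leaves some bins empty.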

Ideas

Done ✓

  • Colab Tutorial
    • Put Notebook on Colab and get it to run.
    • Add visualizations.
    • Fix installation via pip.
    • Fix animations (they only show white canvas).
    • Add TICA to Colab tutorial.
  • Include TICA in unit tests
  • Write "getting started" for documentation
  • Refactoring and fixes for release 0.2
    • Restructure modules to subpackages
    • Adapt README
    • Adapt API documentation
    • Include SSI in comparison example script
    • Numbering of principal component trajectories starts with 0, should start with 1
    • Axis labels and legend name for distance matrix plot
    • Function pca_features() does not have labels
    • Function compare_projections() does not have labels or legend
  • Slack channel for all developers and testers, and to provide support for the user community.
  • Implement clustering in principal component space
  • Option to write and load features as CSV file.

Abandoned

  • Frame classification via CNN on features
    • Prototype to classify simulation frames --> DiffNets probably more powerful.
    • Interpret weights as relevance of features
    • Write module
    • Write unit tests