-This package provides a subroutine that loads the DNA sequences from a specified FASTA file. The sequences are then transformed into representations useful for downstream machine learning tasks, e.g. one-hot/WYK encoded vectors, shuffled sequences, Markov background estimates, and K-fold cross-validation splits. Currently, the subroutine requires that all sequences in the FASTA file have the same length and that every sequence is a string over the DNA alphabet `{A,C,G,T}`.
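
To illustrate the two input requirements (equal sequence length and the `{A,C,G,T}` alphabet), here is a minimal Python sketch of the load, validate, and one-hot encode steps. It is not this package's implementation or API: the function names `parse_fasta`, `validate_sequences`, and `one_hot` are hypothetical, and the column order `A, C, G, T` and the upper-casing of input are assumptions made for the example.

```python
from typing import List, Tuple

DNA_ALPHABET = "ACGT"  # assumed column order for the one-hot encoding


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may wrap; upper-casing is an assumption here.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def validate_sequences(seqs: List[str]) -> None:
    """Raise ValueError unless all sequences share one length and use only A, C, G, T."""
    if not seqs:
        raise ValueError("no sequences found")
    expected = len(seqs[0])
    for i, seq in enumerate(seqs):
        if len(seq) != expected:
            raise ValueError(f"sequence {i} has length {len(seq)}, expected {expected}")
        invalid = set(seq) - set(DNA_ALPHABET)
        if invalid:
            raise ValueError(f"sequence {i} contains non-DNA characters: {sorted(invalid)}")


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as one row per position, one column per letter."""
    return [[1 if base == letter else 0 for letter in DNA_ALPHABET] for base in seq]


example = ">seq1\nACGT\n>seq2\nTT\nGA\n"
records = parse_fasta(example)
sequences = [seq for _, seq in records]
validate_sequences(sequences)
encoded = [one_hot(seq) for seq in sequences]
```

Validating before encoding means a file with mixed lengths or ambiguous bases such as `N` fails early with a clear message, rather than producing ragged or all-zero rows downstream.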