Look into converting sequences into bit strings #22

ArtPoon · 2019-07-31T15:10:00Z

Two options:

Encode nucleotides with two bits (four states = four nucleotides)
Encode with four bits for presence/absence of each nucleotide, which enables us to encode mixtures (preferred).
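A minimal sketch of the preferred four-bit option. It assumes the flag layout A=1, C=2, G=4, T=8 and uses the standard IUPAC ambiguity codes for mixtures. `encode_4bit` is a hypothetical name and is not necessarily how the `encode` function from this issue works. Gaps and other characters are not handled. A two-bit scheme would store A, C, G, T as 0 to 3, but it has no spare states for mixtures.

```python
# Assumed bit layout: one presence/absence flag per nucleotide.
A, C, G, T = 1, 2, 4, 8

# IUPAC codes mapped to the set of nucleotides they represent.
IUPAC = {
    "A": A, "C": C, "G": G, "T": T,
    "R": A | G, "Y": C | T, "S": C | G, "W": A | T,
    "K": G | T, "M": A | C,
    "B": C | G | T, "D": A | G | T, "H": A | C | T, "V": A | C | G,
    "N": A | C | G | T,
}


def encode_4bit(seq):
    """Return one 4-bit integer per site (hypothetical helper)."""
    return [IUPAC[base] for base in seq.upper()]


print(encode_4bit("ACRN"))  # [1, 2, 5, 15]
```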

Computing the number of differences should be achievable with fast bitwise operators.
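The sketch below shows one way the differences could be counted with bitwise operators. It packs each sequence into a single Python integer, using four bits per site. Two sites count as different only when their bit sets do not overlap, so a mixture such as R (A or G) matches either A or G. The helpers `pack` and `count_differences` and the bit layout are assumptions for illustration. The sequences are assumed to be aligned, of equal length, and free of gaps.

```python
# Assumed four-bit flags per site (A=1, C=2, G=4, T=8) plus IUPAC mixtures.
CODES = {
    "A": 1, "C": 2, "G": 4, "T": 8,
    "R": 5, "Y": 10, "S": 6, "W": 9, "K": 12, "M": 3,
    "B": 14, "D": 13, "H": 11, "V": 7, "N": 15,
}


def pack(seq):
    """Pack a sequence into one int; site i occupies bits 4*i .. 4*i + 3."""
    value = 0
    for i, base in enumerate(seq.upper()):
        value |= CODES[base] << (4 * i)
    return value


def count_differences(seq1, seq2):
    """Count aligned sites whose nucleotide sets share no bit (hypothetical helper)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = len(seq1)
    if n == 0:
        return 0
    shared = pack(seq1) & pack(seq2)  # a nonzero nibble means the site matches
    # OR each nibble's four bits down onto that nibble's lowest bit.
    folded = shared | (shared >> 1) | (shared >> 2) | (shared >> 3)
    mask = int("1" * n, 16)  # 0x111...1: the lowest bit of every nibble
    matches = bin(folded & mask).count("1")
    return n - matches


print(count_differences("ACGT", "ACGA"))  # 1
print(count_differences("ACRT", "ACGT"))  # 0 (R = A or G overlaps G)
```

The whole comparison is one AND, three shifts, three ORs, and a popcount, no matter how long the sequences are. On Python 3.10+, `int.bit_count()` could replace `bin(...).count("1")`.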


kwade4 · 2019-08-07T14:11:15Z

Most of the execution time is spent bootstrapping (e.g., 151 of 164 seconds on Windows, using `randint()`), so I am unsure whether using bit strings would provide much of a performance gain.

ArtPoon · 2019-08-07T14:12:21Z

OK, but won't bootstrapping the match/mismatch binary vector instead of the sequences speed things up considerably?

ArtPoon · 2019-08-07T14:12:42Z

Also try replacing randint with random and round.
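One thing to check with the `random()` and `round` substitution is that `round(random.random() * (n - 1))` does not pick indices uniformly. Each endpoint index covers only half as wide an interval as each interior index, so the first and last sites are resampled about half as often. `int(random.random() * n)` is uniform over `0 .. n - 1` (up to floating-point granularity) and should cost about the same. The sketch below shows this deterministically. Instead of calling the random number generator, it maps an evenly spaced grid of values in [0, 1) through both expressions. The helper names are made up for illustration.

```python
from collections import Counter


def index_counts(n, grid, pick):
    """Tally the index that each grid point u = k / grid in [0, 1) maps to."""
    return Counter(pick(k / grid, n) for k in range(grid))


def pick_round(u, n):
    return round(u * (n - 1))  # expression suggested in the thread


def pick_floor(u, n):
    return int(u * n)  # uniform alternative


GRID = 4096  # a power of two, so k / GRID is exact in floating point
print(sorted(index_counts(4, GRID, pick_round).items()))
# [(0, 683), (1, 1365), (2, 1366), (3, 682)]
print(sorted(index_counts(4, GRID, pick_floor).items()))
# [(0, 1024), (1, 1024), (2, 1024), (3, 1024)]
```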

kwade4 · 2019-08-07T14:46:54Z

Yes, that would.
Regarding random and randint, I realized I misread the numbers. I updated my previous comment with the correct numbers.

Using randint: 151 of 164 seconds are spent bootstrapping (on Windows)
Using random and round: 47 of 52 seconds are spent bootstrapping (on Windows)

I also found that using random.choices to pre-compute a list of random draws, then slicing that list for each replicate, could improve performance.

Using `random.random()` and `round`

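import random

# best_seq: 0/1 match vector; nrep: number of replicates; second_p: threshold
n = len(best_seq)
count = 0
for rep in range(nrep):
    # note: round() selects the first and last sites half as often as the others
    boot = [best_seq[round(random.random() * (n - 1))] for _ in range(n)]
    if sum(boot) / n < second_p:
        count += 1
quant = count / nrep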

Total time: 52 seconds (on Windows)
Bootstrapping time: 47 seconds (on Windows)

Using `random.choices()`

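import random

n = len(best_seq)
count = 0
# draw all n * nrep resampled sites in a single call
sample = random.choices(best_seq, k=n * nrep)
for rep in range(nrep):
    # each replicate takes its own non-overlapping block of n draws
    boot = sample[rep * n:(rep + 1) * n]
    if sum(boot) / n < second_p:
        count += 1
quant = count / nrep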

Total Time: 27 seconds (on Windows)
Bootstrapping Time: 23 seconds (on Windows)

kwade4 · 2019-08-07T14:47:11Z

I think using the match/mismatch binary vector and random.choices could speed things up (even on Windows).
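Below is a minimal sketch that combines both ideas. It assumes sites count as matches only when the aligned characters are identical, so mixtures are not handled. It keeps the thread's names `best_seq`, `second_p` and `nrep`, but `bootstrap_quantile` itself is a hypothetical helper. Unlike the pre-computed list, it calls `choices` once per replicate, so memory grows with the sequence length rather than with `n * nrep`. Whether that is faster or slower than one large call would need to be timed.

```python
import random


def bootstrap_quantile(query, ref, second_p, nrep=100, seed=None):
    """Fraction of bootstrap replicates whose match proportion is below second_p.

    Hypothetical helper: a site is a match only if the characters are identical.
    """
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    if not query:
        raise ValueError("sequences must not be empty")
    rng = random.Random(seed)
    # Binary match/mismatch vector: 1 = identical site, 0 = difference.
    best_seq = [int(a == b) for a, b in zip(query, ref)]
    n = len(best_seq)
    count = 0
    for _ in range(nrep):
        # Resample n sites with replacement; summing 0/1 values counts matches.
        if sum(rng.choices(best_seq, k=n)) / n < second_p:
            count += 1
    return count / nrep
```

For example, `bootstrap_quantile("ACGTACGTAC", "ACGTACGTAA", second_p=0.5, nrep=200, seed=1)` resamples a vector with nine matches out of ten sites. Passing a `seed` makes runs reproducible when comparing timings.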

ArtPoon mentioned this issue Jul 31, 2019

riplike: script is too slow compared to LANL tool #12 (Open)

kwade4 added a commit that referenced this issue Jul 31, 2019

-Fixes #17

8c805e1

-Implemented 'encode' (#22) -Added unit test -Fixes #19

kwade4 added a commit that referenced this issue Aug 28, 2019

-Fixed encoding in riplike(#22, #25)

f3c3225
