Commit

BUG: Fix minor details for reproducibility
millanp95 committed Aug 30, 2024
1 parent 459ffa6 commit 1e80d54
Showing 3 changed files with 12 additions and 11 deletions.
1 change: 1 addition & 0 deletions CC_requirements.txt
@@ -15,6 +15,7 @@ scipy==1.12+computecanada
seaborn
torch==2.1.1+computecanada
torchtext==0.16.1+computecanada
+torchvision==0.16.1+computecanada
transformers==4.29.2+computecanada
umap-learn==0.5.6+computecanada
wandb
20 changes: 10 additions & 10 deletions README.md
@@ -11,13 +11,13 @@ Currently, we leverage the release of the [BEND](https://github.com/frederikkema
* The nucleotide transformer (NT)

| Model | Architecture | SSL-Pretraining | Tokens seen | **Seen: Species** (Fine-tuned ) | Linear probe **Seen: Species** (Linear Probe) | **Unseen: Genus** (1NN-Probe) |
--------------|--------------|-----------------|---------------|:-----------------:|:-------------:|:----------------:|
+-------------|--------------|-----------------|---------------|:-----------------:|:-------------:|:----------------:|
CNN baseline | CNN | -- | -- | 97.70 | -- | 29.88
-NT | Transformer | Multi-Species | 300\,B | 98.99 | 52.41 | 21.67
-DNABERT-2 | Transformer | Multi-Species | 512\,B | **99.23** | 67.81 | 17.99
-DNABERT-S | Transformer | Multi-Species | ~1,000\,B | 98.99 | **95.50** | 17.70
-HyenaDNA | SSM | Human DNA | 5\,B | 98.71 | 54.82 | 19.26
-BarcodeBERT | Transformer | DNA barcodes | 5\,B | 98.52 | 91.93 | 23.15
+NT | Transformer | Multi-Species | 300\,B | 98.99 | 52.41 | 21.67
+DNABERT-2 | Transformer | Multi-Species | 512\,B | **99.23** | 67.81 | 17.99
+DNABERT-S | Transformer | Multi-Species | ~1,000\,B | 98.99 | **95.50** | 17.70
+HyenaDNA | SSM | Human DNA | 5\,B | 98.71 | 54.82 | 19.26
+BarcodeBERT | Transformer | DNA barcodes | 5\,B | 98.52 | 91.93 | 23.15
Ours (8-4-4) | Transformer | DNA barcodes | 7\,B | **99.28** | 94.47 | **47.03**
BLAST* | -- | -- | -- | **99.78** | --- | **58.74**

@@ -43,9 +43,9 @@ python data_split.py BIOSCAN-5M_Dataset_metadata.tsv

```bash
python barcodebert/pretraining.py --dataset=BIOSCAN-5M --k_mer=8 --n_layers=4 --n_heads=4 --data_dir=data/ --checkpoint=model_checkpoints/BIOSCAN-5M/8_4_4/checkpoint_pretraining.pt
-python barcodebert/knn_probing.py --data_dir=data/ --checkpoint=model_checkpoints/BIOSCAN-5M/8_4_4/checkpoint_pretraining.pt
-python barcodebert/finetuning.py --data_dir=data/ --checkpoint=model_checkpoints/BIOSCAN-5M/8_4_4/checkpoint_pretraining.pt
-python barcodebert/finetuning.py --data_dir=data/ --checkpoint=model_checkpoints/BIOSCAN-5M/8_4_4/checkpoint_pretraining.pt --freeze-encoder
+python barcodebert/knn_probing.py --data_dir=data/ --pretrained_checkpoint=model_checkpoints/BIOSCAN-5M/8_4_4/checkpoint_pretraining.pt
+python barcodebert/finetuning.py --data_dir=data/ --pretrained_checkpoint=model_checkpoints/BIOSCAN-5M/8_4_4/checkpoint_pretraining.pt
+python barcodebert/finetuning.py --data_dir=data/ --pretrained_checkpoint=model_checkpoints/BIOSCAN-5M/8_4_4/checkpoint_pretraining.pt --freeze-encoder
```

4. Baseline model pipelines: The desired backbone can be selected using one of the following keywords: `NT, Hyena_DNA, DNABERT-2, DNABERT-S`
@@ -148,4 +148,4 @@ If there are any corrections automatically made by pre-commit or corrections you
pip install -r requirements.txt
```
---!>
+--!>
2 changes: 1 addition & 1 deletion baselines/datasets.py
@@ -76,7 +76,7 @@ def representations_from_df(filename, embedder, batch_size=128):
os.mkdir(backbone_folder)

# Check if the embeddings have been saved for that file
-prefix = filename.split(".")[0].split("/")[-1]
+prefix = filename.split("/")[-1].split(".")[0]
out_fname = f"{os.path.join(backbone_folder, prefix)}.pickle"
print(out_fname)
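
Note on the `prefix` change in `baselines/datasets.py`: the two expressions agree for a simple relative path, but they can diverge as soon as a `.` appears anywhere before the file name. The sketch below is only an illustration of that difference; the helper names `prefix_old` and `prefix_new` and the example paths are hypothetical and do not appear in the repository.

```python
def prefix_old(filename):
    # Expression before this commit: split on "." first,
    # then keep the last "/"-separated component.
    return filename.split(".")[0].split("/")[-1]


def prefix_new(filename):
    # Expression after this commit: keep the last "/"-separated
    # component first, then drop everything after the first ".".
    return filename.split("/")[-1].split(".")[0]


# Hypothetical paths, chosen only to show where the two versions differ.
paths = [
    "data/supervised_train.csv",       # no "." before the file name
    "./data/supervised_train.csv",     # leading "./"
    "data/v1.0/supervised_train.csv",  # "." inside a directory name
]

for path in paths:
    print(f"{path!r}: old={prefix_old(path)!r} new={prefix_new(path)!r}")
```

With the old expression, a leading `./` yields an empty prefix and a dotted directory name yields part of that directory name, so the pickle file name built from `prefix` no longer identifies the input file. `pathlib.PurePath(filename).stem` would be a more portable alternative, with the caveat that it strips only the final suffix.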
