How to train on my own dataset? #10

cdongxian · 2024-10-28T02:54:59Z

No description provided.

agadetsky · 2024-10-28T09:38:14Z

To add your own dataset, you have to implement the dataset initialization pipeline in the get_datasets function:

turtle/dataset_preparation/data_utils.py

Line 70 in 9b8bbb7

def get_datasets(dataset, transform, root_dir='./data'):
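As a rough illustration of what such an initialization pipeline could look like, here is a minimal sketch for an image dataset stored as one subfolder per class. The names my_dataset, my_dataset_dirs and build_my_dataset, the train/val folder layout, and the (train, val) return pair are all assumptions for illustration, not code from the repo; check what the existing branches of get_datasets return and match that.

```python
import os


def my_dataset_dirs(root_dir="./data", name="my_dataset"):
    """Return the (train, val) folders for a hypothetical dataset layout.

    Assumed layout (not taken from the repo):
        <root_dir>/<name>/train/<class_name>/*.jpg
        <root_dir>/<name>/val/<class_name>/*.jpg
    """
    base = os.path.join(root_dir, name)
    return os.path.join(base, "train"), os.path.join(base, "val")


def build_my_dataset(transform, root_dir="./data"):
    """Build train/val datasets from class-per-subfolder image directories."""
    # Imported lazily so the path helper above works without torchvision.
    from torchvision.datasets import ImageFolder

    train_dir, val_dir = my_dataset_dirs(root_dir)
    train = ImageFolder(train_dir, transform=transform)
    val = ImageFolder(val_dir, transform=transform)
    return train, val
```

A new branch in get_datasets could then call build_my_dataset(transform, root_dir) when dataset equals your dataset's name.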

Also, don't forget to specify the number of classes of your newly added dataset in the datasets_to_c dictionary. You can specify the ground truth number or, if you don't know it, a meaningful estimate of the number of clusters in your dataset.

turtle/utils.py

Line 99 in 9b8bbb7

datasets_to_c = {
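For example, the new entry might look like the sketch below. The key my_dataset, the toy label list, and the helper estimate_num_classes are hypothetical; presumably the key has to match the dataset name you handle in get_datasets, and the existing entries of the dictionary stay unchanged.

```python
def estimate_num_classes(labels):
    """Count distinct labels; usable only when ground truth labels exist."""
    return len(set(labels))


# Toy labels standing in for your own ground truth (hypothetical data).
example_labels = ["cat", "dog", "bird", "cat"]

datasets_to_c = {
    # ... existing entries from turtle/utils.py stay as they are ...
    "my_dataset": estimate_num_classes(example_labels),
}
```

Without ground truth labels, replace the computed value with your own estimate of the number of clusters.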

After that, follow the README instructions in the repo to precompute representations. If you have ground truth labels for your dataset, also use precompute_labels.py to precompute the labels, which the run_turtle.py script uses for evaluation. If you don't have them, follow the solutions from the similar issues #1 and #5.

Once everything above is prepared, you can run training with the run_turtle.py script, specifying your dataset on the command line.

Let me know if that has resolved your issue.

Best,
Artyom
