how to train my own dataset ? #10

Open
cdongxian opened this issue Oct 28, 2024 · 1 comment

Comments

@cdongxian

No description provided.

@agadetsky
Collaborator

Dear @cdongxian,

To add your own dataset, implement its initialization pipeline in the get_datasets function:

def get_datasets(dataset, transform, root_dir='./data'):
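As a rough illustration of what such a branch might look like, here is a minimal sketch. Only the signature comes from the line above; the dataset name `mydataset`, the folder layout, the use of torchvision's `ImageFolder`, and the `(train, val)` return value are all assumptions, so check the existing branches in the repo and mirror what they do.

```python
import os


def get_datasets(dataset, transform, root_dir='./data'):
    """Sketch only: the real function in the repo already handles its built-in datasets."""
    if dataset == 'mydataset':  # hypothetical name for your own dataset
        # Imported lazily so the sketch can be read without torchvision installed.
        from torchvision.datasets import ImageFolder

        # Assumed layout: <root_dir>/mydataset/{train,val}/<class_name>/<image files>
        train = ImageFolder(os.path.join(root_dir, 'mydataset', 'train'), transform=transform)
        val = ImageFolder(os.path.join(root_dir, 'mydataset', 'val'), transform=transform)
        # Assumption: the repo's function returns a (train, val) pair.
        return train, val
    raise ValueError(f'Unknown dataset: {dataset}')
```

`ImageFolder` expects one subfolder per class, so for unlabeled images a single placeholder subfolder would be needed; any other `torch.utils.data.Dataset` that applies `transform` to each image should work just as well.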

Also, don't forget to specify the number of classes of your newly added dataset in the datasets_to_c dictionary. You can use the ground-truth number or, if you don't know it, a meaningful estimate of the number of clusters in your dataset.

datasets_to_c = {
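For example, registering the new dataset could look like the sketch below. The key `mydataset` and the value `10` are placeholders, not values from the repo; the key presumably has to match the name you handle in get_datasets.

```python
datasets_to_c = {
    # ... the entries already present in the repo stay unchanged ...
    'mydataset': 10,  # hypothetical: ground-truth class count, or your estimate of the cluster count
}

# Look the value up by the same dataset name.
num_classes = datasets_to_c['mydataset']
```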

After that, follow the README instructions in the repo to precompute representations. If you have ground-truth labels for your dataset, also run precompute_labels.py to precompute them; the run_turtle.py script uses them for evaluation. If you don't have labels, follow the solutions in the similar issues #1 and #5.

Once everything above is prepared, you can run training with the run_turtle.py script, specifying your dataset on the command line.

Let me know if that has resolved your issue.

Best,
Artyom
