Poisoning Vulnerability on COYO-700M #12

Open
carlini opened this issue Jan 6, 2023 · 2 comments

carlini commented Jan 6, 2023

With some coauthors at Google, I have developed an attack that would allow someone to poison 0.1% of your dataset. (For what the impact of such an attack could be, see e.g. https://arxiv.org/pdf/2106.09667.pdf or https://arxiv.org/abs/2205.06401.) Previously, poisoning attacks have been considered somewhat theoretical: we knew they could exist, but there weren't any practical ways to mount them. With our new techniques, I now have the ability to poison the dataset of anyone who has downloaded COYO-700M since it was released (though I have not done so). We believe this attack is not currently being exploited in the wild, and we are hoping to release a paper on it shortly.

As part of this paper, we have developed techniques to remediate the attack. We would like to help you apply these defenses before we publish our paper. I would appreciate it if you could contact me ([email protected]) at your convenience. I have already emailed additional details of this attack to the contact email address you provide ([email protected]), if you'd like to know more.

Author

carlini commented Jan 13, 2023

Hi, just wanting to follow up on this -- we're hoping one of you will be able to get in contact with us so we can help mitigate any vulnerabilities before we publish our results.

Collaborator

mwbyeon commented Feb 25, 2023

@carlini Hi,
We will update the COYO dataset to include the SHA256 hash values shared by the co-authors and then release it soon. Thank you for your very impressive research and contribution.
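
For readers wondering how published SHA256 values could be used when downloading the dataset, here is a minimal sketch of the general idea: hash each downloaded image and keep it only if the digest matches the recorded one. This is an illustration under stated assumptions, not the COYO tooling; the function names and the `(url, bytes, expected_hash)` tuple layout are hypothetical, and the actual column name and format of the released hashes may differ.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_expected(data: bytes, expected_hex: str) -> bool:
    """True if the downloaded bytes hash to the recorded digest."""
    # Normalize whitespace and case so the comparison is format-tolerant.
    return sha256_hex(data) == expected_hex.strip().lower()


def keep_verified(
    samples: Iterable[Tuple[str, bytes, str]],
) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, data) only for samples whose content matches its hash.

    Each sample is (url, downloaded_bytes, expected_sha256_hex). This tuple
    layout is hypothetical and is not the COYO schema.
    """
    for url, data, expected_hex in samples:
        if matches_expected(data, expected_hex):
            yield url, data
```

One trade-off of this approach: an exact byte-level hash also rejects images that the host has re-encoded or resized for benign reasons, so some legitimate samples would be dropped along with any modified ones.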
