Intruder detection scoring and example embedding scoring #113

Merged
merged 32 commits into from
Apr 21, 2025

SrGonao
Collaborator

@SrGonao SrGonao commented Apr 1, 2025

This PR grew larger than I intended because I mistakenly included code from two different branches.

It adds:

  • Two new scoring methods that don't require any model explanations: intruder detection scoring and example embedding scoring.
  • Centering of activating examples. For now, 3/4 of each window lies to the left of the maximally activating token (drawing on the previous windows) and 1/4 lies to the right. To make this possible, I'm discarding all activations from the first and last window of each batch, which is 1/4 of the total activations we have if we collect with ctx_len 256.
  • Some fixes to the online client.
  • A new example sampling option.
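
The centering rule above can be sketched roughly as follows. This is an illustration, not the code from this PR: the function name `centered_window` and the window length of 32 tokens are assumptions (32 is only inferred from the "1/4 of the total activations" figure, since 2 of 256 / 32 = 8 windows are dropped), and the sketch treats the maximally activating token as the first token of the right-hand quarter.

```python
import numpy as np


def centered_window(acts, ctx_len=256, window_len=32):
    """Return (start, end) of a window around the peak activation, or None.

    Hypothetical helper: `window_len=32` is an assumed example length.
    """
    acts = np.asarray(acts)
    assert acts.shape == (ctx_len,)

    peak = int(np.argmax(acts))

    # Peaks in the first or last window of the context are discarded,
    # because there is not enough context on one side to center them.
    if peak < window_len or peak >= ctx_len - window_len:
        return None

    left = (3 * window_len) // 4   # tokens placed before the peak
    right = window_len - left      # the peak itself plus tokens after it
    return peak - left, peak + right


acts = np.zeros(256)
acts[100] = 1.0
print(centered_window(acts))  # (76, 108)
```

With these assumed sizes, a peak at position 100 gives the half-open window [76, 108), so 24 tokens precede the peak and the remaining 8 positions hold the peak and the tokens after it.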

SrGonao and others added 30 commits March 7, 2025 15:08
@SrGonao SrGonao merged commit 6b9af04 into main Apr 21, 2025
2 of 3 checks passed