Intruder detection scoring and example embedding scoring #113

SrGonao · 2025-04-01T15:19:03Z

This PR grew larger than intended because I mistakenly included code from two different branches.

It adds:

Two new scoring methods (intruder detection and example embedding) that don't require any model explanations.
Centering of activating examples. For now, 3/4 of each window lies to the left of the maximally activating token found in the previous (uncentered) windows, and 1/4 lies to the right. To make this possible, I'm discarding all activations from the first and last window of each batch (1/4 of the total activations we have if we collect with ctx_len 256).
Some fixes to the online client.
A new example sampling option.
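
The centering rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `centered_windows`, the 32-token window size, and the NumPy array layout are all assumptions. A 32-token window is assumed because it is what makes the stated 1/4 figure work out for ctx_len 256 (8 windows, 2 of them discarded).

```python
import numpy as np


def centered_windows(acts, window=32, left_frac=0.75):
    """Return (start, end, peak) spans centered on each window's peak token.

    Hypothetical helper: `acts` is a 1D array of latent activations for one
    context of ctx_len tokens. The first and last windows are skipped so that
    a span placed 3/4 left / 1/4 right of the peak never leaves the context.
    """
    ctx_len = acts.shape[0]
    n_windows = ctx_len // window
    left = int(window * left_frac)   # tokens kept to the left of the peak
    right = window - left            # peak token plus tokens to its right
    spans = []
    for w in range(1, n_windows - 1):  # drop first and last window
        lo = w * window
        chunk = acts[lo:lo + window]
        if chunk.max() <= 0:
            continue  # latent did not fire in this window
        peak = lo + int(chunk.argmax())
        spans.append((peak - left, peak + right, peak))
    return spans


# Toy context: one peak in an interior window, two in the discarded edge windows.
acts = np.zeros(256)
acts[5] = 9.0     # first window, discarded
acts[100] = 2.0   # interior window, kept
acts[250] = 3.0   # last window, discarded
spans = centered_windows(acts)
discarded_fraction = 2 / (acts.shape[0] // 32)
```

Skipping the edge windows is the simplest way to guarantee that every centered span fits inside the context without padding, at the cost of the discarded activations.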

SrGonao and others added 30 commits March 7, 2025 15:08

Using activations of neighbour latents

8b28d78

Fix typing

f99a7f9

New fuzz metrics

53155a2

Adding default fallback

b4e91e4

Merge branch 'main' of https://github.com/EleutherAI/delphi into neighbour_latents

b6e978b

Merge branch 'main' of https://github.com/EleutherAI/delphi into neighbour_latents

242530f

New intruder scorer - no need for explanations

576730f

New fuzz

f297a1c

Merge branch 'main' of https://github.com/EleutherAI/delphi into intruder

33cbbe4

New prompts

06a9254

Simplifying intruder

b7bdfa7

Removing unused parts

628cb17

Fix fuzz

47868ab

Cleaning intruder code

0b75f97

Adding example embedding scorer

4a24cae

Add number of tokens to generate to clients

8d996d2

Merge branch 'main' of https://github.com/EleutherAI/delphi into intruder

710eda7

[pre-commit.ci] auto fixes from pre-commit.com hooks

fda5033

for more information, see https://pre-commit.ci

Change type

e8b11f4

Small loader debug thingy

6ed5456

[pre-commit.ci] auto fixes from pre-commit.com hooks

e920de7

for more information, see https://pre-commit.ci

Adding a ratio

b33eee7

Merge branch 'mix_sampling' of https://github.com/EleutherAI/delphi into intruder

f10cae7

[pre-commit.ci] auto fixes from pre-commit.com hooks

f3aff2c

for more information, see https://pre-commit.ci

Correct FAISS error

6e3b31a

New centered examples

d7ef127

Merge branch 'main' of https://github.com/EleutherAI/delphi into intruder

4880aa1

[pre-commit.ci] auto fixes from pre-commit.com hooks

aa041a6

for more information, see https://pre-commit.ci

Delete something that shouldn't be here

d171a04

Merge branch 'intruder' of https://github.com/EleutherAI/delphi into intruder

cae363e

SrGonao added 2 commits April 21, 2025 11:02

Center option, on by default now

cc9387c

Beartyping and fixing test

0c9e294

SrGonao merged commit 6b9af04 into main Apr 21, 2025
2 of 3 checks passed
