Add per token and per context frequency to LatentRecord #114

Open
wants to merge 2 commits into
base: main
3 changes: 3 additions & 0 deletions delphi/latents/constructors.py
@@ -26,7 +26,7 @@
    global model_cache
    if (name, device) not in model_cache:
        print(f"Loading model {name} on device {device}")
        model_cache[(name, device)] = SentenceTransformer(name, device=device)

Check failure on line 29 in delphi/latents/constructors.py (GitHub Actions / test):
No overloads for "__init__" match the provided arguments. Argument types: (str, str) (reportCallIssue)

    return model_cache[(name, device)]


@@ -159,6 +159,9 @@
    non_active_indices = mask.nonzero(as_tuple=False).squeeze()
    activations = activation_data.activations

    # per context frequency
    record.per_context_frequency = len(unique_batch_pos) / n_windows

    # Add activation examples to the record in place
    token_windows, act_windows = pool_max_activation_windows(
        activations=activations,
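A minimal sketch of the per-context frequency computed in the hunk above, written in plain Python rather than with tensors. The diff does not show how `unique_batch_pos` or `n_windows` are built, so this assumes that every context is split into fixed-length windows and that `unique_batch_pos` holds the distinct windows containing at least one activation. The function name and all of its parameters are hypothetical.

```python
def per_context_frequency(locations, n_contexts, ctx_len, window_len):
    """Fraction of windows in which the latent fires at least once.

    locations: iterable of (context_index, token_position) pairs where the
    latent is active (hypothetical stand-in for the activation locations).
    """
    # Assumption: contexts are cut into non-overlapping windows of equal length.
    windows_per_context = ctx_len // window_len
    n_windows = n_contexts * windows_per_context
    if n_windows == 0:
        raise ValueError("No windows to compute a frequency over")

    # Several activations inside the same window count only once.
    active_windows = {(ctx, pos // window_len) for ctx, pos in locations}
    return len(active_windows) / n_windows


# Four activations, two of which share a window, over 4 contexts of 8 tokens.
locations = [(0, 1), (0, 2), (1, 5), (3, 0)]
freq = per_context_frequency(locations, n_contexts=4, ctx_len=8, window_len=4)
```

Deduplicating by window is what separates this quantity from a per-token frequency: a latent that fires many times inside one window is still counted once.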
7 changes: 7 additions & 0 deletions delphi/latents/latents.py
@@ -146,6 +146,13 @@ class LatentRecord:
    extra_examples: Optional[list[Example]] = None
    """Extra examples to include in the record."""

    per_token_frequency: float = 0.0
    """Per-token frequency of the latent: number of activations divided by the
    total number of tokens."""

    per_context_frequency: float = 0.0
    """Per-context frequency of the latent: number of contexts with at least one
    activation divided by the total number of contexts."""

    @property
    def max_activation(self) -> float:
        """
8 changes: 8 additions & 0 deletions delphi/latents/loader.py
@@ -378,6 +378,14 @@ async def _aprocess_latent(self, latent_data: LatentData) -> LatentRecord | None
        if self.tokens is None:
            raise ValueError("Tokens are not loaded")
        record = LatentRecord(latent_data.latent)

        # number of tokens on which the latent is active
        n_active = len(latent_data.activation_data.activations)
        # total number of loaded tokens
        n_tokens = self.tokens.shape[1] * self.tokens.shape[0]
        # per-token frequency of the latent
        record.per_token_frequency = n_active / n_tokens

        if self.neighbours is not None:
            record.set_neighbours(
                self.neighbours[latent_data.module][
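A minimal sketch of the per-token frequency set in `_aprocess_latent`, in plain Python. It assumes that `activations` holds one entry per token on which the latent is active and that `self.tokens` is a 2-D array of shape (contexts, context length); neither is confirmed by the diff. The function name and the `token_shape` parameter are hypothetical.

```python
def per_token_frequency(n_active, token_shape):
    """Number of activations divided by the total number of tokens.

    n_active: count of tokens on which the latent is active.
    token_shape: (n_contexts, ctx_len), a hypothetical stand-in for
    self.tokens.shape.
    """
    n_contexts, ctx_len = token_shape
    n_tokens = n_contexts * ctx_len
    if n_tokens == 0:
        raise ValueError("Cannot compute a frequency over zero tokens")
    return n_active / n_tokens


# A latent active on 3 of the 32 tokens in 4 contexts of length 8.
freq = per_token_frequency(3, (4, 8))
```

The guard against an empty token array is an addition for illustration; the diff divides without one.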