Scratch.md
id: Scratch
aliases: dump
tags: scratch, evergreen
author: aarnphm
comments: false
date: 2024-09-18
modified: 2024-12-23 16:09:16 GMT-05:00
title: Scratchpad

features development

design UI: everyone P0 (time to do this)

  • workflow: how does panel interact

  • implementation UI: @nebrask (A/B testing)

  • iterative feature development afterwards: @lucas @waleed

  • ML components: @aarnphm

  • training: (research) (quality testing) <- @aarnphm [1]

  • inference: (infrastructure) (A/B testing, regression testing) @waleed

    • OpenAI-compatible API server: functional
    • Edit logits in the inference server (vLLM, llama-cpp)
    • local inference
    • UX: TTFT (time to first token)
    • inference engine: vLLM (GPU), llama-cpp (CPU)
    • vllm plugins support
  • multiplayer text editor: (target: stakeholders) + (other player: AI models) (P3)
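
The "edit logits" item above can be sketched as an additive bias applied to raw logits before sampling. This is only an illustration in plain JavaScript, not the vLLM or llama-cpp API; `applyLogitBias` and `softmax` are hypothetical helper names.

```javascript
// Hypothetical helper: add a per-token bias to raw logits before sampling.
// `bias` maps token id -> additive offset; -Infinity bans a token outright.
function applyLogitBias(logits, bias) {
  const out = logits.slice()
  for (const [tokenId, offset] of Object.entries(bias)) {
    const i = Number(tokenId)
    if (Number.isInteger(i) && i >= 0 && i < out.length) out[i] += offset
  }
  return out
}

// Numerically stable softmax over an array of logits.
function softmax(logits) {
  const max = Math.max(...logits)
  const exps = logits.map((l) => Math.exp(l - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}
```

Banning a token this way sends its probability to zero while the remaining probabilities renormalize, which is the basic mechanism a steering layer on the inference server would rely on.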

ux.

https://x.com/thesephist/status/1793033871606382748

Inline-definition

https://x.com/JohnPhamous/status/1841527353270476808

Storage (local):

  • XDG_DATA_HOME/tinymorph for configuration, state, db
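
A minimal sketch of resolving that directory, assuming the usual XDG Base Directory fallback of `$HOME/.local/share` when `XDG_DATA_HOME` is unset, empty, or relative. `resolveDataDir` is a hypothetical name, and the environment is passed in rather than read globally so the function is easy to test.

```javascript
// Resolve the data directory following the XDG Base Directory convention.
// `env` is passed in (e.g. process.env) so the function stays easy to test.
function resolveDataDir(env, appName = "tinymorph") {
  const xdg = env.XDG_DATA_HOME
  // Unset, empty, or relative values are ignored in favour of the default.
  const base = xdg && xdg.startsWith("/") ? xdg : `${env.HOME}/.local/share`
  return `${base}/${appName}`
}
```

In a Node process this would be called as `resolveDataDir(process.env)`.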

accessibility:

Telescopic text

expansion upon telescopic text: notation

https://x.com/david_perell/status/1841875983676162124

website

cursor navigation:

https://x.com/JaceThings/status/1843441743187861850

graph-based:

=> Conceptual: Mind map

  • empirical

non-linear actions -> linear actions

drag-and-drop posted notes => generate posted-notes

cost.

Using EC2 for GPUs and inference cost. (Running on A100 with 32 CPUs)

text editor

[!question]

  • What sort of data structure do we want to use to implement this?
  • How should we implement the cursor and buffers?
  • File management locally (preferably user-owned instead of centralized data storage)
  • [Stretch, preference] Can we support modal editing?
  • How do we handle syntax highlighting as well as markdown rendering? (think of treesitter, but then shiki is pretty computationally expensive)
  • How should we handle files? (Chromium has a file system API built into the browser)
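
One candidate answer to the first two questions is a gap buffer, sketched here in its two-stack form. This is an assumption for discussion, not a decision; `GapBuffer` and its method names are hypothetical.

```javascript
// Two-stack variant of a gap buffer: text left of the cursor lives in `before`,
// text right of the cursor lives in `after` (stored reversed), so edits at the
// cursor are cheap and moving the cursor costs one step per character moved.
class GapBuffer {
  constructor(text = "") {
    this.before = Array.from(text) // cursor starts at the end of the text
    this.after = []
  }

  get cursor() {
    return this.before.length
  }

  // Move the cursor to `pos`, clamped to the valid range.
  moveTo(pos) {
    const total = this.before.length + this.after.length
    const target = Math.max(0, Math.min(pos, total))
    while (this.before.length > target) this.after.push(this.before.pop())
    while (this.before.length < target) this.before.push(this.after.pop())
  }

  // Insert text at the cursor; the cursor ends up after the inserted text.
  insert(text) {
    this.before.push(...Array.from(text))
  }

  // Delete up to `count` characters to the left of the cursor.
  deleteBackward(count = 1) {
    this.before.length -= Math.min(count, this.before.length)
  }

  toString() {
    return this.before.join("") + this.after.slice().reverse().join("")
  }
}
```

A rope or piece table would scale better for very large files; the gap buffer is shown only because it makes the cursor explicit in very little code.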

For the Node server, I'm thinking we should keep it light: it can run a simple proxy server that opens a WebSocket to stream the JSON to the browser. That is probably easiest for us, as we don't have to worry too much about GraphQL or any of that db nonsense.

has context menu

See [[play.html]] for dead-simple editor I played with.

Local file access is a must (files can be opened via file:///)

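// Ask the user to pick a directory, then create (or reuse) a "NewFolder" subdirectory in it.
// Relies on the File System Access API, which is available in Chromium-based browsers.
async function createFolder() {
  if (!("showDirectoryPicker" in window)) {
    console.error("File System Access API is not supported in this browser")
    return null
  }
  try {
    const dirHandle = await window.showDirectoryPicker()

    // { create: true } creates the folder if it does not already exist
    const newFolderHandle = await dirHandle.getDirectoryHandle("NewFolder", { create: true })

    console.log("Folder created successfully")
    return newFolderHandle
  } catch (err) {
    // Also reached when the user dismisses the picker (AbortError)
    console.error("Error creating folder:", err)
    return null
  }
}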

Possible UI component library: shadcn/ui

https://x.com/CherrilynnZ/status/1836881535154409629

editor: https://prosemirror.net/

[!question]- What is the data model for planning?

CoT drawbacks

training [[glossary#sparse autoencoders|SAEs]]

see also: Goodfire preview releases

Dictionary learning: https://transformer-circuits.pub/2023/monosemantic-features/index.html => motivation to prove SAE results in interpretable features
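
For reference, a rough sketch of the sparse autoencoder setup used in that line of work. The notation is an assumption of this note: $x$ is an activation vector, $W_e, b_e$ and $W_d, b_d$ are the encoder and decoder weights and biases, $f$ is the vector of feature activations, and $\lambda$ weights the sparsity penalty.

$$ f = \text{ReLU}(W_e (x - b_d) + b_e), \quad \hat{x} = W_d f + b_d $$

$$ \mathcal{L} = \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert f \rVert_1 $$

The columns of $W_d$ are the decoder directions that the steering and self-explanation notes further down refer to.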

https://transformer-circuits.pub/2024/scaling-monosemanticity/

for finding attention activation.

Anthropic's report on training SAEs

on GPT-2: github and paper

  • lens into viewing random activations

https://lstmvis.vizhub.ai/ => LSTM vis https://github.com/TransformerLensOrg/TransformerLens

https://blog.eleuther.ai/autointerp/

Attribute allocation?

[!question] How should we steer?

  • Think of using SAEs => iterate better prompt

Features composition for guided steering

features rep? Correctness w/ the model's internal representation (trie for models)

  • manually curate features

features ablation:

Accurate mappings based on human and machine features?

Context: reduction in third space of model representations

representation based on users text (build features)

Use SAEs for steerable generations [2] <= user feedback

[!IMPORTANT] problem statement. actionable steering for attention-based models

[!question] RAG-infused pipeline

What if we add additional web-search vectors to enhance correctness in steering?

inference

Steering Llama via Contrastive Activation Addition [@panickssery2024steeringllama2contrastive], code

  • Seems like they are using layer 16 for interp Claude's features
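
A minimal sketch of the contrastive activation addition idea, assuming activations are plain arrays taken from one layer of the residual stream: the steering vector is the mean activation over positive examples minus the mean over negative examples, and steering adds a scaled copy of it. Function names are hypothetical and this is not the paper's code.

```javascript
// Mean of a list of equal-length vectors.
function meanVector(vectors) {
  const dim = vectors[0].length
  const sum = new Array(dim).fill(0)
  for (const v of vectors) for (let i = 0; i < dim; i++) sum[i] += v[i]
  return sum.map((s) => s / vectors.length)
}

// Steering vector: mean activation on positive examples minus mean on negative ones.
function steeringVector(positive, negative) {
  const p = meanVector(positive)
  const n = meanVector(negative)
  return p.map((x, i) => x - n[i])
}

// Add the scaled steering vector to one residual-stream activation.
function steer(activation, vector, multiplier) {
  return activation.map((x, i) => x + multiplier * vector[i])
}
```

A negative `multiplier` pushes generations away from the behaviour instead of towards it.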

self-explanation

Excerpt from Self-explaining SAE features

  • Idea: replace residual stream on X with decoder direction times a given scale, called ==self-explanation==
  • auto-interp: use a larger LLM to spot patterns in max activating examples (See Neuronpedia's auto-interp)
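
The replacement step in the first bullet can be sketched as follows, assuming the residual stream is one vector per prompt token and `position` is the index of the placeholder token X. `injectFeature` is a hypothetical name.

```javascript
// Replace the residual-stream vector at `position` with scale * decoderDirection.
// All other token positions are copied unchanged; the input is not mutated.
function injectFeature(residual, position, decoderDirection, scale) {
  return residual.map((vec, i) =>
    i === position ? decoderDirection.map((d) => d * scale) : vec.slice(),
  )
}
```

Sweeping `scale` over a range of values and comparing the resulting explanations is what the metrics below are meant to rank.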

A variant of activation patching

See also: SelfE or Patchscope

Important

Align with auto-interp [3], the current standard for SAE feature interpretation.

self-similarity

measure cosine similarity between the 16th layer residual stream of last prompt tokens and original SAE feature.
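
A minimal sketch of the similarity computation, assuming both the residual-stream activation and the SAE feature direction are available as plain numeric arrays of the same length. `cosineSimilarity` is a hypothetical helper.

```javascript
// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero norm, to avoid dividing by zero.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have the same length")
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom
}
```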

entropy

based on predicted distribution of the answer's first token.

The distribution is represented as $P(t_n | t_{1 \dots n-1})$. Since we insert the SAE feature direction into one of the prompt tokens, the distribution becomes $P(t_n | t_{1 \dots n-1}, f)$, where $f$ is the SAE feature index.

Note that entropy decreases as the mutual information between the random variable representing the feature and the first answer token increases.
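
The link to mutual information follows from the identity $H(T \mid F) = H(T) - I(T; F)$: for a fixed $H(T)$, more shared information means lower conditional entropy. A minimal sketch of the entropy computation itself, assuming the first-token distribution is given as an array of probabilities; `entropy` is a hypothetical helper and uses natural logarithms.

```javascript
// Shannon entropy (in nats) of a discrete probability distribution.
// Zero-probability entries contribute nothing, by the convention 0 * log 0 = 0.
function entropy(probs) {
  let h = 0
  for (const p of probs) if (p > 0) h -= p * Math.log(p)
  return h
}
```

A distribution concentrated on one token has entropy 0, and a uniform distribution over $k$ tokens has entropy $\ln k$.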

composite

ranks the optimal scale in the top-3 list for a much larger percentage of cases.

$$ \text{composite}(x) = \alpha \cdot \text{self-similarity}(x) + (1 - \alpha) \cdot \text{entropy}(x) $$
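
The formula above can be sketched directly, assuming the two metrics have already been computed for each candidate scale. Names are hypothetical, and the sketch leaves open how the metrics are normalized and in which direction the candidates are ranked, since the note does not say.

```javascript
// Weighted combination of the two metrics; alpha must lie in [0, 1].
function compositeScore(selfSimilarity, entropy, alpha) {
  if (alpha < 0 || alpha > 1) throw new RangeError("alpha must be in [0, 1]")
  return alpha * selfSimilarity + (1 - alpha) * entropy
}

// Score every candidate scale; each entry carries its own two metric values.
function scoreScales(candidates, alpha) {
  return candidates.map(({ scale, selfSimilarity, entropy }) => ({
    scale,
    composite: compositeScore(selfSimilarity, entropy, alpha),
  }))
}
```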

mathematical framework for transformers circuits

excerpt from this transformers threads

automatic interpretability

see also: Transluce's Monitor source

plan

  • update docs
    • waleed & lucas
  • frontend
    • nebras
  • server
    • aarnphm

Prep for POC interview

  1. Lectures
  2. TA meetings
  3. Extras

Extras

user manual and usability testing

Footnotes

  1. mwatkins's earlier exploration

  2. Linus' talk on interface for latent space exploration site or yt

  3. Language models can explain neurons in language models