Interpreting a Maze-Solving Network

1 I can't believe I hadn't read this until now. This is thought-provoking, and the result is an important step towards understanding neural networks.

Oct 7, 2023 · 1 min

Activation Addition (ActAdd)

1 TLDR: Proposes ActAdd, a method for controlling model behavior during inference by modifying activations with a bias term that is computed from a pair of prompts. Summary: Proposes ActAdd, a method for controlling model behavior by modifying activations at inference time. Steering vectors are computed by taking the ...
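
To make the idea concrete, here is a minimal sketch of activation steering on a toy residual network. It is not the ActAdd implementation: the toy model, the `embed`, `residual_at`, `forward`, and `steering_vector` helpers, the layer index, the coefficient, and the "Love"/"Hate" prompt pair are all assumptions chosen for illustration. It also collapses each prompt to a single vector, whereas a real language model has one activation per token position. The sketch assumes the steering vector is the scaled difference between the activations of the two prompts at a chosen layer.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_LAYERS = 8, 4
# Toy stand-in for a transformer: each layer adds an update to a residual stream.
WEIGHTS = [rng.normal(scale=0.5, size=(D_MODEL, D_MODEL)) for _ in range(N_LAYERS)]


def embed(prompt):
    """Hypothetical embedding: a deterministic vector built from the prompt's bytes."""
    vec = np.zeros(D_MODEL)
    for i, byte in enumerate(prompt.encode("utf-8")):
        vec[i % D_MODEL] += byte / 255.0
    return vec


def residual_at(prompt, layer):
    """Return the residual stream as it enters `layer` (0-indexed)."""
    h = embed(prompt)
    for w in WEIGHTS[:layer]:
        h = h + np.tanh(w @ h)
    return h


def forward(prompt, layer=None, steer=None):
    """Run the toy model, optionally adding `steer` to the input of `layer`."""
    h = embed(prompt)
    for i, w in enumerate(WEIGHTS):
        if i == layer and steer is not None:
            h = h + steer  # the inference-time modification: a fixed bias term
        h = h + np.tanh(w @ h)
    return h


def steering_vector(positive, negative, layer, coeff):
    """Scaled activation difference for a prompt pair (assumed form)."""
    return coeff * (residual_at(positive, layer) - residual_at(negative, layer))


if __name__ == "__main__":
    vec = steering_vector("Love", "Hate", layer=2, coeff=4.0)
    plain = forward("I think")
    steered = forward("I think", layer=2, steer=vec)
    print("shift in output norm:", np.linalg.norm(steered - plain))
```

No weights are updated anywhere in this sketch: the steering vector comes from two extra forward passes and is simply added during a third.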

Oct 7, 2023 · 4 min