Tag: representation-engineering | Yingfa Chen 陈英发

Interpreting a Maze-Solving Network

I can't believe I haven't read this until now. It is thought-provoking, and the result is an important step towards understanding neural networks.

Activation Addition (ActAdd)

TLDR: Propose ActAdd, a method for controlling model behavior during inference by modifying activations with a bias term that is computed from a pair of prompts. Summary: Propose ActAdd, a method for controlling model behavior by modifying activations at inference time. Steering vectors are computed by taking the ...
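To make the idea concrete, here is a minimal sketch of activation addition on toy arrays. The excerpt above is cut off, so this rests on an assumption taken from my reading of the ActAdd paper rather than from this page: the steering vector is the scaled difference between the activations of two contrasting prompts at one layer, and it is added to the leading positions of the residual stream. The function names `steering_vector` and `add_steering` are hypothetical, the random arrays stand in for real model activations, and the coefficient and layer are free choices.

```python
import numpy as np


def steering_vector(act_pos, act_neg, coeff=1.0):
    """Return coeff * (act_pos - act_neg).

    Assumption: both inputs are activations of shape (seq_len, d_model)
    taken from the same layer, with the two prompts padded to equal length.
    """
    if act_pos.shape != act_neg.shape:
        raise ValueError("prompt activations must have the same shape")
    return coeff * (act_pos - act_neg)


def add_steering(resid, vec):
    """Add vec to the first len(vec) positions of resid and return a copy.

    resid has shape (seq_len, d_model); later positions are left unchanged.
    """
    if vec.shape[0] > resid.shape[0] or vec.shape[1] != resid.shape[1]:
        raise ValueError("steering vector does not fit the residual stream")
    steered = resid.copy()
    steered[: vec.shape[0]] += vec
    return steered


# Toy data standing in for real activations (hypothetical sizes).
rng = np.random.default_rng(0)
d_model = 8
act_pos = rng.normal(size=(2, d_model))  # e.g. activations for a "positive" prompt
act_neg = rng.normal(size=(2, d_model))  # e.g. activations for a "negative" prompt
resid = rng.normal(size=(5, d_model))    # residual stream of the prompt being steered

vec = steering_vector(act_pos, act_neg, coeff=4.0)
steered = add_steering(resid, vec)
```

In a real model the addition would happen inside the forward pass (for example through a hook on the chosen layer), so that every later layer sees the shifted activations. No weights are trained at any point.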