Yingfa Chen

2023

Activation Addition (ActAdd)

Paper

TLDR: Propose ActAdd, a method for controlling model behavior during inference by modifying activations with a bias term that is computed from a pair of prompts.

Summary:

  • Propose ActAdd, a method for controlling model behavior by modifying activations at inference time.
  • Steering vectors are computed by taking the activation differences that result from pairs of prompts. The vectors are added as bias during inference.
  • ActAdd provides control over high-level properties of the output, preserves off-target model performance, and incurs little computational or implementation cost.
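
The idea in the summary above can be sketched in a few lines. This is a toy illustration only, not the paper's implementation: the "model" is a stack of fixed elementwise maps over a small vector, and `embed`, `layer`, `forward`, `steering_vector`, the `coeff` scaling factor, and the example prompts are all hypothetical names and values chosen for this sketch. A real setup would read and modify activations inside a transformer instead.

```python
import math

def embed(prompt, dim=4):
    # Hypothetical deterministic "embedding" built from character codes.
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += (ord(ch) % 17) / 17.0
    return vec

def layer(h, k):
    # Toy stand-in for layer k of a model (assumption: elementwise tanh).
    return [math.tanh(x + 0.1 * (k + 1)) for x in h]

def forward(prompt, n_layers=3, inject_at=None, steering=None):
    """Run the toy model, optionally adding `steering` to the
    activation that enters layer `inject_at`."""
    h = embed(prompt)
    acts = []
    for k in range(n_layers):
        if inject_at == k and steering is not None:
            # The steering vector acts as a bias on this layer's input.
            h = [x + s for x, s in zip(h, steering)]
        acts.append(list(h))  # activation entering layer k
        h = layer(h, k)
    return h, acts

def steering_vector(prompt_plus, prompt_minus, layer_idx, coeff=1.0):
    """Scaled activation difference between two prompts at one layer."""
    _, a_plus = forward(prompt_plus)
    _, a_minus = forward(prompt_minus)
    return [coeff * (p - m)
            for p, m in zip(a_plus[layer_idx], a_minus[layer_idx])]

# Usage: compute the vector once, then reuse it at inference time.
v = steering_vector("Love", "Hate", layer_idx=1, coeff=2.0)
base_out, _ = forward("I think you are")
steered_out, _ = forward("I think you are", inject_at=1, steering=v)
```

In this sketch the vector comes from two extra forward passes and no parameters are updated, which is in line with the low cost described in the summary.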

709 words, 4 min

Paper Note

Safety and Ethical Concerns of Large Language Models

I will be holding a seminar at ModelBest (面壁智能) on Sep 20, 2023, at the science park (科技园) in Haidian, Beijing. The seminar will be in Chinese and is titled "大模型安全与伦理问题" (translation: Safety and Ethical Concerns of Large Language Models). Below is a list of references.

635 words, 3 min

Thoughts