Tag: ai-alignment | Yingfa Chen 陈英发

Activation Addition (ActAdd)

TLDR: Proposes ActAdd, a method for controlling model behavior during inference by modifying activations with a bias term that is derived from a pair of prompts. Summary: Proposes ActAdd, a method for controlling model behavior by modifying activations at inference time. Steering vectors are computed by taking the ...
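To make the idea concrete, here is a minimal sketch in plain Python. It assumes the steering vector is the difference between the activations of the two prompts at one layer, scaled by a coefficient before being added at inference time; the excerpt above is truncated, so that detail is an assumption on my part and not something stated here. The names `steering_vector`, `add_steering`, and `coeff`, and the toy activation values, are hypothetical and only for illustration. A real model would read and modify these activations inside a forward pass.

```python
from typing import List

Vector = List[float]


def steering_vector(act_a: Vector, act_b: Vector) -> Vector:
    """Difference between the activations of the two prompts in the pair.

    Assumption: both activations come from the same layer and position.
    """
    return [a - b for a, b in zip(act_a, act_b)]


def add_steering(activation: Vector, steer: Vector, coeff: float) -> Vector:
    """Add the scaled steering vector to an activation, like a bias term."""
    return [x + coeff * s for x, s in zip(activation, steer)]


# Toy activations for the prompt pair (made-up numbers, not real model outputs).
act_prompt_a = [1.0, 0.5, -0.25]
act_prompt_b = [0.0, 0.5, 0.75]

steer = steering_vector(act_prompt_a, act_prompt_b)

# Activation of some new input at the same layer, steered at inference time.
steered = add_steering([0.5, 0.5, 0.5], steer, coeff=2.0)
print(steer, steered)
```

Note that nothing is trained in this sketch: the vector comes from two forward passes, and `coeff` controls how strongly the behavior is pushed.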

Safety and Ethical Concerns of Large Language Models

I will be holding a seminar at ModelBest (面壁智能) on Sep 20, 2023 at the Science Park (科技园) in Haidian, Beijing. The seminar will be in Chinese, and it's titled "大模型安全与伦理问题" (translation: Safety and Ethical Concerns of Large Language Models). Below is a list of references.