Yingfa Chen

2025

Multi-Head DeltaNet

DeltaProduct (Siems et al., 2025) proposes to improve DeltaNet (Yang et al., 2025) by updating the online memory with $n_h$ key-value pairs per token, which can be seen as performing multiple gradient-descent steps per token. I will explain how this method is nearly identical to multi-KV DeltaNet and point out a potential flaw in the design of DeltaProduct.
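As a rough illustration of the mechanism described above, here is a minimal NumPy sketch of the rank-1 delta-rule memory update and its multi-step (DeltaProduct-style) variant. This is my own reading of the papers, not the authors' code; the function names, shapes, and per-step `beta` parameterization are assumptions for illustration.

```python
import numpy as np

def delta_update(S, k, v, beta):
    """One delta-rule step: a single gradient step on 0.5 * ||S k - v||^2.

    Equivalent to S (I - beta k k^T) + beta v k^T.
    S: (d_v, d_k) memory matrix; k: (d_k,); v: (d_v,); beta: step size.
    """
    return S + beta * np.outer(v - S @ k, k)

def multi_step_update(S, ks, vs, betas):
    """DeltaProduct-style update: n_h sequential rank-1 delta steps per token."""
    for k, v, beta in zip(ks, vs, betas):
        S = delta_update(S, k, v, beta)
    return S
```

With `beta = 1` and a unit-norm key, a single step writes `v` exactly into the memory's response to `k` (i.e. `S @ k == v` afterwards), which is the sense in which each step is one unit of online gradient descent.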

665 words, 4 min

Research

A Trip to the UAE

It is now 2 a.m. on March 23, 2025. It has been about half a year since I last wrote a blog post. In the meantime, I went to the UAE with 00 for an academic conference. Over the winter break, I went back to 00's hometown with her for Chinese New Year and stayed at her family's home for a week. I got to see many local customs, which felt a little different from what I had imagined — a remarkable experience.

There are really too many experiences worth recording, so this post only covers the period from last year's National Day holiday up to our trip to the UAE. The next post will cover going home with 00 for Chinese New Year and what came after.

646 words, 2 min

Life

Implementing Test-Time Training - Part 1

This blog post is part 1 of a series describing my attempt at implementing the Test-Time Training (TTT) model proposed by Sun et al. (2024) and Titans, proposed by Behrouz et al. (2024). At the time of writing, these are two strong recurrent language models, but neither has a fully open-sourced implementation (TTT has only open-sourced a JAX implementation).

1.7k words, 10 min

Research

2024

The Monospace Font Problem in VS Code

In VS Code, when mixing Chinese and English text, you will notice that the characters do not align. VS Code's official stance is that font rendering is decided by Chromium, so they cannot fix this issue, and they recommend finding a Chinese monospace font on your own. The most common suggestion online is a font called Sarasa-Gothic (更纱黑体 in Chinese). But this font is not only huge, it is also a bit ugly, and I don't like the name either. Fortunately, I found a font that better fits my needs: Ubuntu Mono

270 words, 1 min

Life

After the 2024 National Day Holiday

The 🇨🇳 National Day holiday has just ended, and I have returned from Dongguan to Beijing to continue my PhD. These past few months have felt especially hectic — fulfilling, but also exhausting — so this seven-day break (really only five days) let me catch my breath. I haven't blogged in a long time; the last post about my own life was, I believe, after last year's National Day. Exactly one year on, this can also serve as a yearly review.

3.2k words, 10 min

Life

(EREN) Robust and Scalable Model Editing for Large Language Models

GitHub | Paper (upcoming)

TL;DR: A reader is augmented with a growing notebook that caches all edits in natural text; the reader retrieves relevant edits and makes inferences based on them. This achieves state-of-the-art model-editing performance on QA and fact-checking.

525 words, 3 min

Paper

InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens

Code | Paper

The first benchmark for evaluating the effectiveness of LLMs in handling more than 100k tokens!

In the paper, we name it $\infty$-Bench, but I will sometimes use "InfiniteBench" in this blog post for better readability.

I finally got some time to write this post — I have been so busy lately! I have been on a fairly long research hiatus, during which the field of NLP has been revolutionized by an overwhelming number of new LLMs. I was finally able to produce some meaningful work in this new era of research, as a second author. In this blog post, I will introduce this recent work.

1.1k words, 7 min

Research

2023

Interpreting a Maze-Solving Network

The blog post

I can't believe I hadn't read this until now. It is thought-provoking, and the result is an important step toward understanding neural networks.

56 words, 1 min

Thoughts

Activation Addition (ActAdd)

Paper

TL;DR: Proposes ActAdd, a method for controlling model behavior during inference by modifying activations with a bias term computed from a pair of contrasting prompts.

Summary:

  • Propose ActAdd, a method for controlling model behavior by modifying activations at inference time.
  • Steering vectors are computed by taking the activation differences that result from pairs of prompts. The vectors are added as bias during inference.
  • ActAdd provides control over high-level properties of the output, preserves off-target model performance, and incurs little computational or implementation cost.
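The steering-vector mechanism in the summary above can be sketched in a few lines of NumPy. This is a toy illustration operating on made-up activation vectors, not the paper's implementation; the function names, the `coeff` scaling parameter, and the shapes are all assumptions.

```python
import numpy as np

def steering_vector(act_plus, act_minus):
    """Compute an ActAdd-style steering vector as the difference between
    the activations of two contrasting prompts at the same layer."""
    return act_plus - act_minus

def apply_steering(hidden, steer, coeff=1.0):
    """Add the scaled steering vector to a hidden state as a bias,
    nudging the model's output toward the 'plus' prompt's property."""
    return hidden + coeff * steer
```

In practice the two activations would come from running the model on the prompt pair; here, any two same-shaped vectors demonstrate the arithmetic.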

709 words, 4 min

Paper Note