Yingfa Chen
rnn

2025

Multi-Head DeltaNet

In DeltaProduct (Siems et al., 2025), they propose to improve DeltaNet (Yang et al., 2025) by updating the online memory with $n_h$ KVs for each token, which can be seen as performing multiple steps of gradient descent per token. I will explain how this method is almost the same as multi-KV DeltaNet and reveal a potential flaw in the design of DeltaProduct.

665 words, 4 min

Research

2024

2024 国庆之后

刚刚放完 🇨🇳 国庆假,从东莞回来了北京继续我的博士生涯。这几个月感觉事情特别多,虽然很充实,但也很累,刚好这个七天长假(实际只有五天)可以让我喘口气。很久没有写博客了,上一次关于我自己的博客内容好像就是去年国庆之后的。刚好过一年,也可以当作一个年度总结吧。

3.2k words, 10 min

Life
0 %