Generalizing DeltaProduct

DeltaProduct [1] proposes to improve DeltaNet [1] by updating the online memory with multiple key-value pairs (KVs) for each token, which can be seen as performing multiple steps of gradient descent per token. I will explain how this method is almost the same as multi-KV DeltaNet and point out a potential flaw in the design of DeltaProduct.

Mar 22, 2025 · 4 min
After the 2024 National Day Holiday

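The 🇨🇳 National Day holiday has just ended, and I am back in Beijing from Dongguan to continue my PhD. The past few months have felt especially busy: fulfilling, but also tiring, so this seven-day holiday (really only five days) was a good chance to catch my breath. I have not written a blog post in a long time; the last post about myself seems to have been the one after last year's National Day. Exactly a year has passed, so this can also serve as a yearly summary.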

Oct 10, 2024 · 1 min