Generalizing DeltaProduct

DeltaProduct [1] proposes to improve DeltaNet [1] by updating the online memory with multiple KVs for each token, which can be seen as performing multiple steps of gradient descent per token. I will explain how this method is almost the same as multi-KV DeltaNet and reveal a potential flaw in the design of DeltaProduct.

Mar 22, 2025 · 4 min
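
To make the "multiple steps of gradient descent per token" reading concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the reference implementation of either method: the function names, the symbol `n_h` for the number of KVs per token, and the choice of unit-norm keys with step size 1 are all assumptions made for this example. Each delta-rule update is written as one gradient step on the loss 0.5 * ||S k - v||^2, and a token that carries several KVs simply applies several such steps in sequence.

```python
import numpy as np


def delta_step(S, k, v, beta):
    """One delta-rule update of the memory S (shape d_v x d_k).

    This is a single gradient step on 0.5 * ||S @ k - v||^2 with respect
    to S, using step size beta.
    """
    return S - beta * np.outer(S @ k - v, k)


def multi_kv_update(S, keys, values, betas):
    """Apply one delta step per (k, v, beta) triple, all for the same token."""
    for k, v, beta in zip(keys, values, betas):
        S = delta_step(S, k, v, beta)
    return S


# Toy sizes; n_h is a hypothetical name for the number of KVs per token.
rng = np.random.default_rng(0)
d_k, d_v, n_h = 4, 3, 2

S = np.zeros((d_v, d_k))
keys = rng.normal(size=(n_h, d_k))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)  # assumption: unit-norm keys
values = rng.normal(size=(n_h, d_v))
betas = np.ones(n_h)  # assumption: step size 1 for every step

S_new = multi_kv_update(S, keys, values, betas)
```

With unit-norm keys and a step size of 1, the last step stores its value exactly, so `S_new @ keys[-1]` reproduces `values[-1]`. Earlier KVs are only partially retained unless the keys happen to be orthogonal. Each step can equivalently be written as `S @ (I - beta * k k^T) + beta * v k^T`, which is the form that makes the product of several such factors per token visible.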