<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Research on Yingfa Chen 陈英发</title><link>https://chen-yingfa.github.io/categories/research/</link><description>Recent content in Research on Yingfa Chen 陈英发</description><generator>Hugo -- 0.146.6</generator><language>en-us</language><lastBuildDate>Mon, 07 Jul 2025 20:11:00 +0000</lastBuildDate><atom:link href="https://chen-yingfa.github.io/categories/research/index.xml" rel="self" type="application/rss+xml"/><item><title>The Rise of Test-Time Training</title><link>https://chen-yingfa.github.io/research_posts/2025-rise-of-ttt/</link><pubDate>Mon, 07 Jul 2025 20:11:00 +0000</pubDate><guid>https://chen-yingfa.github.io/research_posts/2025-rise-of-ttt/</guid><description>&lt;p>&lt;strong>Abstract&lt;/strong>:&lt;/p>
&lt;p>The main idea in test-time training (TTT) (&lt;a href="https://arxiv.org/pdf/2407.04620">Sun et al. 2024&lt;/a>) is that a model with fixed parameters produces the supervision for another network that is updated at test time (or inference time). This article first reviews the TTT paper, then discusses the problems with TTT and how LaCT addresses them, resulting in a powerful attention alternative that balances efficiency and performance.&lt;/p>
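&lt;p>To make the update concrete, here is a minimal sketch of one TTT step for a linear online learner, where the fixed outer model supplies the key and value that supervise the inner network. All names, shapes, and the learning rate below are my assumptions for illustration, not the authors' code.&lt;/p>

```python
import numpy as np

# Minimal sketch of one TTT step (assumed shapes and names, not the authors' code).
# The RNN "hidden state" is the weight matrix W of a linear online learner.
def ttt_step(W, k, v, lr=0.01):
    # k, v: row vectors of shape (1, d); W: (d, d).
    pred = k @ W                # inner model's prediction for this token
    grad = k.T @ (pred - v)     # gradient of 0.5 * sum((k W - v)**2) w.r.t. W
    W_new = W - lr * grad       # one gradient-descent step, taken at test time
    out = k @ W_new             # output read from the updated learner
    return W_new, out
```

&lt;p>The point is that the learner's weights play the role of the recurrent state: they are overwritten by a gradient step on every token, so context is compressed into parameters rather than into a fixed-size activation vector.&lt;/p>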
&lt;!-- more -->
&lt;blockquote>
&lt;p>Currently, &amp;ldquo;test-time training&amp;rdquo; is an overloaded term with multiple meanings. In this article, we use the term to refer to the test-time training paradigm proposed in &lt;a href="https://arxiv.org/pdf/2407.04620">Sun et al. 2024&lt;/a>, which is a framework for recurrent architectures for sequence modeling.&lt;/p></description></item><item><title>Generalizing DeltaProduct</title><link>https://chen-yingfa.github.io/research_posts/2025-generalizing-delta-product/</link><pubDate>Sat, 22 Mar 2025 23:40:08 +0000</pubDate><guid>https://chen-yingfa.github.io/research_posts/2025-generalizing-delta-product/</guid><description>&lt;script type="text/javascript" id="MathJax-script" async
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
&lt;/script>
&lt;p>DeltaProduct &lt;a href="https://arxiv.org/pdf/2502.10297">(Siems et al., 2025)&lt;/a> improves DeltaNet &lt;a href="https://arxiv.org/pdf/2412.06464">(Yang et al., 2025)&lt;/a> by updating the online memory with $n_h$ key-value pairs for each token, which can be seen as performing multiple steps of gradient descent per token. I will explain how this method is almost identical to multi-KV DeltaNet and reveal a potential flaw in the design of DeltaProduct.&lt;/p>
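&lt;p>As a rough illustration of the idea, the following sketch applies $n_h$ delta-rule (gradient-descent) steps to the memory for a single token. Function names, shapes, and step sizes are my assumptions, not the DeltaProduct implementation.&lt;/p>

```python
import numpy as np

# Hedged sketch: a DeltaProduct-style update applies n_h delta-rule steps
# to the memory S per token. Names and shapes are assumptions.
def delta_product_step(S, ks, vs, betas):
    # S: (d, d) memory; ks, vs: (n_h, d) keys/values; betas: (n_h,) step sizes.
    for k, v, beta in zip(ks, vs, betas):
        k = k[None, :]                     # row vector (1, d)
        v = v[None, :]
        pred = k @ S                       # current memory read for this key
        S = S - beta * (k.T @ (pred - v))  # one delta-rule / GD step on this KV pair
    return S
```

&lt;p>With $n_h = 1$ this reduces to a single DeltaNet-style update; the question the post examines is what the extra $n_h - 1$ steps per token actually buy.&lt;/p>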
&lt;!-- more -->
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;h3 id="deltanet">DeltaNet&lt;/h3>
&lt;blockquote>
&lt;p>We use row-vector notation.&lt;/p></description></item><item><title>Implementing Test-Time Training - Part 1</title><link>https://chen-yingfa.github.io/research_posts/2025-ttt-implementation/</link><pubDate>Wed, 19 Mar 2025 23:56:44 +0000</pubDate><guid>https://chen-yingfa.github.io/research_posts/2025-ttt-implementation/</guid><description>&lt;p>This blog post is part 1 of a series describing my attempt at implementing the Test-Time Training (TTT) model proposed by &lt;a href="https://arxiv.org/abs/2407.04620">Sun et al. (2024)&lt;/a> and Titans, proposed by &lt;a href="https://arxiv.org/abs/2501.00663">Behrouz et al. (2024)&lt;/a>. At the time of writing, these are two strong recurrent language models, but neither has a fully open-sourced implementation (TTT has released only a JAX implementation).&lt;/p>
&lt;!-- more -->
&lt;h2 id="introduction-to-test-time-training">Introduction to Test-Time Training&lt;/h2>
&lt;p>Briefly explained, Test-Time Training (TTT) is an RNN model whose hidden state is replaced with an online learner, whose parameters are updated through gradient descent during inference. The goal is for this online learner to compress contextual information into its parameters. A TTT operator can be expressed as:&lt;/p></description></item><item><title>InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens</title><link>https://chen-yingfa.github.io/research_posts/2024-infinitebench/</link><pubDate>Wed, 10 Jan 2024 10:38:38 +0000</pubDate><guid>https://chen-yingfa.github.io/research_posts/2024-infinitebench/</guid><description>&lt;p>&lt;a href="http://www.github.com/OpenBMB/InfiniteBench">Code&lt;/a> | &lt;a href="https://arxiv.org/abs/2402.13718">Paper&lt;/a>&lt;/p>
&lt;p>The first benchmark for evaluating the effectiveness of LLMs in handling contexts longer than 100K tokens!&lt;/p>
&lt;blockquote>
&lt;p>In the paper, we name it $\infty$-Bench, but I will sometimes use &amp;ldquo;InfiniteBench&amp;rdquo; in this blog post for better readability.&lt;/p>&lt;/blockquote>
&lt;p>Finally got some time to write this blog post, I've been so busy lately! I have been on a fairly long research hiatus, during which the field of NLP has been revolutionized by an overwhelming number of new LLMs. I was finally able to produce some meaningful work in this new era of research, as second author. In this blog post, I will introduce this recent work of mine.&lt;/p></description></item><item><title>CFDBench: A Large-Scale Benchmark for Machine Learning Methods in Fluid Dynamics</title><link>https://chen-yingfa.github.io/research_posts/2023-cfdbench/</link><pubDate>Sat, 16 Sep 2023 19:47:17 +0000</pubDate><guid>https://chen-yingfa.github.io/research_posts/2023-cfdbench/</guid><description>&lt;p>&lt;a href="https://www.github.com/luo-yining/CFDBench">Code&lt;/a> | &lt;a href="https://arxiv.org/abs/2310.05963">Paper&lt;/a> | &lt;a href="https://www.preprints.org/manuscript/202309.1550/v1">Paper (preprints.org)&lt;/a> | &lt;a href="https://zhuanlan.zhihu.com/p/656033757">Zhihu (知乎)&lt;/a>&lt;/p>
&lt;p>I did this work with my girlfriend, whose research direction is computational fluid dynamics (CFD). We observed that there are numerous research works applying deep learning (DL) to CFD problems. For example, &lt;a href="https://github.com/198808xc/Pangu-Weather">Pangu-Weather&lt;/a> has shown that DL methods can not only be more accurate than the best numerical methods, but also multiple orders of magnitude faster.&lt;/p>
&lt;!-- more -->
&lt;p>However, there is no standard benchmark for evaluating the performance of different DL methods. Therefore, we constructed CFDBench.&lt;/p></description></item></channel></rss>