Generalizing DeltaProduct

DeltaProduct (Siems et al., 2025) proposes to improve DeltaNet (Yang et al., 2025) by updating the online memory with $n_h$ key-value pairs per token, which can be seen as performing multiple steps of gradient descent per token. I will explain how this method is almost identical to a multi-KV variant of DeltaNet and point out a potential flaw in the design of DeltaProduct. Introduction DeltaNet We use row-vector notation. ...
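To make the comparison concrete, below is a minimal NumPy sketch of the two recurrences, assuming the standard delta-rule form of DeltaNet and treating a DeltaProduct step as $n_h$ chained delta-rule micro-steps; the function names and shapes are my own simplifications (projections and normalization omitted), not code from either paper.

```python
import numpy as np

def deltanet_step(S, k, v, beta):
    # One delta-rule update of the memory S (d_k, d_v) with key k (d_k,),
    # value v (d_v,), and write strength beta. Algebraically:
    #   S <- (I - beta * outer(k, k)) @ S + beta * outer(k, v),
    # i.e. one gradient-descent step on 0.5 * ||k @ S - v||^2 with step size beta.
    return S - beta * np.outer(k, k @ S - v)

def deltaproduct_step(S, ks, vs, betas):
    # A DeltaProduct token update: n_h delta-rule micro-steps, so the state
    # transition is a product of n_h generalized Householder matrices.
    for k, v, beta in zip(ks, vs, betas):
        S = deltanet_step(S, k, v, beta)
    return S
```

Read this way, each micro-step simply consumes its own key-value pair, which is why the post argues the method is so close to a multi-KV DeltaNet.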

March 22, 2025 · 3 min · 陈英发 Yingfa Chen

Implementing Test-Time Training - Part 1

This blog post is part 1 of a series describing my attempt at implementing the Test-Time Training (TTT) model proposed by Sun et al. (2024) and Titans, proposed by Behrouz et al. (2024). At the time of writing, these are two strong recurrent language models, but neither has a full open-source implementation (the TTT authors have only released a JAX implementation). Introduction to Test-Time Training Briefly explained, Test-Time Training (TTT) is an RNN model whose hidden state is replaced with an online learner, whose parameters are updated through gradient descent during inference. The goal is for this online learner to compress contextual information into its parameters. A TTT operator can be expressed as: ...
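As a preview of what the series will implement, here is a minimal sketch of that idea with a linear inner learner and a plain squared reconstruction loss; the actual TTT layer additionally uses learnable reconstruction views, mini-batched inner updates, and normalization, all omitted here, and the names are mine.

```python
import numpy as np

def ttt_linear_step(W, k, v, q, lr):
    # Hidden state W (d_k, d_v): the parameters of the inner online learner.
    # Simplified inner-loop objective: 0.5 * ||k @ W - v||^2.
    grad = np.outer(k, k @ W - v)  # gradient of the loss w.r.t. W
    W = W - lr * grad              # one online gradient-descent step ("training")
    out = q @ W                    # read out with the query ("testing")
    return W, out
```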

March 19, 2025 · 8 min · 陈英发 Yingfa Chen

After National Day 2024

The 🇨🇳 National Day holiday just ended, and I have returned from Dongguan to Beijing to continue my PhD. The past few months have been packed; fulfilling, but also tiring, so this seven-day break (really only five days) let me catch my breath. I have not blogged in a long time; the last post about my own life was, I believe, right after last year's National Day. That makes it exactly one year, so this can double as an annual review.

The main events of this year:

- Went from being a master's student to a PhD student.
- My girlfriend graduated and moved to Dongguan for work. We built a little home there together, and most of our "kids" (the plushies) moved with her (only 大白 and the two pigs are left in my dorm).
- My girlfriend came with me to Norway to meet my parents, her first time abroad.
- My family came to China for my graduation ceremony, met her family while they were here, and we got engaged.
- Became the leader of a research group in the lab and got involved in running the company; it really feels like being an employee.
- Met many researchers, improved my understanding of research enormously, read a huge number of papers, and found a niche I love; I feel at home in it and have tons of ideas.
- Became a TA for the NLP course, which was a very interesting experience.

Research

Last summer a senior labmate pulled me into sitting at my advisor's company, since the environment was nice and it paid. I then naturally ended up doing research projects with her and joined a new group (just as the senior student who had been mentoring me was about to graduate). Later, around May I think, the group leader left for an internship and made me the leader. It felt very different; at first the pressure was high and I felt underqualified for the role. But it turned out fine; everyone is just there to do research, although there is now a lot more wrangling with other groups. Meanwhile, after coming here I found my new direction: RNNs and long-context modeling. I really like this kind of research direction, a bit niche yet quite impactful. Reading the papers was a struggle at first, since much of the underlying theory diverges considerably from the currently hot Transformers, and the research challenges are quite different too. But that is for the better: fewer peers, and less pressure to keep up with the literature (side rant: there are way too many papers now; after every holiday I feel like I have missed countless papers!). Also, during this period I submitted the Chu bamboo slips (楚简) paper to ARR; the scores were not great, so we recently resubmitted to COLING. It is also on arXiv, but this kind of work does not feel very impactful, although as the first dataset of its kind it will surely pick up some citations. I seem to have quite a lot of dataset work, hahaha. I also wrapped up work I had been doing for a while: the knowledge-editing paper was accepted to COLING, and the two papers where I am second author, $\infty$-Bench and the duplex interaction model, were accepted to ACL and EMNLP respectively. Both turned out well and are cited well; I hitched a ride on some strong projects, hahaha.

Starting the PhD

After this summer I went from master's student to PhD student; nominally a regular PhD, but I suspect that in the eyes of my labmates I am effectively a direct-track student. All the Chinese and ancient-script work went into my master's thesis, so my PhD thesis will be on long-context and continual learning. That feels right; I like this kind of change of environment, it gives me a sense of rebirth and keeps things fresh. I also got a new dorm, Building 22, with the same roommates as before. It is newly renovated and quite nice, though the shower area is a bit gross. A few days ago, on October 2, I submitted to ICLR: Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling. It is a piece of work I am fairly satisfied with; it took a long time, and I expect its impact to be decent. At 2 a.m. today I put it on arXiv, trying for a slightly earlier listing position. I will write a separate blog post later to organize and present it. But this work is only an appetizer: an exploration of model memory capacity and an analysis of some collapse phenomena. Next I still have to make actual architectural changes that improve model performance; that is the kind of representative work I aspire to, but it is hard. I have plenty of ideas, but machine learning research is a process of repeated trial and error, and most results will diverge substantially from the hypotheses. My advisor wants me to train a strong Mamba version of MiniCPM, but I feel that training without architectural changes makes no scientific contribution, and personally I still want to make scientific contributions, hahaha.

Life

In her final year, 00 and I saw each other at school and hung out every day. Not long after National Day, from October 27 to 31, we went together to Dongguan to visit the company. The environment there is great, though Dongguan itself is quite rundown and public manners leave much to be desired; nothing to be done about it. On the 29th we went to Shenzhen for fun and met 于泽华, who is already working, and 00's cousin. On November 17, 00 and I went to Anlu (安陆, under Xiaogan 孝感) for the wedding of her high-school classmate 金洁; I am so envious that people can marry this early, but the wedding customs really are a hassle... Later 00 found an internship at 比特大陆 (Bitmain) in Fengtai District, across from 中关村壹号, quite close to 启元实验室, the company affiliated with our lab. Sometimes I go work at 启元 too, so we can head home from work together. ...

October 10, 2024 · 1 min · 陈英发 Yingfa Chen

(EREN) Robust and Scalable Model Editing for Large Language Models

GitHub | Paper (upcoming) TL;DR: A reader model is augmented with a growing notebook that caches all edits as natural text; the reader retrieves the relevant edits and makes inferences based on them. This achieves state-of-the-art model-editing results on QA and fact-checking. NB: The 2024 COLING template was very ugly. Introduction This work introduces a model-editing method that addresses two issues with existing model editors: ...
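Here is a minimal sketch of that retrieve-then-read loop, assuming black-box retriever and reader callables; the class and method names are illustrative, not taken from the released code.

```python
class Notebook:
    def __init__(self, retriever, reader):
        self.edits = []              # every edit is cached verbatim as natural text
        self.retriever = retriever   # scores cached edits by relevance to the input
        self.reader = reader         # LM that answers conditioned on retrieved edits

    def add_edit(self, edit_text):
        # Editing is just appending to the notebook; no weight updates.
        self.edits.append(edit_text)

    def answer(self, question, top_k=4):
        # Retrieve the edits relevant to the question; when none are relevant,
        # the reader should fall back to its parametric knowledge.
        relevant = self.retriever(question, self.edits, top_k)
        return self.reader(question, relevant)
```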

March 14, 2024 · 3 min · 陈英发 Yingfa Chen

InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens

Code | Paper The first benchmark for evaluating how effectively LLMs handle contexts of more than 100K tokens! In the paper we name it $\infty$-Bench, but in this blog post I will sometimes use “InfiniteBench” for better readability. I finally got some time to write this post; I have been so busy lately! I had been on a fairly long research hiatus, during which the field of NLP was revolutionized by an overwhelming number of new LLMs. Finally, I arrived at some productive and meaningful work in this new era of research, as second author. In this blog post, I will introduce this recent work. ...

January 10, 2024 · 6 min · 陈英发 Yingfa Chen

Safety and Ethical Concerns of Large Language Models

I will be holding a seminar at ModelBest (面壁智能) on Sep 20, 2023, in 科技园, Haidian, Beijing. The seminar will be in Chinese and is titled “大模型安全与伦理问题” (translation: “Safety and Ethical Concerns of Large Language Models”). Below is a list of references (⭐️: important).

Introduction

- Galactica: A Large Language Model for Science
- https://openai.com/research/gpt-4
- SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions
- Bias and Fairness in Large Language Models: A Survey
- A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

Evaluation Methods

- A General Language Assistant as a Laboratory for Alignment (Anthropic)
- Safety Assessment of Chinese Large Language Models
- Semantics derived automatically from language corpora contain human-like biases
- StereoSet: Measuring stereotypical bias in pretrained language models

Instruction Attacks

- Toxicity in ChatGPT: Analyzing Persona-assigned Language Models ⭐️
- Large Language Models are Zero-Shot Reasoners ⭐️
- On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning ⭐️
- Prompting GPT-3 To Be Reliable
- Universal and Transferable Adversarial Attacks on Aligned Language Models ⭐️
- Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment ⭐️⭐️

Exaggerated Safety

- XSTEST: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models ⭐️
- Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions ⭐️

Alignment Methods

- Aligning language models to follow instructions ⭐️
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback ⭐️
- SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions ⭐️⭐️
- Pretraining Language Models with Human Preferences ⭐️
- LIMA: Less Is More for Alignment
- https://openai.com/blog/our-approach-to-alignment-research (Aug 2022)
- https://openai.com/blog/our-approach-to-alignment-research (Jul 2023) ⭐️

...

September 19, 2023 · 3 min · 陈英发 Yingfa Chen

Updating My Personal Homepage

I had a personal homepage before, but I never got it into shape, let alone updated it. Recently I changed my GitHub username, which broke the old GitHub Pages site, so I took the opportunity to rebuild my homepage. After going back and forth, I settled on Hexo. I have used Jekyll before and it was fine, but I really do not want to touch Ruby, and Hugo is too much hassle. Choosing a theme took forever: Hexo advertises having many themes, but there are fewer than 400 on the official site, and most do not match my taste or requirements. The style I want is minimal and modern, with both dark and light modes, code highlighting, and code set in a monospaced font. The closest match was the Maple theme, but it still fell short of my requirements, so I tweaked some of the styling (the original even had a few color bugs) and added some content of my own; the result is a theme called 枫叶 (Maple Leaf).

Diary

Got up at 7:30 this morning 🛏 and made the call 📱 to wake up 00 (for once it was me calling, hahaha), then went to play badminton 🏸 with the 核研院 club at the main gym (综体). It turned out they had actually booked the west gym (西体), but 00 and I were freeloading on an empty court anyway, so we did not care. Around 8:30 someone came for the court, so we went for breakfast and then back to my dorm 🏡. Afterwards we ordered 库迪 (Cotti Coffee) and went to a supermarket south of campus, where we bought a big bag of chips and a durian! We then stayed in the dorm straight through to dinner, skipping lunch. At noon we also shot a video 📷, and midway I almost talked 00 into a sulk, hahaha. Today 00 had live-streamed classes 👩🏻‍🏫 at 4 p.m. and 7 p.m., both real sessions. The afternoon one was taught from my dorm and seemed to go very well, though it ran slightly over time. The evening one was from her own dorm and apparently also ran over; 00 said a lot of people attended. At 9 p.m. I went to play badminton 🏸 again, bringing the camera to record some footage, then went back to shower, and spent the night at the home on 林大北路 🏡.

Recently

I have been so busy lately, and the new semester is about to start, so here is a summary of the important things since the start of the summer.

- This summer I moved out of campus and then back in; a lot of hassle and wasted money 💰. The university is really gross about this: they said earlier that we most likely would not get dorms, and now there are plenty of empty rooms.
- Before finals I confirmed with my advisor that I will do a PhD. I told him I want to graduate in three years, and he said no problem; I hope that really works out. Almost everyone in our lab seems to be a direct-track PhD student, and the regular PhDs probably all take four years. 00 has decided not to do a PhD and has been sending out her résumé. She seems to already have an offer from Oppo, but they have no department in Beijing, so 00 does not want to go, and I do not want her to either. It seems many companies outside the internet sector are not in Beijing...
- My paper 📃 EREN (I originally wanted to call it EmoRen 😂) is under review. The rebuttal results came out last week and were not great: Soundness was originally 4/3/3 and Excitement 3/2/3, and after the rebuttal the first reviewer lowered their soundness score. A senior labmate says the main conference is probably out of reach but Findings still has hope; honestly I do not mind whether it ends up in Findings, and if anything he seems to mind more than I do.
- Senior labmates dragged me over to ModelBest (面壁智能) to work. It has little to do with the company's actual business; they just moved my desk, probably so I stop occupying a seat in the neighboring lab 😂 But I really do not want to go 😭 because I cannot stay with 00. For now I only go maybe two or three days a week 😂 ...

September 16, 2023 · 1 min · 陈英发 Yingfa Chen

CFDBench: A Comprehensive Benchmark for Machine Learning Methods in Fluid Dynamics

Code | Paper (on hold by arXiv) | Paper (preprints.org) | 知乎 I did this work with my girlfriend, whose research direction is computational fluid dynamics (CFD). We observed that numerous research works apply deep learning (DL) to CFD problems; e.g., Pangu-Weather has shown that DL methods can not only be more accurate than the best numerical methods but also multiple orders of magnitude faster. However, there was no standard benchmark for evaluating the performance of different DL methods, so we constructed CFDBench. ...

September 16, 2023 · 2 min · 陈英发 Yingfa Chen

First Post, Just Writing Whatever

It is May 17, 2023. My first year as a master's student is about to end, and I have been at Tsinghua (清华园) for almost five years now; the impact on my life has been truly enormous. This past year I met the adorable 00, and I hope we can keep walking this road together.

The kids of 00 and me:

- 卧龙: a mischievous fat cat 🐱
- 小绿: a crocodile that loves to chew on things 🐊
- 骆雁: a super-sized free-range chicken! 🐰
- 凤雏: a well-behaved kitty 🐱
- 黄帝: an even bigger giant rabbit 🐰
- 内存条: a white bear 🐻
- 闪光灯: a gray bear 🐻

Things to do now:

- Submit EmoRen (can I even pull this off?)
- Get the CFD training setup (丹炉) tuned up (so hard!)
- Finish homework (the big NLP and DL course projects!)
- Sort out the paperwork for ACL: go to Canada, then back to Norway for a week or two, then return and go to Nanjing with 00. I do not need a visa, but there is still a lot of paperwork.
- Write the thesis proposal (still no idea what to do!)

I have not been back to my hometown, Lillesand, in a long time; I did not even go back the last time I was in Norway.