Hi there! I'm a research intern at Microsoft Research Asia and am super fortunate to work with Dr. Li Dong. I received my bachelor's degree from Shanghai Jiao Tong University (SJTU).
I am passionate about building AI agents that can learn from limited supervision and interact with the real world.
I spent a wonderful summer in 2024 at UCLA NLP, advised by Prof. Kai-Wei Chang.
I also worked as a research assistant with Prof. Shuai Li, Prof. Haiming Jin, and Dr. Tong Yu.
[2025.05] 🎉 QLASS is accepted to ICML 2025! See you in Vancouver again! 🎉
Research Interests 🧋
I am interested in building generalist AI agents under limited supervision. Take my dream of developing a 🧋Boba-Agent that can make all kinds of boba as an example.
🤔To train a Boba-Agent, the most straightforward way is to collect a bunch of expert boba-making demos and train the agent via imitation learning. But data collection is too costly, and the model may generalize poorly to making 🥭mango pomelo sago if the data only covers 🍋lemon tea.
🤔Another choice is to let the Boba-Agent learn from a large amount of trial and error. The agent can keep exploring the kitchen, making different boba, receiving my rewards, and improving its boba-making policy via reinforcement learning. Unfortunately, this process is friendly to neither me nor the kitchen.
The ideal case is for the agent to first search online and learn from offline video demos and textual instructions, extract reusable knowledge, learn to safely navigate the kitchen, and quickly adapt to personal preferences. (For instance, learning that my highest praise is reserved for boba that's "not too sweet"!)
To summarize, my interests lie in enabling effective internal world modeling, which helps the agent learn from large-scale offline boba-making data and extract reusable knowledge, as well as external world interaction, e.g., searching for new recipes online, exploring the kitchen, and getting my feedback, so that it can quickly make delicious newly invented boba under limited supervision.
Research Experiences
My research experience mainly lies at the intersection of reinforcement learning, decision making, and natural language processing. Specifically, I have been working on projects including:
Unsupervised Pre-training for Reinforcement Learning:
how to pretrain a unified model on unlabeled data that can quickly adapt to various downstream tasks.
Language Agent Self-improvement:
how to iteratively self-train a model that improves itself by labeling additional unlabeled data and incorporating it into the training process.
Conditional Generative Modeling for Decision Making:
how to solve decision-making through generative models when complete state information is inaccessible.
Publications
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Zongyu Lin*,
Yao Tang*,
Xingcheng Yao*,
Da Yin*, Ziniu Hu, Yizhou Sun, Kai-Wei Chang
ICML 2025. [paper]
Learning Versatile Skills with Curriculum Masking
Yao Tang*,
Zhihui Xie*,
Zichuan Lin,
Deheng Ye,
Shuai Li
NeurIPS 2024. [paper][code]
FutureDiffuser: Leveraging Future Priors for World Modeling in Offline Reinforcement Learning
Yao Tang,
Zhihui Xie,
Tong Yu,
Bokai Hu,
Shuai Li
Under Review.
Risk-Aware Constrained Reinforcement Learning with Non-Stationary Policies
Zhaoxing Yang,
Haiming Jin,
Yao Tang,
Guiyun Fan
AAMAS 2024. [paper]
Experience
Microsoft Research Asia
2025.03 ~ Present
General Artificial Intelligence Group, Beijing, China
Research Internship
Advisors: Dr. Li Dong and Dr. Yaru Hao
University of California, Los Angeles
2024.07 ~ 2024.10
Los Angeles, CA, USA
Visiting Student
Advisor: Prof. Kai-Wei Chang
Adobe Research
2023.01 ~ 2024.05
Shanghai, China
Remote Collaboration
Advisors: Prof. Shuai Li and Dr. Tong Yu