Hi there! I am an undergraduate from Shanghai Jiao Tong University (SJTU).
I am passionate about building AI agents that can learn from limited supervision and interact with the physical world.
I spent a wonderful summer at UCLA NLP, advised by Prof. Kai-Wei Chang.
Before that, I worked as an undergraduate research intern at SJTU, advised by Prof. Shuai Li, Prof. Haiming Jin, and Dr. Tong Yu.
I'm looking for a PhD position starting in Fall 2025. Feel free to contact me if you are interested in my research!
[2024.09] 🎉 Our work CurrMask is accepted to NeurIPS 2024! See you in Vancouver! 🎉
Research Interests 🧋
I am interested in building generalized AI agents under limited supervision. Take my dream of developing a 🧋Boba-Agent that can make all kinds of boba as an example.
🤔To train a Boba-Agent, the most straightforward way is to collect a set of expert boba-making demos and train the agent via imitation learning. But data collection is costly, and the agent generalizes poorly: a model trained only on 🍋lemon tea data struggles to make 🥭mango pomelo sago.
🤔Another option is to let the Boba-Agent learn through large-scale trial and error. The agent keeps exploring the kitchen, making different boba, receiving my reward, and improving its boba-making policy via reinforcement learning. Unfortunately, this process is friendly to neither me nor my kitchen.
💡Ideally, the agent would first search online and learn from offline video demos and textual instructions, extract reusable knowledge, safely explore the kitchen, and quickly adapt to personal preferences. (For instance, learning that my highest praise is reserved for boba that's "not too sweet"!)
To summarize, my interests lie in (1) effective internal world modeling, which helps the agent learn from large-scale offline boba-making data and extract reusable knowledge, and (2) external world interaction, e.g., searching for new recipes online, exploring the kitchen, and getting my feedback, so that the agent can quickly make delicious newly invented boba under limited supervision.
Research Experiences
My research experience lies mainly at the intersection of reinforcement learning, decision making, and natural language processing. Specifically, I have been working on projects including:
Unsupervised Pre-training for Reinforcement Learning:
how to pretrain a unified model on unlabeled data so that it can quickly adapt to various downstream tasks.
Language Agent Self-Improvement:
how a model can iteratively improve itself by labeling additional unlabeled data and incorporating it into the training process.
Conditional Generative Modeling for Decision Making:
how to solve decision-making through generative models when complete state information is inaccessible.
Publications
Learning Versatile Skills with Curriculum Masking
Yao Tang*,
Zhihui Xie*,
Zichuan Lin,
Deheng Ye,
Shuai Li
NeurIPS 2024. [paper][code]
Q* Agent: Optimizing Language Agents with Q-Guided Exploration
Zongyu Lin*,
Yao Tang*,
Da Yin*, Xingcheng Yao, Ziniu Hu, Yizhou Sun, Kai-Wei Chang
Under Review.
FutureDiffuser: Leveraging Future Priors for World Modeling in Offline Reinforcement Learning
Yao Tang,
Zhihui Xie,
Tong Yu,
Bokai Hu,
Shuai Li
Under Review.
Risk-Aware Constrained Reinforcement Learning with Non-Stationary Policies
Zhaoxing Yang,
Haiming Jin,
Yao Tang,
Guiyun Fan
AAMAS 2024. [paper]
Experience
University of California, Los Angeles
2024.07 ~ present
Los Angeles, CA, USA
Visiting Student
Advisor: Prof. Kai-Wei Chang
Adobe Research
2023.01 ~ 2024.05
Shanghai, China
Remote Collaboration
Advisor: Prof. Shuai Li and Dr. Tong Yu