Yao Tang 唐尧

Hi there! I'm a research intern at Microsoft Research Asia and am super fortunate to work with Dr. Li Dong. I received my bachelor's degree from Shanghai Jiao Tong University (SJTU). I am passionate about building AI agents that can learn from limited supervision and interact with the real world. I spent a wonderful summer in 2024 at UCLA NLP, advised by Prof. Kai-Wei Chang. I also worked as a research assistant with Prof. Shuai Li, Prof. Haiming Jin, and Dr. Tong Yu.

News

[2024.09] 🎉 Our work CurrMask is accepted to NeurIPS 2024! See you in Vancouver! 🎉

[2025.02] QLASS is on arXiv!

[2025.05] 🎉 QLASS is accepted to ICML 2025! See you in Vancouver again! 🎉

Research Interests 🧋

I am interested in building generalized AI agents under limited supervision. Take my dream of developing a 🧋Boba-Agent that can make all kinds of boba as an example.

🤔To train a Boba-Agent, the most straightforward way is to collect a bunch of boba expert demos and train the agent via imitation learning. But data collection is too costly, and the model may generalize poorly to making 🥭mango pomelo sago if the data only covers 🍋lemon tea.

🤔Another choice is to let the Boba-Agent learn from a large amount of trial and error. The agent can keep exploring the kitchen, making different boba, receiving my reward, and improving its boba-making policy via reinforcement learning. Unfortunately, this process is friendly to neither me nor the kitchen.

😀The ideal case is that the agent first searches online and learns from offline video demos and textual instructions, extracts reusable knowledge, learns to navigate the kitchen safely, and quickly adapts to personal preferences. (For instance, learning that my highest praise is reserved for boba that's "not too sweet"!)

To summarize, I am interested in two things: effective internal world modeling, which helps the agent learn from large-scale offline boba-making data and extract reusable knowledge, and external world interaction, e.g., searching for new recipes online, exploring the kitchen, and getting feedback from me. Together, they let the agent quickly make delicious, newly invented boba under limited supervision.

Research Experiences

My research experience lies mainly at the intersection of reinforcement learning, decision making, and natural language processing. Specifically, I have been working on projects including:

  • Unsupervised Pre-training for Reinforcement Learning: how to pretrain a unified model on unlabeled data that can quickly adapt to various downstream tasks.
  • Language Agent Self-improvement: how to self-train a model that iteratively improves itself by labeling additional unlabeled data and incorporating it into the training process.
  • Conditional Generative Modeling for Decision Making: how to solve decision-making through generative models when complete state information is inaccessible.
Publications
    QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
    Zongyu Lin*, Yao Tang*, Xingcheng Yao*, Da Yin*, Ziniu Hu, Yizhou Sun, Kai-Wei Chang
    ICML 2025.
    [paper]
    Learning Versatile Skills with Curriculum Masking
    Yao Tang*, Zhihui Xie*, Zichuan Lin, Deheng Ye, Shuai Li
    NeurIPS 2024.
    [paper] [code]
    FutureDiffuser: Leveraging Future Priors for World Modeling in Offline Reinforcement Learning
    Yao Tang, Zhihui Xie, Tong Yu, Bokai Hu, Shuai Li
    Under Review.
    Risk-Aware Constrained Reinforcement Learning with Non-Stationary Policies
    Zhaoxing Yang, Haiming Jin, Yao Tang, Guiyun Fan
    AAMAS 2024.
    [paper]
Experience
    Microsoft Research Asia

    General Artificial Intelligence Group, Beijing, China
    Research Internship
    Advisors: Dr. Li Dong and Yaru Hao
    University of California, Los Angeles

    Los Angeles, CA, USA
    Visiting Student
    Advisor: Prof. Kai-Wei Chang
    Adobe Research

    Shanghai, China
    Remote Collaboration
    Advisors: Prof. Shuai Li and Dr. Tong Yu
    Shanghai Jiao Tong University

    Shanghai, China
    GPA to date: 3.83 (Top 20%)
    B.Eng. in Computer Science and Technology
Honors
    Jing'e Overseas Study Funding (9 recipients university-wide), 2023
    Shanghai Jiao Tong University Academic Progress Scholarship (5%), 2022
    Odyssey of the Mind Competition, China, Second Prize (Top 5), 2021
    National Oil Corporation Scholarship (10%), 2021
Misc.

    I enjoy Jazz Dance and am a beginner at guitar.

    I also love to play with my puppy Happy through Reinforcement Learning. Here is a video of our progress when she was 4 months old.