Yao Tang 唐尧

Hi there! I am a first-year CIS PhD student at the University of Pennsylvania, advised by Prof. Jiatao Gu. I received my bachelor's degree from Shanghai Jiao Tong University (SJTU). Since March 2025, I have worked at Microsoft Research Asia (MSRA), advised by Dr. Li Dong. I spent a wonderful summer in 2024 at UCLA NLP, advised by Prof. Kai-Wei Chang. I also learned a lot from working with Prof. Shuai Li, Prof. Haiming Jin, and Dr. Tong Yu.

News

[2025.08] Started my Ph.D. journey at Penn! 🎉

[2025.05] 🎉 QLASS is accepted to ICML 2025! 🎉

[2025.03] I graduated from SJTU and started my research internship at MSRA!

[2025.02] Our work QLASS is on arXiv!

[2024.09] 🎉 Our work CurrMask is accepted to NeurIPS 2024! See you in Vancouver! 🎉

Research Interests 🧋

I am interested in building generalized AI agents under limited supervision. As an example, consider developing a 🧋Boba-Agent that can make all kinds of boba.

🤔To train a Boba-Agent, the most straightforward approach is to collect many expert boba-making demos and train the agent via imitation learning. However, data collection is costly, and if the data only covers 🍋lemon tea, the model may generalize poorly to making 🥭mango pomelo sago.

🤔Another option is to let the Boba-Agent learn from extensive trial and error. The agent can keep exploring the kitchen, making different kinds of boba, receiving rewards from me, and improving its boba-making policy via reinforcement learning. Unfortunately, this process is friendly to neither me nor the kitchen.

😀Ideally, the agent would first search online, learn from offline video demos and textual instructions, extract reusable knowledge, learn to navigate the kitchen safely, and quickly adapt to personal preferences. (For instance, it would learn that my highest praise is reserved for boba that's "not too sweet"!)

To summarize, I am interested in two complementary capabilities: effective internal world modeling, which helps the agent learn from large-scale offline boba-making data and extract reusable knowledge; and external world interaction, e.g., searching for new recipes online, exploring the kitchen, and getting feedback from me. Together, these should let the agent quickly make delicious, newly invented boba under limited supervision.

Research Experiences

My research experience mainly lies at the intersection of reinforcement learning and large language models. Specifically, I have worked on projects including:

  • Unsupervised Pre-training for Reinforcement Learning: how to pretrain a unified model on unlabeled data that can quickly adapt to various downstream tasks.
  • Language Agent Self-improvement: how a model can iteratively improve itself by labeling additional unlabeled data and incorporating it into the training process.
  • Conditional Generative Modeling for Decision Making: how to solve decision-making through generative models when complete state information is inaccessible.

Selected Publications

QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Zongyu Lin*, Yao Tang*, Xingcheng Yao*, Da Yin*, Ziniu Hu, Yizhou Sun, Kai-Wei Chang
ICML 2025.
[paper] [code]

Learning Versatile Skills with Curriculum Masking
Yao Tang*, Zhihui Xie*, Zichuan Lin, Deheng Ye, Shuai Li
NeurIPS 2024.
[paper] [code]

FutureDiffuser: Leveraging Future Priors for World Modeling in Offline Reinforcement Learning
Yao Tang, Zhihui Xie, Tong Yu, Bokai Hu, Shuai Li

Experiences

Microsoft Research Asia
General Artificial Intelligence Group, Beijing, China
Research Internship
Advisors: Dr. Li Dong, Yaru Hao

University of California, Los Angeles
Los Angeles, CA, USA
Visiting Student
Advisor: Prof. Kai-Wei Chang

Adobe Research
Shanghai, China
Remote Collaboration
Advisors: Prof. Shuai Li, Dr. Tong Yu

Shanghai Jiao Tong University
Shanghai, China
B.Eng. in Computer Science and Technology
GPA: 3.8

Honors

Jing'e Overseas Study Funding (9 recipients university-wide), 2023
Shanghai Jiao Tong University Academic Progress Scholarship (5%), 2022
Odyssey of the Mind Competition, China, Second Prize (Top 5), 2021
National Oil Corporation Scholarship (10%), 2021

Misc.

I enjoy jazz dance and am a beginner at guitar.

I am a fan of David Tao.

My family dog Happy loves to play with humans through reinforcement learning. Here is a video of her at 3 months old.