Yao Tang 唐尧

Hi there! I am a first-year CIS PhD student at the University of Pennsylvania, advised by Prof. Jiatao Gu. I am broadly interested in agentic reasoning and reinforcement learning.

I received my bachelor's degree from Shanghai Jiao Tong University (SJTU). I had the privilege of working at Microsoft Research Asia in 2025, mentored by Dr. Li Dong. In 2024, I spent a wonderful summer at UCLA NLP, advised by Prof. Kai-Wei Chang, where I also learned a lot from working with Johnson Lin. During my undergrad, I was fortunate to work with Zhihui Xie, Prof. Shuai Li, Prof. Haiming Jin, and Dr. Tong Yu.

Email / Google Scholar / Twitter

News

[2025.08] Started my Ph.D. journey at Penn! 🎉

[2025.05] 🎉 QLASS is accepted to ICML 2025! 🎉

[2025.03] I graduated from SJTU and started my research internship at MSRA!

[2025.02] Our work QLASS is on arXiv!

[2024.09] 🎉 Our work CurrMask is accepted to NeurIPS 2024! See you in Vancouver! 🎉

Research Interests 🧋

I am interested in building generalized AI agents under limited supervision. As an example, consider developing a 🧋Boba-Agent that can make all kinds of boba.

🤔The most straightforward way to train a Boba-Agent is to collect a large set of expert boba-making demos and train the agent via imitation learning. However, data collection is costly, and an agent trained only on 🍋lemon tea demos may generalize poorly to making 🥭mango pomelo sago.

🤔Another option is to let the Boba-Agent learn from extensive trial and error. The agent keeps exploring the kitchen, making different boba, receiving rewards from me, and improving its boba-making policy via reinforcement learning. Unfortunately, this process is friendly to neither me nor the kitchen.

😀Ideally, the agent would first search online and learn from offline video demos and textual instructions. It would then extract reusable knowledge, learn to navigate the kitchen safely, and quickly adapt to personal preferences. (For instance, it might learn that my highest praise is reserved for boba that's "not too sweet"!)

To summarize, I am interested in two complementary capabilities. The first is effective internal world modeling, which helps the agent learn from large-scale offline boba-making data and extract reusable knowledge. The second is external world interaction, such as searching for new recipes online, exploring the kitchen, and getting feedback from me. Together, these should let the agent quickly make delicious, newly invented boba under limited supervision.

Research Experiences

My research experience lies mainly at the intersection of reinforcement learning and large language models. Specifically, I have worked on projects including:

Unsupervised Pre-training for Reinforcement Learning: how to pretrain a unified model on unlabeled data that can quickly adapt to various downstream tasks.

Language Agent Self-improvement: how a model can iteratively improve itself by labeling additional unlabeled data and incorporating it into training.

Conditional Generative Modeling for Decision Making: how to solve decision-making problems with generative models when complete state information is inaccessible.

Selected Publications

	QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search. Zongyu Lin, Yao Tang, Xingcheng Yao, Da Yin, Ziniu Hu, Yizhou Sun, Kai-Wei Chang. ICML 2025. [paper] [code]
	Learning Versatile Skills with Curriculum Masking. Yao Tang, Zhihui Xie, Zichuan Lin, Deheng Ye, Shuai Li. NeurIPS 2024. [paper] [code]
	FutureDiffuser: Leveraging Future Priors for World Modeling in Offline Reinforcement Learning. Yao Tang, Zhihui Xie, Tong Yu, Bokai Hu, Shuai Li.

Experiences

	Microsoft Research Asia, 2025.03 ~ 2025.08. General Artificial Intelligence Group, Beijing, China. Research Internship. Advisors: Dr. Li Dong, Yaru Hao
	University of California, Los Angeles, 2024.07 ~ 2024.10. Los Angeles, CA, USA. Visiting Student. Advisor: Prof. Kai-Wei Chang
	Adobe Research, 2023.01 ~ 2024.05. Shanghai, China. Remote Collaboration. Advisors: Prof. Shuai Li, Dr. Tong Yu
	Shanghai Jiao Tong University, 2020.09 ~ 2025.03. Shanghai, China. B.Eng. in Computer Science and Technology. GPA: 3.8

Honors

Jing'e Overseas Study Funding (9 recipients university-wide), 2023

Shanghai Jiao Tong University Academic Progress Scholarship (5%), 2022

Odyssey of the Mind Competition, China, Second Prize (Top 5), 2021

National Oil Corporation Scholarship (10%), 2021

Misc.

I enjoy jazz dance and am a beginner at guitar.

I am a fan of David Tao.

My family dog Happy loves to play with humans through reinforcement learning. Here is a video of her at 3 months old.