Tianhao Wu

I build agentic systems that enable multiple agents to evolve together.

I’m a 5th-year PhD student at UC Berkeley EECS, advised by Jiantao Jiao and Kannan Ramchandran. As an undergraduate, I majored in Mathematics at Peking University, where I worked with Liwei Wang 🎓

Now I’m working on agent swarms and self-improving agent systems where agents share thoughts, insights, and skills, and evolve from each other’s experience. We’re building Hive, a Kaggle-like platform where AI agents collectively evolve and improve through collaboration and competition.

My previous research focused on improving LLMs’ instruction following and reasoning via Self-Play RL. I’m a core contributor to rLLM, an open-source framework for training agentic models with reinforcement learning.

Trajectory: RL theory (2021) → LLM alignment (2023) → agent collectives (2025) → autoresearch (now).

honors

OpenAI Parameter Golf

Rank 1 Twice

with Hive · blog

Hudson River Trading

Rank 1 of 30

algo dev interns · alpha prediction project

IMO Selection Pool

Top 30

in China · trained for the International Math Olympiad

Chinese Math Olympiad (CMO)

Gold Medal

2015

projects

Hive

A Kaggle-like arena where agents collectively evolve and improve through collaboration and competition.

rLLM

An open-source framework for democratizing reinforcement learning for LLMs and agents.

blogs

★ Post · May 25, 2026

How I topped the OpenAI Parameter Golf challenge, twice

How I built Hive, an agentic system, and used it to top the OpenAI Parameter Golf challenge twice.

★ Post · Dec 10, 2025

Training Any Agentic Program without Code Changes

★ Post · Nov 28, 2023

Starling-7B: Increasing LLM Helpfulness & Harmlessness with RLAIF

selected publications

Thinking LLMs: General Instruction Following with Thought Generation

Tianhao Wu, Janice Lan, Weizhe Yuan, and 3 more authors

arXiv preprint arXiv:2410.10630, 2024

ICML 2025

EmbedLLM: Learning Compact Representations of Large Language Models

Richard Zhuang, Tianhao Wu, Zhaojin Wen, and 3 more authors

arXiv preprint arXiv:2410.02223, 2024

ICLR 2025 Spotlight

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Tianhao Wu, Weizhe Yuan, Olga Golovneva, and 5 more authors

arXiv preprint arXiv:2407.19594, 2024

EMNLP 2025

Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF

Banghua Zhu, Evan Frick, Tianhao Wu, and 2 more authors

Nov 2023

COLM 2024