publications | Tianhao Wu

2024

Thinking LLMs: General Instruction Following with Thought Generation

Tianhao Wu, Janice Lan, Weizhe Yuan, and 3 more authors

arXiv preprint arXiv:2410.10630, 2024

ICML 2025

EmbedLLM: Learning Compact Representations of Large Language Models

Richard Zhuang, Tianhao Wu, Zhaojin Wen, and 3 more authors

arXiv preprint arXiv:2410.02223, 2024

ICLR 2025 Spotlight

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Tianhao Wu, Weizhe Yuan, Olga Golovneva, and 5 more authors

arXiv preprint arXiv:2407.19594, 2024

EMNLP 2025

From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

Tianle Li, Wei-Lin Chiang, Evan Frick, and 5 more authors

arXiv preprint arXiv:2406.11939, 2024

ICML 2025

RouteLLM: Learning to Route LLMs with Preference Data

Isaac Ong, Amjad Almahairi, Vincent Wu, and 5 more authors

arXiv preprint arXiv:2406.18665, 2024

ICLR 2025

2023

Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF

Banghua Zhu, Evan Frick, Tianhao Wu, and 2 more authors

Nov 2023

COLM 2024

Statistical Inference on Multi-armed Bandits with Delayed Feedback

Lei Shi, Jingshen Wang, and Tianhao Wu

Nov 2023

Preprint

A Reduction-based Framework for Sequential Decision Making with Delayed Feedback

Yunchang Yang, Han Zhong, Tianhao Wu, and 3 more authors

arXiv preprint arXiv:2302.01477, Nov 2023

NeurIPS 2023

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

Tianhao Wu, Banghua Zhu, Ruoyu Zhang, and 3 more authors

Nov 2023

COLM 2024

2022

Nearly Optimal Policy Optimization with Stable at Any Time Guarantee

Tianhao Wu, Yunchang Yang, Han Zhong, and 3 more authors

In International Conference on Machine Learning, Nov 2022

ICML 2022

2021

On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP

Tianhao Wu, Yunchang Yang, Simon Du, and 1 more author

In International Conference on Machine Learning, Nov 2021

ICML 2021

A Unified Framework for Conservative Exploration

Yunchang Yang, Tianhao Wu, Han Zhong, and 5 more authors

arXiv preprint arXiv:2106.11692, Nov 2021

ICLR 2021

2020

Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot

Jingtong Su, Yihang Chen, Tianle Cai, and 4 more authors

Advances in Neural Information Processing Systems, Nov 2020

NeurIPS 2020