publications by categories in reversed chronological order. generated by jekyll-scholar.


  1. Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF
    Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, and 1 more author
    Nov 2023
  2. Statistical Inference on Multi-armed Bandits with Delayed Feedback
    Lei Shi, Jingshen Wang, and Tianhao Wu
    Nov 2023
  3. A Reduction-based Framework for Sequential Decision Making with Delayed Feedback
    Yunchang Yang, Han Zhong, Tianhao Wu, Bin Liu, and 2 more authors
    arXiv preprint arXiv:2302.01477, Nov 2023
  4. Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
    Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, and 2 more authors
    Nov 2023


  1. Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
    Tianhao Wu, Yunchang Yang, Han Zhong, Liwei Wang, and 2 more authors
    In International Conference on Machine Learning, Nov 2022


  1. On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP
    Tianhao Wu, Yunchang Yang, Simon Du, and Liwei Wang
    In International Conference on Machine Learning, Nov 2021
  2. A Unified Framework for Conservative Exploration
    Yunchang Yang, Tianhao Wu, Han Zhong, Evrard Garcelon, and 4 more authors
    arXiv preprint arXiv:2106.11692, Nov 2021


  1. Sanity-Checking Pruning Methods: Random Tickets Can Win the Jackpot
    Jingtong Su, Yihang Chen, Tianle Cai, Tianhao Wu, and 3 more authors
    Advances in Neural Information Processing Systems, Nov 2020