publications
2024
-  Thinking LLMs: General Instruction Following with Thought Generation. arXiv preprint arXiv:2410.10630, 2024
-  EmbedLLM: Learning Compact Representations of Large Language Models. arXiv preprint arXiv:2410.02223, 2024
-  Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge. arXiv preprint arXiv:2407.19594, 2024
-  From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline. arXiv preprint arXiv:2406.11939, 2024
2023
-  Statistical Inference on Multi-armed Bandits with Delayed Feedback. Nov 2023
-  A Reduction-based Framework for Sequential Decision Making with Delayed Feedback. arXiv preprint arXiv:2302.01477, Nov 2023
2022
-  Nearly optimal policy optimization with stable at any time guarantee. In International Conference on Machine Learning, Nov 2022
2021
-  On reinforcement learning with adversarial corruption and its application to block mdp. In International Conference on Machine Learning, Nov 2021
-  A unified framework for conservative exploration. arXiv preprint arXiv:2106.11692, Nov 2021
2020
-  Sanity-checking pruning methods: Random tickets can win the jackpot. Advances in neural information processing systems, Nov 2020