We released Starling-7B, an open-source large language model leveraging RLAIF
I’m thrilled to share that I am a co-first author of Starling-7B (click to view our blog). This open-source large language model (LLM) is trained with Reinforcement Learning from AI Feedback (RLAIF): it builds on Nectar, our GPT-4-labeled ranking dataset, together with our reward-model training and policy-tuning pipeline.
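At the heart of the RLAIF recipe is fitting a reward model to AI-labeled preference pairs (e.g., GPT-4 rankings like those in Nectar), then tuning the policy against that reward. A minimal sketch of the pairwise (Bradley–Terry) reward-modeling objective, using toy synthetic features as a stand-in for real LLM outputs (all names and data here are illustrative, not the actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy features for preference pairs: each row pairs a "chosen" response
# with a "rejected" one, as ranked by an AI judge. Purely synthetic here.
chosen = rng.normal(0.5, 1.0, size=(64, 8))
rejected = rng.normal(-0.5, 1.0, size=(64, 8))

w = np.zeros(8)  # linear reward model: reward(x) = w @ x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Bradley-Terry objective: minimize -log sigmoid(r_chosen - r_rejected)
# so the reward model assigns higher scores to preferred responses.
for _ in range(200):
    diff = chosen - rejected
    p = sigmoid(diff @ w)                      # P(chosen preferred)
    grad = -((1 - p)[:, None] * diff).mean(axis=0)
    w -= 0.5 * grad                            # gradient descent step

# After training, the reward model should rank chosen responses higher.
acc = float((chosen @ w > rejected @ w).mean())
print(f"pairwise accuracy: {acc:.2f}")
```

In the full pipeline the reward model is itself an LLM, and its scores drive a separate policy-optimization stage; this sketch only shows the preference-fitting objective.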
Starling-7B-alpha scores 8.09 on MT-Bench with GPT-4 as a judge, outperforming every model to date on MT-Bench except OpenAI’s GPT-4 and GPT-4 Turbo.