We released Starling-7B, an open-source large language model leveraging RLAIF
I’m thrilled to share that I am a co-first author of Starling-7B (click to view our blog). This open-source large language model (LLM) is trained with Reinforcement Learning from AI Feedback (RLAIF): it builds on Nectar, our GPT-4-labeled ranking dataset, together with our reward-model training and policy-tuning pipeline.
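At the heart of the RLAIF recipe is fitting a reward model to AI-labeled preference pairs (e.g., GPT-4 rankings like those in Nectar), then tuning the policy against that reward. A minimal sketch of the pairwise (Bradley–Terry) reward-modeling objective, using toy synthetic features as a stand-in for real LLM outputs (all names and data here are illustrative, not the actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy features for preference pairs: each row pairs a "chosen" response
# with a "rejected" one, as ranked by an AI judge. Purely synthetic here.
chosen = rng.normal(0.5, 1.0, size=(64, 8))
rejected = rng.normal(-0.5, 1.0, size=(64, 8))

w = np.zeros(8)  # linear reward model: reward(x) = w @ x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Bradley-Terry objective: minimize -log sigmoid(r_chosen - r_rejected)
# so the reward model assigns higher scores to preferred responses.
for _ in range(200):
    diff = chosen - rejected
    p = sigmoid(diff @ w)                      # P(chosen preferred)
    grad = -((1 - p)[:, None] * diff).mean(axis=0)
    w -= 0.5 * grad                            # gradient descent step

# After training, the reward model should rank chosen responses higher.
acc = float((chosen @ w > rejected @ w).mean())
print(f"pairwise accuracy: {acc:.2f}")
```

In the full pipeline the reward model is itself an LLM, and its scores drive a separate policy-optimization stage; this sketch only shows the preference-fitting objective.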
Starling-7B-alpha scores 8.09 on MT-Bench with GPT-4 as a judge, outperforming every model to date on MT-Bench except OpenAI’s GPT-4 and GPT-4 Turbo.