ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)

Subscribers: 284,000
Video Link: https://www.youtube.com/watch?v=52kMBrAI_IM
Duration: 33:25
24,468 views
657 likes


Paper: https://arxiv.org/abs/2403.07691

Abstract:
While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building on this foundation, we introduce a straightforward and innovative reference model-free monolithic odds ratio preference optimization algorithm, ORPO, eliminating the necessity for an additional preference alignment phase. We demonstrate, both empirically and theoretically, that the odds ratio is a sensible choice for contrasting favored and disfavored styles during SFT across diverse model sizes from 125M to 7B. Specifically, fine-tuning Phi-2 (2.7B), Llama-2 (7B), and Mistral (7B) with ORPO on UltraFeedback alone surpasses the performance of state-of-the-art language models with more than 7B and 13B parameters: achieving up to 12.20% on AlpacaEval2.0 (Figure 1), 66.19% on IFEval (instruction-level loose, Table 6), and 7.32 in MT-Bench (Figure 12). We release code and model checkpoints for Mistral-ORPO-α (7B) and Mistral-ORPO-β (7B).
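
For intuition, below is a minimal sketch of the ORPO objective the abstract describes: the usual SFT (negative log-likelihood) loss on the chosen response plus a small odds-ratio penalty that pushes the model's odds of the chosen response above those of the rejected one. This is an illustrative reconstruction from the paper's formulas, not the authors' released code; the function names and the lambda_orpo weight are assumptions.

# Sketch of the ORPO loss (https://arxiv.org/abs/2403.07691), PyTorch-style.
import torch
import torch.nn.functional as F

def avg_log_prob(logits, labels, mask):
    # Length-normalized log-likelihood of the response tokens,
    # i.e. log P_theta(y|x) averaged over response positions.
    logps = F.log_softmax(logits, dim=-1)
    token_logps = torch.gather(logps, -1, labels.unsqueeze(-1)).squeeze(-1)
    return (token_logps * mask).sum(-1) / mask.sum(-1)

def orpo_loss(chosen_logits, chosen_labels, chosen_mask,
              rejected_logits, rejected_labels, rejected_mask,
              lambda_orpo=0.1):  # weight of the odds-ratio term (illustrative value)
    logp_w = avg_log_prob(chosen_logits, chosen_labels, chosen_mask)
    logp_l = avg_log_prob(rejected_logits, rejected_labels, rejected_mask)

    # log odds(y|x) = log P - log(1 - P), with log(1 - P) via log1p(-exp(logP)).
    log_odds_w = logp_w - torch.log1p(-torch.exp(logp_w))
    log_odds_l = logp_l - torch.log1p(-torch.exp(logp_l))

    # Odds-ratio penalty: -log sigmoid(log OR), small when chosen odds dominate.
    ratio_loss = -F.logsigmoid(log_odds_w - log_odds_l)

    # Plain SFT loss on the chosen response plus the weighted odds-ratio penalty,
    # so no reference model and no separate alignment phase are needed.
    sft_loss = -logp_w
    return (sft_loss + lambda_orpo * ratio_loss).mean()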

Authors: Jiwoo Hong, Noah Lee, James Thorne

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n




Other Videos By Yannic Kilcher


2024-12-10 Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained)
2024-11-23 TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained)
2024-10-19 GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
2024-10-12 Were RNNs All We Needed? (Paper Explained)
2024-10-05 Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)
2024-08-04 Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained)
2024-07-08 Scalable MatMul-free Language Modeling (Paper Explained)
2024-06-26 Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools (Paper Explained)
2024-06-01 xLSTM: Extended Long Short-Term Memory
2024-05-21 [ML News] OpenAI is in hot waters (GPT-4o, Ilya Leaving, Scarlett Johansson legal action)
2024-05-01 ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)
2024-04-30 [ML News] Chips, Robots, and Models
2024-04-28 TransformerFAM: Feedback attention is working memory
2024-04-27 [ML News] Devin exposed | NeurIPS track for high school students
2024-04-24 Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
2024-04-23 [ML News] Llama 3 changes the game
2024-04-17 Hugging Face got hacked
2024-04-15 [ML News] Microsoft to spend 100 BILLION DOLLARS on supercomputer (& more industry news)
2024-04-13 [ML News] Jamba, CMD-R+, and other new models (yes, I know this is like a week behind 🙃)
2024-04-08 Flow Matching for Generative Modeling (Paper Explained)
2024-04-06 Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (Searchformer)



Tags:
deep learning
machine learning
arxiv
explained
neural networks
ai
artificial intelligence
paper