Train Your Own LLM – Tutorial

Published on 2025-04-10 ● Video Link: https://www.youtube.com/watch?v=9Ge0sMm65jo





This course is designed to help beginners learn how to train a language model from start to finish. Imad will guide you through the whole process, using Moroccan Darija as an example.

In this course, you will learn:

How to load text data
How to train a tokenizer from scratch using the Byte Pair Encoding (BPE) method
How to use the tokenizer to encode text data
How the Transformer architecture works in language models
How to pre-train a model
How to create a supervised fine-tuning dataset
How to fine-tune the model and build an AI assistant that you can chat with
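
To make the tokenizer step more concrete, here is a minimal sketch of how Byte Pair Encoding learns merge rules. It is not the course's code: the function names (train_bpe, merge_pair, encode) are hypothetical, it works on characters rather than raw bytes, and it treats whitespace-separated words as the unit of training. The course notebook in the repository below is the place to look for the real implementation.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules from `text` (toy, character-level)."""
    # Each whitespace-separated word starts as a tuple of single characters.
    word_counts = Counter(tuple(word) for word in text.split())
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, count in word_counts.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        # Pick the most frequent pair (ties go to the pair seen first).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word with the chosen pair merged into one symbol.
        new_counts = Counter()
        for symbols, count in word_counts.items():
            new_counts[merge_pair(symbols, best)] += count
        word_counts = new_counts
    return merges


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    rules = train_bpe("low low low lower lowest", num_merges=2)
    print(rules)
    print(encode("lower", rules))
```

A production tokenizer would also map each resulting symbol to an integer ID and handle special tokens; this sketch stops at the merge rules to keep the core idea visible.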

You can find the slides, notebook, and scripts in this GitHub repository:
https://github.com/ImadSaddik/Train_Your_Language_Model_Course

The supervised fine-tuning dataset is available here:
https://github.com/ImadSaddik/BoDmaghDataset
https://huggingface.co/datasets/ImadSaddik/BoDmaghDataset

The tokenizers trained on AtlaSet can be found here:
https://github.com/ImadSaddik/DarijaTokenizers

You can access the AtlaSet on HuggingFace here:
https://huggingface.co/datasets/atlasia/Atlaset

To connect with Imad Saddik, check out his social accounts:
LinkedIn: https://www.linkedin.com/in/imadsaddik/
YouTube: @3codecampers
Discord: imad_saddik

❤️ Support for this channel comes from our friends at Scrimba – the coding platform that's reinvented interactive learning: https://scrimba.com/freecodecamp

⭐️ Course Contents ⭐️
(0:00:00) About the Course
(0:03:03) Introduction
(0:07:24) Training Data
(0:15:33) Tokenization
(0:29:00) The Transformer Architecture
(0:52:21) Pre-training
(1:24:46) Fine-tuning Dataset
(1:33:05) Instruction Fine-tuning
(2:06:17) Fine-tuning with LoRA
(2:20:39) Let's Scale Everything
(3:09:40) Bonus
(3:27:10) Conclusion

🎉 Thanks to our Champion and Sponsor supporters:
👾 Drake Milly
👾 Ulises Moralez
👾 Goddard Tan
👾 David MG
👾 Matthew Springman
👾 Claudio
👾 Oscar R.
👾 jedi-or-sith
👾 Nattira Maneerat
👾 Justin Hual

--

Learn to code for free and get a developer job: https://www.freecodecamp.org

Read hundreds of articles on programming: https://freecodecamp.org/news



