How Do You Choose Between Training, Fine-Tuning, and Using Small Models?

Video Link: https://www.youtube.com/watch?v=B_DVpDUq1jo



Duration: 2:53


AF: You talked about a few different approaches: domain adaptation, combining a few different models, and DeepSpeed for training your own model. In what scenarios would I use each of them?

AD: First, there's the question of combining a small language model and a large language model without doing any fine-tuning. Since it's cheap and easy to do, as long as you're experienced with the interfacing, where the output of the small model becomes the input of the LLM, it makes sense to do it as a baseline. It's cheaper and faster, but there is the complexity of the interfacing.
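
To make that interfacing concrete, here is a minimal sketch of the baseline pattern using Hugging Face pipelines: a small classifier labels the input, and its label is stitched into the prompt of a larger instruction-tuned model. The model names, prompt template, and `answer` helper are illustrative assumptions, not something prescribed in the conversation.

```python
# Sketch: the small model's output becomes part of the LLM's input.
# Model names and the prompt template are illustrative placeholders.
from transformers import pipeline

# Small model: cheap to run (or re-train) many times.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Larger instruction-tuned model, used as-is with no fine-tuning
# (a small open model here stands in for a bigger LLM).
generator = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

def answer(ticket_text: str) -> str:
    # The classifier's label is injected into the LLM prompt.
    label = classifier(ticket_text)[0]["label"]
    prompt = (
        f"The following customer message was classified as {label}.\n"
        f"Message: {ticket_text}\n"
        "Draft a short, appropriate reply:"
    )
    return generator(prompt, max_new_tokens=128)[0]["generated_text"]
```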

In any case where using an API doesn't make sense, you have to at least fine-tune and then serve your own model. You don't want inference to become less performant in terms of the round trip.

DeepSpeed for distributed training is great. If you can't get a cluster of eight H100 GPUs, then DeepSpeed with 32 NVIDIA T4 or V100 GPUs, which are cheaper and more available, would enable you to do fine-tuning as well as serving. You just add more GPUs and you gain that distributed training capability. It reduces the training time as well, which matters if your dataset is too big.
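
As a rough sketch of what that looks like with the Hugging Face Trainer, a DeepSpeed ZeRO config can be passed straight to TrainingArguments and the job launched with the deepspeed launcher across the cheaper GPUs. The model name, toy dataset, and ZeRO settings below are placeholder assumptions, not a tuned configuration.

```python
# Sketch: fine-tuning through the Hugging Face Trainer with a DeepSpeed ZeRO-3
# config, so optimizer state, gradients, and parameters are sharded across many
# cheaper GPUs (e.g. T4/V100) instead of one 8xH100 node. Placeholders throughout.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "facebook/opt-1.3b"  # illustrative; swap in your own base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Stand-in corpus; replace with your real training data.
dataset = Dataset.from_dict({"text": ["example training text"]}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

ds_config = {
    "zero_optimization": {"stage": 3},         # shard states across all GPUs
    "fp16": {"enabled": "auto"},               # T4/V100 support fp16, not bf16
    "train_micro_batch_size_per_gpu": "auto",  # filled in from TrainingArguments
    "gradient_accumulation_steps": "auto",
    "train_batch_size": "auto",
}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    deepspeed=ds_config,                       # hand the ZeRO config to Trainer
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
# Launch across GPUs with the DeepSpeed launcher, e.g.:
#   deepspeed --num_gpus=32 train_sketch.py
```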

If you just want to experiment, don't bother with DeepSpeed. Try PEFT and QLoRA, quantization plus low-rank adaptation. That helps you fit large models onto one GPU. You gain memory efficiency, speedup, and scale. Hugging Face has all the code. Worst case, you have to write your own adapter, which is not too hard.
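
A minimal QLoRA-style sketch with the Hugging Face transformers, bitsandbytes, and peft integrations: the frozen base model is loaded in 4-bit and only small low-rank adapters are trained on top, which is what lets it fit on a single GPU. The model name, target modules, and LoRA hyperparameters are illustrative assumptions.

```python
# Sketch: QLoRA-style fine-tuning on a single GPU -- 4-bit quantized base model
# plus low-rank adapters trained on top. Names and hyperparameters are
# illustrative placeholders, not recommendations.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "facebook/opt-1.3b"  # illustrative base model

# Quantization: load the frozen base weights in 4-bit to fit on one GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)

# Low-rank adaptation: only these small adapter matrices get trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in OPT
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model
# From here, train with the usual Trainer loop as in the DeepSpeed sketch above.
```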

AF: If I want to summarize, it is less about where and more about at what stage you would use them. Early in the project, you should most probably focus on interfacing your LLM with other types of models, because training a BERT is way cheaper and easier, and you can repeat it a hundred times until you figure out the right way to do it. That's a very important freedom to have. There are complexities around interfacing, but it's completely worth it compared to alternatives like DeepSpeed.

Once you hit the ceiling of augmenting the large language model with other models, you will probably look into fine-tuning the large language model itself, say, if it needs to spit out a bunch of thought processes before doing something else. In that case, PEFT and LoRA are probably the right place to go.

Once you hit the ceiling of that, which means you have a very interesting, niche, and rare use case, that's where you would probably consider training the model from scratch.







Tags:
deep learning
machine learning