How do you improve your RAG pipeline?

Video Link: https://www.youtube.com/watch?v=MxTaJskT84E



Duration: 7:13


AF: Practically, if you have a bunch of modules that are non-deterministic in how they interact with each other, it can get out of hand and chaotic very quickly.

IY: I think eventually the term "modular RAG", or even RAG itself as a term, might change later on. What you're designing here is not just a model anymore, and not just a specific workflow anymore. What you're building here is an actual system.

Each of the modules can be seen as a primitive; the vector database itself can become a primitive, and you can create multiple primitives. There are so many possible primitives, and some of them may not even be generative.

There are many ways of designing these primitives, and I see two paradigms. Currently, a lot of people are just creating RAG within their stand-alone service. But if you start to work with other teams, like application developers and DevOps, you might want to start thinking about the larger picture in terms of your primitives. Sometimes, as scientists or engineers, you can expose these different primitives to application developers, and it's very interesting when you start seeing how other people design those orchestrations.
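The idea of exposing RAG stages as composable primitives that other teams can orchestrate could be sketched like this. This is an illustrative toy, not a real library: the names `Primitive`, `retrieve`, `rerank`, and `run_pipeline` are all assumptions, and the retriever and re-ranker are deliberately trivial stand-ins.

```python
from typing import Callable, Dict, List

# A "primitive" here is just a function that takes the shared pipeline
# state and returns an updated copy (hypothetical convention).
Primitive = Callable[[dict], dict]

def retrieve(state: dict) -> dict:
    # Toy retriever: keep documents that contain the query term.
    docs = [d for d in state["corpus"] if state["query"].lower() in d.lower()]
    return {**state, "docs": docs}

def rerank(state: dict) -> dict:
    # Toy re-ranker: shorter documents first (stand-in for a real scorer).
    return {**state, "docs": sorted(state["docs"], key=len)}

def run_pipeline(primitives: List[Primitive], state: dict) -> dict:
    # Application developers orchestrate by choosing and ordering primitives.
    for p in primitives:
        state = p(state)
    return state

state = run_pipeline(
    [retrieve, rerank],
    {
        "query": "vector",
        "corpus": [
            "Vector databases store embeddings.",
            "A vector index speeds search.",
            "LLMs generate text.",
        ],
    },
)
```

Because each stage shares one small interface, swapping a primitive (say, a different re-ranker) doesn't require touching the rest of the pipeline.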

AF: Nikhil spoke about post-processing of retrieved results in an advanced RAG setup: the re-ranking you can do, the summarization you can do. I'm seeing scenarios where the retrieved chunks have so much content, or they're very sparse, let's say there are a bunch of tables, to the point that the model gets confused when it is generating the end result. There's so much or so little, in a weird structure, that summarization is not useful and re-ranking is not useful. What else can be used to process the information after retrieval if that type of confusion is the problem the RAG system has?

SP: That's a scenario people run into quite often. It depends on what kind of queries are being issued. In RAG 101 scenarios, maybe those are FAQs where the retrieval results contain the direct responses to the question, and that's all okay. But as you go up the complexity ladder, the queries become more ambiguous, and the retrieved context becomes more ambiguous too.

I call this the "refine" stage of the RAG pipeline: you have the retrieved results and you refine them. "Refining" can include many things, such as summarization or creating notes. Summarization actually can help if it is targeted: not "hey, just generate a summary out of these retrieval results", but with more specific instructions. In the end, you want the result to convey certain information: whether the answer to the question is directly included in the retrieval results, whether they just contain additional context the LLM can use to generate the answer, or whether they do not contain anything relevant at all.

I think there needs to be an intermediate step where you have that spelled out using a smaller LLM and then you generate that refined text and that goes to your larger LLM.
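A targeted refine prompt of the kind described above might look like the sketch below. The prompt wording, the three-way classification, and the `build_refine_prompt` helper are all illustrative assumptions; the actual call to a smaller LLM is left out since it depends on your provider.

```python
# Hypothetical "refine" stage prompt: instead of "just summarize",
# ask a smaller LLM to say what the retrieved chunks actually contain
# before the larger LLM generates the final answer.
REFINE_PROMPT = """Given the question and the retrieved chunks below, state which holds:
(a) the chunks directly answer the question,
(b) the chunks only give supporting context,
(c) the chunks contain nothing relevant.
Then write a short note containing only the information relevant to the question.

Question: {question}

Chunks:
{chunks}
"""

def build_refine_prompt(question: str, chunks: list) -> str:
    # Number the chunks so the refine model can reference them.
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return REFINE_PROMPT.format(question=question, chunks=numbered)

prompt = build_refine_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Shipping takes 5 days."],
)
# `prompt` would be sent to the smaller LLM; its refined note then
# replaces the raw chunks in the context of the larger LLM.
```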

AF: Quickly with the last question, I have a RAG system, it doesn't work. How do I go about figuring out what's wrong? What are the pitfalls that I might face and what's the secret sauce of dealing with this problem?

NV: It's the million dollar question!

The key is evaluation. Whenever you have some version of a RAG setup, you need to know what works and what doesn't. If you have established that, then you have a clear, transparent view into what is and isn't working. Then you can easily figure out: do I need to make a component change? Do I need to use a different LLM, make some prompt changes, or do I need to fundamentally change a pattern of operation within the pipeline? That's really the first decision you need to make.

The trick actually is to be disciplined and not make changes haphazardly, because when you do that, you have no idea what works. There are so many different moving pieces. You can keep building on all of these versions of RAG over time, and you can always fall back to one of them later if you feel there is something new you can try. The secret sauce is evaluation.
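The disciplined-evaluation point could be made concrete with a tiny harness: score each pipeline version against a fixed eval set so changes are compared, not guessed at. Everything here is a minimal sketch under assumed names (`hit_rate`, `pipeline_v1`); the metric is a simple retrieval hit rate, and a real setup would add answer-quality metrics on top.

```python
def hit_rate(pipeline, eval_set):
    # Fraction of eval queries for which some retrieved document
    # contains the expected string (a deliberately crude metric).
    hits = 0
    for example in eval_set:
        retrieved = pipeline(example["query"])
        if any(example["expected"] in doc for doc in retrieved):
            hits += 1
    return hits / len(eval_set)

CORPUS = [
    "Paris is the capital of France.",
    "The Nile is in Africa.",
]

def pipeline_v1(query):
    # Version 1: naive keyword-overlap retrieval over the toy corpus.
    words = query.lower().split()
    return [d for d in CORPUS if any(w in d.lower() for w in words)]

eval_set = [
    {"query": "capital of France", "expected": "Paris"},
    {"query": "Where is the Nile?", "expected": "Nile"},
]

score = hit_rate(pipeline_v1, eval_set)
```

With a fixed eval set like this, a `pipeline_v2` can be scored with the same call, and the comparison tells you whether a component change actually helped before you commit to it.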

The pitfall is you might not be able to find the right pattern. There are fundamentally two different patterns that you could use. Both of them could be good. Or not. So which one do I invest in? At some point you need to take a leap of faith and see, okay, there are infinite patterns that I could use, but this is the one that I can move fast on, so I picked that. That's your greedy search strategy.

One of the things I explained in the talk is the reason RAG exploded: the barrier to entry was low and you didn't need to fine-tune. But if you're moving in a direction where those advantages really become kind of shackles on your process then it really defeats the purpose. So you want to focus on agility, evaluation, and quick decision making.







Tags:
deep learning
machine learning