Can Sherpa (Multi-Agent LLM) Handle Multimodality?

Video Link: https://www.youtube.com/watch?v=Kau7Gk2Olqo



Duration: 1:58


Check out my essays: https://aisc.substack.com/
OR book me to talk: https://calendly.com/amirfzpr
OR subscribe to our event calendar: https://lu.ma/aisc-llm-school

AF: Can Sherpa handle multimodality?

PC: Inside the Sherpa library, we implement different agent execution strategies. An execution strategy is modality-agnostic, and in that sense multimodal, because it doesn't care what kind of task you're handling.

In the demo, I was able to generate images, code, and so on. Those things happened through "actions". One example is the diagram generation tool, which produces a diagram from the intermediate representation.

You need to describe the data in a string format so that a large language model can handle it. We have some default actions, and most of them handle text. But you can create your own actions to deal with images, for example by wrapping a customized image-to-text model.
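A minimal sketch of what such a custom action could look like. This is not Sherpa's actual action interface: the `ImageToTextAction` class, its `execute` method, and the `fake_caption_model` stand-in are all hypothetical names chosen for illustration. The point is only that the action turns an image into a string, so the agent never has to touch the raw pixels.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImageToTextAction:
    """Hypothetical action that turns an image into a string an LLM can read.

    This is NOT Sherpa's real action interface, only an illustration.
    """

    name: str
    # Any captioning or OCR model can be plugged in here (assumed signature).
    caption_model: Callable[[bytes], str]

    def execute(self, image_bytes: bytes) -> str:
        # The agent only ever sees this string, never the raw image data.
        description = self.caption_model(image_bytes)
        return f"[image description] {description}"


def fake_caption_model(image_bytes: bytes) -> str:
    # Stand-in for a real image-to-text model; it just reports the size.
    return f"an image of {len(image_bytes)} bytes"


action = ImageToTextAction(name="describe_image", caption_model=fake_caption_model)
result = action.execute(b"\x89PNG fake data")
print(result)
```

Swapping `fake_caption_model` for a real model changes nothing on the agent side, since the contract is still "image in, string out".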

AF: That's an opinionated design choice we have made: Sherpa only handles task orchestration at the agent level, and all data-specific activities are delegated to the tools. That way we have a separation of responsibilities.

Let's say a user provides an image to Sherpa and says: extract this information and do this math. The agents in Sherpa know to call a specific image-to-text tool to get a text description of the image. They then send that text to the math tool, which writes Python code for whatever the operation is and gets the result back. The result goes to another tool that does the summarization. Maybe at the end, the agents call yet another tool, a text-to-speech tool, which reads the results out loud.
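The flow described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the tool functions are stubs, the `TOOLS` registry and `run_plan` helper are hypothetical and not part of Sherpa, and the plan is hard-coded, whereas in Sherpa the agents would decide which tool to call next.

```python
from typing import Any, Callable, Dict, List

Tool = Callable[[Any], Any]


def image_to_text(image: bytes) -> str:
    # Stub for an image-to-text model: pretend the image shows this expression.
    return "3 * (4 + 5)"


def math_tool(expression: str) -> str:
    # Stub for a tool that would write and run Python for the operation.
    # Only plain arithmetic characters are accepted before evaluating.
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        raise ValueError("unsupported expression")
    return str(eval(expression, {"__builtins__": {}}, {}))


def summarize(text: str) -> str:
    # Stub for a summarization tool.
    return f"The result is {text}."


def text_to_speech(text: str) -> bytes:
    # Stub for a speech model: returns UTF-8 bytes instead of real audio.
    return text.encode("utf-8")


# Hypothetical registry mapping tool names to callables.
TOOLS: Dict[str, Tool] = {
    "image_to_text": image_to_text,
    "math": math_tool,
    "summarize": summarize,
    "text_to_speech": text_to_speech,
}


def run_plan(plan: List[str], payload: Any, tools: Dict[str, Tool]) -> Any:
    # The orchestrator only routes data between tools; each tool
    # handles its own modality.
    for step in plan:
        payload = tools[step](payload)
    return payload


plan = ["image_to_text", "math", "summarize", "text_to_speech"]
audio = run_plan(plan, b"raw image bytes", TOOLS)
print(audio)
```

Note that `run_plan` never inspects the payload. Adding a new modality means registering a new tool, not changing the orchestration code.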

All of the modality handling is completely delegated to the tools, which makes the system very generalizable and scalable. You can focus entirely on building the right tools and delegate all the LLM handling to Sherpa, because it will take care of that. And you can create an army of specialized models and systems and make them available to the system through APIs.






