Empirical - Open Source LLM Evaluation UI

Video Link: https://www.youtube.com/watch?v=oXPh0MJv0UM



Duration: 44:38


Had a great conversation with Empirical's CEO, Arjun Attam today.

He has built a great open source tool that lets anyone run evaluations across any LLM, dataset and workflow. All you have to do is specify the LLM prompt or Python script in a .json configuration file, along with the input/output dataset you want to evaluate on.

Essentially, Empirical's business model is to provide value to a general audience through the open source tool, and then to consult for customers who want help integrating LLMs into their workflows in an optimised way :)

Super easy to use too. Check out their GitHub for more information:
https://github.com/empirical-run/empirical
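
As a rough illustration of the idea of putting a prompt and a dataset into one .json file, here is a minimal Python sketch. It is not Empirical's actual configuration format: the key names ("runs", "dataset", "samples", "inputs"), the {{message}} placeholder style and the helper functions build_config and is_valid_json are all hypothetical, chosen only to show the shape of the idea. Refer to the Empirical repo for the real schema.

```python
import json


def build_config(prompt, model, samples):
    """Bundle a prompt template, a model name and a dataset into one dict.

    All key names here are hypothetical, for illustration only;
    they are not taken from Empirical's real configuration schema.
    """
    return {
        "runs": [{"model": model, "prompt": prompt}],
        "dataset": {"samples": samples},
    }


def is_valid_json(output):
    """Return True if a model output string parses as JSON."""
    try:
        json.loads(output)
        return True
    except (json.JSONDecodeError, TypeError):
        return False


# Hypothetical example: one model run and a one-sample dataset.
config = build_config(
    prompt="Extract the name and age from this message as JSON: {{message}}",
    model="gpt-3.5-turbo",
    samples=[{"inputs": {"message": "Hi, I am Alex and I am 30 years old."}}],
)

# Serialise the config; this string is what would be saved to a .json file.
config_text = json.dumps(config, indent=2)
print(config_text)
```

The is_valid_json helper stands in for the kind of check a JSON-parsing evaluation might apply to each model output; a real evaluation would typically compare the parsed fields against expected values as well.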

As a side note, we share the same goal of helping others and making sure value is brought to the table first, before even thinking of compensation. That is also the reason why I started this YouTube channel: to share knowledge and encourage discussion. I have enjoyed the journey from the very beginning :)

~~

Empirical Repo: https://github.com/empirical-run/empirical

My projects that are mentioned:
StrictJSON Repo: https://github.com/tanchongmin/strictjson
TaskGen Repo: https://github.com/simbianai/taskgen

~~

0:00 Introduction
1:03 Empirical Demo to evaluate LLM parsing JSON
6:03 empiricalrc.json configuration
17:16 How to use Empirical CLI
19:11 Results of gpt-3.5-turbo vs Llama 3 for JSON parsing (using StrictJSON for Llama 3)
20:54 Evaluating LLM output via Empirical UI
25:50 How to use Empirical for your workflow
28:56 Why Open Source?
31:40 How does Empirical Monetise?
35:08 Empirical’s Target Customers
38:36 Arjun’s Life Motivation - Empowering People via Technology
43:38 Concluding Remarks

~~

AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. Also an avid game creator.

Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin




Other Videos By John Tan Chong Min


2024-08-21 AriGraph (Part 2) - Knowledge Graph Construction and Retrieval Details
2024-08-13 alphaXiv - Share Ideas, Build Collective Understanding, Interact with ANY open sourced paper authors
2024-07-30 AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
2024-07-23 NeoPlanner - Continually Learning Planning Agent for Large Environments guided by LLMs
2024-07-17 Intelligence = Sampling + Filtering
2024-07-12 Michael Hodel: Reverse Engineering the Abstraction and Reasoning Corpus
2024-07-02 TaskGen Conversational Class v2: JARVIS, Psychology Counsellor, Sherlock Holmes Shop Assistant
2024-06-04 CodeAct: Code As Action Space of LLM Agents - Pros and Cons
2024-05-28 TaskGen Conversation with Dynamic Memory - Math Quizbot, Escape Room Solver, Psychology Counsellor
2024-05-21 Integrate ANY Python Function, CodeGen, CrewAI tool, LangChain tool with TaskGen! - v2.3.0
2024-05-11 Empirical - Open Source LLM Evaluation UI
2024-05-07 TaskGen Ask Me Anything #1
2024-04-29 StrictJSON (LLM Output Parser) Ask Me Anything #1
2024-04-22 Tutorial #14: Write latex papers with LLMs such as Llama 3!
2024-04-16 SORA Deep Dive: Predict patches from text, images or video
2024-04-09 OpenAI CLIP Embeddings: Walkthrough + Insights
2024-03-26 TaskGen - LLM Agentic Framework that Does More, Talks Less: Shared Variables, Memory, Global Context
2024-03-18 CRADLE (Part 2): An AI that can play Red Dead Redemption 2. Reflection, Memory, Task-based Planning
2024-03-11 CRADLE (Part 1) - AI that plays Red Dead Redemption 2. Towards General Computer Control and AGI
2024-03-05 TaskGen - A Task-based Agentic Framework using StrictJSON at the core
2024-02-27 SymbolicAI / ExtensityAI Paper Overview (Part 2) - Evaluation Benchmark Discussion!