Empirical - Open Source LLM Evaluation UI
Had a great conversation with Empirical's CEO, Arjun Attam, today.
He has built a great open source tool that lets anyone run evaluations across any LLM, dataset and workflow. All you have to do is put the LLM prompt or Python script into a .json file, along with whatever input/output dataset you are using for the evaluation.
Essentially, Empirical's business model is to provide value for a general audience, and then consult for customers to help them integrate LLMs into their workflows in an optimised fashion :)
Super easy to use too. Check out their GitHub for more information:
https://github.com/empirical-run/empirical
As a side note, we both share the same goals of helping others and making sure value is brought to the table first, before even thinking of compensation. That is also why I started this YouTube channel: to share knowledge and encourage discussion. I have enjoyed the journey from the very beginning :)
~~
Empirical Repo: https://github.com/empirical-run/empirical
My projects that are mentioned:
StrictJSON Repo: https://github.com/tanchongmin/strictjson
TaskGen Repo: https://github.com/simbianai/taskgen
~~
0:00 Introduction
1:03 Empirical Demo to evaluate LLM parsing JSON
6:03 empiricalrc.json configuration
17:16 How to use Empirical CLI
19:11 Results of gpt-3.5-turbo vs Llama 3 for JSON parsing (using StrictJSON for Llama 3)
20:54 Evaluating LLM output via Empirical UI
25:50 How to use Empirical for your workflow
28:56 Why Open Source?
31:40 How does Empirical Monetise?
35:08 Empirical’s Target Customers
38:36 Arjun’s Life Motivation - Empowering People via Technology
43:38 Concluding Remarks
~~
AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.
Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin