Building a LLM Testing API

Published on ● Video Link: https://www.youtube.com/watch?v=4vG18daD5ao



Duration: 18:55
1,216 views
13


Check out my essays: https://aisc.substack.com/
OR book me to talk: https://calendly.com/amirfzpr
OR subscribe to our event calendar: https://lu.ma/aisc-llm-school
OR sign up for our LLM course: https://maven.com/aggregate-intellect/llm-systems

⃝ Challenges of testing Conversational AI systems:

🟒 There's no single agreed-upon approach for unit testing or regression testing in the world of chatbots.
🟒 Traditional metrics (accuracy, precision, recall) might not capture user-facing issues like prompt leakage or language drift.
🟒 Annotating data for testing is expensive and time-consuming.

⃝ Framework for automated testing:

🟒 Leverages generative models to automatically create question-answer pairs for testing a knowledge base system.
🟒 Users can define prompts and the system generates questions and checks the responses for accuracy against the knowledge base.
🟒 The framework can be used for integration testing as well as evaluating responses from large language models.

⃝ Key Learnings:

🟒 Most users still test chatbots manually.
🟒 It's important to focus on testing that reflects real-world use cases and business goals.
🟒 Start with a Minimum Viable Product for testing internally and iterate based on user feedback.
🟒 Consider a human-in-the-loop approach for data annotation where humans curate outputs from generative models.

⃝ Open questions and future directions:

🟒 How to effectively incorporate human feedback into the testing process, considering factors like cultural norms and brand voice.
🟒 How to balance the trade-offs between different large language models (e.g., conversational fluency vs. factual accuracy).







Tags:
deep learning
machine learning