Automatic Evaluation of Dialogue Systems using LLMs

Video Link: https://www.youtube.com/watch?v=S9rkMqK3YNE



Duration: 18:56


Speaker: Benedicte Pierrejean

Summary
-------
Ada is a customer support company whose goal is to deliver an AI platform that lets businesses automatically resolve customer service conversations with minimal effort. Ada has expanded beyond chatbots to other channels such as social media, SMS, and email. Recently, the team has been working on the automatic evaluation of its LLM-based dialogue systems. Traditional reference-based metrics such as BLEU and ROUGE fall short for this purpose: they penalize creative but valid responses and do not evaluate conversations end to end.
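A minimal sketch (not Ada's code) of why n-gram overlap metrics like BLEU and ROUGE penalize valid paraphrases: a simple unigram-precision score, standing in for the full metrics, rates a perfectly good rewording poorly because it shares few exact tokens with the reference.

```python
def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also appear in the reference.
    A toy stand-in for BLEU/ROUGE-style exact-token overlap."""
    cand = candidate.lower().split()
    ref = set(reference.lower().split())
    if not cand:
        return 0.0
    return sum(tok in ref for tok in cand) / len(cand)

reference = "your refund has been processed and will arrive in five days"
paraphrase = "we have issued the refund and expect it within five business days"

# The paraphrase resolves the customer's question just as well, but an
# exact-overlap metric scores it low because the wording differs.
print(f"{unigram_precision(paraphrase, reference):.2f}")
```

An LLM-based support reply will rarely match a single gold reference word for word, which is one reason Ada moved away from these metrics toward conversation-level evaluation.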

Topics:
-------
⃝ Recent Developments
* Stanford's HELM benchmark and Chatbot Arena are recent developments in the field of evaluating dialogue systems.
* The focus of this work is on enabling users to achieve their goals and on evaluating the overall conversation between the LLM and the user.

⃝ The BotvBot Testing Framework
* This work proposes the BotvBot testing framework for evaluating dialogue systems
* The framework consists of an offline phase and an online phase
* In the offline phase, scenarios and simulated users are generated by combining different tasks (questions and actions), personalities, conversational styles, etc.
* In the online phase, the conversations between the simulated users and the target bot are run and evaluated
* Performance is evaluated by measuring the percentage of Automated Resolutions
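The two phases above can be sketched as a bot-vs-bot loop. This is an illustrative outline only: the function names, scenario attributes, and the stubbed "resolved" judgment are hypothetical, not Ada's actual framework.

```python
import itertools
import random

# Offline phase: build scenarios from the cross product of tasks,
# personalities, and conversational styles (illustrative values).
tasks = ["ask refund status", "update shipping address"]
personalities = ["patient", "frustrated"]
styles = ["terse", "verbose"]
scenarios = list(itertools.product(tasks, personalities, styles))

def simulated_user_reply(task, personality, style, history):
    """Stand-in for an LLM-driven simulated user; returns the next user turn."""
    return f"({personality}, {style}) I need help with: {task}"

def target_bot_reply(history):
    """Stand-in for the dialogue system under test."""
    return "Sure, I can help you with that."

def run_conversation(task, personality, style, max_turns=3):
    """Online phase: run one simulated-user / target-bot conversation,
    then judge whether the user's goal was met (stubbed randomly here)."""
    history = []
    for _ in range(max_turns):
        history.append(("user", simulated_user_reply(task, personality, style, history)))
        history.append(("bot", target_bot_reply(history)))
    return random.random() < 0.8  # placeholder for a real resolution judgment

results = [run_conversation(*scenario) for scenario in scenarios]
print(f"resolved {sum(results)}/{len(results)} conversations")
```

In a real system, the two stand-in reply functions would each call an LLM, and the resolution judgment would come from an evaluator rather than a coin flip.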

⃝ Automated Resolution Rate
* This measures the success rate of resolving customer inquiries without human involvement
* The automated resolution rate helps clients identify areas for improvement and can be applied to various customer support tasks







Tags:
deep learning
machine learning