Evaluating Retrieval System Effectiveness

Subscribers: 344,000
Published on: 2016-09-05
Video Link: https://www.youtube.com/watch?v=Tw4guy9X8U0
Duration: 1:11:30
Views: 1,941


One of the primary motivations for the Text REtrieval Conference (TREC) was to standardize retrieval system evaluation. While the Cranfield paradigm of using test collections to compare system output had been introduced decades before the start of TREC, the particulars of how it was implemented differed across researchers, making evaluation results incomparable. The validity of test collections as a research tool was in question, not only from those who objected to the reliance on relevance judgments, but also from those who were concerned about whether the approach could scale. With the notable exception of Sparck Jones and van Rijsbergen's report on the need for larger, better test collections, there was little explicit discussion of what constituted a minimally acceptable experimental design, and no hard evidence to support any position.

TREC has succeeded in standardizing and validating the use of test collections as a retrieval research tool. The repository of runs over common collections that have been submitted to TREC has enabled empirical determination of the confidence that can be placed in a conclusion that one system is better than another under a given experimental design. In particular, the reliability of such a conclusion has been shown to depend critically on both the evaluation measure and the number of questions used in the experiment.

This talk summarizes the results of two more recent investigations based on the TREC data: the definition of a new measure, and evaluation methodologies that look beyond average effectiveness. The new measure, named bpref for binary preferences, is as stable as existing measures but is much more robust in the face of incomplete relevance judgments, so it can be used in environments where complete judgments are not possible. Using average effectiveness scores hampers failure analysis because the averages hide an enormous amount of variance, yet more focused evaluations are unstable precisely because of that variation.
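The abstract refers to bpref but does not define it. As a rough illustration only, the sketch below computes one commonly cited formulation of bpref (Buckley and Voorhees, SIGIR 2004): for each judged relevant document retrieved, the score is reduced in proportion to the number of judged nonrelevant documents ranked above it, and unjudged documents are ignored entirely, which is what makes the measure robust to incomplete judgments. The function name, the judgment format, and the toy data are illustrative assumptions, not anything taken from the talk.

def bpref(ranked_docs, judgments):
    """Binary preference (bpref) for a single topic.

    ranked_docs: list of document ids in system-ranked order.
    judgments:   dict mapping document id -> True (judged relevant) or
                 False (judged nonrelevant). Documents absent from the
                 dict are unjudged and simply skipped.
    """
    R = sum(1 for rel in judgments.values() if rel)        # judged relevant
    N = sum(1 for rel in judgments.values() if not rel)    # judged nonrelevant
    if R == 0:
        return 0.0
    denom = min(R, N)

    score = 0.0
    nonrel_above = 0   # judged nonrelevant docs seen so far in the ranking
    for doc in ranked_docs:
        if doc not in judgments:
            continue                         # unjudged: no reward, no penalty
        if judgments[doc]:
            if denom > 0:
                # Penalize by the fraction of the first R judged nonrelevant
                # documents that were ranked above this relevant document.
                score += 1.0 - min(nonrel_above, R) / denom
            else:
                score += 1.0                 # no judged nonrelevant docs at all
        else:
            nonrel_above += 1
    return score / R


# Toy check: one unjudged document ("d4") sits in the middle of the ranking.
ranking = ["d1", "d2", "d4", "d3", "d5"]
qrels = {"d1": True, "d2": False, "d3": True, "d5": False}   # d4 is unjudged
print(bpref(ranking, qrels))   # 0.75: d1 incurs no penalty, d3 incurs 1/2

In this sketch, relevant documents that are never retrieved still count in R but add nothing to the sum, so they pull the score down. Published variants of bpref differ slightly, for example in whether the denominator is R or min(R, N); the version above uses min(R, N).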




Other Videos By Microsoft Research


2016-09-05  Folklore of Network Protocol Design (Anita Borg Lecture)
2016-09-05  Toolkit for Construction and Maintenance of Extensible Proof Search Tactics
2016-09-05  ME++
2016-09-05  Structural Comparison of Executable Objects
2016-09-05  Indifference is Death: Responsibility, Leadership, & Innovation
2016-09-05  TQFTs and tight contact structures on 3-manifolds
2016-09-05  Wireless Embedded Networks/The Ecosystem and Cool Challenges
2016-09-05  Data Mining & Machine Learning to empower business strategy
2016-09-05  Some uses of orthogonal polynomials
2016-09-05  Approximation Algorithms for Embedding with Extra Information and Ordinal Relaxation
2016-09-05  Evaluating Retrieval System Effectiveness
2016-09-05  Exploiting the Transients of Adaptation for RoQ Attacks on Internet Resources
2016-09-05  Specification-Based Annotation Inference
2016-09-05  Emotion Recognition in Speech Signal: Experimental Study, Development and Applications
2016-09-05  Text summarization: News and Beyond
2016-09-05  Data Streaming Algorithms for Efficient and Accurate Estimation of Flow Size Distribution
2016-09-05  Learning and Inferring Transportation Routines
2016-09-05  Raising the Bar: Integrity and Passion in Life and Business: The Story of Clif Bar, Inc.
2016-09-05  Revelationary Computing, Proactive Displays and The Experience UbiComp Project
2016-09-05  The Design of A Formal Property-Specification Language
2016-09-05  Data Harvesting: A Random Coding Approach to Rapid Dissemination and Efficient Storage of Data



Tags:
microsoft research