Summaries in System Evaluation

Thursday, 5th March 2009

Submission Type: Long Presentation

In batch evaluation of retrieval systems, performance is calculated by applying predetermined relevance judgements to the list of documents that a system returns for a query. This evaluation paradigm, however, ignores the way search systems currently operate, where users must view summaries of documents before reading the documents themselves.

We modify popular IR metrics such as MAP and P@10 to incorporate the summary-reading step of the search process, and investigate the effect on system rankings using TREC data. A pilot user study establishes likely levels of disagreement between relevance judgements of summaries and of documents, and these values are used to seed simulations of summary relevance in the TREC data. Re-evaluating the runs submitted to the TREC Web Track, we find that the average correlation between the resulting system rankings and the original TREC rankings is 0.8, which is lower than the level commonly accepted for system orderings to be considered equivalent.
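To make the idea concrete, the following is a minimal Python sketch of one way a summary-reading step could be folded into P@10. It is an illustration only, not the formulation used in the talk: the rule that a document counts only when both its summary and the document itself are judged relevant, the function names, and the two flip probabilities `p_miss` and `p_false_alarm` are all assumptions introduced here.

```python
import random


def precision_at_k(doc_rel, k=10):
    """Standard P@k: fraction of the top k documents judged relevant."""
    return sum(doc_rel[:k]) / k


def summary_aware_precision_at_k(doc_rel, summary_rel, k=10):
    """Assumed variant: a document contributes only if its summary was
    also judged relevant, i.e. the user would have clicked through."""
    hits = sum(1 for d, s in zip(doc_rel[:k], summary_rel[:k]) if d and s)
    return hits / k


def simulate_summary_judgements(doc_rel, p_miss, p_false_alarm, rng):
    """Derive simulated summary judgements from document judgements.

    p_miss:        assumed chance that a relevant document's summary
                   is judged not relevant.
    p_false_alarm: assumed chance that a non-relevant document's
                   summary is judged relevant.
    """
    simulated = []
    for d in doc_rel:
        if d:
            simulated.append(0 if rng.random() < p_miss else 1)
        else:
            simulated.append(1 if rng.random() < p_false_alarm else 0)
    return simulated


# Binary judgements for the top 10 documents of one hypothetical run.
doc_rel = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
summary_rel = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print(precision_at_k(doc_rel))                             # 0.5
print(summary_aware_precision_at_k(doc_rel, summary_rel))  # 0.3

# Placeholder disagreement rates, not values from the pilot study.
rng = random.Random(0)
simulated = simulate_summary_judgements(doc_rel, 0.2, 0.1, rng)
print(summary_aware_precision_at_k(doc_rel, simulated))
```

Scoring every run with both the standard and the summary-aware metric yields two system orderings, which can then be compared with a rank correlation measure of the kind reported above.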

Abstract: Summaries in System Evaluation slides

Authors: Falk Scholer

Event: Fourth HCSNet Next-Generation Search Technology Workshop (NGS09)
