Friday, February 14, 2014

Week 6 Reading Notes: Evaluation

Information retrieval has developed as a highly empirical discipline, requiring careful and thorough evaluation to demonstrate the superior performance of novel techniques on representative document collections.

1.       Test collection:
A document collection;
A test suite of information needs, expressible as queries;
A set of relevance judgments, standardly a binary assessment of either relevant or nonrelevant for each query-document pair.
Standard test collections: Cranfield, Text Retrieval Conference (TREC)
2.       Evaluation of unranked retrieval sets
(1)     Precision (P) is the fraction of retrieved documents that are relevant
(2)     Recall (R) is the fraction of relevant documents that are retrieved
(3)     Accuracy is the fraction of its classifications that are correct
(4)     F measure, which is the weighted harmonic mean of precision and recall
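The unranked set measures above can be sketched in a few lines; the document IDs and sets below are made up for illustration:

```python
# Sketch of precision, recall, and the F measure for unranked sets.
# The retrieved/relevant sets are hypothetical examples.

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of relevant documents that are retrieved."""
    return len(retrieved & relevant) / len(relevant)

def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    return (1 + beta**2) * p * r / (beta**2 * p + r)

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d3", "d5"}

p = precision(retrieved, relevant)   # 2 of 4 retrieved are relevant -> 0.5
r = recall(retrieved, relevant)      # 2 of 3 relevant are retrieved -> 2/3
f1 = f_measure(p, r)                 # harmonic mean -> 4/7
```

With beta > 1 the F measure weights recall more heavily; with beta < 1 it favors precision.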
3.       Evaluation of ranked retrieval sets
Precision-recall curve; interpolated precision: the interpolated precision at a recall level r is the highest precision found at any recall level greater than or equal to r, which removes the sawtooth jiggles from the raw curve
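Interpolated precision can be sketched directly from a ranked result list; the relevance labels below are illustrative:

```python
# Sketch of precision-recall points and interpolated precision
# for a ranked result list. The relevance labels are made up.

def pr_points(rels, n_relevant):
    """(recall, precision) pairs at each rank of a ranked list."""
    points, hits = [], 0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
        points.append((hits / n_relevant, hits / rank))
    return points

def interpolated_precision(points, r):
    """Highest precision at any recall level >= r."""
    return max((p for rec, p in points if rec >= r), default=0.0)

rels = [1, 0, 1, 0, 0, 1]            # relevance of ranks 1..6
points = pr_points(rels, n_relevant=3)
ip = interpolated_precision(points, 0.5)  # best precision at recall >= 0.5
```

Averaging the interpolated precision at fixed recall levels (e.g., 0.0, 0.1, ..., 1.0) gives the classic eleven-point interpolated average precision.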
4.       Assessing relevance
Pooling: where relevance is assessed over a subset of the collection that is formed from the top k documents returned by a number of different IR systems
marginal relevance: whether a document still has distinctive usefulness after the user has looked at certain other documents
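The pooling step described above amounts to taking the union of the top-k results from each participating system; a minimal sketch, with hypothetical run names and rankings:

```python
# Sketch of pooling: only documents in the pool receive human relevance
# judgments; documents outside the pool are assumed nonrelevant.
# The system names and ranked lists are hypothetical.

def build_pool(runs, k):
    """Union of the top-k documents from each system's ranked run."""
    pool = set()
    for ranked in runs.values():
        pool.update(ranked[:k])
    return pool

runs = {
    "systemA": ["d1", "d2", "d3", "d4"],
    "systemB": ["d2", "d5", "d1", "d6"],
}
pool = build_pool(runs, k=3)  # {"d1", "d2", "d3", "d5"}
```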
5.       System quality and user utility
(1)     User utility: a way of quantifying aggregate user happiness, based on the relevance, speed, and user interface of a system

(2)     Refining a deployed system: A/B test, in which a small fraction of traffic is diverted to a variant system and an automatic measure such as clickthrough is compared against the current system
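An A/B test readout can be sketched as a comparison of a simple engagement metric between the two buckets; the click and impression counts below are made up:

```python
# Sketch of comparing two A/B test buckets by clickthrough rate.
# Real deployments would also run a significance test; the numbers
# here are hypothetical.

def clickthrough_rate(clicks, impressions):
    """Fraction of result impressions that received a click."""
    return clicks / impressions

control = clickthrough_rate(clicks=420, impressions=10000)  # 0.042
variant = clickthrough_rate(clicks=465, impressions=10000)  # 0.0465

variant_wins = variant > control
```

In practice the observed difference would be checked for statistical significance (e.g., a two-proportion test) before rolling the variant out to all traffic.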
