Monday, January 13, 2014

Week 1 Reading Notes: Introduction

1. What Information Retrieval is and how it developed
"Information retrieval deals with the representation, storage, organization of, and access to information items such as documents, Web pages, online catalogs, structured and semi-structured records, multimedia objects. The representation and organization of the information items should be such as to provide the users with easy access to information of their interest."
IR was first used in libraries. Since the Web was invented in 1989, IR has been used heavily in web search. Other search applications include desktop and file-system search.

2. Architecture and Process of an IR System
The major components of an IR system are the information need, the query, the search engine, the index, the documents, and the results. The system query is the user query after it has been parsed and expanded. A system's performance can be evaluated in terms of efficiency and effectiveness: efficiency is measured in time and space, while effectiveness depends entirely on human judgment of relevance.
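To make these components concrete, here is a toy pipeline in Python. It is a minimal sketch under stated assumptions, not the reading's design: the synonym table used for query expansion, the Boolean-OR matching, and the function names (`build_index`, `system_query`, `search`, `precision_recall`) are all hypothetical. Efficiency would be measured on `search` (time) and on the index (space); effectiveness is approximated here by precision and recall against a hand-labelled set of relevant documents.

```python
import re
from collections import defaultdict

# Hypothetical synonym table used for query expansion (an assumption for illustration).
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids containing it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def system_query(user_query):
    """Parse the user query into terms, then expand it with synonyms."""
    terms = set(tokenize(user_query))
    for term in list(terms):
        terms |= SYNONYMS.get(term, set())
    return terms


def search(index, user_query):
    """Return ids of documents matching any term of the expanded query (Boolean OR)."""
    results = set()
    for term in system_query(user_query):
        results |= index.get(term, set())
    return results


def precision_recall(retrieved, relevant):
    """Effectiveness measures computed against human relevance judgments."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


docs = {
    1: "Buying a used car",
    2: "Automobile insurance rates",
    3: "Library catalog search",
}
index = build_index(docs)
found = search(index, "car prices")
print(sorted(found))                          # [1, 2]
print(precision_recall(found, relevant={1}))  # (0.5, 1.0)
```

Expansion lets document 2 match even though it never says "car", which raises recall but, in this example, costs precision; that trade-off is exactly what human relevance judgments are needed to assess.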
During processing, the search engine's ranking algorithm is of great importance. The purpose of ranking is to identify the documents that are most likely to be considered relevant by the user, and it constitutes the most critical part of the IR system. One way to improve ranking is to collect feedback from users; on the Web, the most abundant form of user feedback is clicks. Another way is to identify sites of high authority.
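The two ideas above, click feedback and authority, can be combined in a re-ranking step. The reading does not prescribe a formula, so treat this as an illustrative sketch: the linear blend, the weights `w_click` and `w_auth`, and the use of PageRank as the authority measure are all assumptions (PageRank is one well-known link-based authority score, swapped in here). The link graph and click counts in the example are invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page authority by power iteration over the link graph."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no out-links: spread its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


def rerank(base_scores, clicks, impressions, authority, w_click=0.3, w_auth=0.2):
    """Blend a base relevance score with click-through rate and authority."""
    def score(doc):
        shown = impressions.get(doc)
        ctr = clicks.get(doc, 0) / shown if shown else 0.0
        return base_scores[doc] + w_click * ctr + w_auth * authority.get(doc, 0.0)
    return sorted(base_scores, key=score, reverse=True)


links = {"a": ["b"], "b": ["c"], "c": ["b"]}  # hypothetical link graph
authority = pagerank(links)
base = {"a": 0.50, "b": 0.48, "c": 0.10}      # hypothetical base relevance scores
clicks = {"b": 40}
impressions = {"a": 100, "b": 100}
print(rerank(base, clicks, impressions, authority))  # ['b', 'a', 'c']
```

Here document `b` starts slightly behind `a` on base relevance but overtakes it because users click it often and other pages link to it.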

3. The Web
How the Web changes search:
Crawling: the inherently distributed nature of the Web requires collecting all documents and storing copies of them in a central repository before indexing.
A larger collection and a larger volume of user queries.
Predicting relevance accurately is harder.
Difficulty in identifying structured data associated with business objects (for example, in e-commerce).
Web spam must be filtered.
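The crawling step can be illustrated with a breadth-first crawl. This is a minimal sketch assuming an in-memory stand-in for the Web: `WEB` and `fetch` are hypothetical, and a real crawler would fetch pages over HTTP, parse HTML for links, and also handle politeness delays, robots.txt, and URL normalization, none of which are shown here.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
WEB = {
    "http://example.org/": ("home page", ["http://example.org/a", "http://example.org/b"]),
    "http://example.org/a": ("page a", ["http://example.org/b"]),
    "http://example.org/b": ("page b", ["http://example.org/"]),
}


def fetch(url):
    """Stand-in for an HTTP GET; returns (text, links) or None if unreachable."""
    return WEB.get(url)


def crawl(seeds, max_pages=100):
    """Breadth-first crawl that copies every fetched page into a central repository."""
    repository = {}          # URL -> stored copy of the page text, indexed later
    frontier = deque(seeds)  # URLs waiting to be fetched
    seen = set(seeds)        # URLs already queued, so none is fetched twice
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:
            continue
        text, links = page
        repository[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository


repo = crawl(["http://example.org/"])
print(sorted(repo))
# ['http://example.org/', 'http://example.org/a', 'http://example.org/b']
```

Only once `repository` is filled would the pages be handed to an indexer, which is why crawling has to precede indexing on the Web.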
