Information Retrieval

At the core of human success is the ability to make informed decisions. Information is the critical input to every decision-making process, whether in organic systems or designed devices, and can be described as any pattern that influences the formation or transformation of other patterns. Information should not be confused with data: data are the meaningless building blocks that can be formed into the patterns that constitute information. From a human perspective, our success has been fuelled by the ability to store and retrieve information not only internally, as memories (specifically declarative and procedural memory [1]), but also externally, in both hard-copy and soft-copy formats such as books and the plethora of technology-based storage devices. The term "information retrieval", coined by Mooers [2] in 1948, arose from the use of these external information storage systems.

For as long as humans have been able to record information, they have searched for and retrieved it. As a field of research and development, however, information retrieval (IR) dates back only to the 1960s and the advent of computer storage devices. As a term, IR refers to more than the retrieval process alone; it also covers the processes and problems associated with "the presentation, storage, organization of, and access to, information items" [3]. IR generally falls under the umbrella of information science; however, because of its multifaceted nature it is recognised as a broad interdisciplinary field [4], spanning librarianship, information science, computer science, cognitive psychology, human behaviour, linguistics, iconography and semiotics.

Research in IR has been driven by the explosion of information stored in data repositories and on the Internet. Current research focuses mainly on returning contextually appropriate information in response to an individual's query, given the sheer quantity of information involved. The problem is illustrated by a scenario most of us have experienced, the data avalanche, in which a simple query returns so much approximately relevant information that it imposes unacceptable cognitive and time overheads, making it impossible for the user to locate the most appropriate information.

These problems are not currently well addressed, as the non-contextual ranking of results by most popular search engines (e.g. Infoseek, Google and Yahoo!) demonstrates. They motivate ongoing research in areas such as classification, clustering and cluster presentation, human-computer interaction (HCI), and cognition in visual search.

Keywords

query expansion, word filtering (e.g. stemming and removal of stop-words, prefixes and suffixes), cluster analysis, classification, indexing, information extraction, information retrieval models, term weighting, TF-IDF, smoothing, noise filtering, dimension reduction, modelling, taxonomic development, user interface theory, human factors, query languages, cognitive load theory, cognition in visual search, relevance judgement, precision and recall, F-measure, mean average precision, Bookmaker, standard Boolean, extended Boolean, fuzzy retrieval, vector space models, generalized vector space models, topic-based vector space models, enhanced Boolean, enhanced topic-based vector space model, latent semantic indexing, binary independence retrieval, uncertain inference, language models, divergence from randomness models.
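Precision, recall and the F-measure, listed among the keywords above, are the standard measures for judging a set of returns against relevance judgements. The following is a minimal Python sketch, not drawn from any particular system, assuming binary relevance judgements for a single query; the function name evaluate and the document identifiers are hypothetical.

```python
def evaluate(retrieved, relevant):
    """Return (precision, recall, f1) for one query with binary relevance."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    # Precision: fraction of returned documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Balanced F-measure: harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: 4 documents returned, 3 judged relevant.
    p, r, f = evaluate(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this example two of the four returned documents are relevant and two of the three relevant documents are returned, so precision is 0.5, recall is 2/3 and the F-measure is 4/7 (about 0.57).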

References

[1] Ryle, G., “The Concept of Mind”, New York: Barnes and Noble, 1949.
[2] Mooers, C., “Application of Random Codes to the Gathering of Statistical Information”, Bulletin 31, Zator Co., Cambridge, Mass. Based on M.S. thesis, MIT, 1948.
[3] Baeza-Yates, R., & Ribeiro-Neto, B., “Modern Information Retrieval”, New York, NY: ACM Press, 1999, p. 1; ISBN: 0-201-39829-X.
[4] Saracevic, T., “Users Lost: Reflections on the Past, Future, and Limits of Information Science”, SIGIR Forum 31, 2 (Dec. 1997), 16-27. DOI: http://doi.acm.org/10.1145/270886.270889

Recommended textbooks

Baeza-Yates, R., & Ribeiro-Neto, B., “Modern Information Retrieval”, New York, NY: ACM Press, 1999; ISBN: 0-201-39829-X.
Shneiderman, B., “Designing the User Interface: Strategies for Effective Human-Computer Interaction”, Reading, Massachusetts: Addison-Wesley, 1998.

Major figures in the field

C. Mooers, E. Garfield, G. Salton, B. Croft, C. J. van Rijsbergen, S. E. Robertson, S. Dominich and A. Bookstein.

Summary written by

Darius Pfitzner
Artificial Intelligence Laboratory
Flinders University of South Australia
March 2006