Natural Language Processing and Language Technology

Natural Language Processing (NLP), which began in the 1950s as research on machine translation, is the area of science concerned with processing data expressed in natural (human) languages, such as English, German, Japanese or Chinese. It is closely related to Computational Linguistics, but the latter is more concerned with exploring the nature of language by analysing the properties of linguistic theories, either theoretically or through implementations, than with using them in applications. NLP is an interdisciplinary field combining computer science, linguistics, mathematics, logic, psychology and philosophy.

During its first three decades, research in NLP was based on symbolic models, including formal rule systems and logic; since the late 1980s there has been a shift in interest towards statistical models based on the analysis of large amounts of language data (corpora). Nowadays both approaches are under active development, with various attempts to merge the two.

The data processed can be speech or text. For research purposes they are collected in corpora, usually either of a well-defined type (news, medical reports, stock exchange announcements, emails, internet chat archives, telephone conversations, etc.) or built for a well-defined purpose, such as part-of-speech recognition, temporal information extraction, question answering or text summarisation. Corpora may consist not only of plain text but also of annotations associated with the content of the documents. These annotations carry useful information about the text, such as part-of-speech tags and syntactic analyses; more task-related annotations, such as named entity identification and coreference relations, are also common.

The full analysis of speech and text is a complex task involving many processing levels: phonology, morphology, syntax, semantics, pragmatics and discourse analysis.
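To make the idea of an annotated corpus and a corpus-based statistical model more concrete, the following is a minimal Python sketch. Everything in it is an assumption made for illustration: the class names (`Token`, `EntitySpan`, `Sentence`), the toy tag labels and the three example sentences are hypothetical and do not come from any particular corpus or toolkit. It stores part-of-speech tags and named entity spans alongside the text, then counts word and tag pairs to build a simple most-frequent-tag baseline, one of the simplest statistical models that can be derived from annotated data.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    text: str
    pos: str  # part-of-speech tag (toy tagset, chosen for illustration)


@dataclass
class EntitySpan:
    start: int  # index of the first token in the span
    end: int    # index one past the last token in the span
    label: str  # entity type, e.g. "PERSON"


@dataclass
class Sentence:
    tokens: List[Token]
    entities: List[EntitySpan] = field(default_factory=list)

    def entity_text(self, span: EntitySpan) -> str:
        """Return the surface text covered by an entity annotation."""
        return " ".join(t.text for t in self.tokens[span.start:span.end])


def tag_counts(corpus: List[Sentence]) -> Counter:
    """Count how often each (lower-cased word, tag) pair occurs."""
    counts: Counter = Counter()
    for sentence in corpus:
        for tok in sentence.tokens:
            counts[(tok.text.lower(), tok.pos)] += 1
    return counts


def most_frequent_tag(word: str, counts: Counter) -> Optional[str]:
    """Return the tag seen most often with `word`, or None if unseen."""
    candidates = {tag: n for (w, tag), n in counts.items() if w == word.lower()}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# A tiny hand-annotated corpus (invented for this example).
corpus = [
    Sentence(
        tokens=[Token("Mary", "NOUN"), Token("can", "AUX"), Token("swim", "VERB")],
        entities=[EntitySpan(0, 1, "PERSON")],
    ),
    Sentence(tokens=[Token("Open", "VERB"), Token("the", "DET"), Token("can", "NOUN")]),
    Sentence(tokens=[Token("You", "PRON"), Token("can", "AUX"), Token("go", "VERB")]),
]

counts = tag_counts(corpus)
print(most_frequent_tag("can", counts))              # AUX
print(corpus[0].entity_text(corpus[0].entities[0]))  # Mary
```

The word "can" is ambiguous between an auxiliary and a noun; the baseline resolves it purely from annotation counts, with no hand-written rule, which is the basic contrast between the statistical and symbolic approaches. Real systems use far larger corpora and models that also take context into account.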
Recently, the field has developed dramatically for a number of reasons: the growth of the internet (leading to new applications and a rich source of data); the increase in computing power (text processing is computationally demanding); the growing interest of business in applications of NLP; and the development of linguistic resources in digital format, which allow for further research in the field. Some of the most important applications are machine translation, information extraction, information retrieval, summarisation, question answering, text generation, computer-assisted language learning, human-computer interaction, computer assistance for the disabled, and knowledge acquisition.

Reference Books

Jurafsky D. and Martin J.H., Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, 2000
Mitkov R., The Oxford Handbook of Computational Linguistics, Oxford University Press, 2005
Crystal D., A Dictionary of Linguistics & Phonetics, Blackwell Publishing, 2003

Recommended Text Books

Jurafsky D. and Martin J.H., Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, 2000
Mitkov R., The Oxford Handbook of Computational Linguistics, Oxford University Press, 2005

Summary written by Centre for Language Technology, April 2006