NGS07 - List of presenters

Keynote Speaker

Brett Poole (Head of Search of Yahoo!7)

Abstract

Humans are hot again when it comes to search. But this time, it's not the old-school method of using a small group of human editors to categorise the web. Instead, search engines are tapping into human knowledge, social networking and aggregated results more widely. Learn about the social search revolution that's underway in this session.

Short Biography

Brett Poole began his career with a variety of roles in commercial web development, where he was responsible for all facets of production including planning, designing, building and marketing of client websites. In 2001, Brett was appointed search product manager at Hitwise. During this time he focused on creating unique search marketing strategies for key clients by integrating website consulting and sponsored search solutions. From there, Brett joined Yahoo! Australia & NZ in July 2004 as Search Producer, responsible for the technical integrity and local relevance of Yahoo!'s search offering. In January 2005, he was appointed Product Manager of Search under the newly formed Yahoo!7, and in 2006 Brett became the Head of Search. Brett is charged with the ongoing development and direction of Yahoo! search, and his achievements include the successful launch of Yahoo!7 Answers and Video Search in the Australian market.

30-minute projects

Dian Tjondronegoro (QUT)

Cross-Modal Web Searching using XML-based Data Mining

Abstract

As broadband Internet becomes affordable for home and business users, multimedia content is increasingly used to attract users' attention to (text-based) Web information. With the rapid growth of multimedia content on the Web, people increasingly expect to obtain information from all types of media at once. However, most of the currently available sophisticated search engines, including Google, are only able to support single-modality queries (i.e. there are separate tabs for each modality, such as text-only, image-only, and video-only searches). Single-modal searches are not efficient in solving many common day-to-day problems such as the following task: {Using text, search for all the news broadcasts from the Web to find video interviews with US President George W. Bush that describe his responses after the "September 11th terrorist attacks". Next, use query-by-example to filter the video key frames with some sample images that can be found using keywords "George W. Bush" and "Portrait", for refining the search relevancy and getting major highlights of his speeches.} To solve this challenge, users currently have to manually switch and buffer query results between separate search engines, such as Google Web, Google Image, and Google Video (with the help of some ad-hoc image retrieval search engines that support visual similarity searches, such as Yotofoto.com). It is therefore evident that we need to pioneer new algorithms and methods to support a more integrated search that will allow users to exploit all types of media documents interchangeably.

Data mining and integration techniques are essential in order to extract and discover knowledge from current Web content, forming it into XML-based structures that enable grouping of similar documents and speed up searching based on semantics. This capability will become even more critical in searching the next generation Web, which will increasingly contain XML documents. The majority of the standard description schemes for multimedia structures and semantics are also based on XML, such as MPEG-7 (http://www.chiariglione.org/mpeg/), which is a standard for describing multimedia content data with some degree of interpretation of meaning; and core metadata (i.e. NewsML and SportsML) from the International Press Telecommunications Council (IPTC, http://www.iptc.org), which is a media-independent structural framework for multimedia news and sports.
Moreover, XML is often used to describe folksonomies (community-based taxonomies), which are becoming popular for large, unstructured collections of user-generated multimedia content such as those on YouTube and Google Blog. Consequently, XML is the glue that holds the new Web together, from traditional (textual) Web content to multimedia (image and video) and user-generated content. This research aims to apply data integration and mining techniques to XML-based Web documents, and to develop a new XML-based information retrieval framework that assists the integration of semantic Web content in the form of multi-modal documents (text, image, and video).

Research Interests

I am a Lecturer of Interactive Digital Media at QUT, currently teaching and researching in the area of Multimedia and Interaction Design. I have so far published 25 papers in the field, in high-quality peer-reviewed international journals and conference proceedings including ACM TOMCCAP, IEEE Multimedia, and World Wide Web Journal. I am currently leading the visual information analysis and retrieval research theme at QUT, while focusing on: Content-based Video Indexing, Multimedia Information Retrieval (on the Web and Mobile Devices), Interaction Design, Web Applications, Database Applications, Information Systems, Mobile Computing, XML, XQuery, and MPEG-7. My PhD thesis (awarded in 2005), entitled "Content-based Video Indexing for Sports Applications using Integrated Multi-Modal Approach", presented innovative results on three major components of multimedia retrieval.

Peter Bailey (CSIRO)

Enterprise 2007 -- task modelling, CERC collection creation, and system building

Abstract

The TREC Enterprise track has been in existence for 2 years prior to 2007, and addresses challenges associated with enterprise search. The previous enterprise collection studied was gathered from W3C web pages and mailing list archives.
In 2007, a new information seeking and retrieval task is being explored -- which models CSIRO science communicators preparing a "missing page" to cover some overview on a topic of interest to the public. To do this, they require access to a set of useful documents and experts. Novel factors in the track include:
The CSIRO Enterprise Research Collection (CERC) is being created throughout 2007. The corpus is available for download as of mid May and has been requested by over 30 participating organisations worldwide. Fifty topics are being released at the end of June, and judging of results will commence in August. A number of experiments will be conducted to validate aspects of the judging methods in use. In particular, we are interested in the correlation between judgements by the original science communicators and those by participants, to assess the validity of community-judging practices for such IS&R tasks. CSIRO's Search and Delivery project is involved in creating both the collection and a system to address issues of document result diversity and expert identification for this task. The CERC is expected to: form a highly useful IR test collection for task-contextual enterprise search; be the basis for additional experiments on evaluation methods; and be used as a vital ingredient to explore interesting and challenging language generation tasks (e.g. actually generate the "missing page" overview).

Research Interests

Peter Bailey has a number of research interests, particularly in the area of web and enterprise search and evaluation. As project leader of CSIRO ICT Centre's Search and Delivery project, he is particularly interested in the cross-over between contextually-influenced information retrieval and natural language generation technologies. He is also interested in large scale web crawling and mining information from changes in time.

Alexandra L. Uitdenbogerd

Next Generation Music Search

Abstract

The current state of the art for music information retrieval consists of robust systems that can take a sung melody query and retrieve relevant answers from collections containing music in a symbolically defined form such as MIDI.
Published research has shown reasonable results for collections of tens of thousands of pieces of real music, and up to 100,000 artificial music snippets. For audio collections, searching for a recording given a -- possibly noisy or distorted -- digital excerpt of the recording also yields reliable answers. However, a task that is currently still elusive is the retrieval of audio recordings given sung -- or even notated -- queries. The reason this is highly challenging is the inaccuracy of the music transcription process that converts a recording of a complex musical work into a symbolic format. Musical instrument timbre, acoustic effects and incidental noises can all be confused with notes, causing more errors than correct notes in pieces that contain multiple simultaneous notes and various instruments. Our Music IR team at RMIT are currently tackling these problems. Current approaches include:
This research is the core component of a range of music retrieval topics being tackled by our group, including timbre and mood-related retrieval. The music retrieval group is part of the RMIT Search Engine group, which focuses on all aspects of search technology.

Research Interests

I'm interested in many aspects of search, with my main focus being music information retrieval. Within that field my primary projects currently are the retrieval of audio with note-based queries, as well as subjective music retrieval via mood or genre. Related to this topic is my interest in recommender system technology, in which different types of evidence (ratings, demographic data, audio similarity) are combined to best recommend new music to users. My other main research interest is computer-assisted language learning. Within this field I'm interested in techniques for building tools that make use of the web as a corpus for recommending reading (readability), and for presenting concordances that represent language usage.

Diego Molla-Aliod (Macquarie University)

AnswerFinder -- question answering and related tasks

Abstract

AnswerFinder is a question answering system being developed at the Centre for Language Technology at Macquarie University. The AnswerFinder team, directed by Diego Molla, has worked on various aspects of question answering. In the presentation we will look at some of the work done by its members. The main aim of AnswerFinder is to answer fact-based questions, and with this aim in mind the project has participated in the Text REtrieval Conference in several years. The system has been designed so that it can be easily adapted to various tasks and tested module by module. A quick overview of the architecture of AnswerFinder will be presented. AnswerFinder's main innovation is the use of machine learning techniques to learn the mapping between types of questions and the location of the exact answers in the answer sentences.
The logical contents of both the questions and the answer sentences are expressed as graphs, and the learning component uses graph manipulations to do the mapping. Machine learning has also been used in Afner, the named-entity recogniser we are developing. Afner is designed to maximise the likelihood of detecting entities that could be answers. We have also done some preliminary work on the use of machine learning for question analysis. Recently we have experimented with extensions of question answering for summary-based questions (and participated in the Document Understanding Conference), multilingual question answering (we are participating in the Cross-Language Evaluation Forum), and question answering from speech transcripts (at the time of writing this abstract we are preparing the system for participation in the CLEF track on question answering on speech transcripts).

Research Interests

My current research interests focus on the development of question answering technology. I am particularly interested in the combination of linguistic information beyond bags of words with machine learning techniques. Question answering requires a wide range of technologies, and consequently I have been involved in research on information retrieval, named-entity recognition, question analysis, answer extraction, and answer presentation.

Hong Liang Qiao, Yidong Yuan, Tony Pham (Lexxe Pty Ltd)

Towards Lexxe Beta Version

Abstract

The core issue of search is content, and search content is mostly represented in the form of language. Natural Language Processing therefore falls naturally within the core technology of search. Current search technologies offered for public use do not address this issue adequately. This is where Lexxe comes in, to make search results more relevant. Since the launch of the Lexxe Alpha version in July 2005, it has grown from a single search engine to a more complex system offering a variety of services, e.g. site search, search by country and news search. Lexxe has also developed vertical search and Chinese search. The Lexxe search engine has received attention from world media and experts in search technology, as well as a community of users from around the world. Lexxe Beta Version is scheduled to be launched in October 2007. In this talk, we will give an exclusive preview of the new search engine. The differences between the Lexxe Alpha and Beta versions can best be characterised by three major points: 1) Lexxe Beta version will have its own index; 2) the search engine will be re-written in C/C++ under Unix; 3) it will expand its Natural Language search technologies to cover more aspects of search. Apart from Short Question Answering, Phrase Recognition in key word search, Clustering and Screening out Irrelevant Content, Lexxe's Beta version will implement a semantic ranking method based on the analysis of the meaning of each web page.

Research Interests

Search engine, question answering, natural language interface to databases, text categorisation, word sense disambiguation, parsing, POS tagging, very large databases, statistics, etc.

David Martinez

ILIAD (Improved Linux Information Access by Data Mining)

Abstract

The ILIAD project aims to apply deep linguistic processing techniques for information delivery from segmented textual data streams. That is, we detect the underlying information structure of a multi-document text discourse, which we then distill into a factoid-based summary, filtering out any subjective information content in the process. The individual data streams are then classified according to a multidimensional model of the information type described therein, allowing a user to query the data using constrained boolean queries. A given query will then produce a ranked list of factoid summaries, linked to the original data streams. The particular domain in which we propose to exemplify this general task is Linux user web forums.
Consider the following scenario: Kim, a Debian GNU/Linux user, notices that, as a result of the latest upgrade on her laptop, she can no longer start up the GNOME desktop environment. She goes to Google to troubleshoot the problem and tries inputting the version details of various X packages and the hardware particulars of her laptop, along with different combinations of keywords such as "broken", "not working" and "won't start"; all of the top-ranking hits are either outdated and inapplicable to the latest version packages, or irrelevant to the task at hand. She checks the archives of a selection of relevant-sounding Debian mailing lists without luck. Finally, after searching the web for 2 hours, she stumbles across a series of pages describing an error with gtk and the method for correcting the problem. This example (based on real-world experience) is intended to illustrate the fact that, while web search engines such as Google are remarkably successful at locating individual documents/sites typifying a given information type, they are largely unable to track data streams spanning multiple documents as found, e.g., in mailing list archives. Additionally, typical web search engines have no facility to specify the time span of documents to search over, and have only limited means of picking up on lexical variants of a given query, such that the exact wording of the query can be crucial to arriving at the desired document. Imagine instead that there were a system which could crawl the main English-based Linux web forums throughout the world, analyse each thread to arrive at a conceptual representation specified for package, version and system information, and pre-classify it according to the particular problem type.
Imagine further that the evolution of the proposed solutions and diagnostics in a given thread were distilled into a single succinct list of factoids, and that the various threads on the web pertaining to a particular problem were then combined into a single ranked list of possible solutions, with links to the original web data. To access the desired solution, a user would simply go to a web interface and select the type of problem experienced (e.g. some component of a package is broken or the user wishes to configure a package in a particular way), the package or component type it applies to (e.g. emacs or the X window system in general), and (optionally) the system and hardware configuration (e.g. Debian 3.1, or ALSA 1.1 on a ThinkPad T22), and the system would return information relevant to that query in an easily-applicable form. This research is aimed at developing and integrating the components of this end-to-end system.

Research Interests

My main interest in this area lies in the processing of multi-document discourses (e.g. newsgroup-style data streams) for information delivery. These information sources present important challenges for simple term matching methods, and require new approaches that take into account the structure of the data. More informed tools would perform a linguistic analysis of the texts and obtain a conceptual representation of the segmented data streams, linking them to a factoid-based summary that could be more easily accessed by the user. I am also interested in the resolution of semantic ambiguity in specific domains, such as biomedicine. This requires sophisticated tools to process the text, and it would allow for more efficient indexing of the large volumes of information stored in journals and other text resources.
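The constrained query described in the ILIAD abstract above (problem type, package, and an optional system configuration, returning a ranked list of factoid summaries linked to their source threads) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the `ThreadSummary` record, its fields, the `query` function, the scores and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadSummary:
    """Hypothetical record for one analysed forum thread."""
    problem_type: str      # e.g. "broken" or "configure"
    package: str           # e.g. "gnome" or "emacs"
    system: str            # e.g. "Debian 3.1"
    factoids: List[str]    # distilled, objective solution steps
    score: float           # assumed ranking score (higher is better)
    source_url: str        # link back to the original thread


def query(summaries: List[ThreadSummary], problem_type: str, package: str,
          system: Optional[str] = None) -> List[ThreadSummary]:
    """Return summaries matching every given constraint, best score first."""
    hits = [
        s for s in summaries
        if s.problem_type == problem_type
        and s.package == package
        and (system is None or s.system == system)
    ]
    return sorted(hits, key=lambda s: s.score, reverse=True)


# Made-up sample data, purely for illustration.
SUMMARIES = [
    ThreadSummary("broken", "gnome", "Debian 3.1",
                  ["Reinstall the gtk libraries"], 0.6,
                  "http://example.org/thread/1"),
    ThreadSummary("broken", "gnome", "Debian 4.0",
                  ["Remove the stale session file"], 0.9,
                  "http://example.org/thread/2"),
    ThreadSummary("configure", "emacs", "Debian 3.1",
                  ["Edit the init file"], 0.8,
                  "http://example.org/thread/3"),
]

for hit in query(SUMMARIES, "broken", "gnome"):
    print(hit.score, hit.factoids, hit.source_url)
```

Leaving `system` unset keeps the query broad; supplying it narrows the ranked list to threads classified under that configuration.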
Jon Patrick (University of Sydney)

Information Extraction from Clinical Notes

Abstract

We are in the testing phase of a project at the Royal Prince Alfred Hospital that does information extraction from clinical notes in the Intensive Care Unit. The language processing is part of a system to help clinicians complete their ward rounds more efficiently and ease the burden of administration in record keeping. In the first stage the NLP demonstrates the automatic computation of SNOMED CT codes as clinicians write their progress notes. The system computes a tailored extract of the patient's clinical record from the ICU's information system, CareVue, relevant to the needs of reviewing the patient's case. The extract is presented on a screen to the clinician, who then types in the relevant progress notes they wish to make. The system computes the SNOMED CT codes in real time after analysing the progress notes, and they are then stored back into CareVue. The system will be of significant advantage to clinicians in their ward rounds. The automatic extraction of relevant content will give considerable time savings in not having to manually search the clinical information system, considered to be a saving of up to 10 minutes per patient (up to 50 patients in the ward visited twice per day). After data entry, the conversion of clinical records into a coded system will ensure more efficient and more reliable data analytics. The work is expected to progress in two directions, namely to improve the accuracy of the information extraction process and to develop a restricted data analytics natural language grounded in the SNOMED CT coding scheme.

Research Interests

Our work is principally focused around semantic processing with application to the Health sector.
Current research includes information extraction from clinical notes and published case study reports, named entity recognition linked to clinical terminologies such as SNOMED CT and ICD-10-AM, and mapping between ontologies and classifications. Other work links NLP to hospital information systems. As an infrastructure for this work we research the software engineering of Language Technology systems.

Enrico Coiera, Annie Lau, Farah Magrabi (UNSW)

Knowledge based approaches to improving clinical search behaviours and decision outcomes -- the Quick Clinical Project

Abstract

The Centre for Health Informatics has had an ongoing research program into the design and impact of search technologies on clinical care, commencing in 2000. As part of this program we have developed the Quick Clinical information retrieval system (QC), which provides users with a task-based search interface and uses a knowledge-based approach to automatically formulate the most appropriate search strategies for each task. We model typical clinical tasks like 'diagnosis' or 'prescribing' and attempt to ensure that only the most relevant evidence is retrieved. In this presentation, we will walk through the QC user model and the underlying rule-based search mechanism, which targets specific literature resources, and translates and enhances user queries into the respective query languages of each resource. The QC user model inherently guides users to structure their query and improves the chances that they will ask a well-formed query and receive an appropriate answer. The rule-based mechanism, or Meta-search filter (MSF), combines the power of meta-search systems and predefined search filters. It can also be thought of as an encoding of search strategies that capture expert knowledge on where and how to search for answers. QC has undergone multiple stringent evaluations between 2001 and 2006, both in controlled laboratory settings and in routine use in a clinical care setting.
We have demonstrated large and statistically significant improvements in the speed and accuracy with which users can answer questions post search using our approach. Our largest trial, involving over 200 general practitioners across Australia, shows that general practitioners use the system in routine clinical settings. In addition, QC has provided us with a rich platform to study user search behaviours. The talk will thus also summarise some of our recent results in understanding and modifying user search patterns. For example, we have embarked on possibly the first study that investigates and provides evidence that people experience cognitive biases while searching for information. A series of 'debiasing' interventions on the search user interface were designed and trialled, and we have shown that they can debias search behaviours, influence decision outcomes, affect user confidence and alter information-searching behaviour.

Research Interests

Professor Enrico Coiera is the Foundation Chair in Medical Informatics within the Faculty of Medicine at the University of New South Wales (UNSW). He is also Director of the Centre for Health Informatics, and an Adjunct Professor in Computer Science and Engineering at UNSW. Enrico's research interests lie in developing advanced knowledge management and decision support tools to assist with clinical decision making, including advanced search engine technologies, as well as models of human-computer interaction to support the rational, appropriate and sustainable development of technological interventions in health care.

Annie Lau is a Postdoctoral Research Fellow at the Centre for Health Informatics, University of New South Wales. Her research interests lie in modelling and designing strategies that enhance human cognitive performance for information retrieval and decision support technologies.
For her doctoral research, she conducted possibly the first study that provides evidence that people can experience cognitive biases while searching for information. She also designed and trialled a series of interventions on the search user interface to modify the impact of these biases during search. Annie is currently working in the decision support research stream at CHI, using machine learning and knowledge acquisition methods to learn and model strategies for searching and extracting online information.

Dr Farah Magrabi is a Research Fellow at the Centre for Health Informatics, University of New South Wales. Her research is focused on engineering safe clinical information systems based on accident models that specifically describe failures associated with the use of record keeping, clinical decision support and messaging functions in routine care. Farah has extensive practical experience in the evaluation of clinical information retrieval technology. She played a key role in designing and managing what was, internationally, the first nation-wide study to directly measure individual clinicians' patterns of using information retrieval in routine general practice settings.

Amir Hadad, Tom Gedeon (ANU)

A new Fuzzy logic based method to discover similarity between document words

Abstract

Information retrieval is one of the most demanding areas in modern computer science. There are millions of documents available all over the world, and retrieving the proper documents on the basis of user needs and preferences is a major problem. There are several reasons for this problem. One of them is the imprecise nature of written documents, which is related to the nature of human language. Most of the available methods of information retrieval are based on precise mathematical methods. In this presentation, our aim is to review traditional methods of document retrieval based on crisp logic and their weaknesses, and to introduce our new vision for fuzzy-based information retrieval.
In our method we introduce a fuzzy way of detecting the keywords of a document via several steps. During this process we need to create fuzzy meaning on the basis of statistical information, and we will show how to create linguistic meaning out of crisp and precise information using fuzzy clustering and a trapezoid estimation method. We use these meanings to create a matrix of word similarity in which similarity is a fuzzy value, not a number. We will use this fuzzy similarity matrix and other fuzzy values of the words to detect the relations of the words to each other. This presentation can help the audience gain a better understanding of the fuzzy logic method in comparison with crisp logic based methods. Additionally, it will give participants an overall idea of why fuzzy methods are effective for information retrieval and how fuzzy logic methods can create meanings out of statistical data.

Research Interests

I am interested in modern ways of information retrieval. I believe that the old and traditional methods of information retrieval need to be changed and reconsidered, because of the growing complexity of users' needs and demands from an IR system (such as a search engine or a database). I implemented a Meta-Search Engine as my bachelor project, which can be regarded as post-processing of the results of different search engines. I am also interested in applying fuzzy-based methods in different research areas related to information retrieval, such as automated keyword extraction, user preference modelling and keyword similarity detection. My reason is that, because of the imprecise nature of human writing, using precise methods in an IR system loses part of the knowledge contained in the documents being processed. Additionally, I am interested in black box modelling using fuzzy modellers and an available training set of a system, which can be applied to user preference modelling in an IR system.
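To make the idea of a similarity that is "a fuzzy value, not a number" more concrete, here is a minimal sketch of a trapezoidal membership function turning a crisp statistic into degrees of membership in linguistic labels. It is a generic fuzzy-logic illustration, not the authors' method: the labels, their breakpoints, and the assumption that the input is a similarity statistic normalised to the range 0 to 1 are all hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Membership of x in a trapezoidal fuzzy set.

    a and d are the feet (membership 0), b and c the shoulders
    (membership 1). Setting a == b or c == d gives an open shoulder.
    """
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge


# Hypothetical linguistic labels over a similarity statistic in [0, 1].
LABELS = {
    "low":    (0.0, 0.0, 0.2, 0.4),
    "medium": (0.2, 0.4, 0.6, 0.8),
    "high":   (0.6, 0.8, 1.0, 1.0),
}


def fuzzy_similarity(stat):
    """Map a crisp statistic to membership degrees for each label."""
    return {name: trapezoid(stat, *params) for name, params in LABELS.items()}


# A statistic of 0.3 is partly "low" and partly "medium", not a single number.
print(fuzzy_similarity(0.3))
```

One cell of a fuzzy word-similarity matrix could then hold such a dictionary of membership degrees rather than a single crisp score.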
Robert Dale, Brett Powley (Macquarie University)

Supercharging the ACL Anthology

Abstract

The ACL Anthology is a digital archive of around 12500 conference papers and journal articles in the area of natural language processing, sponsored and maintained by the Association for Computational Linguistics. The archive, which is freely available to all at http://acl.ldc.upenn.edu/, contains the proceedings of ACL and Coling conferences and associated workshops going back as far as 1965, and the contents of the Computational Linguistics journal back to 1974; much of the early material has been scanned from hard copies, but since 2000, the Association's conference proceedings have been 'born digital', and are added to the anthology directly. So far the Anthology provides a text-searchable and browsable research repository. In collaboration with researchers at the University of Michigan, the National University of Singapore, the University of Cambridge and others, we are embarking on an initiative to add value to the Anthology content by applying techniques from natural language processing and information retrieval. So, for example, one key focus of work both at Macquarie University and at the University of Michigan has been the hyperlinking of citations to the cited works; and work at NUS has explored aligning slide presentations to the papers they correspond to. In this presentation, we'll provide a sketch of the various things that we are hoping to do with the Anthology, and the issues that arise in dealing with this data set. Our aim is to encourage others to get involved, and to seek further ideas for useful things that can be done using this dataset.

Research Interests

My research falls into three areas: intelligent text processing, natural language generation, and spoken dialog systems. The first of these is most relevant to the present workshop.
I have a number of funded projects and student projects looking at various aspects of information extraction and named entity recognition; in particular, I am working with a PhD student, Brett Powley, on the automatic detection and linking of citations and references in scholarly documents, with a longer term aim of adding qualitative assessment to citations, so that we can automatically determine why one author has cited another, thus enriching the current, rather impoverished metrics based purely on citation counts with some qualitative information.

10-minute speed papers

Willem-Jan Jansen (Fairfax Digital)

Playing easy to get -- how search is transforming the searchee

Abstract

Although search technology has become amazingly good at matching syntax to meaning and intent, it's still algorithms. One of the fastest growing businesses online currently is search marketing: optimising websites, content and strategies to gain a competitive advantage in search engines, the major drivers of traffic which can make or break an online business. In the publishing world, we're now finding that this form of reverse engineering and leveraging of the search algorithms is actually starting to change the object of the search, the content itself. How can quality content remain appealing to people rather than just search bots? I'll highlight some of the problems we're confronted with and possible avenues of investigation.

Research Interests

Human computer interaction, cognitive science (learning & creativity), web search. I currently work as a product & marketing manager at Fairfax Digital, the largest publisher of online news & information in Australia. With an academic background that includes an M.Sc. in Industrial Design Engineering/Innovation Management and an M.A. in Cognitive Science (UNSW), I have a special interest in human computer interaction, web search and the way it's transforming the business landscape. I'd like to contribute a view 'from the other side of search'.
Girija Chetty and Michael Wagner (University of Canberra)

Information Retrieval from Broadcast Video Based on Multimodal Fusion

Abstract

Considerable research has been devoted to utilizing multimodal features for better understanding multimedia search and retrieval. However, two core research issues have not yet been adequately addressed. First, given a set of features extracted from multiple media sources (e.g., extracted from the visual, audio, and caption track of videos), how do we determine the best modalities? Second, once a set of modalities has been identified, how do we best fuse them to map to semantics? In this work, we propose a multi-modal approach to information retrieval from broadcast video, fusing caption text read from the screen through OCR, face recognition from the visual track and speech recognition from the audio track. The proposed technique uses a two-step approach. The first step extracts low-level features from the audio, visual and text captions of the video track based on latent semantic analysis. In the second step, we use score-level fusion to determine the optimal combination of individual modalities. We carefully analyse the tradeoffs between three design factors that affect fusion performance: modality independence, curse of dimensionality, and fusion-model complexity. The two-step multimodal approach shows that low-level features alone are becoming insufficient to build efficient content-based retrieval systems. Users are no longer interested in retrieving visually similar content; they expect retrieval systems to find documents with similar semantic content. Bridging the gap between low-level features and semantic content is a challenging task necessary for future retrieval systems. The use of Latent Semantic Analysis (LSA) of audio, visual and text features allows us to efficiently represent the audio-visual content of video shots for semantic content detection.
We demonstrate through experimental results on UCBN that our two-step approach, using latent semantic analysis and score fusion of audio, visual and caption text, improves the class-prediction accuracy over other traditional information retrieval techniques. Research InterestsMy research interests are in multimodal fusion, biometric identity recognition, content based information retrieval, video coding and analysis and biomedical image analysis. Nathalie Colineau, Cecile Paris and Ross Wilkinson (CSIRO)The Next Generation of Search -- a Whole-of-System Perspective AbstractInformation seeking is a complex process: It is still sometimes difficult to know where to find information, or even what precisely needs to be found, as that may be unspecified, only partially known or yet to be discovered. There is thus a need to provide additional support to the users to better answer their information needs. As the next generation of search tools is becoming more context sensitive and often goes beyond pure search, we also need to change the way we evaluate them. The traditional way of measuring the accuracy of an information retrieval system in returning relevant information (i.e., recall and precision) must be complemented by other measures, for example measures for navigation or information interpretation and assimilation. We propose a whole-of-system view of evaluation, one which considers both benefits and costs and multiple aspects of a system (e.g., range of queries supported, ease of assimilation of the information delivered, the usability of the system itself, the ease of implementation). Research Interests (first author)I am interested in helping people find the information that will assist them in their tasks and in their decision making. This places my research at the intersection of Information Retrieval, User Modelling and Language Technology, in particular, Natural Language Generation. 
More specifically, I'm looking at providing users with understandable information appropriate to their needs and situation, i.e. organised, expressed and presented in such a way as to enable them to readily and effectively make use of the information. My approach is to develop and evaluate solutions from a whole-of-system perspective — considering the system's benefits and costs in the context of its stakeholders, their goals and tasks, and the information sources that the system requires. Menno van Zaanen (Macquarie University)Learning patterns to improve multi-modal information retrieval AbstractMost current multi-modal information retrieval systems (especially those concentrating on text and images) use low-level features to describe the documents to search. These low-level features are used independently, even though there might have been initial dependencies. For example, taking average colour values of different areas in a picture has a natural ordering in itself (i.e. the location of the area in the image), which is disregarded when used as features. I propose to use machine learning methods to find significant patterns in low-level features, taking the dependencies into account. This results in higher-level information that can then again be used as features. This requires enhancements in the use of the features and also requires new machine learning methods that can find significant patterns in multi-dimensional data. Research InterestsI am interested in the use of structure in search and retrieval. How can we extract and use the structure that is inherent in many documents (of different modalities) in a useful way? Most systems that find patterns work on sequences. However, finding patterns in modalities, such as visual data, may require algorithms working with higher-dimensional data. These algorithms may perhaps work on the raw data of the documents or on sequences (in whatever dimensionality) that represent the documents. 
Even when we can extract significant patterns, we still need to adjust our current search methods to be able to use this information. Marcus Butavicius (DSTO)A psychological approach to dealing with large corpora AbstractWhile there are many software tools available for information search, less attention has been paid to the human aspects of search behaviour and how to model human representations of document spaces. In this talk, we will present an overview of research (previous, current and planned) that focuses on the psychological aspects of navigating and searching large document sets. The work draws on elements of cognitive science and in particular visual perception to examine fundamental aspects of software design. Specifically, the research to be discussed includes examination of the psychological aspects of:
Our approach is based on two tenets. First, that the design of software tools needs to be sympathetic to cognitive and perceptual principles. Secondly, assessment of the usefulness of these tools requires empirical evaluation using an empirical psychological framework. Research InterestsHuman visual perception and decision-making particularly with respect to software tools. Jose Lay (CSIRO)Content-based retrieval of Lecture Videos AbstractThis talk presents our work in content-based retrieval for presentation videos. More specifically, we consider a multimodal approach for indexing lecture videos. In operation, a presentation video is partitioned into a sum of more manageable segments; these segments are then subject to alignment with slides from the associated presentation files. Each registered segment is then treated as a document and indexed by using text extracted from the corresponding presentation slides and speech transcript. Research InterestsJose's work focuses on video search and delivery. In particular, he is interested in the use of concept-languages to tackle the semantic gap problem. A concept-language can be formally explicated into a lexicon of elemental concepts and a concept grammar. Non-verbal information in images and video documents can then be indexed by using elemental concepts, while queries are dealt with by post-indexing coordination of the elemental concepts and the concept grammar. James Hogan, Asgeir Frimannsson (QUT)Translation Re-Use and Cross Lingual Search [Contact James Hogan for presentation slides] AbstractTranslation re-use relies upon maintenance of a Translation Memory (TM), a collection of previously translated segments which can be mined at a phrase or sentence level in new translation projects, significantly reducing the load and associated cost of translation. Existing translation memory systems work on domain-specific content, and reuse is usually based on simple string matching. 
This project is concerned with improving search performance based on document and domain context, thereby enhancing TM re-use and bridging to more general cross lingual search. Our approach is based not only on direct phrase similarity, but on a document-level context match between a corpus of existing translated content and the content being translated. Some preliminary success has been achieved in the software messaging domain using probabilistic topic models (notably Latent Dirichlet Allocation) and we are evaluating these methods against more traditional IR and discriminative approaches to context identification. Our long-term aim is to apply these approaches to mined multilingual web-content, enabling large-scale cross-domain matching and translation reuse. Research InterestsJim Hogan's research interests centre on machine learning and its application in biological and cognitive domains, with a current focus on bacterial genomics and NLP (and supporting standards) for software and content internationalisation. Robert McArthur (CSIRO)Open source datasets of human communication AbstractOpen source datasets of human communication are vital sources, both now and in the future, for search - mailing lists, blogs and email. Such data is often full of ungrammatical and misspelt text. Long questions, when available to search, often also have these unfortunate features. How can information retrieval, with the help of linguistic and cognitive knowledge, deal with these problems and retrieve appropriate results (often in the context of exploratory search)? If a person can understand the text, how can we get closer to effective automatic systems acting correctly? Examples of such queries, from real life, can be investigated using socio-cognitively motivated techniques for extracting knowledge. Research InterestsRobert has been working in the conjoint set of the areas of cognitive science, knowledge management, text mining and philosophy. 
Specifically, this has meant bringing together for the first time theoretical work on the socio-cognitive representation of meaning in people's communication, with specific cognitive algorithms that have been proven, in closed experiments, to represent meaning. The ideas have been successfully tested in a number of domains such as health-related mailing lists, intelligence-related and corporate emails, and post-hoc fraud analysis. Robert is interested in socio-cognitively motivated text mining, semantic spaces, and online communities. He is also interested in analysis of email, blogs and mailing lists in novel ways. Annie Lau, Enrico Coiera (UNSW)Cognitive biases during information searching AbstractDecisions are improved by better access to relevant information, and searching for documents on the Web is increasingly an important source of that information. However, decision making research has for a long time identified that people experience cognitive biases and that these biases can have an adverse impact on their decision outcomes. In this presentation, we present our investigation, which suggests that people can experience cognitive biases while searching for information. Biases investigated are anchoring, order, exposure and reinforcement. Our results show that people can experience significant anchoring, order and exposure effects while searching for information and these biases may influence the quality of decision making during and after the use of information retrieval systems. Research InterestsAnnie Lau is a Postdoctoral Research Fellow at the Centre for Health Informatics, University of New South Wales. Her research interests lie in modelling and designing strategies that enhance human cognitive performance for information retrieval and decision support technologies. For her doctoral research, she conducted possibly the first study that provides evidence that people can experience cognitive biases while searching for information. 
She also designed and trialled a series of interventions on the search user interface to modify the impact of these biases during search. Annie is currently working in the decision support research stream at CHI, using machine learning and knowledge acquisition methods to learn and model strategies for searching and extracting online information. Professor Enrico Coiera is the Foundation Chair in Medical Informatics within the Faculty of Medicine at the University of New South Wales (UNSW). He is also Director of the Centre for Health Informatics, and an Adjunct Professor in Computer Science and Engineering at UNSW. Enrico's research interests lie in developing advanced knowledge management and decision support tools to assist with clinical decision making, including advanced search engine technologies, as well as models of human-computer interaction to support the rational, appropriate and sustainable development of technological interventions in health care. Grace Chung, Enrico Coiera (UNSW)Text Mining For Clinical Decision Making in Healthcare AbstractMedical practitioners are increasingly applying evidence-based medicine (EBM) to support decision-making in patient treatments. The aim of EBM is to provide improved healthcare through locating evidence for a clinical problem, evaluating the quality of the evidence, and then applying it to the problem at hand. However, the adoption of EBM is hampered by an overwhelming amount of available information. Scientific evidence is mostly documented in free unstructured textual descriptions within randomized controlled trials (RCTs), cohort studies and case-control studies, and clinicians lack both the time and skills to locate and synthesize the best and most appropriate evidence. 
To alleviate the information overload, resources such as the Cochrane Collaboration and Clinical Evidence (British Medical Journal) have employed human expert authors to systematically review and summarize knowledge documented within RCTs through extensive searches and critical assessments. A key question is whether language technologies can alleviate the intensive efforts required in the exhaustive searches and subsequent appraisals of the caliber of study methodologies, and the apparent strength of evidence based on experimental observations. The Centre for Health Informatics is now addressing the use of natural language processing technologies to help clinicians sift through vast amounts of available information on healthcare interventions. We aim to identify and extract from the published literature the entities and relations that form the critical pieces of evidence sought by a clinician. In this talk, we will examine how natural language technologies could be used in applications geared towards supporting the clinician's need for decision making about the appropriateness and effectiveness of treatments. We will present preliminary work on processing abstracts of randomized controlled trials. Research InterestsDr Grace Chung is a Senior Research Fellow at the Centre for Health Informatics (CHI), University of New South Wales. Grace's current research interests concern the development of natural language processing techniques that support information extraction, information fusion and knowledge discovery in the biomedical domain. She is particularly interested in the development of new methods and tools that help bioscience researchers and clinicians search for and synthesize scientific evidence from textual content, towards more effective decision making and hypothesis formulation. Grace has also had extensive experience in developing conversational agents that allow natural spoken human-computer communication for accessing and navigating online information. 
Professor Enrico Coiera is the Foundation Chair in Medical Informatics within the Faculty of Medicine at the University of New South Wales (UNSW). He is also Director of the Centre for Health Informatics (CHI), and an Adjunct Professor in Computer Science and Engineering at UNSW. Enrico's research interests lie in developing advanced knowledge management and decision support tools to assist with clinical decision making, including advanced search engine technologies, as well as models of human-computer interaction to support the rational, appropriate and sustainable development of technological interventions in health care. Wei Wang (UNSW)SPARK: top-k keyword search on relational databases AbstractA large portion of textual information is stored in relational databases. Typical examples include commercial systems like Customer Relationship Management (CRM) systems and personal/social applications like Web blogs and wiki sites. Consequently, there is a demand for retrieving relevant information from databases using the keyword search interface. Due to the normalisation and the inherent connections among tuples in relational tables, traditional IR-style ranking and query evaluation methods cannot be directly applied in this context. In this workshop, we present our recent results on improving the effectiveness and efficiency of systems that answer top-k keyword queries on top of relational databases. We propose a new ranking formula by adapting existing IR techniques on a natural notion of "virtual document". Compared with previous approaches, our new ranking method is simple yet effective, and agrees better with human judgement. We also study efficient query processing methods based on the new ranking method, and propose highly efficient retrieval algorithms. We have conducted extensive experiments on large-scale real datasets (bibliographic data and movie data). 
The experimental results demonstrate significant improvement over the alternative approaches in terms of retrieval effectiveness and efficiency. Research InterestsMy research interests, relevant to the workshop, include integration of database and information retrieval technologies (DB + IR), XML information retrieval, data cleaning, and data mining. 10-minute studentsAndrew Lampert (CSIRO / Macquarie University)Managing Obligations and Commitments in Email AbstractThe volume of textual conversation, including email, web forum discussions, and instant messaging is growing rapidly. This conversational data differs from written documents. It involves interaction between multiple participants, and its structure includes patterns borrowed from verbal conversation. Despite these differences, many search engines and email systems treat textual conversation as simple bags-of-words. Conversational structure is not exploited when searching, navigating or summarising. Our vision is to provide intelligent, automated assistance to email users. We are developing tools to identify actionable obligations and commitments, drawing on ideas from Speech Act Theory, to assist with the task of email triage in the workplace. Towards this goal, we present findings of an initial experiment building a statistical speech act classifier, using the Verbal Response Modes (VRM) taxonomy of speech acts. Research InterestsEmail, Conversation Analysis, User Modelling, Speech Act Theory, Summarisation, Human-Computer Interaction Luiz Augusto Pizzato (Macquarie University)Information Retrieval for Question Answering AbstractMost information retrieval strategies work on the assumption that results should be delivered directly to the end user, who, with a brief examination, will be able to discern between relevant and irrelevant results. My PhD research focuses on building a model that allows IR results to be more relevant for the specific task of question answering. 
In such a system, receiving documents containing extractable answers is likely to be better than receiving documents that just deal with the relevant subject. To approach this problem I have implemented a set of different strategies, one regarding a feedback mechanism and another using a simplified semantic markup. In my talk I will briefly describe both approaches and some of their latest results. Research InterestsMy research is closely related to the workshop theme, so I am interested in every aspect of the information retrieval field that will be discussed at the workshop. In particular, I am interested in how researchers are currently using natural language processing techniques to enrich search technology. Su Nam Kim (University of Melbourne)Multiword Expressions AbstractMultiword Expressions (MWEs) have been studied to provide lexical knowledge for NLP applications. In particular, semantic relations (SRs) in compound nouns (CNs) have been suggested for machine translation (MT) and question answering (QA), which is a particular type of IR. The SR of a compound noun represents how its modifier(s) and head noun combine. For example, "Fuji apple" has the SR LOCATION, in which the modifier "Fuji" is the location of the head noun "apple". On the other hand, the CN "morning apple" has a different SR, TIME, in which the modifier is the time of the head noun. Despite having similar constituents, these CNs have different SRs, so these relations play an important role in finding answers for QA or for sentence-type queries in IR. With the same examples, the question "where is this apple from?" is answered by the location "Fuji". In conclusion, learning to interpret these semantic differences in CNs could provide reliable knowledge for answering questions in QA and IR. Research InterestsMy interest in Natural Language Processing is to acquire syntactic and semantic lexical knowledge for applications. 
Not only is this lexical information of interest for understanding languages, but it is also critical for resolving problems in NLP applications. My particular interest is the lexical semantics of multiword expressions (MWEs). In relation to search engines, modeling the syntax and semantics of MWEs provides a method for handling lexical items at a certain semantic level. Also, the compositionality of MWEs can be used for generating rich queries in IR and QA. Sukanya Manna (ANU)A Fuzzy Relational Model to Calculate the Relatedness between Words AbstractInformation Retrieval is still in a growth phase when it comes to the semantic and structural analysis of texts. There is much research in this field exploiting textual information. One of the basic tools for these kinds of analysis is the frequency measure of keywords. But a major problem in finding the relatedness of words based only on their frequency of occurrence is that it leads to the extraction of irrelevant relations. Hence, we present an approach of modularization of a text to find a better way of relating words. The text used here is considered to be composed of modules with their own relatedness. We fuzzified the contents of each paragraph and calculated the probability of this fuzzy set to have a better insight into the textual semantics. This approach is useful in finding the relations between concepts when no pre-existing patterns or ideas are available to proceed from. This can further be improved when hierarchical concepts of document structure are embedded in this structure. Research InterestsFor the last few years I have been working in text mining related fields. Initially I focussed on data compression models, along with different pre-processing algorithms (like star encoding). I applied them on Indian languages to find out how far standard algorithms can work on them at dictionary levels. Then I gradually shifted towards co-citation indexing with biological data and evolutionary computational models. 
I was successful in obtaining some interesting relations between enzymes with respect to web documents. Finally, I started my PhD explicitly in the Information Retrieval field, where I am using fuzzy techniques to find relatedness between different contexts when there is no pre-existing information available. For certain events like terrorist attacks, no predefined pattern can be found. Predictive data mining approaches did not even turn out to be fruitful in this case, and hence I aim to overcome this drawback and find some technique more suitable for these kinds of cases. Tom Rowlands (ANU / CSIRO)Using Folksonomies to Improve Search AbstractTraditional information retrieval systems work primarily on document content. Web search engines have, however, achieved major advances using data external to the target document to determine both relevance to a particular query (e.g. anchortext) and generally (e.g. indegree, PageRank). Numerous other examples of this external evidence are available, such as which documents users click on and what pages users bookmark. One such example is folksonomy tags. The presentation will give a (very) brief overview of the tags used on a bookmarking system for academics and an initial investigation into their use in information retrieval. Research InterestsCurrently, information retrieval and in particular external evidence and evidence combination. Outside of this area, human computer interaction and operating systems.