Computational Agents in Multimodal Multiparty Interaction (CAMMI)

This proposal is modelled on existing projects in Europe and the US such as AMI (Augmented Multiparty Interaction, http://www.amiproject.org/) and CHIL (Computers in the Human Interaction Loop, http://chil.server.de/servlet/is/101/), and on the current focus of work in multi-modal language processing in the meeting room context. The overall aim of the project would be to develop computational agents that could monitor and take part in meetings of two or more people. These agents would provide services during the meeting such as recording the topics discussed (automatic minute taking), retrieving information on request, retrieving information based on topics being discussed, interjecting at appropriate times, and recalling previous meetings.

There is a lot of scope for inter-disciplinary research in this context. The hope is that the meeting room context will bring together researchers from different areas to work on a common problem of interest, and also that it will provide raw data for a variety of areas of research. Examples might be:

  • Speech technology: at the core of the meeting room system is the need to recognise speech and track speakers based on their speech patterns. The project needs a strong speech recognition group to underpin the work of others interested in the lexical content of the meeting.
  • Multi-modal input analysis: the meeting room can involve audio, video, position, handwriting and possibly other modes of interaction. All can be analysed and used together to enhance the information available to later stages of analysis.
  • Dialogue analysis: we need to understand the dialogue in order to be able to take part in it. There is potential here to provide a platform for research in human-human and human-machine dialogue where hypotheses can be tested in a working system.
  • Agents: here is a fertile application area for agent research. How does the meeting room agent interact with other agents (e.g. on participants' handheld devices) to facilitate the meeting? How does the agent represent and reason about the rhetorical structure of the dialogue going on in the room?
  • Language Generation: can we generate automatic summaries of meetings? How does an agent respond to questions or present the results of information retrieval?
  • Information Retrieval: can we find relevant parts of meeting archives based on what's going on now in the meeting? How do you search a meeting archive? How do you cope with the inevitable speech recognition errors?
  • Emotion analysis: can we recognise emotional states in participants from audio and visual cues? Is it useful to do so, and if so, how should that affect the responses of the agent in the room?

I've probably missed many areas of investigation that might be relevant to this overall problem area. Cognitive Science and Linguistics would seem to have some potential here, but I can only enumerate things I know about.

If you are interested in this project or something close to it, please get in touch with Steve Cassidy:

Steve.Cassidy@mq.edu.au
02 9850 9581
http://www.ics.mq.edu.au/~cassidy