Multisensory Communication

The Multisensory Communication program at The MARCS Institute investigates how information from our different senses contributes to communication.

We study how information from the senses is combined, represented and acted upon. For example, we investigate how seeing a talker helps make speech perception more robust.

We also study how speech and gestures are related; how emotion is conveyed by face, voice and gesture; and how information processing changes across the lifespan. We investigate processing at the signal, brain and behavioural levels.

More information on specific research themes can be found below.

Person recognition

In this project area, we examine the signals that contribute to a person's identity using motion capture techniques along with measures of the person's voice. The aim is to understand which signals best specify identity and how different signals combine to influence the perception of identity.


  1. Dynamic Identity Signatures: This project examines the use of multisensory data for Biometric Identification (BI), whether for recognition or verification. Collecting motion data for speech and gesture, and measuring the correlation between the speech and motion signals, will establish the viability of multisensory BI in terms of performance (identification accuracy, speed, robustness, resource requirements), acceptability (the extent to which the measurement of biometric identifiers is acceptable) and circumvention (whether the system can be fooled by fraudulent methods).
  2. Smart computing application.

Auditory-Visual speech in multi-person environments

When the interlocutor's face (visual speech) can be seen, speech perception occurs based on visual as well as auditory information. Auditory-visual speech processing has been shown to lead to better perception of speech.

In this project, we examine the extent to which talkers' faces can influence speech processing, both in central vision and in the visual periphery.

This project is part of a broader program that examines speech processing in situations that go beyond those typically used in lab-based experiments.

Visual prosody and gestures

Prosody, the variations in speech that support and modify the meaning of utterances, carries both linguistic and non-linguistic information.

In this project, we investigate the interaction of these two types of information, particularly how visible gestures shape interpretation.

Robust recognition

We aim to determine how visual speech perception develops and how multisensory processing may promote language acquisition. A further aim is to determine how people adjust speech to cope with changing transmission environments.

Multisensory signals provide both redundant and complementary information. A feature of this project is the exploration of how the auditory and visual speech signals change and reinforce each other according to the situation of the talker, the listener and the communicative context, including in tonal languages.

The research program is structured around:

  • quantifying properties of the AV signal;
  • determining neural responses;
  • measuring behavioural responses and developing applications.


  1. Visual speech facilitation in the Deaf
  2. Multisensory interactions in assisted hearing
  3. Visual cues to tone perception
  4. Scaffolding the development of communication in infants
  5. Second Language Learning: observing visible speech movements aids in speech perception and the learning of foreign sounds.

Exaggeration techniques developed here may promote more accurate pronunciation and recollection of foreign speech and assist in the treatment of speech and language learning disorders (aphasia and stuttering).

Multisensory prediction

In this project, we investigate how information derived from different sources, each with a characteristic timing profile, may enable common events to be signalled. We aim to incorporate the findings into a formal model of how cognition uses stored knowledge for perception.

This is the most theory-rich of the projects. The aim is to provide a strong theoretical base for modelling multisensory interactions.

We aim to uncover, possibly through Bayesian modelling, the role of prediction and expectation in auditory-visual processing and how these processes interact with stored knowledge.

Traditional research has focused on bottom-up, feed-forward, inductive mechanisms. Analysis by synthesis, as a heuristic model, instead emphasizes a balance of bottom-up steps and knowledge-driven, top-down, predictive steps.