The Chinese/Japanese-Australian Oral Corpus of ESL


Key people and contact

Satomi Kawaguchi, Bruno Di Biase, Mary Ma @ Western Sydney University (WSU)

Ma Xiaomei, Xu Mei, Gao Ling @ Xi'an Jiaotong University

Technical officer: Shednakie Yi @ WSU


The aim of collecting a spoken learner corpus of English as a second language is to create an instrument for understanding the developmental course of English in Chinese and Japanese learners, including difficulties in oral performance that may be specific to Chinese and/or Japanese speakers or to particular groups of learners.

This research instrument is intended as a contribution towards improving the learning and teaching of English as a second language for tertiary students, including the creation of assessment, diagnostic and self-evaluation tools.

Design features:

  • Cross-sectional
  • Oral data elicitation via tasks


  • Chinese (Mandarin) L1: tertiary students (Xi'an Jiaotong, WSU and other Australian tertiary institutions); WSU can cover Chinese students studying abroad (10-15 students)
  • Mainly cross-sectional, though some longitudinal data would be desirable
  • Japanese L1; others (NS controls, Spanish L1): 50-60? Female/Male

Mandarin, Cantonese, Other

Undergraduate (Year 1, 2, 3, 4)

Graduate students

Study abroad


  • Recordings
  • High-quality digital recording (technical details) (not MP3)
  • Standard recording condition: recording laboratory, as soundproof as possible
  • Digitized at a sampling rate of 96 kHz (24-bit), at least 33 MB per minute
  • Camera (one or two): video focuses on the task, not on faces; the identity of learners must not be revealed.
  • Data: Oral
  • In each interview, the participant is a volunteer and knows that they are being recorded. This may initially affect oral fluency.

The volunteer will sign a consent form, which must be archived. We will not be able to do research formally if we do not have written EVIDENCE of consent.
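As a rough check on the recording specification above, the following sketch computes the size of uncompressed audio at 96 kHz and 24 bits. It assumes plain PCM storage and a two-channel (stereo) recording; the channel count is not stated in these notes, and the function name is only illustrative. Under those assumptions one minute comes to about 33 MB (counted in units of 1024 × 1024 bytes), which is consistent with the "at least 33 MB per minute" figure.

```python
def pcm_bytes_per_minute(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Size in bytes of one minute of uncompressed PCM audio."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


MIB = 1024 * 1024  # bytes in one binary megabyte

# Assumption: stereo recording; mono is shown for comparison.
mono = pcm_bytes_per_minute(96_000, 24, channels=1)
stereo = pcm_bytes_per_minute(96_000, 24, channels=2)

print(f"mono:   {mono / MIB:.1f} MB per minute")
print(f"stereo: {stereo / MIB:.1f} MB per minute")

# Storage needed for one session at the upper end of the planned length.
session_minutes = 25
print(f"25-minute stereo session: {stereo * session_minutes / MIB:.0f} MB")
```

A 25-minute session would therefore need somewhat under 1 GB of audio storage per participant, before any video is added.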

  • The length of each recording is about 20-25 minutes
  • Recordings must not be manipulated beyond basic editing, such as amplifying the volume or removing chunks with excessive noise. Personal information must be treated with extreme caution: delete or obscure any inappropriate or irrelevant personal information that could lead to the identification of the informant.
  • Data: non-oral
  • Lexical size test (40 minutes)
  • Some other test results for reference
  • Recording sessions: Interview and tasks (maybe two recording sessions)
  • Self-paced tasks

            The interview is set up as a semi-spontaneous dialogue between the researcher and the learner. Some tasks require dyads (two learners).

            Learner's introduction. Meet your partner (establish themes for questions)

            A story-retelling task with pictures (preferably a short video)

            Spot the differences

            Role play, canvassing opinions about uncontroversial issues, etc. (this will depend partly on the kinds of structures we wish to elicit, covering various genres, structural groups and speech acts)

Time-constrained tasks

Fishfilm

DMDX (slides)

Oral translation task

  • Transcription conventions
  • They must be uniform. We hope to load the videos and recordings (synchronised) into ELAN (Linguistic Annotator), created by the Max Planck Institute.

  • Word counts and frequency lists
  • Compleat Lexical Tutor
  • AntConc
  • AntWordProfiler
  • Activities: who will be doing what

Contacts: Satomi Kawaguchi, Bruno Di Biase, Xiaomei Ma, Xu Mei etc.

Please, no publicity until we have something formal in place.
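For the word counts and frequency lists planned above, a quick script can serve as a cross-check on what Compleat Lexical Tutor or AntConc report. The sketch below is only an illustration: the tokenisation rule (lower-cased alphabetic words, with an optional apostrophe part) and the sample sentence are invented for the example and are not part of the project's transcription conventions.

```python
import re
from collections import Counter


def frequency_list(transcript: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs, most frequent first."""
    # Assumption: a token is a run of letters, optionally followed by 'xx (e.g. don't).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", transcript.lower())
    return Counter(tokens).most_common()


# Invented learner-like sample, not taken from the corpus.
sample = "He go to school. He goes to the school every day."

freq = frequency_list(sample)
total_tokens = sum(count for _, count in freq)
total_types = len(freq)

print(f"tokens: {total_tokens}, types: {total_types}")
for word, count in freq:
    print(f"{count:3d}  {word}")
```

Real transcripts would first need any annotation codes stripped out according to whatever conventions are adopted, otherwise those codes would be counted as words.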

  • Lexical size task (not oral, multiple choice)
  • Compleat Lexical Tutor
  • Communicative tasks
  • Profiling tasks. These are used to profile the learner's grammar, e.g. meet your partner, interview, spot the differences, storytelling.
  • Self-paced tasks.

            These are used to explore some area(s) of grammar at the learner's own pace. They may involve the researcher with one learner or with two.

  • Time-limited tasks. These are used to assess the need for further explanation or further practice in a specific structural area.
  • Communicative tasks: Interview
    (bio-data, tense, 3rd person singular -s, etc.)
  • Spot the differences
  • A learner dyad (or researcher and learner) sit facing each other so that neither sees the other's picture.

Researcher says that the two pictures are similar but there are some differences.

  • You ask the learner(s) to describe the picture and then to ask each other questions (in alternation, one each) to find 5 differences.
  • Picture 1A is simpler than 1B.

The researcher gives the more complex one to the learner (or to the learner who may be the weaker of the two).

  • Eliciting a specific structure:
    e.g., Causative constructions
  • Active-passive alternation tasks
  • Fishfilm (Time-constrained event description task)