Validation Corpus


Mark VanDam
Department of Speech and Hearing Sciences
Washington State University

https://labs.wsu.edu/vandam/

Participants: 52
Recordings: 9360
Type of Study: naturalistic
Location: USA
Media type: audio
DOI:

Media folder

Citation information

VanDam, M., & Silbert, N. H. (2016). Fidelity of automatic speech processing for adult and child talker classifications. PLOS ONE, 11(8): e0160588. doi:10.1371/journal.pone.0160588

VanDam, M., De Palma, P., & Silbert, N. (May, 2016). Fidelity of Automatically Coded Family Speech of Mothers, Fathers, and 30 month-old Children with and without Hearing Loss. Talk presented at the paper symposium Studying Language Development Through Human and Automated Annotation of Infants' Natural Auditory Environments at the 2016 International Conference on Infant Studies (ICIS), New Orleans, LA.

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Project Description

These data are raw WAV audio files excerpted from daylong audio recordings made with the LENA recorder. Each file is a segment of a daylong recording that the LENA automated system identified as 'adult female' ("FAN"), 'adult male' ("MAN"), or 'target child' ("CHN"). In this dataset, all recordings are from families with a toddler with documented mild to severe hearing loss, constituting the hard-of-hearing (HH) group, or from control families with a typically developing (TD) child.

There are 2340 total audio stimulus files in each of four groups/directories, totaling 368.0 minutes (6.133 hours) of audio. Each group contains 30 segments (collected from equal intervals throughout the day) from each of the three talker labels for 26 families (30 x 3 x 26 = 2340). There is one set of files from the HH group (age of target child: M=30.8 mos, range=25-38 mos, 93.6 minutes), and three sets for the same group of TD families varying by the age of the target child ("younger" M=28.0 mos, range=23-35 mos, 90.4 minutes; "middle" M=29.4 mos, range=25-35 mos, 90.0 minutes; "older" M=40.0 mos, range=34-46 mos, 93.9 minutes).
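The file counts above can be checked with a few lines of Python. This is only an arithmetic sketch: the group names in GROUPS are illustrative labels, not necessarily the directory names used in the corpus.

```python
# Counts taken from the description above.
SEGMENTS_PER_TALKER = 30                 # segments sampled per talker label per family
TALKER_LABELS = ("FAN", "MAN", "CHN")    # LENA labels: adult female, adult male, target child
FAMILIES_PER_GROUP = 26

# Illustrative group labels (assumption: actual directory names may differ).
GROUPS = ("HH", "TD younger", "TD middle", "TD older")

files_per_group = SEGMENTS_PER_TALKER * len(TALKER_LABELS) * FAMILIES_PER_GROUP
total_files = files_per_group * len(GROUPS)

print(files_per_group)  # 2340
print(total_files)      # 9360
```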

These data were used as stimuli in perceptual experiments to assess the quality of automatic speech encoding using the LENA system.

For each talker label, 30 segments were collected at equal intervals throughout the daylong recording. For example, if the software marked 300 MAN segments, every 10th segment would be collected to obtain 30 segments. Segments are numbered sequentially in the order in which they were collected.
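The equal-interval selection described above might be sketched as follows. This is a hypothetical illustration, not the code used to build the corpus: the function name sample_equal_intervals is invented here, and two details are assumptions, namely that the step is the floor of n/30 when the count is not a multiple of 30, and that counting starts so that the 10th, 20th, ... segments are taken.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_equal_intervals(segments: Sequence[T], k: int = 30) -> List[T]:
    """Pick k items at a fixed step from an ordered sequence of segments.

    Assumption: step = len(segments) // k, and the last item of each
    step-sized block is taken (the 10th, 20th, ... when step is 10).
    """
    n = len(segments)
    if n < k:
        raise ValueError(f"need at least {k} segments, got {n}")
    step = n // k
    return [segments[i * step + step - 1] for i in range(k)]


# Example from the text: 300 MAN segments, numbered 1..300.
man_segments = list(range(1, 301))
picked = sample_equal_intervals(man_segments)
print(len(picked))   # 30
print(picked[:3])    # [10, 20, 30]
```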

There are no duplicate file names across all directories; that is, there are 2340 x 4 = 9360 unique file names. Naming convention for all files: talker label ('c'=child, 'm'=adult male, 'f'=adult female; alpha=1), ordinal value of recording from full day (range=01-30; num=2), subject code (alpha=2), age of target child in months at time of recording (num=2), hearing status ('hh', 'td'; alpha=2). For example, the file "c02MT26td.wav" is machine-labeled as "child", is the second of the 30 utterances collected for that day, has subject identifier 'MT', and comes from a typically developing target child who was 26 months of age at the time of the recording.
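As an illustration of the naming convention, a file name can be split into its fields with a regular expression. This is a minimal sketch: SegmentName and parse_segment_name are hypothetical names introduced here, and the pattern assumes every file follows the stated field widths exactly and ends in ".wav".

```python
import re
from typing import NamedTuple

# Single-letter talker codes from the naming convention.
TALKER_LABELS = {"c": "child", "m": "adult male", "f": "adult female"}

# talker (1 letter), ordinal (2 digits), subject code (2 letters),
# age in months (2 digits), hearing status ('hh' or 'td').
_NAME_PATTERN = re.compile(
    r"^(?P<talker>[cmf])(?P<ordinal>\d{2})(?P<subject>[A-Za-z]{2})"
    r"(?P<age>\d{2})(?P<status>hh|td)\.wav$"
)


class SegmentName(NamedTuple):
    talker: str          # machine-assigned talker label
    ordinal: int         # position among the 30 segments for that day
    subject: str         # two-letter subject code
    age_months: int      # age of target child at recording
    hearing_status: str  # 'hh' or 'td'


def parse_segment_name(filename: str) -> SegmentName:
    """Split a corpus file name into its fields, or raise ValueError."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return SegmentName(
        talker=TALKER_LABELS[match["talker"]],
        ordinal=int(match["ordinal"]),
        subject=match["subject"],
        age_months=int(match["age"]),
        hearing_status=match["status"],
    )


info = parse_segment_name("c02MT26td.wav")
print(info)
```

For the example file, the parsed fields are talker "child", ordinal 2, subject "MT", age 26 months, and hearing status "td".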

Supported by the following: NSF/SBE-RIDIR: 1539133, 1539129, 1539010; NIH/NIDCD: R01DC009569, DC009560-01S1; WSU Seed Grant: 124172-001; Washington Research Foundation

Usage Restrictions: Email Mark VanDam to discuss how you plan to use these data and for citation requirements. Please advise Mark VanDam (mark.vandam@wsu.edu) of publications/presentations to append to the list of referring scientific work below.