Kaya de Barbaro Department of Psychology University of Texas at Austin kaya@austin.utexas.edu website |
Priyanka Khante Department of Electrical and Computer Engineering University of Texas at Austin priyanka.khante@utexas.edu |
Participants: | 14 |
Recordings: | 5-second recordings |
Type of Study: | naturalistic |
Location: | Austin TX, USA |
Media type: | audio |
DOI: | doi:10.21415/GH60-SK83 |
Khante, P., Thomaz, E., & de Barbaro, K. Auditory Chaos Classification in Real-world Environments. (Revise & resubmit). Frontiers in Digital Health: Special Issue on Artificial Intelligence for Child Health and Wellbeing (2023).
In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.
This dataset was created to develop a model that detects and classifies four levels of auditory household chaos (0-3) in naturalistic environments. The dataset consists of audio segments from LENA recordings of 14 children: 13 infants aged 1.1 to 6.4 months and one toddler aged 33 months. Parents were instructed to place the LENA recorder in a vest worn by the child and to record up to 72 hours of audio in their home, including two weeknights and a weekend.
A team of trained research assistants annotated 5-second audio segments sampled from the daylong audio recordings, following best practices in the behavioral sciences (inter-rater reliability kappa: 0.76). We define the auditory chaos levels based on descriptions of chaotic environments in the developmental psychology literature, specifically the gold-standard CHAOS questionnaire measures that are most commonly used to assess household chaos. Periods of silence are classified as no auditory chaos (Chaos 0), and sounds that are low in volume or come from only a single source are classified as low auditory chaos (Chaos 1). Time periods with sounds that are high in volume, potentially jarring, or cacophonous are classified as high auditory chaos (Chaos 3). The complete auditory chaos annotation scheme can be found in the Supplementary Materials of the associated publication (Khante et al., revise & resubmit; see Citations section). The shared dataset totals 39.4 h of labelled data (28,364 five-second segments): 7.1 h of no chaos (0), 10.3 h of low chaos (1), 12.7 h of medium chaos (2), and 9.3 h of high chaos (3).
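As a rough illustration of the class balance described above, the following Python sketch converts the reported hours per chaos level into approximate segment counts and class proportions. The names `HOURS_PER_LEVEL`, `approx_segments`, and `class_shares` are hypothetical and are not part of any released tooling. Because the published hours are rounded to one decimal place, the computed counts are only approximate and need not match the reported total of 28,364 segments exactly.

```python
# Illustrative sketch only: values are the rounded hours reported in this description.
SEGMENT_SECONDS = 5  # each annotated segment is 5 seconds long

# Hours of labelled audio per chaos level (0 = none, 1 = low, 2 = medium, 3 = high)
HOURS_PER_LEVEL = {0: 7.1, 1: 10.3, 2: 12.7, 3: 9.3}


def approx_segments(hours, segment_seconds=SEGMENT_SECONDS):
    """Approximate number of fixed-length segments contained in `hours` of audio."""
    return round(hours * 3600 / segment_seconds)


def class_shares(hours_per_level):
    """Fraction of the total labelled duration that falls in each chaos level."""
    total = sum(hours_per_level.values())
    return {level: hours / total for level, hours in hours_per_level.items()}


if __name__ == "__main__":
    shares = class_shares(HOURS_PER_LEVEL)
    for level, hours in HOURS_PER_LEVEL.items():
        print(f"Chaos {level}: {hours:5.1f} h, "
              f"~{approx_segments(hours)} segments, "
              f"{shares[level]:.1%} of labelled audio")
    total_hours = sum(HOURS_PER_LEVEL.values())
    print(f"Total: {total_hours:.1f} h, ~{approx_segments(total_hours)} segments")
```

The proportions suggest that the four levels are moderately imbalanced, with medium chaos the most common label, which may be worth accounting for when splitting the data or choosing evaluation metrics.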
All children in this dataset are from Austin, Texas. The sample spans a range of annual family incomes (n=1 under $25,000; n=2 $25,000-$49,999; n=4 $50,000-$74,999; n=3 $100,000-$124,999; n=4 $125,000 and above) and has above-average maternal education levels (n=1 high school or less, n=6 some college, n=3 college, n=4 graduate degree or higher). Most caregiver participants are married (n=12); the remaining 2 live with a partner without being married. Child race is predominantly White (n=8), followed by Hispanic/Latinx (n=4) and multiracial (n=2). All children heard mostly English at home and had no known vision or hearing issues at birth. The data were collected in participants' homes by the University of Texas at Austin, where they continue to be analyzed.
Further details of the project are available on our lab website. Please contact Kaya de Barbaro directly to discuss further aspects of the sample design, annotation, and analysis (kaya@austin.utexas.edu).