Resource Description

This dataset is a collection of ELAN (.eaf) annotation files. The annotations were created using the ACLEW Annotation System as part of the Analyzing Child Language Experiences Around the World (ACLEW) project, whose homepage is here. That project included selected recordings from several HomeBank corpora, the annotations for which are included in this ACLEW HomeBank Annotations dataset: Casillas, McDivitt, Warlaumont, and Winnipeg.

Due to data-sharing or other restrictions on some of the ACLEW corpora, this dataset is a subset of the larger ACLEW dataset that includes transcriptions.
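Because .eaf files are XML, they can be inspected with a standard XML parser before any conversion. The following is a minimal sketch, not part of the ACLEW tooling: the embedded sample document, the tier names (FA1, xds@FA1), the placeholder value X, and the helper read_annotations are all hypothetical and only illustrate the general ELAN layout of time slots, tiers, and time-aligned or reference annotations.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for an .eaf file. Tier names and annotation
# values are hypothetical placeholders, not taken from the ACLEW data.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1000"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="FA1">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello there</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="xds@FA1" PARENT_REF="FA1">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>X</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_annotations(eaf_xml):
    """Return one dict per annotation found in an EAF XML string."""
    root = ET.fromstring(eaf_xml)

    # Map each time-slot id to its offset in milliseconds.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }

    rows = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        # Time-aligned annotations carry their own start and end slots.
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append({
                "tier": tier_id,
                "id": ann.get("ANNOTATION_ID"),
                "start_ms": slots.get(ann.get("TIME_SLOT_REF1")),
                "end_ms": slots.get(ann.get("TIME_SLOT_REF2")),
                "ref": None,
                "value": ann.findtext("ANNOTATION_VALUE", default=""),
            })
        # Reference annotations point at a parent annotation instead.
        for ann in tier.iter("REF_ANNOTATION"):
            rows.append({
                "tier": tier_id,
                "id": ann.get("ANNOTATION_ID"),
                "start_ms": None,
                "end_ms": None,
                "ref": ann.get("ANNOTATION_REF"),
                "value": ann.findtext("ANNOTATION_VALUE", default=""),
            })
    return rows


for row in read_annotations(SAMPLE_EAF):
    print(row)
```

For a real file, replacing ET.fromstring(eaf_xml) with ET.parse(path).getroot() would be the natural substitution; the actual tier names and codes should be checked against the ACLEW documentation rather than assumed from this sketch.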

CHAT version

ACLEW transcription in ELAN uses a method that allows for easy conversion of the .eaf files to .cha format using the ELAN2CHAT program in CLAN. Once in CLAN, the ACLEW addressee codes are on a %xds line. The single-letter codes have these meanings:


In addition to the VanDam et al. (2017) HomeBank citation, products that have used these data should cite the Soderstrom et al. (2021) Collabra paper (see below) as well as at least one citation from each corpus used. Where space is unavoidably and extremely limited (e.g. brief conference proceedings) just the Collabra paper may be cited, although this is strongly discouraged as it does not include (and therefore denies credit to) important contributors to the individual corpora that make up the ACLEW MetaCorpus.

Primary Project Citation

Soderstrom, M., Casillas, M., Bergelson, E., Rosemberg, C. R., Alam, F., Warlaumont, A. S., & Bunce, J. P. (2021). Developing A Cross-Cultural Annotation System and MetaCorpus for Studying Infants’ Real World Language Experience. Collabra: Psychology, 7(1), 23445.

Casillas Corpus publications

Casillas, M., Brown, P., & Levinson, S. C. (2017). Casillas HomeBank Corpus. doi:10.21415/T51X12

McDivitt Corpus publications

McDivitt, K., & Soderstrom, M. (2016). McDivitt HomeBank Corpus. doi:10.21415/T5KK6G

Winnipeg Corpus publications

Soderstrom, M., Grauer, E., Dufault, B., & McDivitt, K. (2018). Influences of number of adults and adult:child ratios on the quantity of adult language input across childcare settings. First Language, 38(6), 563-581. https://doi.org/10.1177/0142723718785013

Soderstrom, M., & Wittebolle, K. (2013). When Do Caregivers Talk? The Influences of Activity and Time of Day on Caregiver Speech and Child Vocalizations in Two Childcare Environments. PLoS ONE, 8(11), e80646.

Soderstrom, M. (2016). Winnipeg HomeBank Corpus. doi:10.21415/T58P6Q

Warlaumont Corpus publications

Ritwika, V. P. S., Pretzer, G. M., Mendoza, S., Shedd, C., Kello, C. T., Gopinathan, A., & Warlaumont, A. S. (2020). Exploratory dynamics of vocal foraging during infant-caregiver communication. Scientific Reports, 10, 10469. doi:10.1038/s41598-020-66778-0

Warlaumont, A. S., Pretzer, G. M., Mendoza, S., & Walle, E. A. (2016). Warlaumont HomeBank Corpus. doi:10.21415/T54S3C

Usage Restrictions

Data restrictions for the individual corpora and for HomeBank in general apply to the files in this special collection. In brief: 1) the authors request to be informed about data usage prior to submission for publication (email M_Soderstrom@umanitoba.ca); 2) the additional data access restrictions for the Casillas and McDivitt/Soderstrom files remain in place; 3) publications must take particular care not to reveal identifying information about particular participants; 4) raw audio and .eaf files must not be publicly redistributed or presented unless this is explicitly allowed for the participant in question (check individual corpus restrictions or contact the corpus holder).


The ACLEW transcriptions are located in an ACLEW folder within the transcript folder for each corpus. Here are links that will take you directly to each:

Metadata and other materials that span all corpora are located here.