HomeBank English Bergelson Seedlings Corpus

Elika Bergelson
Psychology and Neuroscience
Duke University


Participants: 44
Recordings: 87
Type of Study: naturalistic
Location: USA
Media type: audio
DOI: doi:10.21415/T5PK6D

Browsable transcripts

Download CHAT transcripts, ITS files, and metadata

Media folder

Citation Information

Bergelson E, Amatuni A, Dailey S, Koorathota S, Tor S. (2018) Day by day, hour by hour: Naturalistic language input to infants. Developmental Science, e12715. doi: 10.1111/desc.12715

Bergelson E, Aslin RN. (2017) Nature and origins of the lexicon in 6-mo-olds. Proceedings of the National Academy of Sciences, 114, 201712966. doi: 10.1073/pnas.1712966114

Bergelson E. (2017) Bergelson Seedlings HomeBank Corpus. doi:10.21415/T5PK6D

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references. Where enough space is available, please cite both a paper and the corpus, to direct audiences both to the publication and to where they can locate the dataset. Where space is limited, please prioritize citation of one of the journal articles.

Corpus Description

The Study of Environmental Effects on Developing Linguistic Skills (SEEDLingS) is a project that investigates how infants' early linguistic and environmental input shapes their learning. We focus on understanding how babies learn words between 6 and 18 months of age from the visual, social, and linguistic world around them. By looking at the complex environment that babies are exposed to, from their perspective, we can attempt to decode how the developing mind interprets and organizes the objects and words it encounters. SEEDLingS is unique in that it combines well-controlled lab studies that assess which words infants know with in-home audio and video recordings of which words infants hear and what they see when they hear them. The goal of this study was to assess infants' language growth over this period, particularly in the word-learning domain.

Recordings were made in the home every month for a set of 44 infants from 6 to 18 months of age (and some pilots). The corpus includes day-long home audio recordings, hour-long home video recordings, and in-lab eyetracking data. The audio recordings are shared here. Presently, only files from the 6- and 7-month visits are available; data from the 8- to 17-month home visits will be added at a future time. Please email Elika with any questions about access to other months' recordings and metadata. Video and audio recordings are also available to Databrary members here and here. Each shared audio file is derived from a single LENA audio recording, converted and annotated in CHA format. Only sections of the files that human listeners have verified to contain no extremely personal content (or from which such information has been scrubbed) are shared here. Our personal information guidelines are described here.
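As a rough illustration of working with the CHA (CHAT) transcripts mentioned above, the sketch below counts main-tier utterances per speaker. It relies only on the general CHAT convention that header lines begin with `@`, main tiers with `*` followed by a speaker code and a colon, and dependent tiers with `%`. The function name `count_utterances`, the sample text, and the speaker codes in it are hypothetical and are not taken from this corpus.

```python
from collections import Counter


def count_utterances(cha_text: str) -> Counter:
    """Count main-tier utterances per speaker code in CHAT-formatted text."""
    counts = Counter()
    for line in cha_text.splitlines():
        # Main tiers start with '*', then a speaker code, then a colon.
        # Header lines ('@') and dependent tiers ('%') are skipped.
        if line.startswith("*") and ":" in line:
            speaker = line[1:line.index(":")]
            counts[speaker] += 1
    return counts


# Hypothetical snippet for illustration only; not real corpus content.
sample = "\n".join([
    "@Begin",
    "@Participants:\tCHI Target_Child, MOT Mother",
    "*MOT:\tlook at the ball .",
    "%com:\texample dependent tier",
    "*CHI:\txxx .",
    "*MOT:\tyes the ball .",
    "@End",
])

print(count_utterances(sample))
```

For a downloaded transcript, the same function could be applied to the contents of a `.cha` file read as text. The speaker codes actually used in the corpus files should be checked against the transcripts themselves.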

Infants in this sample are from the Rochester, New York area. The sample is generally middle class, with a range of incomes and an above-average maternal education level. The sample is predominantly Caucasian. All infants heard English for the majority of the time at home (>75%) and had no known vision or hearing issues at birth. These data were collected at the University of Rochester; analysis has continued at Duke University since summer 2016.

Further details of the project are available on our website, wiki, and GitHub repo. Please contact Elika Bergelson directly (elika.bergelson@duke.edu) to discuss further aspects of the sample design, annotation, and analysis.


This corpus was collected by Elika Bergelson with the unflagging help of Sharath Koorathota, Shaelise Tor, Shannon Dailey, Josh Schneider, and Andrei Amatuni. We would also like to acknowledge help and guidance from Richard Aslin and Holly Palmeri. Finally, the wonderful RAs who helped annotate the first pass through these data in Rochester are Bella Clemente, Tessa Eagle, Jayde Homer, Valerie Langlois, Dustyn Levenson, Ashwini Manjunatha, Sarah Markowitz, Leah Nason, Adina Poras, Alexandra Rickwood, Haley Weaver, and Sophie Werk.

Usage Restrictions

Please contact Elika Bergelson (elika.bergelson@duke.edu) before submitting publications based on these data, if not sooner.