HomeBank Tseltal Casillas Corpus

Marisa Casillas
Max-Planck Institut, Nijmegen


Penelope Brown
Max-Planck Institut, Nijmegen


Stephen Levinson
Max-Planck Institut, Nijmegen


Participants: 54
Recordings: 54
Type of Study: naturalistic
Location: Chiapas
Media type: video
DOI: doi:10.21415/T51X12

Browsable transcripts

Download CHAT transcripts, ITS files, ACLEW annotations, and metadata

Media folder

Citation Information

Casillas, M., Brown, P., & Levinson, S. C. (2017). Casillas HomeBank Corpus. doi:10.21415/T51X12

In accordance with TalkBank rules, any use of data from this corpus must be accompanied by at least one of the above references.

Corpus Description

This collection was made over the course of multiple field trips, beginning in 2015 and planned until 2018, to a Tseltal Mayan village (Tenejapa, Chiapas, Mexico; 2015) and a cluster of hamlets on a remote island in Papua New Guinea (Rossel Island, PNG; 2016). Right now, only the Tseltal Mayan data are available on HomeBank, but the rest of the data will be added in time.

In each site, ~55 children between 0;0 and 4;0 were recorded for an average of 9–10 hours of a typical day. In a few cases, children were older than 4;0, but this was only discovered after the recording had started. Recordings were made "at home" for all children. Because we recorded more than one child on most days and because we often recorded more than one child within a single family/cluster of homes, the target children for a single day may sometimes have interacted with each other.

Because children in subsistence farming communities often help with work around the house and field, they often appear in multiple locations throughout the day, and in changing company. Many children in both communities grow up in multi-generational households with 6+ other people present, including many other children. The availability of particular speakers varies with the day's routines.

Equipment and data formats

Children who were already walking on their own wore an elastic vest equipped with an Olympus audio recorder (either WS-832 or WS-835) and a small camera (Narrative Clip 1) that took photos a few times per minute. A fisheye lens attached to the camera gave a nearly 180-degree view of the frontal environment during the recording. Infants who were not yet walking wore their audio recorder in a chest pocket added to a onesie, while their primary caregiver wore an adult-sized elastic vest to which the camera was attached. The vest was equipped with a camera-covering flap that provided participating families with visual privacy when desired (seen as a floral or polka dot pattern that takes over the entire image).

The resulting audio and photos are in stereo MP3 and JPG format, respectively, and have also been combined into an MP4 video file. Note that the audio and video files are named for the date of recording, whereas the photos are named with the timestamp of their capture time.
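For readers who want to line the photos up against the audio themselves, the following is a minimal sketch of how per-photo display durations could be derived from capture times. It is not the video-weaving software used for this corpus. It assumes the capture timestamps have already been parsed into `datetime` objects (the exact filename format is not described here), and it borrows the 30-second display cap mentioned in the Tseltal sub-corpus description. The function name `display_durations` and the treatment of the final photo are hypothetical.

```python
from datetime import datetime, timedelta


def display_durations(capture_times, cap_seconds=30.0):
    """Return how long (in seconds) each photo would be shown.

    Each photo is shown until the next photo was captured, but never
    longer than cap_seconds. The last photo has no successor, so it is
    shown for cap_seconds (an assumption made for this sketch).
    """
    ordered = sorted(capture_times)
    durations = []
    for current, following in zip(ordered, ordered[1:]):
        gap = (following - current).total_seconds()
        durations.append(min(gap, cap_seconds))
    if ordered:
        durations.append(float(cap_seconds))
    return durations


# Illustrative capture times only; these are not taken from the corpus.
start = datetime(2015, 1, 1, 9, 0, 0)
example_times = [start + timedelta(seconds=s) for s in (0, 30, 75, 95)]
example_durations = display_durations(example_times)
```

With the illustrative times above, the 45-second gap between the second and third photos is capped, so a stretch of audio with no matching image would remain after the second photo.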

Participant information

Participating families were asked for the target child's, mother's, father's, and siblings' dates of birth. Unless reported otherwise, the families in these recordings are typically subsistence farmers. We also tracked mother and father education and native language(s). Education is recorded as the last completed school level: none, primary, secondary, preparatory, or university (in ascending order). Note that education here should not be taken as SES in a Western sense but is instead useful as a measure of westernization.
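Because the education levels are ordered, they can be treated as an ordinal variable in analysis. The sketch below shows one way to do that. It assumes the metadata spells the levels exactly as listed above (the actual labels in the metadata files may differ), and the names `EDUCATION_LEVELS` and `education_rank` are hypothetical.

```python
# Levels in ascending order, as listed in the participant information.
EDUCATION_LEVELS = ("none", "primary", "secondary", "preparatory", "university")


def education_rank(level: str) -> int:
    """Map an education label to its position in the ascending scale.

    Raises ValueError if the label is not one of the five known levels.
    """
    normalized = level.strip().lower()
    return EDUCATION_LEVELS.index(normalized)


# Example: compare two caregivers' last completed school levels.
mother_rank = education_rank("primary")
father_rank = education_rank("Preparatory")
mother_has_less_schooling = mother_rank < father_rank
```

The ranks only encode order; the distance between adjacent levels should not be read as equal.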

A word of warning: Users of this data should take dates of birth and reports of typical hearing/vision/development with a pinch of salt. We asked about these attributes before each recording, but formal medical interactions are limited in both sites. In some cases, parents were unsure about their date of birth, in which case the date listed in the metadata is an estimate. These cases are marked with an asterisk.

Tseltal Mayan sub-corpus

In this community, children typically acquire Tseltal monolingually until they enter school at age 5 or 6. The Western lingua franca of the region is Spanish, which is sometimes heard on the recordings. This sub-corpus contains 58 recordings from 55 children between age 00;01.25 and 04;04.15, with partial data missing for a few recordings. All children appeared to be typically developing, with no reports of language delay or hearing or vision problems, though there were sporadic reports of malnutrition (unfortunately, this last factor was not tracked individually for the participating families). All children were acquiring Tseltal as their only/primary language. We tried to make two recordings per day on recording days. The camera was set to take photos every 30 seconds, but the actual interval varies somewhat. The display duration for each image in the videos is therefore limited to 30 seconds.
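Ages such as 00;01.25 and 04;04.15 follow the years;months.days convention used in CHAT transcripts. As a convenience, the following sketch converts such strings to decimal months. The function name `age_to_months` is hypothetical, and the average month length used for the day component is an assumption, so treat the fractional part as approximate.

```python
import re

# Average month length in days; an assumption used only for the day component.
DAYS_PER_MONTH = 30.4375

# Matches 'years;months' with an optional '.days' suffix, e.g. '4;0' or '04;04.15'.
_AGE_PATTERN = re.compile(r"^(\d+);(\d+)(?:\.(\d+))?$")


def age_to_months(age: str) -> float:
    """Convert a 'years;months.days' age string to decimal months."""
    match = _AGE_PATTERN.match(age.strip())
    if match is None:
        raise ValueError(f"Unrecognised age string: {age!r}")
    years, months, days = match.groups()
    total = float(int(years) * 12 + int(months))
    if days is not None:
        total += int(days) / DAYS_PER_MONTH
    return total


youngest = age_to_months("00;01.25")  # a little under 2 months
oldest = age_to_months("04;04.15")    # about 52.5 months
```

Keep in mind that some dates of birth in the metadata are estimates, so ages derived from them carry the same uncertainty.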


We owe enormous thanks to Humbertina (Beti) Gomez Perez and Rhonda (Taakeme) Namono for the role they played in aiding the recruitment of participants and collection of data. We also gratefully acknowledge the Digiteam (Jeroen Geerts and Nick Wood) and ELAN developer Han Sloetjes at the Max Planck Institute for Psycholinguistics, who have played an integral role in processing these media files. The video-weaving software and much other technical expertise came from the support of Shawn Cameron Tice. Last, but not least, this project was funded by an ERC Advanced Grant (269484) to Stephen C. Levinson and a Veni Innovational Scheme grant (275-89-033) to Marisa Casillas.

Usage Restrictions

I would like to be notified before the data are used; a short paragraph about why these data are needed for analysis would be appreciated. All metadata except the following should be kept private: participant gender, age at recording(s), number of older siblings, and language input.

Note that these data are in a special subsection of HomeBank for sensitive datasets. You can request access by filling out this application.

This project also contains some human-generated annotations, created as part of the ACLEW project, which are found within the 0aclew folder. Please see this page for additional requirements related to use of those annotations.