HomeBank Tseltal Casillas Corpus
Marisa Casillas
Max-Planck Institut, Nijmegen
mcasillas@uchicago.edu
Penelope Brown
Max-Planck Institut, Nijmegen
middycasillas@gmail.com
Stephen Levinson
Max-Planck Institut, Nijmegen
Stephen.Levinson@mpi.nl
Participants: | 54 |
Recordings: | 54 |
Type of Study: | naturalistic |
Location: | Chiapas |
Media type: | video |
DOI: | doi:10.21415/T51X12 |
In accordance with TalkBank rules, any use of data from this corpus
must be accompanied by at least one of the above references.
Corpus Description
This collection was made over the course of multiple field trips,
beginning in 2015 and planned to continue until 2018, to a Tseltal Mayan
village (Tenejapa, Chiapas, Mexico; 2015) and a cluster of hamlets on a
remote island in Papua New Guinea (Rossel Island, PNG; 2016). At present,
only the Tseltal Mayan data are available on HomeBank; the remaining
data will be added over time.
At each site, approximately 55 children between 0;0 and 4;0 were
recorded for an average of 9–10 hours over a typical day. In a few cases,
children were older than 4;0, but this was only discovered after
recording had started. All recordings were made "at home". Because we
recorded more than one child on most days, often within a single family
or cluster of homes, the target children recorded on a given day
sometimes interacted with each other.
Because children in subsistence farming communities often help with
work around the house and in the fields, they often appear in multiple
locations throughout the day, in changing company. Many children in
both communities grow up in multigenerational households with six or
more other people present, including many other children. Which
speakers are available varies with the day's routines.
Equipment and data formats
Children who were already walking on their own wore an elastic vest
equipped with an Olympus audio recorder (either WS-832 or WS-835) and a
small camera (Narrative Clip 1) that took photos a few times per minute.
A fisheye lens attached to the camera gave a nearly 180-degree view of
the frontal environment during the recording. Infants who were not yet
walking wore their audio recorder in a chest pocket added to a onesie,
while their primary caregiver wore an adult-sized elastic vest to which
the camera was attached. The vest was equipped with a flap that could
cover the camera, giving participating families visual privacy when
desired (visible in the photos as a floral or polka-dot pattern filling
the entire image).
The resulting audio and photos are in stereo MP3 and JPG format,
respectively, and have also been combined into an MP4 video file. Note
that the audio and video files are named for the date of recording,
whereas the photos are named with the timestamp of their capture time.
Participant information
Participating families were asked for the dates of birth of the target
child, mother, father, and siblings. Unless reported otherwise, the
families in these recordings are typically subsistence farmers. We also
tracked the mother's and father's education and native language(s).
Education is recorded as the last completed school level: none, primary,
secondary, preparatory, or university (in ascending order). Note that
education here should not be taken as a measure of SES in the Western
sense; it is instead useful as a measure of westernization.
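Because the education levels form an ordered scale, analyses can treat them as ordinal rather than purely categorical. The sketch below assumes the levels appear in the metadata spelled as listed above; the names `EDUCATION_LEVELS`, `education_rank`, and `higher_parental_education` are hypothetical helpers for illustration, not part of the corpus.

```python
# Ordered from lowest to highest, following the scale described above.
EDUCATION_LEVELS = ("none", "primary", "secondary", "preparatory", "university")
EDUCATION_RANK = {level: rank for rank, level in enumerate(EDUCATION_LEVELS)}


def education_rank(level: str) -> int:
    """Map an education label to its ordinal rank (0 = none, 4 = university).

    Assumes labels are spelled as in the corpus description (case-insensitive);
    raises KeyError for any other label.
    """
    return EDUCATION_RANK[level.strip().lower()]


def higher_parental_education(mother: str, father: str) -> str:
    """Return whichever parent's education label ranks higher on the scale."""
    return max(mother, father, key=education_rank)


print(education_rank("Preparatory"))                      # 3
print(higher_parental_education("primary", "secondary"))  # secondary
```

Keeping the ordering in a single tuple means the scale is defined in exactly one place, so any comparison between parents or families follows the same ascending order.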
A word of warning: users of these data should take dates of birth and
reports of typical hearing, vision, and development with a pinch of
salt. We asked about these attributes before each recording, but formal
medical interactions are limited at both sites. In some cases, parents
were unsure of a date of birth; in these cases, the date listed in the
metadata is an estimate and is marked with an asterisk.
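Ages in this description are written in the CHILDES-style years;months.days format (for example, 04;04.15). The sketch below shows one way to derive such an age from a date of birth and a recording date while keeping track of the asterisk that marks estimated dates. It assumes, hypothetically, that dates are stored as ISO strings such as `2011-02-10*`; check the actual metadata files before relying on it. The counting rule used here (whole calendar months first, clamped to the end of shorter months, then the remaining days) is one reasonable convention and may differ slightly from how the ages in the corpus were computed.

```python
import calendar
from datetime import date
from typing import Tuple


def parse_dob(raw: str) -> Tuple[date, bool]:
    """Parse a date of birth, returning (date, is_estimate).

    Hypothetical storage format: ISO date (YYYY-MM-DD) with an optional
    trailing asterisk marking an estimated date.
    """
    raw = raw.strip()
    is_estimate = raw.endswith("*")
    return date.fromisoformat(raw.rstrip("*")), is_estimate


def add_months(d: date, n: int) -> date:
    """Shift d forward by n calendar months, clamping to the last day of short months."""
    year_offset, month_index = divmod(d.month - 1 + n, 12)
    year = d.year + year_offset
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def chat_age(dob: date, recorded: date) -> str:
    """Format the age at recording as YY;MM.DD (whole years, months, remaining days)."""
    if recorded < dob:
        raise ValueError("recording date precedes date of birth")
    months = (recorded.year - dob.year) * 12 + (recorded.month - dob.month)
    if add_months(dob, months) > recorded:
        months -= 1  # the last calendar month is not yet complete
    days = (recorded - add_months(dob, months)).days
    years, months = divmod(months, 12)
    return f"{years:02d};{months:02d}.{days:02d}"


dob, estimated = parse_dob("2011-02-10*")
age = chat_age(dob, date(2015, 6, 25))
print(age + ("*" if estimated else ""))  # 04;04.15*
```

Carrying the estimate flag through to the computed age keeps the uncertainty visible in downstream analyses rather than silently treating estimated ages as exact.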
Tseltal Mayan sub-corpus
In this community, children typically acquire Tseltal monolingually
until they enter school at age 5 or 6. The Western lingua franca of the
region is Spanish, which can sometimes be heard on the recordings. This
sub-corpus contains 58 recordings from 55 children aged between 00;01.25
and 04;04.15, with partial data missing for a few recordings. All
children appeared to be typically developing, with no reports of
language delay or hearing or vision problems, though there were sporadic
reports of malnutrition (unfortunately, this last factor was not tracked
for individual families). All children were acquiring Tseltal as their
only or primary language. We tried to make two recordings per day on
recording days. The camera was set to take photos every 30 seconds, but
the actual interval varies somewhat. The display duration for each image
in the videos is therefore limited to 30 seconds.
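The relationship between the irregular photo interval and the 30-second display limit can be made concrete. The sketch below assumes each photo is shown from its capture time until the next photo (or the end of the audio), but never for more than 30 seconds; this is an interpretation of the description above, not the actual video-weaving software, and what the video shows during any remaining gap is not specified here. The function name and the time representation (offsets in seconds from the start of the recording) are hypothetical.

```python
from typing import List, Sequence


def display_durations(
    capture_times: Sequence[float], audio_end: float, cap: float = 30.0
) -> List[float]:
    """Return how long each photo is displayed, in seconds.

    Each photo is shown until the next capture (or until audio_end for the
    last photo), capped at `cap` seconds.
    """
    if list(capture_times) != sorted(capture_times):
        raise ValueError("capture times must be in ascending order")
    durations = []
    for i, start in enumerate(capture_times):
        end = capture_times[i + 1] if i + 1 < len(capture_times) else audio_end
        if end < start:
            raise ValueError("audio ends before the last photo was taken")
        durations.append(min(end - start, cap))
    return durations


# Photos nominally every 30 s, but the real intervals drift.
print(display_durations([0.0, 31.5, 75.0, 102.0], audio_end=110.0))
# [30.0, 30.0, 27.0, 8.0]
```

Under this reading, a late photo never stretches to fill a long gap: the 43.5-second interval after the second photo still yields only 30 seconds of display.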
Acknowledgements
We owe enormous thanks to Humbertina (Beti) Gomez Perez and Rhonda
(Taakeme) Namono for the role they played in recruiting participants and
collecting data. We also gratefully acknowledge the Digiteam (Jeroen
Geerts and Nick Wood) and ELAN developer Han Sloetjes at the Max Planck
Institute for Psycholinguistics, who played an integral role in
processing these media files. Shawn Cameron Tice provided the
video-weaving software and much other technical expertise. Last,
but not least, this project was funded by an ERC Advanced Grant (269484)
to Stephen C. Levinson and a Veni Innovational Scheme grant (275-89-033)
to Marisa Casillas.
Usage Restrictions
I would like to be notified before the data are used; a short paragraph
about why these data are needed for analysis would be appreciated.
All metadata except the following should be kept private: participant
gender, age at recording(s), number of older siblings, and language input.
Note that these data are in a special subsection of HomeBank for
sensitive datasets. You can request access by filling out this application.
This project also contains some human-generated annotations, created
as part of the ACLEW project, which are found within the 0aclew folder.
Please see this page for additional
requirements related to use of those annotations.