HomeBank Special Projects

HomeBank currently hosts five datasets from projects based on HomeBank data. The first four require HomeBank membership to access, while the fifth is publicly available.

ACLEW Annotations

This dataset is a collection of ELAN (.eaf) annotation files. The annotations were created with the ACLEW Annotation System as part of the Analyzing Child Language Experiences Around the World (ACLEW) project. That project included selected recordings from four HomeBank corpora (Casillas, McDivitt, Warlaumont, and Winnipeg), and the annotations for those recordings make up this ACLEW HomeBank Annotations dataset.
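ELAN .eaf files are XML: a TIME_ORDER block maps time-slot IDs to millisecond values, and each TIER holds annotations that point at those slots. As a minimal sketch (not an official HomeBank or ACLEW tool), the hypothetical helper `read_eaf` below uses only Python's standard library to flatten one file into (tier, start, end, value) rows. It assumes you only need time-aligned annotations: reference annotations on dependent tiers, and alignable annotations whose slots carry no time value, are skipped. It makes no assumptions about ACLEW tier names.

```python
import xml.etree.ElementTree as ET


def read_eaf(source):
    """Hypothetical helper: return (tier_id, start_ms, end_ms, value) tuples.

    `source` may be a file path or a binary file object.
    Only ALIGNABLE_ANNOTATION elements with fully resolved times are returned;
    REF_ANNOTATION elements are ignored in this sketch.
    """
    root = ET.parse(source).getroot()

    # Map each time-slot ID to its value in milliseconds.
    # Unaligned slots have no TIME_VALUE attribute and are left out.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }

    rows = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations without resolvable times
            value = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier_id, start, end, value))
    return rows
```

For example, `read_eaf("some_file.eaf")` (a placeholder path) returns a list that can be loaded into a table for per-tier counts or durations.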

The Challenge dataset

These are the data used for the Addressee Sub-Challenge of the 2017 ComParE Challenge. Although the official challenge is over, you are welcome to try improving on the baseline model's performance. If you manage to do so, we'd like to hear from you! Please see the following paper for details:

Schuller, B., Steidl, S., Batliner, A., Bergelson, E., Krajewski, J., Janott, C., Amatuni, A., Casillas, M., Seidl, A., Soderstrom, M., Warlaumont, A. S., Hidalgo, G., Schnieder, S., Heiser, C., Hohenhorst, W., Herzog, M., Schmitt, M., Qian, K., Zhang, Y., Trigeorgis, G., Tzirakis, P., & Zafeiriou, S. (2017). The INTERSPEECH 2017 computational paralinguistics challenge: Addressee, cold, & snoring. INTERSPEECH 2017. doi: 10.21437/Interspeech.2017-43

The IDSLabel dataset

This dataset comes from the same project that generated the Challenge dataset, and it additionally includes labels for adult speakers' gender. For more details, please see these papers:

Casillas, M., Amatuni, A., Seidl, A., Soderstrom, M., Warlaumont, A., & Bergelson, E. (2017). What do babies hear? Analyses of child- and adult-directed speech. INTERSPEECH 2017. doi: 10.21437/Interspeech.2017-1409

Bergelson, E., Casillas, M., Soderstrom, M., Seidl, A., Warlaumont, A. S., & Amatuni, A. (in press). What do North American babies hear? A large-scale cross-corpus analysis. Developmental Science. A PDF of the paper and its supplementary materials are available.

The MendozaMusic dataset

This dataset is a resource on everyday music in infancy. Infants between 6 and 12 months of age wore LENA recorders at home. The dataset contains 3,988 clips from 28 families; each clip contains a segment of music identified within the longer recordings, padded with a 3-second buffer before and after the music.

The Validation dataset

This dataset includes 9,360 short audio samples drawn from 52 LENA recordings. These clips have been used to validate LENA's automatic diarization, as described in:

VanDam, M., & Silbert, N. H. (2016). Fidelity of automatic speech processing for adult and child talker classifications. PLOS ONE, 11(8): e0160588. doi: 10.1371/journal.pone.0160588

If you have a project that uses HomeBank data and would like to share it via HomeBank and have it linked from this page, please email 4homebank@gmail.com.