HomeBank Special Projects

HomeBank currently hosts four datasets from projects that are based on HomeBank data. The first three require HomeBank membership to access while the fourth is publicly available.

The Challenge dataset

These are the data that were used for the 2017 ComParE Challenge, Addressee Subchallenge. Although the official challenge is over, you are welcome to try to improve on the baseline model's performance. If you are able to do so, we'd like to hear from you! Please see the following paper for details:

Schüller, B., Steidl, S., Batliner, A., Bergelson, E., Krajewski, J., Janott, C., Amatuni, A., Casillas, M., Seidl, A., Soderstrom, M., Warlaumont, A. S., Hidalgo, G., Schnieder, S., Heiser, C., Hohenhorst, W., Herzog, M., Schmitt, M., Qian, K., Zhang, Y., Trigeorgis, G., Tzirakis, P., & Zafeiriou, S. (2017). The INTERSPEECH 2017 computational paralinguistics challenge: Addressee, cold, & snoring. INTERSPEECH 2017. doi: 10.21437/Interspeech.2017-43

The IDSLabel dataset

This dataset comes from the same project that generated the Challenge dataset. It also includes additional labels for adult speakers' gender. For more details, please see these papers:

Casillas, M., Amatuni, A., Seidl, A., Soderstrom, M., Warlaumont, A., & Bergelson, E. (2017). What do babies hear? Analyses of child- and adult-directed speech. INTERSPEECH 2017. doi: 10.21437/Interspeech.2017-1409

Bergelson, E., Casillas, M., Soderstrom, M., Seidl, A., Warlaumont, A. S., & Amatuni, A. (in press). What do North American babies hear? A large-scale cross-corpus analysis. Developmental Science. A PDF is available here with supplementary materials available here.

The MendozaMusic dataset

This is a resource of everyday music in infancy. Infants between the ages of 6 and 12 months wore LENA recorders at home. The dataset contains 3988 clips from 28 families with infants. Each clip contains a segment of music identified in a larger recording, with a 3-second buffer before and after the music. The full collection is password-protected, and a subset of the materials is publicly accessible.

The Validation dataset

This dataset includes 9360 short sound samples taken from 52 LENA recordings. These snippets have been used for validation of LENA diarization.

VanDam, M., & Silbert, N. H. (2016). Fidelity of automatic speech processing for adult and child talker classifications. PLOS ONE, 11(8): e0160588. doi:10.1371/journal.pone.0160588

If you have a project that uses HomeBank data that you would like to share via HomeBank and have linked from this page, please email 4homebank@gmail.com.