Review of Hirschberg and Nakatani 1996

2007 Sep 11 in Reviews

Hirschberg, J. and Nakatani, C. (1996). A prosodic analysis of discourse segments in direction-giving monologues. In Proceedings of the 34th ACL.

This article describes an analysis of annotation reliability for the hierarchical discourse segmentation of the Boston Directions Corpus. The Boston Directions Corpus is a set of direction-giving monologues collected by Hirschberg and Grosz. Speakers were prompted to tell another person how to accomplish nine navigation tasks around Boston. Read-speech samples for the same tasks were obtained by having the subjects return several weeks later to read aloud transcripts of their own monologues. The transcripts were annotated by linguists familiar with ToBI prosodic annotation conventions and with Grosz and Sidner’s theory of discourse structure. In this study, the sub-corpus from one speaker is used to compare the reliability of annotation with and without access to the audio. Measured by raw agreement, the kappa coefficient and Flammia’s generalized kappa, annotation with the audio was markedly more reliable, bumping kappa from “unreliable” levels near .5 to “reliable” levels near .7. The authors also report on acoustic features that were found to correlate with a phrase’s position within its discourse segment. On average, initial phrases were higher in pitch and louder, with longer pauses before and shorter pauses after. Medial and final phrases both had lower pitch and volume and shorter pauses before. They differ in that final phrases are spoken faster and are followed by long pauses, whereas medial phrases are followed by short ones.
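
For readers who have not met the kappa coefficient mentioned above, here is a minimal sketch of the two-annotator (Cohen's) version, which corrects raw agreement for the agreement expected by chance. This is an illustration only, not the paper's procedure: the function name, the 0/1 boundary labels and the example data are all hypothetical, and Flammia's generalized kappa for hierarchical segmentations is not reproduced here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption for this sketch: each item is a phrase, and each label
    says whether the annotator placed a segment boundary there (1) or not (0).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both annotators pick the same
    # label if each labels at random according to their own label rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_chance = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(counts_a) | set(counts_b)
    )

    if p_chance == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is just a convention of this sketch.
        return 1.0

    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical boundary decisions for ten phrases (made-up data).
annotator_1 = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
annotator_2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
print(round(cohen_kappa(annotator_1, annotator_2), 2))
```

In this made-up example the annotators agree on 8 of 10 phrases, yet kappa comes out well below 0.8, because most phrases are non-boundaries and so a lot of agreement is expected by chance alone. That is why raw agreement and kappa are reported separately.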

None of the results they report is surprising, but they do confirm a large number of previous studies, using a fairly reliable and quantitative methodology. Of course using audio files helps with identifying discourse structure! But here we see how much it helps: without the audio, the annotators did not reach reliable agreement, whereas with it they did. On the other hand, the segments on which the annotators agree show much the same intonational correlates in the text-only annotation as in the text+speech annotation. Again, we already know that there are pauses at discourse boundaries, and it’s not too hard to notice that pitch and loudness descend over the course of a discourse segment. But by being quantitative and comprehensive, the authors provide a foundation for further studies.
