I’ve been finding Sean Zdenek’s Reading Sounds interesting on a few fronts. For one, I love pieces that dig into something I often take for granted, like captions.
Extending this, I confess that my “taking for granted” largely came down to a glib acceptance of captions as a subtitle equivalent, what Zdenek calls “undercaptioning.” I didn’t really take stock of the nonspeech sounds, like birdsong and grunts, or the nonspeech information (NSI), like character names and emotion. More fundamentally, I completely missed the deeper, more rhetorical understanding that Zdenek brings to captioning. As he writes in his preface:
“Definitions of closed captioning too often stress the technology of ‘displaying’ text on the screen over the complex practice of selecting sounds and rhetorically inventing words for them. In most definitions, the practice itself is simplified, reduced to a mechanical process of unreflective transcription. No one has really treated captioning as a significant variable in multimodal analysis, on par with image, sound, and video. No one has considered the possibility that captions might be as potent and meaningful as other kinds of texts we study in the humanities. In short, we don’t yet have a good understanding of the rhetorical work captions do to construct meaning and negotiate the constraints of space and time.”
While this is a lengthy block quote, I think it does a great job capturing the gist of his argument, or at least of what I’ve read so far. Zdenek wants to replace a conception of transcription as “unreflective” and “mechanical” (as simply putting a script on a screen) with an attention to the rhetorical impact of picking and choosing the words that communicate context, narrative, feeling, etc.
This leads to a variety of claims, which Zdenek develops through interviews, examples, and a corpus analysis of closed captions. On the one hand, captioning is structured by more straightforward elements, which regulations often address: who gets it, how it’s formatted, how readable it is, basic NSI, etc. Zdenek discusses how his survey participants value these elements, and closed captioning itself as a technology, despite flaws like garbled letters or poor pacing.
But Zdenek also looks at the more rhetorical aspects of captioning. For example, when he explains that “[c]aptions conceptualize,” he argues, “Captioning is about meaning, not sound per se. Captions don’t describe sounds so much as convey the purpose and meaning of sounds in specific contexts. The meaning of a sound in a particular context may transcend its origins.” To illustrate, he discusses a breath that Bella takes in the Twilight movie. A “gasp” or “breath” could mean a variety of things, from fear to exhaustion to romantic surprise. The “noise” written in the caption sets the scene, contextualizing what the noise signifies in the story.
Furthermore, as the “Hypnotoad” example shows, the scene itself often gives clarity to the caption. Without the scene, the sound of the Hypnotoad is a bit like a droning, distorted guitar, but many of the captions note that the “eyes” are “thrumming” or “buzzing” or “droning,” contextualizing the eyes as the source of the noise and the quality of the noise as a “droning” hypnotism. This example also highlights that no urtext exists for this caption or captions like it: captioning is an art open to a variety of interpretations.
Through contextualizing, captions can also create “captioned thematics.” Describing this, Zdenek writes, “When the same caption is repeatedly associated with a specific character or recurring context, it comes to serve as a kind of leitmotif for that character or context” (103). For example, he describes how the caption (Spritzes) ends up echoing other sounds or dialogue in a movie, as when one character asks another to “spritz” a room.
Captions often highlight what is meaningful. A series of sounds may coexist at the same time, like ambient background, dialogue, music, or overlapping interruptions, and the caption must contend with all of them. As Zdenek notes, captions must often “equalize” and “linearize,” expressing each sound through print in a linear fashion. On a soundtrack, multiple sounds may coexist in nonlinear ways at different volumes. Captions don’t have this luxury.
This is especially interesting when he discusses a PA announcement. While it is speech, it may be less valuable than other sounds. It is not dialogue, and other noises or details may be better captured in the captions. As he writes, “overcaptioning unnaturally elevates speech sounds. When indistinct speech sounds become distinct through verbatim captioning, captioners play god. Just because speech sounds can be discerned through the careful and repeated listening of a trained captioner does not mean that these sounds should always be captioned verbatim” (116). This is also clear with the Kenny example from South Park. Because Kenny’s incomprehensible mumbling often drives the jokes, a captioner can ruin the humor by spelling it out verbatim.
This all connects to access. Captions are often seen through simple lenses, as with the trend to increase the quantity of captions over their quality, or the push for logocentric captions that mainly address language differences and not hearing differences. Recognizing the rhetorical complexity of captions means recognizing the rhetorical complexity of improving access.
Collectively, what I most gain from this text, so far, is how captions, pairing text with screen, act in rhetorically unique ways as texts worthy of study in themselves. With this in mind, I wonder in what other ways versions of a text, such as those that include various captions, change the assemblage that constitutes that text and what that text does in the world.