Swearing and automatic captions

Swearing is the spice of language. Throw in a little to make that fucking point you want to make pop. Throw in a shitload to create a motherfucking verbal curry, layers of delicious goddamn meaning to unpack.

Swearing is also present in every culture I can think of, with culture being the force that designates words as inappropriate. To swear is to be human.

Context is fuckin’ king

The designation of a word as inappropriate is contextually applied. It also requires knowledge of the culture to successfully navigate its use. For example, I’m going to get a lot more grief if I call something or someone a motherfucker in church than I am at my local dive bar.

Swearing at work is even more complicated. It’s barely done in written communication. It is also not considered appropriate to do in meetings with your superiors, at least until you reach a certain level of promotion.

However, swearing at work also happens all the damn time. In my experience, light swearing in less-formal conversation with your peers can also be a good way to build camaraderie. Sharing a taboo feels good, to say nothing of its analgesic effects.

Speaking of pain: Let’s unpack what automatic captions are, as well as their weaknesses.

Well, what the fuck is a caption in the first place?

Captions are text that is placed over video footage, either live or recorded. This text corresponds to the content spoken on screen, and is supposed to perfectly match. More on this in a bit.

Captions are a form of speech-to-text software, which takes words spoken by a person and converts them into text on a device. They are not text-to-speech, which is taking written content and making a device announce it via a digital voice.

And what the hell is an automatic caption?

Broadly speaking, there are two different categories of captions:

  1. Manual captions, a service provided by a transcriptionist. These people use highly specialized keyboards and software to provide a text equivalent of spoken content in realtime.
  2. Automatic captions, a service provided by speech-to-text software. This is a service provided by many—but not all—remote meeting apps.

Of the two, manually-provided captions are ideal, as they are far more accurate (although there are exceptions). Some individuals even refer to automatic captions as “craptions”—a phrase I love.

Automatic captions don’t handle things like jargon, nuance, non-English words, low talkers, accents, and AAVE well. They are also susceptible to the racist, ableist shit we call algorithmic bias.

On the flip side of things, automatic captions can also misrepresent someone by interpreting a non-swear word as a swear word. This can easily create embarrassment and a negative impression of the speaker.

That all said, automatic captions are better than nothing at all.

Censorship, confusion, and infantilization

Many automatic captioning services censor swear words. I feel this is done because there’s the assumption that they’ll be used in a business context.

A variation on the “Ha ha! Business!” meme that reads, ‘Ha ha! Captions.’ Behind the text is a businessman dressed in an early 90s-style suit pumping his fist while speaking on a cell phone. He has an ecstatic expression on his face.

You can’t solve culture with technology. If you need to rely on software to police your employees’ language, you’ve got bigger fucking problems.

The assumption that virtual meetings are only used for business-related things is also absurd. The pandemic has more than proved this, to say nothing of my Sunday night Dungeons & Dragons game. I’d also say the sub-assumption that you shouldn’t swear at work is pretty damn shitty.

Also, what about conferences? Your event should include captions, and should not censor them.

I find the most engaging conferences have impassioned speakers, and with passion can come swearing. Conference organizers: get me fired up about your speaker’s content, and don’t leave me wondering if that chain of asterisks is hiding some crucial detail.

Confusion

Speaking of hiding detail, censored automatic captions also can play havoc with understanding. For example, consider these two phrases:

  1. “This is fucked up” (a disturbing situation), versus,
  2. “This is cocked up” (ruined to the point where others are affected).

If censored, both phrases read as, “This is ****** up.” Without the swear being present, I am left a great deal more confused about the status of the item or situation in question.
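To make the collision concrete, here is a minimal sketch of a length-preserving mask. The blocklist and the `mask_swears` function are hypothetical and written for illustration only; I don’t know how any real captioning service implements its filter.

```python
import re

# Hypothetical blocklist for illustration; real services do not publish theirs.
BLOCKLIST = {"fucked", "cocked"}


def mask_swears(caption: str) -> str:
    """Replace each blocklisted word with asterisks of the same length."""

    def mask(match: re.Match) -> str:
        word = match.group(0)
        return "*" * len(word) if word.lower() in BLOCKLIST else word

    return re.sub(r"[A-Za-z']+", mask, caption)


first = mask_swears("This is fucked up")
second = mask_swears("This is cocked up")

print(first)            # This is ****** up
print(second)           # This is ****** up
print(first == second)  # True
```

Two different inputs, one identical output. Whatever distinguished the phrases is gone by the time the caption reaches the reader.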

Infantilization

The inclusion of a swear is a deliberate act. Its removal undoes the speaker’s agency, and dilutes the message they’re trying to communicate.

In an abled context, it is artificial tone-policing, shotgunned without discretion or consent. In a disabled context, censorship of language in automatic captioning communicates that you are nannying what a d/Deaf person should be hearing.

This censorship creates a lack of equivalency in experience. Someone who does not use captions is privy to a more accurate interpretation of the spoken content. I don’t know about you, but that feels like some bullshit exclusion.

Fucking software

Another consideration is whether anyone affected by this lack of an equivalent experience was included in the feature’s concepting and development. I sincerely fucking doubt it, given how prevalent access barriers and discrimination are in the hiring pipeline.

The inclusion of logic to monitor each spoken phrase and strip out swear words is unnecessary bloat to download and extra logic to maintain. Just remove it.

Private business can effectively do whatever the fuck they want, so automatic caption censorship is completely and totally unnecessary. At the very worst, make this nanny cam antifeature an opt-in toggle for the end user.
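As a rough sketch of what “opt-in” could look like, assume a per-user settings object where masking is off unless the person reading the captions turns it on. `CaptionSettings`, `render_caption`, and the blocklist are all made-up names for illustration, not any product’s actual API.

```python
from dataclasses import dataclass


@dataclass
class CaptionSettings:
    # Off by default: captions show what was actually said.
    mask_profanity: bool = False


def render_caption(transcript: str, settings: CaptionSettings,
                   blocklist: frozenset) -> str:
    """Return the transcript untouched unless the user opted in to masking."""
    if not settings.mask_profanity:
        return transcript
    masked = []
    for token in transcript.split(" "):
        core = token.strip(".,!?")  # ignore surrounding punctuation
        if core.lower() in blocklist:
            token = token.replace(core, "*" * len(core))
        masked.append(token)
    return " ".join(masked)


words = frozenset({"shit"})  # hypothetical blocklist
print(render_caption("Well, shit.", CaptionSettings(), words))
# Well, shit.
print(render_caption("Well, shit.", CaptionSettings(mask_profanity=True), words))
# Well, ****.
```

The design choice that matters is the default value. The filtering logic only runs for people who asked for it.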

The seven dirty words test

The late comedian George Carlin had a monologue about seven dirty words that are deemed inappropriate for television and would be censored. The words are:

  1. Shit,
  2. piss,
  3. fuck,
  4. cunt,
  5. cocksucker,
  6. motherfucker, and
  7. tits.

I figure this is as good a list as any to check which auto captioning services are facilitating unnecessary, ableist discrimination.

The test

I enabled auto captions for a range of popular virtual meeting services, and then tested the following phrase:

This is a test to see if censorship is present in {product name's} automatic captioning feature. Shit. Pause. Piss. Pause. Fuck. Pause. Cunt. Pause. Cocksucker. Pause. Motherfucker. Pause. Tits.

I’m using the word “pause” to insert a break between each curse word, to ensure each word is independently evaluated.
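If you want to repeat the test yourself, the script can be generated and a pasted transcript checked with something like the sketch below. I ran the test by reading captions off the screen, so this is only a convenience helper. The function names and the sample transcript are hypothetical.

```python
import re

SEVEN_WORDS = ["Shit", "Piss", "Fuck", "Cunt", "Cocksucker", "Motherfucker", "Tits"]


def build_test_phrase(product_name: str) -> str:
    """Build the spoken test script, with "Pause." between each word."""
    intro = (
        "This is a test to see if censorship is present in "
        f"{product_name}'s automatic captioning feature."
    )
    return intro + " " + " Pause. ".join(f"{word}." for word in SEVEN_WORDS)


def missing_words(transcript: str) -> list:
    """Return the test words that do not appear verbatim in a transcript."""
    seen = set(re.findall(r"[a-z]+", transcript.lower()))
    return [word for word in SEVEN_WORDS if word.lower() not in seen]


print(build_test_phrase("Example App"))

# A made-up transcript, not a real result from any service.
sample = "shit pause piss pause f*** pause count pause cocksucker pause motherfucker pause titz"
print(missing_words(sample))  # ['Fuck', 'Cunt', 'Tits']
```

A word showing up as missing only tells you it was not transcribed verbatim. It could have been censored, or it could have been misheard as something else, so the captions still need a human look.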

Shoutout to Sarah Higley, the real MVP of this post. She sat by and let me curse at her for services that needed a second participant minimum to use their automatic captioning functionality.

The results

Here’s how each service stacked up:

Figma

I’m a big fan of Figma, and was pleasantly surprised to see auto captioning included with their beta voice chat collaboration feature.

Figma did not censor any of the seven dirty words. Nice!

Google Meet

Also referred to as Google Hangouts and Google Duo. Google Meet is the heir-apparent sitting atop a pile of seventeen years of mismanagement.

Google Meet censored the following words:

Instagram Stories

A formerly decent-enough service for appreciating photos, Instagram has perpetually pivoted to poorly copying whatever service is currently popular and ramming it down your throat.

Instagram censored the following words:

Of note, Instagram was the only service to use a cutesy beeping noise and grawlix when words it deemed swears were present. This was initially kind of funny, but quickly became patronizing.

Jitsi

Jitsi does not include auto captions out of the box. Maybe it should.

Microsoft Teams

Also known as “we have Slack at home,” Microsoft Teams continues to disappoint.

Teams censored the following words:

“Cunt” was inaccurately identified as “count.” The word “tits” was identified as “Titz”, a municipality in Germany.

The misidentification is interesting in that it:

Update: Teams offers profanity filtering controls for live captions as of April 11th, 2023. Unfortunately, this experience is opt-in, which requires you to know it exists.

Slack

Slack has a Huddle feature, which can summon the ghost of Screenhero on request. This buried subfeature allows for video chatting, as well as auto captioning.

Slack did not censor swear words, but did confuse “pause” for “powers,” which I thought was funny.

Skype

In all honesty, I bet Skype will outlive me. I probably shouldn’t make fun of it.

Skype censored all seven swear words. Of note: many swears were initially displayed and then replaced with asterisks, making the entire point moot.

TikTok

I’m angry I had to join TikTok to test this, as I have an addictive personality and only barely survived Vine.

TikTok also did not censor any of the seven dirty words.

Twitter

Oh Twitter. You were a cesspool, but you were my cesspool. Jack Dorsey is a feckless piece of shit, but at least he was checked out enough to not mess with things too much.

Twitter Spaces’ captioning feature is a make-good on its disastrous voice Tweets launch. It was a great addition by the now-defunct Accessibility Team, whose dismissal is one of many travesties I will never forgive fuckwit Elon Musk for visiting upon us.

Shout out to my friend Soren who crashed this party and also maintained their professional composure while I swore at them.

Twitter’s automatic captions censored the words “fuck” and “cunt” by refusing to even acknowledge them. Unfortunately, I don’t know if that is because of active censorship or because of its crumbling services infrastructure.

Zoom

Zoom is popular in the disability space, due to its relative ease of use with assistive technology, the ability to programmatically facilitate both auto and live captioning, and its wide range of customization features.

Zoom also did not censor any of the seven dirty words. Liberté, égalité, fraternité!

What about BlueJeans, GoToMeeting, UberConference, etc.?

I am not going to shell out money to test this. If you:

  1. Have an account at one of these services,
  2. Are interested in the results, and
  3. Don’t mind being cursed at

Please contact me to set something up!

What about voice control software?

Another important aspect of auto-censorship is how it affects voice control software, assistive technology that lets someone operate their device by spoken input.

A common use case for voice control software is text transcription, where someone dictates content and it shows up on screen. This is helpful for auditory thinkers, as well as people with limb loss, paralysis, arthritis, Parkinson's, etc.

Voice control can also be used to do things like click on items to take action on them. However, censoring content may force someone to use alternate, more annoying strategies—think being prevented from saying “Click Dick” when picking a contact to email.
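Here is a small sketch of why that hurts, assuming a voice control tool that matches the spoken label against the names visible on screen. `find_click_target` and the contact list are hypothetical. Real voice control software is far more involved than this.

```python
from typing import List, Optional


def find_click_target(command: str, labels: List[str]) -> Optional[str]:
    """Return the on-screen label named by a "click <label>" command, if any."""
    prefix = "click "
    if not command.lower().startswith(prefix):
        return None
    spoken = command[len(prefix):].strip().lower()
    for label in labels:
        if label.lower() == spoken:
            return label
    return None


contacts = ["Dick", "Jane"]  # hypothetical contact list

# What the person said, transcribed faithfully.
print(find_click_target("Click Dick", contacts))  # Dick

# The same utterance after a censoring recognizer masked the name.
print(find_click_target("Click ****", contacts))  # None
```

Once the name is masked, there is nothing left to match against, and the person has to fall back to something slower to pick that contact.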

Sarah led the charge testing voice control software to see if they censor content. Here’s what we learned:

Voice Access

This is a new feature in Windows 11. I’m excited for it, because I think the more operating systems provide assistive technology features, the more normalized and democratized they become.

Unfortunately, Voice Access censors your input. Microsoft is heavily invested in inclusive design, so hopefully they’ll reconsider this damnable antifeature.

Here’s Sarah putting it through its paces:

Like Skype, Voice Access censored all seven words. Boo.

Voice Control

Voice Control is Apple’s version of Voice Access, and predates it. I have previously written about it, and am happy to report no censorship is present when revisiting the feature to test it.

Dragon Home

Dragon is third-party voice control software that languishes in purgatory thanks to Microsoft’s acquisition.

Saving the best for last, Dragon Home converts swearing to cussing, like a parent who stubbed their toe in front of one of their kids. You see what happens when you find a stranger in the Alps?

Dragon swapped in the following words:

Wrapping the fuck up

You might think this is juvenile, but I’m deadly fucking serious.

Inclusivity means accommodating how an individual chooses to express themselves. If that does not align with your view of what is appropriate, I encourage you to think through the privilege and power dynamics at play. I also encourage you to question an experience’s defaults and how they came to be.