Hate isn't a game
Online communities are essential to our everyday lives. Whether we're gaming, tweeting, or swiping on dating profiles, digital spaces are where we find the joy of community. But they are also where we experience the risks of hyper-connection: the vast majority of us encounter hate online, directly or indirectly.
Take gaming as an example. The impact of harassment on player platforms is striking. Five out of six (83%) online multiplayer gamers have experienced some form of harassment (Anti-Defamation League, 2021). And that harassment targets those who are already most discriminated against: more than half believe it was because of their race, ethnicity, religion, ability, gender, or sexual orientation (Anti-Defamation League, 2020).
And toxic content isn’t something you can just shake off — its impact runs deep.
- It drives people away from platforms: Nearly a third (30%) of online multiplayer gamers who experienced in-game harassment avoided certain games, while 27% stopped playing certain games altogether (Anti-Defamation League, 2021).
- It impacts the mental health of employees: Combing through violent and graphic images is harrowing. A legal case against one social media company – brought by content moderators suffering from PTSD – resulted in a $52M payout as compensation for mental health issues developed on the job (BBC, 2020).
- It erodes trust between the public and businesses: 79% of Americans believe that social media companies are doing an only fair or poor job at addressing online harassment or bullying on their platforms (Pew Research Center, 2021).
Just like you, we stand against harassment and hate in any online community. There is clear evidence that current content moderation tools can’t keep up. And we know that this is a problem that needs to be addressed, now.
Catch me if you can
Content moderation tools already exist. But it’s difficult for businesses and developers to get ahead of online toxicity because it’s ever-evolving. New terminology pops up. Users find ways to work around banned words or phrases. And every community has its own set of policies and guidelines based on its audience or content.
A gaming platform may need to moderate disruptive gameplay as well as hate speech. A site for children may ban all curse words, while a social media platform aimed at adults may focus on flagging misinformation. The phrase "planting the bomb" may be appropriate in a first-person shooter game, but cause alarm on a social media site.
It's a complex problem currently being met with rudimentary AI or keyword searches, allowing nuanced hate (e.g., an insult that requires an understanding of pop culture) or 'nonuniform' hate (e.g., a slur deliberately misspelled) to slip through the net.
To tackle toxicity, businesses need to augment their existing solutions with advanced and customizable natural language processing (NLP) that can understand the context of posts. Today, that shift has begun, with many online platforms already leveraging Cohere’s LLMs and advanced AI to bolster their content moderation approach and reduce churn.
A global game development company facing toxic content challenges tested Cohere against open-source models and other content moderation solutions. Preliminary results showed that Cohere's baseline models outperformed existing solutions, and the company is now finetuning a model to its specific community guidelines for even better performance.
The bridge to understanding
At Cohere, we have created a set of APIs that can be used by any software developer or business to utilize NLP, either as a standalone solution or integrated into existing applications.
When it comes to content moderation, the benefits are clear. Advanced NLP solutions give businesses the ability to create moderation tools that are tailored to their unique policies and a capacity to understand the nuance and context of language, significantly accelerating their ability to identify toxicity at scale.
To get performance tailored to your platform, all you need to do is provide good and bad examples of content, labeling the ones that violate your policies. Then, using the Transformer architecture that underpins Google Search and Google Translate, Cohere's content moderation Classify endpoint can be trained on those specific examples to identify toxic content. Here's how it works:
- Cohere trains a baseline model on billions of words so it has a general understanding of language
- You teach Cohere what is and is not acceptable for your community by uploading examples, at which point the model can be finetuned to your needs
- Your model is then served and ready to use, so you can classify new comments as they come in via our API
The toxic comments can automatically be removed, or flagged to a content moderator, depending on a company’s individual approach.
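The workflow above can be sketched in a few lines of Python. This is a toy illustration only: the keyword-overlap scorer below stands in for a finetuned language model, and the function names are our own, not part of any Cohere SDK. It shows the shape of the pipeline, with labeled examples in, a label out, and a removal-or-flag decision based on your own policy.

```python
import re

# Labeled examples of the kind a platform would upload for finetuning.
# These strings are illustrative placeholders.
LABELED_EXAMPLES = [
    ("you are all wonderful teammates", "non-toxic"),
    ("great game, well played everyone", "non-toxic"),
    ("you are trash, uninstall the game", "toxic"),
    ("nobody wants you here, idiot", "toxic"),
]


def _tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def classify(comment: str) -> str:
    """Score a comment against each label's examples by shared words.

    A real system would call a trained classifier here; word overlap is
    just a stand-in so the sketch runs on its own.
    """
    words = _tokens(comment)
    scores: dict[str, int] = {}
    for text, label in LABELED_EXAMPLES:
        scores[label] = scores.get(label, 0) + len(words & _tokens(text))
    return max(scores, key=scores.get)


def moderate(comment: str, auto_remove: bool = False) -> str:
    """Route a comment: remove it, flag it for a human, or let it through."""
    if classify(comment) == "toxic":
        return "removed" if auto_remove else "flagged"
    return "approved"
```

The `auto_remove` switch mirrors the choice described above: some platforms delete toxic content automatically, while others queue it for a human moderator.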
Cohere against the competition
To demonstrate performance, we sourced a dataset from Surge AI containing 1,000+ comments from a range of social media platforms. Each comment was labeled as one of two categories: toxic or non-toxic.
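Binary benchmarks like this one are typically scored with accuracy, precision, recall, and F1, treating "toxic" as the positive class. The sketch below shows that arithmetic on a handful of made-up predictions; the numbers are illustrative and not the Surge AI results.

```python
def score(gold: list[str], predicted: list[str]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 with "toxic" as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == p == "toxic" for g, p in pairs)            # toxic caught
    fp = sum(g == "non-toxic" and p == "toxic" for g, p in pairs)  # false alarms
    fn = sum(g == "toxic" and p == "non-toxic" for g, p in pairs)  # toxic missed
    correct = sum(g == p for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(gold), "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, not the benchmark data.
gold = ["toxic", "toxic", "non-toxic", "non-toxic", "toxic"]
predicted = ["toxic", "non-toxic", "non-toxic", "toxic", "toxic"]
metrics = score(gold, predicted)
```

Precision tells you how many removals were justified; recall tells you how much toxicity slipped through, which is usually the costlier error for a moderation team.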
This is how we performed:
See Classify in action
Here, we use our Playground to demonstrate how Classify can moderate chat conversations in online gaming communities. You can use the Playground to test your use cases and, when you're ready to start building, simply click Export Code to add Cohere's functionality to your application.
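Once classification results come back, an application still has to act on them. The sketch below consumes a hypothetical JSON payload and pulls out the comments confidently predicted toxic; the field names (`classifications`, `input`, `prediction`, `confidence`) are assumptions for illustration, so consult the Cohere API reference for the exact response schema.

```python
import json

# Hypothetical response from a classify call; field names are illustrative
# assumptions, not the exact Cohere API schema.
RESPONSE = json.loads("""
{
  "classifications": [
    {"input": "gg everyone, great match", "prediction": "non-toxic", "confidence": 0.97},
    {"input": "uninstall the game, loser", "prediction": "toxic", "confidence": 0.92},
    {"input": "nice clutch on the last round", "prediction": "non-toxic", "confidence": 0.88}
  ]
}
""")


def comments_to_review(response: dict, threshold: float = 0.9) -> list[str]:
    """Return comments confidently predicted toxic, for removal or human review."""
    return [
        c["input"]
        for c in response["classifications"]
        if c["prediction"] == "toxic" and c["confidence"] >= threshold
    ]
```

Raising the confidence threshold trades recall for precision: a children's site might act on low-confidence predictions, while an adult platform might only auto-remove high-confidence ones and route the rest to moderators.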
With Classify, companies can identify online toxicity with greater breadth and accuracy than ever before, helping to build safe communities, minimize harmful content, keep employees safe, and reduce churn.
Cohere's state-of-the-art NLP models are ready to tackle online toxicity. Book time with our Content Moderation expert to find out how your solution stacks up.