Content Moderation with Classify
Detecting and flagging toxic comments on social media

Content Moderation

The internet is dominated by user-generated content, which is a double-edged sword. While it provides an avenue for online platforms to grow and thrive, it is a bane for content moderators managing them.

Take online gaming as an example. In 2021, 83% of adults (18-45) and 60% of teens (13-17) experienced harassment in online gaming [Source]. If online platforms are to provide a safe and pleasant user experience, they need to be moderated effectively.

But the internet is a vast repository of content. Think about the various types of platforms available and the scale of their users: gaming, social media, dating, chat, online community, e-commerce, blog, video streaming, and the list goes on.

It is impossible for humans to manually moderate all the user content that is created. There must be a more scalable way.

Scaling content moderation is an enormous challenge

Large Language Models (LLMs) as a Solution

Automated approaches built on Natural Language Processing (NLP) are now emerging as the answer to this challenge.

Let’s say we have the task of flagging user-generated content that can be deemed toxic, such as content containing abusive or obscene language. This type of task is called text classification.

The common NLP approach today is to train a machine learning algorithm to classify whether a piece of content is toxic. While this approach is proving extremely effective, it has a major drawback: training a model from scratch requires a huge amount of labeled training data before it can reach an acceptable performance level.
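To make this concrete, here is a toy caricature of the traditional approach: a classifier learned entirely from labeled examples. The messages and labels below are hypothetical illustrations, and the simple bag-of-words scoring stands in for the far larger datasets and stronger models real systems use; the point is only that everything the classifier knows comes from its labeled training data.

```python
from collections import Counter

# Hypothetical labeled training data; real systems need thousands of rows.
TRAIN = [
    ("good luck have fun", "not toxic"),
    ("nice play well done", "not toxic"),
    ("you are useless uninstall", "toxic"),
    ("nobody wants you here", "toxic"),
]

def train(rows):
    """Build one word-frequency profile per label from the labeled rows."""
    profiles = {}
    for text, label in rows:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles

def predict(profiles, text):
    """Score a message against each label's profile; highest overlap wins."""
    words = text.lower().split()
    return max(profiles, key=lambda lbl: sum(profiles[lbl][w] for w in words))

profiles = train(TRAIN)
print(predict(profiles, "you are useless"))  # prints "toxic"
```

With only four examples this toy model is hopelessly brittle, which is exactly the problem: its coverage grows only as fast as the labeled data you feed it.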

This means that teams embarking on this task will need to spend resources and time collecting the data needed to perform the task. And not all teams have the luxury to do so.

Enter the Large Language Model (LLM). It is a type of machine learning system that has already been pre-trained on a huge amount of text. It is a general-purpose model that performs well over a wide range of NLP tasks, including text classification and, in our example, toxicity classification.

With the LLM approach, the amount of labeled training data required for good text classification performance drops significantly: from typically thousands of examples with the traditional approach to hundreds, or in some cases even tens.

This is game-changing. It opens up the possibilities of leveraging machine learning for teams who would otherwise not have the resources and expertise to do it themselves. Developers are now empowered to build systems and applications that are capable of performing content moderation tasks at scale.

Leveraging an LLM platform helps teams achieve a much faster time to value

Content Moderation with the Cohere Platform

The Cohere platform provides an LLM API to help teams reach time to value as quickly as possible. With just a few examples of labeled data, you can get up and running with classifying text for your content moderation pipeline.

The API comes with a specialized endpoint called Classify, which streamlines text classification. Via this single endpoint, you can deploy different kinds of content moderation use cases according to your needs.
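A minimal sketch of calling the Classify endpoint from Python might look like the following. The example messages and label names are illustrative assumptions, and SDK details (client construction, `ClassifyExample`, response fields) vary by SDK version, so treat this as a sketch rather than a definitive integration:

```python
import os

# A handful of hypothetical labeled examples; with Classify, this small
# set replaces the thousands of rows a from-scratch model would need.
EXAMPLES = [
    ("good luck, have fun", "not toxic"),
    ("nice play, well done", "not toxic"),
    ("you are useless, uninstall the game", "toxic"),
    ("nobody wants you here", "toxic"),
]

# Messages we want the endpoint to label.
INPUTS = [
    "this game is hilarious",
    "go away, everyone hates you",
]

def classify_toxicity(inputs, examples, api_key):
    """Call the Cohere Classify endpoint (requires the `cohere` package)."""
    import cohere
    co = cohere.Client(api_key)
    response = co.classify(
        inputs=inputs,
        examples=[cohere.ClassifyExample(text=t, label=l) for t, l in examples],
    )
    # Each classification carries a predicted label and a confidence score.
    return [(c.input, c.prediction, c.confidence) for c in response.classifications]

if __name__ == "__main__":
    key = os.environ.get("COHERE_API_KEY")  # assumed env var for the API key
    if key:
        for text, label, confidence in classify_toxicity(INPUTS, EXAMPLES, key):
            print(f"{label:>10} ({confidence:.2f}) {text}")
```

From here, a moderation pipeline would typically flag or queue for review any message whose predicted label is toxic, optionally thresholding on the confidence score.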

Check out our quick walkthrough on content moderation with Classify here.