Top JS Frameworks According to a Tweet Analysis Bot

The petabytes of data held by global online communities such as Twitter make them valuable sources for data collection. You can use the data to study your target audience and gain insight into what people are tweeting about, as well as the sentiment of those tweets.

Twitter records over 500 million tweets per day from millions of users across the world. This volume presents analysts with a wide range of diverse datasets from users across different regions and backgrounds. However, this large size also makes manually searching for tweets a daunting and time-consuming task.

So, what can you do instead?

Using Cohere, Twitter’s Standard Search API, and the Python programming language, you can apply automation and text classification to extract insights from a large volume of tweets. To do this, you will need to build a bot that fetches a large volume of tweets and classifies the sentiment expressed within them — either positive, negative, or neutral — using Cohere classifiers.

This tutorial demonstrates how to create a keyword-driven tweet analysis bot that scrapes tweets related to a particular topic from the Twitter platform and performs sentiment analysis on them. Below is a quick preview of the Twitter bot in action.

You can find the final project code on GitHub.

Prerequisites

This tutorial contains hands-on steps to build a tweet analysis bot using Python. To follow the tutorial, ensure you have the following:

  • A Twitter developer account to gain access to the Twitter developer portal. Sign up for a developer account if you don't have one.
  • An app within your Twitter project. You’ll use the credentials for the app to authenticate requests from the bot to Twitter’s API.
  • A basic understanding of the Python programming language, with Python and pip set up on your computer.

Tweet Analysis Bot

The tweet analysis bot you’re about to build will extract user sentiments related to the React, Next.js, Angular, Vue, Node.js, and Ember frameworks. The bot fetches 10,000 tweets that contain a mention of these frameworks from the Search endpoint within the Twitter V2 API, and it then uses a Cohere classifier to classify the sentiment in the tweets as positive, negative, or neutral.

Preparing Application Resources

The bot uses Twitter’s Standard Search API as the source of data and Cohere to classify the retrieved tweets. To interact with these two platforms, you need specific credentials to authenticate your HTTP requests to these services.

Let’s proceed to generate or retrieve these credentials.

Creating a Cohere API key

Cohere API keys authenticate requests to the Cohere endpoints. The Cohere web console and CLI provide features to manage the API key generated for your Cohere account. You’ll use your API key to connect to Cohere when you have a batch of tweets ready for analysis.

Using your browser, navigate to the Dashboard tab of your Cohere console to generate an API key. Click Create API Key to launch a dialog box for specifying the name of your API key. Enter your preferred text in the API Key Name field to name the API key.

Click Create API Key on the dialog box to save the API key. Then, copy the generated API key to a secure file, as you’ll use it in the next section.

Next, you need to create a Twitter application and retrieve the application's API credentials.

Creating a Twitter Application

A Twitter application’s API credentials are within the Keys and tokens tab on the application’s settings page.

If you don’t know your app’s Bearer Token, click Regenerate to generate a new one and copy the token to a secure file. You’ll use the Bearer Token to authenticate HTTP requests for fetching data from Twitter.


With your Cohere API Key and the Bearer Token for your Twitter app, the stage is set for you to begin building the tweet analysis bot with Python.

Creating a Python Application

To begin development, execute the series of commands below in your console to create a project directory (python-analysis-bot), move into the directory, then create a virtual environment using the virtualenv package and activate it. A virtual environment enables you to isolate the dependencies for your project.  

Note: If you don’t have the virtualenv package on your computer, execute the pip install virtualenv command to install it.

# create and change into the directory
mkdir python-analysis-bot  
cd python-analysis-bot

# create a virtual environment
virtualenv analysis-bot

# activate the virtual environment 
source analysis-bot/bin/activate

Installing Python Packages

Execute the pip command below to install the cohere, requests, python-dotenv, and more-itertools packages from PyPI.

pip install cohere requests python-dotenv more-itertools

The python-dotenv package loads your API credentials from the .env file, the requests package sends HTTP requests to the Twitter Search endpoint, and the more-itertools package splits the retrieved tweets into batches. The installed cohere package performs all the text classification operations.

Next, create a .env file to store the Cohere and Twitter credentials securely. By separating your API credentials from the code, you prevent them from leaking whenever you push your code to a public code host.

Replace the placeholders below with your Cohere API Key and Twitter Bearer Token.  

# python-analysis-bot/.env
TWITTER_BEARER_TOKEN=BEARER_TOKEN
COHERE_API_KEY=API_KEY

Implementing the Analysis Bot

Create a file named app.py within the python-analysis-bot project. This file is the entry point and will execute when you run the bot.

Add the code below to the app.py file to bring in the imports and variable declarations the application needs.

Note: Pay attention to the indentation of the code blocks below, as indentation is significant in Python.

# python-analysis-bot/app.py
import cohere
from cohere.classify import Example
import requests
import more_itertools
import os
from dotenv import load_dotenv

load_dotenv()
co = cohere.Client(os.getenv("COHERE_API_KEY"))

twitter_api_url = "https://api.twitter.com/2/tweets/search/recent"
twitter_headers = {
    "Authorization": "Bearer {}".format(
        os.getenv("TWITTER_BEARER_TOKEN"))
}

request_params = {
    'query': '(React.js OR Next.js OR Angular.js OR Vue OR node.js OR ember.js) lang:en',
    'max_results': 100,
}

Take note of the value of the query key within the request_params dictionary. It uses the OR standard search operator to match tweets that mention any of the React.js, Next.js, Angular.js, Vue, Node.js, or Ember.js frameworks, and the lang:en operator to return only tweets written in English.
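To see exactly what the bot will send to Twitter, you can URL-encode these parameters with the standard library and inspect the resulting request URL. This snippet is purely illustrative; the endpoint URL and parameters mirror the ones declared above.

```python
from urllib.parse import urlencode

# The same search parameters used by the bot
request_params = {
    'query': '(React.js OR Next.js OR Angular.js OR Vue OR node.js OR ember.js) lang:en',
    'max_results': 100,
}

# Build the full request URL the way the requests package would
url = "https://api.twitter.com/2/tweets/search/recent?" + urlencode(request_params)
print(url)
```

Note how the operators survive encoding: spaces become `+`, and `lang:en` becomes `lang%3Aen` in the query string.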

Next, add the code below to the existing code within the app.py file to create a class for the analysis bot. The use of classes helps you keep the bot code clean, organized, and readable.

# python-analysis-bot/app.py
class AnalysisBot():
   retrieved_tweets = []
   classified_tweets = []
   results = {
       'react': {
           'positive': 0,
           'mentions': 0
       },
       'next': {
           'positive': 0,
           'mentions': 0
       },
       'angular': {
           'positive': 0,
           'mentions': 0
       },
       'vue': {
           'positive': 0,
           'mentions': 0
       },
       'node': {
           'positive': 0,
           'mentions': 0
       },
       'ember': {
           'positive': 0,
           'mentions': 0
       }
   }

   def __init__(self) -> None:
       fetch_count = 0

       while (fetch_count < 100):
           print(
               'FETCHING BATCH: #{}. {} tweets retrieved.'
               .format(fetch_count, len(self.retrieved_tweets))
           )

           tweets = requests.request(
               "GET",
               url=twitter_api_url,
               headers=twitter_headers,
               params=request_params
           ).json()

           token = tweets['meta'].get('next_token')

           if (token):
               request_params["next_token"] = token
           else:
               # stop early if the API has no further pages of results
               break

           for item in tweets['data']:
               self.retrieved_tweets.append(item['text'])

           fetch_count = fetch_count + 1

The constructor method of AnalysisBot above uses the requests package to make a GET request to the recent search endpoint of the Twitter API, fetching 100 tweets per request and storing their text in the retrieved_tweets list. The code places the request in a while loop that executes up to 100 times, fetching as many as 10,000 tweets by the end of the loop.

Alongside the retrieved tweets, a token value is part of the API response for pagination purposes. The code adds the token value to a next_token key within the request_params to fetch the next batch of 100 tweets.  

Note: Twitter allows developers with Elevated access to fetch a maximum of 100 tweets per API request, while allowing developers with Academic Research access to fetch a maximum of 500 tweets per API request.
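The pagination pattern itself is independent of Twitter. The sketch below simulates it with a stand-in fetch_page function (a hypothetical helper, not part of the Twitter API) so you can see how next_token threads one request into the next until the pages run out.

```python
# Simulated paginated API: pages of fake tweets, keyed by pagination token
PAGES = {
    None: {"data": ["tweet 1", "tweet 2"], "meta": {"next_token": "t2"}},
    "t2": {"data": ["tweet 3", "tweet 4"], "meta": {"next_token": "t3"}},
    "t3": {"data": ["tweet 5"], "meta": {}},  # last page: no next_token
}

def fetch_page(token=None):
    """Stand-in for a GET request to the recent search endpoint."""
    return PAGES[token]

collected = []
token = None
while True:
    page = fetch_page(token)
    collected.extend(page["data"])          # accumulate this page's tweets
    token = page["meta"].get("next_token")  # token for the next page, if any
    if not token:
        break

print(collected)  # all five simulated tweets, in order
```

The bot's constructor follows the same shape, except it caps the loop at 100 iterations rather than always draining every page.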

Add the code below to create the classify_tweets method within the AnalysisBot class, which classifies all the fetched tweets.

# python-analysis-bot/app.py
   def classify_tweets(self):
       tweet_items = list(more_itertools.chunked(self.retrieved_tweets, 32))

       for idx, tweets in enumerate(tweet_items):
           print("PROCESSING:", idx)
           response = co.classify(
               model='medium',
               taskDescription='Classify tweets on JavaScript frameworks retrieved from the Twitter V2 Search API',
               outputIndicator='Classify retrieved tweets to determine developers stance on framework',
               inputs=tweets,
               examples=[
                    
                   # positive tweets about frameworks
                   Example("React is still the best JS front-end library/framework", "positive review"),
                   Example("Vue is an amazing beginner framework. I've become competent in it, and the goal initially was to learn Vue then move to React... but I don't even know if I want to move to React anymore. Vue is great!",
                          "positive review"),
                   Example("Angular is such a great framework. My Front-End code has never been this tidy and organized. Google did a good job.", "positive review"),

                   # negative tweets about frameworks
                   Example("React/Redux is a bad framework choice for a complex application AND if you are developing w/ React/Redux < 6 months per year.",
                           "negative review"),
                   Example("Had a play with ember.js this morning. The code is not nice, the documentation horrific, and it does bad things to my markup.", "negative review"),

                   # neutral tweets about frameworks
                   Example("I dont know nodejs! I dont know javascript! I dont even know what my nodejs version is! lol",
                           "neutral review"),
                   Example("I dont know much about nodejs but I have to learn it, such is life.", "neutral review"),
                   Example("i Really want to use #NextJs but i dont know much about next and #react yet.", "neutral review")
               ]
           )

           for classification in response.classifications:
               self.determine_results('react', classification)
               self.determine_results('next', classification)
               self.determine_results('vue', classification)
               self.determine_results('angular', classification)
               self.determine_results('node', classification)
               self.determine_results('ember', classification)

       print("RESULTS:", self.results)

The code above uses the more_itertools package to split the retrieved tweets into chunks of 32 tweets each, because a Cohere classifier can process a maximum of 32 inputs at a time. The classifier is also given eight examples: three positive, two negative, and three neutral statements copied from users’ tweets on Twitter.
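more_itertools.chunked is a thin convenience; the same batching can be sketched with plain slicing, which makes the 32-input limit easy to verify.

```python
def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements,
    mirroring more_itertools.chunked for this use case."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# 100 fake tweets batched for a classifier that accepts 32 inputs at a time
batches = chunked([f"tweet {n}" for n in range(100)], 32)
print([len(b) for b in batches])  # → [32, 32, 32, 4]
```

Every batch stays within the classifier's limit, and the final short batch carries the remainder.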

After classifying a batch of 32 tweets, each classification is passed to the determine_results helper method, which determines which framework the tweet mentioned and tallies the classification type.

Add the code below to create the helper method used by the classify_tweets method.

   def determine_results(self, framework, classification):
       # match the framework name case-insensitively within the tweet text
       if (framework in classification.input.lower()):
           self.results[framework]['mentions'] = self.results[framework]['mentions'] + 1

           if (classification.prediction == "positive review"):
               self.results[framework]['positive'] = self.results[framework]['positive'] + 1

Finally, add the code below to instantiate the AnalysisBot class and invoke the classify_tweets method on an instance of the AnalysisBot class.

classifyObj = AnalysisBot()

classifyObj.classify_tweets()

At this point, you have put together the code for the tweet analysis bot. Proceed to run the application and watch the output.

Execute the python app.py command to run the script. Due to the large volume of API requests, the entire process can take 10 to 20 minutes to complete.

After its completion, the results dictionary will print out to show how many times each framework was mentioned and how many of those mentions were positive. In the image below, you will see that React is mentioned more often, but Node has a higher percentage of positive comments, with 224 positive tweets.
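Comparing frameworks by the share of positive mentions, rather than raw counts, is a one-liner over the results dictionary. The figures below are made-up stand-ins for illustration, not output from a real run.

```python
# Hypothetical results in the same shape the bot prints
results = {
    'react': {'positive': 310, 'mentions': 1500},
    'node': {'positive': 224, 'mentions': 800},
}

def positive_rate(stats):
    """Share of a framework's mentions classified as positive."""
    return stats['positive'] / stats['mentions'] if stats['mentions'] else 0.0

rates = {name: round(positive_rate(stats) * 100, 1)
         for name, stats in results.items()}
print(rates)  # node scores higher per mention despite fewer total mentions
```

This is why a framework with fewer total mentions can still come out on top: the ranking depends on the ratio, not the absolute number of positive tweets.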

At this point, you now have a basic analysis bot that you can run multiple times to analyze data from the Twitter Search API.

However, keep in mind that, since the Twitter Search API returns new data upon each search, you may get a different result after running the bot multiple times.

Conclusion

Congratulations on completing this tutorial!

Using Cohere and Python, you classified 10,000 tweets from Twitter’s Search endpoint. Manually classifying such a massive volume of data would have taken several hours or even an entire day. Using Cohere, you’ve saved a significant amount of time and easily performed sentiment analysis to determine that Node.js is the best-liked JavaScript framework. Now you know!

Not done learning yet? Learn more about sentiment analysis through similar tutorials from Cohere.