Generative AI with Cohere: Part 3 - The Generate Endpoint
In Part 3, we move from Playground to code and begin our exploration of the Generate endpoint.

In Parts 1 and 2 of our series on generative AI, we looked at how we can use the Playground to experiment with ideas in text generation. Let’s say we have now found an idea we want to build on. So, what’s next?
In Part 3, we will begin our exploration of the Cohere Generate endpoint, one of the endpoints available from the Cohere API. We’ll move from the Playground to code, in this case via the Python SDK, and learn how to use the endpoint.
This article comes with a Google Colaboratory notebook for reference.
Here’s a quick look at how to generate a piece of text via the endpoint. It’s actually quite straightforward.
We enter a prompt:
response = co.generate(
    model='command-xlarge-nightly',
    prompt='Write me a haiku about software and magic.')
print(response.generations[0].text)
And we get a response:
Magic of software
creating worlds from code
Imagination unleashed
Of course, there are more options available for defining your call more precisely. In this article, we will cover that and more, including:
- An overview of the Generate endpoint
- Setting up
- Making the first API call
- Understanding the response
- Turning Playground prompts into code
- Selecting the model
- Understanding the other parameters
- Experimenting with a prompt
But before going further, let’s take a quick step back and reflect on what this all means.
Looking at the code snippet above, it’s easy to miss what’s at play here. There are many options out there for leveraging large language models (LLMs), from building your own models to deploying available pre-trained models. But with a managed LLM platform like Cohere, what you get is a simple interface to language technology via easy-to-use endpoints.
What this means is that you are free to focus solely on building applications instead of having to worry about getting the underlying technology to work. You don’t have to worry about the complexities, resources, and expertise required to build, train, deploy, and maintain these AI models.

Now let’s dive deeper into the Generate endpoint.
Overview of the Generate Endpoint
The Cohere platform can be accessed via the Playground, SDK, or the CLI tool. In this article, we’ll learn how to work with the Python SDK.
With the API, you can access a number of endpoints, such as Generate, Embed, and Classify. Different endpoints are used for different purposes and produce different outputs. For example, you would use the Classify endpoint when you want to perform text classification.
Our focus for this series is, of course, text generation, so we’ll work with the Generate endpoint.
Setting Up
First, if you haven’t done so already, you can register for a Cohere account and get a trial API key, which is free to use. There is no credit or time limit associated with a trial key; calls are rate-limited to 100 calls per minute, which is typically enough for an experimental project. Read more about using trial keys in our documentation.
Next, you need to install the Python SDK. You can install the package with this command:
pip install cohere
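If you prefer not to hard-code your API key, a common pattern is to read it from an environment variable. Here’s a minimal sketch, assuming a variable named COHERE_API_KEY (the variable name is just a convention, not something the SDK requires):
import os
import cohere
# Read the API key from an environment variable instead of hard-coding it
api_key = os.environ.get('COHERE_API_KEY')
co = cohere.Client(api_key)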
Making the First API Call
Now, to get a feel for what the Generate endpoint does, let’s try it with the code snippet we saw earlier.
Import the Cohere package, define the Cohere client with the API key, and add the text generation code, like so.
import cohere
co = cohere.Client('your_api_key')
response = co.generate(
    model='command-xlarge-nightly',
    prompt='Write me a haiku about software and magic.')
print(response.generations[0].text)
Run it in a Python interpreter and you get the response:
Magic of software
creating worlds from code
Imagination unleashed
And that’s our first API call!
Understanding the Response
Let’s understand what we get from the API response.
The Generate endpoint accepts a text input, that is, the prompt, and outputs a Generation object.

Here’s an example response, with the text generated:
[cohere.Generation {
id: 8f7541b8-1e85-4784-a848-7aaad74f7bbe
text:
Magic of software
creating worlds from code
Imagination unleashed
likelihood: None
token_likelihoods: None
}]
The response contains:
- id — A unique ID for the generation
- text — The generated text, given the input
- likelihood — The average likelihood of all generated tokens (one English word roughly translates to 1-2 tokens)
- token_likelihoods — The likelihood of each token
If you want to keep it simple, you just need to get the text output and you’re good to go, like what we have done so far: response.generations[0].text.
Here, we specified the index of the generation (index 0, representing the first item) because the endpoint can actually generate more than one output in one go. We’ll cover how to get multiple generations later in this article, but here, we have not specified the number of outputs, so there is only one output by default.
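As a quick illustration, here’s one way to pull out the individual fields of the first generation; this is just a sketch based on the response fields shown above:
gen = response.generations[0]
print(gen.id)          # unique ID for the generation
print(gen.text)        # the generated text
print(gen.likelihood)  # None unless return_likelihoods is enabled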
In some scenarios, we may want to evaluate the quality of our output. This is where we can use the likelihood and token_likelihoods outputs.
Note: If you are not sure about what likelihood means, you can read about it in Part 1.
Let’s see them in action. Say our text input is “It’s great to” and the generated text is “be out here”.
These three output words map directly to three individual tokens, which we can check using the Tokenize endpoint.
co.tokenize("be out here")
Response:
cohere.Tokens {
tokens: [894, 558, 1095]
token_strings: ['be', ' out', ' here']
}
So, back to the Generate endpoint output, let’s say we get the following likelihood values for each token:
- be: -2.3
- out: -0.3
- here: -1.2

So, the average likelihood of all generated tokens, in this case, is the average of the three values, which is approximately -1.3.
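To make the arithmetic concrete, here’s a quick sanity check of that average (the likelihood values are the illustrative ones above, not actual API output):
token_likelihoods = [-2.3, -0.3, -1.2]
# Average likelihood across the generated tokens
avg_likelihood = sum(token_likelihoods) / len(token_likelihoods)
print(round(avg_likelihood, 1))  # -1.3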
Note that you need to enable the return_likelihoods parameter by setting it to either GENERATION (output only) or ALL (output and prompt); otherwise, you will get None for the likelihood and token_likelihoods outputs.
response = co.generate(
    model='command-xlarge-nightly',
    prompt='Write me a haiku about software and magic.',
    return_likelihoods='GENERATION')
print(response)
Response:
[cohere.Generation {
id: fa9d9499-1418-44ab-8349-6aa57e5ce70d
text:
Software is magic,
It can make you feel,
Like you're in a different world
likelihood: -0.80018723
token_likelihoods: [TokenLikelihood(token='\n', likelihood=-0.09764614), TokenLikelihood(token='Software', likelihood=-0.40727282), TokenLikelihood(token=' is', likelihood=-0.17436308), TokenLikelihood(token=' magic', likelihood=-0.06731783), TokenLikelihood(token=',', likelihood=-2.0117483), TokenLikelihood(token='\n', likelihood=-0.19473554), TokenLikelihood(token='It', ...]
}]
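With likelihoods enabled, you can also inspect individual tokens, for example, to spot the token the model was least confident about. Here’s a minimal sketch, assuming the response object from the call above:
gen = response.generations[0]
# Find the token with the lowest likelihood in the generation
least_likely = min(gen.token_likelihoods, key=lambda t: t.likelihood)
print(least_likely.token, least_likely.likelihood)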
Turning Playground Prompts into Code
Recall that in Part 2, we went through a number of prompt ideas, where each comes with its own preset link. You might have come up with some ideas and saved them as presets. The question is, how do you turn those into code?
It’s quite easy, actually. If you go back to the Playground and open up a preset, you will see a View Code button. Click on that button and you will get the code version of that preset in the language of your choice, in this case, Python.
Let’s try that with the email writing preset.

You can simply copy and paste this code into the Python interpreter to get the response.
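The exact export depends on your preset, but it generally follows the same shape as our earlier calls. Here’s an illustrative sketch with a placeholder prompt and made-up parameter values, not the actual export:
import cohere
co = cohere.Client('your_api_key')
response = co.generate(
    model='command-xlarge-nightly',
    prompt='Write an email to a customer about a delayed shipment.',  # placeholder prompt
    max_tokens=200,   # illustrative values; your preset will define its own
    temperature=0.8)
print(response.generations[0].text)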
Selecting the Model
We have only used one model so far: command-xlarge-nightly. But as we discussed in Part 1, you can prompt Cohere’s text generation model in two ways: by instruction and by example. So, the first thing that you want to define when calling the endpoint is the model type, depending on how you are constructing your prompt. Here are the available models at the time of writing:
- Prompting by instruction: command-xlarge-nightly, command-medium-nightly
- Prompting by example: xlarge, medium
The sizes implied in the model names reflect the number of parameters in each model. So, which one do you choose? It depends on your use case, but as a rule of thumb, smaller models are faster, while larger models are generally more fluent and coherent.
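Switching models is just a matter of changing the model argument. For example, here’s a small sketch comparing the two instruction models on the same prompt (the prompt itself is just an example):
prompt = 'Write me a haiku about software and magic.'
# Compare the two instruction-following models on the same prompt
for model in ['command-xlarge-nightly', 'command-medium-nightly']:
    response = co.generate(model=model, prompt=prompt)
    print(f'--- {model} ---')
    print(response.generations[0].text)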
Understanding the Other Parameters
While defining just the model is enough to get started, in reality, you will likely need to specify other parameters as well, so the model’s output will match your intended output as closely as possible.
For example, if you look at the second example (email writing), the code export contains many more parameters we didn’t define in the first example. We covered some of these parameters in Part 1 in the context of the Playground, so we’ll not cover them again here.
Having said that, we didn’t cover all of the available parameters in Part 1. So, now is probably a good time to visit the Generate endpoint docs and learn more about all the available parameters, such as their default values and accepted ranges.
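As a taste of what that looks like in practice, here’s a hedged sketch of a call that sets a few of the commonly used parameters; the values are illustrative, and the docs remain the source of truth for defaults and valid ranges:
response = co.generate(
    model='command-xlarge-nightly',
    prompt='Write me a haiku about software and magic.',
    max_tokens=50,                   # cap the length of the generation
    temperature=0.9,                 # higher values give more varied output
    stop_sequences=['--'],           # stop generating when this string appears
    num_generations=2,               # return two candidate generations
    return_likelihoods='GENERATION')
for gen in response.generations:
    print(gen.text)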
Experimenting with a Prompt
If you have a prompt idea that you really want to take to the next step, you may want to experiment extensively with it, for example, by trying out different parameter combinations and finding that ideal combination that fits your needs.
You can do that with the Playground, but since you’ll have to manually adjust the settings after each generation, it’s probably not going to be very efficient. That’s where the SDK comes in.
In the following example, we’ll take a prompt (product description) and create a small function to automatically iterate over different text generations. This way we can evaluate the generations in a much faster way.
In particular, we’re going to use the following parameters:
- temperature — we’ll iterate over a range of values to arrive at a value that fits our use case (we covered what temperature means in Part 1)
- num_generations — we can use this parameter to get several generations (three in our case) in one go instead of one
- return_likelihoods — we’ll set this to GENERATION and use it to evaluate the randomness of our text output

And here’s what the code looks like:
# Function to call the Generate endpoint
def generate_text(prompt, temperature, num_gens):
    response = co.generate(
        model='command-xlarge-nightly',
        prompt=prompt,
        temperature=temperature,
        num_generations=num_gens,
        return_likelihoods='GENERATION')
    return response
# Define the prompt
prompt = 'Write me a haiku about software and magic.'
# Define the range of temperature values and num_generations
temperatures = [x / 10.0 for x in range(0, 60, 10)]  # 0.0 to 5.0 in steps of 1.0
num_gens = 3
# Iterate over the range of temperature values
print(f"Temperature range: {temperatures}")
for temperature in temperatures:
    print('-' * 10)
    print(f'Temperature: {temperature}')
    response = generate_text(prompt, temperature, num_gens)
    for i in range(num_gens):
        text = response.generations[i].text
        likelihood = response.generations[i].likelihood
        print(f'Generation #{i+1}')
        print(f'Text: {text}\n')
        print(f'Likelihood: {likelihood}\n')
You can run the code in the notebook to see the full output. Here, we show just a few example generations, as the complete output is quite long.
Temperature range: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
----------
Temperature: 0.0
Generation #1
Text:
Software is magic
That makes the world go round
Without it, we're all lost
…
…
----------
Temperature: 5.0
Generation #1
Text:
Magic is coding,
Programming is sorcery,
Every day is a new spell.
Final Thoughts
In this blog post, we made our first foray into the Generate endpoint using the Python SDK. We got familiar with how to get text generations via the API, and we created a simple function to help us experiment with a prompt idea.
We have only covered the basics, though. In upcoming articles, we’ll look at how we can integrate the endpoint into proper applications, such as adding user interfaces, working with other endpoints in tandem, and more.
In the meantime, get your free API key to start building with generative AI.