Cohere launches larger Embed models
2 Model Releases for 2x the Fun
We are excited to announce that the Cohere team has released a new suite of Representation models. We have released medium and large Representation models and will now be offering these models as our baseline Representation models. small has also been updated. In addition to releasing new models, we have expanded the maximum token length for our Representation models to 1024 tokens.
Model Comparison
Cohere’s Large and Medium Representation models outperform SOTA Representation models, and Cohere’s updated Small Representation model is in line with SOTA. For the purposes of comparison, we used SentEval, which is a standard academic benchmark for representation models.
Embedding Max Tokens Have Increased
We have increased the maximum number of tokens per text from 512 to 1024. For any text longer than 128 tokens, the text is split into segments, each segment is embedded separately, and the resulting embeddings are averaged and returned.
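To make the split-and-average behavior concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Cohere's implementation: the helper names (split_tokens, embed_long_text), the embed_chunk callback, and the choice of a plain unweighted mean over consecutive, non-overlapping segments are all hypothetical. The segment size of 128 is taken from the threshold quoted above.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def split_tokens(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def embed_long_text(
    tokens: Sequence[str],
    embed_chunk: Callable[[List[str]], Vector],  # hypothetical per-segment embedder
    chunk_size: int = 128,  # assumption: segment size equals the quoted threshold
) -> Vector:
    """Embed each segment separately and return the element-wise mean."""
    chunks = split_tokens(tokens, chunk_size)
    if not chunks:
        raise ValueError("cannot embed an empty token sequence")
    vectors = [embed_chunk(chunk) for chunk in chunks]
    dim = len(vectors[0])
    # Assumption: unweighted mean, so a short final segment counts as much as a full one.
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]


if __name__ == "__main__":
    # Toy embedder for demonstration only: [segment length, constant 1.0].
    def toy_embed(chunk: List[str]) -> Vector:
        return [float(len(chunk)), 1.0]

    print(embed_long_text(["tok"] * 300, toy_embed))
```

With an unweighted mean, a short trailing segment influences the result as much as a full-length one. Whether the service weights segments by length is not stated here, so treat that choice as an open assumption.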
Upgrading to Larger Embeds
New models will be available at large-20220217, medium-20220217, and small-20220217. Cohere’s previous “Small” Representation Model will still be available via small-20211115, and the small model name has redirected to small-20220217 since February 28th. See our pricing page for updated pricing.
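If you want results to stay reproducible across alias changes like this one, one option is to resolve model names to dated versions on the client side before making a request. The sketch below is a hypothetical helper and not part of any Cohere SDK. The model names come from this announcement, and only the small alias is mapped because it is the only redirect stated here.

```python
# Dated model versions named in this announcement.
AVAILABLE_MODELS = {
    "large-20220217",
    "medium-20220217",
    "small-20220217",
    "small-20211115",  # previous Small model, still available
}

# Only the redirect stated in the announcement; other aliases are left unmapped.
MODEL_ALIASES = {
    "small": "small-20220217",
}


def resolve_model(name: str) -> str:
    """Return the dated model version for a name, or raise if it is unknown."""
    resolved = MODEL_ALIASES.get(name, name)
    if resolved not in AVAILABLE_MODELS:
        raise ValueError(f"unknown model name: {name!r}")
    return resolved


if __name__ == "__main__":
    print(resolve_model("small"))           # follows the redirect
    print(resolve_model("small-20211115"))  # pins the previous Small model
```

Passing a dated name such as small-20211115 keeps requests on the previous model even though the small alias now points to the newer version.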
Questions?
Feel free to share them on our co:mmunity forum or grab time with us to discuss.