How Cohere Works with Google’s Vertex Matching Engine to Power Embeddings

Unstructured data, such as documents, images, videos, and plain text on the web (social posts, reviews, articles, etc.), is growing rapidly, and it presents a challenge to traditional databases, which aren’t built to store and analyze it. Keyword and metadata classification often fails to capture this data’s characteristics, hindering search and other uses, and traditional systems for analyzing and comparing such complex data frequently lack the computational power to handle the text and metadata effectively. Due to these limitations, there’s been a major shift away from traditional keyword search and toward semantic search using embeddings.
At their core, embeddings allow computers to efficiently compare pieces of text and measure the degree of similarity between them. They do this by encoding what words mean based on their context (e.g., bank of a river vs. money in the bank) as a long sequence of numbers. These sequences are called “vectors,” and they allow an algorithm to find a given vector’s “nearest neighbors” and so capture the similarity between words and their nuances. Vector search enables a contextual understanding that makes it far more accurate and effective than keyword- or metadata-based search.
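To make this concrete, here’s a toy sketch of vector similarity using made-up 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions); the words and numbers are purely illustrative:

```python
# Toy illustration (not Cohere output): three hypothetical 4-dimensional
# embedding vectors. Real embeddings are far higher-dimensional.
import numpy as np

river_bank = np.array([0.9, 0.1, 0.4, 0.0])
money_bank = np.array([0.1, 0.9, 0.0, 0.4])
shoreline  = np.array([0.8, 0.2, 0.5, 0.1])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two vectors: close to 1.0 = similar meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# "bank of a river" sits much closer to "shoreline" than to "money in the bank"
print(cosine_similarity(river_bank, shoreline))   # ~0.98, high similarity
print(cosine_similarity(river_bank, money_bank))  # ~0.18, low similarity
```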
When a development team wants to explore the many uses of text embeddings, Cohere’s large language models (LLMs) are an important part of the process. Before you can use Cohere to create the embeddings, however, a few things need to happen:
- A large unstructured dataset needs to be preprocessed via an established metadata pipeline, such as a Dataflow template.
- The processed dataset must then be ingested into a persistent storage location, such as Google Cloud Storage or BigQuery.
- At this point, Cohere creates embeddings (numerical representations) of the text (a minimal sketch of this step follows the list).
- Then, once you have the embeddings, you can run similarity measurements against them to power applications such as search, categorization, intent recognition, and clustering to surface trends.
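As a minimal sketch of the embedding step, here’s what a call to Cohere’s Embed endpoint looks like with the Python SDK; the API key and example texts are placeholders, and exact call signatures can vary by SDK version:

```python
# A minimal sketch of the embedding step using Cohere's Python SDK
# (pip install cohere). Texts and key are illustrative placeholders.
import cohere

co = cohere.Client("YOUR_API_KEY")  # replace with your Cohere API key

documents = [
    "Refund policy for damaged items",
    "How to reset your account password",
    "Shipping times for international orders",
]

response = co.embed(texts=documents)  # returns one vector per input text
embeddings = response.embeddings      # list of float vectors

print(len(embeddings), "vectors of dimension", len(embeddings[0]))
```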
Typically, these steps are conducted separately in self-managed environments, which requires significant compute power and expertise. It is expensive, does not scale easily to datasets that number in the millions or billions of records, and rarely delivers the desired speed, low latency, and usability.
Google Cloud’s Vertex Matching Engine (VME) brings together these functions into a single, fully managed environment, streamlining the process and solving for the speed, latency, and scale issues that plague DIY vector similarity search (a.k.a. nearest neighbor search) setups.
And Cohere’s language models, along with the Cohere API and its Embed endpoint, fit neatly with VME, adding multilingual support, the ability to finetune the base models, and strong accuracy.
Use Cases for Embeddings
What can you do with embeddings? Quite a lot. Our Embed endpoint takes a piece of text and creates a vector embedding, which represents the text as numbers that capture its meaning and context. Embedding transforms unstructured text data into a structured form that allows you to cluster, categorize, and semantically (contextually) search the text.
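For instance, here’s a hedged sketch of the clustering use case, pairing Cohere embeddings with scikit-learn’s KMeans; the texts, API key, and cluster count are all illustrative:

```python
# A sketch of clustering texts by embedding them with Cohere and
# grouping the vectors with scikit-learn's KMeans.
import cohere
import numpy as np
from sklearn.cluster import KMeans

co = cohere.Client("YOUR_API_KEY")

texts = [
    "My package arrived broken",
    "Item was damaged in shipping",
    "How do I change my billing address?",
    "Update the payment card on my account",
]

vectors = np.array(co.embed(texts=texts).embeddings)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for text, label in zip(texts, labels):
    print(f"cluster {label}: {text}")  # damage reports vs. account questions
```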
Cohere’s powerful language models can find relationships between pieces of text (words, phrases, sentences, paragraphs, or documents) that would not surface with keyword search alone. Some examples of industries that could benefit from vector search include:
- Healthcare: Quickly search for patient records or medical information
- Finance: Search for financial transactions or account information
- Insurance: Search for information in databases or records
- Retail: Search for products on an e-commerce website
- Education: Search for educational resources or materials
Overall, any industry that relies on large amounts of information, and the ability to quickly and accurately search for that information, could benefit from vector search.
Why VME + Cohere Is a Game-Changer
Beyond streamlining several different processes under one roof, so to speak, within the Google Cloud ecosystem, VME is massively scalable, really fast, and fully managed. A vector search database at its core, VME can search across billions of embedding vectors at high queries per second with very low latency.
Oftentimes, you have to sacrifice accuracy on the altar of speed and low latency. However, when tested on many real-world applications, VME maintained recall (the percentage of true nearest neighbors returned) of 95-98% while serving results with 90th-percentile latency of less than 10ms.*
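For reference, here’s a small sketch of how that recall metric is computed, using hypothetical result IDs:

```python
# Recall: the fraction of the true k nearest neighbors that an
# approximate (ANN) search actually returned.
def recall_at_k(approx_ids: list[str], true_ids: list[str]) -> float:
    """Both lists hold top-k result IDs; order doesn't matter for recall."""
    return len(set(approx_ids) & set(true_ids)) / len(true_ids)

# Hypothetical top-5 results from exact search vs. an ANN index
exact = ["d1", "d2", "d3", "d4", "d5"]
ann   = ["d1", "d2", "d3", "d5", "d9"]
print(recall_at_k(ann, exact))  # 0.8 -> 80% recall
```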
VME is fully managed, autoscales, requires no infrastructure management, and costs less than leading alternatives, without heavy compute requirements. It can scale to billions of embedding vectors, with latency as low as 5ms even under heavy query volumes. It also features some advanced capabilities not always found in vector similarity search engines, including index (i.e., dataset) updating with no downtime or added latency, and built-in filtering that allows you to narrow your results to subsets within your main dataset.
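As an illustration of that built-in filtering, here’s a sketch of the JSONL input format Matching Engine ingests, where each record can carry “restricts” (namespace tags) that queries can filter on; the IDs, vectors, and namespaces here are made up:

```python
# A sketch of VME's filtering: each record in the index input file can
# carry "restricts" that narrow query results to a subset of the data.
import json

records = [
    {"id": "doc-1", "embedding": [0.12, 0.98, 0.33],
     "restricts": [{"namespace": "language", "allow": ["en"]}]},
    {"id": "doc-2", "embedding": [0.44, 0.10, 0.87],
     "restricts": [{"namespace": "language", "allow": ["fr"]}]},
]

# Matching Engine ingests this file from a Cloud Storage bucket
with open("embeddings.json", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```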
When you add Cohere as your pre-trained language model in VME, you stack the benefits of our platform into this fully managed vector search tool. For example, you can finetune (or customize) our base language models in the Cohere Playground to add additional language understanding to your dataset, such as unique product names, community slang, or industry jargon, and keep that customized or finetuned model updated.
For example, the diagram below shows how a search pipeline works: metadata is created for each document, vectors for the documents are created and stored, and finally the query is embedded and compared against the stored vectors.
The task is similarity matching: retrieving the documents most similar to the query.
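Here’s a compact sketch of that pipeline, with NumPy standing in for the comparison VME would perform at scale; the documents and query are illustrative:

```python
# Sketch of the pipeline above: store document vectors, then embed the
# query with the same model and compare. In production the comparison
# happens inside VME; NumPy stands in here for a tiny corpus.
import cohere
import numpy as np

co = cohere.Client("YOUR_API_KEY")

documents = ["Return policy", "Shipping costs", "Warranty coverage"]
doc_vectors = np.array(co.embed(texts=documents).embeddings)

query_vector = np.array(co.embed(texts=["How much is delivery?"]).embeddings[0])

# Cosine similarity of the query against every stored document vector
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
print(documents[int(np.argmax(scores))])  # expected: "Shipping costs"
```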
Our multilingual support adds industry-leading understanding of 100 languages, helping multinational companies draw insights from language data across the globe. In recent tests we conducted against three common open-source alternatives, the Cohere Multilingual Text Understanding Model outperformed each of them by significant margins, making it by far the best available multilingual embedding model.
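As a sketch of how cross-lingual similarity plays out in practice (the model name reflects Cohere’s multilingual embedding model at the time of writing; check the docs for current names):

```python
# Cross-lingual similarity: sentences with the same meaning land close
# together in vector space even when written in different languages.
import cohere
import numpy as np

co = cohere.Client("YOUR_API_KEY")

texts = [
    "Where is my order?",         # English
    "Où est ma commande ?",       # French: "Where is my order?"
    "The weather is nice today",  # unrelated English sentence
]
vecs = np.array(co.embed(texts=texts, model="embed-multilingual-v2.0").embeddings)

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(vecs[0], vecs[1]))  # high: same meaning across languages
print(cos(vecs[0], vecs[2]))  # lower: different meaning, same language
```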
How to Use Cohere Embed with VME
We’ve put together a notebook on GitHub to help you learn how to create embeddings with the Cohere API and then leverage the Vertex AI Matching Engine to create and query an index. The notebook includes code samples and step-by-step instructions for using the Cohere Embed endpoint to quickly capture semantic information about input data, and then applying the Vertex AI Matching Engine's Approximate Nearest Neighbor (ANN) service to find similar texts.
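If you’d rather skim before opening the notebook, here’s a hedged outline of the flow using the google-cloud-aiplatform SDK; the project, bucket, network, dimension, and IDs are placeholders, and the notebook remains the authoritative reference:

```python
# Sketch: build an ANN index over embeddings staged in Cloud Storage,
# deploy it to an endpoint, and query it with an embedded text.
from google.cloud import aiplatform

aiplatform.init(project="your-project", location="us-central1")

# Create a Tree-AH (approximate nearest neighbor) index from JSONL
# embedding files previously written to a Cloud Storage bucket
index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="cohere-embeddings-index",
    contents_delta_uri="gs://your-bucket/embeddings/",
    dimensions=768,  # must match the embedding model's output size
    approximate_neighbors_count=10,
)

# Deploy the index to an endpoint (Matching Engine queries run over a
# peered VPC network), then query it with an embedded text
endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="cohere-embeddings-endpoint",
    network="projects/123/global/networks/your-vpc",
)
endpoint.deploy_index(index=index, deployed_index_id="cohere_index_v1")

query_vector = [0.0] * 768  # replace with a real vector from co.embed(...)
neighbors = endpoint.match(
    deployed_index_id="cohere_index_v1",
    queries=[query_vector],
    num_neighbors=5,
)
```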
A Match That Makes Sense
A vector search database like Vertex Matching Engine isn’t useful without vectors — and the higher the quality of your vectors, the more effective the database can be. Embeddings by themselves, no matter the quality, don’t add value unless you can efficiently compare them. This is why this match — using Cohere within VME — can help companies scale and access the full potential of embeddings for real-world use cases.
To learn how Cohere and Vertex Matching Engine work together, please contact us.