What's the big deal with Generative AI? Is it the future or the present?

Part 1 of "Generative AI is.. Not Enough?"

It is almost impossible to ignore the astounding progress in artificial intelligence these days. From the new generation of generative chatbots, to the models that can generate (almost) any picture (or very soon, video), the pace of development in the AI field has been nothing short of phenomenal. This is especially true in the field of generative AI, where we see a growing number of impressive generative models that can create images, text, video, and music.

These developments have captured the popular imagination, and businesses are rushing to build AI into their products, services, and processes, hoping to find their AI unicorn. Some are still struggling to determine how to use AI in their organizations, while others are finding that the current AI landscape is complex and difficult to navigate.

In this series of articles, we explore the importance of these generative AI models and discuss useful perspectives to view and deploy them. This first article introduces the current state of generative AI, and outlines how we should approach it. In the next article, we map the AI technology and value stack to better understand where generative AI fits in. Finally, we discuss how we can better harness its power to create a new generation of intelligent systems.

The summary above (created with Cohere's text generation model, with some human editing) is a great introduction to this series of articles, which crystallizes a lot of what we've learned over the last few years about Generative AI and how to think about its models, products, and industries.

Let's jump right in!

What's the big deal with Generative AI? Is it the future or the present?

In the first article in the series, we cover four points:

1- Recent AI developments are awe-inspiring and promise to change the world. But when?
2- Make a distinction between impressive 🍒 cherry-picked demos, and reliable use cases that are ready for the marketplace
3- Think of models as components of intelligent systems, not minds
4- Generative AI alone is only the tip of the iceberg

Let's now look at each one in more detail.

1- Recent AI developments are awe-inspiring and promise to change the world. But when?

Text Generation: Software that generates coherent human language

Text generation models are a central pillar of Generative AI.

The ability of language models to produce coherent text feels like a turning point in human technology. Just as impressive is these models' ability to capture the meaning and context of text (e.g., articles, messages, documents), enabling software to handle text more intelligently.

Without even knowing it, we experience the power of large language models on a daily basis. Think Google Translate, Google Search, and text generation models. Thousands of applications and features in your favorite products use large language models to manipulate language better than ever before – and they're getting faster, more efficient, and more accurate every day.
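To make this concrete, here is a minimal sketch of generating text from a prompt. It uses the open-source Hugging Face transformers library and a small GPT-2 model purely as an illustration (my choice of tooling, not one the article prescribes); the products mentioned above typically call much larger hosted models.

```python
# A minimal sketch of text generation with an open-source language model.
# The model choice (gpt2) is illustrative only; production features typically
# rely on much larger hosted models.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Large language models are useful because"
outputs = generator(prompt, max_new_tokens=40, num_return_sequences=1)

print(outputs[0]["generated_text"])
```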

These models aren’t only enabling new features and products. In fact, entire new sectors of companies are based on these models as their foundation. One clear example here is the growing list of companies building AI writing assistants. This includes companies like HyperWrite, Jasper, Writer, copy.ai, and others. Another example is companies weaving model generations into interactive experiences like Latitude, Character AI, and Hidden Door.

Image Generation: Name a thing then see it manifest in front of your eyes

AI image generation is another exciting area in the Generative AI space. In that domain, models like DALL-E, MidJourney, and Stable Diffusion have taken the world by storm.

Image generation models were some of the highlights of AI in 2022

AI image generation is not particularly new to the scene. Models like GANs (Generative Adversarial Networks) have enabled generating images of people, art, and even homes for about nine years now. But each of these models was trained specifically for the type of object it generates, and it took a long time to generate an image.

The current batch of AI image generation models lets a single model generate a vast range of image types. These models also give users the ability to control what they generate by describing it in text.

Image generation models create (often astounding) images guided by text prompts.
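For a sense of what driving one of these models looks like in code, here is a minimal sketch using the open-source diffusers library with a Stable Diffusion checkpoint (the library, checkpoint name, and prompt are my illustrative choices; the article doesn't tie these models to any particular toolkit).

```python
# A minimal sketch of text-to-image generation with Stable Diffusion via the
# open-source diffusers library. The checkpoint and prompt are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # a GPU makes generation practical

image = pipe("an astronaut riding a horse, in the style of an oil painting").images[0]
image.save("astronaut.png")
```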

It’s often difficult to temper your excitement when these tools exceed your expectations of what software can produce from a simple text prompt. In my case, and I suspect in many others', these models evoke a deep sense that something has changed. Some shift in the world as we know it has occurred and is expected to have a lasting impact on products, industries, and economies. The potential appears clear as day.

That potential is the very reason why caution is warranted.

Tempering excitement with care

As social media gets swept up in posts that claim “I made model X do impossible task Y 🤯”, it’s important to arm oneself with a discerning eye to filter these claims. One of the key questions to ask is whether a demonstrated capability is a 🍒 cherry-picked example that a model produces 40% of the time, or if it points to robust and reliable model behavior.

Reliability is key for an AI capability to become part of a customer-facing product.

Take, for instance, the many capabilities attributed to large GPT models in the last few years. One example, floated in some 2020 demos, is a model's ability to generate the code for a website from just a text prompt. Three years later, that is still not how we build websites.

Some capabilities attributed to models in 2020 are astounding, but turning them into reliable products can take anywhere from months to many years depending on the use case.

Code generation with language models is almost certain to change how software is written (ask users of Replit, Tabnine, and GitHub Copilot). The timeline, however, is less certain. The “nearly” in the tweet above can be anywhere from two to five years.

There's a saying attributed to Bill Gates that can be applied here, “Most people overestimate what they can achieve in a year and underestimate what they can achieve in ten years”. The same can be said about people's expectations around some new technologies.

We tend to overestimate what a new technology can do in a year, and underestimate what it can do in ten

The last time the tech industry was swept up in a deep-learning-induced frenzy, we were promised self-driving cars by 2020.

They’re still not here.

A 2016 timeline graphic from Business Insider showing that the industry widely anticipated autonomous vehicles would be on the road by 2020.

One key takeaway here is to:

2- Make a distinction between impressive 🍒 cherry-picked demos, and reliable use cases that are ready for the marketplace

Large text generation models are able to answer many questions correctly. But can they do it reliably?

Stack Overflow doesn’t think so.

The popular forum where software developers ask questions has banned machine-generated answers from being posted on the site “because the average rate of getting correct answers from ChatGPT is too low”. This is an example of a use case where some people expected the model to reliably produce exactly correct answers to a complex set of problems.

AI use cases that are reliable now

There are, however, other use cases (and workflows) where these models are capable of much more reliable results. Key amongst them are neural search (more on that in point #4 below), auto-categorization of text (classification), and copywriting suggestion and brainstorming workflows for generation models (discussed in more detail in part three of this series).
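As a small illustration of the auto-categorization use case, here is a hedged sketch using a zero-shot classification model from the open-source transformers library (the model, labels, and example text are all my own illustrative choices, not something the article specifies).

```python
# A minimal sketch of auto-categorizing text (classification), one of the
# more reliable use cases today. Model, labels, and text are illustrative.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

ticket = "My card was charged twice for the same order."
labels = ["billing", "shipping", "account access"]

result = classifier(ticket, candidate_labels=labels)
print(result["labels"][0])  # the highest-scoring category, e.g. "billing"
```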

The amazing demos will keep rolling in. They’re part of a community discovery process for the limits and new possibilities of these models (more on community discovery of a model’s generative space and its product/economic value in part two). It will pay, however, to keep asking the cherry-picking question, recognize that timelines are uncertain, and invest in the robustness and reliability of AI systems and models.

3- Think of models as components of intelligent systems, not minds

Avoid the urge to think of a language model as a mind with an individual personality.

The ability of language models to generate coherent text will only continue to improve. The first episode of someone believing that a language model is sentient is already behind us.

A more useful framing is to think of language models as language understanding and language generation components of a software system. They make it a little more intelligent and capable of behaviors beyond what software was traditionally able to do – especially when it comes to language and vision.

In a context like this, the term language understanding is not used to mean human-level understanding and reasoning. Rather, these models can extract much more of the information and meaning behind text, which makes software more useful.

When we think of language understanding and generation as distinct capabilities, we start to think more clearly about how to build the intelligent software systems of the future.

Once we think of a model as a component, we can start to compose more advanced systems that use multiple steps or models (Part three of this series is entirely dedicated to this topic).
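To make the component framing concrete, here is a minimal sketch of composing two components into one small system (an assumed design for illustration; the article doesn't prescribe these models or libraries): an embedding model plays the language-understanding role by retrieving the most relevant document, and a generation model produces an answer grounded in it.

```python
# A sketch of composing models as components of a larger system: an embedding
# model retrieves the most relevant document (understanding), then a
# generation model answers using that document as context (generation).
# Models, documents, and the question are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

embedder = SentenceTransformer("all-MiniLM-L6-v2")     # understanding component
generator = pipeline("text-generation", model="gpt2")  # generation component

docs = [
    "Our office is open Monday to Friday, 9am to 5pm.",
    "Refunds are processed within 14 days of purchase.",
]
question = "How long do refunds take?"

# Step 1: find the document closest in meaning to the question.
doc_emb = embedder.encode(docs, convert_to_tensor=True)
q_emb = embedder.encode(question, convert_to_tensor=True)
best_doc = docs[int(util.cos_sim(q_emb, doc_emb).argmax())]

# Step 2: generate an answer grounded in that document.
prompt = f"Context: {best_doc}\nQuestion: {question}\nAnswer:"
print(generator(prompt, max_new_tokens=30)[0]["generated_text"])
```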

4- Generative AI alone is only the tip of the iceberg

From a technical standpoint, text and image generation models aren’t distinct enough to deserve their own type or sub-area of “AI”. The same models can be used for a variety of other use cases with little to no adjustments. The concern with drawing an arbitrary line around generation is that some may miss other more mature AI capabilities that are reliably powering more and more systems in the industry.

Language understanding opens the door to so many improved (and new) capabilities of software systems. Chief amongst them are summarization, neural search, and text categorization.

Generative AI is only possible because larger, better models trained on massive datasets can build better numeric representations of text and images. For builders, it’s important to know that those representations enable a wide variety of possibilities in addition to generation. One of these key possibilities is neural search.

Neural or semantic search systems use these ML developments to incorporate context and meaning, going beyond keyword search.

Neural search is the new breed of search systems that use language models to improve on simple keyword search.

They enable searching by meaning.
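Here is a minimal sketch of what that looks like in practice, using the open-source Sentence Transformers library mentioned below (the model, documents, and query are my own illustrative choices). Note that the top match shares almost no keywords with the query; the ranking is driven by meaning.

```python
# A minimal sketch of neural (semantic) search: rank documents by the cosine
# similarity of their embeddings to the query's embedding.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "How to boil noodles until they are al dente",
    "A guide to changing a car tire",
    "Tips for watering houseplants in winter",
]
query = "best way to cook pasta"

doc_embeddings = model.encode(docs, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_embedding, doc_embeddings)[0]
for score, doc in sorted(zip(scores.tolist(), docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```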

Learn about neural search in this video by Nils Reimers, Cohere’s director of ML/embeddings and creator of the wildly popular Sentence Transformers open-source library.

Neural search sits alongside text classification as an area where AI already produces reliable results across many industry use cases (though some areas, like sarcasm classification, remain challenging).

Coming up next

In the upcoming articles in this series, we'll look more closely at the tech and value stack of Generative AI. We will also discuss a number of design patterns for applications that use these models as building blocks for the next generation of intelligent systems.

Stay tuned! Follow @CohereAI on Twitter and join the Discord community to learn when the next parts are published.