Magic: The Gathering (Magic) has fascinated me since I was a child. It’s unusual as a game in that players use asymmetric game pieces. Unlike chess, in which both players have the same pieces in front of them, every Magic player sits down at the table with their own handcrafted deck of cards.

My obsession with the game came from the enjoyment of poring over the thousands of Magic cards released during its history and trying to build a good deck. My only complaint about Magic is that although there are thousands of cards, I really wish there were hundreds of thousands! This was the motivation behind this project—I just wanted more Magic cards.

It turns out that finetuned large language models (LLMs) are really good at creating Magic cards, and they can create hundreds of thousands of them in just a few days!

This is a story of how my two friends and I created Urza’s AI, a website that uses Artificial Intelligence (AI) to generate Magic cards, and how over 38 thousand others joined in the fun.

How We Built It

Magic: The Gathering is a collectible card game for two or more players. Each player starts with a deck of cards and 20 life points, and uses those cards to deal damage to the opponent and reduce the opponent’s life points to 0. Each card carries a set of information, including its name, cost, type (out of 6 basic types), subtype, a text description of its abilities, and sometimes a small story about the card (flavor text).

My colleagues Ali Sabet and Michael Kozakov and I had a question: what if we could create an AI that, when prompted with just a card name, would generate a playable Magic card that follows the theme of that name, complete with all the card information and an image?

It turns out we could! In the end, it took a combination of language AI (an LLM) to generate the text of a Magic card and text-to-image AI to create the card’s image based on the generated text. Here is how we built it.

Step 1: Generating the text

LLMs are massive neural networks that model how human language works. They are pre-trained on a vast body of text data, allowing them to excel in various Natural Language Processing (NLP) use cases.

One example is language generation, where an LLM takes a piece of text and continues it with text that closely matches the given context. This is the essence of prompt engineering, which enables so many creative use cases, such as creative writing, chatbots, role-playing games, and more.

With Urza’s AI, given a card name, we wanted to generate original and complete information about the card, such as the cost, type, subtype, and description. Using Cohere’s Generate model, accessible via an API, we used a few examples of Magic card information as the prompt.
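A minimal sketch of how such a few-shot prompt could be assembled, assuming a simple "Field: value" layout with a separator between cards. The example cards are invented placeholders, and the field names, the separator, and the `build_prompt` helper are all assumptions for illustration, not the prompt we actually used. The call to the Generate API itself is left out.

```python
# Invented placeholder cards, used only to illustrate the prompt layout.
EXAMPLE_CARDS = [
    {
        "name": "Emberclaw Drake",
        "cost": "{3}{R}",
        "type": "Creature",
        "subtype": "Drake",
        "text": "Flying. When Emberclaw Drake enters the battlefield, "
                "it deals 1 damage to any target.",
    },
    {
        "name": "Tidecaller's Insight",
        "cost": "{1}{U}",
        "type": "Instant",
        "subtype": "",
        "text": "Draw two cards, then discard a card.",
    },
]

SEPARATOR = "--"  # assumed delimiter between example cards


def format_card(card):
    """Render one card as 'Field: value' lines."""
    return "\n".join([
        f"Name: {card['name']}",
        f"Cost: {card['cost']}",
        f"Type: {card['type']}",
        f"Subtype: {card['subtype']}",
        f"Text: {card['text']}",
    ])


def build_prompt(examples, new_name):
    """Join the example cards, then leave the new card open for the model."""
    blocks = [format_card(card) for card in examples]
    # The prompt ends right after "Cost:" so the model continues from there.
    blocks.append(f"Name: {new_name}\nCost:")
    return f"\n{SEPARATOR}\n".join(blocks)


prompt = build_prompt(EXAMPLE_CARDS, "Urza's Apprentice")
print(prompt)
```

The resulting string would then be sent to the text generation endpoint, with the separator as a natural stop sequence so the model returns a single card.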

The outcome was not bad at all. The model generated a very good output, as in the screenshot below, taken from one example generation in the Cohere playground.

Prompting the Cohere Generate model to generate original Magic card information

Although the cards were readable, they were often overpowered or hard to interpret. There was room to make them better, so we decided to take it further by finetuning a Generate model.

Finetuning is a step where you take a pre-trained LLM and customize it with your own dataset. The model goes through a round of training, after which it produces outputs that are more attuned to the dataset you gave it.

So we took a dataset of all the cards released by Wizards of the Coast, containing full Magic card descriptions, and finetuned a Cohere Generate model on it. This was a simple matter of uploading the dataset and kicking off the finetune via the Playground.
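Preparing a dataset like this mostly means turning card records into consistent text examples. Here is a rough sketch, assuming each card is a dictionary and that the training text is a sequence of cards divided by a separator. The field names, the separator, and the rule for skipping incomplete records are assumptions, and the exact file format the finetuning upload expects is not shown here.

```python
REQUIRED_FIELDS = ("name", "type")  # assumed minimum for a usable record


def serialize_card(card):
    """Turn one card record into 'Field: value' lines, skipping empty fields."""
    lines = [f"Name: {card['name']}"]
    if card.get("cost"):  # lands, for example, have no cost
        lines.append(f"Cost: {card['cost']}")
    lines.append(f"Type: {card['type']}")
    if card.get("subtype"):
        lines.append(f"Subtype: {card['subtype']}")
    if card.get("text"):
        lines.append(f"Text: {card['text']}")
    if card.get("flavor"):
        lines.append(f"Flavor: {card['flavor']}")
    return "\n".join(lines)


def build_training_text(cards, separator="\n--\n"):
    """Serialize every usable card and join them into one training string."""
    usable = [c for c in cards if all(c.get(f) for f in REQUIRED_FIELDS)]
    return separator.join(serialize_card(c) for c in usable)


# Invented placeholder records; the second is dropped for lacking a name.
sample = [
    {"name": "Mossback Grove", "type": "Land", "text": "{T}: Add {G}."},
    {"name": "", "type": "Creature", "cost": "{2}{G}"},
    {"name": "Ashen Duelist", "type": "Creature", "subtype": "Human Warrior",
     "cost": "{1}{R}", "text": "First strike", "flavor": "She never bows first."},
]
print(build_training_text(sample))
```

Keeping the serialized layout identical to the layout used at prompt time matters, since the finetuned model learns to continue exactly the pattern it was trained on.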

The result was much richer generations that felt more realistic and playable!

Getting much richer generations by prompting the finetuned model

I have saved the prompt so you can try it out here. Take it for a spin and let me know how it went!

Step 2: Generating the image

A Magic card of just text without a picture wouldn’t be complete, so we needed to find a way to create card images that match the theme of the generated text.

We used the Wombo Art API to generate the card image. Wombo’s API leverages another type of neural network: a model trained with text as the input and an image as the output.

We provided the API with the card name, types, and subtypes. We also used a few other prompt tricks to condition the API to return the kinds of images that we wanted.
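As an illustration of that idea, an image prompt can be built by joining the card fields with a few style hints. This is only a sketch: the `build_image_prompt` helper and the default style hints are hypothetical, not the actual tricks we used, and the request to the Wombo API is omitted.

```python
def build_image_prompt(name, card_type, subtypes=(),
                       style_hints=("fantasy illustration", "detailed")):
    """Combine card fields and style hints into one comma-separated prompt."""
    parts = [name, card_type, *subtypes, *style_hints]
    # Drop empty pieces so a card without subtypes still yields a clean prompt.
    return ", ".join(p.strip() for p in parts if p and p.strip())


print(build_image_prompt("Storm Herald", "Creature", ["Elemental"]))
```

Putting the card name first keeps the subject of the image front and center, while the trailing hints nudge the overall art style.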

The images generated were impressive! You can see from some examples below that the images captured the themes nicely and were beautifully illustrated to boot.

Getting highly realistic and playable cards via the Cohere and Wombo APIs

More examples here:

We didn’t stop there. We also wanted to use AI to generate the card back and the Mana icons. So we created some prompts and sent them to the Wombo API to generate the images.

Here’s what we got for the card back (except for the title text, which we had to add manually):

The card back generated by the Wombo API

And here are the Mana icons:

The Mana icons generated by the Wombo API

Step 3: Serving the application

With all the ingredients ready, it was time to render a complete card. We pieced it together with a bunch of CSS and HTML, which honestly turned out to be the hardest part of the whole project!
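To give a feel for the rendering step, here is a small sketch that fills an HTML template with the generated fields. The markup, the class names, and the `render_card` helper are assumptions for illustration and far simpler than a real card layout. Generated text is escaped before it goes into the page, since model output should be treated like any other untrusted input.

```python
from html import escape

# Hypothetical, heavily simplified card markup; styling would live in CSS.
CARD_TEMPLATE = """<div class="card">
  <div class="card-header">
    <span class="card-name">{name}</span>
    <span class="card-cost">{cost}</span>
  </div>
  <img class="card-art" src="{image_url}" alt="{name}">
  <div class="card-type">{type_line}</div>
  <div class="card-text">{text}</div>
</div>"""


def render_card(card, image_url):
    """Fill the template with escaped card fields and the image URL."""
    type_line = card["type"]
    if card.get("subtype"):
        type_line += " - " + card["subtype"]
    return CARD_TEMPLATE.format(
        name=escape(card["name"]),
        cost=escape(card.get("cost", "")),
        image_url=escape(image_url),
        type_line=escape(type_line),
        text=escape(card.get("text", "")),
    )


page = render_card(
    {"name": "Fire & Ice", "cost": "{1}{R}", "type": "Instant",
     "text": "Deal <2> damage to any target."},
    "https://example.com/art.png",
)
print(page)
```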

We hosted it on Urza’s AI, a website where you can enter a card name and it will take care of the rest, rendering a complete card. Try it out here!

The Outcome

We didn’t know how this project would turn out, but the quality of the cards we saw has been mind-blowing. They felt super realistic and were fully playable. I have actually generated complete decks and printed them out, and the games I have played with them have been wild!

Thousands of people seem to agree as well. The Urza’s AI website got more than 38,000 visitors within the first four days of the launch, accumulating more than 183,000 events. We also made it to the front page of Hacker News!

Usage continues to grow, totaling more than 40,000 users at the time of writing.

Getting 38k users and 183k events within four days of launch

Getting the servers up and running to host the site, and surviving the hug of death from Hacker News, was not that difficult because all the AI work is done via network-based API requests. Our servers don’t even have GPUs. They just make requests to Cohere and Wombo!
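The shape of that architecture can be sketched as a thin orchestration layer: the server only builds prompts, forwards them to remote services, and parses what comes back. In the sketch below, `generate_text` and `generate_image` are hypothetical callables standing in for the Cohere and Wombo requests, and the "Field: value" output format is an assumption. The stubs return canned data so the example runs without network access.

```python
def parse_card_text(name, generated):
    """Parse 'Field: value' lines from the model output into a dictionary."""
    card = {"name": name}
    for line in generated.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            card[key.strip().lower()] = value.strip()
    return card


def generate_card(name, generate_text, generate_image):
    """Orchestrate the two remote calls; no model runs on this machine."""
    raw = generate_text(f"Name: {name}\n")
    card = parse_card_text(name, raw)
    card["image_url"] = generate_image(f"{name}, {card.get('type', '')}")
    return card


# Stubs standing in for the remote APIs (assumptions, not real client code).
def fake_text(prompt):
    return "Cost: {2}{U}\nType: Creature\nText: Flying"


def fake_image(prompt):
    slug = prompt.split(",")[0].replace(" ", "-")
    return f"https://example.com/{slug}.png"


print(generate_card("Sky Warden", fake_text, fake_image))
```

Because the heavy lifting happens behind the two callables, the web server's own work per request stays small, which is what made the traffic spike manageable.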

What Made This Possible

LLMs are now emerging as a general-purpose answer to implementing NLP use cases. Currently, the common approach to machine learning in NLP is having specific models for specific use cases, with little flexibility within a model to perform different kinds of tasks. But with prompt engineering, we can use the same model to adapt to various use cases by shaping how we want the model to produce its outputs.

With Urza’s AI, that use case is text generation. We conditioned a model with Magic card information, and because it has been pre-trained with a massive corpus of text data, it was able to capture the prompt’s context. And it duly returned the kind of generation we were looking for.

But we are only just scratching the surface. Prompt engineering works beyond just simple text generation. We can adapt it to other tasks such as extracting information from a piece of text, paraphrasing, summarizing, classifying, and many more. LLMs’ versatility continues to surprise, and their applications continue to traverse various industries and verticals.

And if that’s still not enough, we can take an already strong pre-trained LLM up another level with finetuning. With Urza’s AI, the information we needed had a particular pattern and theme, and a baseline LLM would not have been attuned to this nuance. So we finetuned the model with the dataset, which was by no means massive, and saw a marked improvement in the quality of the output.

Finetuning a model with the Cohere platform is very accessible. You don’t need to be good at machine learning to be able to do it. It’s all done via a user interface.

Finetuning with Cohere is a simple step via a user interface

What Does This Mean

It’s incredible to think that via a simple API call, developers can have access to these massive LLMs, which were previously within reach only of the big tech companies. Teams looking to leverage this technology can do so without having to worry about building the expertise, infrastructure, and dataset, or about the associated costs of training and serving a model.

I really think this is the future of AI development. The most cutting-edge neural networks will always be prohibitively large for most teams to host. So they will just make requests to hosted APIs like Cohere or Wombo.

I’m excited about empowering makers and innovators to use AI as a tool of creation. APIs such as these level the playing field, opening up access to those who don’t have the finances and expertise to build and operate AI systems. Instead of having to mess with the technology, they can now focus solely on creating value.

Try out Cohere Generate here!