Cloudflare Docs
Workers AI

Text Generation

A family of generative text models, such as large language models (LLMs), that can be adapted to a variety of natural language tasks.

  • Task type: text-generation
  • TypeScript class: AiTextGeneration

Available Text Generation Models

List of available models for this task type:

Model ID: @cf/meta/llama-2-7b-chat-fp16
Description: Full precision (fp16) generative text model with 7 billion parameters from Meta
  • Default max (sequence) tokens (stream): 2500
  • Default max (sequence) tokens: 256
  • Context tokens limit: 3072
  • Sequence tokens limit: 2500

Model ID: @cf/meta/llama-2-7b-chat-int8
Description: Quantized (int8) generative text model with 7 billion parameters from Meta
  • Default max (sequence) tokens (stream): 1800
  • Default max (sequence) tokens: 256
  • Context tokens limit: 2048
  • Sequence tokens limit: 1800

Model ID: @cf/mistral/mistral-7b-instruct-v0.1
Description: Instruct fine-tuned version of the Mistral-7b generative text model with 7 billion parameters
  • Default max (sequence) tokens (stream): 1800
  • Default max (sequence) tokens: 256

Model ID: @hf/codellama/codellama-7b-hf
Description: Generative text model built on top of Llama 2, fine-tuned for generating and discussing code
  • Default max (sequence) tokens (stream): 1800
  • Default max (sequence) tokens: 256
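
The limits listed above can be encoded in a small lookup so that a requested `max_tokens` value stays within a model's sequence tokens limit. The following TypeScript is a minimal sketch under stated assumptions: `SEQUENCE_TOKEN_LIMITS` and `clampMaxTokens` are hypothetical names and not part of `@cloudflare/ai`, and capping the value on the client side is a design choice of this example, not documented behavior of the service.

```typescript
// Sequence token limits copied from the model list above.
// Models without a listed limit are intentionally omitted.
const SEQUENCE_TOKEN_LIMITS: Record<string, number> = {
  "@cf/meta/llama-2-7b-chat-fp16": 2500,
  "@cf/meta/llama-2-7b-chat-int8": 1800,
};

// Default for non-streaming requests, as listed for every model above.
const DEFAULT_MAX_TOKENS = 256;

// Hypothetical helper: returns a max_tokens value capped at the listed limit.
function clampMaxTokens(model: string, requested: number = DEFAULT_MAX_TOKENS): number {
  const limit = SEQUENCE_TOKEN_LIMITS[model];
  // No limit is listed for this model, so pass the request through unchanged.
  if (limit === undefined) return requested;
  return Math.min(requested, limit);
}
```

The returned value could then be passed as `max_tokens` in the request input, for example `{ messages, max_tokens: clampMaxTokens(model, 5000) }`.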

Examples - chat style with system prompt (preferred)


// Worker (TypeScript): stream the model output to the client as server-sent events.
import { Ai } from '@cloudflare/ai';

export interface Env {
  // AI binding passed to the Ai constructor below
  AI: any;
}

export default {
  async fetch(request: Request, env: Env) {
    const ai = new Ai(env.AI);
    const messages = [
      { role: 'system', content: 'You are a friendly assistant' },
      { role: 'user', content: 'What is the origin of the phrase Hello, World' }
    ];
    // With `stream: true`, the result is a stream instead of a JSON object.
    const stream = await ai.run('@cf/meta/llama-2-7b-chat-int8', {
      messages,
      stream: true
    });
    return new Response(
      stream,
      { headers: { "content-type": "text/event-stream" } }
    );
  },
};

// Worker (TypeScript): wait for the full completion and return it as JSON.
import { Ai } from '@cloudflare/ai';

export interface Env {
  // AI binding passed to the Ai constructor below
  AI: any;
}

export default {
  async fetch(request: Request, env: Env) {
    const ai = new Ai(env.AI);
    const messages = [
      { role: 'system', content: 'You are a friendly assistant' },
      { role: 'user', content: 'What is the origin of the phrase Hello, World' }
    ];
    const response = await ai.run('@cf/meta/llama-2-7b-chat-int8', { messages });
    return Response.json(response);
  },
};

// JavaScript: call the REST API with fetch.
// Replace {ACCOUNT_ID} and {API_TOKEN} with your own values.
async function run(model, prompt) {
  const messages = [
    { role: 'system', content: 'You are a friendly assistant' },
    { role: 'user', content: prompt }
  ];
  const response = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/${model}`,
    {
      headers: { Authorization: "Bearer {API_TOKEN}" },
      method: "POST",
      body: JSON.stringify({ messages }),
    }
  );
  const result = await response.json();
  return result;
}

run('@cf/meta/llama-2-7b-chat-int8', 'Tell me a story').then((response) => {
  console.log(JSON.stringify(response));
});

# Python: call the REST API with the requests library.
# Replace {ACCOUNT_ID} and {API_TOKEN} with your own values.
import requests

API_BASE_URL = "https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/"
headers = {"Authorization": "Bearer {API_TOKEN}"}


def run(model, prompt):
    # Named `payload` to avoid shadowing the built-in `input`.
    payload = {
        "messages": [
            {"role": "system", "content": "You are a friendly assistant"},
            {"role": "user", "content": prompt},
        ]
    }
    response = requests.post(f"{API_BASE_URL}{model}", headers=headers, json=payload)
    return response.json()


output = run("@cf/meta/llama-2-7b-chat-int8", "Tell me a story")
print(output)

$ curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/llama-2-7b-chat-int8 \
    -X POST \
    -H "Authorization: Bearer {API_TOKEN}" \
    -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good" }]}'

Responses

Non-streaming response


{
  "response": "The origin of the phrase \"Hello, World\" is not well-documented, but it is believed to have originated in the early days of computing. In the 1970s, when personal computers were first becoming popular, many programming languages, including C, had a simple \"Hello, World\" program that was used to demonstrate the basics of programming.\nThe idea behind the program was to print the words \"Hello, World\" on the screen, and it was often used as a first program for beginners to learn the basics of programming. Over time, the phrase \"Hello, World\" became a common greeting among programmers and computer enthusiasts, and it is now widely recognized as a symbol of the computing industry.\nIt's worth noting that the phrase \"Hello, World\" is not a specific phrase that was coined by any one person or organization, but rather a catchphrase that evolved over time as a result of its widespread use in the computing industry."
}

Handling streaming responses in the client

A streaming response is returned in the server-sent events (SSE) format. Below is an example showing how to parse this response in JavaScript, from the browser:


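const source = new EventSource("/"); // Workers AI streaming endpoint
// Assumption: the page contains an element with id "output" (a placeholder name).
const el = document.getElementById("output");
source.onmessage = (event) => {
  // The stream ends with a literal [DONE] message.
  if (event.data === "[DONE]") {
    source.close();
    return;
  }
  const data = JSON.parse(event.data);
  // Append as text so the model output is never interpreted as HTML.
  el.textContent += data.response;
};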
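
Outside the browser, `EventSource` may not be available, so the SSE text has to be parsed by hand. The following TypeScript is a minimal sketch of that parsing, assuming the event shape shown in the example above: each event is a `data:` line carrying a JSON object with a `response` field, and the stream ends with `data: [DONE]`. The helper name `collectSseResponse` is hypothetical, and the sketch expects complete events, so a real client reading network chunks would need to buffer partial lines first.

```typescript
// Hypothetical helper: concatenates the `response` fragments found in raw SSE text.
function collectSseResponse(raw: string): string {
  let text = "";
  for (const line of raw.split(/\r?\n/)) {
    // Only `data:` lines carry payloads; skip blank separators and other fields.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    // End-of-stream sentinel, as in the browser example.
    if (payload === "[DONE]") break;
    const parsed = JSON.parse(payload) as { response?: string };
    text += parsed.response ?? "";
  }
  return text;
}
```

Calling it on the body text of a finished streaming response returns the full generated text as a single string.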

API schema

The following schema is based on JSON Schema.

Input


{
  "type": "object",
  "oneOf": [
    {
      "properties": {
        "prompt": {
          "type": "string"
        },
        "stream": {
          "type": "boolean",
          "default": false
        },
        "max_tokens": {
          "type": "integer",
          "default": 256
        }
      },
      "required": [
        "prompt"
      ]
    },
    {
      "properties": {
        "messages": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "role": {
                "type": "string"
              },
              "content": {
                "type": "string"
              }
            },
            "required": [
              "role",
              "content"
            ]
          }
        },
        "stream": {
          "type": "boolean",
          "default": false
        },
        "max_tokens": {
          "type": "integer",
          "default": 256
        }
      },
      "required": [
        "messages"
      ]
    }
  ]
}

TypeScript class: AiTextGenerationInput
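
The `oneOf` in the input schema means a request matches either the `prompt` branch or the `messages` branch, but not both. The following TypeScript is a simplified sketch of that rule. The type names and the `isValidInput` helper are illustrative and are not the definitions shipped in `@cloudflare/ai`; a full JSON Schema validator would be the more thorough choice.

```typescript
// Illustrative types mirroring the input schema above; not the library's own definitions.
interface ChatMessage {
  role: string;
  content: string;
}

interface CommonOptions {
  stream?: boolean;    // schema default: false
  max_tokens?: number; // schema default: 256
}

type PromptInput = CommonOptions & { prompt: string };
type MessagesInput = CommonOptions & { messages: ChatMessage[] };
type TextGenerationInput = PromptInput | MessagesInput;

// Hypothetical validator: true when `value` matches exactly one branch of `oneOf`.
function isValidInput(value: unknown): value is TextGenerationInput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  // Both branches type these optional fields the same way.
  if (v.stream !== undefined && typeof v.stream !== "boolean") return false;
  if (v.max_tokens !== undefined && !Number.isInteger(v.max_tokens)) return false;

  const hasPrompt = typeof v.prompt === "string";
  const hasMessages =
    Array.isArray(v.messages) &&
    v.messages.every(
      (m) =>
        typeof m === "object" &&
        m !== null &&
        typeof (m as ChatMessage).role === "string" &&
        typeof (m as ChatMessage).content === "string"
    );
  // Reject inputs that match both branches or neither.
  return hasPrompt !== hasMessages;
}
```

For example, `isValidInput({ prompt: "Tell me a story" })` passes, while an object carrying both a valid `prompt` and valid `messages` is rejected.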

Output


{
  "oneOf": [
    {
      "type": "object",
      "contentType": "application/json",
      "properties": {
        "response": {
          "type": "string"
        }
      }
    },
    {
      "type": "string",
      "contentType": "text/event-stream",
      "format": "binary"
    }
  ]
}

TypeScript class: AiTextGenerationOutput
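
Because the output is either a JSON object with a `response` string or an event stream, calling code usually has to tell the two apart. The following TypeScript is a minimal sketch of one way to do so. `JsonOutput`, `isJsonOutput`, and `contentTypeFor` are hypothetical names, and treating every value that is not a JSON output as a stream is an assumption of this example.

```typescript
// Illustrative shape of the non-streaming branch of the output schema.
type JsonOutput = { response: string };

// Type guard: true when `output` is an object with a string `response` field.
function isJsonOutput(output: unknown): output is JsonOutput {
  return (
    typeof output === "object" &&
    output !== null &&
    typeof (output as { response?: unknown }).response === "string"
  );
}

// Picks the content type the schema lists for each branch.
// Assumption: anything that is not a JSON output is treated as a stream.
function contentTypeFor(output: unknown): "application/json" | "text/event-stream" {
  return isJsonOutput(output) ? "application/json" : "text/event-stream";
}
```

A handler that supports both modes could use `contentTypeFor` to set the `content-type` header before returning the result.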