
Automatic Speech Recognition

Automatic speech recognition (ASR) models convert a speech signal, typically an audio input, to text.

  • Task type: speech-recognition
  • TypeScript class: AiSpeechRecognition

Available Models

List of available models for this task type:

Model ID              Description
@cf/openai/whisper    Automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data

Examples


import { Ai } from "@cloudflare/ai";

export interface Env {
  // Workers AI binding
  AI: any;
}

export default {
  async fetch(request: Request, env: Env) {
    // Download a sample WAV file to transcribe
    const res = await fetch("https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav");
    const blob = await res.arrayBuffer();

    const ai = new Ai(env.AI);
    // Pass the audio as an array of byte values, matching the object form of the input schema
    const input = {
      audio: [...new Uint8Array(blob)],
    };
    const response = await ai.run("@cf/openai/whisper", input);

    // Return the transcription; the audio bytes are left out of the echoed input
    return Response.json({ input: { audio: [] }, response });
  },
};

$ curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/openai/whisper \
-X POST \
-H "Authorization: Bearer {API_TOKEN}" \
--data-binary @talking-llama.mp3
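
The same REST call can be made from TypeScript. The following is a minimal sketch, not an official client: `buildTranscriptionRequest` and `TranscriptionRequest` are hypothetical names introduced here, and the helper only assembles the URL, header, and body that the curl command above sends.

```typescript
// Hypothetical helper (not part of any Cloudflare SDK): assembles the same
// request that the curl command sends, so it can be handed to fetch().
interface TranscriptionRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: Uint8Array };
}

function buildTranscriptionRequest(
  accountId: string,
  apiToken: string,
  audio: Uint8Array,
): TranscriptionRequest {
  return {
    // Same endpoint as the curl example; the model ID is part of the path
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/openai/whisper`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      // Raw audio bytes, the counterpart of curl's --data-binary
      body: audio,
    },
  };
}
```

Assuming a runtime with a global `fetch`, the result can be used as `const { url, init } = buildTranscriptionRequest(accountId, apiToken, bytes); const res = await fetch(url, init);`. The shape of the REST response body is not described on this page, so the sketch stops at building the request.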

API schema

The following schema is based on JSON Schema.

Input


{
  "oneOf": [
    {
      "type": "string",
      "format": "binary"
    },
    {
      "type": "object",
      "properties": {
        "audio": {
          "type": "array",
          "items": {
            "type": "number"
          }
        }
      }
    }
  ]
}

TypeScript class: AiSpeechRecognitionInput
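
To illustrate the object form of the input schema, here is a small sketch that converts raw bytes into `{ audio: number[] }` and checks a value against that shape. The names `AudioArrayInput`, `toAudioArrayInput`, and `isAudioArrayInput` are made up for this example and are not the SDK's `AiSpeechRecognitionInput`. The check is also slightly stricter than the schema, which does not mark `audio` as required.

```typescript
// Illustrative type for the object alternative of the input schema.
// Hypothetical name; the SDK's own type is AiSpeechRecognitionInput.
interface AudioArrayInput {
  audio: number[];
}

// Convert raw audio bytes into the array-of-numbers form.
function toAudioArrayInput(bytes: Uint8Array): AudioArrayInput {
  return { audio: Array.from(bytes) };
}

// Runtime check: an object whose `audio` property is an array of numbers.
// Assumption: `audio` is treated as required here, unlike in the schema.
function isAudioArrayInput(value: unknown): value is AudioArrayInput {
  if (typeof value !== "object" || value === null) return false;
  const audio = (value as { audio?: unknown }).audio;
  return Array.isArray(audio) && audio.every((n) => typeof n === "number");
}
```

The other alternative in the schema, a binary string, corresponds to sending the audio file directly as the request body, as the curl example does.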

Output


{
  "type": "object",
  "contentType": "application/json",
  "properties": {
    "text": {
      "type": "string"
    }
  }
}

TypeScript class: AiSpeechRecognitionOutput
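
When the result arrives as untyped JSON, it can help to validate it against the output schema before reading `text`. The sketch below assumes only what the schema states, an object with a string `text` property. `SpeechRecognitionResult` and `parseSpeechRecognitionOutput` are hypothetical names, not part of the SDK.

```typescript
// Illustrative type mirroring the output schema.
// Hypothetical name; the SDK's own type is AiSpeechRecognitionOutput.
interface SpeechRecognitionResult {
  text: string;
}

// Narrow an unknown value to the documented output shape, or throw.
function parseSpeechRecognitionOutput(value: unknown): SpeechRecognitionResult {
  if (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { text?: unknown }).text === "string"
  ) {
    // Keep only the documented field
    return { text: (value as { text: string }).text };
  }
  throw new Error("Unexpected speech recognition output: missing string `text`");
}
```

In the Worker example above, this could wrap the value returned by `ai.run(...)` before the transcription is used.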