API Usage Guide

A multi-model, OpenAI-compatible API built on Cloudflare Workers AI

Available Models

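| Model ID | Parameters | Context | Type | Pricing (per million tokens) |
| --- | --- | --- | --- | --- |
| @cf/openai/gpt-oss-120b | 120B | 128K | Reasoning model (CoT) | $0.35 input / $0.75 output |
| @cf/openai/gpt-oss-20b | 20B | 128K | Reasoning model (CoT) | $0.10 input / $0.30 output |

Free tier: 10,000 Neurons per day, shared across all models.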
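
As a rough illustration of how the per-million-token prices translate into a per-request cost, the sketch below multiplies token counts by the listed rates. The function name `estimate_cost` is hypothetical, and the calculation assumes that `completion_tokens` covers everything billed as output (including reasoning tokens), which this guide does not state. It does not attempt to convert dollars to Neurons.

```python
# Prices in USD per million tokens, copied from the model list in this guide.
PRICES = {
    "@cf/openai/gpt-oss-120b": {"input": 0.35, "output": 0.75},
    "@cf/openai/gpt-oss-20b": {"input": 0.10, "output": 0.30},
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return a rough cost estimate in USD for a single request.

    Assumption: completion_tokens includes every token billed as output.
    """
    price = PRICES[model]
    total = prompt_tokens * price["input"] + completion_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Token counts borrowed from the sample "usage" object later in this guide.
    print(f"{estimate_cost('@cf/openai/gpt-oss-120b', 74, 43):.8f} USD")
```

The token counts can be read from the `usage` object of a non-streaming response.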

Endpoints

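| Method | Path | Description |
| --- | --- | --- |
| GET | / | Interactive Playground |
| GET | /health | Health check |
| GET | /v1/models | List all models |
| GET | /v1/models/:id | Retrieve a single model |
| POST | /v1/chat/completions | Chat completion (OpenAI format) |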
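
The GET endpoints can be called with nothing but the standard library. The following is a minimal sketch: `endpoint_url` and `get_json` are hypothetical helper names, the base URL is the one used in the usage examples of this guide, and it is an assumption that `/health` and `/v1/models` return JSON bodies.

```python
import json
import urllib.request

# Base URL taken from the usage examples in this guide.
BASE_URL = "https://gpt-oss.vibelab.uk"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON.

    This performs a network call. Assumption: the endpoint responds with JSON.
    """
    with urllib.request.urlopen(endpoint_url(path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `get_json("/health")` or `get_json("/v1/models")` would then return the decoded response, whose exact shape is not documented here.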

Request Parameters

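| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | @cf/openai/gpt-oss-120b | Model to use |
| messages | array | | Array of {role, content} objects (required) |
| temperature | number | 0.6 | Sampling temperature (0 to 5) |
| max_tokens | integer | 128000 | Maximum number of tokens to generate (up to 128K) |
| top_p | number | 1 | Nucleus sampling parameter (0.001 to 1) |
| top_k | integer | | Top-k sampling (1 to 50) |
| stream | boolean | false | Enable SSE streaming output |
| seed | integer | | Random seed, for reproducibility |
| repetition_penalty | number | | Repetition penalty (0 to 2) |
| frequency_penalty | number | | Frequency penalty, reduces verbatim repetition (-2 to 2) |
| presence_penalty | number | | Presence penalty, encourages new topics (-2 to 2) |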
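
The documented ranges lend themselves to a client-side check before a request is sent. The sketch below is one possible way to do that, not the Worker's own validation logic. `validate_request` is a hypothetical name, the ranges are treated as inclusive, the lower bound of 1 for `max_tokens` is an assumption, and so is the requirement that `messages` be non-empty.

```python
# Allowed numeric ranges, copied from the request parameter list in this guide.
RANGES = {
    "temperature": (0, 5),
    "max_tokens": (1, 128_000),  # lower bound of 1 is an assumption
    "top_p": (0.001, 1),
    "top_k": (1, 50),
    "repetition_penalty": (0, 2),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}


def validate_request(payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages is required and must be a non-empty array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
                problems.append(f"messages[{i}] must be an object with role and content")

    for name, (low, high) in RANGES.items():
        value = payload.get(name)
        if value is None:
            continue  # optional parameter not supplied
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")

    return problems
```

For example, a payload with `"top_k": 99` would be reported as out of range before any network call is made.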

Controlling Reasoning Depth

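Both models are reasoning models (similar to OpenAI's o1/o3 series). Every response includes a reasoning_content field (the chain of thought), and reasoning cannot be turned off entirely.

However, reasoning depth can be controlled through the reasoning.effort parameter of the Cloudflare REST API: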

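| Level | Description | Use cases |
| --- | --- | --- |
| low | Minimal reasoning, fast responses, fewer tokens consumed | Simple Q&A, translation, formatting |
| medium | Moderate reasoning depth | General tasks, writing, summarization |
| high | Deep reasoning (default) | Math, logic, code, complex analysis |

Note: the reasoning.effort parameter is only available through the Cloudflare REST API (/ai/v1/responses). This Worker's /v1/chat/completions endpoint does not yet pass it through. To control reasoning depth, call the Cloudflare API directly.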

Controlling reasoning depth through the Cloudflare REST API

curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/v1/responses \
  -H "Authorization: Bearer {API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@cf/openai/gpt-oss-120b",
    "input": "Explain quantum entanglement",
    "reasoning": {
      "effort": "low",
      "summary": "concise"
    }
  }'
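
The same request can be assembled in Python. This is a minimal sketch using only the standard library; `build_responses_request` is a hypothetical helper, and the body simply mirrors the cURL call above. The function only builds the request object and does not send it.

```python
import json
import urllib.request

# Effort levels listed in this guide.
ALLOWED_EFFORTS = ("low", "medium", "high")


def build_responses_request(
    account_id: str,
    api_token: str,
    prompt: str,
    effort: str = "low",
    model: str = "@cf/openai/gpt-oss-120b",
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Cloudflare /ai/v1/responses endpoint."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}, got {effort!r}")

    url = f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1/responses"
    body = {
        "model": model,
        "input": prompt,
        # "summary": "concise" mirrors the cURL example above.
        "reasoning": {"effort": effort, "summary": "concise"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The response body of this endpoint is not described in this guide, so parsing is left out.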

Usage Examples

cURL: non-streaming (gpt-oss-120b)

curl -X POST https://gpt-oss.vibelab.uk/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@cf/openai/gpt-oss-120b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Cloudflare Workers AI?"}
    ],
    "temperature": 0.7,
    "max_tokens": 128000
  }'

cURL: streaming (gpt-oss-20b)

curl -X POST https://gpt-oss.vibelab.uk/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@cf/openai/gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Tell me a joke"}
    ],
    "stream": true
  }'

Python: OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://gpt-oss.vibelab.uk/v1",
    api_key="not-needed",  # no API key required
)

response = client.chat.completions.create(
    model="@cf/openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128000,
)

print(response.choices[0].message.content)

Node.js: OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gpt-oss.vibelab.uk/v1",
  apiKey: "not-needed",  // no API key required
});

const completion = await client.chat.completions.create({
  model: "@cf/openai/gpt-oss-20b",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(completion.choices[0].message.content);

Python: streaming output

from openai import OpenAI

client = OpenAI(
    base_url="https://gpt-oss.vibelab.uk/v1",
    api_key="not-needed",
)

stream = client.chat.completions.create(
    model="@cf/openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)

Response Format

Non-streaming response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "@cf/openai/gpt-oss-120b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you?",
      "reasoning_content": "The user said hello..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 74,
    "completion_tokens": 43,
    "total_tokens": 117
  }
}
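
Because the answer and the chain of thought arrive in separate fields, a client usually wants to pull them apart. The sketch below works on the decoded JSON body as a plain dict; `summarize_completion` is a hypothetical helper name, and it reads only the first choice.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the answer, the reasoning text, and token usage from a response dict."""
    choice = response["choices"][0]  # only the first choice is inspected
    message = choice["message"]
    usage = response.get("usage", {})
    return {
        "answer": message.get("content") or "",
        "reasoning": message.get("reasoning_content") or "",
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


if __name__ == "__main__":
    # Trimmed-down copy of the sample response shown above.
    sample = {
        "choices": [{
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! How can I help you?",
                "reasoning_content": "The user said hello...",
            },
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 74, "completion_tokens": 43, "total_tokens": 117},
    }
    print(summarize_completion(sample)["answer"])
```

Keeping the reasoning text in its own key makes it easy to log it for debugging while showing only the answer to end users.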

Streaming response (SSE)

Each chunk follows the OpenAI SSE format:

// Reasoning process (chain of thought)
data: {"choices": [{"delta": {"reasoning_content": "Let me think..."}}]}

// Final answer
data: {"choices": [{"delta": {"content": "Hello"}}]}

// End marker
data: [DONE]
data: [DONE]
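
For clients that read the stream without an SDK, the chunk format above can be parsed line by line. The following is a minimal sketch, assuming each event is a single `data:` line carrying one JSON object; `iter_sse_deltas` is a hypothetical name. It tags every piece of text as either reasoning or final content and stops at the `[DONE]` marker.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield ("reasoning", text) or ("content", text) for each SSE data line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning_content"):
                yield ("reasoning", delta["reasoning_content"])
            if delta.get("content"):
                yield ("content", delta["content"])


if __name__ == "__main__":
    demo = [
        'data: {"choices": [{"delta": {"reasoning_content": "Let me think..."}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "Hello"}}]}',
        "",
        "data: [DONE]",
    ]
    for kind, text in iter_sse_deltas(demo):
        print(kind, text)
```

Any iterable of decoded text lines can be fed in, for instance the lines of an HTTP response body read incrementally.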

Notes