Multi-model OpenAI-compatible API built on Cloudflare Workers AI
| Model ID | Parameters | Context | Type | Pricing (per 1M tokens) |
|---|---|---|---|---|
| @cf/openai/gpt-oss-120b | 120B | 128K | Reasoning model (CoT) | $0.35 input / $0.75 output |
| @cf/openai/gpt-oss-20b | 20B | 128K | Reasoning model (CoT) | $0.10 input / $0.30 output |
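As a quick sanity check on the pricing, the cost of a request can be estimated from the token counts reported in the `usage` field. This is a sketch; the per-million-token rates are copied from the table above and may change:

```python
# Prices in USD per million tokens, copied from the pricing table above (may change).
PRICES = {
    "@cf/openai/gpt-oss-120b": {"input": 0.35, "output": 0.75},
    "@cf/openai/gpt-oss-20b": {"input": 0.10, "output": 0.30},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# 10K prompt tokens and 2K completion tokens on the 120B model:
print(f"${estimate_cost('@cf/openai/gpt-oss-120b', 10_000, 2_000):.6f}")  # prints $0.005000
```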
| Method | Path | Description |
|---|---|---|
| GET | / | Interactive Playground |
| GET | /health | Health check |
| GET | /v1/models | List all models |
| GET | /v1/models/:id | Get a single model |
| POST | /v1/chat/completions | Chat completions (OpenAI format) |
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | @cf/openai/gpt-oss-120b | Model to use |
| messages | array | — | Array of {role, content} objects (required) |
| temperature | number | 0.6 | Sampling temperature (0 – 5) |
| max_tokens | integer | 128000 | Maximum tokens to generate (up to 128K) |
| top_p | number | 1 | Nucleus sampling (0.001 – 1) |
| top_k | integer | — | Top-k sampling (1 – 50) |
| stream | boolean | false | Enable SSE streaming |
| seed | integer | — | Random seed for reproducibility |
| repetition_penalty | number | — | Repetition penalty (0 – 2) |
| frequency_penalty | number | — | Frequency penalty; reduces verbatim repetition (-2 – 2) |
| presence_penalty | number | — | Presence penalty; encourages new topics (-2 – 2) |
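The numeric ranges above can be checked client-side before a request is sent. The helper below is a hypothetical convenience, not part of the API (the server performs its own validation); the limits are taken directly from the table:

```python
# Valid ranges from the parameter table above (client-side sketch;
# the server performs its own validation).
RANGES = {
    "temperature": (0, 5),
    "top_p": (0.001, 1),
    "top_k": (1, 50),
    "max_tokens": (1, 128_000),
    "repetition_penalty": (0, 2),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}

def validate_params(params: dict) -> list[str]:
    """Return a list of human-readable errors for out-of-range values."""
    errors = []
    for name, (lo, hi) in RANGES.items():
        if name in params and not lo <= params[name] <= hi:
            errors.append(f"{name}={params[name]} outside [{lo}, {hi}]")
    return errors

print(validate_params({"temperature": 0.7, "top_p": 1}))  # → []
print(validate_params({"temperature": 9}))                # → ['temperature=9 outside [0, 5]']
```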
Both models are reasoning models (similar to OpenAI's o1/o3 series): every response includes a reasoning_content field (the chain of thought), which cannot be fully disabled.
However, the depth of reasoning can be controlled with the reasoning.effort parameter of the Cloudflare REST API:
| Level | Description | Best for |
|---|---|---|
| low | Minimal reasoning; fast responses, fewer tokens | Simple Q&A, translation, formatting |
| medium | Moderate reasoning depth | General tasks, writing, summarization |
| high | Deep reasoning (default) | Math, logic, code, complex analysis |
The reasoning.effort parameter is only available through the Cloudflare REST API (/ai/v1/responses); this Worker's /v1/chat/completions endpoint does not yet pass it through. To control reasoning depth, call the Cloudflare API directly:
```bash
curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/v1/responses \
  -H "Authorization: Bearer {API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@cf/openai/gpt-oss-120b",
    "input": "Explain quantum entanglement",
    "reasoning": {
      "effort": "low",
      "summary": "concise"
    }
  }'
```
A basic, non-streaming request:

```bash
curl -X POST https://gpt-oss.vibelab.uk/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@cf/openai/gpt-oss-120b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Cloudflare Workers AI?"}
    ],
    "temperature": 0.7,
    "max_tokens": 128000
  }'
```
A streaming request (SSE):

```bash
curl -X POST https://gpt-oss.vibelab.uk/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@cf/openai/gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Tell me a joke"}
    ],
    "stream": true
  }'
```
Using the official openai Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gpt-oss.vibelab.uk/v1",
    api_key="not-needed",  # no API key required
)

response = client.chat.completions.create(
    model="@cf/openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128000,
)
print(response.choices[0].message.content)
```
Using the openai Node.js SDK:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gpt-oss.vibelab.uk/v1",
  apiKey: "not-needed", // no API key required
});

const completion = await client.chat.completions.create({
  model: "@cf/openai/gpt-oss-20b",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message.content);
```
Streaming with the Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://gpt-oss.vibelab.uk/v1",
    api_key="not-needed",
)

stream = client.chat.completions.create(
    model="@cf/openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
```
Example non-streaming response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "@cf/openai/gpt-oss-120b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you?",
      "reasoning_content": "The user said hello..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 74,
    "completion_tokens": 43,
    "total_tokens": 117
  }
}
```
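Because reasoning_content is a non-standard extension of the OpenAI response schema, typed SDK models may not expose it directly; parsing the raw JSON is the most portable way to read it. A sketch using an abbreviated copy of the sample response above:

```python
import json

# Abbreviated copy of the sample response shown above.
raw = '''{
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you?",
      "reasoning_content": "The user said hello..."
    },
    "finish_reason": "stop"
  }]
}'''

message = json.loads(raw)["choices"][0]["message"]
print("answer:   ", message["content"])
print("reasoning:", message["reasoning_content"])
```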
Each chunk follows the OpenAI SSE format:

```
// Reasoning (chain of thought)
data: {"choices": [{"delta": {"reasoning_content": "Let me think..."}}]}

// Final answer
data: {"choices": [{"delta": {"content": "Hello"}}]}

// End-of-stream marker
data: [DONE]
```
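The chunk format above can be consumed without an SDK. This minimal parser is a sketch that assumes well-formed `data:` lines exactly as shown; it separates reasoning deltas from answer deltas:

```python
import json

def parse_sse_line(line: str):
    """Parse one 'data: ...' SSE line into ('reasoning'|'content'|'done', text)."""
    payload = line.removeprefix("data: ").strip()
    if payload == "[DONE]":
        return ("done", "")
    delta = json.loads(payload)["choices"][0]["delta"]
    if "reasoning_content" in delta:
        return ("reasoning", delta["reasoning_content"])
    return ("content", delta.get("content", ""))

# The three example lines from above:
lines = [
    'data: {"choices": [{"delta": {"reasoning_content": "Let me think..."}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: [DONE]',
]
for line in lines:
    print(parse_sse_line(line))
```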
Quick reference:

- reasoning_content (the chain of thought) cannot be fully disabled; reduce it by setting reasoning.effort to "low" via the Cloudflare REST API.
- If model is omitted, @cf/openai/gpt-oss-120b is used by default.
- Point base_url at https://gpt-oss.vibelab.uk/v1 and pass any value as api_key.
- GET /v1/models lists all models; GET /v1/models/:id returns a single model.