gpt-oss-20b
OpenAI | Generally Available
gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...
Context: 131K tokens
Input: $0.030 per MTok
Output: $0.14 per MTok
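The listed prices can be turned into a quick per-request cost estimate. The following is a minimal sketch, assuming "MTok" means one million tokens and that input and output tokens are billed linearly at the listed rates; the function name `estimate_cost_usd` is hypothetical and not part of any API.

```python
# Listed prices, in US dollars per million tokens (MTok).
INPUT_USD_PER_MTOK = 0.030
OUTPUT_USD_PER_MTOK = 0.14


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming simple linear billing."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # A 2,000-token prompt with a 500-token reply.
    print(f"{estimate_cost_usd(2_000, 500):.6f}")  # prints 0.000130
```

Actual billing may round or batch usage differently, so treat the result as an approximation.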
Modalities
Input: Text
Output: Text
Advanced Capabilities
Multi-turn Tool Calling: chained tool calls in one session
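To illustrate what chaining tool calls involves on the client side, here is a rough sketch of a local dispatcher. It rests on assumptions this page does not confirm: that the `tools` parameter accepts OpenAI-style function definitions, and that each requested call arrives as a name plus a JSON-encoded argument string. The `get_weather` tool, the `REGISTRY` mapping, and the `run_tool_calls` helper are all hypothetical.

```python
import json
from typing import Any, Callable

# Assumption: an OpenAI-style function tool definition. This page does not
# specify the schema that the `tools` parameter expects.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return a forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> dict[str, str]:
    """Hypothetical stub tool; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list[dict[str, str]],
                   registry: dict[str, Callable[..., Any]]) -> list[dict[str, str]]:
    """Run each requested call and return tool-role messages for the history.

    Assumed call shape: {"name": str, "arguments": JSON-encoded string}.
    """
    messages = []
    for call in tool_calls:
        args = json.loads(call["arguments"])
        result = registry[call["name"]](**args)
        messages.append(
            {"role": "tool", "name": call["name"], "content": json.dumps(result)}
        )
    return messages
```

In a chained session, the returned messages would be appended to `messages` and sent back to the model, repeating until the model stops requesting tools.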
Code Examples
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/gpt-oss-20b \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'
API Parameters
| Name | Type | Description |
|---|---|---|
| input (required) | one of | Responses API input messages. See the OpenAI Responses API documentation for the supported content types. |
| messages (required) | array | An array of message objects representing the conversation history. |
| prompt (required) | string | The input text prompt for the model to generate a response. |
| requests (required) | array | (no description) |
| frequency_penalty | number | Decreases the likelihood of the model repeating the same lines verbatim. |
| functions (deprecated) | array | Deprecated. Use tools instead. |
| lora | string | Name of the LoRA (Low-Rank Adaptation) adapter to apply on top of the base model. |
| max_tokens | integer | The maximum number of tokens to generate in the response. |
| presence_penalty | number | Increases the likelihood of the model introducing new topics. |
| raw | boolean | If true, no chat template is applied and the input must follow the model's expected prompt format. |
| reasoning | object | Configuration for extended-thinking / reasoning mode. |
| repetition_penalty | number | Penalty for repeated tokens; higher values discourage repetition. |
| response_format | object | Constrains output to a JSON schema or an enum (structured outputs). |
| seed | integer | Random seed for reproducible generation. |
| stream | boolean | If true, the response is streamed back incrementally using server-sent events (SSE). |
| temperature | number | Controls the randomness of the output; higher values produce more random results. |
| tools | array | A list of tools available for the assistant to use. |
| top_k | integer | Restricts sampling to the k most probable tokens. Lower values make responses more focused; higher values introduce more variety. |
| top_p | number | Nucleus sampling threshold: restricts sampling to the smallest set of tokens whose cumulative probability reaches top_p. Lower values make outputs more predictable; higher values allow more varied responses. |
Sourced from the model's published API schema.