gpt-oss-120b

Provider: OpenAI
Status: Generally Available

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...

Context: 131K tokens
Input: $0.039 per MTok
Output: $0.18 per MTok
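
The per-MTok prices above can be turned into a per-request estimate with simple arithmetic. The following is a minimal sketch, assuming billing is linear in token count with no minimums, rounding rules, or discounts (none of which this page states). The name `estimate_cost_usd` is illustrative and not part of any API.

```python
# Prices copied from the listing above, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = 0.039
OUTPUT_USD_PER_MTOK = 0.18


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cost scales linearly with token counts. Actual billing
    rules may differ from this sketch.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_micro_usd = (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    )
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    # Example: 10,000 input tokens and 2,000 output tokens.
    print(f"${estimate_cost_usd(10_000, 2_000):.5f}")
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens comes to roughly $0.00075, with the output side contributing almost half despite being a fifth of the tokens.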

Modalities

Input: Text
Output: Text

Advanced Capabilities

Multi-turn Tool Calling: chained tool calls in one session
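
Chained tool calling usually means the client runs each tool the model asks for, appends the result to the conversation, and sends the conversation back. The sketch below shows only that client-side step. It assumes OpenAI-style tool-call objects (`id`, plus `function.name` and a JSON-encoded `function.arguments`) and a `tool` role for results; this page does not document the exact response shape, so treat those field names as assumptions. `get_weather`, `TOOL_REGISTRY`, and `apply_tool_calls` are hypothetical names made up for illustration.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical local tool; returns canned data for illustration."""
    return {"city": city, "forecast": "sunny"}


# Maps tool names the model may request to local Python callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def apply_tool_calls(
    messages: list[dict[str, Any]], tool_calls: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Run each requested tool and append its result as a 'tool' message.

    Assumes `messages` already ends with the assistant message that
    requested the calls, and that each call follows the (assumed)
    OpenAI-style shape described above. Returns a new list.
    """
    updated = list(messages)
    for call in tool_calls:
        function = call["function"]
        handler = TOOL_REGISTRY[function["name"]]
        arguments = json.loads(function["arguments"])
        result = handler(**arguments)
        updated.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            }
        )
    return updated
```

The returned list would then be sent as the next `messages` value, and the cycle repeats until the model stops requesting tools.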

Code Examples

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/gpt-oss-120b \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'
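
The same request can be assembled from Python using only the standard library. This is a sketch that mirrors the curl command above (same URL path, headers, and body) and makes no claim about the response format. `build_request` and `send_request` are illustrative helper names, not part of any SDK.

```python
import json
import urllib.request
from typing import Any


def build_request(
    account_id: str, token: str, messages: list[dict[str, Any]]
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    url = (
        "https://api.cloudflare.com/client/v4/accounts/"
        f"{account_id}/ai/run/gpt-oss-120b"
    )
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A typical call would read `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_AUTH_TOKEN` from the environment, pass them to `build_request` together with the system and user messages, and hand the result to `send_request`.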

API Parameters

Name (type, status): Description

input (one of, required): Responses API input messages. See the OpenAI Responses API documentation for the supported content types.
messages (array, required): An array of message objects representing the conversation history.
prompt (string, required): The input text prompt for the model to respond to.
requests (array, required)
frequency_penalty (number): Decreases the likelihood of the model repeating the same lines verbatim.
functions (array, deprecated): Deprecated. Use tools instead.
lora (string): Name of the LoRA (Low-Rank Adaptation) model used to fine-tune the base model.
max_tokens (integer): The maximum number of tokens to generate in the response.
presence_penalty (number): Increases the likelihood of the model introducing new topics.
raw (boolean): If true, no chat template is applied and you must follow the model's expected prompt formatting yourself.
reasoning (object): Configuration for extended-thinking (reasoning) mode.
repetition_penalty (number): Penalty for repeated tokens; higher values discourage repetition.
response_format (object): Constrains output to a JSON schema or an enum (structured outputs).
seed (integer): Random seed for reproducible generation.
stream (boolean): If true, the response is streamed back incrementally using Server-Sent Events (SSE).
temperature (number): Controls the randomness of the output; higher values produce more random results.
tools (array): A list of tools available for the assistant to use.
top_k (integer): Limits the model to choosing from the k most probable tokens. Lower values make responses more focused; higher values introduce more variety and potential surprises.
top_p (number): Adjusts the creativity of the responses by controlling how many candidate tokens the model considers. Lower values make outputs more predictable; higher values allow more varied and creative responses.
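
Several of the optional parameters above are commonly combined in a single request body. The sketch below builds such a body and omits any option left unset, so that the service's own defaults apply. The page gives no defaults or valid ranges, so the values in the usage note are illustrative assumptions, and `build_payload` is a hypothetical helper name.

```python
from typing import Any, Optional


def build_payload(
    messages: list[dict[str, Any]],
    *,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    seed: Optional[int] = None,
    stream: Optional[bool] = None,
) -> dict[str, Any]:
    """Build a request body from `messages` plus any options that were set.

    Parameter names come from the table above. Options left as None are
    dropped rather than sent as null.
    """
    optional: dict[str, Any] = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "seed": seed,
        "stream": stream,
    }
    payload: dict[str, Any] = {"messages": messages}
    payload.update(
        {name: value for name, value in optional.items() if value is not None}
    )
    return payload
```

For example, `build_payload(messages, max_tokens=256, temperature=0.2, seed=42)` yields a body with exactly those three options alongside `messages`, which can be serialized with `json.dumps` and used as the `-d` argument of the curl command.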

Sourced from the model's published API schema.