Gemma 3 12b IT

Provider: Google
Status: Generally Available

Context: 80K tokens

Input: $0.35 per MTok

Output: $0.56 per MTok
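
As a rough illustration of how the per-MTok prices above translate into a per-request cost, here is a minimal Python sketch. The function name `estimate_cost_usd` is hypothetical, and the arithmetic assumes billing is linear in token count with no minimums, rounding, or free allowance, which this page does not confirm.

```python
# Prices copied from the listing above, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = 0.35
OUTPUT_USD_PER_MTOK = 0.56


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Example: a 2,000-token prompt that produces a 500-token answer.
print(f"${estimate_cost_usd(2_000, 500):.6f}")  # prints "$0.000980"
```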

About

Gemma 3 models are well suited to a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. They are multimodal: they accept text and image input and generate text output. The family offers a context window of up to 128K tokens (this listing specifies 80K), multilingual support for over 140 languages, and more model sizes than previous Gemma versions.

Code Examples

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/google/gemma-3-12b-it \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'

import os

import requests

# Credentials are read from the environment; both variables must be set.
ACCOUNT_ID = os.environ["CLOUDFLARE_ACCOUNT_ID"]
TOKEN = os.environ["CLOUDFLARE_AUTH_TOKEN"]

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/google/gemma-3-12b-it",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum entanglement in one sentence."},
        ],
    },
    timeout=60,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # surface HTTP errors instead of printing an error body
print(response.json())
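
The Python example above prints the whole decoded JSON body. A minimal sketch of pulling out just the generated text follows. It assumes the body has the shape `{"success": ..., "errors": [...], "result": {"response": "..."}}`, which is not stated on this page, so check it against a real response before relying on it. `extract_text` and the sample body are hypothetical.

```python
from typing import Any


def extract_text(body: dict[str, Any]) -> str:
    """Return the generated text from a decoded JSON response body.

    Assumed envelope (not confirmed by this page):
    {"success": bool, "errors": [...], "result": {"response": str}}
    """
    if not body.get("success", False):
        raise RuntimeError(f"API call failed: {body.get('errors')}")
    result = body.get("result") or {}
    text = result.get("response")
    if not isinstance(text, str):
        raise KeyError("no 'response' string found in 'result'")
    return text


# Hand-written sample body for illustration only; not real model output.
sample_body = {
    "success": True,
    "errors": [],
    "result": {"response": "Entangled particles share one quantum state."},
}
print(extract_text(sample_body))
```

With the earlier example, this would be called as `extract_text(response.json())`.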

interface Env { AI: Ai }

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user",   content: "Explain quantum entanglement in one sentence." },
    ];
    const response = await env.AI.run("@cf/google/gemma-3-12b-it", { messages });
    return Response.json(response);
  },
};

API Parameters

Temperature range: 0 to 5

Name	Type	Default	Description
`messages` required (unless `prompt` is given)	array	—	An array of message objects representing the conversation history.
`prompt` required (unless `messages` is given)	string	—	The input text prompt for the model to generate a response.
`frequency_penalty`	number	—	Decreases the likelihood of the model repeating the same lines verbatim.
`functions` deprecated	array	—	Deprecated. Use `tools` instead.
`guided_json`	object	—	JSON schema that the response should conform to.
`max_tokens`	integer	256	The maximum number of tokens to generate in the response.
`presence_penalty`	number	—	Increases the likelihood of the model introducing new topics.
`raw`	boolean	false	If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
`repetition_penalty`	number	—	Penalty for repeated tokens; higher values discourage repetition.
`seed`	integer	—	Random seed for reproducibility of the generation.
`stream`	boolean	false	If true, the response is streamed back incrementally using server-sent events (SSE).
`temperature`	number	0.6	Controls the randomness of the output; higher values produce more random results.
`tools`	array	—	A list of tools available for the assistant to use.
`top_k`	integer	—	Limits the model to choosing from the top 'k' most probable tokens. Lower values make responses more focused; higher values introduce more variety and potential surprises.
`top_p`	number	—	Adjusts the creativity of the model's responses by controlling how many candidate tokens it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
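
To show how the parameters in the table might be combined into a request body, here is a small Python sketch. `build_payload` is a hypothetical helper, not part of any SDK. It assumes `messages` and `prompt` are alternatives (exactly one is sent per request) and checks `temperature` against the 0 to 5 range listed above; any other validation is left to the API.

```python
from typing import Any, Optional

# Optional parameter names taken from the table above.
OPTIONAL_PARAMS = {
    "frequency_penalty", "guided_json", "max_tokens", "presence_penalty",
    "raw", "repetition_penalty", "seed", "stream", "temperature",
    "tools", "top_k", "top_p",
}


def build_payload(
    messages: Optional[list[dict[str, str]]] = None,
    prompt: Optional[str] = None,
    **options: Any,
) -> dict[str, Any]:
    """Assemble a JSON-serializable request body (hypothetical helper)."""
    # Assumption: the API expects either `messages` or `prompt`, not both.
    if (messages is None) == (prompt is None):
        raise ValueError("pass exactly one of 'messages' or 'prompt'")
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 5:
        raise ValueError("temperature must be between 0 and 5")
    payload: dict[str, Any] = (
        {"messages": messages} if messages is not None else {"prompt": prompt}
    )
    payload.update(options)  # parameters left unset fall back to server defaults
    return payload


payload = build_payload(
    messages=[{"role": "user", "content": "List three primary colors as JSON."}],
    temperature=0.2,
    max_tokens=128,
    seed=42,
    # Illustrative schema only; the exact schema features supported are not listed here.
    guided_json={
        "type": "object",
        "properties": {"colors": {"type": "array", "items": {"type": "string"}}},
        "required": ["colors"],
    },
)
print(sorted(payload))
```

The resulting dictionary can be passed as the `json=` argument of `requests.post` in the Python example above.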

Sourced from the model's published API schema.
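
When `stream` is true, the response arrives as server-sent events. The sketch below shows one way to reassemble the text from such a stream. It assumes each event is a `data:` line carrying a JSON object with a `response` field and that the stream ends with a `data: [DONE]` line; neither detail is stated on this page, so verify against real output. `iter_stream_text` and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from decoded SSE lines (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        text = chunk.get("response")  # assumed field name for the text fragment
        if text:
            yield text


# Hand-written sample events for illustration only.
sample_lines = [
    'data: {"response": "Hello"}',
    "",
    'data: {"response": ", world"}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_lines)))  # prints "Hello, world"
```

With `requests`, the lines could come from `response.iter_lines(decode_unicode=True)` on a call made with `stream=True` and `"stream": True` in the JSON body.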