Llama 3.2 11B Vision Instruct

Meta · Generally Available

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Context: 131K tokens

Input: $0.24 per MTok

Output: $0.24 per MTok
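
The listed prices translate directly into a per-request cost. The following is a minimal sketch, not an official billing formula: the function name `estimate_cost_usd` is hypothetical, and it assumes that billing is linear in token count at the rates shown above, with no minimums or rounding.

```python
# Prices copied from the listing above (USD per million tokens).
INPUT_USD_PER_MTOK = 0.24
OUTPUT_USD_PER_MTOK = 0.24


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming simple linear pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # A 2,000-token prompt with a 500-token reply.
    print(f"${estimate_cost_usd(2_000, 500):.6f}")
```

Because input and output are priced identically here, only the total token count matters for this model.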


Modalities

Input: Text, Vision

Output: Text

Advanced Capabilities

Vision Input: Accepts image inputs
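
Since the model accepts image inputs, a request body can carry an image alongside the text. The sketch below only builds such a body. It rests on a loud assumption: this page lists the `image` parameter as "one of" without naming the accepted formats, so the encoding used here (a list of byte values from 0 to 255) is a guess that should be checked against the published API schema. The helper name `build_vision_payload` is hypothetical.

```python
import json


def build_vision_payload(image_bytes: bytes, question: str) -> dict:
    """Build a request body that pairs an image with a text question."""
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    return {
        "messages": [{"role": "user", "content": question}],
        # ASSUMPTION: the image is encoded as a list of integers (0-255).
        # The page does not list the accepted formats for `image`.
        "image": list(image_bytes),
    }


if __name__ == "__main__":
    # Three placeholder bytes stand in for real image data.
    payload = build_vision_payload(b"\x00\xff\x10", "What is in this image?")
    print(json.dumps(payload))
```

In practice the bytes would come from reading an image file in binary mode, and the resulting dictionary would be sent as the JSON body of the request.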

Code Examples

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/llama-3.2-11b-vision-instruct \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'

import os

import requests

# Credentials are read from the environment; both variables must be set.
ACCOUNT_ID = os.environ["CLOUDFLARE_ACCOUNT_ID"]
TOKEN = os.environ["CLOUDFLARE_AUTH_TOKEN"]

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/llama-3.2-11b-vision-instruct",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum entanglement in one sentence."},
        ],
    },
    timeout=60,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # surface HTTP errors instead of printing an error body
print(response.json())

interface Env { AI: Ai }

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user",   content: "Explain quantum entanglement in one sentence." },
    ];
    const response = await env.AI.run("llama-3.2-11b-vision-instruct", { messages });
    return Response.json(response);
  },
};

API Parameters

Temperature: 0 – 5
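
The temperature range can be enforced on the client before a request is sent. This is a minimal sketch under stated assumptions: the helper `build_chat_payload` is hypothetical, the range 0 to 5 and the defaults (`temperature` 0.6, `max_tokens` 256) are taken from this page, and the check that `max_tokens` is at least 1 is an added assumption rather than something the page states.

```python
from typing import Optional

# Range stated on this page for `temperature`.
TEMPERATURE_MIN = 0.0
TEMPERATURE_MAX = 5.0


def build_chat_payload(
    user_text: str,
    *,
    temperature: float = 0.6,  # default listed in the parameter table
    max_tokens: int = 256,     # default listed in the parameter table
    seed: Optional[int] = None,
) -> dict:
    """Build a chat request body, rejecting out-of-range sampling settings."""
    if not TEMPERATURE_MIN <= temperature <= TEMPERATURE_MAX:
        raise ValueError(
            f"temperature must be between {TEMPERATURE_MIN} and {TEMPERATURE_MAX}"
        )
    if max_tokens < 1:  # assumption: a response needs at least one token
        raise ValueError("max_tokens must be at least 1")
    payload = {
        "messages": [{"role": "user", "content": user_text}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    if seed is not None:
        payload["seed"] = seed  # only sent when reproducibility is wanted
    return payload


if __name__ == "__main__":
    print(build_chat_payload("Describe a sunset.", temperature=0.2, seed=42))
```

Validating locally gives a clear error message early, rather than relying on however the API reports an invalid value.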

Name	Type	Default	Description
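`messages` required	array	—	An array of message objects representing the conversation history.
`prompt` required	string	—	The input text prompt for the model to generate a response. The code examples above send `messages` without `prompt`, which suggests the two are alternative input forms rather than both being mandatory.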
`frequency_penalty`	number	—	Decreases the likelihood of the model repeating the same lines verbatim.
`functions` deprecated	array	—	Deprecated. Use tools.
`image`	one of	—	Image input for vision tasks. The accepted formats are not listed here.
`lora`	string	—	Name of the LoRA (Low-Rank Adaptation) adapter to apply to the base model.
`max_tokens`	integer	256	The maximum number of tokens to generate in the response.
`presence_penalty`	number	—	Increases the likelihood of the model introducing new topics.
`raw`	boolean	false	If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
`repetition_penalty`	number	—	Penalty for repeated tokens; higher values discourage repetition.
`seed`	integer	—	Random seed for reproducibility of the generation.
`stream`	boolean	false	If true, the response is streamed back incrementally using Server-Sent Events (SSE).
`temperature`	number	0.6	Controls the randomness of the output; higher values produce more random results.
`tools`	array	—	A list of tools available for the assistant to use.
`top_k`	integer	—	Restricts sampling to the 'k' most probable tokens. Lower values make responses more focused; higher values introduce more variety and potential surprises.
`top_p`	number	—	Restricts sampling to the smallest set of tokens whose cumulative probability reaches this value. Lower values make outputs more predictable; higher values allow for more varied and creative responses.

Sourced from the model's published API schema.
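
When `stream` is true, the response arrives as Server-Sent Events and has to be reassembled on the client. The sketch below parses such a stream from an iterable of text lines. The event layout is an assumption, not something this page specifies: each event is taken to be a `data:` line holding a JSON object with a `response` field, and the stream is taken to end with a `data: [DONE]` line. The function name `iter_sse_chunks` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an SSE stream given as decoded lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # ASSUMPTION: end-of-stream sentinel
            break
        event = json.loads(data)
        text = event.get("response")  # ASSUMPTION: field holding the text
        if text:
            yield text


if __name__ == "__main__":
    sample = [
        'data: {"response": "Hel"}',
        "",
        'data: {"response": "lo"}',
        "data: [DONE]",
    ]
    print("".join(iter_sse_chunks(sample)))
```

With the `requests` library, the lines could come from a call made with `stream=True` followed by `response.iter_lines(decode_unicode=True)`, so fragments can be shown as they arrive instead of waiting for the full reply.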