Llama 3.2 1B Instruct
Meta | Generally Available
Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate...
Context: 60K tokens
Input: $0.027 per MTok
Output: $0.20 per MTok
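
The listed rates translate into a per-request cost with simple arithmetic. The sketch below assumes that MTok means one million tokens and that input and output tokens are billed independently at the rates shown; the function name `estimate_cost_usd` is hypothetical and not part of any API.

```python
# Listed prices, in US dollars per million tokens (MTok).
INPUT_USD_PER_MTOK = 0.027
OUTPUT_USD_PER_MTOK = 0.20


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # 2,000 prompt tokens and 500 generated tokens.
    print(f"${estimate_cost_usd(2_000, 500):.6f}")
```

At these rates, output tokens cost roughly seven times as much as input tokens, so capping `max_tokens` is likely the more effective lever for controlling spend.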
Modalities
Input: Text
Output: Text
Advanced Capabilities
Structured Outputs: JSON schema-constrained generation
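
A request that uses structured outputs might be assembled as in the sketch below. This is an illustration under stated assumptions: the `response_format` layout (a `type` of `json_schema` plus a `json_schema` object) is assumed rather than taken from this page, and both `build_structured_request` and `CITY_SCHEMA` are hypothetical names. Check the model's published API schema for the exact shape before relying on it.

```python
import json


def build_structured_request(question: str, schema: dict) -> dict:
    """Build a request body that asks for output matching `schema`.

    ASSUMPTION: the `response_format` layout used here is illustrative
    and may differ from the model's actual API schema.
    """
    return {
        "messages": [
            {"role": "system", "content": "Reply only with JSON."},
            {"role": "user", "content": question},
        ],
        "response_format": {"type": "json_schema", "json_schema": schema},
    }


# Hypothetical schema, for illustration only.
CITY_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "country": {"type": "string"},
    },
    "required": ["city", "country"],
}

if __name__ == "__main__":
    body = build_structured_request("Where is the Eiffel Tower?", CITY_SCHEMA)
    print(json.dumps(body, indent=2))
```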
Code Examples
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/llama-3.2-1b-instruct \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
}'

API Parameters
Temperature: 0 – 5

| Name | Type | Description |
|---|---|---|
| messages (required) | array | An array of message objects representing the conversation history. Required if `prompt` is not provided. |
| prompt (required) | string | The input text prompt for the model to generate a response. Required if `messages` is not provided. |
frequency_penalty | number | Decreases the likelihood of the model repeating the same lines verbatim. |
| functions (deprecated) | array | Deprecated. Use `tools` instead. |
lora | string | Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model. |
max_tokens | integer | The maximum number of tokens to generate in the response. |
presence_penalty | number | Increases the likelihood of the model introducing new topics. |
raw | boolean | If true, a chat template is not applied and you must adhere to the specific model's expected formatting. |
repetition_penalty | number | Penalty for repeated tokens; higher values discourage repetition. |
response_format | object | Constrain output to a JSON schema or an enum (structured outputs). |
seed | integer | Random seed for reproducibility of the generation. |
| stream | boolean | If true, the response is streamed back incrementally using Server-Sent Events (SSE). |
temperature | number | Controls the randomness of the output; higher values produce more random results. |
tools | array | A list of tools available for the assistant to use. |
top_k | integer | Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises. |
top_p | number | Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses. |
Sourced from the model's published API schema.
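
The parameters above can be combined in a single request body. The following is a minimal sketch using only the Python standard library; the endpoint and the two environment variables are taken from the curl example, while `build_payload` and `run_model` are hypothetical helper names. The response layout is not documented on this page, so the sketch returns the decoded JSON as-is rather than assuming any field names.

```python
import json
import os
import urllib.request

# Endpoint pattern copied from the curl example.
API_URL = (
    "https://api.cloudflare.com/client/v4/accounts/{account_id}"
    "/ai/run/llama-3.2-1b-instruct"
)


def build_payload(user_text, *, temperature=None, max_tokens=None, seed=None):
    """Build a chat request body, including only the options that were set."""
    # The page lists the temperature range as 0 to 5.
    if temperature is not None and not 0 <= temperature <= 5:
        raise ValueError("temperature must be between 0 and 5")
    payload = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ]
    }
    options = {"temperature": temperature, "max_tokens": max_tokens, "seed": seed}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


def run_model(payload):
    """POST the payload and return the decoded JSON response (needs network)."""
    request = urllib.request.Request(
        API_URL.format(account_id=os.environ["CLOUDFLARE_ACCOUNT_ID"]),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['CLOUDFLARE_AUTH_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `run_model(build_payload("Summarize this paragraph.", temperature=0.2, max_tokens=128, seed=42))` would send a low-temperature, length-capped request; fixing `seed` is intended to make repeated runs more reproducible.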