Llama 4 Scout
Featured · Latest · Meta · Llama 4 · Generally Available · Apr 2025
Meta's 17B-active / 109B-total MoE model with a world-record 10 million token context window. Open weights, natively multimodal, and highly efficient for long-context tasks.
Context: 10M tokens
Input: $0.080 per MTok
Output: $0.30 per MTok
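The per-token prices above translate into request costs with simple arithmetic. The sketch below is a rough estimator, not a billing tool: the function names are hypothetical, the constants are copied from this listing, and it assumes prompt and completion tokens share the one 10M-token window, which the listing does not state.

```python
# Prices copied from the listing above (USD per million tokens).
INPUT_USD_PER_MTOK = 0.080
OUTPUT_USD_PER_MTOK = 0.30
CONTEXT_WINDOW_TOKENS = 10_000_000  # "10M tokens" from the listing


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


def fits_in_context(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the listed window.

    Assumption: prompt and completion count against the same window.
    """
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # A 200k-token document summarised into 2k tokens.
    print(f"${estimate_cost_usd(200_000, 2_000):.4f}")  # prints $0.0166
```

At these rates, a prompt that fills the entire 10M-token window costs about $0.80 in input tokens.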
About
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta. It activates 17 billion of its 109 billion total parameters for each token. It supports native multimodal input...
Modalities
Input: Text, Vision, Code
Output: Text, Code
Code Examples
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-4-Scout-17B-16E",
    "messages": [
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'
API Parameters
| Name | Type | Description |
|---|---|---|
| frequency_penalty | number | Penalize tokens by their frequency so far. Positive values reduce repetition. |
| logit_bias | object | Map of token ID to a bias (-100 to 100) added to the logit before sampling. |
| max_tokens (deprecated) | integer | Deprecated. Use max_completion_tokens. |
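| min_p | number | Minimum probability a token needs to be considered, relative to the probability of the most likely token. |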
| presence_penalty | number | Penalize tokens that have appeared at all so far. Positive values encourage new topics. |
| repetition_penalty | number | Penalize repeated tokens (>1.0 reduces repetition, <1.0 encourages it). |
| response_format | one of | Constrain output to a JSON schema or an enum (structured outputs). |
| seed | integer | Seed for sampling. The same seed with the same prompt and parameters should reproduce the output, though providers typically treat this as best effort. |
| stop | array | Up to 4 sequences where the API will stop generating tokens. |
| structured_outputs | boolean | Enable JSON-schema-constrained output. |
| temperature | number | Sampling temperature; higher values produce more random output. 0 approximates greedy decoding (always picking the most likely token). |
| tool_choice | one of | Controls which (if any) tool is called: "none", "auto", "required", or a specific tool. |
| tools | array | List of tools (functions) the model may call. |
| top_k | integer | Limit sampling to the top-k most likely tokens at each step. |
| top_p | number | Nucleus sampling: sample only from the smallest set of most likely tokens whose cumulative probability reaches top_p. |
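The three truncation parameters in the table (top_k, top_p, min_p) each shrink the candidate set before a token is sampled. The sketch below applies them to a toy next-token distribution to show how they differ. It uses the textbook definitions; the helper names are hypothetical, min_p is assumed to follow the common definition (a cutoff relative to the top token's probability), and a given provider may combine or order these filters differently.

```python
def _ranked(probs: dict[str, float]) -> list[tuple[str, float]]:
    """Tokens sorted from most to least likely."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep only the k most likely tokens."""
    return dict(_ranked(probs)[:k])


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for token, prob in _ranked(probs):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept


def min_p_filter(probs: dict[str, float], min_p: float) -> dict[str, float]:
    """Keep tokens with probability >= min_p * (probability of top token)."""
    threshold = min_p * max(probs.values())
    return {t: pr for t, pr in probs.items() if pr >= threshold}


def renormalize(probs: dict[str, float]) -> dict[str, float]:
    """Rescale the surviving probabilities so they sum to 1."""
    total = sum(probs.values())
    return {t: pr / total for t, pr in probs.items()}


if __name__ == "__main__":
    toy = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
    print(sorted(top_k_filter(toy, 2)))      # ['a', 'the']
    print(sorted(top_p_filter(toy, 0.9)))    # ['a', 'an', 'the']
    print(sorted(min_p_filter(toy, 0.2)))    # ['a', 'an', 'the']
```

In this toy case top_p=0.9 keeps three tokens because the first two only cover 0.8 of the probability mass; the unlikely "zebra" token is dropped by all three filters.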
Standard OpenAI-compatible parameters. Consult the provider docs for model-specific behaviour.
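As an illustration of how these parameters fit into a request, the sketch below builds the same JSON body as the curl example and adds a few optional fields from the table. The `build_payload` helper is hypothetical; the URL and model ID are copied from the curl example, and the two checks simply mirror the limits stated in the table (at most 4 stop sequences, logit_bias between -100 and 100). It builds the body only and does not send anything.

```python
import json
from typing import Optional

API_URL = "https://openrouter.ai/api/v1/chat/completions"  # from the curl example
MODEL_ID = "meta-llama/Llama-4-Scout-17B-16E"              # from the curl example


def build_payload(
    prompt: str,
    *,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    seed: Optional[int] = None,
    max_completion_tokens: Optional[int] = None,
    stop: Optional[list[str]] = None,
    logit_bias: Optional[dict[str, float]] = None,
) -> dict:
    """Build a chat-completions request body (hypothetical helper)."""
    payload: dict = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    if stop is not None:
        if len(stop) > 4:  # limit stated in the parameter table
            raise ValueError("stop accepts at most 4 sequences")
        payload["stop"] = list(stop)
    if logit_bias is not None:
        for token_id, bias in logit_bias.items():
            if not -100 <= bias <= 100:  # range stated in the parameter table
                raise ValueError(f"bias for token {token_id} outside -100..100")
        payload["logit_bias"] = dict(logit_bias)
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "seed": seed,
        "max_completion_tokens": max_completion_tokens,
    }
    # Only include fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


if __name__ == "__main__":
    body = build_payload(
        "Explain quantum entanglement in one sentence.",
        temperature=0.2,
        seed=7,
        max_completion_tokens=64,
    )
    print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the `-d` body of the curl command. Whether a given provider honours every field is something to confirm in its own documentation.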
Benchmark Scores
| Benchmark | Score |
|---|---|
| MMLU | 79.6% |
| MMLU-Pro | 74.3% |
| HumanEval | 78% |
Strengths & Limitations
Best For
10M token context window (world record)
Open weights — self-hostable
Natively multimodal
Very low API cost
MoE efficiency
Limitations
Lower reasoning ceiling than Llama 4 Maverick
Large total parameter count requires MoE-aware serving
Limited Hugging Face adoption so far (235 likes as of launch)
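The serving limitation follows from the parameter counts: an MoE model computes with only the active parameters for each token, but the weights of all experts typically still have to be held in memory. A back-of-the-envelope sketch, assuming the 109B and 17B figures from this listing and standard bytes-per-parameter values, and ignoring the KV cache, activations, and quantization overhead:

```python
TOTAL_PARAMS = 109e9   # total parameters, from the listing
ACTIVE_PARAMS = 17e9   # parameters active per token, from the listing

# Nominal storage per parameter; real quantized formats add some overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(TOTAL_PARAMS, precision)
        print(f"{precision}: ~{gb:.1f} GB for all weights")
    share = ACTIVE_PARAMS / TOTAL_PARAMS
    print(f"active share per token: {share:.1%}")
```

Under these assumptions the full weights need roughly 218 GB in bf16 and about 54.5 GB at 4 bits per parameter, while only about 16% of the parameters take part in each token's forward pass.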
Tags
Long Context, Open Weights, Multimodal, MoE, Vision