Llama 4 Scout

Featured | Latest
Meta Llama 4 | Generally Available | Apr 2025

Meta's mixture-of-experts (MoE) model with 17B active and 109B total parameters and a 10 million token context window, which Meta described as industry-leading at launch. Open weights, natively multimodal, and efficient for long-context tasks.

Context: 10M tokens
Input: $0.080 per MTok (million tokens)
Output: $0.30 per MTok
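
The per-MTok prices above translate into per-request cost with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes billing is strictly linear in token counts with no minimums, caching discounts, or provider surcharges.

```python
# Prices copied from the listing above, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = 0.080
OUTPUT_USD_PER_MTOK = 0.30


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming purely linear pricing."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # A 200k-token document summarized into a 2k-token answer.
    print(f"${estimate_cost_usd(200_000, 2_000):.4f}")      # $0.0166
    # Filling the full 10M-token window and asking for a 1k-token answer.
    print(f"${estimate_cost_usd(10_000_000, 1_000):.4f}")   # $0.8003
```

At these rates a request that fills the entire context window costs well under one dollar, so prompt size rather than output length dominates the bill for long-context work.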

About

Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta. It has 16 experts (the "16E" in the name) and activates 17 billion of its 109 billion total parameters for each token. It supports native multimodal input...
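
The active and total parameter counts above have different practical meanings: per-token compute scales with the active count, while memory for weights scales with the total. The following back-of-the-envelope sketch makes that concrete. The bytes-per-parameter figures are general properties of the numeric formats, not published numbers for this model, and the estimate deliberately ignores KV cache, activations, and runtime overhead.

```python
ACTIVE_PARAMS = 17e9    # parameters used per token (from the description above)
TOTAL_PARAMS = 109e9    # parameters that must be stored (from the description above)

# Bytes per parameter for common weight formats (format facts, not model-specific).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def active_fraction() -> float:
    """Share of the model's parameters that participate in each token."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


def weight_memory_gb(dtype: str) -> float:
    """Approximate memory for weights only, in GB (10^9 bytes)."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction():.1%}")   # active fraction: 15.6%
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(dtype):.1f} GB")
```

Under these assumptions the weights alone need roughly 218 GB in bf16, even though only about 16% of them are used for any given token.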

Modalities

Input: Text, Vision, Code
Output: Text, Code
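
Vision input is typically sent as part of a chat message. The sketch below builds such a message using the OpenAI-style content-parts layout (`type: "text"` and `type: "image_url"` with a base64 data URL). Whether a given provider accepts this exact layout for this model is an assumption to verify against its docs; `build_vision_message` is a hypothetical helper name.

```python
import base64


def build_vision_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message that carries text plus an inline image.

    ASSUMPTION: the provider accepts OpenAI-style content parts with the
    image embedded as a base64 data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary can be placed directly in the `messages` array of a chat completions request body.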

Code Examples

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-4-scout",
    "messages": [
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'
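
The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: `build_request` and `ask` are hypothetical helper names, and the response parsing assumes the OpenAI-compatible shape (`choices[0].message.content`), which should be checked against the provider's documentation.

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"  # endpoint from the curl example


def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text.

    ASSUMPTION: the response follows the OpenAI-compatible schema.
    """
    request = build_request(model, prompt, os.environ["OPENROUTER_API_KEY"])
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return data["choices"][0]["message"]["content"]
```

Calling `ask` with the model ID from the curl example and the prompt "Explain quantum entanglement in one sentence." sends an equivalent request; keeping `build_request` separate makes the payload easy to inspect without network access.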

API Parameters

Name | Type | Description
frequency_penalty | number | Penalizes tokens in proportion to how often they have already appeared. Positive values reduce repetition.
logit_bias | object | Map of token ID to a bias (-100 to 100) added to that token's logit before sampling.
max_tokens (deprecated) | integer | Deprecated. Use max_completion_tokens.
min_p | number | Minimum probability a token needs in order to be sampled, relative to the probability of the most likely token.
presence_penalty | number | Penalizes tokens that have already appeared at least once. Positive values encourage new topics.
repetition_penalty | number | Penalizes repeated tokens. Values above 1.0 reduce repetition; values below 1.0 encourage it.
response_format | one of | Constrains output to a JSON schema or an enum (structured outputs).
seed | integer | Seed for sampling. Repeating a request with the same seed and parameters should return the same output, but determinism is best-effort.
stop | array | Up to 4 sequences at which the API stops generating tokens.
structured_outputs | boolean | Enables JSON-schema-constrained output.
temperature | number | Sampling temperature. Higher values produce more random output; 0 selects the most likely token at each step (greedy decoding).
tool_choice | one of | Controls which tool, if any, is called: "none", "auto", "required", or a specific tool.
tools | array | List of tools (functions) the model may call.
top_k | integer | Limits sampling to the k most likely tokens at each step.
top_p | number | Nucleus sampling: samples only from the smallest set of most likely tokens whose cumulative probability reaches top_p.

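Most of these are standard OpenAI-compatible parameters; min_p, top_k, repetition_penalty, and structured_outputs are not part of the OpenAI API and may not be supported by every provider. Consult the provider docs for model-specific behaviour.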
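
To make the three truncation parameters concrete, the sketch below applies top_k, top_p, and min_p to a toy next-token distribution. It illustrates the commonly used definitions only (the min_p definition in particular is an assumption here) and is not the sampler any provider actually runs; real implementations work on logits over the full vocabulary, and the order in which filters are combined varies.

```python
def _by_probability(probs: dict[str, float]) -> list[tuple[str, float]]:
    """Tokens sorted from most to least likely."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep only the k most likely tokens."""
    return dict(_by_probability(probs)[:k])


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for token, prob in _by_probability(probs):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept


def min_p_filter(probs: dict[str, float], min_p: float) -> dict[str, float]:
    """Keep tokens at least min_p times as likely as the most likely token."""
    threshold = min_p * max(probs.values())
    return {token: prob for token, prob in probs.items() if prob >= threshold}


def renormalize(probs: dict[str, float]) -> dict[str, float]:
    """Rescale the surviving probabilities so they sum to 1 before sampling."""
    total = sum(probs.values())
    return {token: prob / total for token, prob in probs.items()}


if __name__ == "__main__":
    toy = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
    print(list(top_k_filter(toy, 2)))      # ['the', 'a']
    print(list(top_p_filter(toy, 0.9)))    # ['the', 'a', 'an']
    print(list(min_p_filter(toy, 0.2)))    # ['the', 'a', 'an']
```

Note that top_p keeps the token that crosses the threshold, so the retained set always has cumulative probability of at least top_p.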

Benchmark Scores

Benchmark | Score
MMLU | 79.6%
MMLU-Pro | 74.3%
HumanEval | 78%

Strengths & Limitations

Best For
10M token context window (described by Meta as industry-leading at launch)
Open weights — self-hostable
Natively multimodal
Very low API cost
MoE efficiency
Limitations
Lower reasoning ceiling than Llama 4 Maverick
Large total parameter count (109B) requires MoE-aware serving
Limited Hugging Face adoption so far (235 likes as of launch)
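
For long-context work it helps to check a rough token budget before sending a request. The sketch below is a heuristic only: the 4-characters-per-token ratio is an assumed average for English text, not a property of this model's tokenizer, and the helper names and default output reserve are hypothetical. Use the real tokenizer when accuracy matters.

```python
import math

CONTEXT_WINDOW_TOKENS = 10_000_000  # context size from the listing above
CHARS_PER_TOKEN = 4.0               # ASSUMPTION: rough average for English text


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of a string from its length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], reserved_output_tokens: int = 4096) -> bool:
    """Check whether the texts plus an output reserve fit in the context window."""
    needed = sum(estimate_tokens(text) for text in texts) + reserved_output_tokens
    return needed <= CONTEXT_WINDOW_TOKENS
```

For example, `fits_in_context([report, appendix])` returns True for two documents of a few hundred thousand characters each, and False once the combined estimate plus the output reserve exceeds 10 million tokens.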

Tags

Long Context, Open Weights, Multimodal, MoE, Vision