Llama 4 Maverick


Meta Llama 4 Generally Available Apr 2025

Meta's most capable open-weights model: a mixture-of-experts (MoE) design with 17B active and 400B total parameters, a 1M-token context window, native multimodal input, and an MMLU score (85.5%, 5-shot) on par with GPT-4o.

Context

1M tokens

Input

$0.15

per MTok

Output

$0.60

per MTok
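
The per-MTok prices above translate into a per-request cost by scaling each token count by its rate. The following is a minimal sketch of that arithmetic; `estimate_cost_usd` is a hypothetical helper, not part of any SDK, and it assumes billing is linear in tokens with no minimums, caching discounts, or provider surcharges.

```python
# Prices taken from the listing above, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = 0.15
OUTPUT_USD_PER_MTOK = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Example: a 200,000-token prompt with a 2,000-token reply.
print(f"${estimate_cost_usd(200_000, 2_000):.4f}")
```

Because output tokens cost four times as much as input tokens, long generations dominate the bill sooner than long prompts do.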


About

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward...

Modalities

Input

Text Vision Code

Output

Text Code
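
Since vision is listed as an input modality, a request can carry an image alongside text. The sketch below builds such a message offline. It assumes the endpoint accepts OpenAI-style content parts (`text` and `image_url`) and base64 data URLs, which this page does not confirm; `image_message` is a hypothetical helper.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message holding a text part and an inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumption: the provider accepts data URLs in the image_url part.
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Placeholder bytes stand in for a real image file read with open(path, "rb").
message = image_message("Describe this image.", b"placeholder image bytes")
print(message["content"][0]["text"])
```

The resulting dictionary would be passed as one entry of the `messages` list in a chat completions request.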

Code Examples

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-4-Maverick-17B-128E",
    "messages": [
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'

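import os

from openai import OpenAI

# Point the OpenAI SDK at the OpenAI-compatible endpoint;
# the API key is read from the environment.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)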
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in one sentence."},
    ],
)
print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const response = await client.chat.completions.create({
  model: "meta-llama/Llama-4-Maverick-17B-128E",
  messages: [
    { role: "user", content: "Explain quantum entanglement in one sentence." },
  ],
});
console.log(response.choices[0].message.content);

API Parameters

Name	Type	Default	Description
`frequency_penalty`	number	0	Penalize tokens by their frequency so far. Positive values reduce repetition.
`logit_bias`	object	—	Map of token-id to bias (-100…100) added to the logit before sampling.
`max_tokens` deprecated	integer	—	Deprecated. Use max_completion_tokens.
`min_p`	number	—	Minimum probability, relative to the most likely token, that a token must have to be sampled.
`presence_penalty`	number	0	Penalize tokens that have appeared at all so far. Positive values encourage new topics.
`repetition_penalty`	number	1	Penalize repeated tokens (>1.0 reduces repetition, <1.0 encourages it).
`response_format`	one of	—	Constrain output to a JSON schema or an enum (structured outputs).
`seed`	integer	—	Seed for sampling. Repeating a request with the same seed, prompt, and parameters should return the same output; determinism is typically best-effort.
`stop`	array	—	Up to 4 sequences where the API will stop generating tokens.
`structured_outputs`	boolean	—	Enable JSON-schema-constrained output.
`temperature`	number	1	Sampling temperature; higher values produce more random output. 0 approximates greedy, near-deterministic decoding.
`top_k`	integer	—	Limit sampling to the top-k most likely tokens at each step.
`top_p`	number	1	Nucleus sampling: sample only from the smallest set of most likely tokens whose cumulative probability reaches top_p.
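
To make the truncation parameters concrete, here is a minimal sketch of how `top_k`, `min_p`, and `top_p` narrow a next-token distribution before sampling. It is an illustration under stated assumptions, not the provider's implementation: the order in which the filters run and the lack of renormalization between steps are simplifications, and `filter_candidates` is a hypothetical name.

```python
def filter_candidates(probs, top_k=None, top_p=1.0, min_p=0.0):
    """Return the renormalized distribution left after the three filters."""
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # top_k: keep only the k most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    if not ranked:
        return {}
    # min_p: drop tokens below min_p times the top token's probability.
    floor = min_p * ranked[0][1]
    ranked = [(tok, p) for tok, p in ranked if p >= floor]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(filter_candidates(probs, top_p=0.75))
```

With `top_p=0.75`, the loop stops after the second token because the running total (0.8) has reached the threshold, so only `a` and `b` remain as candidates.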

Standard OpenAI-compatible parameters. Consult the provider docs for model-specific behaviour.
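
As a rough illustration of how these parameters fit into a request, the sketch below assembles a chat completions body and enforces the two limits stated in the table (at most 4 `stop` sequences, `logit_bias` values between -100 and 100). `build_request` is a hypothetical helper, and the checks mirror the table rather than any validation the provider is known to perform.

```python
import json


def build_request(prompt, *, temperature=1.0, top_p=1.0, seed=None,
                  stop=None, logit_bias=None, max_completion_tokens=None):
    """Assemble a chat-completions body, enforcing the limits from the table."""
    if stop is not None and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")
    if logit_bias and any(not -100 <= b <= 100 for b in logit_bias.values()):
        raise ValueError("logit_bias values must lie in [-100, 100]")
    body = {
        "model": "meta-llama/Llama-4-Maverick-17B-128E",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }
    # Optional parameters are sent only when the caller sets them.
    optional = {"seed": seed, "stop": stop, "logit_bias": logit_bias,
                "max_completion_tokens": max_completion_tokens}
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


body = build_request("Explain quantum entanglement in one sentence.",
                     temperature=0.2, seed=7, stop=["\n\n"],
                     max_completion_tokens=64)
print(json.dumps(body, indent=2))
```

With the OpenAI Python SDK, the body could be passed as `client.chat.completions.create(**body)`. Parameters outside the OpenAI schema, such as `top_k`, `min_p`, and `repetition_penalty`, would likely need to go through the SDK's `extra_body` argument.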

Benchmark Scores

Benchmark	Score	Methodology
MMLU	85.5%	5-shot
MMLU-Pro	80.5%	5-shot
HumanEval	86%	0-shot
MATH	73.5%	0-shot

Strengths & Limitations

Best For

GPT-4o-level MMLU

1M token context

Open weights — self-hostable

Natively multimodal

Strong code generation

Limitations

Requires large GPU cluster for full deployment

Smaller Hugging Face community than Llama 4 Scout at launch