Mistral Small 3.2

Latest

Mistral · Mistral Small · Generally Available · Jun 2025

Mistral's refined 24B model with a 128K-token context window, vision support, and an Apache 2.0 licence. It beats Mixtral 8x7B on most benchmarks at much lower cost and has over 1M Hugging Face downloads.

Context: 131K tokens

Input: $0.10 per MTok

Output: $0.30 per MTok
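
To make the per-MTok prices concrete, the arithmetic can be sketched as below. This is a minimal estimate that assumes only the two list prices quoted here apply; `estimate_cost_usd` is a hypothetical helper, not part of any SDK, and real invoices may differ (for example through provider fees or image-token accounting).

```python
from decimal import Decimal

# Prices quoted on this page, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = Decimal("0.10")
OUTPUT_USD_PER_MTOK = Decimal("0.30")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MTOK * INPUT_USD_PER_MTOK
    output_cost = Decimal(output_tokens) / TOKENS_PER_MTOK * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 20,000-token prompt with a 1,000-token completion.
    print(estimate_cost_usd(20_000, 1_000))
```

Decimal arithmetic is used so that very small per-request amounts do not pick up binary floating-point noise when summed over many requests.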


Modalities

Input: Text, Vision, Code

Output: Text, Code
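
Since vision is listed as an input modality, a request can carry an image alongside text. The sketch below builds one user message in the OpenAI-compatible content-parts format (`text` plus `image_url` with a base64 data URL). It assumes the endpoint accepts inline data URLs for this model; `build_vision_message` is a hypothetical helper introduced for illustration.

```python
import base64


def build_vision_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message with text plus an inline image (hypothetical helper)."""
    # Encode the raw image bytes so they can travel inside a JSON string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for the contents of a real image file.
    message = build_vision_message("Describe this image.", b"placeholder image bytes")
    print(message["content"][0]["text"])
```

The returned dictionary can be placed in the `messages` list of a chat-completions request in place of a plain-text user message.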

Code Examples

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    "messages": [
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in one sentence."},
    ],
)
print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const response = await client.chat.completions.create({
  model: "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
  messages: [
    { role: "user", content: "Explain quantum entanglement in one sentence." },
  ],
});
console.log(response.choices[0].message.content);
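
The requests above send only `model` and `messages`. As a rough sketch, a request body that also sets some of the optional parameters listed under API Parameters could be assembled as follows. `build_request_body` is a hypothetical helper, and the client-side check on `stop` simply mirrors the four-sequence limit stated on this page; the server remains the authority on what it accepts.

```python
from typing import List, Optional

MODEL_ID = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"  # model string used on this page
MAX_STOP_SEQUENCES = 4  # limit stated for `stop` under API Parameters


def build_request_body(
    prompt: str,
    *,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_completion_tokens: Optional[int] = None,
    stop: Optional[List[str]] = None,
    seed: Optional[int] = None,
    stream: bool = False,
) -> dict:
    """Assemble a chat-completions body, omitting unset parameters (hypothetical helper)."""
    if stop is not None and len(stop) > MAX_STOP_SEQUENCES:
        raise ValueError(f"at most {MAX_STOP_SEQUENCES} stop sequences are allowed")
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "max_completion_tokens": max_completion_tokens,
        "stop": stop,
        "seed": seed,
    }
    # Leave out anything the caller did not set so server-side defaults apply.
    body.update({name: value for name, value in optional.items() if value is not None})
    if stream:
        body["stream"] = True
    return body


if __name__ == "__main__":
    print(build_request_body("Summarise this paragraph.", temperature=0.2, stop=["\n\n"]))
```

With the OpenAI Python client shown earlier, the result can be passed as keyword arguments, for example `client.chat.completions.create(**body)`.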

API Parameters

Name	Type	Default	Description
`frequency_penalty`	number	0	Penalize tokens by their frequency so far. Positive values reduce repetition.
`max_completion_tokens`	integer	—	Maximum number of tokens the model may generate in the response.
`presence_penalty`	number	0	Penalize tokens that have appeared at all so far. Positive values encourage new topics.
`response_format`	one of	—	Constrain output to a JSON schema or an enum (structured outputs).
`seed`	integer	—	Seed for sampling. Repeating a request with the same seed and parameters should return the same output, but determinism is best-effort.
`stop`	array	—	Up to 4 sequences where the API will stop generating tokens.
`stream`	boolean	false	Stream partial responses as Server-Sent Events.
`temperature`	number	1	Sampling temperature; higher values produce more random output. 0 makes sampling effectively greedy (near-deterministic).
`tool_choice`	one of	—	Controls which (if any) tool is called: "none", "auto", "required", or a specific tool.
`tools`	array	—	List of tools (functions) the model may call.
`top_p`	number	1	Nucleus sampling: sample only from the smallest set of most-likely tokens whose cumulative probability reaches top_p.

These are standard OpenAI-compatible parameters; consult the provider docs for model-specific behaviour.
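
To illustrate what `top_p` does, here is a toy version of nucleus filtering over a hand-written token distribution. It is a conceptual sketch only: the token probabilities are made up, `nucleus_filter` is a hypothetical function, and the provider's actual sampler is not documented here.

```python
from typing import Dict


def nucleus_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalise (toy illustration)."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    kept: Dict[str, float] = {}
    cumulative = 0.0
    # Walk tokens from most to least likely until the threshold is reached.
    for token, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


if __name__ == "__main__":
    # Made-up next-token distribution for demonstration purposes.
    distribution = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
    print(nucleus_filter(distribution, top_p=0.75))
```

Lower `top_p` values cut off the unlikely tail earlier, so sampling draws from fewer candidate tokens; at `top_p = 1` nothing is filtered.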

Benchmark Scores

Benchmark	Score	Methodology
MMLU	82%	5-shot
HumanEval	88.99%	HumanEval+, 0-shot
MATH	70%	0-shot

Performance

Output speed: 120 tok/sec

Source: Mistral AI docs + llm-stats.com, April 2026
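
As a back-of-the-envelope use of the output-speed figure, the time to stream a completion can be approximated by dividing its length by the throughput. The sketch assumes a constant 120 tok/sec and ignores time to first token, network latency, and provider load; `estimate_generation_seconds` is a hypothetical helper.

```python
OUTPUT_TOKENS_PER_SECOND = 120.0  # output speed quoted on this page


def estimate_generation_seconds(
    output_tokens: int,
    tokens_per_second: float = OUTPUT_TOKENS_PER_SECOND,
) -> float:
    """Approximate decode time for a completion (hypothetical helper)."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # A 600-token answer at the quoted speed.
    print(estimate_generation_seconds(600))
```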

Strengths & Limitations

Best For

Vision-capable (supported since Mistral Small 3.1)

128K context

Apache 2.0 — fully open

Multilingual (24 languages)

Excellent cost efficiency

Limitations

24B — smaller than frontier models

No extended thinking