Gemini 2.5 Flash

Google | Gemini 2.5 | Generally Available | May 2025

Google's fast and affordable model with 1M token context and optional thinking. Best price-to-performance in the Gemini 2.5 family for high-volume tasks.

Context: 1M tokens
Input: $0.30 per MTok
Output: $2.50 per MTok
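
As a rough illustration of the listed rates, the sketch below estimates the cost of a single request. The helper name `estimate_cost_usd` is hypothetical, and the arithmetic assumes flat per-MTok pricing at the figures shown here, with no caching, tiering, or free-tier allowance.

```python
# Prices listed on this page, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = 0.30
OUTPUT_USD_PER_MTOK = 2.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Example: a 200k-token prompt with a 2k-token answer.
example_cost = estimate_cost_usd(200_000, 2_000)
```

Under these assumptions, a 200k-token prompt with a 2k-token answer comes to about $0.065, and almost all of that is input cost.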

About

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...

Modalities

Input
Text Vision Audio Video Documents / PDFs Code
Output
Text Code

Code Examples

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'
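
For readers calling the endpoint from code, here is a minimal Python sketch of the same request using only the standard library. The function names `build_chat_request` and `send_chat_request` are hypothetical, the model slug is passed in as an argument (OpenRouter slugs are usually provider-prefixed, e.g. `google/gemini-2.5-flash`, but check the model list), and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl call."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first reply (needs network and a valid key)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        reply = json.load(response)
    # Assumes an OpenAI-style response body.
    return reply["choices"][0]["message"]["content"]
```

A typical call would read the key from the `OPENROUTER_API_KEY` environment variable, pass it to `build_chat_request`, and hand the result to `send_chat_request`. Keeping the two steps separate makes the request easy to inspect or log before anything goes over the network.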

API Parameters

Name                Type      Description
include_reasoning   boolean   Include the model's internal reasoning trace in the response.
max_tokens          integer   Deprecated. Use max_completion_tokens instead.
reasoning           object    Configuration for extended-thinking / reasoning mode.
response_format     one of    Constrain output to a JSON schema or an enum (structured outputs).
seed                integer   Seed for sampling. Repeating a request with the same seed and parameters should give the same output, but determinism is best-effort and not guaranteed.
stop                array     Up to 4 sequences at which the API stops generating tokens.
structured_outputs  boolean   Enable JSON-schema-constrained output.
temperature         number    Sampling temperature; higher values produce more random output. 0 always picks the most likely token, which is near-deterministic.
tool_choice         one of    Controls which (if any) tool is called: "none", "auto", "required", or a specific tool.
tools               array     List of tools (functions) the model may call.
top_p               number    Nucleus sampling: sample only from the smallest set of most likely tokens whose cumulative probability reaches top_p.
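
To show what temperature and top_p do to a token distribution, here is a small sketch of the textbook algorithms. It illustrates the general technique only; how the provider implements sampling internally is not documented on this page, and the function names are hypothetical.

```python
import math


def apply_temperature(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits divided by the temperature (must be > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is handled as greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs: list[float], top_p: float) -> list[int]:
    """Indices of the smallest set of most likely tokens whose mass reaches top_p."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept
```

Lower temperatures concentrate probability on the top token, and a smaller top_p shrinks the candidate set; sampling then happens only among the kept indices after renormalising their probabilities.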

Standard OpenAI-compatible parameters. Consult the provider docs for model-specific behaviour.
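
As one way of combining several of these parameters, the sketch below builds a request body that asks for JSON-schema-constrained output. The nested shape of `response_format` follows the common OpenAI-style `json_schema` convention, which is an assumption not spelled out on this page; the helper name, the schema name `one_line_answer`, and the `answer` field are hypothetical.

```python
import json

MAX_STOP_SEQUENCES = 4  # limit stated in the parameter table


def build_structured_body(model: str, prompt: str, stop: list[str]) -> dict:
    """Build a chat-completions body with structured output (hypothetical helper)."""
    if len(stop) > MAX_STOP_SEQUENCES:
        raise ValueError("at most 4 stop sequences are allowed")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # always pick the most likely token
        "seed": 42,        # best-effort reproducibility
        "stop": stop,
        "response_format": {
            "type": "json_schema",  # assumed OpenAI-style layout
            "json_schema": {
                "name": "one_line_answer",  # hypothetical schema name
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {"answer": {"type": "string"}},
                    "required": ["answer"],
                    "additionalProperties": False,
                },
            },
        },
    }


body_json = json.dumps(build_structured_body("some/model", "Say hi.", ["END"]))
```

Validating the stop list on the client side catches an over-long list before the request is sent, rather than relying on an API error.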

Benchmark Scores

Benchmark Score
MMLU 88.4%
GPQA Diamond 82.8%
MATH 88%

Performance

Output speed: 180 tok/sec

Source: Google AI docs + Artificial Analysis, April 2026

Strengths & Limitations

Best For
Very low cost at scale
1M token context window
Optional thinking mode
Free tier available
Multimodal inputs
Limitations
Lower ceiling than Gemini 2.5 Pro on hard reasoning
Thinking mode adds latency and cost
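
To put a number on the thinking-mode limitation, here is a back-of-the-envelope sketch. It assumes thinking tokens are billed at the listed output rate ($2.50 per MTok) and generated at the listed output speed (180 tok/sec); neither assumption is stated on this page, so treat the results as illustrative. The helper name `thinking_overhead` is hypothetical.

```python
OUTPUT_USD_PER_MTOK = 2.50  # output price listed on this page
OUTPUT_TOK_PER_SEC = 180.0  # output speed listed on this page


def thinking_overhead(thinking_tokens: int) -> tuple[float, float]:
    """Return (extra_usd, extra_seconds) for a given number of thinking tokens."""
    if thinking_tokens < 0:
        raise ValueError("thinking_tokens must be non-negative")
    extra_usd = thinking_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    extra_seconds = thinking_tokens / OUTPUT_TOK_PER_SEC
    return extra_usd, extra_seconds


# Example: 3,600 thinking tokens before the visible answer starts.
extra_usd, extra_seconds = thinking_overhead(3_600)
```

Under these assumptions, 3,600 thinking tokens add about $0.009 and roughly 20 seconds per request, so the latency cost tends to matter before the monetary one.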

Tags

Fast, Cheap, Long Context, Thinking, Vision