Claude Sonnet 4

Featured, Latest
Anthropic | Claude 4 | Generally Available | May 2025

Anthropic's mid-tier flagship, balancing intelligence and speed. It offers a 200K-token context window, strong coding and reasoning, and optional extended thinking and prompt caching.

Context: 200K tokens
Input: $3.00 per MTok
Output: $15.00 per MTok
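
To show how the per-MTok prices translate into a per-request cost, here is a minimal sketch that multiplies token counts by the listed rates. The helper name `estimate_cost_usd` is hypothetical, and the estimate assumes flat pricing: it ignores any prompt-caching discounts or other billing rules the provider may apply.

```python
INPUT_USD_PER_MTOK = 3.00    # listed input price per million tokens
OUTPUT_USD_PER_MTOK = 15.00  # listed output price per million tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Rough cost of one request from its token counts (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # A 10,000-token prompt with a 1,000-token reply.
    print(f"${estimate_cost_usd(10_000, 1_000):.4f}")
```

Under these assumptions, a 10,000-token prompt with a 1,000-token reply comes to about $0.045, with the output side accounting for a third of the cost despite being a tenth of the tokens.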

About

Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%),...

Modalities

Input: Text, Vision, Documents / PDFs, Code
Output: Text, Code

Code Examples

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'
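
The same request can be sketched in Python using only the standard library. The URL, headers, and body mirror the curl example above; the function names `build_request` and `send_request` are hypothetical, and the response handling assumes the reply carries a list of content blocks whose first entry has a `text` field.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(prompt: str, api_key: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_request(req: urllib.request.Request) -> str:
    """Send the request and return the first text block (assumed response shape)."""
    with urllib.request.urlopen(req) as resp:  # performs a network call
        reply = json.load(resp)
    return reply["content"][0]["text"]
```

A typical call would be `send_request(build_request("Explain quantum entanglement in one sentence.", os.environ["ANTHROPIC_API_KEY"]))`, after `import os`. Keeping construction separate from sending makes the payload easy to inspect without spending tokens.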

API Parameters

Name               Type     Description
include_reasoning  boolean  Include the model's internal reasoning trace in the response.
max_tokens         integer  Deprecated. Use max_completion_tokens.
reasoning          object   Configuration for extended-thinking / reasoning mode.
stop               array    Up to 4 sequences where the API will stop generating tokens.
temperature        number   Sampling temperature; higher values produce more random output. 0 is nearly deterministic.
tool_choice        one of   Controls which (if any) tool is called: "none", "auto", "required", or a specific tool.
tools              array    List of tools (functions) the model may call.
top_k              integer  Limit sampling to the top-k most likely tokens at each step.
top_p              number   Nucleus sampling: sample only from the smallest set of most likely tokens whose cumulative probability reaches top_p.

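These are standard OpenAI-compatible parameters. The native Anthropic Messages API used in the curl example differs in places (for example, max_tokens is required there rather than deprecated). Consult the provider docs for model-specific behaviour.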
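
As an illustration of how the parameters in the table combine, the sketch below assembles a request body in the OpenAI chat-completions shape. Several things here are assumptions rather than facts from this page: the helper name `build_chat_payload`, the `WEATHER_TOOL` definition, the reuse of the model ID from the curl example, and the exact tool schema, which follows the common OpenAI function-calling layout. The `reasoning` object is left out because its fields are not specified here.

```python
from typing import Any, Optional, Union

ALLOWED_TOOL_CHOICES = {"none", "auto", "required"}

# Hypothetical tool definition in the OpenAI function-calling layout.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_chat_payload(
    prompt: str,
    *,
    model: str = "claude-sonnet-4-20250514",  # assumed; gateways may use another ID
    max_completion_tokens: int = 1024,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    stop: Optional[list] = None,
    tools: Optional[list] = None,
    tool_choice: Union[str, dict, None] = None,
) -> dict:
    """Assemble a request body, omitting any optional parameter left unset."""
    if stop is not None and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")
    if isinstance(tool_choice, str) and tool_choice not in ALLOWED_TOOL_CHOICES:
        raise ValueError(f"tool_choice must be one of {sorted(ALLOWED_TOOL_CHOICES)}")
    payload: dict = {
        "model": model,
        "max_completion_tokens": max_completion_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    optional: dict = {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "stop": stop,
        "tools": tools,
        "tool_choice": tool_choice,
    }
    # Compare against None so that temperature=0 is still sent.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, `build_chat_payload("Weather in Paris?", temperature=0, tools=[WEATHER_TOOL], tool_choice="auto")` yields a body with only those optional fields set. Validating the `stop` limit client-side is a design choice that surfaces mistakes before a request is billed.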

Benchmark Scores

Benchmark Score
MMLU 86.5%
GPQA Diamond 75.4%
SWE-bench Verified 72.7%
MATH-500 78.2%

Performance

Output speed: 80 tok/sec
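
The listed speed gives a quick way to estimate how long a reply takes to stream. The sketch below is a back-of-the-envelope calculation only: the helper name is hypothetical, and it assumes a constant rate while ignoring time to first token, network overhead, and any extended-thinking phase.

```python
OUTPUT_TOK_PER_SEC = 80  # listed output speed


def estimate_generation_seconds(output_tokens: int,
                                tok_per_sec: float = OUTPUT_TOK_PER_SEC) -> float:
    """Approximate streaming time for a reply of the given length."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tok_per_sec <= 0:
        raise ValueError("tok_per_sec must be positive")
    return output_tokens / tok_per_sec


if __name__ == "__main__":
    # A full 1,024-token reply, matching max_tokens in the curl example.
    print(f"{estimate_generation_seconds(1024):.1f} s")
```

At the listed rate, a reply that fills the 1,024-token limit from the curl example would take roughly 12.8 seconds to generate.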

Source: Anthropic docs + llm-stats.com, April 2026

Strengths & Limitations

Best For
200K context window
Extended thinking mode
Excellent coding (SWE-bench)
Prompt caching for long contexts
Computer use capability
Limitations
Premium output pricing
Thinking mode adds latency
No audio modality
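
Extended thinking and prompt caching are both opted into through the request body. The sketch below shows one way such a body might look for the native Messages endpoint used in the curl example. The `thinking` and `cache_control` field shapes, and the budget constraints checked in the code, reflect Anthropic's Messages API as understood here and should be verified against the provider docs. The helper name and default values are hypothetical.

```python
MIN_THINKING_BUDGET = 1024  # assumed minimum thinking budget; verify in provider docs


def build_thinking_payload(system_text: str, question: str, *,
                           max_tokens: int = 4096,
                           thinking_budget: int = 2048) -> dict:
    """Request body with extended thinking enabled and a cacheable system prompt."""
    if thinking_budget < MIN_THINKING_BUDGET:
        raise ValueError(f"thinking_budget must be at least {MIN_THINKING_BUDGET}")
    if thinking_budget >= max_tokens:
        # The thinking budget is assumed to count against max_tokens.
        raise ValueError("thinking_budget must be smaller than max_tokens")
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
        "system": [
            {
                "type": "text",
                "text": system_text,
                # Marks the long, reusable prefix as a cache candidate.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }
```

Putting the cache marker on the large, stable system prompt rather than on the changing user question matches the "prompt caching for long contexts" use case, while the thinking budget offers a knob for trading latency against reasoning depth.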

Tags

Coding, Reasoning, Long Context, Tool Use, Thinking