DeepSeek V3

DeepSeek V3 by DeepSeek. Generally available, Dec 2024.

DeepSeek's flagship MoE (mixture-of-experts) model: 671B total parameters with 37B active per token, a 128K-token context window, and near-GPT-4 performance at a fraction of the cost. Released under the MIT licence, with 4K+ HuggingFace likes.

Context: 128K tokens

Input: $0.32 per MTok

Output: $0.89 per MTok
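
The listed rates make per-request cost a matter of simple arithmetic. The sketch below is a minimal illustration, assuming billing is linear in token count at exactly the input and output prices shown here; the function name `estimate_cost_usd` and the workload figures are hypothetical.

```python
from decimal import Decimal

# Prices listed on this page, in US dollars per million tokens (MTok).
INPUT_USD_PER_MTOK = Decimal("0.32")
OUTPUT_USD_PER_MTOK = Decimal("0.89")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD, assuming purely linear billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_USD_PER_MTOK
    )
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # Hypothetical workload: 100,000 prompt tokens and 20,000 completion tokens.
    print(estimate_cost_usd(100_000, 20_000))
```

`Decimal` is used instead of `float` so that small per-token amounts do not pick up binary rounding error when summed over many requests.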

About

DeepSeek-V3 is a text and code language model built on a mixture-of-experts (MoE) architecture.

Modalities

Input: Text, Code

Output: Text, Code

Code Examples

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'

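import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API, so the official client works
# once base_url points at it.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # raises KeyError if unset
)
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in one sentence."},
    ],
)
# Print the text of the first (and here only) completion.
print(response.choices[0].message.content)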

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const response = await client.chat.completions.create({
  model: "deepseek-ai/DeepSeek-V3",
  messages: [
    { role: "user", content: "Explain quantum entanglement in one sentence." },
  ],
});
console.log(response.choices[0].message.content);
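
The requests above send only `model` and `messages`. As a rough sketch of how a few of the optional fields listed under API Parameters could be attached, the following uses only the Python standard library. The helper names `build_payload` and `send` are hypothetical, and the check on `stop` simply mirrors the "up to 4 sequences" limit stated in the parameter table; provider behaviour may differ.

```python
import json
import os
import urllib.request

# Endpoint and model id copied from the examples on this page.
API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_ID = "deepseek-ai/DeepSeek-V3"


def build_payload(prompt, *, temperature=1.0, max_completion_tokens=None,
                  stop=None, seed=None):
    """Build a chat-completions request body (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    # Optional fields are added only when set, so provider defaults apply otherwise.
    if max_completion_tokens is not None:
        payload["max_completion_tokens"] = max_completion_tokens
    if stop:
        if len(stop) > 4:
            raise ValueError("at most 4 stop sequences are allowed")
        payload["stop"] = list(stop)
    if seed is not None:
        payload["seed"] = seed
    return payload


def send(payload):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `send(build_payload("Hello", max_completion_tokens=64, seed=7))` would return a dictionary whose `["choices"][0]["message"]["content"]` entry holds the reply, matching the response shape used in the examples above.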

API Parameters

Name	Type	Default	Description
`frequency_penalty`	number	0	Penalize tokens by their frequency so far. Positive values reduce repetition.
`logit_bias`	object	—	Map of token-id to bias (-100…100) added to the logit before sampling.
`max_tokens` deprecated	integer	—	Deprecated. Use max_completion_tokens.
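`min_p`	number	—	Minimum probability for a token to be considered, relative to the probability of the most likely token.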
`presence_penalty`	number	0	Penalize tokens that have appeared at all so far. Positive values encourage new topics.
`repetition_penalty`	number	1	Penalize repeated tokens (>1.0 reduces repetition, <1.0 encourages it).
`response_format`	one of	—	Constrain output to a JSON schema or an enum (structured outputs).
`seed`	integer	—	Seed for sampling. Repeating a request with the same seed and parameters should return the same output, but determinism is best-effort.
`stop`	array	—	Up to 4 sequences where the API will stop generating tokens.
`structured_outputs`	boolean	—	Enable JSON-schema-constrained output.
`temperature`	number	1	Sampling temperature; higher values produce more random output. 0 approximates greedy, near-deterministic decoding.
`tool_choice`	one of	—	Controls which (if any) tool is called: "none", "auto", "required", or a specific tool.
`tools`	array	—	List of tools (functions) the model may call.
`top_k`	integer	—	Limit sampling to the top-k most likely tokens at each step.
`top_p`	number	1	Nucleus sampling: sample only from the smallest set of most likely tokens whose cumulative probability reaches top_p.

Standard OpenAI-compatible parameters. Consult the provider docs for model-specific behaviour.
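
To make the sampling parameters concrete, here is a minimal pure-Python sketch of how `temperature`, `top_k`, `min_p`, and `top_p` are commonly applied to a next-token distribution. It is an illustration, not the provider's implementation: the order of the filters, the use of un-renormalized probabilities for the `top_p` cumulative sum, and the function names are all assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_candidates(probs, top_k=None, top_p=1.0, min_p=0.0):
    """Return renormalized {token_index: probability} after filtering.

    Assumed order: top_k, then min_p, then top_p.
    """
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep the k most likely tokens
    # min_p: drop tokens below min_p times the top token's probability.
    floor = min_p * ranked[0][1]
    ranked = [(i, p) for i, p in ranked if p >= floor]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i, p in ranked:
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {i: p / total for i, p in kept}


if __name__ == "__main__":
    probs = softmax([2.0, 1.0, 0.5, -1.0], temperature=0.7)
    print(filter_candidates(probs, top_k=3, top_p=0.9))
```

Lower temperatures sharpen the distribution before any filter runs, so the same `top_p` value tends to keep fewer tokens as temperature falls.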

Benchmark Scores

Benchmark	Score	Methodology
MMLU	88.5%	5-shot
MMLU-Pro	75.9%	5-shot
GPQA Diamond	59.1%	0-shot
HumanEval	65.2%	0-shot

Performance

Output speed (tok / sec)

Source: DeepSeek V3 technical report + Artificial Analysis, April 2026

Strengths & Limitations

Best For

Near-GPT-4 MMLU (88.5%, 5-shot) at $0.32 input / $0.89 output per MTok

MIT licence

MoE architecture reduces inference cost

4K+ HuggingFace likes

730K downloads

Limitations

671B total params require specialised MoE infrastructure

No vision modality

Weaker than DeepSeek R1 on reasoning tasks
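
The infrastructure limitation and the MoE cost advantage both follow from the 671B total / 37B active parameter split. The back-of-the-envelope sketch below assumes 1 byte per parameter for FP8 and 2 bytes for BF16, and counts weights only; KV cache, activations, and runtime overhead are ignored, so real deployments need more memory than these figures suggest.

```python
TOTAL_PARAMS = 671e9   # total parameters, from the description on this page
ACTIVE_PARAMS = 37e9   # parameters active per token, from the same description

# Assumed storage widths in bytes per parameter; real deployments may mix precisions.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def active_fraction(total=TOTAL_PARAMS, active=ACTIVE_PARAMS):
    """Share of the parameters that participate in computing each token."""
    return active / total


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, width):.0f} GB of weights")
    print(f"active per token: {active_fraction():.1%}")
```

All 671B parameters must be resident in accelerator memory even though only about 5.5% of them are used per token, which is why per-token compute is low while hosting remains demanding.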