Phi-4

Latest

Other · Phi · Generally Available · Dec 2024

Microsoft's 14B-parameter small language model focused on math and code. It scores 84.8 on MMLU and 80.4 on MATH, results that put it ahead of many 70B models on those benchmarks. It is released under the MIT licence and has 600K+ Hugging Face downloads.

Context: 16K tokens

Input: not listed (per MTok)

Output: not listed (per MTok)


Modalities

Input: Text, Code

Output: Text, Code

Code Examples

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/phi-4",
    "messages": [
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'

import os

from openai import OpenAI

# Point the OpenAI client at OpenRouter's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="microsoft/phi-4",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement in one sentence."},
    ],
)
print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const response = await client.chat.completions.create({
  model: "microsoft/phi-4",
  messages: [
    { role: "user", content: "Explain quantum entanglement in one sentence." },
  ],
});
console.log(response.choices[0].message.content);

API Parameters

Name	Type	Default	Description
`frequency_penalty`	number	0	Penalize tokens by their frequency so far. Positive values reduce repetition.
`max_completion_tokens`	integer	—	Maximum number of tokens the model may generate in the response.
`presence_penalty`	number	0	Penalize tokens that have appeared at all so far. Positive values encourage new topics.
`response_format`	one of	—	Constrain the output format, for example to JSON mode or to a supplied JSON schema (structured outputs).
`seed`	integer	—	Seed for sampling. Repeating a request with the same seed, prompt, and parameters makes a best-effort attempt to return the same output; exact determinism is not guaranteed.
`stop`	array	—	Up to 4 sequences where the API will stop generating tokens.
`stream`	boolean	false	Stream partial responses as Server-Sent Events.
`temperature`	number	1	Sampling temperature; higher values produce more random output. 0 approximates greedy decoding and is close to deterministic.
`tool_choice`	one of	—	Controls which (if any) tool is called: "none", "auto", "required", or a specific tool.
`tools`	array	—	List of tools (functions) the model may call.
`top_p`	number	1	Nucleus sampling: sample only from the smallest set of most-likely tokens whose cumulative probability reaches top_p.

Standard OpenAI-compatible parameters. Consult the provider docs for model-specific behaviour.
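
As a rough illustration of how several parameters from the table combine in one request body, the sketch below builds a payload for the same chat completions endpoint used in the code examples. The helper name `build_payload` and the specific values are assumptions chosen for illustration, not recommended settings, and whether a given provider honours every parameter for this model should be confirmed in the provider docs.

```python
import json


def build_payload(prompt, *, temperature=0.2, top_p=0.9,
                  max_completion_tokens=256, seed=None, stop=None):
    """Assemble a chat completions request body (hypothetical helper)."""
    if stop is not None and len(stop) > 4:
        # The parameter table lists a limit of 4 stop sequences.
        raise ValueError("at most 4 stop sequences are allowed")
    payload = {
        "model": "microsoft/phi-4",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_completion_tokens": max_completion_tokens,
    }
    # Optional parameters are sent only when set, so API defaults apply otherwise.
    if seed is not None:
        payload["seed"] = seed
    if stop:
        payload["stop"] = list(stop)
    return payload


payload = build_payload("Factor x^2 - 5x + 6.", seed=7, stop=["\n\n"])
print(json.dumps(payload, indent=2))
```

With a recent version of the OpenAI Python client, the dictionary can be passed as `client.chat.completions.create(**payload)`, or serialised and sent as the `-d` body of the curl request.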

Benchmark Scores

Benchmark	Score	Methodology
MMLU	84.8%	5-shot
HumanEval	82.6%	0-shot
MATH	80.4%	4-shot
GPQA Diamond	56.1%	0-shot

Performance

Output speed: 130 tok/sec

Source: Phi-4 technical report (Microsoft Research) and Hugging Face, April 2026
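
To put the output speed in concrete terms, generation time scales roughly linearly with response length. The sketch below is back-of-the-envelope arithmetic that assumes the 130 tok/sec figure holds steady and ignores time to first token, network latency, and provider load; the function name is hypothetical.

```python
OUTPUT_TOKENS_PER_SEC = 130  # figure quoted on this page; real throughput varies


def estimated_generation_seconds(output_tokens, tokens_per_sec=OUTPUT_TOKENS_PER_SEC):
    """Rough time to generate `output_tokens` tokens at a constant rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return output_tokens / tokens_per_sec


for n in (130, 650, 1300):
    print(f"{n} tokens: about {estimated_generation_seconds(n):.1f} s")
```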

Strengths & Limitations

Best For

Best-in-class MMLU for 14B models

MIT licence — fully commercial

Strong math (MATH 80.4)

600K+ Hugging Face downloads

Runs on a single A100
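
The single-A100 point can be sanity-checked with rough arithmetic on weight memory alone. The sketch assumes exactly 14 billion parameters and 2 bytes per parameter for 16-bit weights. It ignores the KV cache, activations, and framework overhead, so real usage will be higher. For reference, A100 cards are sold in 40 GB and 80 GB variants.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for model weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 14e9  # assumed: "14B" taken as exactly 14 billion parameters

fp16_gb = weight_memory_gb(PARAMS, 2)  # 16-bit weights
int8_gb = weight_memory_gb(PARAMS, 1)  # 8-bit quantised weights
print(f"16-bit: {fp16_gb:.0f} GB, 8-bit: {int8_gb:.0f} GB")
```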

Limitations

16K context only

No vision or other multimodal support

Training biased toward English
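
Because of the 16K context limit, it can help to check before sending a request that the prompt and the requested completion fit together. The sketch below assumes that 16K means 16,384 tokens and that the window is shared between input and output. It uses a crude four-characters-per-token estimate rather than the model's real tokenizer, and all names are hypothetical.

```python
CONTEXT_WINDOW = 16_384  # assumed value for "16K"; check the model page


def rough_token_count(text):
    """Crude estimate (about 4 characters per token); not the real tokenizer."""
    return max(1, len(text) // 4)


def fits_in_context(prompt, max_completion_tokens, window=CONTEXT_WINDOW):
    """True if the estimated prompt tokens plus the completion budget fit."""
    return rough_token_count(prompt) + max_completion_tokens <= window


print(fits_in_context("Summarise this paragraph.", 512))
print(fits_in_context("x" * 80_000, 512))
```

For anything close to the limit, counting tokens with the model's actual tokenizer is a safer basis than this estimate.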