Mistral 7B Instruct v0.1
Mistral Deprecated
A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.
Context
3K
tokens
Input
$0.11
per MTok
Output
$0.19
per MTok
About
A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.
Modalities
Input
Text
Output
Text
Advanced Capabilities
Structured Outputs
JSON schema-constrained generation
Code Examples
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/mistral-7b-instruct-v0.1 \
-H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Explain quantum entanglement in one sentence." }
]
}' API Parameters
Temperature: 0 – 5| Name | Type | Description |
|---|---|---|
messages required | array | An array of message objects representing the conversation history. |
prompt required | string | The input text prompt for the model to generate a response. |
frequency_penalty | number | Decreases the likelihood of the model repeating the same lines verbatim. |
functions deprecated | array | Deprecated. Use tools. |
lora | string | Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model. |
max_tokens | integer | The maximum number of tokens to generate in the response. |
presence_penalty | number | Increases the likelihood of the model introducing new topics. |
raw | boolean | If true, a chat template is not applied and you must adhere to the specific model's expected formatting. |
repetition_penalty | number | Penalty for repeated tokens; higher values discourage repetition. |
response_format | object | Constrain output to a JSON schema or an enum (structured outputs). |
seed | integer | Random seed for reproducibility of the generation. |
stream | boolean | If true, the response will be streamed back incrementally using SSE, Server Sent Events. |
temperature | number | Controls the randomness of the output; higher values produce more random results. |
tools | array | A list of tools available for the assistant to use. |
top_k | integer | Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises. |
top_p | number | Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses. |
Sourced from the model's published API schema.