Llama Guard 3 8B

Meta Generally Available

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification)...
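As a rough illustration of prompt classification, the sketch below builds a message list and interprets a classifier verdict. It rests on two assumptions taken from Meta's model card rather than from this page: the model assesses the last turn of the conversation (so a response would be checked by appending an `assistant` message), and the raw output is `safe`, or `unsafe` followed by a line of comma-separated category codes such as `S1`. The helper names are hypothetical, and the text this endpoint actually returns may differ.

```python
from typing import Dict, List, Tuple


def prompt_classification_messages(user_prompt: str) -> List[Dict[str, str]]:
    """Conversation whose last turn is the user prompt to be classified."""
    return [{"role": "user", "content": user_prompt}]


def response_classification_messages(user_prompt: str, reply: str) -> List[Dict[str, str]]:
    """Conversation whose last turn is the assistant reply to be classified.

    ASSUMPTION: the model judges the final turn, per Meta's model card.
    """
    return [
        {"role": "user", "content": user_prompt},
        {"role": "assistant", "content": reply},
    ]


def parse_verdict(text: str) -> Tuple[bool, List[str]]:
    """Parse raw classifier text into (is_safe, category_codes).

    ASSUMPTION: the text is "safe", or "unsafe" followed by a line such as "S1,S10".
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")
    label = lines[0].lower()
    if label == "safe":
        return True, []
    if label == "unsafe":
        codes = lines[1].split(",") if len(lines) > 1 else []
        return False, [c.strip() for c in codes if c.strip()]
    raise ValueError(f"unrecognized verdict: {lines[0]!r}")


if __name__ == "__main__":
    print(prompt_classification_messages("How do I bake bread?"))
    print(parse_verdict("unsafe\nS1,S10"))
```

Raising on an unrecognized verdict, rather than defaulting to safe, keeps a malformed reply from silently passing moderation.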

Context: 131K tokens

Input: $0.48 per MTok

Output: $0.030 per MTok
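To make the per-MTok prices concrete, here is a minimal cost estimate, assuming billing is linear in token count (price per million tokens times tokens used). The function name is hypothetical; actual billing may round or aggregate differently.

```python
# Prices as listed on this page, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = 0.48
OUTPUT_USD_PER_MTOK = 0.030

TOKENS_PER_MTOK = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one or more requests under linear per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # A 2,000-token conversation producing a 10-token verdict.
    print(f"${estimate_cost_usd(2_000, 10):.7f}")
```

Because a classifier verdict is typically only a few tokens long, the input price dominates the cost in this example: roughly $0.00096 of input against a fraction of a thousandth of a cent of output.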


About

Modalities

Input: Text

Output: Text

Advanced Capabilities

Structured Outputs: JSON schema-constrained generation
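A request using schema-constrained output might be assembled as in the sketch below. The page lists a `response_format` object parameter but does not show its shape, so the `type` and `json_schema` wrapper keys here are an assumption, and the verdict schema with its `safe` and `categories` fields is purely illustrative. Check the API schema before relying on either.

```python
import json
from typing import Any, Dict, List

# Hypothetical JSON Schema for a safety verdict; the field names are illustrative only.
VERDICT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "safe": {"type": "boolean"},
        "categories": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["safe"],
}


def build_structured_request(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build a request body that asks for schema-constrained JSON output.

    ASSUMPTION: response_format takes the form
    {"type": "json_schema", "json_schema": <schema>}; this page does not confirm it.
    """
    return {
        "messages": messages,
        "response_format": {"type": "json_schema", "json_schema": VERDICT_SCHEMA},
    }


if __name__ == "__main__":
    body = build_structured_request([{"role": "user", "content": "How do I bake bread?"}])
    print(json.dumps(body, indent=2))
```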

Code Examples

cURL

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/llama-guard-3-8b \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Explain quantum entanglement in one sentence." }
    ]
  }'

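Python

import os

import requests

# Read credentials from the environment rather than hard-coding them.
ACCOUNT_ID = os.environ["CLOUDFLARE_ACCOUNT_ID"]
TOKEN = os.environ["CLOUDFLARE_AUTH_TOKEN"]

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/llama-guard-3-8b",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum entanglement in one sentence."},
        ],
    },
    timeout=30,  # Avoid hanging indefinitely if the API does not respond.
)
response.raise_for_status()  # Raise on HTTP errors instead of printing an error body.
print(response.json())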

TypeScript (Cloudflare Worker)

// AI is the Workers AI binding exposed to this Worker.
interface Env { AI: Ai }

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user",   content: "Explain quantum entanglement in one sentence." },
    ];
    const response = await env.AI.run("llama-guard-3-8b", { messages });
    return Response.json(response);
  },
};

API Parameters

Temperature range: 0 – 5

Name	Type	Default	Description
`messages` required	array	—	An array of message objects representing the conversation history.
`max_tokens`	integer	256	The maximum number of tokens to generate in the response.
`response_format`	object	—	Dictate the output format of the generated response.
`temperature`	number	0.6	Controls the randomness of the output; higher values produce more random results.

Sourced from the model's published API schema.
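The parameters above can be gathered into a small request builder. This is a sketch only: the defaults and the temperature range come from the table, while the function name, the rule that `max_tokens` must be a positive integer, and the choice to validate on the client side are assumptions.

```python
from typing import Any, Dict, List, Optional

# Defaults and range as listed in the parameter table.
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.6
TEMPERATURE_MIN, TEMPERATURE_MAX = 0.0, 5.0


def build_request_body(
    messages: List[Dict[str, str]],
    max_tokens: int = DEFAULT_MAX_TOKENS,
    temperature: float = DEFAULT_TEMPERATURE,
    response_format: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Validate parameters locally and return a JSON-serializable request body."""
    if not messages:
        raise ValueError("messages is required and must not be empty")
    # ASSUMPTION: max_tokens must be a positive integer (bool is rejected explicitly).
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    if not TEMPERATURE_MIN <= temperature <= TEMPERATURE_MAX:
        raise ValueError("temperature must be between 0 and 5")

    body: Dict[str, Any] = {
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # response_format has no default, so it is sent only when provided.
    if response_format is not None:
        body["response_format"] = response_format
    return body


if __name__ == "__main__":
    msgs = [{"role": "user", "content": "How do I bake bread?"}]
    # A temperature of 0 is a common choice when a repeatable verdict is wanted.
    print(build_request_body(msgs, max_tokens=32, temperature=0.0))
```

Validating before sending surfaces out-of-range values as a local `ValueError` rather than as an API error response.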