llama.cpp

by llama.cpp

High-performance local inference engine for LLMs and multimodal models, supporting CPU/GPU execution, quantization, and broad model compatibility.

Freemium Free tier available

Quick Info

Category

Pricing: freemium
Vendor: llama.cpp
Website: github.com

Similar Products

Free self-host / $8/mo cloud

agentic-ai-prompt-research

Free / Open Source