llama.cpp

by llama.cpp

High-performance local inference engine for LLMs and multimodal models, supporting CPU/GPU execution, quantization, and broad model compatibility.

Freemium Free tier available
