LLM(1)
NAME
llm - drive the central LLM module: the only thing that loads a model into memory and exposes it to the other commands (miaougpt, glaude, denree)
SYNOPSIS
llm
llm --list
llm --list-all
llm --load <model-id>
llm --unload
llm --cache
llm --rm <model-id>
llm --rm-all
DESCRIPTION
Every command that needs a local LLM (miaougpt, glaude, denree) goes through a single central manager. llm is its console: it shows the currently loaded model and the tokens consumed (in / out), lists models, manages the browser cache, and frees GPU memory. The top-right widget mirrors this state live.
By default no model is loaded. A model is loaded either on demand by a command (which asks you to confirm), or explicitly with llm --load <id>. The model stays resident for the session and is shared by every command; llm --unload frees it.
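The lifecycle described above (nothing loaded at start, load on demand or explicitly, one resident model shared by all commands, explicit unload) can be sketched as a single shared slot. This is an illustrative sketch only, not the module's actual code: ModelManager, ensure, and the Confirm callback are hypothetical names, and it assumes that a command reuses whatever model is already resident instead of prompting again.

```typescript
// Hypothetical sketch of a single shared model slot; not the real module.
type Confirm = (modelId: string) => boolean;

class ModelManager {
  private loaded: string | null = null;

  // Currently resident model id, or null when nothing is loaded.
  current(): string | null {
    return this.loaded;
  }

  // Explicit path (llm --load <id>): ask first, then make the model resident.
  load(modelId: string, confirm: Confirm): boolean {
    if (this.loaded === modelId) return true; // already resident, nothing to do
    if (!confirm(modelId)) return false;      // user declined
    this.loaded = modelId; // a real manager would fetch weights and set up the GPU here
    return true;
  }

  // On-demand path used by a command: reuse the resident model if there is
  // one (assumption), otherwise load the command's default after confirmation.
  ensure(defaultId: string, confirm: Confirm): string | null {
    const resident = this.current();
    if (resident !== null) return resident;
    return this.load(defaultId, confirm) ? this.current() : null;
  }

  // llm --unload: drop the resident model.
  unload(): void {
    this.loaded = null;
  }
}
```

In this sketch a second command calling ensure() after the first load gets the same model back without a prompt, which is the sharing behaviour the text describes.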
Models ship in two builds: q4f16 (smaller, needs the GPU shader-f16 feature) and q4f32 (larger, universal). The right build is chosen automatically for your GPU.
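The automatic choice can be pictured as a feature check on the WebGPU adapter. The following is a minimal sketch, not the module's actual code: pickBuild and detectBuild are hypothetical names, and it assumes the decision depends only on whether the adapter reports the standard WebGPU shader-f16 feature.

```typescript
// Hypothetical sketch: choose a model build from the GPU's feature set.
type Build = "q4f16" | "q4f32";

// Pure decision: q4f16 only when the adapter advertises shader-f16,
// otherwise fall back to the universal q4f32 build.
function pickBuild(features: { has(name: string): boolean }): Build {
  return features.has("shader-f16") ? "q4f16" : "q4f32";
}

// Browser-side wrapper. Resolves to null when WebGPU is unavailable.
async function detectBuild(): Promise<Build | null> {
  // navigator.gpu is accessed through globalThis so the sketch also
  // compiles outside a browser and without WebGPU type definitions.
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return null;
  const adapter = await gpu.requestAdapter();
  if (!adapter) return null;
  return pickBuild(adapter.features);
}
```

Keeping the decision in a pure function separate from the adapter lookup makes it easy to check both outcomes without a GPU.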
Cache operations (--cache, --rm, --rm-all) do not need WebGPU and work in any browser.
OPTIONS
(none)            show the loaded model and the in/out tokens
--list, -l        list the recommended chat models
--list-all        list every known model id
--load <id>       load a model (asks for confirmation)
--unload, --stop  free the loaded model from GPU memory
--cache           list the models stored in the browser cache
--rm <id>         delete a model from the cache (id or unique substring)
--rm-all          clear the model cache (after confirmation)
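The "id or unique substring" rule for --rm can be illustrated with a small resolver. This is a sketch under stated assumptions, not the command's real implementation: resolveModelId and the Resolution type are hypothetical, matching is assumed to be case-sensitive, and an exact id is assumed to take priority over substring matches.

```typescript
// Hypothetical sketch: resolve the argument of --rm against the cached ids.
type Resolution =
  | { kind: "match"; id: string }
  | { kind: "none" }
  | { kind: "ambiguous"; candidates: string[] };

function resolveModelId(query: string, cachedIds: string[]): Resolution {
  // An exact id wins even if it is also a substring of another id (assumption).
  if (cachedIds.includes(query)) return { kind: "match", id: query };

  // Otherwise accept the query only if exactly one cached id contains it.
  const hits = cachedIds.filter((id) => id.includes(query));
  const first = hits[0];
  if (hits.length === 1 && first !== undefined) return { kind: "match", id: first };
  if (hits.length === 0) return { kind: "none" };
  return { kind: "ambiguous", candidates: hits };
}
```

Returning the ambiguous candidates lets the caller show the user which ids matched instead of deleting the wrong model.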
EXAMPLES
llm
llm --list
llm --load Qwen2.5-1.5B-Instruct
llm --unload
llm --cache
llm --rm-all