Local Commands¶

The superqode local command group is the local-first toolkit: detect hardware, find and serve models, generate a tuned harness, and check readiness. It is built for Local Agentic Coding (open models on your own hardware, no token bills).

New to local models? Start with the Local Providers guide, then use this page as the command reference.

superqode local COMMAND [OPTIONS]

Command Summary¶

Command	Purpose
`init`	Generate a tuned harness for this repo (doctor + smoke + write)
`doctor`	Detect hardware/engines/models and recommend a local stack
`search`	Find a model and how to get it on every engine (size + fit)
`labs`	Browse trusted models.dev model labs
`packs`	List model-policy packs (per model family)
`serve`	Start a local model server as a managed daemon
`servers`	Show status of every known local server
`stop`	Stop a server SuperQode started
`models`	List chat-capable models on running servers
`warm`	Preload a model and report first-token latency
`smoke`	Non-destructive coding-readiness smoke test
`bench`	Measure TTFT and decode speed
`optimize`	Benchmark candidates and recommend role routing
`guardrails`	Recommend conservative runtime limits for this machine

Getting Started¶

`local init`¶

The one-command path: detect hardware, recommend a trusted model, run a smoke test, and write a tuned superqode.local.yaml.

superqode local init --repo .
superqode --harness superqode.local.yaml

Option	Description
`--repo DIRECTORY`	Repository to tune the harness for (default `.`)
`--output FILE`	Harness file to write (default `superqode.local.yaml`)
`--engine TEXT`	Local engine to smoke test
`--model TEXT`	Model id to smoke test
`--skip-smoke`	Generate the harness without running smoke
`-y, --yes`	Overwrite an existing harness file
`--json`	Emit summary as JSON
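
For scripted setups, the options above can be combined into a single non-interactive run. The following is a minimal sketch, not part of SuperQode: `init_harness` is a hypothetical wrapper name, and it assumes `superqode` is on your PATH. It uses only the flags listed in the table above.

```shell
# Hypothetical helper, not part of SuperQode: regenerate the harness
# using only the flags documented above.
init_harness() {
    repo="${1:-.}"                      # repository to tune the harness for
    out="${2:-superqode.local.yaml}"    # harness file to write
    # --yes overwrites an existing harness file.
    superqode local init --repo "$repo" --output "$out" --yes || return 1
    # Print the follow-up command that starts SuperQode with the new harness.
    echo "superqode --harness $out"
}
```

Calling `init_harness . superqode.local.yaml` mirrors the two commands shown earlier; add `--skip-smoke` to the wrapped command if no local server is running yet.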

`local doctor`¶

Detect hardware, installed engines, and downloaded models, then recommend the best engine + model for this machine (preferring what you already have).

superqode local doctor --repo .
superqode local doctor --generate superqode.local.yaml

Option	Description
`--json`	Emit the full report as JSON
`--repo DIRECTORY`	Repository to size recommendations for
`--guardrails`	Include conservative runtime guardrails
`--generate PATH`	Write a tuned harness spec for the recommended stack
`--name TEXT`	Name for the generated harness (default `local-coder`)
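
Saving the report as JSON can make it easier to compare machines or attach the result to a bug report. The sketch below is a hypothetical wrapper (`save_doctor_report` and the default file name are illustrative) and assumes that `--guardrails` and `--json` can be combined; the JSON schema is not described here, so the file is stored without being parsed.

```shell
# Hypothetical helper: save the doctor report (with guardrails) as JSON.
# ASSUMPTION: --guardrails and --json may be used together.
save_doctor_report() {
    repo="${1:-.}"                         # repository to size recommendations for
    report="${2:-doctor-report.json}"      # illustrative output file name
    superqode local doctor --repo "$repo" --guardrails --json > "$report" || return 1
    echo "wrote $report"
}
```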

Finding Models¶

`local search`¶

Find models matching a query in the trusted catalog. For each model, the output shows the native download command for every engine it can run on (Ollama, llama.cpp, LM Studio, MLX), a superqode models download alternative, an approximate size, whether you already have it, and a rough memory-fit verdict for your hardware. With --hub it also queries the Hugging Face Hub live (trusted publishers only) for the newest releases.

superqode local search qwen3-coder
superqode local search glm --hub
superqode local search qwen3-coder --hub --gguf

Option	Description
`--hub`	Also search Hugging Face live (trusted publishers)
`--gguf`	With `--hub`: only GGUF (Ollama / llama.cpp)
`--mlx`	With `--hub`: only MLX (Apple Silicon)
`--json`	Emit results as JSON

In the TUI this is :local search <name>, or enter :hub model-search mode and just type names.

`local labs`¶

Browse local-friendly model labs from models.dev (GLM, Qwen, Gemma, DeepSeek, Mistral). Use it before downloading weights.

superqode local labs
superqode local labs alibaba

Option	Description
`--limit INTEGER`	Maximum models to show (default `12`)
`--refresh`	Refresh the models.dev cache
`--json`	Emit labs or models as JSON

`local packs`¶

List model-policy packs (shipped plus ~/.superqode/model-packs/). A pack carries tuned defaults for one open-model family; reference one from a harness with model_policy.pack.

superqode local packs

Option	Description
`--json`	Emit packs as JSON

Servers¶

MLX and llama.cpp serve one model per process; Ollama and LM Studio run as background apps. The serve command starts a managed daemon that keeps running after SuperQode exits.

`local serve`¶

superqode local serve ollama
superqode local serve mlx --model mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit --port 8090
superqode local serve llama.cpp --model /path/to/model.gguf --ctx 16384

Option	Description
`-m, --model TEXT`	Model id / weight path (required for `mlx` and `llama.cpp`)
`-p, --port INTEGER`	Port (default: engine default)
`--host TEXT`	Bind host (default `127.0.0.1`)
`--ctx INTEGER`	Context window (where the engine supports it)
`--no-wait`	Return immediately, do not wait for readiness
`--build`	(ds4) Build the `ds4-server` binary first if missing
`--allow-download`	(mlx) Permit downloading the model from Hugging Face if not cached
`--extra TEXT`	Extra flag passed to the server (repeatable)

Engines: ollama, lmstudio, mlx, ds4, llama.cpp.

`local servers`¶

Show the status of every known local server (running, managed, pid).

superqode local servers

Option	Description
`--json`	Emit status as JSON

`local stop`¶

Stop a server SuperQode started (adopted servers are left untouched).

superqode local stop mlx

`local models`¶

List chat-capable models on running local servers (embedding/reranker models are hidden). Omit the engine to scan every running server.

superqode local models
superqode local models ollama

Option	Description
`--json`	Emit models as JSON
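
The four server commands fit together as a simple lifecycle: start, inspect, list models, stop. The sketch below strings them together in a hypothetical `serve_session` function (the name is illustrative, and `superqode` is assumed to be on your PATH). It relies on `serve` waiting for readiness, which is the behavior implied when `--no-wait` is not given.

```shell
# Hypothetical helper: start an engine, show what it serves, then stop it.
serve_session() {
    engine="$1"; model="$2"; port="$3"
    # Without --no-wait, serve returns once the server reports ready.
    superqode local serve "$engine" --model "$model" --port "$port" || return 1
    superqode local servers            # status: running, managed, pid
    superqode local models "$engine"   # chat-capable models only
    superqode local stop "$engine"     # only stops servers SuperQode started
}
```

For example, `serve_session mlx mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit 8090` reuses the model id and port from the serve example above.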

Readiness And Performance¶

`local warm`¶

Preload a model and report first-token latency (time to first token, TTFT). Run it before a session so the first real prompt does not pay the model-load cost. A high TTFT here usually means the context window is too large for the hardware.

superqode local warm ollama --model qwen3:8b

Option	Description
`-m, --model TEXT`	Model id to preload (default: first served model)
`--max-tokens INTEGER`	Tokens to generate (default `8`)

`local smoke`¶

Non-destructive coding-readiness check: server reachable, chat model loaded (not embedding-only), context window detected, TTFT/decode measured, and clean tool-call/patch output on a tiny prompt. Never reads or edits your repo.

superqode local smoke --repo .

Option	Description
`--engine TEXT`	Local engine id (default: first running server)
`--endpoint TEXT`	OpenAI-compatible base URL
`--model TEXT`	Model id (default: first served chat model)
`--repo DIRECTORY`	Repository to label in the report
`--api-key TEXT`	Bearer token if the endpoint needs one
`--max-tokens INTEGER`	Tokens to generate (default `384`)
`--json`	Emit report as JSON
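
Warming and smoke testing can be chained into one pre-session check. The sketch below is a hypothetical `ready_check` function, not a SuperQode feature. It assumes both commands exit with a non-zero status on failure, which this page does not state, so confirm that behavior before relying on it in scripts.

```shell
# Hypothetical pre-session check: warm the model, then run the smoke test.
# ASSUMPTION: both commands exit non-zero on failure.
ready_check() {
    engine="$1"; model="$2"
    superqode local warm "$engine" --model "$model" \
        || { echo "warm failed" >&2; return 1; }
    superqode local smoke --engine "$engine" --model "$model" --repo . \
        || { echo "smoke failed" >&2; return 1; }
    echo "ready: $engine / $model"
}
```

A call such as `ready_check ollama qwen3:8b` prints the final line only when both steps succeed.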

`local bench`¶

Measure TTFT and decode speed with a coding prompt. Without --endpoint, it benchmarks the first model of every running engine. TTFT (prefill) matters most for agent loops.

superqode local bench
superqode local bench --endpoint http://localhost:11434/v1 --agentic

Option	Description
`--endpoint URL`	OpenAI-compatible base URL (default: every running engine)
`--model TEXT`	Model id to bench (repeatable)
`--max-tokens INTEGER`	Tokens to generate (default `256`)
`--api-key TEXT`	Bearer token if the endpoint needs one
`--agentic`	Also probe tool-call, edit-format, shell-call, context-recall
`--json`	Emit results as JSON
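
Because `--model` is repeatable, several models on one endpoint can be compared in a single run. The sketch below shows one way to build that command from a list of model ids; `bench_models` is a hypothetical helper name, and combining `--agentic` with `--json` is an assumption.

```shell
# Hypothetical helper: bench several models on one endpoint.
# One "--model ID" pair is emitted per model id.
bench_models() {
    endpoint="$1"; shift
    # Rewrite the positional parameters from "ID ..." to "--model ID ..."
    # by appending each pair and dropping the original entry.
    for m in "$@"; do
        set -- "$@" --model "$m"
        shift
    done
    superqode local bench --endpoint "$endpoint" "$@" --agentic --json
}
```

For example, `bench_models http://localhost:11434/v1 qwen3:8b` benches one model; pass further ids as extra arguments.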

`local optimize`¶

Benchmark candidate models and recommend role-specific routing (planner / implementer / reviewer / utility), optionally writing a role-routed harness.

superqode local optimize --repo .
superqode local optimize --generate superqode.local.yaml

Option	Description
`--endpoint URL`	OpenAI-compatible base URL (default: every running engine)
`--model TEXT`	Candidate model id (repeatable)
`--role TEXT`	Workflow role to optimize (default: all four roles)
`--repo DIRECTORY`	Repository to size when scoring routes
`--max-tokens INTEGER`	Tokens to generate (default `384`)
`--api-key TEXT`	Bearer token if the endpoint needs one
`--generate PATH`	Write a role-routed harness spec
`--name TEXT`	Name for the generated harness (default `local-optimized`)
`--json`	Emit report as JSON

`local guardrails`¶

Recommend conservative runtime limits (context cap, worker concurrency, memory headroom) for this machine.

superqode local guardrails --repo .

Option	Description
`--json`	Emit guardrails as JSON
`--repo DIRECTORY`	Repository to include when capping context and concurrency

Local Providers: the guided local-coding walkthrough.
Local Agentic Coding: the local-first positioning.
Local Stack Doctor: how the recommendation matrix works.
Harness Commands: the superqode harness group.
