turboagents¶
TurboQuant for agent runtimes and retrieval stacks
turboagents is a Python package for TurboQuant-style KV-cache and vector compression. It is designed to sit under existing AI systems, not replace them.
- **Quant Core:** Fast Walsh-Hadamard rotation, PolarQuant-style angle/radius encoding, a seeded QJL-style residual sketch, and binary payload serialization.
- **Real Adapters:** MLX, llama.cpp, experimental vLLM, plus Chroma, FAISS, LanceDB, SurrealDB, and pgvector retrieval surfaces.
- **Validated Benchmarks:** Benchmark matrix, MLX sweep, live pgvector validation, and a minimal Needle-style long-context harness.
Quick Snapshot¶
| Surface | Current Evidence |
|---|---|
| Chroma | Local adapter benchmark reached recall@10 = 1.0 across the tested bit-width sweep |
| MLX | Cached 3B smoke test passes; the 3B sweep identified 3.5 bits as the best current tradeoff |
| FAISS | Recall@10 = 1.0 across the tested medium-rag bit-width sweep |
| pgvector | Live PostgreSQL 17 validation completed, with recall@10 = 0.896875 at 4.0 bits |
| Needle | Exact-match retrieval only held at insertion fraction 0.1; not yet robust at 0.5 or 0.9 |
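Recall@10 in the table above is the fraction of each query's exact top-10 neighbors that the compressed index still returns in its own top 10. A minimal sketch of the metric, independent of any specific backend or the harness's actual code:

```python
def recall_at_k(true_ids, retrieved_ids, k=10):
    """Fraction of the exact top-k neighbors present in the retrieved top-k."""
    truth = set(true_ids[:k])
    hits = sum(1 for i in retrieved_ids[:k] if i in truth)
    return hits / min(k, len(truth))

# Example: 9 of the 10 exact neighbors survive compression.
score = recall_at_k(list(range(10)), [0, 1, 2, 3, 4, 5, 6, 7, 8, 99])
```

A value of 1.0 means compression lost nothing at the tested bit width; the pgvector row's 0.896875 means roughly nine of every ten exact neighbors survive at 4.0 bits.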
What TurboAgents Is¶
TurboAgents combines a reusable quantization core, runnable benchmark surfaces, engine wrappers, and retrieval adapters in one package. The point is not to give you a new orchestration layer. The point is to make the runtime and retrieval layers underneath your existing stack cheaper, smaller, and easier to measure.
Why The Current Version Is Useful¶
The current version is already useful if you want to:

- test compressed retrieval behavior without writing your own harness,
- compare FAISS, LanceDB, Chroma, and pgvector on the same synthetic workload,
- script MLX-based serving paths, or
- get hard numbers about where the current long-context story holds and where it fails.

It is a practical package now, not just an experimental code dump.
How To Use It¶
Most users land in one of three entry points.
Under An Existing Agent Runtime¶
Keep your current agent framework and use TurboAgents to improve the runtime under it. The current package is best suited to MLX-based local agents, llama.cpp-based local stacks, and experimental vLLM-backed serving paths where you want to inspect the runtime contract without rewriting the rest of the application.
Under An Existing RAG Stack¶
Keep your current application logic and use TurboAgents in the retrieval layer. That can mean FAISS-backed local retrieval, Chroma candidate search with a TurboAgents rerank pass, LanceDB or SurrealDB sidecar retrieval, or a PostgreSQL application that already depends on pgvector.
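The candidate-search-plus-rerank pattern mentioned above can be sketched in two stages: a cheap scan over 1-bit sign codes, then an exact rerank of the shortlist. The function names here are illustrative only, not the package's API:

```python
def sign_bits(vec):
    """1-bit quantization: keep only the sign of each coordinate."""
    return tuple(1 if x >= 0 else 0 for x in vec)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_stage_search(query, vectors, shortlist=3, top=1):
    qb = sign_bits(query)
    # Stage 1: cheap Hamming scan over the 1-bit codes.
    cand = sorted(range(len(vectors)),
                  key=lambda i: hamming(qb, sign_bits(vectors[i])))[:shortlist]
    # Stage 2: exact rerank of the shortlist with full-precision vectors.
    return sorted(cand, key=lambda i: -dot(query, vectors[i]))[:top]

# Tiny example corpus: index 2 is the best full-precision match.
corpus = [[1.0, 1.0], [-1.0, -1.0], [0.5, 2.0]]
result = two_stage_search([1.0, 1.0], corpus)
```

The backend (Chroma, FAISS, LanceDB, SurrealDB, or pgvector) supplies the candidate stage; the rerank pass is where compressed payloads pay for themselves, since only the shortlist needs full-precision scoring.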
As A Benchmark And Compression Tool¶
If you are still evaluating fit, start with the CLI:
```
turboagents doctor
turboagents bench kv
turboagents bench rag
turboagents compress
uv run python scripts/run_benchmark_matrix.py --output-dir benchmark-results/<run-id>
```
That gives you a low-risk way to decide where deeper integration is worth it.
Start Here¶
If you are evaluating the project quickly, use this order:
1. Read Getting Started and install with `uv`.
2. Run the synthetic CLI benchmarks locally.
3. Read Adapters and Examples to pick the backend path you actually need.
4. Read Benchmarks for the current benchmark results.
5. Read Architecture if you want the runtime and retrieval layout.
Reference Integration¶
TurboAgents is designed to stay standalone, but the first full reference integration is now SuperOptiX.
That integration currently proves that turboagents-chroma works as a
SuperOptiX retrieval option, turboagents-lancedb works end to end in the
LanceDB demo, and turboagents-surrealdb works end to end in the OpenAI
Agents and Pydantic AI demo paths.
If you want the end-to-end application story rather than the package-only API, read the SuperOptiX TurboAgents guide after this page.
Included In This Release¶
This release includes the quantization core, binary payload serialization, synthetic benchmark CLI, Chroma and FAISS retrieval paths, LanceDB and SurrealDB sidecar adapters, a pgvector client adapter, MLX and llama.cpp runtime wrappers, an experimental vLLM wrapper, the checked-in benchmark harness, and the minimal Needle-style long-context evaluation surface.