turboagents¶
TurboQuant for agent runtimes and retrieval stacks
turboagents is a Python package for TurboQuant-style KV-cache and vector compression. It is designed to sit under existing AI systems, not replace them.
- **Quant Core:** Fast Walsh-Hadamard rotation, PolarQuant-style angle/radius encoding, a seeded QJL-style residual sketch, and binary payload serialization.
- **Real Adapters:** MLX, llama.cpp, experimental vLLM, plus Chroma, FAISS, LanceDB, SurrealDB, and pgvector retrieval surfaces.
- **Validated Benchmarks:** Benchmark matrix, MLX sweep, live pgvector validation, and a minimal Needle-style long-context harness.
Quick Snapshot¶
| Surface | Current Evidence |
|---|---|
| Chroma | Local adapter benchmark reached recall@10 = 1.0 across the tested bit-width sweep |
| MLX | Cached 3B smoke test passes; the 3B sweep identified 3.5 bits as the best current tradeoff |
| FAISS | Recall@10 = 1.0 across the tested medium-rag bit-width sweep |
| pgvector | Live PostgreSQL 17 validation completed, with recall@10 = 0.896875 at 4.0 bits |
| Needle | Exact-match retrieval only held at insertion fraction 0.1; not yet robust at 0.5 or 0.9 |
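Recall@10 in the table above is the fraction of each query's exact top-10 neighbors that the compressed index still returns in its own top 10. A minimal sketch of the metric, independent of any specific backend or the harness's actual code:

```python
def recall_at_k(true_ids, retrieved_ids, k=10):
    """Fraction of the exact top-k neighbors present in the retrieved top-k."""
    truth = set(true_ids[:k])
    hits = sum(1 for i in retrieved_ids[:k] if i in truth)
    return hits / min(k, len(truth))

# Example: 9 of the 10 exact neighbors survive compression.
score = recall_at_k(list(range(10)), [0, 1, 2, 3, 4, 5, 6, 7, 8, 99])
```

A value of 1.0 means compression lost nothing at the tested bit width; the pgvector row's 0.896875 means roughly nine of every ten exact neighbors survive at 4.0 bits.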
What TurboAgents Is¶
TurboAgents combines a reusable quantization core, runnable benchmark surfaces, engine wrappers, and retrieval adapters in one package. The point is not to give you a new orchestration layer. The point is to make the runtime and retrieval layers underneath your existing stack cheaper, smaller, and easier to measure.
Why The Current Version Is Useful¶
The current version is already useful if you want to:

- test compressed retrieval behavior without writing your own harness,
- compare FAISS, LanceDB, Chroma, and pgvector on the same synthetic workload,
- script MLX-based serving paths, or
- get hard numbers about where the current long-context story holds and where it fails.

It is a practical package now, not just an experimental code dump.
How To Use It¶
Most users land in one of three entry points.
Under An Existing Agent Runtime¶
Keep your current agent framework and use TurboAgents to improve the runtime under it. The current package is best suited to MLX-based local agents, llama.cpp-based local stacks, and experimental vLLM-backed serving paths where you want to inspect the runtime contract without rewriting the rest of the application.
Under An Existing RAG Stack¶
Keep your current application logic and use TurboAgents in the retrieval layer. That can mean FAISS-backed local retrieval, Chroma candidate search with a TurboAgents rerank pass, LanceDB or SurrealDB sidecar retrieval, or a PostgreSQL application that already depends on pgvector.
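The candidate-search-plus-rerank pattern mentioned above can be sketched in two stages: a cheap scan over 1-bit sign codes, then an exact rerank of the shortlist. The function names here are illustrative only, not the package's API:

```python
def sign_bits(vec):
    """1-bit quantization: keep only the sign of each coordinate."""
    return tuple(1 if x >= 0 else 0 for x in vec)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_stage_search(query, vectors, shortlist=3, top=1):
    qb = sign_bits(query)
    # Stage 1: cheap Hamming scan over the 1-bit codes.
    cand = sorted(range(len(vectors)),
                  key=lambda i: hamming(qb, sign_bits(vectors[i])))[:shortlist]
    # Stage 2: exact rerank of the shortlist with full-precision vectors.
    return sorted(cand, key=lambda i: -dot(query, vectors[i]))[:top]

# Tiny example corpus: index 2 is the best full-precision match.
corpus = [[1.0, 1.0], [-1.0, -1.0], [0.5, 2.0]]
result = two_stage_search([1.0, 1.0], corpus)
```

The backend (Chroma, FAISS, LanceDB, SurrealDB, or pgvector) supplies the candidate stage; the rerank pass is where compressed payloads pay for themselves, since only the shortlist needs full-precision scoring.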
As A Benchmark And Compression Tool¶
If you are still evaluating fit, start with the CLI:
```
turboagents doctor
turboagents bench kv
turboagents bench rag
turboagents compress
uv run python scripts/run_benchmark_matrix.py --output-dir benchmark-results/<run-id>
```
That gives you a low-risk way to decide where deeper integration is worth it.
Start Here¶
If you are evaluating the project quickly, use this order:
1. Read Getting Started and install with `uv`.
2. Run the synthetic CLI benchmarks locally.
3. Read Adapters and Examples to pick the backend path you actually need.
4. Read Benchmarks for the current benchmark results.
5. Read Architecture if you want the runtime and retrieval layout.
Reference Integration¶
TurboAgents is designed to stay standalone, but the first full reference integration is now SuperOptiX.
That integration currently proves that turboagents-chroma works as a
SuperOptiX retrieval option, turboagents-lancedb works end to end in the
LanceDB demo, and turboagents-surrealdb works end to end in the OpenAI
Agents and Pydantic AI demo paths.
If you want the end-to-end application story rather than the package-only API, read the SuperOptiX TurboAgents guide after this page.
Included In This Release¶
This release includes the quantization core, binary payload serialization, synthetic benchmark CLI, Chroma and FAISS retrieval paths, LanceDB and SurrealDB sidecar adapters, a pgvector client adapter, MLX and llama.cpp runtime wrappers, an experimental vLLM wrapper, the checked-in benchmark harness, and the minimal Needle-style long-context evaluation surface.