turboagents

Compression Infrastructure For Real Systems

TurboQuant for agent runtimes and retrieval stacks

turboagents is a Python package for TurboQuant-style KV-cache and vector compression. It is designed to sit under existing AI systems, not replace them.

Quant Core

Fast Walsh-Hadamard rotation, PolarQuant-style angle/radius encoding, seeded QJL-style residual sketch, and binary payload serialization.

Real Adapters

MLX, llama.cpp, experimental vLLM, plus Chroma, FAISS, LanceDB, SurrealDB, and pgvector retrieval surfaces.

Validated Benchmarks

Benchmark matrix, MLX sweep, live pgvector validation, and a minimal Needle-style long-context harness.

Product Overview: super-agentic.ai/turboagents
Reference Integration: superoptix.ai/turboagents

Perfect Top-10 Recall: Chroma and FAISS both held full top-10 retrieval accuracy on the validated benchmark sweep.
Strong PostgreSQL Path: pgvector reached 0.896875 top-10 recall at 4.0 bits in live PostgreSQL validation.
Best MLX Tradeoff: 3.5 bits gave the best quality and throughput balance in the 3B MLX benchmark run.
Reference Integration: SuperOptiX is the first full application integration with real demo and retrieval coverage.

Quick Snapshot

Surface     Current Evidence
Chroma      local adapter benchmark reached recall@10 = 1.0 across the tested bit-width sweep
MLX         cached 3B smoke test passes; the 3B sweep identified 3.5 bits as the best current tradeoff
FAISS       recall@10 = 1.0 across the tested medium-rag bit-width sweep
pgvector    live PostgreSQL 17 validation completed, with recall@10 = 0.896875 at 4.0 bits
Needle      exact-match retrieval held only at insertion fraction 0.1; not yet robust at 0.5 or 0.9
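The recall@10 figures in the snapshot follow the standard retrieval-quality definition: the fraction of exact top-k neighbors (from uncompressed search) that survive in the compressed index's top-k, averaged over queries. A minimal sketch of that computation, with an illustrative function name that is not part of the package API:

```python
def recall_at_k(true_ids, retrieved_ids, k: int = 10) -> float:
    """Average fraction of the exact top-k ids that also appear in the
    retrieved top-k, over all queries."""
    hits = 0
    for truth, got in zip(true_ids, retrieved_ids):
        # count how many ground-truth neighbors survived compression
        hits += len(set(truth[:k]) & set(got[:k]))
    return hits / (len(true_ids) * k)

# toy example: 2 queries, k=2; query 1 keeps 1 of 2 neighbors, query 2 keeps both
truth = [[0, 1], [2, 3]]
got = [[1, 5], [2, 3]]
print(recall_at_k(truth, got, k=2))  # 0.75
```

A value of 1.0 (as in the Chroma and FAISS rows) means compression lost no top-10 neighbors on the tested workload.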
Use it when you already have:

  • an agent runtime that is hitting KV-cache or context limits
  • a RAG stack with growing vector storage cost
  • an inference layer built on MLX, llama.cpp, or vLLM
  • a retrieval layer built on Chroma, FAISS, LanceDB, SurrealDB, or pgvector

What TurboAgents Is

TurboAgents combines a reusable quantization core, runnable benchmark surfaces, engine wrappers, and retrieval adapters in one package. The point is not to give you a new orchestration layer. The point is to make the runtime and retrieval layers underneath your existing stack cheaper, smaller, and easier to measure.
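The quantization core's fast Walsh-Hadamard rotation (listed under Quant Core above) is a standard preprocessing step that spreads energy evenly across dimensions before quantization. A minimal sketch, assuming power-of-two dimensions; `fwht_rotate` is an illustrative name, not the package API:

```python
import numpy as np

def fwht_rotate(x: np.ndarray) -> np.ndarray:
    """Normalized fast Walsh-Hadamard transform of a 1-D vector whose
    length is a power of two. O(d log d) rather than O(d^2), and
    orthogonal, so it preserves norms and is its own inverse."""
    d = x.shape[-1]
    assert d & (d - 1) == 0, "dimension must be a power of two"
    y = x.astype(np.float64).copy()
    h = 1
    while h < d:
        # butterfly stage: combine element pairs that are h apart
        y = y.reshape(-1, 2 * h)
        a, b = y[:, :h].copy(), y[:, h:].copy()
        y[:, :h] = a + b
        y[:, h:] = a - b
        y = y.reshape(-1)
        h *= 2
    return y / np.sqrt(d)

v = np.array([1.0, 0.0, 0.0, 0.0])
print(fwht_rotate(v))               # [0.5 0.5 0.5 0.5]
print(fwht_rotate(fwht_rotate(v)))  # back to [1. 0. 0. 0.]
```

Because the rotated vector has flatter per-dimension magnitudes, a low-bit quantizer applied afterwards wastes fewer levels on outlier coordinates.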

Why The Current Version Is Useful

The current version is already useful if you want to:

  • test compressed retrieval behavior without writing your own harness
  • compare FAISS, LanceDB, Chroma, and pgvector on the same synthetic workload
  • script MLX-based serving paths
  • get hard numbers on where the current long-context story holds and where it fails

It is a practical package now, not just an experimental code dump.

How To Use It

Most users land in one of three entry points.

Under An Existing Agent Runtime

Keep your current agent framework and use TurboAgents to improve the runtime under it. The current package is best suited to MLX-based local agents, llama.cpp-based local stacks, and experimental vLLM-backed serving paths where you want to inspect the runtime contract without rewriting the rest of the application.

Under An Existing RAG Stack

Keep your current application logic and use TurboAgents in the retrieval layer. That can mean FAISS-backed local retrieval, Chroma candidate search with a TurboAgents rerank pass, LanceDB or SurrealDB sidecar retrieval, or a PostgreSQL application that already depends on pgvector.
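The "candidate search plus rerank pass" pattern mentioned above can be sketched generically: a compressed index returns a coarse candidate set cheaply, then a full-precision pass restores exact ordering among the survivors. This is a hand-rolled illustration with NumPy, not the TurboAgents rerank API:

```python
import numpy as np

def rerank(query: np.ndarray, candidates: np.ndarray, cand_ids, top_k: int = 10):
    """Re-score a coarse candidate set with full-precision cosine
    similarity. The candidate set would come from a compressed index
    (e.g. a Chroma query); this pass fixes ordering among survivors."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per candidate
    order = np.argsort(-scores)[:top_k] # best first
    return [cand_ids[i] for i in order], scores[order]

ids, scores = rerank(
    np.array([1.0, 0.0]),
    np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]),
    ["a", "b", "c"],
    top_k=2,
)
print(ids)  # ['a', 'c']
```

The key property is that recall losses from compression only matter if a true neighbor is missing from the candidate set entirely; ordering errors inside the set are repaired by the rerank.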

As A Benchmark And Compression Tool

If you are still evaluating fit, start with the CLI:

  • turboagents doctor
  • turboagents bench kv
  • turboagents bench rag
  • turboagents compress
  • uv run python scripts/run_benchmark_matrix.py --output-dir benchmark-results/<run-id>

That gives you a low-risk way to decide where deeper integration is worth it.

Start Here

If you are evaluating the project quickly, use this order:

  1. Read Getting Started and install with uv.
  2. Run the synthetic CLI benchmarks locally.
  3. Read Adapters and Examples to pick the backend path you actually need.
  4. Read Benchmarks for the current benchmark results.
  5. Read Architecture if you want the runtime and retrieval layout.

Reference Integration

TurboAgents is designed to stay standalone, but the first full reference integration is now SuperOptiX.

That integration currently proves that:

  • turboagents-chroma works as a SuperOptiX retrieval option
  • turboagents-lancedb works end to end in the LanceDB demo
  • turboagents-surrealdb works end to end in the OpenAI Agents and Pydantic AI demo paths

If you want the end-to-end application story rather than the package-only API, read the SuperOptiX TurboAgents guide after this page.

Included In This Release

This release includes the quantization core, binary payload serialization, synthetic benchmark CLI, Chroma and FAISS retrieval paths, LanceDB and SurrealDB sidecar adapters, a pgvector client adapter, MLX and llama.cpp runtime wrappers, an experimental vLLM wrapper, the checked-in benchmark harness, and the minimal Needle-style long-context evaluation surface.