# Getting Started with AgentVectorDB (AVDB)
AgentVectorDB (AVDB) is a specialized vector database designed for AI agents, built on LanceDB. This guide will help you set up and start using AgentVectorDB in your projects.
## Key Features

- Semantic search optimized for AI agent memories
- Async and sync APIs for flexible integration
- Advanced filtering and querying capabilities
- Customizable schema support
- High-performance vector operations
- Persistent storage with a LanceDB backend
## Prerequisites

Before installing AgentVectorDB, ensure you have:

- Python 3.8 or higher
- pip (the Python package installer)
- A virtual environment tool (venv or conda)
- Terminal or command prompt access
## Installation Guide

### 1. Setting Up Python Environment

#### Using venv (Python's built-in virtual environment)

For macOS/Linux:

```bash
# Create a new directory for your project
mkdir my_agent_project
cd my_agent_project

# Create a virtual environment
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate

# Verify the Python version
python --version
```
For Windows:

```powershell
# Create a new directory for your project
mkdir my_agent_project
cd my_agent_project

# Create a virtual environment
python -m venv venv

# Activate the virtual environment
.\venv\Scripts\activate

# Verify the Python version
python --version
```
#### Using Conda

```bash
# Create a new conda environment
conda create -n agentdb python=3.12

# Activate the conda environment
conda activate agentdb

# Verify the conda environment
conda info --envs
```
### 2. Installing AgentVectorDB

```bash
# Basic installation
pip install agentvectordb

# Installation with all optional dependencies (recommended for full features);
# quoted so shells like zsh don't expand the brackets
pip install "agentvectordb[all]"

# Development installation (if you want to contribute)
git clone https://github.com/superagenticai/agentvectordb.git
cd agentvectordb
pip install -e ".[dev]"
```
### 3. Verify Installation

```bash
# Check the installed version
python -c "import agentvectordb; print(agentvectordb.__version__)"

# Verify the package metadata (works on all platforms)
pip show agentvectordb
```
## Basic Usage

### 1. Create a Store

```python
from agentvectordb import AgentVectorDBStore
from agentvectordb.embeddings import DefaultTextEmbeddingFunction

# Initialize the store with a storage path
store = AgentVectorDBStore(db_path="./my_agent_db")

# Create an embedding function with a specific dimension
ef = DefaultTextEmbeddingFunction(dimension=64)
```
### 2. Create a Collection

```python
# Create or get a collection with a specific embedding function
collection = store.get_or_create_collection(
    name="agent_memories",
    embedding_function=ef,
    recreate=False,  # Set to True only if you want to delete an existing collection
)
```
### 3. Add Memories

Important: the vector index is built with KMeans, which needs enough rows to train on (at least 8; see the best practices below). With fewer rows you will see a "Skipping vector index creation: not enough rows for KMeans" warning; searches still work, but without an index until enough data is added.
```python
# Seed the collection with a batch of memories (8+ diverse entries
# recommended so the vector index can be built)
memories = [
    {
        "content": "The sky appears blue due to Rayleigh scattering of sunlight",
        "type": "scientific_fact",
        "importance_score": 0.8,
        "metadata": {
            "domain": "physics",
            "confidence": "high",
            "tags": ["science", "physics", "optics"],
        },
    },
    {
        "content": "API response times reduced by 40% after optimization",
        "type": "performance",
        "importance_score": 0.89,
        "metadata": {
            "component": "api",
            "improvement": "significant",
        },
    },
]

# Add memories in a batch
collection.add_batch(memories)

# After the initial batch, you can add single memories
collection.add(
    content="New observation about the system",
    type="observation",
    importance_score=0.75,
    metadata={"category": "system"},
)

# Verify the collection size
print(f"Collection size: {collection.count()}")
```
#### Best Practices for Adding Memories

- Always start with a batch of at least 8 diverse memories (10+ recommended)
- Use `add_batch()` for initial data loading (see the seeding sketch below)
- Ensure memories cover different types and contexts
- After the initial batch, you can use single `add()` operations
- Include varied metadata and types for better vector clustering
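
As a starting point, a seeding helper like the one below can get a new collection past the index threshold. This is a minimal sketch built only on the `add_batch()` and `count()` calls shown above; the `SEED_MEMORIES` contents and the `seed_collection` helper are illustrative, not part of the library:

```python
# Hypothetical seed data: ten varied placeholder memories
SEED_MEMORIES = [
    {"content": f"Bootstrap note {i}: {topic}", "type": "seed",
     "importance_score": 0.5, "metadata": {"topic": topic}}
    for i, topic in enumerate(
        ["physics", "api", "users", "deploys", "errors",
         "metrics", "planning", "search", "caching", "logging"]
    )
]

def seed_collection(collection, min_rows=8):
    # Top the collection up so the KMeans-based index can be built
    if collection.count() < min_rows:
        collection.add_batch(SEED_MEMORIES)

seed_collection(collection)
```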
### 4. Query Memories

```python
# Simple semantic search
results = collection.query(
    query_text="Why is the sky blue?",
    k=2,  # Number of results to return
)

# Process results; `_distance` is a distance, so lower means more similar
for result in results:
    print(f"Content: {result['content']}")
    print(f"Distance: {result['_distance']}")
    print(f"Metadata: {result.get('metadata', {})}")

# Query with a SQL-style filter
filtered_results = collection.query(
    query_text="user preferences",
    k=5,
    filter_sql="type = 'user_preference' AND importance_score > 0.5",
)
```
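
The `filter_sql` string is handed to the storage layer, so SQL-style predicates such as `IN` and comparison operators should work. This is an assumption based on the LanceDB backend; check the API Reference for the exact grammar supported:

```python
# Sketch: combine semantic search with a compound SQL-style filter
# (the filter grammar is assumed from the LanceDB backend)
results = collection.query(
    query_text="notable performance changes",
    k=3,
    filter_sql=(
        "type IN ('performance', 'scientific_fact') "
        "AND importance_score >= 0.8"
    ),
)
```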
### Async Usage

AgentVectorDB also provides async APIs for integration with async applications:
```python
import asyncio

from agentvectordb import AsyncAgentVectorDBStore
from agentvectordb.embeddings import DefaultTextEmbeddingFunction


async def main():
    # Create the embedding function
    ef = DefaultTextEmbeddingFunction(dimension=64)

    # Initialize the async store
    store = AsyncAgentVectorDBStore(db_path="./async_db")

    # Create the collection
    collection = await store.get_or_create_collection(
        name="async_memories",
        embedding_function=ef,
        recreate=True,  # Start fresh
    )

    # Create the initial memories
    initial_memories = [
        {
            "content": "System started processing batch job",
            "type": "system_log",
            "metadata": {"operation": "batch_start"},
        },
        {
            "content": "Processing async operation",
            "type": "system_log",
            "metadata": {"timestamp": "2024-05-19"},
        },
    ]

    # Add the initial batch
    try:
        await collection.add_batch(initial_memories)
        print("Successfully added initial memories")
        # Note: with only two rows this logs the KMeans warning,
        # since index creation needs at least 8 entries
    except Exception as e:
        print(f"Error adding batch memories: {e}")
        return None

    # Query memories with a timeout
    try:
        results = await asyncio.wait_for(
            collection.query(
                query_text="async processing",
                k=1,
            ),
            timeout=5.0,
        )
        return results
    except asyncio.TimeoutError:
        print("Query timed out")
        return None


# Run the async code
if __name__ == "__main__":
    result = asyncio.run(main())
    if result:
        print("\nQuery Results:")
        for item in result:
            print(f"Content: {item['content']}")
            print(f"Distance: {item['_distance']}")
            print("---")
```
Note: because this example seeds only two memories, you will see the "Skipping vector index creation: not enough rows for KMeans" warning. For production use, seed with at least 8 diverse memories, as described in Best Practices for Adding Memories above.
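
To avoid the warning in async code, one option is to top up the collection before querying, mirroring the sync pattern above. This sketch assumes the async collection exposes an awaitable `count()` alongside the `add_batch()` shown in the example:

```python
# Sketch: top up an async collection before querying
# (assumes an async counterpart of count())
async def ensure_seeded(collection, seed_memories, min_rows=8):
    if await collection.count() < min_rows:
        await collection.add_batch(seed_memories)
```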
## Advanced Configuration

### Custom Schema Definition

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class CustomMemorySchema(BaseModel):
    content: str
    importance: float = Field(ge=0.0, le=1.0)  # Constrained to [0.0, 1.0]
    tags: Optional[List[str]] = []
    source_id: Optional[str] = None
```
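
One straightforward way to put such a schema to work is client-side validation before inserting, which catches out-of-range values early. This is a sketch of that pattern; whether collections accept a schema object directly is covered in the API Reference:

```python
# Validate a memory against the custom schema before inserting it;
# Pydantic raises a ValidationError if, e.g., importance is out of range
memory = CustomMemorySchema(
    content="Deployment completed without errors",
    importance=0.7,
    tags=["ops", "deploy"],
)

# model_dump() is Pydantic v2; use memory.dict() on v1
collection.add_batch([memory.model_dump()])
```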
### Performance Optimization

```python
# Chunk large datasets and insert them with add_batch() (the bulk-insert
# API shown earlier) for better throughput than repeated single add() calls
BATCH_SIZE = 100

for i in range(0, len(large_memory_list), BATCH_SIZE):
    collection.add_batch(large_memory_list[i:i + BATCH_SIZE])
```
## Common Pitfalls

- Memory Management: monitor memory usage as collections grow large
- Vector Dimensions: choose an embedding dimension that fits your use case, and keep it consistent between writes and queries (see the sketch below)
- Batch Operations: prefer `add_batch()` over repeated single `add()` calls for large datasets
- Index Updates: rebuilding the vector index has a cost, so batch writes rather than triggering frequent small updates
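
For the dimension pitfall in particular, defining the dimension once and reusing the same embedding function instance keeps writes and queries aligned. A minimal sketch, using only the calls introduced earlier:

```python
# Pin the embedding dimension in one place so write-time and
# query-time vectors always match
EMBED_DIM = 64

ef = DefaultTextEmbeddingFunction(dimension=EMBED_DIM)
collection = store.get_or_create_collection(
    name="agent_memories",
    embedding_function=ef,
)
```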
## Next Steps

- Explore the Core Concepts section
- Check the API Reference
- View the Examples for more usage patterns
- Read the Guides for best practices
## Support & Resources

### Contributing

We welcome contributions! See our Contributing Guide for details.