πŸš€ Getting Started with AgentVectorDB (AVDB)

AgentVectorDB (AVDB) is a specialized vector database designed for AI agents, built on LanceDB. This guide will help you set up and start using AgentVectorDB in your projects.

🎯 Key Features

  • πŸ“ Semantic search optimized for AI agent memories
  • πŸ”„ Async/sync APIs for flexible integration
  • πŸ” Advanced filtering and querying capabilities
  • πŸ“Š Customizable schema support
  • πŸš„ High-performance vector operations
  • πŸ’Ύ Persistent storage with LanceDB backend

βœ… Prerequisites

Before installing AgentVectorDB, ensure you have:

  • 🐍 Python 3.8 or higher installed
  • πŸ“¦ pip (Python package installer)
  • πŸ”§ Virtual environment tool (venv or conda)
  • πŸ’» Terminal or command prompt access

πŸ“₯ Installation Guide

1. Setting Up Python Environment

πŸ”¨ Using venv (Python's built-in virtual environment)

For Mac/Linux:

# Create a new directory for your project
mkdir my_agent_project
cd my_agent_project
 
# Create a virtual environment
python3 -m venv venv
 
# Activate the virtual environment
source venv/bin/activate
 
# Verify Python version
python --version

For Windows:

# Create a new directory for your project
mkdir my_agent_project
cd my_agent_project
 
# Create a virtual environment
python -m venv venv
 
# Activate the virtual environment
.\venv\Scripts\activate
 
# Verify Python version
python --version

🐍 Using Conda

# Create a new conda environment
conda create -n agentdb python=3.12
 
# Activate the conda environment
conda activate agentdb
 
# Verify conda environment
conda info --envs

2. πŸ“¦ Installing AgentVectorDB

# Basic installation
pip install agentvectordb
 
# Installation with all optional dependencies (recommended for full features)
pip install agentvectordb[all]
 
# Development installation (if you want to contribute)
git clone https://github.com/superagenticai/agentvectordb.git
cd agentvectordb
pip install -e ".[dev]"

3. βœ… Verify Installation

# Check installed version
python -c "import agentvectordb; print(agentvectordb.__version__)"
 
# Verify dependencies (Mac/Linux; use findstr instead of grep on Windows)
pip freeze | grep agentvectordb

# Cross-platform alternative
pip show agentvectordb

πŸŽ“ Basic Usage

1. πŸ—οΈ Create a Store

from agentvectordb import AgentVectorDBStore
from agentvectordb.embeddings import DefaultTextEmbeddingFunction
 
# Initialize the store with path
store = AgentVectorDBStore(db_path="./my_agent_db")
 
# Create embedding function with specific dimensions
ef = DefaultTextEmbeddingFunction(dimension=64)

2. πŸ“š Create a Collection

# Create or get a collection with specific embedding function
collection = store.get_or_create_collection(
    name="agent_memories",
    embedding_function=ef,
    recreate=False  # Set to True only if you want to delete an existing collection
)

3. πŸ’Ύ Add Memories

Important: LanceDB builds the vector index with KMeans, which needs enough rows to train on. Seeding a collection with too few entries triggers the warning "Skipping vector index creation: not enough rows for KMeans"; the best practices below recommend starting with at least 8 diverse memories.

# Add memories in a small batch (see the best practices below for proper seeding)
memories = [
    {
        "content": "The sky appears blue due to Rayleigh scattering of sunlight",
        "type": "scientific_fact",
        "importance_score": 0.8,
        "metadata": {
            "domain": "physics",
            "confidence": "high",
            "tags": ["science", "physics", "optics"]
        }
    },
    {
        "content": "API response times reduced by 40% after optimization",
        "type": "performance",
        "importance_score": 0.89,
        "metadata": {
            "component": "api",
            "improvement": "significant"
        }
    }
]
 
# Add memories in batch
collection.add_batch(memories)
 
# After batch initialization, you can add single memories
collection.add(
    content="New observation about system",
    type="observation",
    importance_score=0.75,  # same 0.0-1.0 importance scale as the batch entries
    metadata={"category": "system"}
)
 
# Verify the collection size
print(f"Collection size: {collection.count()}")

Best Practices for Adding Memories

  1. Always start with a batch of at least 8 diverse memories (10+ recommended); see the seed-batch sketch after this list
  2. Use add_batch() for initial data loading
  3. Ensure memories cover different types and contexts
  4. After initial batch, you can use single add() operations
  5. Include varied metadata and types for better vector clustering
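
A minimal seed-batch sketch that follows these practices, reusing the fields from the examples above (content, type, importance_score, metadata). The topics and scores are placeholders, and collection is the collection created in step 2:

# Build 8 varied placeholder memories covering different types and topics
seed_memories = [
    {
        "content": f"Placeholder note {i} about {topic}",
        "type": mem_type,
        "importance_score": 0.5 + i * 0.05,  # stays within the 0.0-1.0 scale
        "metadata": {"topic": topic},
    }
    for i, (topic, mem_type) in enumerate([
        ("optics", "scientific_fact"),
        ("api latency", "performance"),
        ("dark mode", "user_preference"),
        ("batch jobs", "system_log"),
        ("deployments", "observation"),
        ("caching", "performance"),
        ("onboarding", "user_preference"),
        ("scheduling", "system_log"),
    ])
]

collection.add_batch(seed_memories)  # 8 varied entries: enough rows for KMeans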

4. πŸ” Query Memories

# Simple semantic search
results = collection.query(
    query_text="Why is the sky blue?",
    k=2  # Number of results to return
)
 
# Process results (_distance is a distance: lower means more similar)
for result in results:
    print(f"Content: {result['content']}")
    print(f"Distance: {result['_distance']}")
    print(f"Metadata: {result.get('metadata', {})}")
 
# Query with filters
filtered_results = collection.query(
    query_text="user preferences",
    k=5,
    filter_sql="type = 'user_preference' AND importance_score > 0.5"
)
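
Because _distance is a distance rather than a similarity (lower means more similar), it can be useful to post-filter results to keep only close matches. A minimal sketch; the 0.5 cutoff is an arbitrary illustration to tune for your embedding function, not a library default:

# Keep only results within a distance cutoff (lower = more similar)
CUTOFF = 0.5  # arbitrary threshold for illustration; tune per embedding
close_matches = [r for r in results if r["_distance"] <= CUTOFF]
for r in close_matches:
    print(f"{r['content']} (distance={r['_distance']:.3f})")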

Async Usage

AgentVectorDB provides async APIs for better integration with async applications:

import asyncio
from agentvectordb import AsyncAgentVectorDBStore
from agentvectordb.embeddings import DefaultTextEmbeddingFunction
 
async def main():
    # Create embedding function
    ef = DefaultTextEmbeddingFunction(dimension=64)
 
    # Initialize async store
    store = AsyncAgentVectorDBStore(db_path="./async_db")
    
    # Create collection
    collection = await store.get_or_create_collection(
        name="async_memories",
        embedding_function=ef,
        recreate=True  # Start fresh
    )
    
    # Create initial memories
    initial_memories = [
        {
            "content": "System started processing batch job",
            "type": "system_log",
            "metadata": {"operation": "batch_start"}
        },
        {
            "content": "Processing async operation",
            "type": "system_log",
            "metadata": {"timestamp": "2024-05-19"}
        }
    ]
 
    # Add initial batch
    try:
        await collection.add_batch(initial_memories)
        print("Successfully added initial memories")
        # Note: with only 2 entries this will warn that the KMeans
        # vector index was skipped; it needs a minimum of 8 rows
    except Exception as e:
        print(f"Error adding batch memories: {e}")
        return None
    
    # Query memories with timeout
    try:
        results = await asyncio.wait_for(
            collection.query(
                query_text="async processing",
                k=1
            ),
            timeout=5.0
        )
        return results
    except asyncio.TimeoutError:
        print("Query timed out")
        return None
 
# Run async code
if __name__ == "__main__":
    result = asyncio.run(main())
    if result:
        print("\nQuery Results:")
        for item in result:
            print(f"Content: {item['content']}")
            print(f"Score: {item['_distance']}")
            print("---")

Note: While this async example seeds only two memories, you'll see the warning "Skipping vector index creation: not enough rows for KMeans." For production use, start with at least 8 diverse memories, as recommended in the best practices above.

πŸ”§ Advanced Configuration

Custom Schema Definition

from pydantic import BaseModel, Field
from typing import Optional, List
 
class CustomMemorySchema(BaseModel):
    content: str
    importance: float = Field(ge=0.0, le=1.0)
    tags: Optional[List[str]] = []
    source_id: Optional[str] = None
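
One way to put such a schema to work is to validate raw dictionaries before insertion, so malformed entries fail fast. A minimal sketch; the "validated_memory" type label and the mapping of the schema's importance field onto the store's importance_score field are illustrative assumptions, not library conventions:

raw_items = [
    {"content": "Cache hit rate improved to 92%", "importance": 0.7,
     "tags": ["cache", "performance"], "source_id": "metrics-42"},
]

# Validation raises a pydantic ValidationError on missing or out-of-range fields
validated = [CustomMemorySchema(**item) for item in raw_items]

collection.add_batch([
    {
        "content": m.content,
        "type": "validated_memory",        # hypothetical type label
        "importance_score": m.importance,  # map schema field to store field
        "metadata": {"tags": m.tags, "source_id": m.source_id},
    }
    for m in validated
])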

Performance Optimization

# Batch operations for better performance: insert in chunks with
# add_batch() instead of one add() call per item
BATCH_SIZE = 500  # tune to your data size and memory budget
for i in range(0, len(large_memory_list), BATCH_SIZE):
    collection.add_batch(large_memory_list[i:i + BATCH_SIZE])
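
A few hundred to a few thousand rows per add_batch() call is a reasonable starting point: larger chunks reduce per-call overhead, at the cost of higher peak memory while each chunk is built.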

🚨 Common Pitfalls

  1. Memory Management: Watch memory usage with large collections
  2. Vector Dimensions: Choose dimensions appropriate to your use case (see the sketch below)
  3. Batch Operations: Use batch operations for large datasets
  4. Index Updates: Consider index update frequency
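
A short sketch of pitfall 2: the embedding function's dimension fixes the width of the stored vectors, so a collection should be written and queried with the same function. The 384 value is an illustrative assumption about what dimension accepts, not a recommended setting:

from agentvectordb.embeddings import DefaultTextEmbeddingFunction

ef_small = DefaultTextEmbeddingFunction(dimension=64)   # compact, as in this guide
ef_large = DefaultTextEmbeddingFunction(dimension=384)  # illustrative larger size

# Vectors of different widths are incompatible: query a collection only with
# the embedding function it was created with.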

πŸ”œ Next Steps

  1. πŸ“š Explore the Core Concepts section
  2. πŸ“– Check the API Reference
  3. πŸ’‘ View Examples for more usage patterns
  4. πŸ“‹ Read the Guides for best practices

πŸ†˜ Support & Resources

🀝 Contributing

We welcome contributions! See our Contributing Guide for details.