💡 Core Concepts

Vector Database Fundamentals

What is a Vector Database?

Vector databases store and manage high-dimensional vectors, enabling semantic search and similarity matching. In AgentVectorDB, these vectors represent agent memories, thoughts, and knowledge.
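To make "similarity matching" concrete, here is a minimal pure-Python sketch of what a vector search does conceptually: it scores stored vectors against a query with cosine similarity and returns the closest ones. The names `cosine_similarity` and `nearest` and the three-dimensional toy vectors are illustrative assumptions, not part of the AgentVectorDB API. Real embeddings have many more dimensions, and real stores use an index instead of a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)

def nearest(query, stored, k=2):
    """Return the k labels whose vectors are most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda label: cosine_similarity(query, stored[label]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy vectors standing in for stored memories.
memories = {
    "deploy failed": [0.9, 0.1, 0.0],
    "deploy succeeded": [0.8, 0.3, 0.1],
    "lunch menu": [0.0, 0.1, 0.9],
}
closest = nearest([1.0, 0.2, 0.0], memories, k=2)
print(closest)
```

The two deployment vectors point in nearly the same direction as the query, so they rank ahead of the unrelated one.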

Embeddings

Embeddings are numerical representations of data (text, images, etc.) that capture semantic meaning. AgentVectorDB uses embeddings to:

Store memories efficiently
Enable semantic search
Find similar content
Support clustering for efficient retrieval
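The idea of turning text into a fixed-length vector can be sketched with a toy word-hashing function. This is only an illustration: `toy_embed` is a hypothetical helper, it captures word overlap and not meaning, and it is not a description of how AgentVectorDB's own embedding functions work.

```python
import hashlib
import math

def toy_embed(text, dimension=64):
    """Map text to a unit-length vector by hashing each word into a bucket."""
    vector = [0.0] * dimension
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimension
        vector[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vector))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [x / norm for x in vector] if norm else vector

def dot(a, b):
    """Dot product; for unit-length vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

a = toy_embed("agent observed a login failure")
b = toy_embed("agent observed a login success")
print(len(a), dot(a, b) > 0)
```

Texts that share words land in the same buckets and therefore get a positive similarity. A real embedding model goes further and places texts with related meaning close together even when they share no words.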

Key Components

🏪 Stores

A Store in AgentVectorDB is the main entry point for:

Managing collections
Handling database connections
Coordinating operations

from agentvectordb import AgentVectorDBStore
 
# Initialize store with a specific path
store = AgentVectorDBStore(db_path="./my_db")

📚 Collections

Collections organize related memories and provide:

Consistent schema enforcement
Optimized retrieval
Metadata management
Vector index management

Important: Collections require at least 8 entries for proper KMeans vector index creation.

from agentvectordb.embeddings import DefaultTextEmbeddingFunction
 
# Create embedding function
ef = DefaultTextEmbeddingFunction(dimension=64)
 
# Create collection
collection = store.get_or_create_collection(
    name="agent_memories",
    embedding_function=ef,
    recreate=False  # Set to True to start fresh
)
 
# Add initial batch of memories (minimum 8 required)
initial_memories = [
    {
        "content": "First observation",
        "type": "observation",
        "importance_score": 0.8,
        "metadata": {"category": "startup"}
    },
    # ... add at least 8 memories for proper initialization
]
 
# Add batch
collection.add_batch(initial_memories)
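Because the first batch must contain at least 8 entries, it can help to check the batch before handing it to the collection. The sketch below is a stand-alone guard; `check_initial_batch` and `MIN_INITIAL_ENTRIES` are hypothetical names, and the content check is an extra assumption about what a usable entry looks like.

```python
MIN_INITIAL_ENTRIES = 8  # minimum stated in this guide

def check_initial_batch(memories, minimum=MIN_INITIAL_ENTRIES):
    """Raise ValueError if the first batch is too small or an entry lacks content."""
    if len(memories) < minimum:
        raise ValueError(
            f"initial batch has {len(memories)} entries; at least {minimum} are required"
        )
    for index, memory in enumerate(memories):
        if not memory.get("content"):
            raise ValueError(f"entry {index} has no content")
    return memories

# Placeholder entries, used only to exercise the guard.
sample_batch = [
    {"content": f"note {i}", "type": "observation", "importance_score": 0.5}
    for i in range(8)
]
checked = check_initial_batch(sample_batch)
```

In use, the guard would wrap the first load, for example `collection.add_batch(check_initial_batch(initial_memories))`.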

🧠 Memory Entries

Each memory entry contains:

Content (the actual information)
Type (categorization)
Importance score (0.0 to 1.0)
Metadata (additional context)
Embedding vector (automatically generated)

# After initialization with 8+ memories, you can add single entries
collection.add(
    content="Important observation",
    type="observation",
    importance_score=0.9,
    metadata={"context": "meeting"}
)
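The fields of a memory entry can be validated on the application side before they reach the database. The following is a minimal sketch using a dataclass; `MemoryEntry` is a hypothetical helper, not a class provided by AgentVectorDB, and it leaves out the embedding vector because that is generated automatically.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    content: str
    type: str
    importance_score: float
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.content.strip():
            raise ValueError("content must not be empty")
        if not 0.0 <= self.importance_score <= 1.0:
            raise ValueError("importance_score must be between 0.0 and 1.0")

    def as_dict(self):
        """Return the entry in the dictionary shape used for batch inserts."""
        return {
            "content": self.content,
            "type": self.type,
            "importance_score": self.importance_score,
            "metadata": dict(self.metadata),
        }

entry = MemoryEntry(
    content="Important observation",
    type="observation",
    importance_score=0.9,
    metadata={"context": "meeting"},
)
print(entry.as_dict())
```

Rejecting an out-of-range score at construction time keeps invalid entries from ever being embedded or stored.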

Advanced Concepts

Embedding Functions

Custom embedding functions can be implemented:

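from agentvectordb.embeddings import BaseEmbeddingFunction

class CustomEmbedder(BaseEmbeddingFunction):
    def __init__(self):
        super().__init__(dimension=384)

    def embed(self, texts):
        # Placeholder: replace with your embedding logic.
        # Return one vector of length 384 (the dimension set above) per input text.
        vectors = [[0.0] * 384 for _ in texts]
        return vectors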

Async Operations

AgentVectorDB supports async operations for high-performance applications:

import asyncio
from agentvectordb import AsyncAgentVectorDBStore
from agentvectordb.embeddings import DefaultTextEmbeddingFunction
 
async def main():
    # Initialize async store
    store = AsyncAgentVectorDBStore(db_path="./async_db")
    
    # Create collection with embedding function
    ef = DefaultTextEmbeddingFunction(dimension=64)
    collection = await store.get_or_create_collection(
        name="async_memories",
        embedding_function=ef
    )
    
    # Add memories with error handling
    try:
        await collection.add(
            content="Async thought",
            type="observation",
            metadata={"timestamp": "2024-05-19"}
        )
    except Exception as e:
        print(f"Error: {e}")
 
# Run async code
asyncio.run(main())
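The benefit of async calls shows up when several operations run at once. The sketch below uses `asyncio.gather` with a stand-in coroutine so it runs without a database; `fake_add` and `add_all` are hypothetical names, and `fake_add` only simulates the latency of an awaitable call such as `collection.add`.

```python
import asyncio

async def fake_add(content, delay=0.01):
    """Stand-in for an awaitable store call; it only sleeps and echoes."""
    await asyncio.sleep(delay)
    return f"stored: {content}"

async def add_all(contents):
    """Start every add at once and wait for all of them.

    asyncio.gather returns results in the order of its inputs,
    regardless of which coroutine finishes first.
    """
    return await asyncio.gather(*(fake_add(c) for c in contents))

results = asyncio.run(add_all(["thought A", "thought B", "thought C"]))
print(results)
```

The three simulated adds overlap, so the total wait is close to one delay instead of three.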

Memory Management

AgentVectorDB provides several memory management features:

Vector Index Requirements

# Minimum 8 diverse entries required for KMeans index
collection.add_batch([
    # Add 8+ diverse memories here
    # See earlier examples for full structure
])
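One plausible reason for a minimum entry count is that k-means cannot start with more clusters than it has points. The sketch below shows that constraint with a common initialization step, picking k distinct points as starting centroids. This is an assumption about why the minimum exists, not a description of AgentVectorDB's index internals; `init_centroids` is a hypothetical helper and the value k=8 is inferred from the stated minimum.

```python
import random

def init_centroids(points, k, seed=0):
    """Pick k distinct points as starting centroids for k-means."""
    rng = random.Random(seed)
    # random.sample raises ValueError when k is larger than len(points).
    return rng.sample(points, k)

# Eight toy 2-D points are enough to seed eight clusters.
points = [[float(i), float(i % 3)] for i in range(8)]
centroids = init_centroids(points, k=8)

try:
    init_centroids(points[:3], k=8)
    too_few_ok = True
except ValueError:
    too_few_ok = False  # three points cannot seed eight clusters
print(len(centroids), too_few_ok)
```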

Memory Types and Importance

collection.add(
    content="Critical system event",
    type="system_event",
    importance_score=1.0  # Highest importance
)
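Importance scores become useful when deciding which memories to surface or keep. As a minimal sketch, the helper below filters plain dictionaries by a threshold and sorts them highest first. `select_important` is a hypothetical application-side helper, and treating a missing score as 0.0 is an assumption.

```python
def select_important(memories, min_importance=0.5):
    """Return memories scoring at least min_importance, highest first."""
    kept = [m for m in memories if m.get("importance_score", 0.0) >= min_importance]
    return sorted(kept, key=lambda m: m["importance_score"], reverse=True)

# Illustrative entries; only the first mirrors an example from this guide.
memories = [
    {"content": "Critical system event", "type": "system_event", "importance_score": 1.0},
    {"content": "Routine heartbeat", "type": "system_event", "importance_score": 0.2},
    {"content": "Meeting notes", "type": "notes", "importance_score": 0.8},
]
important = select_important(memories, min_importance=0.5)
print([m["content"] for m in important])
```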

Rich Metadata Support

collection.add(
    content="Meeting notes",
    type="notes",
    importance_score=0.8,
    metadata={
        "date": "2024-05-19",
        "participants": ["Alice", "Bob"],
        "project": "Nebula",
        "tags": ["meeting", "planning"]
    }
)
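Rich metadata pays off when entries are filtered by it. The sketch below matches a metadata dictionary against simple criteria, treating list values such as tags as "contains" checks. `matches` is a hypothetical application-side helper, not AgentVectorDB's own filter syntax.

```python
def matches(metadata, **criteria):
    """Return True if every criterion is satisfied by the metadata.

    Scalar values must be equal; list values must contain the expected item.
    """
    for key, expected in criteria.items():
        actual = metadata.get(key)
        if isinstance(actual, list):
            if expected not in actual:
                return False
        elif actual != expected:
            return False
    return True

meeting = {
    "date": "2024-05-19",
    "participants": ["Alice", "Bob"],
    "project": "Nebula",
    "tags": ["meeting", "planning"],
}
print(matches(meeting, project="Nebula", tags="planning"))
print(matches(meeting, participants="Carol"))
```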

Best Practices

1. Collection Initialization

Always start with 8+ diverse memories
Use add_batch() for initial data loading
Ensure proper embedding function configuration

2. Memory Structure

Include relevant metadata
Set appropriate importance scores
Use consistent memory types
Provide diverse content for better clustering

3. Query Optimization

Use specific queries
Implement proper error handling
Consider using timeouts for async operations
Balance precision and recall
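Timeouts and error handling for async operations can be combined with `asyncio.wait_for` from the standard library. The sketch below uses a stand-in coroutine so it runs without a database; `slow_query` and `query_with_timeout` are hypothetical names, and returning an empty list on timeout is one possible fallback, not a recommendation from the library.

```python
import asyncio

async def slow_query(delay):
    """Stand-in for an awaitable query; it sleeps and returns a fixed result."""
    await asyncio.sleep(delay)
    return ["result"]

async def query_with_timeout(delay, timeout):
    """Await the query, falling back to an empty list if it takes too long."""
    try:
        return await asyncio.wait_for(slow_query(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return []  # the pending query is cancelled by wait_for

fast = asyncio.run(query_with_timeout(delay=0.01, timeout=1.0))
slow = asyncio.run(query_with_timeout(delay=0.5, timeout=0.01))
print(fast, slow)
```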

4. Performance Considerations

Monitor memory usage with large collections
Use batch operations for bulk insertions
Consider async operations for better concurrency
Properly handle vector index creation
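For bulk insertions, a large list can be split into fixed-size chunks so that each batch call stays bounded in memory. The following is a minimal sketch; `chunked` is a hypothetical helper, and a suitable chunk size depends on your data and hardware.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

batches = list(chunked(list(range(5)), 2))
print(batches)
```

Each chunk would then go to one batch call, for example `for chunk in chunked(memories, 500): collection.add_batch(chunk)`, where 500 is an arbitrary illustrative size.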

Check the Guides section for detailed implementation examples!
