💡 Core Concepts

Vector Database Fundamentals

What is a Vector Database?

Vector databases store and manage high-dimensional vectors, enabling semantic search and similarity matching. In AgentVectorDB, these vectors represent agent memories, thoughts, and knowledge.
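To make "similarity matching" concrete, here is a minimal sketch of a brute-force similarity search in plain Python. It does not use AgentVectorDB; the helper names (`cosine_similarity`, `top_k`) and the tiny 3-dimensional vectors are illustrative assumptions, and a real vector database would use an index rather than scanning every entry.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, stored, k=2):
    """Return the k (label, score) pairs most similar to the query."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional "memories"; real embeddings have far more dimensions.
memories = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], memories, k=2)
print([label for label, _ in results])  # ['cat', 'kitten']
```

Vectors that point in nearly the same direction score close to 1.0, which is why "cat" and "kitten" rank above "invoice" for this query.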

Embeddings

Embeddings are numerical representations of data (text, images, etc.) that capture semantic meaning. AgentVectorDB uses embeddings to:

  • Store memories efficiently
  • Enable semantic search
  • Find similar content
  • Support clustering for efficient retrieval

Key Components

πŸͺ Stores

A Store in AgentVectorDB is the main entry point for:

  • Managing collections
  • Handling database connections
  • Coordinating operations

from agentvectordb import AgentVectorDBStore
 
# Initialize store with a specific path
store = AgentVectorDBStore(db_path="./my_db")

📚 Collections

Collections organize related memories and provide:

  • Consistent schema enforcement
  • Optimized retrieval
  • Metadata management
  • Vector index management

Important: Collections require at least 8 entries for proper KMeans vector index creation.

from agentvectordb.embeddings import DefaultTextEmbeddingFunction
 
# Create embedding function
ef = DefaultTextEmbeddingFunction(dimension=64)
 
# Create collection
collection = store.get_or_create_collection(
    name="agent_memories",
    embedding_function=ef,
    recreate=False  # Set to True to start fresh
)
 
# Add initial batch of memories (minimum 8 required)
initial_memories = [
    {
        "content": "First observation",
        "type": "observation",
        "importance_score": 0.8,
        "metadata": {"category": "startup"}
    },
    # ... add at least 8 memories for proper initialization
]
 
# Add batch
collection.add_batch(initial_memories)
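Because a batch that is too small can prevent proper index creation, it may help to check the batch before calling add_batch(). The following is a sketch only: `check_initial_batch` and `MIN_INDEX_ENTRIES` are hypothetical names, not part of AgentVectorDB, and "diverse" is approximated here as "distinct content strings", which is an assumption.

```python
MIN_INDEX_ENTRIES = 8  # Minimum stated in this document for KMeans index creation

def check_initial_batch(memories, minimum=MIN_INDEX_ENTRIES):
    """Raise ValueError if the batch is too small or lacks distinct content."""
    if len(memories) < minimum:
        raise ValueError(
            f"Initial batch has {len(memories)} entries; at least {minimum} are required."
        )
    distinct = {m["content"] for m in memories}
    if len(distinct) < minimum:
        raise ValueError(
            f"Only {len(distinct)} distinct 'content' values; at least {minimum} are required."
        )
    return memories

# Hypothetical batch in the same shape as initial_memories above
batch = [
    {
        "content": f"Observation {i}",
        "type": "observation",
        "importance_score": 0.5,
        "metadata": {"category": "startup"},
    }
    for i in range(8)
]

check_initial_batch(batch)  # Passes silently; a 3-entry batch would raise ValueError
```

The checked list could then be passed to collection.add_batch() as in the example above.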

🧠 Memory Entries

Each memory entry contains:

  • Content (the actual information)
  • Type (categorization)
  • Importance score (0.0 to 1.0)
  • Metadata (additional context)
  • Embedding vector (automatically generated)

# After initialization with 8+ memories, you can add single entries
collection.add(
    content="Important observation",
    type="observation",
    importance_score=0.9,
    metadata={"context": "meeting"}
)
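The fields listed above can be validated on the client side before they reach the collection. This is a minimal sketch using a standard-library dataclass; `MemoryEntry`, its default importance score of 0.5, and `as_kwargs` are hypothetical and are not AgentVectorDB classes or defaults.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    """Client-side container mirroring the memory fields described above."""
    content: str
    type: str
    importance_score: float = 0.5  # Assumed default, not taken from the library
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject empty content and scores outside the documented 0.0 to 1.0 range
        if not self.content.strip():
            raise ValueError("content must not be empty")
        if not 0.0 <= self.importance_score <= 1.0:
            raise ValueError("importance_score must be between 0.0 and 1.0")

    def as_kwargs(self):
        """Return the fields as keyword arguments for an add()-style call."""
        return {
            "content": self.content,
            "type": self.type,
            "importance_score": self.importance_score,
            "metadata": self.metadata,
        }

entry = MemoryEntry(
    content="Important observation",
    type="observation",
    importance_score=0.9,
    metadata={"context": "meeting"},
)
print(entry.as_kwargs())
```

With a collection in hand, the validated entry could be stored with collection.add(**entry.as_kwargs()).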

Advanced Concepts

Embedding Functions

Custom embedding functions can be implemented:

from agentvectordb.embeddings import BaseEmbeddingFunction
 
class CustomEmbedder(BaseEmbeddingFunction):
    def __init__(self):
        super().__init__(dimension=384)
    
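    def embed(self, texts):
        # Replace this placeholder with your embedding logic.
        # Return one vector of length 384 (the dimension above) per input text.
        vectors = [[0.0] * 384 for _ in texts]
        return vectors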
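As an illustration of what "your embedding logic" might look like, the sketch below maps text to fixed-length vectors with feature hashing, using only the standard library. It is a standalone function rather than a BaseEmbeddingFunction subclass; `hash_embed` and its default dimension of 16 are hypothetical. Hashed word counts capture word overlap, not real semantic meaning, so this is suitable for testing a pipeline, not for production search.

```python
import hashlib
import math

def hash_embed(texts, dimension=16):
    """Map each text to a vector of length `dimension` by hashing its words."""
    vectors = []
    for text in texts:
        vec = [0.0] * dimension
        for word in text.lower().split():
            digest = hashlib.sha256(word.encode("utf-8")).digest()
            # First four bytes choose the slot, the fifth byte chooses the sign
            index = int.from_bytes(digest[:4], "big") % dimension
            sign = 1.0 if digest[4] % 2 == 0 else -1.0
            vec[index] += sign
        # Scale to unit length so cosine similarity reduces to a dot product
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0.0:
            vec = [x / norm for x in vec]
        vectors.append(vec)
    return vectors

vectors = hash_embed(["agent saw a red door", "agent saw a red door", ""])
print(len(vectors), len(vectors[0]))  # 3 16
```

The function is deterministic, so identical texts always produce identical vectors, and an empty string produces an all-zero vector.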

Async Operations

An async store is available for high-performance applications:

import asyncio
from agentvectordb import AsyncAgentVectorDBStore
from agentvectordb.embeddings import DefaultTextEmbeddingFunction
 
async def main():
    # Initialize async store
    store = AsyncAgentVectorDBStore(db_path="./async_db")
    
    # Create collection with embedding function
    ef = DefaultTextEmbeddingFunction(dimension=64)
    collection = await store.get_or_create_collection(
        name="async_memories",
        embedding_function=ef
    )
    
    # Add memories with error handling
    try:
        await collection.add(
            content="Async thought",
            type="observation",
            metadata={"timestamp": "2024-05-19"}
        )
    except Exception as e:
        print(f"Error: {e}")
 
# Run async code
asyncio.run(main())

Memory Management

AgentVectorDB provides several memory management features:

Vector Index Requirements

# Minimum 8 diverse entries required for KMeans index
collection.add_batch([
    # Add 8+ diverse memories here
    # See earlier examples for full structure
])

Memory Types and Importance

collection.add(
    content="Critical system event",
    type="system_event",
    importance_score=1.0  # Highest importance
)

Rich Metadata Support

collection.add(
    content="Meeting notes",
    type="notes",
    importance_score=0.8,
    metadata={
        "date": "2024-05-19",
        "participants": ["Alice", "Bob"],
        "project": "Nebula",
        "tags": ["meeting", "planning"]
    }
)

Best Practices

1. Collection Initialization

  • Always start with 8+ diverse memories
  • Use add_batch() for initial data loading
  • Ensure proper embedding function configuration

2. Memory Structure

  • Include relevant metadata
  • Set appropriate importance scores
  • Use consistent memory types
  • Provide diverse content for better clustering

3. Query Optimization

  • Use specific queries
  • Implement proper error handling
  • Consider using timeouts for async operations
  • Balance between precision and recall
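The timeout advice above can be sketched with the standard library's asyncio.wait_for. The helper `with_timeout` and the stand-in coroutine `slow_query` are hypothetical; in practice the awaited coroutine would be an async collection call.

```python
import asyncio

async def with_timeout(coro, seconds, default=None):
    """Await `coro`, returning `default` if it takes longer than `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return default

async def slow_query(delay):
    # Stand-in for a real async query: sleeps, then returns a fake result
    await asyncio.sleep(delay)
    return ["result"]

async def demo():
    fast = await with_timeout(slow_query(0.01), seconds=1.0, default=[])
    slow = await with_timeout(slow_query(1.0), seconds=0.05, default=[])
    return fast, slow

fast, slow = asyncio.run(demo())
print(fast, slow)  # ['result'] []
```

Returning a default keeps the agent loop running when a query stalls; re-raising the timeout instead may be preferable when a missing result must not go unnoticed.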

4. Performance Considerations

  • Monitor memory usage with large collections
  • Use batch operations for bulk insertions
  • Consider async operations for better concurrency
  • Properly handle vector index creation
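For bulk insertions, a large list can be split into fixed-size batches so that no single call has to hold everything at once. The generator below is a generic sketch; `chunked` and the batch size of 4 are illustrative choices, not library settings.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical memories to load in bulk
memories = [{"content": f"Note {i}", "type": "notes"} for i in range(10)]

batch_sizes = [len(batch) for batch in chunked(memories, 4)]
print(batch_sizes)  # [4, 4, 2]
```

Each yielded batch could be passed to collection.add_batch(). For a brand-new collection, the first batch should still contain at least 8 entries.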

Check the Guides section for detailed implementation examples!