Core Concepts
Vector Database Fundamentals
What is a Vector Database?
Vector databases store and manage high-dimensional vectors, enabling semantic search and similarity matching. In AgentVectorDB, these vectors represent agent memories, thoughts, and knowledge.
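To make "similarity matching" concrete, here is a minimal sketch of ranking stored vectors by cosine similarity to a query vector. It uses only the standard library and does not reflect how AgentVectorDB searches internally; the function names and the 3-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention used here: a zero vector matches nothing
    return dot / (norm_a * norm_b)

def most_similar(query, stored, top_k=1):
    """Return the labels of the top_k stored (label, vector) pairs closest to query."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:top_k]]

# Hypothetical 3-dimensional vectors standing in for real embeddings.
memories = [
    ("greeting", [1.0, 0.0, 0.0]),
    ("weather", [0.0, 1.0, 0.0]),
    ("farewell", [0.9, 0.1, 0.0]),
]
print(most_similar([1.0, 0.05, 0.0], memories, top_k=2))
```

A real vector database avoids this exhaustive scan by building an index, which is why index creation matters later in this document.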
Embeddings
Embeddings are numerical representations of data (text, images, etc.) that capture semantic meaning. AgentVectorDB uses embeddings to:
- Store memories efficiently
- Enable semantic search
- Find similar content
- Support clustering for efficient retrieval
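As a rough illustration of what "numerical representation" means, the sketch below hashes words into a fixed number of buckets and normalizes the result. This is a toy technique (feature hashing) chosen for brevity; it captures word overlap rather than true semantic meaning, and it is not a description of how `DefaultTextEmbeddingFunction` works. The name `toy_embed` is hypothetical.

```python
import hashlib
import math

def toy_embed(text, dimension=64):
    """Map text to a fixed-length vector by hashing each word into a bucket."""
    vector = [0.0] * dimension
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimension
        vector[bucket] += 1.0
    # Scale to unit length so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector

a = toy_embed("agent memory")
b = toy_embed("agent memory store")
overlap = sum(x * y for x, y in zip(a, b))
print(len(a), overlap > 0)
```

Texts that share words land in the same buckets and therefore produce a positive dot product. Learned embedding models extend the same idea so that texts with similar meaning, not just similar wording, end up close together.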
Key Components
Stores
A Store in AgentVectorDB is the main entry point for:
- Managing collections
- Handling database connections
- Coordinating operations
from agentvectordb import AgentVectorDBStore
# Initialize store with a specific path
store = AgentVectorDBStore(db_path="./my_db")
Collections
Collections organize related memories and provide:
- Consistent schema enforcement
- Optimized retrieval
- Metadata management
- Vector index management
Important: Collections require at least 8 entries for proper KMeans vector index creation.
from agentvectordb.embeddings import DefaultTextEmbeddingFunction
# Create embedding function
ef = DefaultTextEmbeddingFunction(dimension=64)
# Create collection
collection = store.get_or_create_collection(
    name="agent_memories",
    embedding_function=ef,
    recreate=False  # Set to True to start fresh
)
# Add initial batch of memories (minimum 8 required)
initial_memories = [
    {
        "content": "First observation",
        "type": "observation",
        "importance_score": 0.8,
        "metadata": {"category": "startup"}
    },
    # ... add at least 8 memories for proper initialization
]
# Add batch
collection.add_batch(initial_memories)
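Because the first batch must contain at least 8 entries, it can help to check the batch before handing it to the collection. The helper below is a hypothetical sketch, not part of AgentVectorDB; it assumes only the dictionary shape used in the example above.

```python
MIN_INDEX_ENTRIES = 8  # minimum stated in this document for KMeans index creation

def check_initial_batch(memories, minimum=MIN_INDEX_ENTRIES):
    """Raise ValueError if the batch is too small or an entry has no content."""
    if len(memories) < minimum:
        raise ValueError(
            f"need at least {minimum} memories, got {len(memories)}"
        )
    for i, memory in enumerate(memories):
        if not memory.get("content"):
            raise ValueError(f"memory {i} has no content")
    return memories

# Build a batch that meets the minimum (contents are placeholders).
initial_memories = [
    {
        "content": f"Observation {i}",
        "type": "observation",
        "importance_score": 0.5,
        "metadata": {"category": "startup"},
    }
    for i in range(8)
]
check_initial_batch(initial_memories)  # passes silently
# collection.add_batch(initial_memories)  # then load it as shown above
```

Failing early with a clear message is usually easier to debug than an error raised later during index creation.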
Memory Entries
Each memory entry contains:
- Content (the actual information)
- Type (categorization)
- Importance score (0.0 to 1.0)
- Metadata (additional context)
- Embedding vector (automatically generated)
# After initialization with 8+ memories, you can add single entries
collection.add(
    content="Important observation",
    type="observation",
    importance_score=0.9,
    metadata={"context": "meeting"}
)
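The entry fields listed above can be modeled on the application side before they reach the database. The dataclass below is a hypothetical sketch for validating entries; `MemoryEntry` is not an AgentVectorDB class, and the default values are assumptions made for illustration.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class MemoryEntry:
    """Application-side container mirroring the fields described above."""
    content: str
    type: str = "observation"
    importance_score: float = 0.5
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        # Enforce the documented constraints before anything is stored.
        if not self.content:
            raise ValueError("content must not be empty")
        if not 0.0 <= self.importance_score <= 1.0:
            raise ValueError("importance_score must be between 0.0 and 1.0")

entry = MemoryEntry(
    content="Important observation",
    importance_score=0.9,
    metadata={"context": "meeting"},
)
payload = asdict(entry)
print(payload)
# collection.add(**payload)  # keyword names match the add() call shown above
```

The embedding vector is deliberately left out of the dataclass, since the document states it is generated automatically.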
Advanced Concepts
Embedding Functions
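Custom embedding functions can be implemented by subclassing BaseEmbeddingFunction:
from agentvectordb.embeddings import BaseEmbeddingFunction

class CustomEmbedder(BaseEmbeddingFunction):
    def __init__(self):
        super().__init__(dimension=384)

    def embed(self, texts):
        # Your embedding logic here. This placeholder returns one zero
        # vector per input text; replace it with a real model call.
        vectors = [[0.0] * 384 for _ in texts]
        return vectors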
Async Operations
Async support for high-performance applications:
import asyncio
from agentvectordb import AsyncAgentVectorDBStore
from agentvectordb.embeddings import DefaultTextEmbeddingFunction

async def main():
    # Initialize async store
    store = AsyncAgentVectorDBStore(db_path="./async_db")
    # Create collection with embedding function
    ef = DefaultTextEmbeddingFunction(dimension=64)
    collection = await store.get_or_create_collection(
        name="async_memories",
        embedding_function=ef
    )
    # Add memories with error handling
    try:
        await collection.add(
            content="Async thought",
            type="observation",
            metadata={"timestamp": "2024-05-19"}
        )
    except Exception as e:
        print(f"Error: {e}")

# Run async code
asyncio.run(main())
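Async calls can also be bounded with a timeout so that a stalled operation does not block the agent. The sketch below uses the standard library's `asyncio.wait_for`; `fake_add` is a hypothetical stand-in for an awaitable such as `collection.add(...)`, and the timeout values are arbitrary.

```python
import asyncio

async def add_with_timeout(add_call, timeout_seconds=5.0):
    """Await add_call, returning None if it does not finish in time."""
    try:
        return await asyncio.wait_for(add_call, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        print(f"add timed out after {timeout_seconds}s")
        return None

async def fake_add(delay):
    """Stand-in for an async database call; sleeps to simulate I/O."""
    await asyncio.sleep(delay)
    return "stored"

async def demo():
    fast = await add_with_timeout(fake_add(0.01), timeout_seconds=1.0)
    slow = await add_with_timeout(fake_add(1.0), timeout_seconds=0.05)
    return fast, slow

result = asyncio.run(demo())
print(result)
```

`asyncio.wait_for` cancels the wrapped coroutine when the timeout expires, so the caller can retry or log the failure without leaving the task running.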
Memory Management
AgentVectorDB provides several memory management features:
Vector Index Requirements
# Minimum 8 diverse entries required for KMeans index
collection.add_batch([
    # Add 8+ diverse memories here
    # See earlier examples for full structure
])
Memory Types and Importance
collection.add(
    content="Critical system event",
    type="system_event",
    importance_score=1.0  # Highest importance
)
Rich Metadata Support
collection.add(
    content="Meeting notes",
    type="notes",
    importance_score=0.8,
    metadata={
        "date": "2024-05-19",
        "participants": ["Alice", "Bob"],
        "project": "Nebula",
        "tags": ["meeting", "planning"]
    }
)
Best Practices
1. Collection Initialization
- Always start with 8+ diverse memories
- Use add_batch() for initial data loading
- Ensure proper embedding function configuration
2. Memory Structure
- Include relevant metadata
- Set appropriate importance scores
- Use consistent memory types
- Provide diverse content for better clustering
3. Query Optimization
- Use specific queries
- Implement proper error handling
- Consider using timeouts for async operations
- Balance between precision and recall
4. Performance Considerations
- Monitor memory usage with large collections
- Use batch operations for bulk insertions
- Consider async operations for better concurrency
- Properly handle vector index creation
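For bulk insertions, one possible approach is to split a large list into fixed-size chunks and pass each chunk to `add_batch()`. The `chunked` helper and the chunk size of 10 below are hypothetical choices for illustration, not AgentVectorDB recommendations.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Placeholder memories; real entries would carry meaningful content.
memories = [
    {"content": f"Memory {i}", "type": "observation"} for i in range(25)
]
batch_sizes = [len(batch) for batch in chunked(memories, 10)]
print(batch_sizes)

# for batch in chunked(memories, 10):
#     collection.add_batch(batch)  # collection created as shown earlier
```

When loading into a fresh collection, keep the chunk size at 8 or more so that the first batch satisfies the index requirement described above.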
Check the Guides section for detailed implementation examples!