Policy Registry¶
Overview¶
The PolicyRegistry is the central management hub for all policies in RLM Code. It provides decorator-based registration, lookup by name, configuration-driven instantiation, and discovery of all available policies. The registry uses class-level dictionaries, so registered policies are available globally without needing to pass registry instances around.
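The class-level storage described above can be illustrated with a minimal sketch. `MiniRegistry`, `ConstantReward`, and `score_anywhere` are hypothetical names used only for illustration; this is not the RLM Code implementation, just the general pattern of a decorator that writes into a dictionary owned by the class.

```python
from typing import Callable


class MiniRegistry:
    """Hypothetical stand-in for PolicyRegistry (illustration only)."""

    # Class-level dict: shared by every caller, no instance required.
    _reward_policies: dict[str, type] = {}

    @classmethod
    def register_reward(cls, name: str) -> Callable[[type], type]:
        def decorator(policy_cls: type) -> type:
            cls._reward_policies[name] = policy_cls
            return policy_cls  # hand the class back unchanged
        return decorator

    @classmethod
    def get_reward(cls, name: str):
        # Look up the class and return a fresh instance.
        return cls._reward_policies[name]()


@MiniRegistry.register_reward("constant")
class ConstantReward:
    def calculate(self) -> float:
        return 1.0


def score_anywhere() -> float:
    # No registry object is passed in; the lookup goes through the class.
    return MiniRegistry.get_reward("constant").calculate()
```

Because the dictionary lives on the class, registration is a process-wide side effect: a policy becomes visible as soon as the module that defines it has been imported.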
Class Reference¶
PolicyRegistry¶
class PolicyRegistry:
    """Central registry for all policy types."""

    # Internal registries (class-level)
    _reward_policies: dict[str, Type[RewardPolicy]] = {}
    _action_policies: dict[str, Type[ActionSelectionPolicy]] = {}
    _compaction_policies: dict[str, Type[CompactionPolicy]] = {}
    _termination_policies: dict[str, Type[TerminationPolicy]] = {}

    # Default policy names
    _default_reward: str = "default"
    _default_action: str = "greedy"
    _default_compaction: str = "sliding_window"
    _default_termination: str = "final_pattern"
Default Policies¶
When no policy name is specified, the registry returns these defaults:
| Category | Default Policy | Description |
|---|---|---|
| Reward | "default" | DefaultRewardPolicy -- balanced reward |
| Action | "greedy" | GreedyActionPolicy -- deterministic best-action |
| Compaction | "sliding_window" | SlidingWindowCompactionPolicy -- keep last N entries |
| Termination | "final_pattern" | FinalPatternTerminationPolicy -- detect FINAL() patterns |
Registration¶
Decorator-Based Registration¶
The primary way to register policies is through class decorators. Each policy category has its own registration decorator:
from rlm_code.rlm.policies import (
    PolicyRegistry,
    RewardPolicy,
    ActionSelectionPolicy,
    CompactionPolicy,
    TerminationPolicy,
)


@PolicyRegistry.register_reward("my_reward")
class MyRewardPolicy(RewardPolicy):
    name = "my_reward"
    description = "My custom reward policy"

    def calculate(self, action, result, context):
        ...


@PolicyRegistry.register_action("my_action")
class MyActionPolicy(ActionSelectionPolicy):
    name = "my_action"
    description = "My custom action policy"

    def select(self, candidates, context):
        ...


@PolicyRegistry.register_compaction("my_compaction")
class MyCompactionPolicy(CompactionPolicy):
    name = "my_compaction"
    description = "My custom compaction policy"

    def should_compact(self, context):
        ...

    def compact(self, history, context):
        ...


@PolicyRegistry.register_termination("my_termination")
class MyTerminationPolicy(TerminationPolicy):
    name = "my_termination"
    description = "My custom termination policy"

    def should_terminate(self, result, context):
        ...
Registration Methods¶
| Method | Decorator Syntax | Registers |
|---|---|---|
| register_reward(name) | @PolicyRegistry.register_reward("name") | Reward policy class |
| register_action(name) | @PolicyRegistry.register_action("name") | Action selection policy class |
| register_compaction(name) | @PolicyRegistry.register_compaction("name") | Compaction policy class |
| register_termination(name) | @PolicyRegistry.register_termination("name") | Termination policy class |
Name resolution
If the name argument is None, the decorator falls back to the class's name attribute. Always provide an explicit name to avoid ambiguity.
Lookup and Instantiation¶
Getting Policy Instances¶
Use the get_* methods to retrieve instantiated policy objects by name:
# Get with default name
reward = PolicyRegistry.get_reward() # DefaultRewardPolicy()
action = PolicyRegistry.get_action() # GreedyActionPolicy()
compaction = PolicyRegistry.get_compaction() # SlidingWindowCompactionPolicy()
termination = PolicyRegistry.get_termination() # FinalPatternTerminationPolicy()
# Get by name
reward = PolicyRegistry.get_reward("strict")
action = PolicyRegistry.get_action("mcts")
compaction = PolicyRegistry.get_compaction("llm")
termination = PolicyRegistry.get_termination("confidence")
# Get with custom configuration
reward = PolicyRegistry.get_reward("research", config={
    "base_success": 0.4,
    "fast_execution_bonus": 0.1,
})
action = PolicyRegistry.get_action("sampling", config={
    "temperature": 0.7,
    "min_probability": 0.05,
})
Lookup Methods¶
| Method | Signature | Returns |
|---|---|---|
| get_reward | (name: str \| None, config: dict \| None) -> RewardPolicy | Instantiated reward policy |
| get_action | (name: str \| None, config: dict \| None) -> ActionSelectionPolicy | Instantiated action policy |
| get_compaction | (name: str \| None, config: dict \| None) -> CompactionPolicy | Instantiated compaction policy |
| get_termination | (name: str \| None, config: dict \| None) -> TerminationPolicy | Instantiated termination policy |
Unknown policy names
If you request a policy name that is not registered, a ValueError is raised whose message lists all available policies.
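A minimal sketch of this behavior is shown here. The dictionary, the `get_reward` function, and the exact message wording are assumptions made for illustration; only the exception type and the idea of listing available names come from the description above.

```python
# Hypothetical stand-in registry; `object` plays the role of a policy class.
_reward_policies: dict[str, type] = {"default": object, "strict": object}


def get_reward(name: str):
    if name not in _reward_policies:
        available = ", ".join(sorted(_reward_policies))
        # The message format is an assumption, not the library's wording.
        raise ValueError(f"Unknown reward policy '{name}'. Available: {available}")
    return _reward_policies[name]()


message = ""
try:
    get_reward("does_not_exist")
except ValueError as exc:
    message = str(exc)  # names the bad key and the registered alternatives
```

Catching the ValueError at configuration-load time, rather than deep inside a run, makes a misspelled policy name easy to diagnose.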
Configuration-Based Instantiation¶
create_from_config¶
The create_from_config() class method creates a complete set of policy instances from a configuration dictionary. This is the preferred way to set up policies from configuration files.
policies = PolicyRegistry.create_from_config({
    "reward": {
        "name": "research",
        "config": {
            "base_success": 0.4,
            "fast_execution_bonus": 0.1,
        },
    },
    "action": {
        "name": "sampling",
        "config": {
            "temperature": 0.7,
        },
    },
    "compaction": {
        "name": "hierarchical",
        "config": {
            "recent_window": 5,
        },
    },
    "termination": {
        "name": "composite",
        "config": {
            "policies": ["final_pattern", "reward_threshold"],
            "require_all": False,
        },
    },
})
The returned dictionary contains instantiated policy objects:
policies["reward"] # ResearchRewardPolicy instance
policies["action"] # SamplingActionPolicy instance
policies["compaction"] # HierarchicalCompactionPolicy instance
policies["termination"] # CompositeTerminationPolicy instance
Partial configuration
You can include only the categories you want to customize. Omitted categories are absent from the returned dictionary, so check for a key before indexing it.
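The partial-configuration rule can be sketched with a small stand-in. `FACTORIES`, `_make`, and this `create_from_config` function are hypothetical; they return plain tuples instead of policy objects so the example runs on its own.

```python
from typing import Any, Callable


def _make(category: str) -> Callable[[str, dict], tuple]:
    # Hypothetical factory: returns a tuple where the real registry
    # would return an instantiated policy object.
    def factory(name: str, config: dict) -> tuple:
        return (category, name, config)
    return factory


FACTORIES = {
    category: _make(category)
    for category in ("reward", "action", "compaction", "termination")
}


def create_from_config(config: dict[str, dict[str, Any]]) -> dict[str, Any]:
    policies: dict[str, Any] = {}
    for category, factory in FACTORIES.items():
        if category not in config:
            continue  # omitted category: no key in the result
        spec = config[category]
        policies[category] = factory(spec["name"], spec.get("config", {}))
    return policies


policies = create_from_config({"reward": {"name": "strict"}})
```

A caller that needs all four categories can fill the gaps with the corresponding get_* method, which returns the category default when no name is given.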
Configuration Format¶
The configuration dictionary follows this schema:
{
    "<category>": {      # "reward", "action", "compaction", or "termination"
        "name": str,     # Registered policy name
        "config": dict,  # Optional: policy-specific configuration
    },
    ...
}
YAML Configuration Example¶
# rlm_policies.yaml
reward:
  name: research
  config:
    base_success: 0.4
    base_failure: 0.3
    fast_execution_bonus: 0.1
    step_penalty_per_step: 0.02
action:
  name: sampling
  config:
    temperature: 0.7
    min_probability: 0.05
compaction:
  name: llm
  config:
    max_entries_before_compact: 15
    preserve_last_n: 3
    summary_max_tokens: 300
termination:
  name: final_pattern
  config:
    case_sensitive: false
    extract_answer: true
import yaml
from rlm_code.rlm.policies import PolicyRegistry

with open("rlm_policies.yaml") as f:
    config = yaml.safe_load(f)

policies = PolicyRegistry.create_from_config(config)
Discovery and Listing¶
Listing Registered Policies¶
The registry provides methods to enumerate all registered policies:
# List all policies across all categories
all_policies = PolicyRegistry.list_all()
# Returns:
# {
#     "reward": [
#         {"name": "default", "description": "Balanced reward for general use"},
#         {"name": "strict", "description": "Heavy penalties for errors..."},
#         {"name": "lenient", "description": "Forgiving rewards, encourages exploration"},
#         {"name": "research", "description": "Detailed breakdowns for reward function research"},
#     ],
#     "action": [...],
#     "compaction": [...],
#     "termination": [...],
# }
# List by category
reward_list = PolicyRegistry.list_reward_policies()
action_list = PolicyRegistry.list_action_policies()
compaction_list = PolicyRegistry.list_compaction_policies()
termination_list = PolicyRegistry.list_termination_policies()
Listing Methods¶
| Method | Returns |
|---|---|
| list_all() | dict[str, list[dict[str, str]]] -- all policies grouped by category |
| list_reward_policies() | list[dict[str, str]] -- reward policies with name and description |
| list_action_policies() | list[dict[str, str]] -- action policies with name and description |
| list_compaction_policies() | list[dict[str, str]] -- compaction policies with name and description |
| list_termination_policies() | list[dict[str, str]] -- termination policies with name and description |
Each entry in the list is a dictionary with "name" and "description" keys.
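As an illustration of this entry shape, the sketch below builds such a list from class attributes and formats it for display. The two policy classes, the stand-in dictionary, and `format_policy_table` are hypothetical; the real listing methods may gather descriptions differently.

```python
class FastPolicy:
    name = "fast"
    description = "Quick heuristic"


class ThoroughPolicy:
    name = "thorough"
    description = "Slower, more careful"


# Hypothetical stand-in for one category's internal registry.
_policies: dict[str, type] = {"fast": FastPolicy, "thorough": ThoroughPolicy}


def list_policies() -> list[dict[str, str]]:
    # One {"name", "description"} dictionary per registered class.
    return [
        {"name": key, "description": getattr(cls, "description", "")}
        for key, cls in _policies.items()
    ]


def format_policy_table(entries: list[dict[str, str]]) -> str:
    # Left-align names to the longest one, then append the description.
    width = max(len(entry["name"]) for entry in entries)
    return "\n".join(
        f"{entry['name']:<{width}}  {entry['description']}" for entry in entries
    )


table = format_policy_table(list_policies())
```

The same formatting helper would work on the output of any of the per-category listing methods, since they all return entries with the same two keys.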
Changing Defaults¶
You can change the default policy for each category at runtime:
PolicyRegistry.set_default_reward("research")
PolicyRegistry.set_default_action("sampling")
PolicyRegistry.set_default_compaction("hierarchical")
PolicyRegistry.set_default_termination("confidence")
# Now get_reward() returns ResearchRewardPolicy
reward = PolicyRegistry.get_reward()
| Method | Description |
|---|---|
| set_default_reward(name) | Set the default reward policy |
| set_default_action(name) | Set the default action selection policy |
| set_default_compaction(name) | Set the default compaction policy |
| set_default_termination(name) | Set the default termination policy |
Validation
All set_default_* methods raise ValueError if the specified name is not registered.
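The validation can be sketched with a small stand-in class. `DefaultsSketch` and its two policy classes are hypothetical, and the error message is an assumption; the point is that a rejected name leaves the previous default in place.

```python
from typing import Optional


class BasicPolicy:
    pass


class ResearchPolicy:
    pass


class DefaultsSketch:
    """Hypothetical stand-in for the registry's default handling."""

    _reward_policies: dict[str, type] = {
        "default": BasicPolicy,
        "research": ResearchPolicy,
    }
    _default_reward: str = "default"

    @classmethod
    def set_default_reward(cls, name: str) -> None:
        if name not in cls._reward_policies:
            available = ", ".join(sorted(cls._reward_policies))
            raise ValueError(f"Unknown reward policy '{name}'. Available: {available}")
        cls._default_reward = name

    @classmethod
    def get_reward(cls, name: Optional[str] = None):
        # Fall back to the current default when no name is given.
        return cls._reward_policies[name or cls._default_reward]()


DefaultsSketch.set_default_reward("research")
try:
    DefaultsSketch.set_default_reward("typo")  # not registered
    rejected = False
except ValueError:
    rejected = True
```

Validating before assignment means a later get_* call with no name cannot fail because of a default that was never registered.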
Complete Example: Custom Policy Lifecycle¶
This example shows the full lifecycle of creating, registering, configuring, and using a custom policy:
from rlm_code.rlm.policies import (
    PolicyRegistry,
    RewardPolicy,
    RewardSignal,
    ActionResult,
    PolicyContext,
)
from typing import Any

# Step 1: Define and register the policy
@PolicyRegistry.register_reward("weighted_components")
class WeightedComponentsRewardPolicy(RewardPolicy):
    """
    Reward policy with user-defined component weights.

    Allows fine-grained control over which factors
    matter most for a specific task.
    """

    name = "weighted_components"
    description = "User-defined weighted reward components"

    @classmethod
    def get_default_config(cls) -> dict[str, Any]:
        return {
            "weights": {
                "success": 1.0,
                "speed": 0.5,
                "brevity": 0.3,
            },
            "success_value": 0.5,
            "speed_threshold_ms": 1000,
            "brevity_threshold_chars": 500,
        }

    def validate_config(self) -> list[str]:
        """Validate that weights are non-negative."""
        errors = []
        config = {**self.get_default_config(), **self.config}
        weights = config.get("weights", {})
        for name, weight in weights.items():
            if weight < 0:
                errors.append(f"Weight '{name}' must be non-negative, got {weight}")
        return errors

    def calculate(self, action, result, context):
        config = {**self.get_default_config(), **self.config}
        weights = config["weights"]
        components = {}
        total = 0.0

        # Success component
        if "success" in weights:
            val = config["success_value"] if result.success else -config["success_value"]
            weighted = val * weights["success"]
            components["success"] = weighted
            total += weighted

        # Speed component
        if "speed" in weights and result.duration_ms > 0:
            fast = result.duration_ms < config["speed_threshold_ms"]
            val = 0.2 if fast else -0.1
            weighted = val * weights["speed"]
            components["speed"] = weighted
            total += weighted

        # Brevity component
        if "brevity" in weights:
            brief = len(result.output or "") < config["brevity_threshold_chars"]
            val = 0.1 if brief else -0.05
            weighted = val * weights["brevity"]
            components["brevity"] = weighted
            total += weighted

        return RewardSignal(
            value=max(-1.0, min(1.0, total)),
            components=components,
            explanation=f"Weighted reward: {total:.3f}",
        )

# Step 2: Verify registration
all_rewards = PolicyRegistry.list_reward_policies()
print([p["name"] for p in all_rewards])
# ['default', 'strict', 'lenient', 'research', 'weighted_components']

# Step 3: Validate configuration
policy = PolicyRegistry.get_reward("weighted_components", config={
    "weights": {"success": 2.0, "speed": -0.5},  # Invalid: negative weight
})
errors = policy.validate_config()
# ["Weight 'speed' must be non-negative, got -0.5"]

# Step 4: Use with valid configuration
policy = PolicyRegistry.get_reward("weighted_components", config={
    "weights": {"success": 2.0, "speed": 1.0, "brevity": 0.0},
})

# Step 5: Set as default
PolicyRegistry.set_default_reward("weighted_components")
default_policy = PolicyRegistry.get_reward()  # Returns WeightedComponentsRewardPolicy

# Step 6: Use in configuration-driven setup
policies = PolicyRegistry.create_from_config({
    "reward": {
        "name": "weighted_components",
        "config": {
            "weights": {"success": 1.5, "speed": 0.8},
            "speed_threshold_ms": 500,
        },
    },
    "action": {"name": "greedy"},
    "termination": {"name": "final_pattern"},
})