OpenResponses Gateway¶
OpenResponses is a unified API specification that provides a consistent interface across multiple AI providers, with advanced features like streaming, reasoning, and built-in tools.
Overview¶
OpenResponses provides:
- Unified API: Single interface for multiple AI providers
- Streaming: 45+ event types for real-time updates
- Reasoning: Access to model thinking/reasoning content
- Built-in Tools: Native support for `apply_patch`, `code_interpreter`, and `file_search`
- Message Conversion: Automatic conversion between messages and items
What is OpenResponses?¶
OpenResponses is an open specification for AI provider APIs that extends beyond basic chat completions. It supports:
- Structured Outputs: Items-based message format
- Reasoning Content: Access to model reasoning/thinking
- Rich Streaming: Detailed streaming events for progress tracking
- Native Tools: Built-in tool definitions and execution
- Provider Agnostic: Works with any OpenResponses-compatible provider
Supported Providers¶
OpenResponses works with providers that implement the specification:
- Ollama: With OpenResponses support enabled
- vLLM: Via OpenResponses adapter
- Custom Servers: Any OpenResponses-compatible endpoint
- Cloud Providers: With OpenResponses-compatible APIs
Features¶
Streaming with 45+ Event Types¶
OpenResponses supports rich streaming with detailed events:
- `response.created`: Response started
- `response.in_progress`: Response in progress
- `response.output.text.delta`: Text token delta
- `response.reasoning.delta`: Reasoning content delta
- `response.function_call.arguments.delta`: Tool call arguments delta
- `response.completed`: Response finished
async for chunk in gateway.stream_completion(messages, model):
if chunk.type == "response.output.text.delta":
print(chunk.content, end="")
elif chunk.type == "response.reasoning.delta":
print(f"[Reasoning: {chunk.content}]", end="")
Reasoning/Thinking Content¶
Access model reasoning alongside output:
response = await gateway.chat_completion(messages, model)
print(f"Output: {response.content}")
print(f"Reasoning: {response.reasoning}") # Model's thinking process
Reasoning Levels:
- `low`: Minimal reasoning (faster, less insight)
- `medium`: Balanced reasoning (default)
- `high`: Detailed reasoning (slower, more insight)
Built-in Tools¶
OpenResponses defines native tools:
apply_patch¶
Apply code patches directly:
{
"type": "apply_patch",
"patch": {
"path": "src/api/users.py",
"diff": "@@ -42,7 +42,9 @@\n-sql = f\"...\"\n+sql = \"...\"\n+params = (...)"
}
}
code_interpreter¶
Execute code in an isolated environment:
{
"type": "code_interpreter",
"code": "def test_api():\n response = requests.get('...')\n assert response.status_code == 200"
}
file_search¶
Search and retrieve files.
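The example payload for this tool is missing from the page. A minimal sketch by analogy with the other built-in tools; the `query` field is a hypothetical placeholder, not a confirmed schema:

```json
{
  "type": "file_search",
  "query": "database connection pooling"
}
```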
Usage¶
Basic Usage¶
from superqode.providers.gateway import OpenResponsesGateway
gateway = OpenResponsesGateway(
base_url="http://localhost:11434",
reasoning_effort="medium"
)
# Chat completion
response = await gateway.chat_completion(
messages=[
{"role": "user", "content": "Analyze this code..."}
],
model="qwen3:8b"
)
print(response.content)
Streaming¶
async for chunk in gateway.stream_completion(
messages=messages,
model="qwen3:8b"
):
if chunk.type == "response.output.text.delta":
print(chunk.content, end="", flush=True)
With Tools¶
tools = [
{
"type": "function",
"function": {
"name": "read_file",
"description": "Read a file",
"parameters": {
"type": "object",
"properties": {
"path": {"type": "string"}
}
}
}
}
]
response = await gateway.chat_completion(
messages=messages,
model="qwen3:8b",
tools=tools
)
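The shape of a returned tool call is not shown on this page. A hedged sketch of dispatching one, assuming the response exposes a `tool_calls` list whose entries carry a `name` and JSON-encoded `arguments` (both assumptions):

```python
import json

# Assumed response shape: response.tool_calls entries with .name and
# .arguments (a JSON string). Adjust to the gateway's actual API.
for call in getattr(response, "tool_calls", []) or []:
    if call.name == "read_file":
        args = json.loads(call.arguments)
        with open(args["path"]) as f:
            content = f.read()
        # Feed the result back to the model as a tool message.
        messages.append({"role": "tool", "content": content})
```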
Reasoning Configuration¶
gateway = OpenResponsesGateway(
base_url="http://localhost:11434",
reasoning_effort="high", # Get detailed reasoning
truncation="auto" # Auto-truncate if needed
)
Configuration¶
SuperQode Configuration¶
# superqode.yaml
providers:
openresponses:
base_url: http://localhost:11434
reasoning_effort: medium
truncation: auto
timeout: 300
Environment Variables¶
export OPENRESPONSES_BASE_URL=http://localhost:11434
export OPENRESPONSES_REASONING_EFFORT=medium
export OPENRESPONSES_TRUNCATION=auto
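Whether the gateway reads these variables on its own is not documented here; a minimal sketch that wires them into the constructor explicitly, using the defaults shown above as fallbacks:

```python
import os

from superqode.providers.gateway import OpenResponsesGateway

# Map the environment variables onto constructor arguments by hand.
gateway = OpenResponsesGateway(
    base_url=os.environ.get("OPENRESPONSES_BASE_URL", "http://localhost:11434"),
    reasoning_effort=os.environ.get("OPENRESPONSES_REASONING_EFFORT", "medium"),
    truncation=os.environ.get("OPENRESPONSES_TRUNCATION", "auto"),
)
```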
Gateway Initialization¶
from superqode.providers.gateway import OpenResponsesGateway
gateway = OpenResponsesGateway(
base_url="http://localhost:11434",
api_key=None, # Optional for local providers
reasoning_effort="medium",
truncation="auto",
timeout=300.0,
track_costs=False # Local providers typically don't charge
)
Message ↔ Item Conversion¶
OpenResponses uses an items-based format internally. SuperQode automatically converts:
Messages to Items¶
from superqode.providers.openresponses import messages_to_items
messages = [
{"role": "user", "content": "Hello"}
]
items = messages_to_items(messages)
# Converts to OpenResponses item format
Items to Messages¶
from superqode.providers.openresponses import items_to_messages
# After receiving OpenResponses response
messages = items_to_messages(response.items)
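For plain text messages the two helpers should invert each other. A quick round-trip check, assuming the conversion is lossless for this simple case (tool calls and reasoning items may not map one-to-one):

```python
from superqode.providers.openresponses import (
    items_to_messages,
    messages_to_items,
)

messages = [{"role": "user", "content": "Hello"}]

# Round-trip: simple text messages should survive conversion unchanged.
assert items_to_messages(messages_to_items(messages)) == messages
```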
Streaming Events¶
Event Types¶
| Event | Description |
|---|---|
| `response.created` | Response started |
| `response.in_progress` | Processing in progress |
| `response.output.text.delta` | Text token delta |
| `response.output.text.done` | Text output complete |
| `response.reasoning.delta` | Reasoning content delta |
| `response.reasoning.done` | Reasoning complete |
| `response.function_call.arguments.delta` | Tool arguments delta |
| `response.function_call.arguments.done` | Tool arguments complete |
| `response.completed` | Response finished |
Processing Events¶
output_text = ""
reasoning = ""
tool_args = ""
final_response = None

async for chunk in gateway.stream_completion(messages, model):
    event_type = chunk.type
    if event_type == "response.output.text.delta":
        # Accumulate output text
        output_text += chunk.content
    elif event_type == "response.reasoning.delta":
        # Accumulate reasoning
        reasoning += chunk.content
    elif event_type == "response.function_call.arguments.delta":
        # Accumulate tool call arguments
        tool_args += chunk.content
    elif event_type == "response.completed":
        # Capture the final response
        final_response = chunk.response
Tool Conversion¶
OpenResponses tools are automatically converted:
from superqode.providers.openresponses import (
convert_tools_to_openresponses,
convert_tools_from_openresponses
)
# Convert SuperQode tools to OpenResponses
openresponses_tools = convert_tools_to_openresponses(tools)
# Convert OpenResponses tools back
superqode_tools = convert_tools_from_openresponses(openresponses_tools)
Reasoning Levels¶
Low¶
Minimal reasoning; fastest.
Use when:
- Speed is critical
- Simple tasks
- Reasoning not needed
Medium (Default)¶
Balanced reasoning.
Use when:
- General QE tasks
- Need some insight
- Balance of speed and detail
High¶
Detailed reasoning; slower.
Use when:
- Complex analysis is needed
- Understanding model logic
- Debugging model behavior
Truncation¶
Auto (Default)¶
Automatically truncates the request if it exceeds the model's context window.
Disabled¶
Never truncates; requests that exceed the context limit fail instead of being silently trimmed.
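A sketch of both modes side by side. `truncation="auto"` is documented above; the value that disables truncation is an assumption borrowed from the OpenAI Responses API convention, so check your provider's spec:

```python
from superqode.providers.gateway import OpenResponsesGateway

# Documented: trim oversized prompts automatically.
auto_gateway = OpenResponsesGateway(
    base_url="http://localhost:11434",
    truncation="auto",
)

# Assumed value: "disabled" mirrors the OpenAI Responses API; the
# OpenResponses spec value may differ. Oversized requests then error.
strict_gateway = OpenResponsesGateway(
    base_url="http://localhost:11434",
    truncation="disabled",
)
```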
Error Handling¶
from superqode.providers.gateway import (
GatewayError,
ModelNotFoundError,
RateLimitError
)
try:
response = await gateway.chat_completion(messages, model)
except ModelNotFoundError:
print(f"Model {model} not found")
except RateLimitError:
print("Rate limit exceeded")
except GatewayError as e:
print(f"Gateway error: {e}")
Best Practices¶
1. Use Appropriate Reasoning Level¶
# Quick tasks
gateway = OpenResponsesGateway(reasoning_effort="low")
# QE analysis
gateway = OpenResponsesGateway(reasoning_effort="medium")
# Deep investigation
gateway = OpenResponsesGateway(reasoning_effort="high")
2. Handle Streaming Efficiently¶
async def process_stream(gateway, messages, model):
output = ""
reasoning = ""
async for chunk in gateway.stream_completion(messages, model):
if chunk.type == "response.output.text.delta":
output += chunk.content
# Process incrementally
process_text_delta(chunk.content)
elif chunk.type == "response.reasoning.delta":
reasoning += chunk.content
return output, reasoning
3. Use Tools for Complex Tasks¶
tools = [
{
"type": "function",
"function": {
"name": "analyze_code",
"description": "Analyze code quality",
"parameters": {...}
}
}
]
response = await gateway.chat_completion(
messages,
model,
tools=tools
)
Troubleshooting¶
Provider Not Compatible¶
Problem: Provider doesn't support OpenResponses
Solution: Use the standard gateway instead:
from superqode.providers.gateway import LiteLLMGateway
gateway = LiteLLMGateway() # Standard gateway
Reasoning Not Available¶
Problem: Provider doesn't support reasoning
Solution: Reasoning will be empty; output still works:
response = await gateway.chat_completion(messages, model)
if response.reasoning:
print(f"Reasoning: {response.reasoning}")
else:
print("Reasoning not available for this provider")
Streaming Errors¶
Problem: Streaming events not parsed correctly
Solution: Compare the event names your provider emits against the table above, and make the handler tolerant of unrecognized types.
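A sketch of that tolerant pattern, logging unknown event types instead of failing (the handled events shown are illustrative):

```python
import logging

logger = logging.getLogger(__name__)

async for chunk in gateway.stream_completion(messages, model):
    if chunk.type == "response.output.text.delta":
        print(chunk.content, end="", flush=True)
    elif chunk.type == "response.completed":
        break
    else:
        # Unknown or newer event types are logged, not fatal.
        logger.debug("Unhandled OpenResponses event: %s", chunk.type)
```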
Integration Examples¶
With Ollama¶
gateway = OpenResponsesGateway(
base_url="http://localhost:11434"
)
response = await gateway.chat_completion(
messages=[{"role": "user", "content": "Test"}],
model="qwen3:8b"
)
With vLLM¶
gateway = OpenResponsesGateway(
base_url="http://localhost:8000"
)
response = await gateway.chat_completion(
messages=[{"role": "user", "content": "Test"}],
model="Qwen/Qwen2.5-Coder-7B-Instruct"
)
Next Steps¶
- Local Providers - Local model setup
- BYOK Providers - Cloud provider setup
- Provider Commands - CLI reference