OpenResponses Gateway

OpenResponses is a unified API specification that provides a consistent interface across multiple AI providers, with advanced features like streaming, reasoning, and built-in tools.


Overview

OpenResponses provides:

  • Unified API: Single interface for multiple AI providers
  • Streaming: 45+ event types for real-time updates
  • Reasoning: Access to model thinking/reasoning content
  • Built-in Tools: Native support for apply_patch, code_interpreter, file_search
  • Message Conversion: Automatic conversion between messages and items

What is OpenResponses?

OpenResponses is an open specification for AI provider APIs that extends beyond basic chat completions. It supports:

  • Structured Outputs: Items-based message format
  • Reasoning Content: Access to model reasoning/thinking
  • Rich Streaming: Detailed streaming events for progress tracking
  • Native Tools: Built-in tool definitions and execution
  • Provider Agnostic: Works with any OpenResponses-compatible provider

Supported Providers

OpenResponses works with providers that implement the specification:

  • Ollama: With OpenResponses support enabled
  • vLLM: Via OpenResponses adapter
  • Custom Servers: Any OpenResponses-compatible endpoint
  • Cloud Providers: With OpenResponses-compatible APIs

Features

Streaming with 45+ Event Types

OpenResponses supports rich streaming with detailed events:

  • response.created: Response started
  • response.in_progress: Response in progress
  • response.output.text.delta: Text token delta
  • response.reasoning.delta: Reasoning content delta
  • response.function_call.arguments.delta: Tool call arguments
  • response.completed: Response finished
For example:

async for chunk in gateway.stream_completion(messages, model):
    if chunk.type == "response.output.text.delta":
        print(chunk.content, end="")
    elif chunk.type == "response.reasoning.delta":
        print(f"[Reasoning: {chunk.content}]", end="")

Reasoning/Thinking Content

Access model reasoning alongside output:

response = await gateway.chat_completion(messages, model)
print(f"Output: {response.content}")
print(f"Reasoning: {response.reasoning}")  # Model's thinking process

Reasoning Levels:

  • low: Minimal reasoning (faster, less insight)
  • medium: Balanced reasoning (default)
  • high: Detailed reasoning (slower, more insight)

Built-in Tools

OpenResponses defines native tools:

apply_patch

Apply code patches directly:

{
  "type": "apply_patch",
  "patch": {
    "path": "src/api/users.py",
    "diff": "@@ -42,7 +42,9 @@\n-sql = f\"...\"\n+sql = \"...\"\n+params = (...)"
  }
}

code_interpreter

Execute code in an isolated environment:

{
  "type": "code_interpreter",
  "code": "def test_api():\n    response = requests.get('...')\n    assert response.status_code == 200"
}

file_search

Search and retrieve files:

{
  "type": "file_search",
  "query": "authentication middleware",
  "paths": ["src/"]
}
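
When the model invokes one of these tools during streaming, the arguments arrive incrementally via response.function_call.arguments.delta events and finish with response.function_call.arguments.done (see Streaming Events below).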

Usage

Basic Usage

from superqode.providers.gateway import OpenResponsesGateway

gateway = OpenResponsesGateway(
    base_url="http://localhost:11434",
    reasoning_effort="medium"
)

# Chat completion
response = await gateway.chat_completion(
    messages=[
        {"role": "user", "content": "Analyze this code..."}
    ],
    model="qwen3:8b"
)

print(response.content)
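
The snippets on this page use top-level await for brevity. Run as a script, the same call needs an event loop; a minimal sketch using asyncio.run:

import asyncio

from superqode.providers.gateway import OpenResponsesGateway


async def main():
    gateway = OpenResponsesGateway(
        base_url="http://localhost:11434",
        reasoning_effort="medium",
    )
    response = await gateway.chat_completion(
        messages=[{"role": "user", "content": "Analyze this code..."}],
        model="qwen3:8b",
    )
    print(response.content)


asyncio.run(main())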

Streaming

async for chunk in gateway.stream_completion(
    messages=messages,
    model="qwen3:8b"
):
    if chunk.type == "response.output.text.delta":
        print(chunk.content, end="", flush=True)

With Tools

tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"}
                }
            }
        }
    }
]

response = await gateway.chat_completion(
    messages=messages,
    model="qwen3:8b",
    tools=tools
)
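
How a requested tool call comes back depends on the gateway's response object, which this page does not pin down; a minimal sketch, assuming a hypothetical tool_calls attribute:

# `tool_calls` is an assumed attribute name, not confirmed by this page.
for call in getattr(response, "tool_calls", None) or []:
    print(f"Tool call requested: {call}")  # inspect name/arguments before executing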

Reasoning Configuration

gateway = OpenResponsesGateway(
    base_url="http://localhost:11434",
    reasoning_effort="high",  # Get detailed reasoning
    truncation="auto"         # Auto-truncate if needed
)

Configuration

SuperQode Configuration

# superqode.yaml
providers:
  openresponses:
    base_url: http://localhost:11434
    reasoning_effort: medium
    truncation: auto
    timeout: 300

Environment Variables

export OPENRESPONSES_BASE_URL=http://localhost:11434
export OPENRESPONSES_REASONING_EFFORT=medium
export OPENRESPONSES_TRUNCATION=auto
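
If you construct the gateway yourself, these variables can be wired in with os.getenv; a minimal sketch (SuperQode's configuration loader may already handle this for you):

import os

from superqode.providers.gateway import OpenResponsesGateway

gateway = OpenResponsesGateway(
    base_url=os.getenv("OPENRESPONSES_BASE_URL", "http://localhost:11434"),
    reasoning_effort=os.getenv("OPENRESPONSES_REASONING_EFFORT", "medium"),
    truncation=os.getenv("OPENRESPONSES_TRUNCATION", "auto"),
)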

Gateway Initialization

from superqode.providers.gateway import OpenResponsesGateway

gateway = OpenResponsesGateway(
    base_url="http://localhost:11434",
    api_key=None,  # Optional for local providers
    reasoning_effort="medium",
    truncation="auto",
    timeout=300.0,
    track_costs=False  # Local providers typically don't charge
)

Message ↔ Item Conversion

OpenResponses uses an items-based format internally. SuperQode automatically converts:

Messages to Items

from superqode.providers.openresponses import messages_to_items

messages = [
    {"role": "user", "content": "Hello"}
]

items = messages_to_items(messages)
# Converts to OpenResponses item format
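
The exact item schema is provider-defined; purely as an illustration, a user message typically becomes something like the following (field names here are an assumption, not part of this page's contract):

# Illustrative shape only -- verify against your provider's actual item format.
items = [
    {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "Hello"}],
    }
]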

Items to Messages

from superqode.providers.openresponses import items_to_messages

# After receiving OpenResponses response
messages = items_to_messages(response.items)

Streaming Events

Event Types

Event                                     Description
response.created                          Response started
response.in_progress                      Processing in progress
response.output.text.delta               Text token delta
response.output.text.done                Text output complete
response.reasoning.delta                 Reasoning content delta
response.reasoning.done                  Reasoning complete
response.function_call.arguments.delta   Tool arguments delta
response.function_call.arguments.done    Tool arguments complete
response.completed                       Response finished

Processing Events

output_text = ""
reasoning = ""
tool_args = ""
final_response = None

async for chunk in gateway.stream_completion(messages, model):
    event_type = chunk.type

    if event_type == "response.output.text.delta":
        # Accumulate text
        output_text += chunk.content

    elif event_type == "response.reasoning.delta":
        # Accumulate reasoning
        reasoning += chunk.content

    elif event_type == "response.function_call.arguments.delta":
        # Accumulate tool call
        tool_args += chunk.content

    elif event_type == "response.completed":
        # Process final response
        final_response = chunk.response

Tool Conversion

OpenResponses tools are automatically converted:

from superqode.providers.openresponses import (
    convert_tools_to_openresponses,
    convert_tools_from_openresponses
)

# Convert SuperQode tools to OpenResponses
openresponses_tools = convert_tools_to_openresponses(tools)

# Convert OpenResponses tools back
superqode_tools = convert_tools_from_openresponses(openresponses_tools)
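
In normal use this conversion happens for you when tools are passed to chat_completion; the helpers are mainly useful for inspecting or debugging the payloads that actually go over the wire.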

Reasoning Levels

Low

Minimal reasoning, fastest:

gateway = OpenResponsesGateway(
    reasoning_effort="low"
)

Use when:

  • Speed is critical
  • Simple tasks
  • Reasoning not needed

Medium (Default)

Balanced reasoning:

gateway = OpenResponsesGateway(
    reasoning_effort="medium"
)

Use when:

  • General QE tasks
  • Need some insight
  • Balance of speed and detail

High

Detailed reasoning, slower:

gateway = OpenResponsesGateway(
    reasoning_effort="high"
)

Use when:

  • Complex analysis needed
  • Understanding model logic
  • Debugging model behavior


Truncation

Auto (Default)

Automatically truncate the prompt if it exceeds the model's context window:

gateway = OpenResponsesGateway(
    truncation="auto"
)

Disabled

Never truncate:

gateway = OpenResponsesGateway(
    truncation="disabled"
)
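
With truncation disabled, a request whose prompt exceeds the model's context window is typically rejected by the provider rather than silently shortened, which is the safer choice when losing context would invalidate the result.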

Error Handling

from superqode.providers.gateway import (
    GatewayError,
    ModelNotFoundError,
    RateLimitError
)

try:
    response = await gateway.chat_completion(messages, model)
except ModelNotFoundError:
    print(f"Model {model} not found")
except RateLimitError:
    print("Rate limit exceeded")
except GatewayError as e:
    print(f"Gateway error: {e}")

Best Practices

1. Use Appropriate Reasoning Level

# Quick tasks
gateway = OpenResponsesGateway(reasoning_effort="low")

# QE analysis
gateway = OpenResponsesGateway(reasoning_effort="medium")

# Deep investigation
gateway = OpenResponsesGateway(reasoning_effort="high")

2. Handle Streaming Efficiently

async def process_stream(gateway, messages, model):
    output = ""
    reasoning = ""

    async for chunk in gateway.stream_completion(messages, model):
        if chunk.type == "response.output.text.delta":
            output += chunk.content
            # Process incrementally (process_text_delta is your own handler)
            process_text_delta(chunk.content)
        elif chunk.type == "response.reasoning.delta":
            reasoning += chunk.content

    return output, reasoning

3. Use Tools for Complex Tasks

tools = [
    {
        "type": "function",
        "function": {
            "name": "analyze_code",
            "description": "Analyze code quality",
            "parameters": {...}
        }
    }
]

response = await gateway.chat_completion(
    messages,
    model,
    tools=tools
)

Troubleshooting

Provider Not Compatible

Problem: Provider doesn't support OpenResponses

Solution: Use standard gateway instead:

from superqode.providers.gateway import LiteLLMGateway

gateway = LiteLLMGateway()  # Standard gateway

Reasoning Not Available

Problem: Provider doesn't support reasoning

Solution: Reasoning will be empty; output still works:

response = await gateway.chat_completion(messages, model)
if response.reasoning:
    print(f"Reasoning: {response.reasoning}")
else:
    print("Reasoning not available for this provider")

Streaming Errors

Problem: Streaming events not parsed correctly

Solution: Enable debug logging and inspect the raw events as they are parsed:

# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)

Integration Examples

With Ollama

gateway = OpenResponsesGateway(
    base_url="http://localhost:11434"
)

response = await gateway.chat_completion(
    messages=[{"role": "user", "content": "Test"}],
    model="qwen3:8b"
)

With vLLM

gateway = OpenResponsesGateway(
    base_url="http://localhost:8000"
)

response = await gateway.chat_completion(
    messages=[{"role": "user", "content": "Test"}],
    model="Qwen/Qwen2.5-Coder-7B-Instruct"
)
