UI Testing Guide¶

Planned Feature

UI testing is currently a planned feature, not yet implemented.

This guide shows how to implement UI testing yourself with custom behaviors. Selenium and Playwright can be installed as optional dependencies, but you'll need to write the evaluators yourself.

Installation¶

Install UI testing dependencies:

pip install codeoptix[ui-testing]

This installs:

- selenium - Web automation framework
- playwright - Modern browser automation

Use Cases¶

UI testing in CodeOptiX would be useful for:

✅ Evaluating web applications generated by coding agents
✅ Testing user interactions (clicks, forms, navigation)
✅ Validating UI behavior matches requirements
✅ Checking accessibility in real browsers
✅ Performance testing of web pages

Implementation Guide¶

Custom Implementation Required

Since UI testing is not yet implemented in CodeOptiX, you'll need to create your own custom behavior and evaluator. Here's how:

1. Create a UI Test Behavior¶

Create a custom behavior that uses UI testing:

from codeoptix.behaviors.base import BehaviorSpec
from codeoptix.evaluation.evaluators import LLMEvaluator

class UIBehavior(BehaviorSpec):
    """Test UI functionality."""

    def get_name(self) -> str:
        return "ui-functionality"

    def get_description(self) -> str:
        return "Validates UI functionality and user interactions"

    def create_evaluator(self):
        # For now, use LLMEvaluator or create your own
        # TODO: Implement UITestEvaluator
        return None  # You'll need to implement this

2. Implement UI Test Evaluator (Example)¶

Example Implementation

This is an example of how you could implement UI testing. You'll need to integrate this into CodeOptiX's evaluation system yourself.

# Example: Custom UI Test Evaluator
# This is NOT part of CodeOptiX yet - you need to implement this yourself

from playwright.sync_api import sync_playwright
from typing import Dict, Any

class UITestEvaluator:
    """Example: Evaluates UI using Playwright."""

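    def evaluate(self, code: str, context: Dict[str, Any] = None) -> Dict[str, Any]:
        """Run UI tests on generated code."""
        # context defaults to None, so fall back to an empty dict before calling .get()
        context = context or {}
        url = context.get("url", "http://localhost:8000")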

        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()

            # Navigate to the application
            page.goto(url)

            # Run tests
            issues = []

            # Check that the loaded page has a non-empty title
            if page.title() == "":
                issues.append("Page title is missing")

            # Check for key elements
            if not page.query_selector("button"):
                issues.append("No buttons found on page")

            # Test form submission
            try:
                page.fill('input[name="email"]', "test@example.com")
                page.click('button[type="submit"]')
                page.wait_for_selector(".success", timeout=5000)
            except Exception as e:
                issues.append(f"Form submission failed: {e}")

            browser.close()

            # Calculate score
            score = 1.0 if len(issues) == 0 else max(0.0, 1.0 - len(issues) * 0.2)

            return {
                "passed": len(issues) == 0,
                "score": score,
                "evidence": issues,
            }

Integration Required

To use this in CodeOptiX, you'll need to:

1. Integrate it with CodeOptiX's behavior system
2. Connect it to the evaluation engine
3. Handle the results format expected by CodeOptiX

This is currently not implemented in the core CodeOptiX codebase.
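
The result dictionary built at the end of `evaluate` can be factored into a small helper so the scoring rule is easy to test on its own. The sketch below reproduces the rule from the example (each issue subtracts 0.2, floored at 0.0). The helper name `build_result` and its `penalty` parameter are illustrative and not part of CodeOptiX.

```python
from typing import Any, Dict, List


def build_result(issues: List[str], penalty: float = 0.2) -> Dict[str, Any]:
    """Turn a list of issue messages into the result dict used in the example.

    `build_result` and `penalty` are hypothetical names for this sketch.
    """
    # Each issue subtracts `penalty` from a perfect score; never go below 0.0.
    score = max(0.0, 1.0 - len(issues) * penalty)
    return {
        "passed": not issues,
        "score": score,
        "evidence": list(issues),
    }
```

With the default penalty, five or more issues give a score of 0.0.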

3. Use with Selenium (Alternative)¶

from typing import Any, Dict

from codeoptix.evaluation.evaluators import BaseEvaluator
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class SeleniumUITestEvaluator(BaseEvaluator):
    """Evaluates UI using Selenium."""

    def evaluate(self, code: str, context: Dict[str, Any] = None) -> Dict[str, Any]:
        """Run UI tests using Selenium."""
        # context defaults to None, so fall back to an empty dict before calling .get()
        context = context or {}
        url = context.get("url", "http://localhost:8000")

        # Setup Selenium
        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        driver = webdriver.Chrome(options=options)

        try:
            driver.get(url)

            issues = []

            # Check page title
            if not driver.title:
                issues.append("Page title is missing")

            # Check for elements
            try:
                button = WebDriverWait(driver, 5).until(
                    EC.presence_of_element_located((By.TAG_NAME, "button"))
                )
            except:
                issues.append("No buttons found on page")

            # Test interactions
            try:
                email_input = driver.find_element(By.NAME, "email")
                email_input.send_keys("test@example.com")

                submit_button = driver.find_element(By.CSS_SELECTOR, "button[type='submit']")
                submit_button.click()

                WebDriverWait(driver, 5).until(
                    EC.presence_of_element_located((By.CLASS_NAME, "success"))
                )
            except Exception as e:
                issues.append(f"Form submission failed: {e}")

            score = 1.0 if len(issues) == 0 else max(0.0, 1.0 - len(issues) * 0.2)

            return {
                "passed": len(issues) == 0,
                "score": score,
                "evidence": issues,
            }
        finally:
            driver.quit()

Configuration¶

Playwright Setup¶

Install Playwright browsers:

playwright install chromium

Selenium Setup¶

Install ChromeDriver:

# macOS
brew install chromedriver

# Linux
sudo apt-get install chromium-chromedriver

# Or download from https://chromedriver.chromium.org/

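Complete Example: Login Form¶

The following example combines a custom behavior with a Playwright-based evaluator to test a login form:

from typing import Any, Dict

from codeoptix.behaviors.base import BehaviorSpec
from codeoptix.evaluation.evaluators import BaseEvaluator
from playwright.sync_api import sync_playwright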

class LoginFormBehavior(BehaviorSpec):
    """Tests login form functionality."""

    def get_name(self) -> str:
        return "login-form"

    def get_description(self) -> str:
        return "Validates login form works correctly"

    def create_evaluator(self) -> BaseEvaluator:
        return LoginFormEvaluator()

class LoginFormEvaluator(BaseEvaluator):
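    def evaluate(self, code: str, context: Dict[str, Any] = None) -> Dict[str, Any]:
        # context defaults to None, so fall back to an empty dict before calling .get()
        context = context or {}
        url = context.get("url", "http://localhost:8000/login")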

        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto(url)

            issues = []

            # Check form elements exist
            if not page.query_selector('input[name="username"]'):
                issues.append("Username input missing")

            if not page.query_selector('input[name="password"]'):
                issues.append("Password input missing")

            if not page.query_selector('button[type="submit"]'):
                issues.append("Submit button missing")

            # Test form submission
            try:
                page.fill('input[name="username"]', "testuser")
                page.fill('input[name="password"]', "testpass")
                page.click('button[type="submit"]')

                # Wait for redirect or success message
                page.wait_for_url("**/dashboard", timeout=5000)
            except Exception as e:
                issues.append(f"Login failed: {e}")

            browser.close()

            score = 1.0 if len(issues) == 0 else max(0.0, 1.0 - len(issues) * 0.25)

            return {
                "passed": len(issues) == 0,
                "score": score,
                "evidence": issues,
            }
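
The three element checks above follow the same pattern, so they can be driven from a table of selectors. This is a sketch only: `REQUIRED_ELEMENTS` and `find_missing` are illustrative names, and `exists` stands for any callable that returns a truthy value when a selector matches, such as Playwright's `page.query_selector`.

```python
from typing import Callable, Dict, List

# Selector -> message reported when nothing matches (taken from the example above).
REQUIRED_ELEMENTS: Dict[str, str] = {
    'input[name="username"]': "Username input missing",
    'input[name="password"]': "Password input missing",
    'button[type="submit"]': "Submit button missing",
}


def find_missing(
    exists: Callable[[str], object],
    required: Dict[str, str] = REQUIRED_ELEMENTS,
) -> List[str]:
    """Return the message for every required selector that `exists` cannot find."""
    return [message for selector, message in required.items() if not exists(selector)]
```

Inside the evaluator this could be called as `issues = find_missing(page.query_selector)`.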

Running UI Tests¶

With CodeOptiX CLI¶

# Set up your application URL
export APP_URL="http://localhost:8000"

# Run evaluation with UI behavior
codeoptix eval \
  --agent codex \
  --behaviors login-form \
  --context "{\"url\": \"$APP_URL\"}" \
  --llm-provider openai

With Python API¶

from codeoptix.adapters.factory import create_adapter
from codeoptix.evaluation import EvaluationEngine
from codeoptix.utils.llm import create_llm_client, LLMProvider
import os

# Create adapter
adapter = create_adapter("codex", {
    "llm_config": {
        "provider": "openai",
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
})

# Create evaluation engine
llm_client = create_llm_client(LLMProvider.OPENAI)
engine = EvaluationEngine(adapter, llm_client)

# Run UI test
results = engine.evaluate_behaviors(
    behavior_names=["login-form"],
    context={"url": "http://localhost:8000"}
)

print(f"UI Test Score: {results['overall_score']:.2%}")

Best Practices¶

1. Use Headless Mode¶

For CI/CD, always use headless mode:

browser = p.chromium.launch(headless=True)

2. Set Timeouts¶

Set explicit timeouts (Playwright's are in milliseconds) so a missing element fails quickly instead of waiting for the default:

page.wait_for_selector(".element", timeout=5000)

3. Clean Up Resources¶

Always close browsers:

try:
    # Run tests
    pass
finally:
    browser.close()
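
As a slightly fuller sketch of the same pattern, the helper below closes the browser even when a check raises. It takes the browser as an argument, so it works with any object that has a `close()` method (a Playwright `Browser` does). `run_with_cleanup` is a hypothetical name used only for illustration.

```python
from typing import Any, Callable, List


def run_with_cleanup(browser: Any, run_checks: Callable[[Any], List[str]]) -> List[str]:
    """Run `run_checks(browser)` and always close the browser afterwards.

    An exception raised by the checks is recorded as an issue rather than
    propagated, mirroring how the evaluators above collect failures.
    """
    issues: List[str] = []
    try:
        issues.extend(run_checks(browser))
    except Exception as exc:
        issues.append(f"UI checks failed: {exc}")
    finally:
        # Runs on success and on failure, so the browser is always closed.
        browser.close()
    return issues
```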

4. Use Context for Configuration¶

Pass configuration via context:

context = {
    "url": "http://localhost:8000",
    "timeout": 5000,
    "headless": True,
}
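
One way an evaluator might read these settings is through a small typed wrapper with defaults. This is a sketch under assumptions: `UITestConfig` and `from_context` are hypothetical names, and the defaults simply repeat the values shown above.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class UITestConfig:
    """Settings an evaluator reads from `context` (hypothetical helper)."""

    url: str = "http://localhost:8000"
    timeout: int = 5000  # milliseconds
    headless: bool = True

    @classmethod
    def from_context(cls, context: Optional[Dict[str, Any]] = None) -> "UITestConfig":
        """Build a config from `context`, using defaults for missing keys."""
        context = context or {}
        known = {f.name for f in fields(cls)}
        # Ignore keys this config does not know about instead of failing.
        return cls(**{k: v for k, v in context.items() if k in known})
```

An evaluator could then pass `cfg.headless` to `p.chromium.launch(headless=...)` and `cfg.timeout` to `page.wait_for_selector(..., timeout=...)`.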

Troubleshooting¶

Playwright Browser Not Found¶

playwright install chromium

Selenium ChromeDriver Not Found¶

# Check ChromeDriver version matches Chrome
chromedriver --version
google-chrome --version

# Install matching version
brew install chromedriver  # macOS
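
The comparison can also be scripted. The sketch below assumes that both commands print a four-part version number and that matching major versions is what matters. The function names are illustrative, and the sample strings in the usage note are made up for the example.

```python
import re
from typing import Optional


def major_version(version_output: str) -> Optional[int]:
    """Extract the major version from output such as 'ChromeDriver 120.0.6099.109 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(chromedriver_output: str, chrome_output: str) -> bool:
    """True when both outputs contain a version and the major versions are equal."""
    driver = major_version(chromedriver_output)
    chrome = major_version(chrome_output)
    return driver is not None and driver == chrome
```

For example, `versions_match("ChromeDriver 120.0.6099.109 (abc)", "Google Chrome 120.0.6099.129")` compares the two leading `120`s. The output strings can be captured with `subprocess.run(["chromedriver", "--version"], capture_output=True, text=True).stdout`.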

Tests Timeout¶

Increase timeout in your evaluator:

page.wait_for_selector(".element", timeout=10000)

Headless Mode Issues¶

Some applications behave differently in headless mode. Run the tests with a visible browser first to confirm they pass:

browser = p.chromium.launch(headless=False)

Next Steps¶

Custom Behaviors Guide - Create your own UI test behaviors
Python API Guide - Advanced usage
Configuration Guide - Configure UI testing