UI Testing Guide¶
Planned Feature
UI testing is currently a planned feature, not yet implemented.
This guide shows how to implement UI testing yourself by creating custom behaviors. Selenium and Playwright are available as optional dependencies, but you'll need to write the evaluators yourself.
Installation¶
Install UI testing dependencies:
This installs:
- selenium - Web automation framework
- playwright - Modern browser automation
Use Cases¶
UI testing in CodeOptiX would be useful for:
- ✅ Evaluating web applications generated by coding agents
- ✅ Testing user interactions (clicks, forms, navigation)
- ✅ Validating UI behavior matches requirements
- ✅ Checking accessibility in real browsers
- ✅ Performance testing of web pages
Implementation Guide¶
Custom Implementation Required
Since UI testing is not yet implemented in CodeOptiX, you'll need to create your own custom behavior and evaluator. Here's how:
1. Create a UI Test Behavior¶
Create a custom behavior that uses UI testing:
from codeoptix.behaviors.base import BehaviorSpec
from codeoptix.evaluation.evaluators import LLMEvaluator


class UIBehavior(BehaviorSpec):
    """Test UI functionality."""

    def get_name(self) -> str:
        return "ui-functionality"

    def get_description(self) -> str:
        return "Validates UI functionality and user interactions"

    def create_evaluator(self):
        # For now, use LLMEvaluator or create your own
        # TODO: Implement UITestEvaluator
        return None  # You'll need to implement this
2. Implement UI Test Evaluator (Example)¶
Example Implementation
This is an example of how you could implement UI testing. You'll need to integrate this into CodeOptiX's evaluation system yourself.
# Example: Custom UI Test Evaluator
# This is NOT part of CodeOptiX yet - you need to implement this yourself
from typing import Any, Dict, Optional

from playwright.sync_api import sync_playwright


class UITestEvaluator:
    """Example: Evaluates UI using Playwright."""

    def evaluate(self, code: str, context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """Run UI tests on generated code."""
        # Tolerate a missing context instead of failing on None.get()
        url = (context or {}).get("url", "http://localhost:8000")
        issues = []

        with sync_playwright() as p:
            browser = p.chromium.launch()
            try:
                page = browser.new_page()

                # Navigate to the application
                page.goto(url)

                # Check if page loads
                if page.title() == "":
                    issues.append("Page title is missing")

                # Check for key elements
                if not page.query_selector("button"):
                    issues.append("No buttons found on page")

                # Test form submission
                try:
                    page.fill('input[name="email"]', "test@example.com")
                    page.click('button[type="submit"]')
                    page.wait_for_selector(".success", timeout=5000)
                except Exception as e:
                    issues.append(f"Form submission failed: {e}")
            finally:
                # Close the browser even if navigation raises
                browser.close()

        # Calculate score: each issue costs 0.2, floored at 0.0
        score = max(0.0, 1.0 - len(issues) * 0.2)
        return {
            "passed": len(issues) == 0,
            "score": score,
            "evidence": issues,
        }
Integration Required
To use this in CodeOptiX, you'll need to:
1. Integrate it with CodeOptiX's behavior system
2. Connect it to the evaluation engine
3. Handle the results format expected by CodeOptiX
This is currently not implemented in the core CodeOptiX codebase.
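The result dictionary returned by the example evaluators can be produced by one small helper, which keeps the scoring rule in a single place. The sketch below is only an illustration: `build_result` and its `penalty` parameter are hypothetical names, and the `passed`, `score`, and `evidence` keys mirror the example evaluator rather than a confirmed CodeOptiX result format.

```python
from typing import Any, Dict, List


def build_result(issues: List[str], penalty: float = 0.2) -> Dict[str, Any]:
    """Turn a list of issue messages into a result dictionary.

    Hypothetical helper: each issue subtracts `penalty` from a perfect
    score of 1.0, and the score never drops below 0.0.
    """
    score = max(0.0, 1.0 - len(issues) * penalty)
    return {
        "passed": not issues,
        "score": score,
        "evidence": list(issues),
    }


if __name__ == "__main__":
    clean = build_result([])
    print(clean["passed"], clean["score"])

    failing = build_result(["Page title is missing", "No buttons found on page"])
    print(failing["passed"], round(failing["score"], 2))
```

With the default penalty, five or more issues bring the score to 0.0, so the score alone cannot distinguish a page with five problems from one with ten; the `evidence` list carries that detail.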
3. Use with Selenium (Alternative)¶
from typing import Any, Dict, Optional

from codeoptix.evaluation.evaluators import BaseEvaluator
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class SeleniumUITestEvaluator(BaseEvaluator):
    """Evaluates UI using Selenium."""

    def evaluate(self, code: str, context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """Run UI tests using Selenium."""
        # Tolerate a missing context instead of failing on None.get()
        url = (context or {}).get("url", "http://localhost:8000")

        # Setup Selenium
        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        driver = webdriver.Chrome(options=options)
        try:
            driver.get(url)
            issues = []

            # Check page title
            if not driver.title:
                issues.append("Page title is missing")

            # Check for elements (the wait raises TimeoutException if none appears)
            try:
                WebDriverWait(driver, 5).until(
                    EC.presence_of_element_located((By.TAG_NAME, "button"))
                )
            except TimeoutException:
                issues.append("No buttons found on page")

            # Test interactions
            try:
                email_input = driver.find_element(By.NAME, "email")
                email_input.send_keys("test@example.com")
                submit_button = driver.find_element(By.CSS_SELECTOR, "button[type='submit']")
                submit_button.click()
                WebDriverWait(driver, 5).until(
                    EC.presence_of_element_located((By.CLASS_NAME, "success"))
                )
            except Exception as e:
                issues.append(f"Form submission failed: {e}")

            score = max(0.0, 1.0 - len(issues) * 0.2)
            return {
                "passed": len(issues) == 0,
                "score": score,
                "evidence": issues,
            }
        finally:
            # Always release the browser process
            driver.quit()
Configuration¶
Playwright Setup¶
Install Playwright browsers:
playwright install
Selenium Setup¶
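Install ChromeDriver. Selenium 4.6 and later can also fetch a matching driver automatically through Selenium Manager, so a manual install is mainly needed for older versions: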
# macOS
brew install chromedriver
# Linux
sudo apt-get install chromium-chromedriver
# Or download from https://chromedriver.chromium.org/
Example: Testing a Login Form¶
from typing import Any, Dict, Optional

from codeoptix.behaviors.base import BehaviorSpec
from codeoptix.evaluation.evaluators import BaseEvaluator
from playwright.sync_api import sync_playwright


class LoginFormBehavior(BehaviorSpec):
    """Tests login form functionality."""

    def get_name(self) -> str:
        return "login-form"

    def get_description(self) -> str:
        return "Validates login form works correctly"

    def create_evaluator(self) -> BaseEvaluator:
        return LoginFormEvaluator()


class LoginFormEvaluator(BaseEvaluator):
    def evaluate(self, code: str, context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # Tolerate a missing context instead of failing on None.get()
        url = (context or {}).get("url", "http://localhost:8000/login")
        issues = []

        with sync_playwright() as p:
            browser = p.chromium.launch()
            try:
                page = browser.new_page()
                page.goto(url)

                # Check form elements exist
                if not page.query_selector('input[name="username"]'):
                    issues.append("Username input missing")
                if not page.query_selector('input[name="password"]'):
                    issues.append("Password input missing")
                if not page.query_selector('button[type="submit"]'):
                    issues.append("Submit button missing")

                # Test form submission
                try:
                    page.fill('input[name="username"]', "testuser")
                    page.fill('input[name="password"]', "testpass")
                    page.click('button[type="submit"]')
                    # Wait for redirect or success message
                    page.wait_for_url("**/dashboard", timeout=5000)
                except Exception as e:
                    issues.append(f"Login failed: {e}")
            finally:
                # Close the browser even if navigation raises
                browser.close()

        # Four checks, so each issue costs 0.25
        score = max(0.0, 1.0 - len(issues) * 0.25)
        return {
            "passed": len(issues) == 0,
            "score": score,
            "evidence": issues,
        }
Running UI Tests¶
With CodeOptiX CLI¶
# Set up your application URL (the login page tested by login-form)
export APP_URL="http://localhost:8000/login"
# Run evaluation with UI behavior, passing the URL through the context
codeoptix eval \
  --agent codex \
  --behaviors login-form \
  --context "{\"url\": \"$APP_URL\"}" \
  --llm-provider openai
With Python API¶
import os

from codeoptix.adapters.factory import create_adapter
from codeoptix.evaluation import EvaluationEngine
from codeoptix.utils.llm import create_llm_client, LLMProvider

# Create adapter
adapter = create_adapter("codex", {
    "llm_config": {
        "provider": "openai",
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
})

# Create evaluation engine
llm_client = create_llm_client(LLMProvider.OPENAI)
engine = EvaluationEngine(adapter, llm_client)

# Run UI test against the login page checked by the login-form behavior
results = engine.evaluate_behaviors(
    behavior_names=["login-form"],
    context={"url": "http://localhost:8000/login"}
)
print(f"UI Test Score: {results['overall_score']:.2%}")
Best Practices¶
1. Use Headless Mode¶
For CI/CD, always use headless mode:
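One possible way to do this is to derive the launch options from the environment, so the same evaluator runs headless in CI and with a visible window on a developer machine. In this sketch, `launch_options`, `page_title`, and the `UI_HEADLESS` variable are hypothetical names chosen for illustration; `headless` is the keyword argument accepted by Playwright's `chromium.launch()`.

```python
import os
from typing import Any, Dict, Mapping, Optional


def launch_options(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Return keyword arguments for Playwright's chromium.launch().

    Headless is the default; setting the hypothetical UI_HEADLESS
    variable to "0", "false", or "no" opens a visible browser instead.
    """
    env = os.environ if env is None else env
    value = env.get("UI_HEADLESS", "1").strip().lower()
    return {"headless": value not in ("0", "false", "no")}


def page_title(url: str) -> str:
    """Open `url` with the selected options and return the page title."""
    # Imported lazily so launch_options() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_options())
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

With Selenium, the equivalent switch would be to add or omit the `--headless` argument on `ChromeOptions`, as in the Selenium evaluator example.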
2. Set Timeouts¶
Always set reasonable timeouts:
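As a rough sketch, page-wide defaults can be set once instead of repeating `timeout=` on every call. `apply_timeouts` and the two constants are hypothetical names; `set_default_timeout` and `set_default_navigation_timeout` are Playwright page methods that take milliseconds, and the navigation setting takes priority for calls such as `page.goto()`.

```python
from typing import Any

DEFAULT_ACTION_TIMEOUT_MS = 5_000       # clicks, fills, selector waits
DEFAULT_NAVIGATION_TIMEOUT_MS = 15_000  # page.goto() and redirects


def apply_timeouts(
    page: Any,
    action_ms: int = DEFAULT_ACTION_TIMEOUT_MS,
    navigation_ms: int = DEFAULT_NAVIGATION_TIMEOUT_MS,
) -> None:
    """Set page-wide timeouts on a Playwright page (values in milliseconds).

    Non-positive values are rejected because Playwright treats a timeout
    of 0 as "wait forever", which can hang a CI job.
    """
    if action_ms <= 0 or navigation_ms <= 0:
        raise ValueError("timeouts must be positive")
    page.set_default_timeout(action_ms)
    page.set_default_navigation_timeout(navigation_ms)
```

Individual calls can still override the default, for example `page.wait_for_selector(".success", timeout=10000)` for a step known to be slow.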
3. Clean Up Resources¶
Always close browsers:
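A `try`/`finally` block guarantees cleanup even when a check raises. The sketch below wraps that pattern in a small context manager; `closing_browser` is a hypothetical helper, not part of CodeOptiX, Playwright, or Selenium.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def closing_browser(launch: Callable[[], Any], close_method: str = "close") -> Iterator[Any]:
    """Yield a browser or driver and always release it afterwards.

    `launch` is any zero-argument callable that returns the browser, for
    example `lambda: p.chromium.launch()` for Playwright (closed with
    `close`) or `webdriver.Chrome` for Selenium with close_method="quit".
    """
    browser = launch()
    try:
        yield browser
    finally:
        # Runs on normal exit and when the body raises.
        getattr(browser, close_method)()
```

Inside an evaluator this could be used as `with closing_browser(lambda: p.chromium.launch()) as browser:`, so no code path can skip the close call.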
4. Use Context for Configuration¶
Pass configuration via context:
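One way to keep this tidy is to parse the context into a typed object once, at the top of `evaluate()`. This is a minimal sketch under stated assumptions: `UITestConfig` is a hypothetical class, and only the `url` key appears in the examples in this guide; `timeout_ms` and `headless` are illustrative keys you would have to define yourself.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class UITestConfig:
    """Settings an evaluator reads from `context` (hypothetical keys)."""

    url: str = "http://localhost:8000"
    timeout_ms: int = 5000
    headless: bool = True

    @classmethod
    def from_context(cls, context: Optional[Dict[str, Any]] = None) -> "UITestConfig":
        """Build a config from `context`, falling back to the defaults."""
        context = context or {}
        return cls(
            url=str(context.get("url", cls.url)),
            timeout_ms=int(context.get("timeout_ms", cls.timeout_ms)),
            headless=bool(context.get("headless", cls.headless)),
        )


if __name__ == "__main__":
    config = UITestConfig.from_context({"url": "http://localhost:3000/login"})
    print(config.url, config.timeout_ms, config.headless)
```

Parsing up front means a missing or `None` context falls back to defaults in one place, rather than each check calling `context.get(...)` separately.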
Troubleshooting¶
Playwright Browser Not Found¶
Install the browser binaries that Playwright drives:
playwright install chromium
Selenium ChromeDriver Not Found¶
# Check ChromeDriver version matches Chrome
chromedriver --version
google-chrome --version
# Install matching version
brew install chromedriver # macOS
Tests Timeout¶
Increase timeout in your evaluator:
page.wait_for_selector(".success", timeout=15000)  # milliseconds
Headless Mode Issues¶
Some applications behave differently in headless mode. Test in non-headless first:
browser = p.chromium.launch(headless=False)
Next Steps¶
- Custom Behaviors Guide - Create your own UI test behaviors
- Python API Guide - Advanced usage
- Configuration Guide - Configure UI testing