This guide walks you through a first run of the HUD SDK: a simple browser task driven by the OpenAI Operator Agent.

1. Prerequisites

  • Python: Ensure you have Python 3.10 or later installed.
  • API Keys: You’ll need API keys for both HUD and the agent you want to use (e.g., OpenAI).

2. Installation

Install the HUD SDK using pip:

pip install hud-python

For more details, see the Installation Guide.
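Before moving on, you may want to confirm that the interpreter and the package are in place. The following is a minimal sketch using only the standard library. The helper names python_ok and package_installed are hypothetical and not part of the SDK. The import name hud is taken from the example later in this guide.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def python_ok(version=None) -> bool:
    """Return True if the given (or current) interpreter version is new enough."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def package_installed(import_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(import_name) is not None


if __name__ == "__main__":
    print(f"Python >= 3.10: {python_ok()}")
    # "hud" is the import name used by the example in this guide.
    print(f"hud importable: {package_installed('hud')}")
```

If the second line prints False, the package was probably installed into a different Python environment than the one running the script.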

3. API Key Setup

The SDK automatically loads API keys from environment variables or a .env file in your project root. Set the following:

  • HUD_API_KEY: Your key from app.hud.so.
  • OPENAI_API_KEY: Your OpenAI API key (if using OperatorAgent).
  • ANTHROPIC_API_KEY: Your Anthropic API key (if using ClaudeAgent).

Example .env file:

HUD_API_KEY=hud_...
OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
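To see which keys are still missing before you run anything, a small standalone check can help. The sketch below is an illustration only: it does not reproduce how hud.settings loads keys. The helpers parse_env_text and missing_keys are hypothetical, and the choice to let real environment variables override .env values is an assumption.

```python
import os

# Keys needed for the OperatorAgent example in this guide.
REQUIRED_KEYS = ("HUD_API_KEY", "OPENAI_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    merged = {}
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as fh:
            merged.update(parse_env_text(fh.read()))
    # Assumption: variables set in the shell override values from .env.
    merged.update(os.environ)
    print("Missing keys:", missing_keys(merged) or "none")
```

Swap OPENAI_API_KEY for ANTHROPIC_API_KEY in REQUIRED_KEYS if you plan to use ClaudeAgent instead.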

4. Run Your First Agent

This example uses the OperatorAgent to interact with a browser environment. It defines a task, creates an environment, runs the agent, and evaluates the result.

import asyncio
import os
from hud import gym, job                  # Import gym for environments and job decorator
from hud.task import Task                 # Import Task to define the goal
from hud.agent import OperatorAgent       # Import the agent
# hud.settings automatically loads keys from .env or environment variables

# Decorator to group this run under a job named "quickstart-run"
@job("quickstart-run")
async def main():
    # 1. Define a Task: What should the agent do?
    task = Task(
        prompt="Search for 'capybara' on Google",
        gym="hud-browser",               # Use a browser environment
        setup=("goto", "google.com"),    # Action to perform before the agent starts
        evaluate=("contains_text", "capybara") # How to check if the task succeeded
    )

    # 2. Create Environment: The runtime for the task
    print("Creating environment...")
    env = await gym.make(task)          # Creates the environment specified in the task

    # 3. Initialize Agent: The model that performs the task
    #    API keys are loaded automatically by hud.settings
    print("Initializing agent...")
    agent = OperatorAgent(environment="browser") # Specify environment type for the agent

    # 4. Interaction Loop: Agent observes and acts
    print("Starting interaction loop...")
    # Get the initial observation (screenshot, text, etc.) by resetting the environment
    obs, _ = await env.reset()

    for i in range(5): # Limit to 5 steps for this example
        print(f"--- Step {i+1} ---")
        # Agent predicts the next action(s) based on the observation
        actions, done = await agent.predict(obs)
        print(f"Agent action(s): {actions}")

        if done:
            print("Agent signaled task completion.")
            break

        # Execute the action(s) in the environment
        obs, reward, terminated, info = await env.step(actions)

        if terminated:
            print("Environment terminated.")
            break

    # 5. Evaluate & Close
    print("Evaluating task...")
    result = await env.evaluate()       # Run the evaluation defined in the Task
    print(f"Evaluation result: {result}")

    # Trajectory is automatically saved if a @job decorator is used
    # trajectory = await env.get_trajectory() # You can optionally get trajectory data
    # print(f"Trajectory ID: {trajectory.id}")

    print("Closing environment...")
    await env.close()                   # Clean up environment resources

if __name__ == "__main__":
    # Ensure API keys are set before running
    if not os.getenv("HUD_API_KEY") or not os.getenv("OPENAI_API_KEY"):
        print("Error: Please set HUD_API_KEY and OPENAI_API_KEY environment variables or in a .env file.")
    else:
        asyncio.run(main())

Explanation:

  1. Task: Defines the goal (prompt), the type of environment (gym), initial setup steps (setup), and how success is measured (evaluate).
  2. Environment: gym.make(task) creates the specified browser environment instance.
  3. Agent: OperatorAgent is initialized. It automatically uses the OPENAI_API_KEY found by hud.settings.
  4. Interaction Loop:
    • env.reset() returns the initial observation.
    • agent.predict(obs) gets the next action(s) from the agent.
    • env.step(actions) executes the actions and gets the new observation.
  5. Evaluation & Close: env.evaluate() checks if the task succeeded based on the evaluate definition. env.close() shuts down the environment.
  6. @job Decorator: Wrapping main with @job("quickstart-run") automatically creates a Job. When env.close() is called, the recorded interactions (trajectory) are associated with this Job. You can view the job and its trajectory video on the HUD Jobs page.
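The control flow described above can be exercised offline, without API keys or a browser. The sketch below replaces the environment and the agent with hypothetical stand-ins (StubEnv and StubAgent are not HUD classes) that mirror only the method names and return shapes used in the example. It assumes every environment method is awaitable, as in the example's step, evaluate, and close calls.

```python
import asyncio


class StubEnv:
    """Hypothetical stand-in for the environment; not the HUD class."""

    def __init__(self, goal: str):
        self.goal = goal
        self.typed = ""

    async def reset(self):
        # Return the initial observation and an empty info dict.
        return {"text": self.typed}, {}

    async def step(self, actions):
        # Each action is a string that gets "typed" into the page.
        for action in actions:
            self.typed += action
        return {"text": self.typed}, 0.0, False, {}

    async def evaluate(self):
        # Mirrors the idea of a "contains_text" check.
        return self.goal in self.typed

    async def close(self):
        pass


class StubAgent:
    """Hypothetical agent that types a word a few characters per step."""

    def __init__(self, word: str, chunk: int = 4):
        self.word = word
        self.chunk = chunk

    async def predict(self, obs):
        remaining = self.word[len(obs["text"]):]
        if not remaining:
            return [], True  # nothing left to do: signal completion
        return [remaining[:self.chunk]], False


async def run_episode(env, agent, max_steps: int = 5):
    """Observe, predict, step; then evaluate and always close."""
    obs, _ = await env.reset()
    steps = 0
    try:
        for _ in range(max_steps):
            actions, done = await agent.predict(obs)
            if done:
                break
            obs, _reward, terminated, _info = await env.step(actions)
            steps += 1
            if terminated:
                break
        result = await env.evaluate()
    finally:
        await env.close()  # clean up even if a step raises
    return result, steps


if __name__ == "__main__":
    outcome = asyncio.run(run_episode(StubEnv("capybara"), StubAgent("capybara")))
    print(outcome)
```

Wrapping the loop in try/finally is a design choice of this sketch rather than something the example above does; it ensures the environment is closed even when a step fails partway through.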

Next Steps

  • Explore the Core Concepts to understand the SDK architecture in more detail.
  • Check out the Examples folder in the GitHub repo for runnable notebooks covering different agents and environments.
  • Review the API Reference for comprehensive documentation on specific functions and classes.