HUD SDK | Docs
A Python SDK for interacting with HUD environments and evaluation benchmarks for browser use and computer use models.
Beta Release Notice: This SDK is currently in beta (v0.1.0-beta). The API is still evolving and may change in future releases as we gather feedback and improve functionality.
Overview
HUD provides an interface for:
- Creating and running evaluation environments for browser and computer use agents
- Recording agent interactions through detailed trajectories
- Managing evaluation jobs across multiple tasks
- Supporting different agent adapters
- Providing telemetry to track agent performance
Key Concepts
- Environment - A running instance where an agent can interact with code execution capabilities and built-in telemetry
- Task - Configuration for reproducibly creating an evaluation environment with a defined problem statement
- Trajectory - Record of agent actions, observations, and environment states during a runthrough
- Job - Collection of related trajectories for evaluating agent performance across multiple tasks
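To make the relationships between these concepts concrete, here is a minimal sketch using plain Python dataclasses. It is not the HUD SDK API: every class, field, and method name below (`Task`, `Step`, `Trajectory`, `Job`, `record`, `task_ids`) is a hypothetical stand-in chosen for illustration, and the Environment is left out because it is a live running instance rather than a record.

```python
from dataclasses import dataclass, field
from typing import Any, List

# Hypothetical stand-ins for illustration only; these are NOT the HUD SDK classes.


@dataclass
class Task:
    """Configuration for reproducibly creating an evaluation environment."""
    task_id: str
    prompt: str  # the problem statement given to the agent


@dataclass
class Step:
    """One agent action paired with the observation that followed it."""
    action: str
    observation: Any


@dataclass
class Trajectory:
    """Record of the steps taken during one run of a task."""
    task: Task
    steps: List[Step] = field(default_factory=list)

    def record(self, action: str, observation: Any) -> None:
        # Append one action/observation pair to this run's record.
        self.steps.append(Step(action, observation))


@dataclass
class Job:
    """Collection of related trajectories, possibly spanning several tasks."""
    name: str
    trajectories: List[Trajectory] = field(default_factory=list)

    def task_ids(self) -> List[str]:
        # List the task behind each trajectory, in insertion order.
        return [t.task.task_id for t in self.trajectories]


# Usage: one task, one recorded run, grouped under one job.
task = Task(task_id="demo-1", prompt="Open the settings page")
trajectory = Trajectory(task=task)
trajectory.record("click('#settings')", {"url": "/settings"})
job = Job(name="demo-job", trajectories=[trajectory])
```

In this sketch a Job holds Trajectories and each Trajectory points back to the Task it ran, which mirrors the definitions above. The real SDK may model these objects differently, so refer to the Quickstart Guide for the actual classes.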
Get Started
Follow our Installation Guide to get set up, then check out the Quickstart Guide to run your first example.