Agent
Understanding the Agent architecture and built-in implementations
The Agent is the component responsible for processing observations from an Environment and deciding on the next actions to take to achieve the Task goal.
Overview
Agents interact with environments through a defined loop (explained in Environment Concepts): receive an observation, predict an action, execute the action, and repeat. The SDK provides a base Agent class and specific implementations (ClaudeAgent, OperatorAgent).
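The loop described above can be sketched in a few lines. This is a minimal illustration, not SDK code: CountdownEnv, CountdownAgent, Observation, and run_episode are hypothetical stand-ins, and the assumption that predict returns an (actions, done) pair is inferred from the done flag described for fetch_response.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Stand-in observation carrying only a text field (hypothetical)."""
    text: str


class CountdownEnv:
    """Hypothetical toy environment: the goal is to count down to zero."""

    def __init__(self, start: int) -> None:
        self.remaining = start

    def reset(self) -> Observation:
        return Observation(text=str(self.remaining))

    def step(self, actions: list[str]) -> Observation:
        # Execute each action; only "decrement" is understood here.
        for action in actions:
            if action == "decrement" and self.remaining > 0:
                self.remaining -= 1
        return Observation(text=str(self.remaining))


class CountdownAgent:
    """Hypothetical agent: decrements until the observation reads "0"."""

    def predict(self, observation: Observation) -> tuple[list[str], bool]:
        if observation.text == "0":
            return [], True
        return ["decrement"], False


def run_episode(env: CountdownEnv, agent: CountdownAgent, max_steps: int = 10) -> int:
    """Run receive/predict/execute until done; return the number of steps executed."""
    observation = env.reset()
    for step_count in range(max_steps):
        actions, done = agent.predict(observation)  # predict an action
        if done:
            return step_count
        observation = env.step(actions)  # execute, then receive the next observation
    return max_steps


steps = run_episode(CountdownEnv(start=3), CountdownAgent())
print(steps)
```

The max_steps cap is a design choice of this sketch so that an agent that never reports done cannot loop forever.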
Base Agent Class (hud.agent.Agent)
The abstract base class hud.agent.Agent defines the core structure and prediction pipeline.
Three-Stage Prediction (predict method)
The main method agent.predict(observation) orchestrates three stages:
1. preprocess(observation): (Handled internally if an Adapter is provided) Prepares the raw Observation, typically using adapter.rescale() to adjust the screenshot for the model.
2. fetch_response(processed_observation): (Implemented by subclasses like ClaudeAgent, OperatorAgent) Sends the observation to the underlying AI model (Claude, OpenAI) and returns the model’s raw suggested actions and a done flag.
3. postprocess(raw_actions): (Handled internally if an Adapter is provided) Converts raw actions from fetch_response into the standardized CLA format using adapter.adapt_list(), including coordinate rescaling.
If an adapter is provided to the agent, predict automatically handles stages 1 and 3. If no adapter is provided, predict returns the raw actions from stage 2.
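The three stages and the adapter-dependent behavior can be sketched as follows. This is an illustrative stand-in, not the SDK's implementation: SketchAgent, ScalingAdapter, and CenterClickAgent are hypothetical, observations are modeled as plain dicts with a width and height, and the methods are synchronous here even though the real ones may not be. Only the method names (preprocess, fetch_response, postprocess, predict, rescale, adapt_list) come from the description above.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class ScalingAdapter:
    """Hypothetical adapter mapping between screen space and model space."""

    def __init__(self, screen_width: int, model_width: int) -> None:
        self.scale = model_width / screen_width

    def rescale(self, observation: dict) -> dict:
        # Stage 1 helper: shrink the reported size to what the model expects.
        return {
            **observation,
            "width": int(observation["width"] * self.scale),
            "height": int(observation["height"] * self.scale),
        }

    def adapt_list(self, raw_actions: list[dict]) -> list[dict]:
        # Stage 3 helper: rename fields and map coordinates back to screen space.
        return [
            {"type": a["action"], "x": round(a["x"] / self.scale), "y": round(a["y"] / self.scale)}
            for a in raw_actions
        ]


class SketchAgent(ABC):
    """Stand-in base class mirroring the three-stage pipeline."""

    def __init__(self, adapter: Optional[ScalingAdapter] = None) -> None:
        self.adapter = adapter

    @abstractmethod
    def fetch_response(self, observation: dict) -> tuple[list[Any], bool]:
        """Stage 2: return (raw_actions, done). Implemented by subclasses."""

    def predict(self, observation: dict) -> tuple[list[Any], bool]:
        if self.adapter is None:
            # No adapter: skip stages 1 and 3 and return the raw actions.
            return self.fetch_response(observation)
        processed = self.adapter.rescale(observation)        # stage 1
        raw_actions, done = self.fetch_response(processed)   # stage 2
        return self.adapter.adapt_list(raw_actions), done    # stage 3


class CenterClickAgent(SketchAgent):
    """Hypothetical 'model' that always clicks the center of what it sees."""

    def fetch_response(self, observation: dict) -> tuple[list[Any], bool]:
        click = {"action": "click", "x": observation["width"] // 2, "y": observation["height"] // 2}
        return [click], False


screen = {"width": 1920, "height": 1080}
adapted, done = CenterClickAgent(ScalingAdapter(1920, 960)).predict(screen)
raw, _ = CenterClickAgent().predict(screen)
print(adapted, raw)
```

With the adapter, the model sees a 960x540 observation and clicks at (480, 270), which the adapter maps back to (960, 540) in screen coordinates. Without it, the caller receives the model's raw action dict unchanged.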
Initialization
While the base class can accept a client and Adapter, the built-in agents handle client creation automatically using API keys from hud.settings.
Built-in Agents
- hud.agent.ClaudeAgent: Uses Anthropic Claude models (Computer Use API). Requires ANTHROPIC_API_KEY in settings. Optionally takes a ClaudeAdapter.
- hud.agent.OperatorAgent: Uses OpenAI models (Computer Use Preview API). Requires OPENAI_API_KEY in settings. Requires an environment type (‘browser’, ‘windows’, etc.) on initialization. Optionally takes an OperatorAdapter.
Creating Built-in Agents
Initialization is simple thanks to automatic client creation:
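A hedged sketch of what this might look like, based only on the descriptions above. The class names and the hud.agent module path come from this page; the keyword name environment and the no-argument ClaudeAgent() call are assumptions, and build_agents is a hypothetical helper. The import is placed inside the function so the snippet can be loaded without the SDK installed; calling it requires the SDK and both API keys to be configured.

```python
def build_agents():
    """Create the two built-in agents described above (requires the hud SDK)."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from hud.agent import ClaudeAgent, OperatorAgent

    # Needs ANTHROPIC_API_KEY in settings; no client is passed in.
    # Assumption: the constructor works with no arguments.
    claude_agent = ClaudeAgent()

    # Needs OPENAI_API_KEY in settings.
    # Assumption: the environment type is passed as the keyword `environment`.
    operator_agent = OperatorAgent(environment="browser")

    return claude_agent, operator_agent
```

Check the exact constructor signatures, including how an adapter is passed, against the SDK's API reference before relying on this.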
Custom Agents
Inherit from hud.agent.Agent and implement fetch_response to integrate your own model or logic.
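As a rough illustration of the subclassing pattern, the sketch below implements fetch_response with simple rule-based logic instead of a model call. AgentBase is a stand-in for hud.agent.Agent so the example runs without the SDK, and KeywordAgent, its stop_word parameter, and the action dict shape are hypothetical. The real base class may declare fetch_response as async or with a different signature, so treat this as the general shape only.

```python
from abc import ABC, abstractmethod
from typing import Any


class AgentBase(ABC):
    """Stand-in for hud.agent.Agent so the sketch runs without the SDK."""

    @abstractmethod
    def fetch_response(self, observation: Any) -> tuple[list[Any], bool]:
        """Return (raw_actions, done) for one observation."""

    def predict(self, observation: Any) -> tuple[list[Any], bool]:
        # No adapter in this sketch, so the raw actions are returned as-is.
        return self.fetch_response(observation)


class KeywordAgent(AgentBase):
    """Hypothetical rule-based agent: keeps going until it sees a stop word."""

    def __init__(self, stop_word: str = "done") -> None:
        self.stop_word = stop_word

    def fetch_response(self, observation: Any) -> tuple[list[Any], bool]:
        text = str(observation).lower()
        if self.stop_word in text:
            # Nothing left to do: no actions, and signal completion.
            return [], True
        return [{"action": "type", "text": "continue"}], False


agent = KeywordAgent()
first = agent.predict("Step 1 of 2")
last = agent.predict("All steps done")
print(first, last)
```

Because only fetch_response is overridden, the same subclass would pick up preprocessing and postprocessing from the base class whenever an adapter is supplied.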
Related Concepts
- Environment: Provides an Observation to the Agent and executes its CLA actions.
- Adapter: Used by the Agent for preprocessing/postprocessing.
- Task: Defines the goal the Agent is trying to achieve.