@cms42
2025-04-05T01:16:58.000000Z
Let's break down the concepts of state representation and spatial/non-spatial policies as described in the SC2LE paper (StarCraft II: A New Challenge for Reinforcement Learning).
1. State Representation
In the context of this paper, "state representation" refers to how the game environment of StarCraft II is presented to the reinforcement learning agent. Instead of raw RGB pixels, the SC2LE environment provides the agent with "feature layers". These feature layers are designed to abstract away from the visual complexity of the 3D game and provide more structured information.
Here's a breakdown of the state representation:
Feature Layers: The core of the state representation is a set of "feature layers". These are essentially 2D grids that represent different aspects of the game world. They are generated by the StarCraft II API and are not raw pixel images.
Types of Feature Layers: Each feature layer represents a specific type of information. Examples include unit type, player affiliation (own, ally, neutral, enemy), unit hit points, unit density, visibility (fog of war), terrain height, and whether a unit is currently selected.
Pixel Representation: These feature layers are rendered at a configurable resolution (e.g., 64x64 pixels). Each pixel in a feature layer corresponds to the same amount of game world area due to the use of an orthographic (top-down) projection camera.
Non-Spatial Features: Beyond the spatial feature layers, the agent also receives non-spatial observations, which are scalar values or lists. These include the player's resources (minerals and vespene gas), supply used and available, the set of currently available actions, build queues, and details of selected units and control groups.
Purpose of Feature Layers: Using feature layers simplifies the input for RL agents compared to raw RGB pixels. They provide a more direct and structured representation of the game world, focusing on game-relevant information rather than visual details. This is intended to make learning more efficient and focus it on strategic decision-making (see the sketch below).
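To make this concrete, here is a minimal sketch of reading both kinds of observations through pysc2, DeepMind's Python bindings for SC2LE. The mini-game, resolution, and step_mul are illustrative choices, and exact constructor arguments vary across pysc2 versions:

```python
from pysc2.env import sc2_env
from pysc2.lib import features

env = sc2_env.SC2Env(
    map_name="MoveToBeacon",
    players=[sc2_env.Agent(sc2_env.Race.terran)],
    agent_interface_format=features.AgentInterfaceFormat(
        feature_dimensions=features.Dimensions(screen=64, minimap=64)),
    step_mul=8,
)
obs = env.reset()[0].observation

print(obs.feature_screen.shape)              # (num_layers, 64, 64): stacked 2D grids
print(obs.feature_screen.unit_type[30, 30])  # one named layer, one pixel
print(obs.player)                            # non-spatial vector: minerals, supply, ...
env.close()
```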
2. Spatial Policy and Non-spatial Policy
The paper distinguishes between "spatial policies" and "non-spatial policies" based on the type of action they control. This distinction is crucial because StarCraft II involves both actions that target locations in the game world (spatial) and actions that don't (non-spatial).
Non-Spatial Policy: This policy is responsible for choosing actions that do not involve targeting a specific location on the screen or minimap. These choices typically include the base action identifier (which function to execute) and any non-spatial arguments that function takes, such as a boolean flag like select_add in select_rect, or a categorical argument like specifying a unit type to build. These arguments are chosen by the non-spatial policy when applicable.
Spatial Policy: This policy is responsible for choosing actions that do require targeting a location on the screen or minimap. These actions generally involve moving units to a point, attacking a location, placing a building, or drawing a selection rectangle, all of which take (x, y) coordinates on a spatial feature layer as arguments. The split is visible directly in the action specifications, as the sketch below shows.
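A small sketch of inspecting these argument specs, assuming pysc2's actions.FUNCTIONS interface (field names may differ slightly across pysc2 versions):

```python
from pysc2.lib import actions

# Print the argument structure of two actions: scalar arguments such as
# select_add and queued belong to the non-spatial policy; point arguments
# such as screen belong to the spatial policy.
for fn in (actions.FUNCTIONS.select_rect, actions.FUNCTIONS.Move_screen):
    print(fn.id, fn.name)
    for arg in fn.args:
        # sizes like (2,) are small categoricals; (0, 0) marks a point
        # placeholder that is filled in from the chosen resolution.
        print("   ", arg.name, arg.sizes)
```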
Independent Policies: In the baseline agents described in the paper, the spatial and non-spatial policies are modeled as independent components of the overall policy: the agent first chooses an action identifier with the non-spatial policy and then, if the action requires a target, chooses spatial coordinates with the spatial policy. A sketch of this factored sampling follows.
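Here is a minimal sketch of that factored sampling in Python, assuming the two heads have already produced logits; the availability mask and the per-function needs-a-point flag are hypothetical names for bookkeeping the environment provides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_factored_action(fn_logits, spatial_logits, available, fn_needs_point):
    """Sample an action id from the non-spatial head, then (only if that
    action takes a location argument) a coordinate from the spatial head."""
    # Mask out currently unavailable actions before normalizing, as the
    # paper does, so the agent never samples an illegal function id.
    masked = np.where(available, fn_logits, -np.inf)
    fn_probs = np.exp(masked - masked.max())
    fn_probs /= fn_probs.sum()
    fn_id = rng.choice(len(fn_probs), p=fn_probs)

    point = None
    if fn_needs_point[fn_id]:
        flat = spatial_logits.ravel()
        sp_probs = np.exp(flat - flat.max())
        sp_probs /= sp_probs.sum()
        idx = rng.choice(flat.size, p=sp_probs)
        point = divmod(idx, spatial_logits.shape[1])  # (y, x) grid cell
    return fn_id, point
```

Because the two choices are sampled independently, the log-probability of the full action (needed for policy-gradient training) is simply the sum of the two component log-probabilities.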
Fully Convolutional Approach: The "FullyConv" agent architecture is specifically designed to handle spatial policies effectively. It uses convolutional layers to process the feature layers and output spatial action probabilities directly as a 2D map. This allows the network to maintain spatial awareness and directly predict where to click on the screen or minimap.
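Here is a sketch of that architecture in PyTorch. The two resolution-preserving conv layers (16 filters of 5x5, then 32 of 3x3), the 1x1-conv spatial head, and the 256-unit fully connected state follow the paper's description of FullyConv; the class name, argument names, and activation placement are assumptions:

```python
import torch
import torch.nn as nn

class FullyConvPolicy(nn.Module):
    def __init__(self, in_channels, num_functions, resolution=64):
        super().__init__()
        # Padding keeps the spatial resolution intact end to end.
        self.conv1 = nn.Conv2d(in_channels, 16, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.spatial_head = nn.Conv2d(32, 1, kernel_size=1)  # per-pixel logit
        self.fc = nn.Linear(32 * resolution * resolution, 256)
        self.fn_head = nn.Linear(256, num_functions)
        self.value_head = nn.Linear(256, 1)

    def forward(self, screen):                 # screen: (B, C, H, W)
        x = torch.relu(self.conv1(screen))
        x = torch.relu(self.conv2(x))
        spatial_logits = self.spatial_head(x).flatten(1)  # (B, H*W)
        state = torch.relu(self.fc(x.flatten(1)))
        return self.fn_head(state), spatial_logits, self.value_head(state)
```

The key design choice is that the spatial head never passes through a fully connected layer, so each output logit stays tied to the screen pixel it was computed from.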
Example: Consider the action "move_screen(queued, screen_coordinates)". The non-spatial policy selects the action identifier (move_screen) and the queued argument (e.g., whether to queue the command behind the unit's current orders), while the spatial policy selects screen_coordinates, a point (x, y) on the screen feature layer specifying the destination for the move command. A sketch of issuing this action follows.
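Assuming the pysc2 interface from the earlier sketches, issuing this action could look as follows; the coordinate and the "now" flag stand in for hypothetical policy outputs:

```python
from pysc2.lib import actions

queued = "now"     # non-spatial argument: execute immediately, do not queue
target = (23, 41)  # (x, y) on the screen feature layer, from the spatial policy

action = actions.FUNCTIONS.Move_screen(queued, target)  # builds a FunctionCall
# env.step([action])  # would issue it through the environment
```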
In summary, the state representation in SC2LE is based on feature layers that provide a structured, abstracted view of the game world. The policy is then decomposed into non-spatial and spatial components to handle the different types of actions in StarCraft II, enabling the agent to make both strategic decisions (which action to take) and tactical decisions (where in the game world to act).