@cms42
2025-04-05T01:16:58.000000Z
Let's break down the concepts of state representation and spatial/non-spatial policies as described in the SC2LE paper (StarCraft II: A New Challenge for Reinforcement Learning).
1. State Representation
In the context of this paper, "state representation" refers to how the game environment of StarCraft II is presented to the reinforcement learning agent. Instead of raw RGB pixels, the SC2LE environment provides the agent with "feature layers". These feature layers are designed to abstract away from the visual complexity of the 3D game and provide more structured information.
Here's a breakdown of the state representation:
Feature Layers: The core of the state representation is a set of "feature layers". These are essentially 2D grids that represent different aspects of the game world. They are generated by the StarCraft II API and are not raw pixel images.
Types of Feature Layers: Each feature layer represents a specific type of information. Examples include unit type, player affiliation (own, ally, neutral, enemy), unit hit points, unit density, visibility (fog of war), terrain height, and whether a unit is currently selected.
Pixel Representation: These feature layers are rendered at a configurable resolution (e.g., 64x64 pixels). Each pixel in a feature layer corresponds to the same amount of game world area due to the use of an orthographic (top-down) projection camera.
Non-Spatial Features: Beyond the spatial feature layers, the agent also receives non-spatial observations, which are scalar values or lists. These include the player's resources (minerals and vespene gas), supply used and available, the set of currently available actions, build queues, and details of selected units and control groups.
Purpose of Feature Layers: Using feature layers simplifies the input for RL agents compared to raw RGB pixels. They provide a more direct and structured representation of the game world, focusing on game-relevant information rather than visual details. This is intended to make learning more efficient and focus it on strategic decision-making (see the sketch below).
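To make this concrete, here is a minimal sketch of reading both kinds of observations through pysc2, DeepMind's Python bindings for SC2LE. The mini-game, resolution, and step_mul are illustrative choices, and exact constructor arguments vary across pysc2 versions:

```python
from pysc2.env import sc2_env
from pysc2.lib import features

env = sc2_env.SC2Env(
    map_name="MoveToBeacon",
    players=[sc2_env.Agent(sc2_env.Race.terran)],
    agent_interface_format=features.AgentInterfaceFormat(
        feature_dimensions=features.Dimensions(screen=64, minimap=64)),
    step_mul=8,
)
obs = env.reset()[0].observation

print(obs.feature_screen.shape)              # (num_layers, 64, 64): stacked 2D grids
print(obs.feature_screen.unit_type[30, 30])  # one named layer, one pixel
print(obs.player)                            # non-spatial vector: minerals, supply, ...
env.close()
```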
2. Spatial Policy and Non-spatial Policy
The paper distinguishes between "spatial policies" and "non-spatial policies" based on the type of action they control. This distinction is crucial because StarCraft II involves both actions that target locations in the game world (spatial) and actions that don't (non-spatial).
Non-Spatial Policy: This policy is responsible for choosing actions that do not involve targeting a specific location on the screen or minimap. These choices typically include the base action identifier (which function to execute) and any non-spatial arguments that function takes, such as a boolean flag like select_add in select_rect, or a categorical argument like specifying a unit type to build. These arguments are chosen by the non-spatial policy when applicable.
Spatial Policy: This policy is responsible for choosing actions that do require targeting a location on the screen or minimap. These actions generally involve moving units to a point, attacking a location, placing a building, or drawing a selection rectangle, all of which take (x, y) coordinates on a spatial feature layer as arguments. The split is visible directly in the action specifications, as the sketch below shows.
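A small sketch of inspecting these argument specs, assuming pysc2's actions.FUNCTIONS interface (field names may differ slightly across pysc2 versions):

```python
from pysc2.lib import actions

# Print the argument structure of two actions: scalar arguments such as
# select_add and queued belong to the non-spatial policy; point arguments
# such as screen belong to the spatial policy.
for fn in (actions.FUNCTIONS.select_rect, actions.FUNCTIONS.Move_screen):
    print(fn.id, fn.name)
    for arg in fn.args:
        # sizes like (2,) are small categoricals; (0, 0) marks a point
        # placeholder that is filled in from the chosen resolution.
        print("   ", arg.name, arg.sizes)
```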
Independent Policies: In the baseline agents described in the paper, the spatial and non-spatial policies are modeled as independent components of the overall policy: the agent first chooses an action identifier with the non-spatial policy and then, if the action requires a target, chooses spatial coordinates with the spatial policy. A sketch of this factored sampling follows.
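Here is a minimal sketch of that factored sampling in Python, assuming the two heads have already produced logits; the availability mask and the per-function needs-a-point flag are hypothetical names for bookkeeping the environment provides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_factored_action(fn_logits, spatial_logits, available, fn_needs_point):
    """Sample an action id from the non-spatial head, then (only if that
    action takes a location argument) a coordinate from the spatial head."""
    # Mask out currently unavailable actions before normalizing, as the
    # paper does, so the agent never samples an illegal function id.
    masked = np.where(available, fn_logits, -np.inf)
    fn_probs = np.exp(masked - masked.max())
    fn_probs /= fn_probs.sum()
    fn_id = rng.choice(len(fn_probs), p=fn_probs)

    point = None
    if fn_needs_point[fn_id]:
        flat = spatial_logits.ravel()
        sp_probs = np.exp(flat - flat.max())
        sp_probs /= sp_probs.sum()
        idx = rng.choice(flat.size, p=sp_probs)
        point = divmod(idx, spatial_logits.shape[1])  # (y, x) grid cell
    return fn_id, point
```

Because the two choices are sampled independently, the log-probability of the full action (needed for policy-gradient training) is simply the sum of the two component log-probabilities.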
Fully Convolutional Approach: The "FullyConv" agent architecture is specifically designed to handle spatial policies effectively. It uses convolutional layers to process the feature layers and output spatial action probabilities directly as a 2D map. This allows the network to maintain spatial awareness and directly predict where to click on the screen or minimap.
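Here is a sketch of that architecture in PyTorch. The two resolution-preserving conv layers (16 filters of 5x5, then 32 of 3x3), the 1x1-conv spatial head, and the 256-unit fully connected state follow the paper's description of FullyConv; the class name, argument names, and activation placement are assumptions:

```python
import torch
import torch.nn as nn

class FullyConvPolicy(nn.Module):
    def __init__(self, in_channels, num_functions, resolution=64):
        super().__init__()
        # Padding keeps the spatial resolution intact end to end.
        self.conv1 = nn.Conv2d(in_channels, 16, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.spatial_head = nn.Conv2d(32, 1, kernel_size=1)  # per-pixel logit
        self.fc = nn.Linear(32 * resolution * resolution, 256)
        self.fn_head = nn.Linear(256, num_functions)
        self.value_head = nn.Linear(256, 1)

    def forward(self, screen):                 # screen: (B, C, H, W)
        x = torch.relu(self.conv1(screen))
        x = torch.relu(self.conv2(x))
        spatial_logits = self.spatial_head(x).flatten(1)  # (B, H*W)
        state = torch.relu(self.fc(x.flatten(1)))
        return self.fn_head(state), spatial_logits, self.value_head(state)
```

The key design choice is that the spatial head never passes through a fully connected layer, so each output logit stays tied to the screen pixel it was computed from.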
Example: Consider the action "move_screen(queued, screen_coordinates)". The non-spatial policy selects the action identifier (move_screen) and the queued argument (e.g., whether to queue the command behind the unit's current orders), while the spatial policy selects screen_coordinates, a point (x, y) on the screen feature layer specifying the destination for the move command. A sketch of issuing this action follows.
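Assuming the pysc2 interface from the earlier sketches, issuing this action could look as follows; the coordinate and the "now" flag stand in for hypothetical policy outputs:

```python
from pysc2.lib import actions

queued = "now"     # non-spatial argument: execute immediately, do not queue
target = (23, 41)  # (x, y) on the screen feature layer, from the spatial policy

action = actions.FUNCTIONS.Move_screen(queued, target)  # builds a FunctionCall
# env.step([action])  # would issue it through the environment
```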
In summary, the state representation in SC2LE is based on feature layers that provide a structured, abstracted view of the game world. The policy is then decomposed into non-spatial and spatial components to handle the different types of actions in StarCraft II, enabling the agent to make both strategic decisions (which action to take) and tactical decisions (where in the game world to act).