Letter World: The Ground Environment
The Letter World environment is a simple grid world where an agent navigates between labeled positions. It serves as the foundation for our Counting Reward Machine examples.
Overview
The ground environment in our framework refers to the base environment that defines the world dynamics, independent of any reward structure. For Letter World, this is a grid-based environment implemented as a standard Gymnasium environment. See our paper for more details.

Environment Description
Letter World consists of:
- A 3×7 grid where the agent can move in four directions
- Special positions labeled with letters ‘A’, ‘B’, and ‘C’
- Stochastic behavior: when the agent visits position ‘A’, the letter may randomly change to ‘B’ (with 50% probability)
- A simple observation space containing the agent’s position and a flag indicating if A has changed to B
In the rendered grid:
- `x` represents the agent (starting at position [1,3])
- `A` represents the position of letter A (which may change to B)
- `C` represents the position of letter C
- `.` represents empty spaces
Implementation Details
The Letter World environment is implemented as a subclass of `gymnasium.Env`, following the standard Gym interface.
Action Space
The environment supports four actions for movement:
- `0`: Move right
- `1`: Move left
- `2`: Move up
- `3`: Move down
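For illustration, this corresponds to a `Discrete(4)` action space in Gymnasium; a small sketch with hypothetical constant names:

```python
from gymnasium import spaces

# Hypothetical constants mirroring the documented action encoding.
RIGHT, LEFT, UP, DOWN = 0, 1, 2, 3

action_space = spaces.Discrete(4)
assert action_space.contains(RIGHT)
```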
Observation Space
The observation is a 3-dimensional vector:
- First element: Binary flag (0/1) indicating if A has changed to B
- Second element: Row position of the agent
- Third element: Column position of the agent
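For example, the initial observation, before A has changed and with the agent at its starting position [1,3], would look like this (the array dtype is an assumption):

```python
import numpy as np

# symbol_seen = 0 (A has not changed to B), agent at row 1, column 3
obs = np.array([0, 1, 3])
symbol_seen, row, col = obs
```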
Movement Mechanics
When the agent attempts to move outside the grid boundaries, it remains in the same position. Here is a minimal sketch of that boundary-handling logic (the function and constant names are illustrative assumptions, not the actual implementation):
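```python
# Action encoding: 0 = right, 1 = left, 2 = up, 3 = down.
DELTAS = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}
N_ROWS, N_COLS = 3, 7

def move(position, action):
    """Apply a movement action, keeping the agent inside the grid.

    Clamping to the grid bounds means a move off the edge leaves
    the agent in the same position.
    """
    d_row, d_col = DELTAS[action]
    row = min(max(position[0] + d_row, 0), N_ROWS - 1)
    col = min(max(position[1] + d_col, 0), N_COLS - 1)
    return (row, col)
```

Stochastic Behavior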
The environment includes a stochastic element: when the agent visits position A, there’s a 50% chance that the symbol changes from ‘A’ to ‘B’. This change is exposed through the `symbol_seen` flag in the observation.
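A self-contained sketch of this mechanic, using a seeded NumPy generator in place of the environment’s internal RNG (the function name is hypothetical):

```python
import numpy as np

def maybe_change_symbol(symbol: str, rng: np.random.Generator) -> str:
    """On a visit to 'A', change it to 'B' with probability 0.5."""
    if symbol == "A" and rng.random() < 0.5:
        return "B"
    return symbol

rng = np.random.default_rng(seed=0)
print(maybe_change_symbol("A", rng))  # 'A' or 'B', each with probability 0.5
```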
Using the Environment
The environment follows the standard Gym interface, making it easy to use with existing RL algorithms. Below is a minimal random-agent loop as an illustration (the `LetterWorld` class name and import path are assumptions):
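```python
from letter_world import LetterWorld  # hypothetical import path

env = LetterWorld()
obs, info = env.reset(seed=0)

for _ in range(100):
    action = env.action_space.sample()  # random policy for demonstration
    obs, reward, terminated, truncated, info = env.step(action)
    # The ground environment is reward-free; rewards come from the RM/CRM.
    env.render()  # prints the colored grid (see below)
    if terminated or truncated:
        obs, info = env.reset()
```

Environment Visualization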
When you call `render()`, the environment prints a colored grid representation to the console. Here’s what it looks like at different states:
(Example renders: the initial state; after moving to position A; after the symbol change from A to B.)
Key Points
- The base environment is reward-free: it only defines the dynamics of the world
- Movement is deterministic, but the symbol change at position A is stochastic
- The environment follows the standard Gym interface for compatibility with RL algorithms
- The agent’s task will be defined by a reward machine (RM) or counting reward machine (CRM), which assigns rewards based on the sequence of visited letters
Next Steps
- Learn about the Labelling Function that maps environment states to symbolic events
- Explore how the Counting Reward Machine defines rewards based on these events
- See how everything comes together in the Cross-Product Environment