Letter World: The Labelling Function
Overview
In the RM/CRM framework, the labelling function is responsible for translating low-level environment states into meaningful symbolic events. This abstraction allows us to define rewards in terms of high-level concepts rather than raw state values.
The Labelling Function Concept
The labelling function serves as a critical component that:
- Abstracts away low-level environment details
- Produces symbolic events that the RM/CRM can understand
- Enables task specification using high-level symbols
- Creates a clear separation between environment dynamics and reward logic
Implementation Details
In PyCRM, the labelling function is defined as a class that inherits from the base LabellingFunction class.
Symbolic Events
The Symbol enum defines the set of possible symbolic events that can be detected in the environment:
- Symbol.A: Represents seeing the letter A
- Symbol.B: Represents seeing the letter B
- Symbol.C: Represents seeing the letter C
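As a minimal sketch, an enum like this could be declared with Python's standard enum module. The member values below are an assumption made for illustration; the source only names the members A, B, and C.

```python
from enum import Enum


class Symbol(Enum):
    """Symbolic events for Letter World (member values are illustrative)."""

    A = 0  # the agent sees the letter A
    B = 1  # the agent sees the letter B
    C = 2  # the agent sees the letter C


# Enum members are singletons, so identity comparison is safe.
event = Symbol.A
print(event is Symbol.A)
```

Using an enum rather than bare strings means a misspelled event name fails immediately with an AttributeError instead of silently never matching.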
Event Detection Methods
The labelling function defines methods that detect specific events based on the agent’s observations. Each method is decorated with @LabellingFunction.event to indicate that it’s an event detector.
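One way to picture what such a decorator does is the stand-in below. It is not PyCRM's implementation: SketchLabellingFunction, its event decorator, and the (obs, action, next_obs) call signature are all assumptions made for illustration. The decorator tags a method, and calling the object runs every tagged method and collects the events that are not None.

```python
from enum import Enum


class Symbol(Enum):
    A = 0


class SketchLabellingFunction:
    """Hypothetical stand-in base class, not PyCRM's LabellingFunction."""

    @staticmethod
    def event(method):
        # Tag the method so it can be discovered as an event detector.
        method._is_event = True
        return method

    def __call__(self, obs, action, next_obs):
        # Run every tagged detector and keep the events that fired.
        events = set()
        for name in dir(self):
            attr = getattr(self, name)
            if callable(attr) and getattr(attr, "_is_event", False):
                result = attr(obs, action, next_obs)
                if result is not None:
                    events.add(result)
        return events


class DemoLabels(SketchLabellingFunction):
    @SketchLabellingFunction.event
    def always_a(self, obs, action, next_obs):
        # Trivial detector used only to exercise the mechanism.
        return Symbol.A

    def helper(self):
        # Untagged methods are ignored by __call__.
        return None


labels = DemoLabels()(obs=None, action=None, next_obs=None)
print(labels)
```

Tagging methods this way keeps each detector small and independent: adding a new event means adding one decorated method, with no central dispatch table to update.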
Detecting Symbol A
This method checks whether the agent is at the letter’s position ([1, 1]) and the symbol has not been seen yet (next_obs[0] == 0). If true, it returns the Symbol.A event.
Detecting Symbol B
This method checks whether the agent is at the letter’s position ([1, 1]) and the symbol has been seen (next_obs[0] == 1). If true, it returns the Symbol.B event.
Detecting Symbol C
This method checks whether the agent is at the letter’s position ([1, 5]) and the symbol has been seen (next_obs[0] == 1). If true, it returns the Symbol.C event.
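Taken together, the three checks can be sketched as plain functions. The observation layout assumed here, [seen_flag, row, col], is a guess made for illustration; the source only fixes next_obs[0] as the seen flag and the positions [1, 1] and [1, 5]. The function names are hypothetical, and in PyCRM these checks would live in decorated methods on the labelling function class.

```python
from enum import Enum
from typing import Optional, Sequence


class Symbol(Enum):
    A = 0
    B = 1
    C = 2


def _at(next_obs: Sequence[int], row: int, col: int) -> bool:
    # Assumed layout: next_obs = [seen_flag, row, col].
    return next_obs[1] == row and next_obs[2] == col


def detect_a(next_obs: Sequence[int]) -> Optional[Symbol]:
    # At [1, 1] with the symbol not yet seen.
    if _at(next_obs, 1, 1) and next_obs[0] == 0:
        return Symbol.A
    return None


def detect_b(next_obs: Sequence[int]) -> Optional[Symbol]:
    # Same position as A, but the symbol has already been seen.
    if _at(next_obs, 1, 1) and next_obs[0] == 1:
        return Symbol.B
    return None


def detect_c(next_obs: Sequence[int]) -> Optional[Symbol]:
    # At [1, 5] with the symbol already seen.
    if _at(next_obs, 1, 5) and next_obs[0] == 1:
        return Symbol.C
    return None


for obs in ([0, 1, 1], [1, 1, 1], [1, 1, 5]):
    fired = [d(obs) for d in (detect_a, detect_b, detect_c)]
    print(obs, [e for e in fired if e is not None])
```

Because A and B share a position and differ only in the seen flag, at most one of them can fire for a given observation.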
Using the Labelling Function
To use the labelling function, pass it to the cross-product environment together with the ground environment and the CRM.
The Role of the Labelling Function
Within the RM/CRM framework, the labelling function serves several important roles:
1. Abstraction
It abstracts away the low-level details of the environment, allowing the reward machine to operate on meaningful symbolic events rather than raw observations.
2. Event Detection
It detects important events that should trigger state transitions in the reward machine, such as visiting specific locations or achieving subgoals.
Key Points
- The labelling function translates low-level observations to high-level symbols
- For Letter World, it detects when the agent sees letters A, B, or C
- Each event detector method returns a symbolic event or None
- The function handles the stochastic nature of the environment (A changing to B) by using next_obs[0] to tell the two letters apart at [1, 1]
- The events form the “alphabet” used by the RM/CRM
Next Steps
- Explore how the RM/CRM defines rewards based on these symbolic events
- See how everything comes together in the Cross-Product Environment
- Learn about Counterfactual Q-Learning for efficient learning with RM/CRMs