Defining rewards based on symbolic event sequences
Consider a task in which the agent must observe event A, then B, then C.
A reward machine (RM) for this task is defined by implementing the RewardMachine base class:
- u_0: Returns the initial state (0), where the agent begins the task.
- _get_state_transition_function: Defines state transitions based on observed events.
- _get_reward_transition_function: Defines the reward for each transition.
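Starting from state 0, observing A moves the machine to state 1 with a small reward; B then moves it to state 2, again with a small reward; finally, C completes the sequence, reaching the terminal state with a larger reward. Together, these transitions encode the sequence A → B → C.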
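The members described above can be sketched in Python as follows. This is a minimal illustration, not the library's actual API: the abstract RewardMachine base class here is a hypothetical stand-in, and the dict-of-dicts transition tables, the TERMINAL = -1 marker, the run helper, and the reward values 0.1 and 1.0 are all assumptions chosen for the example.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple

Event = str
TERMINAL = -1  # hypothetical marker for the terminal state


class RewardMachine(ABC):
    """Hypothetical minimal base class; not the real library's API."""

    @property
    @abstractmethod
    def u_0(self) -> int:
        """Initial machine state."""

    @abstractmethod
    def _get_state_transition_function(self) -> Dict[int, Dict[Event, int]]:
        """Map state -> event -> next state."""

    @abstractmethod
    def _get_reward_transition_function(self) -> Dict[int, Dict[Event, float]]:
        """Map state -> event -> reward for taking that transition."""

    def run(self, events: Iterable[Event]) -> Tuple[int, List[float]]:
        """Feed events to the machine; return the final state and per-step rewards."""
        delta_u = self._get_state_transition_function()
        delta_r = self._get_reward_transition_function()
        u, rewards = self.u_0, []
        for e in events:
            if u == TERMINAL:
                break  # the task is complete; ignore further events
            # Events without an outgoing edge leave the state unchanged (reward 0).
            rewards.append(delta_r.get(u, {}).get(e, 0.0))
            u = delta_u.get(u, {}).get(e, u)
        return u, rewards


class ABCRewardMachine(RewardMachine):
    """Rewards the event sequence A -> B -> C."""

    @property
    def u_0(self) -> int:
        return 0  # the agent begins in state 0

    def _get_state_transition_function(self) -> Dict[int, Dict[Event, int]]:
        return {0: {"A": 1}, 1: {"B": 2}, 2: {"C": TERMINAL}}

    def _get_reward_transition_function(self) -> Dict[int, Dict[Event, float]]:
        # Small rewards for progress, a larger one for finishing (illustrative values).
        return {0: {"A": 0.1}, 1: {"B": 0.1}, 2: {"C": 1.0}}


rm = ABCRewardMachine()
print(rm.run(["A", "B", "C"]))  # (-1, [0.1, 0.1, 1.0])
print(rm.run(["B", "A", "C"]))  # (1, [0.0, 0.1, 0.0])
```

In the second run, the early B has no outgoing edge from state 0, so it is ignored; the machine only advances on A and then waits in state 1 for B.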
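Now consider a task in which the agent must observe A n times, then B, then C n times. Only when the numbers of As and Cs match does the agent receive a reward. Checking this requires remembering how many As were seen for an arbitrary n, which a plain RM, with its fixed, finite set of states, cannot do. A counting reward machine (CRM) adds counters for this purpose and is defined by the following members: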
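To see why a counter is enough, here is a sketch of a recognizer for traces of the form A^n B C^n that uses a single integer counter. The function name matches_a_n_b_c_n is hypothetical and not part of any library, and n ≥ 1 is assumed. A plain RM would need a distinct state for every possible value of n, which no finite state set provides; the counter stores that value directly.

```python
from typing import Iterable


def matches_a_n_b_c_n(events: Iterable[str]) -> bool:
    """Return True iff the trace is A^n B C^n for some n >= 1 (assumed)."""
    count = 0         # unbounded counter: As seen minus Cs seen
    seen_b = False    # False while reading As, True once B has occurred
    for e in events:
        if not seen_b and e == "A":
            count += 1
        elif not seen_b and e == "B" and count > 0:
            seen_b = True
        elif seen_b and e == "C" and count > 0:
            count -= 1
        else:
            return False  # out-of-order event, B with no As, or too many Cs
    return seen_b and count == 0


print(matches_a_n_b_c_n(["A", "A", "B", "C", "C"]))  # True
print(matches_a_n_b_c_n(["A", "A", "B", "C"]))       # False
```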
- u_0: Returns the initial state (0), where the agent begins the task.
- c_0: Returns the initial counter values as a tuple (the counter starts at 0).
- encoded_configuration_size: Returns the size (2) used to encode state-counter configurations.
- _get_state_transition_function: Defines state transitions subject to counter conditions.
- _get_counter_transition_function: Defines how the counters change with each event.
- _get_reward_transition_function: Defines rewards subject to counter conditions.
The counter is incremented on each A and decremented on each C.
Only when the counter reaches 0 (all As matched by Cs) does the agent receive the reward.
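The counting reward machine (CRM) members listed above can be sketched for the A^n B C^n task as follows. The data layout is an assumption for illustration, not the library's API: each transition table is keyed by (state, event) and holds ordered (condition, value) pairs evaluated on the counters before the update. The condition helpers (exactly_one and so on), the TERMINAL = -1 marker, the encode and run helpers, and the reward of 1.0 are all hypothetical. With this layout, "the counter reaches 0" is expressed as "the counter equals 1 before this C".

```python
from typing import Iterable, List, Tuple

Counters = Tuple[int, ...]
TERMINAL = -1  # hypothetical marker for the terminal state


# Hypothetical counter conditions, evaluated on the counters *before* the update.
def any_count(c: Counters) -> bool:
    return True


def positive(c: Counters) -> bool:
    return c[0] > 0


def exactly_one(c: Counters) -> bool:
    return c[0] == 1


def more_than_one(c: Counters) -> bool:
    return c[0] > 1


def first_match(table, key, c: Counters, default):
    """Return the value of the first (condition, value) pair whose condition holds."""
    for condition, value in table.get(key, []):
        if condition(c):
            return value
    return default


class BalancedCountingRewardMachine:
    """Hypothetical CRM rewarding A^n B C^n; not the real library's API."""

    u_0 = 0                         # initial state
    c_0: Counters = (0,)            # initial counter values
    encoded_configuration_size = 2  # [state, counter]

    def _get_state_transition_function(self):
        return {
            (0, "A"): [(any_count, 0)],   # keep counting As
            (0, "B"): [(positive, 1)],    # B needs at least one A
            (1, "C"): [(exactly_one, TERMINAL), (more_than_one, 1)],
        }

    def _get_counter_transition_function(self):
        return {
            (0, "A"): [(any_count, (1,))],  # increment on A
            (0, "B"): [(positive, (0,))],
            (1, "C"): [(exactly_one, (-1,)), (more_than_one, (-1,))],  # decrement on C
        }

    def _get_reward_transition_function(self):
        return {
            (0, "A"): [(any_count, 0.0)],
            (0, "B"): [(positive, 0.0)],
            # Reward only the C that brings the counter to 0 (illustrative value).
            (1, "C"): [(exactly_one, 1.0), (more_than_one, 0.0)],
        }

    def encode(self, u: int, c: Counters) -> List[int]:
        """Flatten a (state, counters) configuration into a vector."""
        return [u, *c]

    def run(self, events: Iterable[str]) -> Tuple[int, Counters, List[float]]:
        delta_u = self._get_state_transition_function()
        delta_c = self._get_counter_transition_function()
        delta_r = self._get_reward_transition_function()
        u, c, rewards = self.u_0, self.c_0, []
        for e in events:
            if u == TERMINAL:
                break
            key = (u, e)
            # Unmatched events leave state and counters unchanged, reward 0.
            inc = first_match(delta_c, key, c, (0,) * len(c))
            rewards.append(first_match(delta_r, key, c, 0.0))
            u = first_match(delta_u, key, c, u)
            c = tuple(ci + di for ci, di in zip(c, inc))
        return u, c, rewards


machine = BalancedCountingRewardMachine()
print(machine.run(["A", "A", "B", "C", "C"]))  # (-1, (0,), [0.0, 0.0, 0.0, 0.0, 1.0])
print(machine.run(["A", "A", "B", "C"]))       # (1, (1,), [0.0, 0.0, 0.0, 0.0])
print(machine.encode(1, (2,)))                 # [1, 2]
```

Keeping three separate functions mirrors the member list in the text; a single table of (condition, next state, counter update, reward) tuples would avoid repeating each condition.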
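In short, a plain RM suffices for fixed event sequences (such as A → B → C), while a counting reward machine is needed for conditions that involve counting (such as “equal numbers of As and Cs”).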