Agent implementations that leverage Counting Reward Machines
The `pycrm.agents` module provides reinforcement learning algorithms that integrate with Reward Machines and Counting Reward Machines to learn task policies efficiently. These agents are designed to take advantage of the counterfactual experience generation capabilities provided by the pycrm framework.
The framework includes two main types of agent implementations: tabular agents and deep agents built on Stable Baselines 3.
The `pycrm.agents.tabular.ql` module provides a baseline Q-Learning implementation that uses the standard update rule:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
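Concretely, the update takes only a few lines. The sketch below is a minimal illustration of plain tabular Q-learning, not pycrm's actual implementation; `q_table`, `alpha`, and `gamma` are illustrative names.

```python
from collections import defaultdict

q_table = defaultdict(float)  # maps (state, action) pairs to Q-values
alpha, gamma = 0.1, 0.99      # learning rate and discount factor

def q_learning_update(state, action, reward, next_state, actions):
    """Apply one standard Q-learning update for a single transition."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q_table[(state, action)]
    q_table[(state, action)] += alpha * td_error
```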
The `pycrm.agents.tabular.cql` module implements Counterfactual Q-Learning, which extends standard Q-Learning to take advantage of the counterfactual experience generation capabilities of Reward Machines and Counting Reward Machines.
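The core idea is that a single environment transition can be replayed through every state of the machine, yielding updates for machine states the agent never actually occupied. The sketch below extends the tabular example above (reusing its `q_table`, `alpha`, and `gamma`); `machine.transition` and `machine.reward` are assumed method names, not the confirmed pycrm API.

```python
def counterfactual_q_update(obs, action, next_obs, props,
                            machine, machine_states, actions):
    """Reuse one environment transition to update Q for every machine state."""
    for u in machine_states:
        u_next = machine.transition(u, props)  # assumed API name
        r = machine.reward(u, props)           # assumed API name
        state, next_state = (obs, u), (next_obs, u_next)
        best_next = max(q_table[(next_state, a)] for a in actions)
        td_error = r + gamma * best_next - q_table[(state, action)]
        q_table[(state, action)] += alpha * td_error
```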
The `pycrm.agents.sb3.dqn.cdqn` module implements Counterfactual Deep Q-Network (C-DQN), extending the DQN algorithm from Stable Baselines 3 to learn from counterfactual experiences (a usage sketch for the SB3-based agents appears below).
The `pycrm.agents.sb3.sac.csac` module implements Counterfactual Soft Actor-Critic (C-SAC), extending the SAC algorithm from Stable Baselines 3 to learn from counterfactual experiences.
The `pycrm.agents.sb3.td3.ctd3` module implements Counterfactual Twin Delayed Deep Deterministic Policy Gradient (C-TD3), extending the TD3 algorithm from Stable Baselines 3.
The `pycrm.agents.sb3.ddpg.cddpg` module implements Counterfactual Deep Deterministic Policy Gradient (C-DDPG), extending the DDPG algorithm from Stable Baselines 3.
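Since these agents extend their Stable Baselines 3 counterparts, usage should mirror the familiar SB3 interface. The sketch below is an assumption for illustration: the class name `CounterfactualSAC` and the construction of `env` are not confirmed by this section.

```python
from pycrm.agents.sb3.sac.csac import CounterfactualSAC  # class name assumed

# `env` is assumed to be a cross-product environment that exposes the
# labelled transitions needed for counterfactual experience generation.
model = CounterfactualSAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("csac_agent")
```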
The `pycrm.agents.sb3.wrapper` module includes:

- `DispatchSubprocVecEnv`: an extension of Stable Baselines 3's `SubprocVecEnv` that enables efficient parallel generation of counterfactual experiences.
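Assuming `DispatchSubprocVecEnv` keeps `SubprocVecEnv`'s convention of taking a list of environment factories, parallel experience generation might look like the following sketch; `make_crm_env` is a hypothetical helper.

```python
from pycrm.agents.sb3.wrapper import DispatchSubprocVecEnv

def make_crm_env():
    # Hypothetical helper: build and return the task's cross-product
    # environment (construction depends on the task definition).
    raise NotImplementedError

# Eight worker processes, each hosting its own environment instance.
vec_env = DispatchSubprocVecEnv([make_crm_env for _ in range(8)])
```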