scope_rl.ope.online.visualize_on_policy_interquartile_range#

scope_rl.ope.online.visualize_on_policy_interquartile_range(env, policies, policy_names, n_trajectories=100, step_per_trajectory=None, evaluate_on_stationary_distribution=False, gamma=1.0, alpha=0.05, use_custom_reward_scale=False, scale_min=None, scale_max=None, n_partition=None, random_state=None, fig_dir=None, fig_name='on_policy_interquartile_range.png')[source]#

Visualize the interquartile range of the on-policy policy value.

Parameters:

env (gym.Env) – Reinforcement learning (RL) environment.
policy ({QLearningAlgoBase, BaseHead}) – A policy to be evaluated.
n_trajectories (int, default=100 (> 0)) – Number of trajectories to rollout.
step_per_trajectory (int, default=None (> 0)) – Number of timesteps in an trajectory.
evaluate_on_stationary_distribution (bool, default=False) – Whether to evaluate a policy based on the stationary state distribution induced by it. When True, the evaluation policy is evaluated by rollout without resetting environment at each trajectory. This argument is irrelevant when working on the finite horizon setting.
gamma (float, default=1.0) – Discount factor. The value should be within (0, 1].
alpha (float, default=0.05) – Significance level. The value should be within [0, 1).
use_custom_reward_scale (bool, default=False) – Whether to use a customized reward scale or the reward observed under the behavior policy. If True, the reward scale is uniform, following Huang et al. (2021). If False, the reward scale follows the one defined in Chundak et al. (2021).
scale_min (float, default=None) – Minimum value of the reward scale in the CDF. When use_custom_reward_scale == True, a value must be given.
scale_max (float, default=None) – Maximum value of the reward scale in the CDF. When use_custom_reward_scale == True, a value must be given.
n_partition (int, default=None) – Number of partitions in the reward scale (x-axis of the CDF). When use_custom_reward_scale == True, a value must be given.
random_state (int, default=None (>= 0)) – Random state.
fig_dir (Path, default=None) – Path to store the bar figure. If None is given, the figure will not be saved.
fig_name (str, default="on_policy_conditional_value_at_risk.png") – Name of the bar figure.