scope_rl.ope.online.visualize_on_policy_policy_value_with_variance

scope_rl.ope.online.visualize_on_policy_policy_value_with_variance(env, policies, policy_names, n_trajectories=100, step_per_trajectory=None, evaluate_on_stationary_distribution=False, gamma=1.0, alpha=0.05, random_state=None, fig_dir=None, fig_name='estimated_policy_value.png')

Visualize the on-policy policy value of each policy, together with its variance, estimated by rolling out the policies in the environment.
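The quantity being visualized can be sketched as a Monte Carlo estimate: compute the discounted return of each rolled-out trajectory, then summarize the returns by their mean and variance. This is a minimal illustration, not SCOPE-RL's implementation; the helper names `discounted_return` and `on_policy_value_with_variance` are hypothetical, and using the unbiased (n - 1) sample variance is an assumption.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def discounted_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Return sum_t gamma**t * r_t for the rewards of one trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def on_policy_value_with_variance(
    trajectories: Sequence[Sequence[float]], gamma: float = 1.0
) -> Tuple[float, float]:
    """Hypothetical helper: mean and sample variance of discounted returns.

    Each element of `trajectories` is the reward sequence of one rollout.
    At least two trajectories are needed for the sample variance.
    """
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    return mean(returns), variance(returns)


if __name__ == "__main__":
    rewards_per_trajectory = [[1, 1, 1], [0, 1, 0], [1, 0, 1]]
    value, var = on_policy_value_with_variance(rewards_per_trajectory, gamma=1.0)
    print(value, var)  # prints "2.0 1.0"
```

A bar chart of such estimates would typically place one bar per policy at the mean; how the variance is drawn (for example as an error bar) is a plotting choice not specified here.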

Parameters:
  • env (gym.Env) – Reinforcement learning (RL) environment.

  • policies (list of {QLearningAlgoBase, BaseHead}) – List of policies to be evaluated.

  • policy_names (list of str) – Names of the policies.

  • n_trajectories (int, default=100 (> 0)) – Number of trajectories to roll out.

  • step_per_trajectory (int, default=None (> 0)) – Number of timesteps in a trajectory.

  • evaluate_on_stationary_distribution (bool, default=False) – Whether to evaluate each policy on the stationary state distribution it induces. When True, the policy is rolled out without resetting the environment between trajectories. This argument is irrelevant in the finite-horizon setting.

  • gamma (float, default=1.0) – Discount factor. The value should be within (0, 1].

  • alpha (float, default=0.05) – Significance level. The value should be within [0, 1).

  • random_state (int, default=None (>= 0)) – Random state.

  • fig_dir (Path, default=None) – Path to store the bar figure. If None is given, the figure will not be saved.

  • fig_name (str, default="estimated_policy_value.png") – Name of the bar figure.
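The rollout-related parameters (n_trajectories, step_per_trajectory, evaluate_on_stationary_distribution, gamma, random_state) can be illustrated with a small sketch. It is not SCOPE-RL's implementation: `ToyCounterEnv` and `collect_returns` are hypothetical, the environment uses a simplified `reset`/`step` interface instead of the `gym.Env` API, and the concrete default `step_per_trajectory=10` is an assumption (the documented default is None).

```python
from typing import Callable, List, Optional, Tuple


class ToyCounterEnv:
    """Hypothetical deterministic environment (not part of SCOPE-RL).

    The state is an integer counter. Each step returns the current counter
    as the reward and then increments it; episodes never terminate.
    """

    def __init__(self) -> None:
        self.state = 0

    def reset(self, seed: Optional[int] = None) -> int:
        # The seed is accepted for interface parity but unused here.
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        reward = float(self.state)
        self.state += 1
        return self.state, reward, False


def collect_returns(
    env: ToyCounterEnv,
    policy: Callable[[int], int],
    n_trajectories: int = 100,
    step_per_trajectory: int = 10,  # assumption: a concrete default
    evaluate_on_stationary_distribution: bool = False,
    gamma: float = 1.0,
    random_state: Optional[int] = None,
) -> List[float]:
    """Hypothetical rollout loop returning one discounted return per trajectory."""
    returns: List[float] = []
    state = env.reset(seed=random_state)  # seed only the first reset
    for i in range(n_trajectories):
        # Standard mode: start every trajectory from a fresh reset.
        # Stationary mode: keep going from wherever the last trajectory ended.
        if i > 0 and not evaluate_on_stationary_distribution:
            state = env.reset()
        ret = 0.0
        for t in range(step_per_trajectory):
            state, reward, done = env.step(policy(state))
            ret += (gamma ** t) * reward
            if done:
                if not evaluate_on_stationary_distribution:
                    break
                state = env.reset()  # cannot continue a terminated episode
        returns.append(ret)
    return returns
```

With this counter environment, resetting yields identical returns for every trajectory, whereas the stationary mode keeps advancing the state, so later trajectories start from states that the policy itself has reached.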