scope_rl.utils

Useful tools.

Functions

check_array

Input validation on array.

check_input_dict

Check input dict keys.

check_logged_dataset

Check logged dataset keys.

cosine_kernel

Cosine kernel.

defaultdict_to_dict

Transform a defaultdict into a corresponding dict.

epanechnikov_kernel

Epanechnikov kernel.

estimate_confidence_interval_by_bootstrap

Estimate the confidence interval by a nonparametric bootstrap-like procedure.

estimate_confidence_interval_by_empirical_bernstein

Estimate the confidence interval by the empirical Bernstein inequality.

estimate_confidence_interval_by_hoeffding

Estimate the confidence interval by Hoeffding's inequality.

estimate_confidence_interval_by_t_test

Estimate the confidence interval by Student's t-test.

gaussian_kernel

Gaussian kernel.

l2_distance

Calculate L2 distance.

triangular_kernel

Triangular kernel.

uniform_kernel

Uniform kernel.

Classes

MultipleInputDict

This class contains paths to multiple input dictionaries for OPE and returns input_dict.

MultipleLoggedDataset

This class contains paths to multiple logged datasets and returns logged_dataset.

NewGymAPIWrapper

This class converts old gym outputs (gym<0.26.0) to the new ones (gym>=0.26.0).

OldGymAPIWrapper

This class converts new gym outputs (gym>=0.26.0) to the old ones (gym<0.26.0).

class scope_rl.utils.MultipleLoggedDataset(action_type, path, save_relative_path=False)

This class contains paths to multiple logged datasets and returns logged_dataset.

Parameters:
  • action_type ({"discrete", "continuous"}) – Type of the action space.

  • path (str) – Path to the directory. Either absolute or relative path is acceptable.

  • save_relative_path (bool, default=False) –

    Whether to save a relative path. If True, a path relative to the scope-rl directory will be saved. If False, the absolute path will be saved.

    Note that this option was added in order to run examples in the documentation properly. Otherwise, the default setting (False) is recommended.

Attributes:
behavior_policy_names
n_datasets

Methods

add(logged_dataset, behavior_policy_name)

Save logged dataset.

get(behavior_policy_name, dataset_id)

Load logged dataset.

add(logged_dataset, behavior_policy_name)

Save logged dataset.

Parameters:
  • logged_dataset (LoggedDataset) – Logged dataset to save.

  • behavior_policy_name (str) – Name of the behavior policy that generated the logged dataset.

get(behavior_policy_name, dataset_id)

Load logged dataset.

Parameters:
  • behavior_policy_name (str) – Name of the behavior policy that generated the logged dataset.

  • dataset_id (int) – Id of the logged dataset.

Returns:

logged_dataset – Logged dataset.

Return type:

LoggedDataset

class scope_rl.utils.MultipleInputDict(action_type, path, save_relative_path=False)

This class contains paths to multiple input dictionaries for OPE and returns input_dict.

Parameters:
  • action_type ({"discrete", "continuous"}) – Type of the action space.

  • path (str) – Path to the directory. Either absolute or relative path is acceptable.

  • save_relative_path (bool, default=False) –

    Whether to save a relative path. If True, a path relative to the scope-rl directory will be saved. If False, the absolute path will be saved.

    Note that this option was added in order to run examples in the documentation properly. Otherwise, the default setting (False) is recommended.

Attributes:
behavior_policy_names
n_datasets
n_eval_policies

Check the number of evaluation policies of each input dict.

use_same_eval_policy_across_dataset

Check if the contained logged datasets use the same evaluation policies.

Methods

add(input_dict, behavior_policy_name, dataset_id)

Save input_dict.

get(behavior_policy_name, dataset_id)

Load input_dict.

add(input_dict, behavior_policy_name, dataset_id)

Save input_dict.

Parameters:
  • input_dict (OPEInputDict) – Input dictionary for OPE to save.

  • behavior_policy_name (str) – Name of the behavior policy that generated the logged dataset.

  • dataset_id (int) – Id of the logged dataset.

get(behavior_policy_name, dataset_id)

Load input_dict.

Parameters:
  • behavior_policy_name (str) – Name of the behavior policy that generated the logged dataset.

  • dataset_id (int) – Id of the logged dataset.

Returns:

input_dict – Input dictionary for OPE.

Return type:

OPEInputDict

property use_same_eval_policy_across_dataset

Check if the contained logged datasets use the same evaluation policies.

property n_eval_policies

Check the number of evaluation policies of each input dict.

scope_rl.utils.l2_distance(x, y, bandwidth=1.0)

Calculate L2 distance.

Parameters:
  • x (array-like of shape (n_samples, n_dim)) – Input array 1.

  • y (array-like of shape (n_samples, n_dim)) – Input array 2.

Returns:

distance – L2 distance between x and y.

Return type:

ndarray of shape (n_samples, )
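The shapes above can be illustrated with a minimal NumPy sketch of a row-wise Euclidean distance. The function name `l2_distance_sketch` is hypothetical, and the library's own implementation may differ in detail (for example, in whether it takes the square root or uses `bandwidth`).

```python
import numpy as np


def l2_distance_sketch(x, y):
    """Row-wise Euclidean distance between two (n_samples, n_dim) arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("x and y must both have shape (n_samples, n_dim)")
    # Sum squared differences over the feature axis, then take the root.
    return np.sqrt(((x - y) ** 2).sum(axis=1))


x = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([[3.0, 4.0], [1.0, 1.0]])
distance = l2_distance_sketch(x, y)  # one value per row, shape (2,)
```

Each row of `x` is compared only with the same row of `y`, which is why the result has shape (n_samples, ) rather than a full pairwise matrix.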

scope_rl.utils.gaussian_kernel(x, y, bandwidth=1.0)

Gaussian kernel.

Parameters:
  • x (array-like of shape (n_samples, n_dim)) – Input array 1.

  • y (array-like of shape (n_samples, n_dim)) – Input array 2.

  • bandwidth (float, default=1.0) – Bandwidth hyperparameter of the Gaussian kernel.

Returns:

kernel_density – kernel density of x given y.

Return type:

ndarray of shape (n_samples, )
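To make the role of `bandwidth` concrete, here is a minimal sketch of a textbook Gaussian kernel evaluated row by row. The name `gaussian_kernel_sketch` and the one-dimensional normalizing constant are assumptions for illustration; the library may normalize differently.

```python
import numpy as np


def gaussian_kernel_sketch(x, y, bandwidth=1.0):
    """Textbook Gaussian kernel on the row-wise squared distance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sq_dist = ((x - y) ** 2).sum(axis=1)
    # Assumed normalization: the constant of a 1-D Gaussian density.
    norm = np.sqrt(2.0 * np.pi) * bandwidth
    return np.exp(-sq_dist / (2.0 * bandwidth**2)) / norm


x = np.array([[0.0], [0.0]])
y = np.array([[0.0], [2.0]])
density = gaussian_kernel_sketch(x, y, bandwidth=1.0)
wider = gaussian_kernel_sketch(x, y, bandwidth=4.0)
```

A larger bandwidth flattens the kernel: the density at distance zero drops, while distant pairs receive relatively more weight.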

scope_rl.utils.triangular_kernel(x, y, bandwidth=1.0)

Triangular kernel.

Parameters:
  • x (array-like of shape (n_samples, n_dim)) – Input array 1.

  • y (array-like of shape (n_samples, n_dim)) – Input array 2.

  • bandwidth (float, default=1.0) – Bandwidth hyperparameter of the Triangular kernel.

Returns:

kernel_density – kernel density of x given y.

Return type:

ndarray of shape (n_samples, )

scope_rl.utils.epanechnikov_kernel(x, y, bandwidth=1.0)

Epanechnikov kernel.

Parameters:
  • x (array-like of shape (n_samples, n_dim)) – Input array 1.

  • y (array-like of shape (n_samples, n_dim)) – Input array 2.

  • bandwidth (float, default=1.0) – Bandwidth hyperparameter of the Epanechnikov kernel.

Returns:

kernel_density – kernel density of x given y.

Return type:

ndarray of shape (n_samples, )

scope_rl.utils.cosine_kernel(x, y, bandwidth=1.0)

Cosine kernel.

Parameters:
  • x (array-like of shape (n_samples, n_dim)) – Input array 1.

  • y (array-like of shape (n_samples, n_dim)) – Input array 2.

  • bandwidth (float, default=1.0) – Bandwidth hyperparameter of the Cosine kernel.

Returns:

kernel_density – kernel density of x given y.

Return type:

ndarray of shape (n_samples, )

scope_rl.utils.uniform_kernel(x, y, bandwidth=1.0)

Uniform kernel.

Parameters:
  • x (array-like of shape (n_samples, n_dim)) – Input array 1.

  • y (array-like of shape (n_samples, n_dim)) – Input array 2.

  • bandwidth (float, default=1.0) – Bandwidth hyperparameter of the Uniform kernel.

Returns:

kernel_density – kernel density of x given y.

Return type:

ndarray of shape (n_samples, )
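The uniform, triangular, Epanechnikov, and cosine kernels documented on this page share one idea: they are nonzero only when the two inputs lie within one bandwidth of each other. The sketch below uses the standard textbook profiles on the scaled distance u = distance / bandwidth. All function names are hypothetical, and the library's exact scaling and normalization may differ.

```python
import numpy as np


def scaled_distance(x, y, bandwidth=1.0):
    """Row-wise Euclidean distance divided by the bandwidth."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sqrt(((x - y) ** 2).sum(axis=1)) / bandwidth


# Textbook kernel profiles; each returns zero where u > 1.
def uniform_profile(u):
    return np.where(u <= 1.0, 0.5, 0.0)


def triangular_profile(u):
    return np.where(u <= 1.0, 1.0 - u, 0.0)


def epanechnikov_profile(u):
    return np.where(u <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def cosine_profile(u):
    return np.where(u <= 1.0, (np.pi / 4.0) * np.cos(np.pi * u / 2.0), 0.0)


x = np.array([[0.0], [0.0], [0.0]])
y = np.array([[0.0], [0.5], [2.0]])
u = scaled_distance(x, y, bandwidth=1.0)  # array([0.0, 0.5, 2.0])
weights = {
    "uniform": uniform_profile(u),
    "triangular": triangular_profile(u),
    "epanechnikov": epanechnikov_profile(u),
    "cosine": cosine_profile(u),
}
```

The third pair sits two bandwidths apart, so every profile assigns it zero weight, whereas a Gaussian kernel would still give it a small positive value.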

scope_rl.utils.estimate_confidence_interval_by_bootstrap(samples, alpha=0.05, n_bootstrap_samples=100, random_state=None)

Estimate the confidence interval by a nonparametric bootstrap-like procedure.

Parameters:
  • samples (array-like) – Samples.

  • alpha (float, default=0.05) – Significance level. The value should be within [0, 1).

  • n_bootstrap_samples (int, default=100 (> 0)) – Number of resampling performed in the bootstrap procedure.

  • random_state (int, default=None (>= 0)) – Random state.

Returns:

estimated_confidence_interval – Dictionary storing the estimated mean and upper-lower confidence bounds.

Return type:

dict
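For intuition, a percentile bootstrap can be sketched as follows. This is one common variant and is not necessarily the procedure the library uses; the function name and the dictionary keys `"mean"`, `"lower"`, and `"upper"` are hypothetical.

```python
import numpy as np


def bootstrap_ci_sketch(samples, alpha=0.05, n_bootstrap_samples=100, random_state=None):
    """Percentile-bootstrap confidence interval for the mean of 1-D samples."""
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(random_state)
    # Each row of idx indexes one resample of the data, drawn with replacement.
    idx = rng.integers(0, samples.size, size=(n_bootstrap_samples, samples.size))
    boot_means = samples[idx].mean(axis=1)
    lower, upper = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    # Key names are illustrative assumptions, not the library's keys.
    return {"mean": float(boot_means.mean()), "lower": float(lower), "upper": float(upper)}


ci = bootstrap_ci_sketch(np.arange(100.0), alpha=0.05, n_bootstrap_samples=200, random_state=0)
ci_again = bootstrap_ci_sketch(np.arange(100.0), alpha=0.05, n_bootstrap_samples=200, random_state=0)
```

Fixing `random_state` makes the resampling reproducible, so repeated calls with the same arguments return the same interval.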

scope_rl.utils.estimate_confidence_interval_by_hoeffding(samples, alpha=0.05, **kwargs)

Estimate the confidence interval by Hoeffding’s inequality.

Note

Hoeffding’s inequality provides a high-probability bound on the expectation \(\mu := \mathbb{E}[X], X \sim p(X)\) as follows.

\[|\hat{\mu} - \mu| \leq X_{\max} \sqrt{\frac{\log(1 / \alpha)}{2 n}},\]

which holds with probability \(1 - \alpha\), where \(n\) is the data size, \(\hat{\mu}\) is the sample mean, and \(X_{\max}\) is the maximum value of \(X\).

Parameters:
  • samples (array-like) – Samples.

  • alpha (float, default=0.05) – Significance level. The value should be within [0, 1).

Returns:

estimated_confidence_interval – Dictionary storing the estimated mean and upper-lower confidence bounds.

Return type:

dict
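The bound in the note can be evaluated directly. The sketch below assumes that the largest observed magnitude stands in for \(X_{\max}\) and uses hypothetical dictionary keys; it is an illustration of the formula, not the library's code.

```python
import math


def hoeffding_ci_sketch(samples, alpha=0.05):
    """Confidence interval from the Hoeffding bound stated in the note."""
    n = len(samples)
    mean = sum(samples) / n
    # Assumption: the observed maximum magnitude is used as X_max.
    x_max = max(abs(x) for x in samples)
    width = x_max * math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    # Key names are illustrative assumptions.
    return {"mean": mean, "lower": mean - width, "upper": mean + width}


ci = hoeffding_ci_sketch([0.0, 1.0] * 50, alpha=0.05)
ci_small = hoeffding_ci_sketch([0.0, 1.0] * 5, alpha=0.05)
```

Because the width shrinks at a rate of \(1 / \sqrt{n}\), the interval from 10 samples is wider than the one from 100 samples.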

scope_rl.utils.estimate_confidence_interval_by_empirical_bernstein(samples, alpha=0.05, **kwargs)

Estimate the confidence interval by the empirical Bernstein inequality.

Note

The empirical Bernstein inequality provides a high-probability bound on the expectation \(\mu := \mathbb{E}[X], X \sim p(X)\) as follows.

\[|\hat{\mu} - \mu| \leq \frac{7 X_{\max} \log(2 / \alpha)}{3 (n - 1)} + \sqrt{\frac{2 \hat{\mathbb{V}}(X) \log(2 / \alpha)}{n(n - 1)}},\]

which holds with probability \(1 - \alpha\), where \(n\) is the data size, \(\hat{\mu}\) is the sample mean, and \(\hat{\mathbb{V}}\) is the sample variance.

Parameters:
  • samples (array-like) – Samples.

  • alpha (float, default=0.05) – Significance level. The value should be within [0, 1).

Returns:

estimated_confidence_interval – Dictionary storing the estimated mean and upper-lower confidence bounds.

Return type:

dict
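The following sketch evaluates the bound exactly as written in the note. It assumes the observed maximum magnitude for \(X_{\max}\) and the unbiased (n - 1) sample variance for \(\hat{\mathbb{V}}\); both choices, the function name, and the dictionary keys are assumptions.

```python
import math
import statistics


def empirical_bernstein_ci_sketch(samples, alpha=0.05):
    """Confidence interval from the empirical Bernstein bound in the note."""
    n = len(samples)
    mean = statistics.fmean(samples)
    x_max = max(abs(x) for x in samples)  # assumption: observed maximum
    var = statistics.variance(samples)  # assumption: unbiased sample variance
    log_term = math.log(2.0 / alpha)
    # First term depends on the range, second term on the sample variance.
    range_term = 7.0 * x_max * log_term / (3.0 * (n - 1))
    variance_term = math.sqrt(2.0 * var * log_term / (n * (n - 1)))
    width = range_term + variance_term
    return {"mean": mean, "lower": mean - width, "upper": mean + width}


ci_95 = empirical_bernstein_ci_sketch([0.0, 1.0] * 50, alpha=0.05)
ci_99 = empirical_bernstein_ci_sketch([0.0, 1.0] * 50, alpha=0.01)
```

Unlike the Hoeffding bound, the second term scales with the sample variance, so low-variance samples yield a tighter interval for the same range.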

scope_rl.utils.estimate_confidence_interval_by_t_test(samples, alpha=0.05, **kwargs)

Estimate the confidence interval by Student’s t-test.

Note

Student’s t-test assumes that \(X \sim p(X)\) follows a normal distribution. Based on this assumption, the \(100(1 - \alpha)\)% confidence interval of \(\mu := \mathbb{E}[X]\) is derived as follows.

\[|\hat{\mu} - \mu| \leq \frac{T_{\mathrm{test}}(1 - \alpha, n-1)}{\sqrt{n} / \hat{\sigma}},\]

where \(n\) is the data size, \(T_{\mathrm{test}}(\cdot,\cdot)\) is the T-value, and \(\hat{\sigma}\) is the sample standard deviation.

Parameters:
  • samples (NDArray) – Samples.

  • alpha (float, default=0.05) – Significance level. The value should be within [0, 1).

Returns:

estimated_confidence_interval – Dictionary storing the estimated mean and upper-lower confidence bounds.

Return type:

dict
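As a rough worked example of the bound in the note, suppose \(n = 25\), \(\hat{\sigma} = 2\), and \(\alpha = 0.05\), and read \(T_{\mathrm{test}}(1 - \alpha, n - 1)\) as the \(0.95\) quantile of Student’s t distribution with \(24\) degrees of freedom, which is approximately \(1.711\). This reading of the T-value is an assumption based on the notation, not something stated explicitly here.

\[|\hat{\mu} - \mu| \leq \frac{1.711}{\sqrt{25} / 2} = \frac{1.711}{2.5} \approx 0.684.\]

With \(\hat{\mu} = 10\), for instance, the interval would be roughly \([9.316, 10.684]\).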

scope_rl.utils.defaultdict_to_dict(dict_)

Transform a defaultdict into a corresponding dict.
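A minimal sketch of such a conversion is shown below. It assumes that nested defaultdicts should be converted recursively, which the description does not state; the function name is hypothetical.

```python
from collections import defaultdict


def defaultdict_to_dict_sketch(dict_):
    """Recursively replace (default)dict instances with plain dicts."""
    if isinstance(dict_, dict):
        # defaultdict is a dict subclass, so this branch covers both types.
        return {key: defaultdict_to_dict_sketch(value) for key, value in dict_.items()}
    return dict_


nested = defaultdict(lambda: defaultdict(list))
nested["behavior_policy"]["dataset_0"].append(1)
plain = defaultdict_to_dict_sketch(nested)
```

After conversion, looking up a missing key raises `KeyError` instead of silently creating a new entry, and the result no longer carries a lambda default factory, which cannot be pickled.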

scope_rl.utils.check_array(array, name, expected_dim=1, expected_dtype=None, min_val=None, max_val=None)

Input validation on array.

Parameters:
  • array (object) – Input array to check.

  • name (str) – Name of the input array.

  • expected_dim (int, default=1) – Expected dimension of the input array.

  • expected_dtype ({type, tuple of type}, default=None) – Expected dtype of the input array.

  • min_val (float, default=None) – Minimum value allowed in the input array.

  • max_val (float, default=None) – Maximum value allowed in the input array.
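The kind of validation these parameters describe can be sketched as follows. This is a hypothetical stand-in, `check_array_sketch`, written only from the parameter list above; the library's actual checks, exception types, and messages may differ.

```python
import numpy as np


def check_array_sketch(array, name, expected_dim=1, expected_dtype=None,
                       min_val=None, max_val=None):
    """Raise ValueError if the array violates the stated expectations."""
    if not isinstance(array, np.ndarray) or array.ndim != expected_dim:
        raise ValueError(f"{name} must be a {expected_dim}-dimensional ndarray")
    if expected_dtype is not None:
        # Accept either a single type or a tuple of types.
        dtypes = expected_dtype if isinstance(expected_dtype, tuple) else (expected_dtype,)
        if not any(np.issubdtype(array.dtype, dtype) for dtype in dtypes):
            raise ValueError(f"{name} has an unexpected dtype: {array.dtype}")
    if min_val is not None and array.min() < min_val:
        raise ValueError(f"all elements of {name} must be at least {min_val}")
    if max_val is not None and array.max() > max_val:
        raise ValueError(f"all elements of {name} must be at most {max_val}")


# A valid probability vector passes silently.
check_array_sketch(np.array([0.1, 0.9]), "pscore", expected_dtype=np.floating,
                   min_val=0.0, max_val=1.0)

# A negative entry is rejected.
rejected = False
try:
    check_array_sketch(np.array([-0.1, 0.9]), "pscore", min_val=0.0)
except ValueError:
    rejected = True
```

Passing `name` lets the error message point at the offending argument, which is the main reason a validator like this takes it.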

scope_rl.utils.check_logged_dataset(logged_dataset)

Check logged dataset keys.

Parameters:

logged_dataset (LoggedDataset) – Logged dataset.

scope_rl.utils.check_input_dict(input_dict)

Check input dict keys.

Parameters:

input_dict (OPEInputDict) – Input dictionary for OPE.

class scope_rl.utils.NewGymAPIWrapper(env)

This class converts old gym outputs (gym<0.26.0) to the new ones (gym>=0.26.0).

Methods

reset

step
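The conversion from the old to the new gym API can be sketched with a toy environment. In the old API, `reset()` returns only the observation and `step()` returns a 4-tuple; in the new API, `reset()` returns `(observation, info)` and `step()` returns a 5-tuple with separate `terminated` and `truncated` flags. Both classes below are hypothetical, and mapping `done` to `terminated` with `truncated` fixed to `False` is an assumption about one reasonable conversion.

```python
class OldStyleEnv:
    """Hypothetical toy environment following the old API (gym<0.26.0)."""

    def __init__(self):
        self.t = 0
        self.last_seed = None

    def seed(self, seed=None):
        self.last_seed = seed

    def reset(self):
        self.t = 0
        return 0.0  # observation only

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return float(self.t), 1.0, done, {}


class NewAPIWrapperSketch:
    """Expose an old-style environment through the new-style interface."""

    def __init__(self, env):
        self.env = env

    def reset(self, seed=None):
        if seed is not None:
            self.env.seed(seed)
        return self.env.reset(), {}  # (observation, info)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Assumption: `done` is treated as termination, never as truncation.
        return obs, reward, done, False, info


env = NewAPIWrapperSketch(OldStyleEnv())
obs, info = env.reset(seed=0)
step_out = env.step(0)
```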

class scope_rl.utils.OldGymAPIWrapper(env)

This class converts new gym outputs (gym>=0.26.0) to the old ones (gym<0.26.0).

Methods

reset

seed

step
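The reverse conversion can be sketched in the same way. The classes below are hypothetical; merging `terminated` and `truncated` into a single `done` flag and deferring the seed to the next `reset()` are assumptions about how such a wrapper might behave, not a description of the library's code.

```python
class NewStyleEnv:
    """Hypothetical toy environment following the new API (gym>=0.26.0)."""

    def __init__(self):
        self.t = 0
        self.last_seed = None

    def reset(self, seed=None):
        if seed is not None:
            self.last_seed = seed
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = False
        truncated = self.t >= 2  # e.g. a time limit
        return float(self.t), 1.0, terminated, truncated, {}


class OldAPIWrapperSketch:
    """Expose a new-style environment through the old-style interface."""

    def __init__(self, env):
        self.env = env
        self._seed = None

    def seed(self, seed=None):
        # The new API has no seed(); remember the seed for the next reset.
        self._seed = seed

    def reset(self):
        obs, _info = self.env.reset(seed=self._seed)
        self._seed = None
        return obs  # observation only

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Assumption: either flag ends the episode in the old API.
        return obs, reward, terminated or truncated, info


env = OldAPIWrapperSketch(NewStyleEnv())
env.seed(42)
obs = env.reset()
first = env.step(0)
second = env.step(0)
```

Collapsing the two flags loses the distinction between a true terminal state and a time-limit cutoff, which is one reason the new API separated them.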