scope_rl.ope.estimators_base

Abstract base classes for off-policy evaluation (OPE) estimators.

Classes

`BaseCumulativeDistributionOPEEstimator`	Base class for Cumulative Distribution OPE estimators.
`BaseMarginalOPEEstimator`	Base class for OPE estimators with marginal importance sampling.
`BaseOffPolicyEstimator`	Base class for (basic) OPE estimators.
`BaseStateActionMarginalOPEEstimator`	Base class for State-Action Marginal OPE estimators.
`BaseStateMarginalOPEEstimator`	Base class for State Marginal OPE estimators.

class scope_rl.ope.estimators_base.BaseOffPolicyEstimator

Base class for (basic) OPE estimators.

Imported as: scope_rl.ope.BaseOffPolicyEstimator

Note

This abstract base class also implements the following private methods.

abstract _estimate_trajectory_value:

Estimate the trajectory-wise expected reward.

_calc_behavior_policy_pscore_discrete:

Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.

_calc_behavior_policy_pscore_continuous:

Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.

_calc_evaluation_policy_pscore_discrete:

Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.

_calc_similarity_weight:

Calculate the similarity weight in the case of continuous action spaces.

property _estimate_confidence_interval:

Dictionary mapping the names of confidence interval (CI) methods to their functions.

key: [
    bootstrap,
    hoeffding,
    bernstein,
    ttest,
]

property _kernel_function:

Dictionary containing names and functions of kernels.

key: [
    gaussian,
    epanechnikov,
    triangular,
    cosine,
    uniform,
]
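The kernel names listed above match the standard smoothing kernels from nonparametric statistics. As a rough illustration, and not SCOPE-RL's own implementation, the sketch below writes out the textbook one-dimensional forms and uses them to weight a logged continuous action by its distance from the action the evaluation policy would take. The names `KERNELS`, `similarity_weight`, and `bandwidth` are assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)


def epanechnikov_kernel(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)


def triangular_kernel(u):
    return np.where(np.abs(u) <= 1, 1 - np.abs(u), 0.0)


def cosine_kernel(u):
    return np.where(np.abs(u) <= 1, (np.pi / 4) * np.cos(np.pi * u / 2), 0.0)


def uniform_kernel(u):
    return np.where(np.abs(u) <= 1, 0.5, 0.0)


# Hypothetical registry keyed by the kernel names listed in the note.
KERNELS = {
    "gaussian": gaussian_kernel,
    "epanechnikov": epanechnikov_kernel,
    "triangular": triangular_kernel,
    "cosine": cosine_kernel,
    "uniform": uniform_kernel,
}


def similarity_weight(logged_action, evaluation_action, kernel="gaussian", bandwidth=1.0):
    """Kernel density K(u) / h with u = (logged - evaluation) / h (assumed form)."""
    u = (np.asarray(logged_action, dtype=float)
         - np.asarray(evaluation_action, dtype=float)) / bandwidth
    return KERNELS[kernel](u) / bandwidth
```

A smaller `bandwidth` concentrates the weight on logged actions very close to the evaluation policy's action, which lowers bias but raises variance.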

Methods

`estimate_interval`()	Estimate the confidence interval of the policy value.
`estimate_policy_value`()	Estimate the policy value of the evaluation policy.

abstract estimate_policy_value()

Estimate the policy value of the evaluation policy.
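To make the abstract interface concrete, here is a minimal sketch of how a subclass might pair `_estimate_trajectory_value` with `estimate_policy_value`. It uses a hypothetical stand-in base class (`ToyBaseEstimator`) rather than the real `BaseOffPolicyEstimator`, because the actual argument lists are not given in this page. The argument names and the trajectory-wise importance sampling rule are assumptions chosen for illustration.

```python
from abc import ABC, abstractmethod

import numpy as np


class ToyBaseEstimator(ABC):
    """Hypothetical stand-in for an OPE base class (not scope_rl's class)."""

    @abstractmethod
    def _estimate_trajectory_value(self, **kwargs):
        """Return one value estimate per trajectory."""

    @abstractmethod
    def estimate_policy_value(self, **kwargs):
        """Return a scalar estimate of the evaluation policy's value."""


class ToyTrajectoryWiseIS(ToyBaseEstimator):
    """Trajectory-wise importance sampling on (n_trajectories, n_steps) arrays."""

    def _estimate_trajectory_value(
        self, reward, behavior_pscore, evaluation_pscore, gamma=1.0
    ):
        reward = np.asarray(reward, dtype=float)
        ratio = np.asarray(evaluation_pscore, dtype=float) / np.asarray(
            behavior_pscore, dtype=float
        )
        # One weight per trajectory: product of step-wise probability ratios.
        weight = ratio.prod(axis=1)
        discount = gamma ** np.arange(reward.shape[1])
        return weight * (reward * discount).sum(axis=1)

    def estimate_policy_value(self, **kwargs):
        return float(self._estimate_trajectory_value(**kwargs).mean())
```

The policy value is simply the average of the trajectory-wise estimates, which is why a base class can leave only the trajectory-level computation to its subclasses.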

abstract estimate_interval()

Estimate the confidence interval of the policy value.
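Two of the CI method names listed for this class, bootstrap and Hoeffding, can be sketched from their standard definitions, applied to an array of trajectory-wise value estimates. This is an illustration under assumptions, not the library's code: the function names, the returned dictionary keys, and the requirement to pass `value_range` explicitly are all choices made for this example.

```python
import numpy as np


def bootstrap_interval(values, alpha=0.05, n_bootstrap=1000, random_state=0):
    """Percentile bootstrap interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(random_state)
    # Resample trajectories with replacement and record each resample's mean.
    idx = rng.integers(0, len(values), size=(n_bootstrap, len(values)))
    boot_means = values[idx].mean(axis=1)
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return {"mean": float(values.mean()), "lower": float(lower), "upper": float(upper)}


def hoeffding_interval(values, value_range, alpha=0.05):
    """Two-sided Hoeffding interval; `value_range` is the known width of the support."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    half_width = value_range * np.sqrt(np.log(2 / alpha) / (2 * n))
    mean = float(values.mean())
    return {"mean": mean, "lower": mean - half_width, "upper": mean + half_width}
```

The Hoeffding bound holds for any bounded distribution but is usually wider than the bootstrap interval. With importance-weighted estimates, the range can be very large, which widens it further.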

class scope_rl.ope.estimators_base.BaseMarginalOPEEstimator

Base class for OPE estimators with marginal importance sampling.

Bases: scope_rl.ope.BaseOffPolicyEstimator

Imported as: scope_rl.ope.estimators_base.BaseMarginalOPEEstimator

Note

This abstract base class also implements the following private methods.

abstract _estimate_trajectory_value:

Estimate the trajectory-wise expected reward.

_calc_behavior_policy_pscore_discrete:

Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.

_calc_behavior_policy_pscore_continuous:

Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.

_calc_evaluation_policy_pscore_discrete:

Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.

_calc_similarity_weight:

Calculate the similarity weight in the case of continuous action spaces.

_calc_marginal_importance_weight:

Calculate the marginal importance weight. (Specified in either BaseStateMarginalOPEEstimator or BaseStateActionMarginalOPEEstimator.)

property _estimate_confidence_interval:

Dictionary mapping the names of confidence interval (CI) methods to their functions.

key: [
    bootstrap,
    hoeffding,
    bernstein,
    ttest,
]
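For intuition about marginal importance weighting, the sketch below shows how marginal weights could enter a policy value estimate, assuming the weights have already been estimated elsewhere. It is a simplified illustration, not the library's estimator. The function names and argument names are hypothetical, and all inputs are assumed to be arrays of shape (n_trajectories, n_steps).

```python
import numpy as np


def state_marginal_is(
    reward, state_marginal_weight, behavior_pscore, evaluation_pscore, gamma=1.0
):
    """State-marginal weight times the immediate action probability ratio."""
    reward = np.asarray(reward, dtype=float)
    step_weight = (
        np.asarray(state_marginal_weight, dtype=float)
        * np.asarray(evaluation_pscore, dtype=float)
        / np.asarray(behavior_pscore, dtype=float)
    )
    discount = gamma ** np.arange(reward.shape[1])
    return float((discount * step_weight * reward).sum(axis=1).mean())


def state_action_marginal_is(reward, state_action_marginal_weight, gamma=1.0):
    """State-action marginal weight applied directly to each step's reward."""
    reward = np.asarray(reward, dtype=float)
    step_weight = np.asarray(state_action_marginal_weight, dtype=float)
    discount = gamma ** np.arange(reward.shape[1])
    return float((discount * step_weight * reward).sum(axis=1).mean())
```

Because each step uses a single marginal weight instead of a product of ratios over all earlier steps, the variance does not grow multiplicatively with the trajectory length. The trade-off is that the estimate depends on how accurately the marginal weights were learned.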

Methods

`estimate_interval`()	Estimate the confidence interval of the policy value.
`estimate_policy_value`()	Estimate the policy value of the evaluation policy.

class scope_rl.ope.estimators_base.BaseStateMarginalOPEEstimator

Base class for State Marginal OPE estimators.

Bases: scope_rl.ope.BaseMarginalOPEEstimator -> scope_rl.ope.BaseOffPolicyEstimator

Imported as: scope_rl.ope.BaseStateMarginalOPEEstimator

Note

This abstract base class also implements the following private methods.

abstract _estimate_trajectory_value:

Estimate the trajectory-wise expected reward.

_calc_behavior_policy_pscore_discrete:

Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.

_calc_behavior_policy_pscore_continuous:

Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.

_calc_evaluation_policy_pscore_discrete:

Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.

_calc_similarity_weight:

Calculate the similarity weight in the case of continuous action spaces.

_calc_marginal_importance_weight:

Calculate the marginal importance weight.

property _estimate_confidence_interval:

Dictionary mapping the names of confidence interval (CI) methods to their functions.

key: [
    bootstrap,
    hoeffding,
    bernstein,
    ttest,
]

Methods

`estimate_interval`()	Estimate the confidence interval of the policy value.
`estimate_policy_value`()	Estimate the policy value of the evaluation policy.

class scope_rl.ope.estimators_base.BaseStateActionMarginalOPEEstimator

Base class for State-Action Marginal OPE estimators.

Bases: scope_rl.ope.BaseMarginalOPEEstimator -> scope_rl.ope.BaseOffPolicyEstimator

Imported as: scope_rl.ope.BaseStateActionMarginalOPEEstimator

Note

This abstract base class also implements the following private methods.

abstract _estimate_trajectory_value:

Estimate the trajectory-wise expected reward.

_calc_behavior_policy_pscore_discrete:

Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.

_calc_behavior_policy_pscore_continuous:

Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.

_calc_evaluation_policy_pscore_discrete:

Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.

_calc_similarity_weight:

Calculate the similarity weight in the case of continuous action spaces.

_calc_marginal_importance_weight:

Calculate the marginal importance weight.

property _estimate_confidence_interval:

Dictionary mapping the names of confidence interval (CI) methods to their functions.

key: [
    bootstrap,
    hoeffding,
    bernstein,
    ttest,
]

Methods

`estimate_interval`()	Estimate the confidence interval of the policy value.
`estimate_policy_value`()	Estimate the policy value of the evaluation policy.

class scope_rl.ope.estimators_base.BaseCumulativeDistributionOPEEstimator

Base class for Cumulative Distribution OPE estimators.

Imported as: scope_rl.ope.BaseCumulativeDistributionOPEEstimator

Note

This abstract base class also implements the following private methods.

_aggregate_trajectory_wise_statistics_discrete:

Calculate trajectory-wise summary statistics based on step-wise observations in the case of discrete action spaces.

_aggregate_trajectory_wise_statistics_continuous:

Calculate trajectory-wise summary statistics based on step-wise observations in the case of continuous action spaces.

_target_value_given_idx:

Obtain the reward value corresponding to the given idx when estimating the CDF.

property _kernel_function:

Dictionary containing names and functions of kernels.

key: [
    gaussian,
    epanechnikov,
    triangular,
    cosine,
    uniform,
]

Methods

`estimate_conditional_value_at_risk`()	Estimate the conditional value at risk (CVaR) of the reward under the evaluation policy.
`estimate_cumulative_distribution_function`()	Estimate the cumulative distribution function (CDF) of the policy value.
`estimate_interquartile_range`()	Estimate the interquartile range of the reward under the evaluation policy.
`estimate_mean`()	Estimate the mean of the reward under the evaluation policy.
`estimate_variance`()	Estimate the variance of the reward under the evaluation policy.

abstract estimate_cumulative_distribution_function()

Estimate the cumulative distribution function (CDF) of the policy value.
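One common way to estimate such a CDF from logged data is to reweight an indicator of the trajectory-wise return by an importance weight. The sketch below illustrates that idea under assumptions. It is not the library's implementation: `estimate_cdf_is`, `trajectory_return`, `trajectory_weight`, and `reward_scale` (the grid of reward values at which the CDF is evaluated) are names chosen for this example.

```python
import numpy as np


def estimate_cdf_is(trajectory_return, trajectory_weight, reward_scale):
    """Importance-weighted empirical CDF evaluated on the grid `reward_scale`."""
    trajectory_return = np.asarray(trajectory_return, dtype=float)
    trajectory_weight = np.asarray(trajectory_weight, dtype=float)
    reward_scale = np.asarray(reward_scale, dtype=float)
    # indicator[i, j] is True when trajectory i's return is <= grid value j.
    indicator = trajectory_return[:, None] <= reward_scale[None, :]
    cdf = (trajectory_weight[:, None] * indicator).mean(axis=0)
    # Importance weights can push the raw estimate above 1, so clip it.
    return np.clip(np.maximum.accumulate(cdf), 0.0, 1.0)
```

With all weights equal to 1, this reduces to the ordinary empirical CDF of the logged returns.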

abstract estimate_mean()

Estimate the mean of the reward under the evaluation policy.

abstract estimate_variance()

Estimate the variance of the reward under the evaluation policy.

abstract estimate_conditional_value_at_risk()

Estimate the conditional value at risk (CVaR) of the reward under the evaluation policy.

abstract estimate_interquartile_range()

Estimate the interquartile range of the reward under the evaluation policy.
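Each of the statistics listed for this class can be read off a CDF that has been evaluated on a grid of reward values. The sketch below shows one way to do so, assuming the CDF is already available as an array. The helper names, the `reward_scale` grid, and the exact CVaR and quantile conventions are assumptions for illustration and may differ from the library's definitions.

```python
import numpy as np


def _pmf(cdf):
    # Probability mass placed on each grid point.
    return np.diff(np.asarray(cdf, dtype=float), prepend=0.0)


def mean_from_cdf(reward_scale, cdf):
    return float((_pmf(cdf) * np.asarray(reward_scale, dtype=float)).sum())


def variance_from_cdf(reward_scale, cdf):
    reward_scale = np.asarray(reward_scale, dtype=float)
    pmf = _pmf(cdf)
    mean = (pmf * reward_scale).sum()
    return float((pmf * (reward_scale - mean) ** 2).sum())


def cvar_from_cdf(reward_scale, cdf, alpha=0.05):
    """Average reward over the worst `alpha` fraction of outcomes."""
    reward_scale = np.asarray(reward_scale, dtype=float)
    cdf = np.asarray(cdf, dtype=float)
    pmf = _pmf(cdf)
    # Portion of each grid point's mass that lies inside the lower alpha tail.
    tail_mass = np.clip(alpha - (cdf - pmf), 0.0, pmf)
    return float((tail_mass * reward_scale).sum() / alpha)


def quantile_from_cdf(reward_scale, cdf, q):
    # First grid value whose CDF reaches q.
    idx = min(int(np.searchsorted(cdf, q, side="left")), len(reward_scale) - 1)
    return float(np.asarray(reward_scale, dtype=float)[idx])


def iqr_from_cdf(reward_scale, cdf):
    return quantile_from_cdf(reward_scale, cdf, 0.75) - quantile_from_cdf(
        reward_scale, cdf, 0.25
    )
```

Working from a single estimated CDF keeps the mean, variance, CVaR, and interquartile range mutually consistent, since they all describe the same estimated distribution.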