scope_rl.ope.estimators_base#

Abstract base classes for off-policy estimators.

Classes

BaseCumulativeDistributionOPEEstimator

Base class for Cumulative Distribution OPE estimators.

BaseMarginalOPEEstimator

Base class for OPE estimators with marginal importance sampling.

BaseOffPolicyEstimator

Base class for (basic) OPE estimators.

BaseStateActionMarginalOPEEstimator

Base class for State-Action Marginal OPE estimators.

BaseStateMarginalOPEEstimator

Base class for State Marginal OPE estimators.

class scope_rl.ope.estimators_base.BaseOffPolicyEstimator[source]#

Base class for (basic) OPE estimators.

Imported as: scope_rl.ope.BaseOffPolicyEstimator

Note

This abstract base class also defines the following private methods and properties.

abstract _estimate_trajectory_value:

Estimate the trajectory-wise expected reward.

_calc_behavior_policy_pscore_discrete:

Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.

_calc_behavior_policy_pscore_continuous:

Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.

_calc_evaluation_policy_pscore_discrete:

Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.

_calc_similarity_weight:

Calculate the similarity weight in the case of continuous action spaces.

property _estimate_confidence_interval:

Dictionary containing the names and functions of the confidence interval (CI) methods.

key: [
    bootstrap,
    hoeffding,
    bernstein,
    ttest,
]

property _kernel_function:

Dictionary containing names and functions of kernels.

key: [
    gaussian,
    epanechnikov,
    triangular,
    cosine,
    uniform,
]
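
The kernel names listed above correspond to standard smoothing kernels. The following is a minimal sketch of how such a name-to-function dictionary and a kernel-based similarity weight could look for a scalar continuous action. It is an illustration under stated assumptions, not the code in scope_rl: the names KERNELS and similarity_weight, the bandwidth parameter, and the scalar-action restriction are all hypothetical.

```python
import math

# Hypothetical stand-ins for the kernels named above. Each maps a scaled
# distance u to a non-negative weight and integrates to one over the real line.
def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def triangular(u):
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0

def cosine(u):
    return (math.pi / 4.0) * math.cos(math.pi * u / 2.0) if abs(u) <= 1.0 else 0.0

def uniform(u):
    return 0.5 if abs(u) <= 1.0 else 0.0

# Name-to-function lookup, mirroring the keys listed in the note above.
KERNELS = {
    "gaussian": gaussian,
    "epanechnikov": epanechnikov,
    "triangular": triangular,
    "cosine": cosine,
    "uniform": uniform,
}

def similarity_weight(logged_action, evaluation_action, kernel="gaussian", bandwidth=1.0):
    """Kernel weight K((a_eval - a_logged) / h) / h for a scalar action (assumed form)."""
    u = (evaluation_action - logged_action) / bandwidth
    return KERNELS[kernel](u) / bandwidth
```

In this sketch, a smaller bandwidth concentrates the weight on logged actions that are close to the action chosen by the evaluation policy.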

Methods

estimate_interval()

Estimate the confidence interval of the policy value.

estimate_policy_value()

Estimate the policy value of the evaluation policy.

abstract estimate_policy_value()[source]#

Estimate the policy value of the evaluation policy.
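
To illustrate how an abstract trajectory-value hook and a public policy-value method can fit together, here is a minimal sketch of a trajectory-wise importance sampling estimator. The class names, the method signatures, and the (reward, behavior pscore, evaluation pscore) tuple layout are assumptions made for this example and do not reproduce the scope_rl signatures.

```python
from abc import ABC, abstractmethod
from statistics import fmean

class SketchEstimator(ABC):
    """Hypothetical base class mirroring the abstract-method pattern described above."""

    @abstractmethod
    def _estimate_trajectory_value(self, trajectory, gamma):
        """Return the estimated value of one trajectory."""

    @abstractmethod
    def estimate_policy_value(self, trajectories, gamma=1.0):
        """Return the estimated policy value of the evaluation policy."""

class SketchTrajectoryWiseIS(SketchEstimator):
    """Trajectory-wise importance sampling, written for illustration only."""

    def _estimate_trajectory_value(self, trajectory, gamma):
        weight = 1.0
        discounted_return = 0.0
        # Each step is an assumed tuple: (reward, behavior pscore, evaluation pscore).
        for t, (reward, behavior_pscore, evaluation_pscore) in enumerate(trajectory):
            weight *= evaluation_pscore / behavior_pscore  # cumulative importance weight
            discounted_return += (gamma ** t) * reward
        return weight * discounted_return

    def estimate_policy_value(self, trajectories, gamma=1.0):
        # Average the weighted returns over all logged trajectories.
        return fmean(self._estimate_trajectory_value(tr, gamma) for tr in trajectories)
```

Because both methods are abstract in the sketched base class, instantiating SketchEstimator directly raises a TypeError; only a subclass that overrides both can be created.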

abstract estimate_interval()[source]#

Estimate the confidence interval of the policy value.
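
As a rough illustration of two of the CI methods named in the note above (bootstrap and hoeffding), the sketch below computes an interval from a list of trajectory-wise value estimates. The function names, parameters, and returned dictionary keys are assumptions for this example. The bootstrap variant shown is the percentile bootstrap, and the Hoeffding variant assumes the values lie in an interval of known width value_range.

```python
import math
import random
from statistics import fmean

def hoeffding_interval(values, alpha=0.05, value_range=1.0):
    """Two-sided Hoeffding interval; assumes values lie in a range of width value_range."""
    n = len(values)
    mean = fmean(values)
    half_width = value_range * math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    return {"mean": mean, "lower": mean - half_width, "upper": mean + half_width}

def bootstrap_interval(values, alpha=0.05, n_bootstrap_samples=1000, random_state=0):
    """Percentile bootstrap interval for the mean of values."""
    rng = random.Random(random_state)  # local generator keeps the result reproducible
    n = len(values)
    boot_means = sorted(
        fmean(rng.choices(values, k=n)) for _ in range(n_bootstrap_samples)
    )
    lower_idx = int(n_bootstrap_samples * alpha / 2)
    upper_idx = max(lower_idx, int(n_bootstrap_samples * (1 - alpha / 2)) - 1)
    return {
        "mean": fmean(values),
        "lower": boot_means[lower_idx],
        "upper": boot_means[upper_idx],
    }
```

The Hoeffding interval is distribution-free but usually wide, while the bootstrap interval adapts to the observed spread at the cost of repeated resampling.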

class scope_rl.ope.estimators_base.BaseMarginalOPEEstimator[source]#

Base class for OPE estimators with marginal importance sampling.

Bases: scope_rl.ope.BaseOffPolicyEstimator

Imported as: scope_rl.ope.estimators_base.BaseMarginalOPEEstimator

Note

This abstract base class also defines the following private methods and properties.

abstract _estimate_trajectory_value:

Estimate the trajectory-wise expected reward.

_calc_behavior_policy_pscore_discrete:

Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.

_calc_behavior_policy_pscore_continuous:

Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.

_calc_evaluation_policy_pscore_discrete:

Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.

_calc_similarity_weight:

Calculate the similarity weight in the case of continuous action spaces.

_calc_marginal_importance_weight:

Calculate the marginal importance weight. (Specified in either BaseStateMarginalOPEEstimator or BaseStateActionMarginalOPEEstimator.)

property _estimate_confidence_interval:

Dictionary containing the names and functions of the confidence interval (CI) methods.

key: [
    bootstrap,
    hoeffding,
    bernstein,
    ttest,
]

Methods

estimate_interval()

Estimate the confidence interval of the policy value.

estimate_policy_value()

Estimate the policy value of the evaluation policy.
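
The two marginal variants differ in which density ratio is used as the importance weight. The sketch below shows one common formulation, assuming the density ratios have already been estimated elsewhere. The function names and the (reward, weight) tuple layout are hypothetical, and the code is not taken from scope_rl.

```python
def state_marginal_weight(state_density_ratio, evaluation_pscore, behavior_pscore):
    """State marginal weight: d_pi(s) / d_b(s) times the immediate action ratio."""
    return state_density_ratio * evaluation_pscore / behavior_pscore

def state_action_marginal_weight(state_action_density_ratio):
    """State-action marginal weight: d_pi(s, a) / d_b(s, a), used as given."""
    return state_action_density_ratio

def marginal_is_policy_value(trajectories, gamma=1.0):
    """Average over trajectories of sum_t gamma^t * w_t * r_t.

    Each step is an assumed tuple (reward, marginal importance weight).
    """
    total = 0.0
    for trajectory in trajectories:
        for t, (reward, weight) in enumerate(trajectory):
            total += (gamma ** t) * weight * reward
    return total / len(trajectories)
```

Unlike a cumulative product of per-step ratios, each step here is reweighted by a single marginal ratio, which is the usual motivation for marginal importance sampling on long trajectories.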

class scope_rl.ope.estimators_base.BaseStateMarginalOPEEstimator[source]#

Base class for State Marginal OPE estimators.

Bases: scope_rl.ope.BaseMarginalOPEEstimator -> scope_rl.ope.BaseOffPolicyEstimator

Imported as: scope_rl.ope.BaseStateMarginalOPEEstimator

Note

This abstract base class also defines the following private methods and properties.

abstract _estimate_trajectory_value:

Estimate the trajectory-wise expected reward.

_calc_behavior_policy_pscore_discrete:

Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.

_calc_behavior_policy_pscore_continuous:

Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.

_calc_evaluation_policy_pscore_discrete:

Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.

_calc_similarity_weight:

Calculate the similarity weight in the case of continuous action spaces.

_calc_marginal_importance_weight:

Calculate the marginal importance weight.

property _estimate_confidence_interval:

Dictionary containing the names and functions of the confidence interval (CI) methods.

key: [
    bootstrap,
    hoeffding,
    bernstein,
    ttest,
]

Methods

estimate_interval()

Estimate the confidence interval of the policy value.

estimate_policy_value()

Estimate the policy value of the evaluation policy.

class scope_rl.ope.estimators_base.BaseStateActionMarginalOPEEstimator[source]#

Base class for State-Action Marginal OPE estimators.

Bases: scope_rl.ope.BaseMarginalOPEEstimator -> scope_rl.ope.BaseOffPolicyEstimator

Imported as: scope_rl.ope.BaseStateActionMarginalOPEEstimator

Note

This abstract base class also defines the following private methods and properties.

abstract _estimate_trajectory_value:

Estimate the trajectory-wise expected reward.

_calc_behavior_policy_pscore_discrete:

Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.

_calc_behavior_policy_pscore_continuous:

Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.

_calc_evaluation_policy_pscore_discrete:

Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.

_calc_similarity_weight:

Calculate the similarity weight in the case of continuous action spaces.

_calc_marginal_importance_weight:

Calculate the marginal importance weight.

property _estimate_confidence_interval:

Dictionary containing the names and functions of the confidence interval (CI) methods.

key: [
    bootstrap,
    hoeffding,
    bernstein,
    ttest,
]

Methods

estimate_interval()

Estimate the confidence interval of the policy value.

estimate_policy_value()

Estimate the policy value of the evaluation policy.

class scope_rl.ope.estimators_base.BaseCumulativeDistributionOPEEstimator[source]#

Base class for Cumulative Distribution OPE estimators.

Imported as: scope_rl.ope.BaseCumulativeDistributionOPEEstimator

Note

This abstract base class also defines the following private methods and properties.

_aggregate_trajectory_wise_statistics_discrete:

Calculate trajectory-wise summary statistics based on step-wise observations in the case of discrete action spaces.

_aggregate_trajectory_wise_statistics_continuous:

Calculate trajectory-wise summary statistics based on step-wise observations in the case of continuous action spaces.

_target_value_given_idx:

Obtain the reward value corresponding to the given idx when estimating the CDF.

property _kernel_function:

Dictionary containing names and functions of kernels.

key: [
    gaussian,
    epanechnikov,
    triangular,
    cosine,
    uniform,
]

Methods

estimate_conditional_value_at_risk()

Estimate the conditional value at risk (CVaR) of the reward under the evaluation policy.

estimate_cumulative_distribution_function()

Estimate the cumulative distribution function (CDF) of the policy value.

estimate_interquartile_range()

Estimate the interquartile range of the reward under the evaluation policy.

estimate_mean()

Estimate the mean of the reward under the evaluation policy.

estimate_variance()

Estimate the variance of the reward under the evaluation policy.

abstract estimate_cumulative_distribution_function()[source]#

Estimate the cumulative distribution function (CDF) of the policy value.
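
One way a concrete subclass might estimate the CDF is with a self-normalized, importance-weighted empirical distribution evaluated on a grid of reward values. The sketch below assumes trajectory-wise returns and importance weights are already available. The function name and arguments are hypothetical, and self-normalization is a choice made here for illustration.

```python
def estimate_cdf(trajectory_returns, importance_weights, reward_scale):
    """Self-normalized weighted empirical CDF evaluated at each value in reward_scale."""
    total_weight = sum(importance_weights)
    cdf = []
    for threshold in reward_scale:
        # Weighted mass of trajectories whose return does not exceed the threshold.
        mass = sum(
            w
            for g, w in zip(trajectory_returns, importance_weights)
            if g <= threshold
        )
        cdf.append(mass / total_weight)
    return cdf
```

Dividing by the total weight keeps the estimate within [0, 1] and makes it reach one at the largest grid point that covers all observed returns.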

abstract estimate_mean()[source]#

Estimate the mean of the reward under the evaluation policy.

abstract estimate_variance()[source]#

Estimate the variance of the reward under the evaluation policy.
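
Given a CDF estimated on a sorted grid of reward values, the mean and variance can be recovered from the implied probability masses. The following sketch assumes the first grid point is at or below the smallest observed return, so that the first CDF value is the mass at that point. All names are hypothetical.

```python
def cdf_to_pmf(cdf):
    """Turn cumulative probabilities on a grid into per-point probability masses."""
    return [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]

def mean_from_cdf(reward_scale, cdf):
    """Mean of the discrete distribution implied by the CDF on reward_scale."""
    return sum(x * p for x, p in zip(reward_scale, cdf_to_pmf(cdf)))

def variance_from_cdf(reward_scale, cdf):
    """Variance of the discrete distribution implied by the CDF on reward_scale."""
    mean = mean_from_cdf(reward_scale, cdf)
    return sum(((x - mean) ** 2) * p for x, p in zip(reward_scale, cdf_to_pmf(cdf)))
```

For example, a CDF of [0.25, 0.75, 1.0] on the grid [0.0, 1.0, 2.0] implies masses of 0.25, 0.5, and 0.25, giving a mean of 1.0 and a variance of 0.5.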

abstract estimate_conditional_value_at_risk()[source]#

Estimate the conditional value at risk (CVaR) of the reward under the evaluation policy.
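
CVaR at level alpha is the expected reward over the worst alpha fraction of outcomes. A minimal sketch of computing it from a CDF on an ascending grid follows. The function name, the alpha default, and the grid-based discretization are assumptions for this example.

```python
def cvar_from_cdf(reward_scale, cdf, alpha=0.05):
    """Lower-tail CVaR of the discrete distribution implied by the CDF.

    Assumes reward_scale is sorted in ascending order and 0 < alpha <= 1.
    """
    pmf = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]
    remaining = alpha  # probability mass still to be collected from the lower tail
    tail_sum = 0.0
    for x, p in zip(reward_scale, pmf):
        take = min(p, remaining)
        tail_sum += take * x
        remaining -= take
        if remaining <= 0.0:
            break
    return tail_sum / alpha
```

With alpha equal to one, the tail covers the whole distribution and the result coincides with the mean.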

abstract estimate_interquartile_range()[source]#

Estimate the interquartile range of the reward under the evaluation policy.
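
The interquartile range can be read off the CDF by inverting it at the lower and upper quantile levels. The sketch below takes the quantile to be the smallest grid value whose CDF reaches the requested level. The function names, the alpha parameter, and the returned dictionary keys are hypothetical.

```python
def quantile_from_cdf(reward_scale, cdf, q):
    """Smallest grid value whose cumulative probability is at least q."""
    for x, c in zip(reward_scale, cdf):
        if c >= q:
            return x
    return reward_scale[-1]  # fall back to the largest grid value

def interquartile_range(reward_scale, cdf, alpha=0.25):
    """Median plus the quantiles at levels alpha and 1 - alpha."""
    return {
        "median": quantile_from_cdf(reward_scale, cdf, 0.5),
        "lower_quartile": quantile_from_cdf(reward_scale, cdf, alpha),
        "upper_quartile": quantile_from_cdf(reward_scale, cdf, 1 - alpha),
    }
```

The width of the range is then the difference between the upper and lower quartile entries of the returned dictionary.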