scope_rl.ope.estimators_base
Abstract base classes for off-policy estimators.
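Classes
BaseCumulativeDistributionOPEEstimator | Base class for Cumulative Distribution OPE estimators.
BaseMarginalOPEEstimator | Base class for OPE estimators with marginal importance sampling.
BaseOffPolicyEstimator | Base class for (basic) OPE estimators.
BaseStateActionMarginalOPEEstimator | Base class for State-Action Marginal OPE estimators.
BaseStateMarginalOPEEstimator | Base class for State Marginal OPE estimators.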
- class scope_rl.ope.estimators_base.BaseOffPolicyEstimator
Base class for (basic) OPE estimators.
Imported as:
scope_rl.ope.BaseOffPolicyEstimator
Note
This abstract base class also implements the following private methods.
- abstract _estimate_trajectory_value:
Estimate the trajectory-wise expected reward.
- _calc_behavior_policy_pscore_discrete:
Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.
- _calc_behavior_policy_pscore_continuous:
Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.
- _calc_evaluation_policy_pscore_discrete:
Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.
- _calc_similarity_weight:
Calculate the similarity weight in the case of continuous action spaces.
- property _estimate_confidence_interval:
Dictionary containing names and functions of confidence interval methods.
keys: [bootstrap, hoeffding, bernstein, ttest]
- property _kernel_function:
Dictionary containing names and functions of kernels.
keys: [gaussian, epanechnikov, triangular, cosine, uniform]
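The kernel names listed above correspond to standard smoothing kernels. The following is a minimal sketch of such a name-to-function dictionary and of a kernel-based similarity weight for continuous actions. It uses the textbook kernel definitions and only the Python standard library; the names `KERNELS` and `similarity_weight`, the `bandwidth` parameter, and the scalar-action assumption are illustrative and are not taken from the library.

```python
import math


def gaussian(u):
    # Standard normal density.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def triangular(u):
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0


def cosine(u):
    return (math.pi / 4.0) * math.cos(math.pi * u / 2.0) if abs(u) <= 1.0 else 0.0


def uniform(u):
    return 0.5 if abs(u) <= 1.0 else 0.0


# Hypothetical lookup table keyed by kernel name (an assumption, not the library's object).
KERNELS = {
    "gaussian": gaussian,
    "epanechnikov": epanechnikov,
    "triangular": triangular,
    "cosine": cosine,
    "uniform": uniform,
}


def similarity_weight(logged_action, evaluation_action, bandwidth=1.0, kernel="gaussian"):
    """Kernel weight measuring how close a logged scalar action is to the
    action the evaluation policy would have taken (illustrative only)."""
    u = (logged_action - evaluation_action) / bandwidth
    return KERNELS[kernel](u) / bandwidth
```

A smaller bandwidth concentrates the weight on logged actions that nearly match the evaluation policy's action, which typically lowers bias and raises variance.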
Methods
Estimate the confidence interval of the policy value.
Estimate the policy value of the evaluation policy.
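To illustrate what a confidence interval over trajectory-wise value estimates can look like, here is a minimal sketch of two of the method families named above: a Hoeffding bound and a percentile bootstrap. The function names, parameters, and the requirement that the caller supplies the value range are assumptions made for this example; the library's own implementations may differ.

```python
import math
import random


def hoeffding_interval(values, value_range, alpha=0.05):
    """Two-sided Hoeffding interval for the mean of values bounded in a
    range of width `value_range` (supplied by the caller in this sketch)."""
    n = len(values)
    mean = sum(values) / n
    half_width = value_range * math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    return mean - half_width, mean + half_width


def bootstrap_interval(values, alpha=0.05, n_bootstrap=1000, seed=0):
    """Percentile bootstrap interval for the mean."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_bootstrap))
    lower_idx = int((alpha / 2.0) * n_bootstrap)
    upper_idx = min(n_bootstrap - 1, int((1.0 - alpha / 2.0) * n_bootstrap))
    return means[lower_idx], means[upper_idx]
```

The Hoeffding interval is distribution-free but usually wide, while the bootstrap interval adapts to the observed spread of the trajectory-wise estimates.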
- class scope_rl.ope.estimators_base.BaseMarginalOPEEstimator
Base class for OPE estimators with marginal importance sampling.
Bases:
scope_rl.ope.BaseOffPolicyEstimator
Imported as:
scope_rl.ope.estimators_base.BaseMarginalOPEEstimator
Note
This abstract base class also implements the following private methods.
- abstract _estimate_trajectory_value:
Estimate the trajectory-wise expected reward.
- _calc_behavior_policy_pscore_discrete:
Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.
- _calc_behavior_policy_pscore_continuous:
Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.
- _calc_evaluation_policy_pscore_discrete:
Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.
- _calc_similarity_weight:
Calculate the similarity weight in the case of continuous action spaces.
- _calc_marginal_importance_weight(self):
Calculate the marginal importance weight. (Specified in either BaseStateMarginalOPEEstimator or BaseStateActionMarginalOPEEstimator.)
- property _estimate_confidence_interval:
Dictionary containing names and functions of confidence interval methods.
keys: [bootstrap, hoeffding, bernstein, ttest]
Methods
Estimate the confidence interval of the policy value.
Estimate the policy value of the evaluation policy.
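The note above says that the marginal importance weight is left to the state marginal and state-action marginal subclasses. The following toy sketch shows that pattern with Python's `abc` module. Every class name, the `step` dictionary layout, the hook's extra argument, and the precomputed weight tables are hypothetical, and the weighting formulas follow the general off-policy evaluation literature rather than the library's source.

```python
from abc import ABC, abstractmethod


class ToyMarginalEstimator(ABC):
    """Hypothetical base class: subclasses decide how each step is weighted."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma

    @abstractmethod
    def _calc_marginal_importance_weight(self, step):
        """Return the importance weight applied to one logged step."""

    def estimate_policy_value(self, trajectories):
        # Average over trajectories of the discounted, weighted rewards.
        total = 0.0
        for trajectory in trajectories:
            for t, step in enumerate(trajectory):
                weight = self._calc_marginal_importance_weight(step)
                total += (self.gamma ** t) * weight * step["reward"]
        return total / len(trajectories)


class ToyStateMarginalEstimator(ToyMarginalEstimator):
    """State density ratio w(s) times the immediate policy ratio."""

    def __init__(self, state_weight, gamma=1.0):
        super().__init__(gamma)
        self.state_weight = state_weight  # assumed precomputed: state -> w(s)

    def _calc_marginal_importance_weight(self, step):
        policy_ratio = step["evaluation_pscore"] / step["behavior_pscore"]
        return self.state_weight[step["state"]] * policy_ratio


class ToyStateActionMarginalEstimator(ToyMarginalEstimator):
    """State-action density ratio w(s, a) applied directly."""

    def __init__(self, state_action_weight, gamma=1.0):
        super().__init__(gamma)
        self.state_action_weight = state_action_weight  # (state, action) -> w(s, a)

    def _calc_marginal_importance_weight(self, step):
        return self.state_action_weight[(step["state"], step["action"])]
```

Keeping the weighting rule in a single overridable hook lets the shared estimation and confidence-interval logic stay in the base class.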
- class scope_rl.ope.estimators_base.BaseStateMarginalOPEEstimator
Base class for State Marginal OPE estimators.
Bases:
scope_rl.ope.BaseMarginalOPEEstimator -> scope_rl.ope.BaseOffPolicyEstimator
Imported as:
scope_rl.ope.BaseStateMarginalOPEEstimator
Note
This abstract base class also implements the following private methods.
- abstract _estimate_trajectory_value:
Estimate the trajectory-wise expected reward.
- _calc_behavior_policy_pscore_discrete:
Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.
- _calc_behavior_policy_pscore_continuous:
Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.
- _calc_evaluation_policy_pscore_discrete:
Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.
- _calc_similarity_weight:
Calculate the similarity weight in the case of continuous action spaces.
- _calc_marginal_importance_weight(self):
Calculate the marginal importance weight.
- property _estimate_confidence_interval:
Dictionary containing names and functions of confidence interval methods.
keys: [bootstrap, hoeffding, bernstein, ttest]
Methods
Estimate the confidence interval of the policy value.
Estimate the policy value of the evaluation policy.
- class scope_rl.ope.estimators_base.BaseStateActionMarginalOPEEstimator
Base class for State-Action Marginal OPE estimators.
Bases:
scope_rl.ope.BaseMarginalOPEEstimator -> scope_rl.ope.BaseOffPolicyEstimator
Imported as:
scope_rl.ope.BaseStateActionMarginalOPEEstimator
Note
This abstract base class also implements the following private methods.
- abstract _estimate_trajectory_value:
Estimate the trajectory-wise expected reward.
- _calc_behavior_policy_pscore_discrete:
Calculate the behavior policy pscore (action choice probability) in the case of discrete action spaces.
- _calc_behavior_policy_pscore_continuous:
Calculate the behavior policy pscore (action choice probability) in the case of continuous action spaces.
- _calc_evaluation_policy_pscore_discrete:
Calculate the evaluation policy pscore (action choice probability) in the case of discrete action spaces.
- _calc_similarity_weight:
Calculate the similarity weight in the case of continuous action spaces.
- _calc_marginal_importance_weight(self):
Calculate the marginal importance weight.
- property _estimate_confidence_interval:
Dictionary containing names and functions of confidence interval methods.
keys: [bootstrap, hoeffding, bernstein, ttest]
Methods
Estimate the confidence interval of the policy value.
Estimate the policy value of the evaluation policy.
- class scope_rl.ope.estimators_base.BaseCumulativeDistributionOPEEstimator
Base class for Cumulative Distribution OPE estimators.
Imported as:
scope_rl.ope.BaseCumulativeDistributionOPEEstimator
Note
This abstract base class also implements the following private methods.
- _aggregate_trajectory_wise_statistics_discrete:
Calculate trajectory-wise summary statistics based on step-wise observations in the case of discrete action spaces.
- _aggregate_trajectory_wise_statistics_continuous:
Calculate trajectory-wise summary statistics based on step-wise observations in the case of continuous action spaces.
- _target_value_given_idx:
Obtain the reward value corresponding to the given idx when estimating the CDF.
- property _kernel_function:
Dictionary containing names and functions of kernels.
keys: [gaussian, epanechnikov, triangular, cosine, uniform]
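As a rough illustration of turning step-wise observations into trajectory-wise statistics for discrete actions, the sketch below collapses flat step-wise lists into one importance weight and one discounted return per trajectory. The function name, argument names, and the fixed-length-trajectory layout are assumptions for this example, not the library's signature.

```python
def aggregate_trajectory_wise_statistics(
    step_per_trajectory, reward, behavior_pscore, evaluation_pscore, gamma=1.0
):
    """Collapse flat step-wise lists into per-trajectory statistics.

    Assumes trajectories of equal length stored back to back (illustrative).
    Returns (trajectory_wise_importance_weight, trajectory_wise_reward).
    """
    n_steps = len(reward)
    if n_steps % step_per_trajectory != 0:
        raise ValueError("number of steps must be a multiple of step_per_trajectory")

    trajectory_weight, trajectory_reward = [], []
    for start in range(0, n_steps, step_per_trajectory):
        weight, discounted_return = 1.0, 0.0
        for t in range(step_per_trajectory):
            i = start + t
            # Product of step-wise policy ratios over the whole trajectory.
            weight *= evaluation_pscore[i] / behavior_pscore[i]
            # Discounted cumulative reward of the trajectory.
            discounted_return += (gamma ** t) * reward[i]
        trajectory_weight.append(weight)
        trajectory_reward.append(discounted_return)
    return trajectory_weight, trajectory_reward
```

For continuous actions, the step-wise policy ratio would be replaced by a kernel-based similarity weight divided by the behavior policy's density.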
Methods
Estimate the conditional value at risk (CVaR) of the reward under the evaluation policy.
Estimate the cumulative distribution function (CDF) of the policy value.
Estimate the interquartile range of the reward under the evaluation policy.
Estimate the mean of the reward under the evaluation policy.
Estimate the variance of the reward under the evaluation policy.
- abstract estimate_cumulative_distribution_function()
Estimate the cumulative distribution function (CDF) of the policy value.
- abstract estimate_variance()
Estimate the variance of the reward under the evaluation policy.
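Once a CDF has been estimated on a grid of reward values, the other statistics listed for this class (mean, variance, CVaR, interquartile range) can be derived from it. The sketch below shows one way to do so, assuming a sorted reward grid and a non-decreasing CDF that ends at 1. The function name `summarize_cdf`, its arguments, and the lower-tail CVaR convention are illustrative choices and may not match the library's definitions.

```python
def summarize_cdf(reward_scale, cdf, alpha=0.05):
    """Derive summary statistics from a discrete CDF over sorted reward values."""
    # Probability mass placed on each grid point.
    mass = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]

    mean = sum(m * r for m, r in zip(mass, reward_scale))
    variance = sum(m * (r - mean) ** 2 for m, r in zip(mass, reward_scale))

    def quantile(q):
        # Smallest grid value whose CDF reaches q.
        for r, c in zip(reward_scale, cdf):
            if c >= q:
                return r
        return reward_scale[-1]

    interquartile_range = quantile(0.75) - quantile(0.25)

    # CVaR: average reward over the worst alpha fraction of outcomes.
    remaining, tail_sum = alpha, 0.0
    for r, m in zip(reward_scale, mass):
        take = min(m, remaining)
        tail_sum += take * r
        remaining -= take
        if remaining <= 0.0:
            break
    cvar = tail_sum / alpha

    return {
        "mean": mean,
        "variance": variance,
        "interquartile_range": interquartile_range,
        "conditional_value_at_risk": cvar,
    }
```

With a uniform distribution over rewards 0, 1, 2, 3, calling `summarize_cdf([0.0, 1.0, 2.0, 3.0], [0.25, 0.5, 0.75, 1.0], alpha=0.5)` averages the two lowest rewards for the CVaR.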