scope_rl.policy.head.BaseHead
- class scope_rl.policy.head.BaseHead
Base class to convert a greedy policy into a stochastic policy.
Bases: d3rlpy.algos.QLearningAlgoBase
Imported as: scope_rl.policy.BaseHead
Note
To ensure API compatibility with d3rlpy, BaseHead inherits from d3rlpy.algos.QLearningAlgoBase. This base class therefore also has additional methods, including fit, predict, and predict_value. For methods that are not described in this API reference, please refer to the d3rlpy documentation.
- Attributes:
- action_scalar
Methods
Calculate the action choice probabilities.
calc_pscore_given_action(x, action): Calculate the pscore of the given action.
Predict the best action in an online environment.
predict_value_online(x, action[, with_std]): Predict the state action value in an online environment.
sample_action_and_output_pscore(x): Sample an action stochastically with its pscore.
Sample an action and calculate its pscore in an online environment.
Sample an action in an online environment.
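The relationship between action choice probabilities and the pscore of a given action can be illustrated with a small standalone sketch. It assumes a discrete action space and an epsilon-greedy rule, neither of which is specified by this page, and the helper names `epsilon_greedy_probabilities` and `pscore_given_action` are hypothetical; they are not part of scope_rl.

```python
import numpy as np


def epsilon_greedy_probabilities(greedy_actions, n_actions, epsilon):
    """Hypothetical helper: action choice probabilities of shape (n_samples, n_actions).

    Assumes an epsilon-greedy rule: with probability epsilon an action is drawn
    uniformly at random, otherwise the greedy action is taken.
    """
    greedy_actions = np.asarray(greedy_actions, dtype=int)
    n_samples = len(greedy_actions)
    # Every action receives the uniform exploration mass.
    probs = np.full((n_samples, n_actions), epsilon / n_actions)
    # The greedy action additionally receives the exploitation mass.
    probs[np.arange(n_samples), greedy_actions] += 1.0 - epsilon
    return probs


def pscore_given_action(probs, actions):
    """Hypothetical helper: probability (pscore) of each given action."""
    actions = np.asarray(actions, dtype=int)
    return probs[np.arange(len(actions)), actions]


probs = epsilon_greedy_probabilities([0, 2], n_actions=4, epsilon=0.2)
pscore = pscore_given_action(probs, [0, 1])
```

In this sketch the pscore is simply the entry of the probability matrix selected by the given action, so each row of `probs` sums to one and the greedy action always carries the largest pscore.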
- abstract sample_action_and_output_pscore(x)
Sample an action stochastically with its pscore.
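Because this method is abstract, a concrete head has to supply it. The following is a minimal sketch of what such an implementation might look like, not the library's own code: `SketchHead` is a hypothetical stand-in for the base class (it does not inherit from d3rlpy), `EpsilonGreedySketch` and its constructor arguments are illustrative assumptions, and a discrete action space is assumed.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchHead(ABC):
    """Hypothetical stand-in for a head base class (not scope_rl's BaseHead)."""

    @abstractmethod
    def sample_action_and_output_pscore(self, x):
        """Return (actions, pscore) for a batch of states x."""


class EpsilonGreedySketch(SketchHead):
    """Illustrative epsilon-greedy wrapper around a greedy policy function."""

    def __init__(self, greedy_fn, n_actions, epsilon, seed=None):
        self.greedy_fn = greedy_fn  # maps states to greedy action indices
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)

    def sample_action_and_output_pscore(self, x):
        greedy = np.asarray(self.greedy_fn(x), dtype=int)
        n_samples = len(greedy)
        # Decide per sample whether to explore, then draw uniform random actions.
        explore = self.rng.random(n_samples) < self.epsilon
        random_actions = self.rng.integers(self.n_actions, size=n_samples)
        actions = np.where(explore, random_actions, greedy)
        # The pscore depends only on whether the sampled action is the greedy one.
        uniform = self.epsilon / self.n_actions
        pscore = np.where(actions == greedy, 1.0 - self.epsilon + uniform, uniform)
        return actions, pscore


# Usage: a greedy policy that always picks action 0, for 100 three-dimensional states.
head = EpsilonGreedySketch(
    lambda x: np.zeros(len(x), dtype=int), n_actions=4, epsilon=0.2, seed=0
)
actions, pscore = head.sample_action_and_output_pscore(np.zeros((100, 3)))
```

Returning the pscore together with the sampled action keeps the logged probability consistent with the action that was actually taken, which is what importance-weighted off-policy estimators rely on.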