scope_rl.policy.head.BaseHead

class scope_rl.policy.head.BaseHead

Base class to convert a greedy policy into a stochastic policy.

Bases: d3rlpy.algos.QLearningAlgoBase

Imported as: scope_rl.policy.BaseHead

Note

To ensure API compatibility with d3rlpy, BaseHead inherits from d3rlpy.algos.QLearningAlgoBase. That parent class provides additional methods, including fit, predict, and predict_value. For methods that are not described in this API reference, refer to the d3rlpy documentation.

Attributes:

action_scalar

Methods

`calc_action_choice_probability`(x)	Calculate the action choice probabilities.
`calc_pscore_given_action`(x, action)	Calculate the pscore of the given action.
`predict_online`(x)	Predict the best action in an online environment.
`predict_value_online`(x, action[, with_std])	Predict the state action value in an online environment.
`sample_action_and_output_pscore`(x)	Sample an action stochastically with its pscore.
`sample_action_and_output_pscore_online`(x)	Sample an action and calculate its pscore in an online environment.
`sample_action_online`(x)	Sample an action in an online environment.

abstract sample_action_and_output_pscore(x)

Sample an action stochastically with its pscore.

abstract calc_action_choice_probability(x)

Calculate the action choice probabilities.

abstract calc_pscore_given_action(x, action)

Calculate the pscore of the given action.
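
To illustrate how the three abstract methods relate to each other, here is a minimal standalone sketch of an epsilon-greedy conversion for a discrete action space. It does not subclass BaseHead and is not scope_rl's implementation. The class name EpsilonGreedySketch, the constructor arguments, and the assumed array shapes (a batch of states in, one row per state out) are all hypothetical; only the three method names come from this reference.

```python
import numpy as np


class EpsilonGreedySketch:
    """Hypothetical stand-in that mirrors the abstract method names only."""

    def __init__(self, greedy_predict, n_actions, epsilon, seed=0):
        # greedy_predict: assumed callable mapping states (n, d) -> greedy actions (n,)
        self.greedy_predict = greedy_predict
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)

    def calc_action_choice_probability(self, x):
        # Spread epsilon uniformly over all actions, then give the
        # remaining (1 - epsilon) mass to the greedy action of each state.
        greedy = self.greedy_predict(x)
        prob = np.full((len(x), self.n_actions), self.epsilon / self.n_actions)
        prob[np.arange(len(x)), greedy] += 1.0 - self.epsilon
        return prob

    def calc_pscore_given_action(self, x, action):
        # The pscore is the probability the policy assigns to the given action.
        prob = self.calc_action_choice_probability(x)
        return prob[np.arange(len(x)), action]

    def sample_action_and_output_pscore(self, x):
        # Draw one action per state and report the probability of that draw.
        prob = self.calc_action_choice_probability(x)
        action = np.array([self.rng.choice(self.n_actions, p=p) for p in prob])
        return action, prob[np.arange(len(x)), action]


# Toy greedy policy (an assumption): action 1 if the first feature is positive, else 0.
states = np.array([[0.1, -0.2], [0.5, 0.3], [-0.4, 0.9]])
head = EpsilonGreedySketch(lambda x: (x[:, 0] > 0).astype(int), n_actions=3, epsilon=0.3)

probs = head.calc_action_choice_probability(states)
actions, pscores = head.sample_action_and_output_pscore(states)
```

In this sketch, the pscore returned alongside a sampled action equals what calc_pscore_given_action would return for the same state and action, so the three methods stay mutually consistent.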

predict_online(x)

Predict the best action in an online environment.

predict_value_online(x, action, with_std=False)

Predict the state action value in an online environment.

sample_action_online(x)

Sample an action in an online environment.

sample_action_and_output_pscore_online(x)

Sample an action and calculate its pscore in an online environment.
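
This reference does not say how the *_online methods differ from their batched counterparts. One plausible reading, stated here purely as an assumption, is that an online environment supplies a single unbatched observation per step, so an online variant adds a batch axis, calls the batched method, and strips the axis again. The function names below are hypothetical and only sketch that pattern.

```python
import numpy as np


def predict_batch(x):
    # Hypothetical batched greedy policy: pick the index of the largest feature.
    return x.argmax(axis=1)


def predict_online_sketch(x):
    # Assumed pattern: wrap one observation into a batch of size 1,
    # reuse the batched method, and return the single result.
    x = np.asarray(x)
    return predict_batch(x.reshape(1, -1))[0]


obs = np.array([0.2, 0.7, 0.1])  # one observation, no batch axis
action = predict_online_sketch(obs)
```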
