scope_rl.policy.head.BaseHead

class scope_rl.policy.head.BaseHead

Base class to convert a greedy policy into a stochastic policy.

Bases: d3rlpy.algos.QLearningAlgoBase

Imported as: scope_rl.policy.BaseHead

Note

To ensure API compatibility with d3rlpy, BaseHead inherits from d3rlpy.algos.QLearningAlgoBase. This parent class provides additional methods, including fit, predict, and predict_value. For methods that are not described in this API reference, please refer to the d3rlpy documentation of QLearningAlgoBase.
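The idea of a head can be illustrated without d3rlpy. The following is a minimal sketch. It assumes a discrete action space and uses epsilon-greedy as the conversion rule. `SketchBaseHead`, `EpsilonGreedySketchHead`, `greedy_action_fn`, `epsilon`, and `random_state` are hypothetical names chosen for illustration; they are not part of the SCOPE-RL API. The abstract methods mirror the three abstract methods of BaseHead documented on this page, and the subclass shows one way a concrete head could fill them in.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchBaseHead(ABC):
    """Hypothetical stand-in for BaseHead (no d3rlpy): wraps a greedy policy."""

    def __init__(self, greedy_action_fn, n_actions, random_state=0):
        # greedy_action_fn maps states of shape (n_samples, state_dim) to int actions of shape (n_samples,).
        self.greedy_action_fn = greedy_action_fn
        self.n_actions = n_actions
        self.rng = np.random.default_rng(random_state)

    @abstractmethod
    def calc_action_choice_probability(self, x):
        """Return an (n_samples, n_actions) matrix of action choice probabilities."""

    @abstractmethod
    def calc_pscore_given_action(self, x, action):
        """Return the probability of choosing `action` in each state."""

    @abstractmethod
    def sample_action_and_output_pscore(self, x):
        """Return sampled actions and their pscores."""


class EpsilonGreedySketchHead(SketchBaseHead):
    """Hypothetical head: greedy action w.p. 1 - epsilon, otherwise a uniformly random action."""

    def __init__(self, greedy_action_fn, n_actions, epsilon, random_state=0):
        super().__init__(greedy_action_fn, n_actions, random_state)
        self.epsilon = epsilon

    def calc_action_choice_probability(self, x):
        greedy = self.greedy_action_fn(x)
        # Every action gets epsilon / n_actions; the greedy action additionally gets 1 - epsilon.
        prob = np.full((len(x), self.n_actions), self.epsilon / self.n_actions)
        prob[np.arange(len(x)), greedy] += 1.0 - self.epsilon
        return prob

    def calc_pscore_given_action(self, x, action):
        prob = self.calc_action_choice_probability(x)
        return prob[np.arange(len(x)), action]

    def sample_action_and_output_pscore(self, x):
        prob = self.calc_action_choice_probability(x)
        # Draw one action per state from its row of probabilities.
        action = np.array([self.rng.choice(self.n_actions, p=row) for row in prob])
        return action, prob[np.arange(len(x)), action]
```

With three actions and epsilon = 0.3, for example, this sketch chooses the greedy action with probability 0.8 and each other action with probability 0.1.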

Attributes:
action_scalar

Methods

calc_action_choice_probability(x)

Calculate the action choice probabilities.

calc_pscore_given_action(x, action)

Calculate the pscore of the given action, i.e., the probability that the policy chooses that action.

predict_online(x)

Predict the best action in an online environment.

predict_value_online(x, action[, with_std])

Predict the state-action value in an online environment.

sample_action_and_output_pscore(x)

Sample an action stochastically with its pscore.

sample_action_and_output_pscore_online(x)

Sample an action and calculate its pscore in an online environment.

sample_action_online(x)

Sample an action in an online environment.

abstract sample_action_and_output_pscore(x)

Sample an action stochastically with its pscore.
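For a discrete action space, sampling with a pscore amounts to two steps. First, draw each action from its row of the action choice probability matrix. Second, read off the probability of the drawn action. The following is a minimal sketch that assumes those probabilities are already computed. The helper name `sample_action_and_output_pscore_sketch` is hypothetical and uses inverse-CDF sampling. It is not taken from the SCOPE-RL implementation.

```python
import numpy as np


def sample_action_and_output_pscore_sketch(prob, rng):
    """Hypothetical helper: sample one action per row of `prob` and return its probability.

    prob: array of shape (n_samples, n_actions) whose rows sum to 1.
    rng: a numpy.random.Generator.
    """
    # Inverse-CDF sampling: the sampled action is the number of CDF entries <= a uniform draw.
    cdf = np.cumsum(prob, axis=1)
    u = rng.random((prob.shape[0], 1))
    action = (cdf <= u).sum(axis=1)
    # Guard against round-off leaving the last CDF entry slightly below 1.
    action = np.minimum(action, prob.shape[1] - 1)
    pscore = prob[np.arange(prob.shape[0]), action]
    return action, pscore
```

Actions with zero probability are never drawn, because their CDF entry equals the previous one.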

abstract calc_action_choice_probability(x)

Calculate the action choice probabilities.
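How these probabilities are computed depends on the concrete head. As one illustration, assume a discrete action space and Q-values obtained from the base policy. A softmax head could then map the Q-values to a probability matrix with a temperature. `softmax_action_choice_probability` and `tau` are hypothetical names used only in this sketch.

```python
import numpy as np


def softmax_action_choice_probability(q_values, tau=1.0):
    """Hypothetical helper: softmax over Q-values.

    q_values: array of shape (n_samples, n_actions).
    tau: temperature; smaller values move the result closer to the greedy policy.
    Returns an array of shape (n_samples, n_actions) whose rows sum to 1.
    """
    logits = q_values / tau
    # Subtract the row-wise max before exponentiating for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)
```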

abstract calc_pscore_given_action(x, action)

Calculate the pscore of the given action.
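For discrete actions, the pscore of a given action is the entry of the action choice probability matrix in that action's column, taken row by row. The following is a minimal sketch. It assumes `prob` has shape (n_samples, n_actions) and `action` holds integer action indices. The helper name is hypothetical.

```python
import numpy as np


def pscore_given_action_sketch(prob, action):
    """Hypothetical helper: return prob[i, action[i]] for every row i."""
    action = np.asarray(action, dtype=int)
    return np.take_along_axis(prob, action[:, None], axis=1)[:, 0]
```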

predict_online(x)

Predict the best action in an online environment.
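The online variants take a single state rather than a batch of states. The following minimal sketch assumes a batch predictor that maps an array of shape (n_samples, state_dim) to one action per row. The wrapper adds a batch axis to the single state and strips it from the result. `predict_online_sketch` and `toy_predict` are hypothetical names, and the toy predictor exists only for illustration.

```python
import numpy as np


def predict_online_sketch(predict, x):
    """Hypothetical wrapper: apply a batch predictor to a single state.

    predict: callable mapping (n_samples, state_dim) -> one action per row.
    x: a single state of shape (state_dim,).
    """
    x = np.asarray(x, dtype=float)
    return predict(x.reshape((1, -1)))[0]


def toy_predict(x_batch):
    # Toy greedy predictor for illustration: the index of the largest state feature.
    return np.argmax(x_batch, axis=1)
```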

predict_value_online(x, action, with_std=False)

Predict the state-action value in an online environment.
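The same single-state pattern applies to value prediction, with both the state and the action given a batch axis. The following minimal sketch assumes a batch Q-function `predict_value(x_batch, action_batch, with_std)`. It also assumes that this function returns a `(values, stds)` tuple when `with_std=True`. That return convention, the wrapper name, and the toy linear Q-function are hypothetical and are used only for illustration.

```python
import numpy as np


def predict_value_online_sketch(predict_value, x, action, with_std=False):
    """Hypothetical wrapper: evaluate a batch Q-function at one (state, action) pair.

    predict_value: callable (x_batch, action_batch, with_std) -> values of shape (n_samples,),
        or a (values, stds) tuple when with_std=True (an assumption of this sketch).
    """
    x_batch = np.asarray(x, dtype=float).reshape((1, -1))
    action_batch = np.asarray(action).reshape((1, -1))
    out = predict_value(x_batch, action_batch, with_std)
    if with_std:
        values, stds = out
        return values[0], stds[0]
    return out[0]


# Toy Q-function for illustration: Q(s, a) = W[a] @ s, with zero uncertainty.
W = np.array([[1.0, 0.0], [0.0, 2.0]])


def toy_predict_value(x_batch, action_batch, with_std=False):
    a = action_batch[:, 0].astype(int)
    values = np.einsum("ij,ij->i", W[a], x_batch)
    if with_std:
        return values, np.zeros_like(values)
    return values
```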

sample_action_online(x)

Sample an action in an online environment.

sample_action_and_output_pscore_online(x)

Sample an action and calculate its pscore in an online environment.
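Both online sampling methods can be pictured as thin wrappers around a batch sampler. One returns the action and its pscore for a single state; the other keeps only the action. The following minimal sketch assumes a batch sampler that returns `(actions, pscores)` for states of shape (n_samples, state_dim). The wrapper names and the uniform toy sampler are hypothetical.

```python
import numpy as np


def sample_action_and_output_pscore_online_sketch(sample_batch, x):
    """Hypothetical wrapper: sample an action and its pscore for a single state.

    sample_batch: callable mapping (n_samples, state_dim) -> (actions, pscores), each of length n_samples.
    """
    action, pscore = sample_batch(np.asarray(x, dtype=float).reshape((1, -1)))
    return action[0], pscore[0]


def sample_action_online_sketch(sample_batch, x):
    """Hypothetical wrapper: like the one above, but discard the pscore."""
    action, _ = sample_action_and_output_pscore_online_sketch(sample_batch, x)
    return action


rng = np.random.default_rng(0)
N_ACTIONS = 4


def toy_uniform_sampler(x_batch):
    # Uniform random policy for illustration: every action has probability 1 / N_ACTIONS.
    action = rng.integers(N_ACTIONS, size=x_batch.shape[0])
    return action, np.full(x_batch.shape[0], 1.0 / N_ACTIONS)
```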
