scope_rl.policy.head.EpsilonGreedyHead
- class scope_rl.policy.head.EpsilonGreedyHead(base_policy, name, n_actions, epsilon, random_state=None)
Class to convert a deterministic policy into an epsilon-greedy policy (applicable to the discrete action case).
Bases:
scope_rl.policy.BaseHead
Imported as:
scope_rl.policy.EpsilonGreedyHead
Note
An epsilon-greedy policy stochastically chooses an action \(a \in \mathcal{A}\) given state \(s\) as follows.
\[\pi(a \mid s) := (1 - \epsilon) \cdot \mathbb{I}(a = a^{*}) + \epsilon / |\mathcal{A}|\]
where \(\epsilon\) is the probability of taking a uniformly random action and \(a^{*}\) is the greedy action. \(\mathbb{I}(\cdot)\) denotes the indicator function.
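As a worked instance of the definition above, assume (purely for illustration) \(|\mathcal{A}| = 4\) and \(\epsilon = 0.2\). The greedy action and each non-greedy action then receive
\[\pi(a^{*} \mid s) = (1 - 0.2) \cdot 1 + 0.2 / 4 = 0.85, \qquad \pi(a \mid s) = (1 - 0.2) \cdot 0 + 0.2 / 4 = 0.05 \quad (a \neq a^{*})\]
The probabilities sum to \(0.85 + 3 \times 0.05 = 1\). Note that the greedy action also receives its share \(\epsilon / |\mathcal{A}|\) of the exploration mass, so its probability is slightly larger than \(1 - \epsilon\).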
Note
To ensure API compatibility with d3rlpy, BaseHead inherits d3rlpy.algos.QLearningAlgoBase. This base class provides additional methods, including fit, predict, and predict_value. For methods that are not described in this API reference, please refer to the d3rlpy documentation.
- Parameters:
- Attributes:
- random_state
Methods
calc_action_choice_probability(x) – Calculate action choice probabilities.
calc_pscore_given_action(x, action) – Calculate the pscore of a given action.
sample_action_and_output_pscore(x) – Sample an action stochastically based on the pscore.
- sample_action_and_output_pscore(x)
Sample an action stochastically based on the pscore.
- Parameters:
x (array-like of shape (n_samples, state_dim)) – State (we will follow the implementation of d3rlpy and thus use ‘x’ rather than ‘s’).
- Returns:
action (ndarray of shape (n_samples, )) – Sampled action.
pscore (ndarray of shape (n_samples, )) – Propensity of the observed action being chosen under the behavior policy (pscore stands for propensity score).
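The sampling step can be sketched in plain NumPy as follows. This is an illustrative sketch, not the scope_rl implementation: the helper name `sample_epsilon_greedy` is hypothetical, and the greedy actions are assumed to be already available as an integer array (in the library they would come from the base policy).

```python
import numpy as np


def sample_epsilon_greedy(greedy_actions, n_actions, epsilon, rng):
    """Hypothetical helper: sample actions and return their propensities."""
    greedy_actions = np.asarray(greedy_actions)
    n_samples = greedy_actions.shape[0]

    # With probability epsilon, replace the greedy action by a uniform draw.
    explore = rng.random(n_samples) < epsilon
    random_actions = rng.integers(n_actions, size=n_samples)
    actions = np.where(explore, random_actions, greedy_actions)

    # Propensity of the sampled action under the epsilon-greedy policy:
    # (1 - epsilon) + epsilon / n_actions if it is the greedy action,
    # epsilon / n_actions otherwise.
    pscore = np.where(
        actions == greedy_actions,
        1.0 - epsilon + epsilon / n_actions,
        epsilon / n_actions,
    )
    return actions, pscore


rng = np.random.default_rng(0)
greedy = np.array([0, 1, 2, 1])
actions, pscore = sample_epsilon_greedy(greedy, n_actions=3, epsilon=0.3, rng=rng)
```

Both returned arrays have shape (n_samples, ), matching the return values documented above. The pscore depends only on whether the sampled action equals the greedy one, not on whether it was reached through the exploration branch.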
- calc_action_choice_probability(x)
Calculate action choice probabilities.
- Parameters:
x (array-like of shape (n_samples, state_dim)) – State (we will follow the implementation of d3rlpy and thus use ‘x’ rather than ‘s’).
- Returns:
pscore – Propensity of each action being chosen under the behavior policy (pscore stands for propensity score).
- Return type:
ndarray of shape (n_samples, n_actions)
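A minimal sketch of how a probability matrix of this shape could be built from the epsilon-greedy formula is shown below. The helper name `action_choice_probability` is hypothetical, and the greedy actions are assumed to be given as an integer array; this is not the library's code.

```python
import numpy as np


def action_choice_probability(greedy_actions, n_actions, epsilon):
    """Hypothetical helper: epsilon-greedy probabilities for every action."""
    greedy_actions = np.asarray(greedy_actions)
    n_samples = greedy_actions.shape[0]

    # Every action starts with the uniform exploration mass epsilon / n_actions.
    probs = np.full((n_samples, n_actions), epsilon / n_actions)

    # The greedy action of each row additionally receives 1 - epsilon.
    probs[np.arange(n_samples), greedy_actions] += 1.0 - epsilon
    return probs


probs = action_choice_probability(np.array([2, 0, 3]), n_actions=4, epsilon=0.2)
```

Each row is a probability distribution over the n_actions actions, so the rows sum to one.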
- calc_pscore_given_action(x, action)
Calculate the pscore of a given action.
- Parameters:
x (array-like of shape (n_samples, state_dim)) – State (we will follow the implementation of d3rlpy and thus use ‘x’ rather than ‘s’).
action (array-like of shape (n_samples, )) – Action.
- Returns:
pscore – Pscore of the given state and action.
- Return type:
ndarray of shape (n_samples, )
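One way to picture the relation between the (n_samples, n_actions) probability matrix and the (n_samples, ) pscore is as a per-row lookup. The sketch below assumes the probability matrix is already available; the helper name `pscore_given_action` and the example values are hypothetical and do not come from scope_rl.

```python
import numpy as np


def pscore_given_action(action_probs, action):
    """Hypothetical helper: pick the probability of the given action per row."""
    action_probs = np.asarray(action_probs, dtype=float)
    action = np.asarray(action, dtype=int)

    # Gather one entry per row: row i, column action[i].
    picked = np.take_along_axis(action_probs, action[:, None], axis=1)
    return picked.squeeze(axis=1)


# Example matrix for epsilon = 0.3 and three actions (greedy actions 0 and 1).
action_probs = np.array([[0.8, 0.1, 0.1],
                         [0.1, 0.8, 0.1]])
action = np.array([0, 2])
pscore = pscore_given_action(action_probs, action)
```

Here the first sample's action matches its greedy action and the second does not, so the two propensities differ.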