scope_rl.dataset.base.BaseDataset

class scope_rl.dataset.base.BaseDataset

Base class for logged datasets.

Imported as: scope_rl.dataset.BaseDataset

Methods

`obtain_episodes`(n_trajectories)	Roll out the behavior policy and obtain episodes.
`obtain_steps`(n_trajectories)	Roll out the behavior policy and obtain steps.

abstract obtain_episodes(n_trajectories)

Roll out the behavior policy and obtain episodes.

Parameters:

n_trajectories (int, default=10000 (> 0)) – Number of trajectories to generate by rolling out the behavior policy.

Returns:

logged_dataset(s) – A single logged dataset, or a MultipleLoggedDataset instance containing multiple logged datasets.

For API consistency, each logged dataset should contain the following.

key: [
    size,
    n_trajectories,
    step_per_trajectory,
    action_type,
    n_actions,
    action_dim,
    action_keys,
    action_meaning,
    state_dim,
    state_keys,
    state,
    action,
    reward,
    done,
    terminal,
    info,
    pscore,
]

size: int (> 0): Number of steps the dataset records.
n_trajectories: int (> 0): Number of trajectories the dataset records.
step_per_trajectory: int (> 0): Number of timesteps in a trajectory.
action_type: str: Type of the action space. Either “discrete” or “continuous”.
n_actions: int (> 0): Number of actions. If action_type is “continuous”, None is recorded.
action_dim: int (> 0): Dimensions of the action space. If action_type is “discrete”, None is recorded.
action_keys: list of str: Name of each dimension in the action space. If action_type is “discrete”, None is recorded.
action_meaning: dict: Dictionary to map discrete action index to a specific action. If action_type is “continuous”, None is recorded.
state_dim: int (> 0): Dimensions of the state space.
state_keys: list of str: Name of each dimension of the state space.
state: ndarray of shape (size, state_dim): State observed under the behavior policy.
action: ndarray of shape (size, ) or (size, action_dim): Action chosen by the behavior policy.
reward: ndarray of shape (size, ): Reward observed for each (state, action) pair.
done: ndarray of shape (size, ): Whether an episode ends or not.
terminal: ndarray of shape (size, ): Whether an episode reaches the pre-defined maximum steps.
info: dict: Additional feedback from the environment.
pscore: ndarray of shape (size, ): Propensity of the observed action being chosen under the behavior policy (pscore stands for propensity score).
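The key list above can be read as a small schema. The following is a minimal sketch of a consistency check for it, not part of scope_rl: the function name `check_logged_dataset` and the example values are hypothetical. Plain Python lists stand in for ndarrays so the sketch has no dependencies, and only the leading dimension (`size`) is checked. The check `size == n_trajectories * step_per_trajectory` is an assumption that every trajectory has the same length.

```python
REQUIRED_KEYS = [
    "size", "n_trajectories", "step_per_trajectory", "action_type",
    "n_actions", "action_dim", "action_keys", "action_meaning",
    "state_dim", "state_keys", "state", "action", "reward", "done",
    "terminal", "info", "pscore",
]

# Keys documented as per-step records whose first dimension is `size`.
PER_STEP_KEYS = ["state", "action", "reward", "done", "terminal", "pscore"]


def check_logged_dataset(dataset):
    """Return a list of problems found in a logged-dataset dict (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in dataset]
    if problems:
        return problems

    # Fields that apply to the other action type are recorded as None.
    if dataset["action_type"] == "discrete":
        must_be_none = ["action_dim", "action_keys"]
    elif dataset["action_type"] == "continuous":
        must_be_none = ["n_actions", "action_meaning"]
    else:
        return [f"unknown action_type: {dataset['action_type']!r}"]
    for key in must_be_none:
        if dataset[key] is not None:
            problems.append(f"{key} should be None for {dataset['action_type']} actions")

    size = dataset["size"]
    # Assumption (not stated in the source): all trajectories have equal length.
    if size != dataset["n_trajectories"] * dataset["step_per_trajectory"]:
        problems.append("size != n_trajectories * step_per_trajectory")
    for key in PER_STEP_KEYS:
        if len(dataset[key]) != size:
            problems.append(f"{key} has length {len(dataset[key])}, expected {size}")
    return problems


# Hypothetical discrete-action dataset: 2 trajectories of 2 steps each.
example = {
    "size": 4, "n_trajectories": 2, "step_per_trajectory": 2,
    "action_type": "discrete", "n_actions": 2,
    "action_dim": None, "action_keys": None,
    "action_meaning": {0: "left", 1: "right"},
    "state_dim": 1, "state_keys": ["position"],
    "state": [[0.0], [1.0], [0.0], [-1.0]],
    "action": [1, 0, 0, 1],
    "reward": [1.0, 0.0, 0.0, 1.0],
    "done": [False, True, False, True],
    "terminal": [False, True, False, True],
    "info": {},
    "pscore": [0.5, 0.5, 0.5, 0.5],
}

print(check_logged_dataset(example))
```

Because the check only uses `len()`, it should work unchanged on ndarrays; verifying the second dimension against `state_dim` or `action_dim` would need an array library.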

Return type:

LoggedDataset or MultipleLoggedDataset

abstract obtain_steps(n_trajectories)

Roll out the behavior policy and obtain steps.
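Since both methods are abstract, a concrete dataset class has to implement each of them. The sketch below illustrates that contract with a stand-in base class built on Python's `abc` module. `BaseDatasetSketch` and `CoinFlipDataset` are hypothetical names, not scope_rl classes, and the returned dict fills in only a subset of the documented keys to keep the example short. The source does not describe how the output of `obtain_steps` differs from that of `obtain_episodes`, so the toy class returns the same layout from both.

```python
import random
from abc import ABC, abstractmethod


class BaseDatasetSketch(ABC):
    """Stand-in mirroring the two documented abstract methods (not the library class)."""

    @abstractmethod
    def obtain_episodes(self, n_trajectories=10000):
        raise NotImplementedError

    @abstractmethod
    def obtain_steps(self, n_trajectories):
        raise NotImplementedError


class CoinFlipDataset(BaseDatasetSketch):
    """Hypothetical dataset logged by a uniform-random policy over two actions."""

    def __init__(self, step_per_trajectory=3, seed=0):
        self.step_per_trajectory = step_per_trajectory
        self.rng = random.Random(seed)

    def _rollout(self, n_trajectories):
        size = n_trajectories * self.step_per_trajectory
        actions = [self.rng.randrange(2) for _ in range(size)]
        return {
            "size": size,
            "n_trajectories": n_trajectories,
            "step_per_trajectory": self.step_per_trajectory,
            "action_type": "discrete",
            "n_actions": 2,
            "action": actions,
            # Toy reward: 1.0 when action 1 is chosen, 0.0 otherwise.
            "reward": [float(a) for a in actions],
            # A uniform policy over two actions picks each with probability 0.5.
            "pscore": [0.5] * size,
        }

    def obtain_episodes(self, n_trajectories=10000):
        return self._rollout(n_trajectories)

    def obtain_steps(self, n_trajectories):
        # Assumption: same record layout as obtain_episodes.
        return self._rollout(n_trajectories)


dataset = CoinFlipDataset(step_per_trajectory=3, seed=0)
logged = dataset.obtain_episodes(n_trajectories=4)
print(logged["size"], len(logged["action"]))
```

Instantiating `BaseDatasetSketch` directly raises `TypeError`, which is how `abc` enforces that subclasses provide both methods.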