recgym.envs.simulator.base

Abstract Base Class for Simulation.

Classes

BaseUserModel

Base class to define user_preference_dynamics and reward_function.

class recgym.envs.simulator.base.BaseUserModel

Base class to define user_preference_dynamics and reward_function.

Imported as: recgym.BaseUserModel

Methods

`reward_function`(state, action, ...)	Reward function.
`user_preference_dynamics`(state, action, ...)	Function that determines the user state transition (i.e., user preference) based on the recommended item.

abstract user_preference_dynamics(state, action, item_feature_vector)

Function that determines the user state transition (i.e., user preference) based on the recommended item.

Parameters:

state (array-like of shape (user_feature_dim, )) – A vector representing user preference. The preference changes over time within an episode depending on the items presented by the RL agent. When the true state is unobservable, an observation is obtained instead of the state.
action (int or array-like of shape (1, )) – Index of the item to present to the user.
item_feature_vector (ndarray of shape (n_items, item_feature_dim), default=None) – Feature vectors that characterize each item.

Returns:

state – The updated vector representing user preference after the recommended item has been presented. The preference changes over time within an episode depending on the items presented by the RL agent. When the true state is unobservable, an observation is obtained instead of the state.

Return type:

array-like of shape (user_feature_dim, )
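
As an illustration of what a transition with this signature might look like, the sketch below nudges the preference vector toward the recommended item's features and rescales it to unit length. The update rule and the `step_size` parameter are assumptions made for this example, not the behavior of any model shipped with the library, and plain Python lists stand in for NumPy arrays to keep the sketch dependency-free.

```python
import math
import operator


def user_preference_dynamics(state, action, item_feature_vector, step_size=0.1):
    """Hypothetical transition: drift toward the recommended item's features."""
    # `action` may be an int or an array-like of shape (1, ).
    try:
        item_index = operator.index(action)
    except TypeError:
        item_index = operator.index(action[0])

    item = item_feature_vector[item_index]

    # Move the preference a small step toward the item (assumed rule).
    shifted = [s + step_size * x for s, x in zip(state, item)]

    # Rescale to unit length so the state stays bounded across an episode.
    norm = math.sqrt(sum(v * v for v in shifted))
    if norm == 0.0:
        return shifted
    return [v / norm for v in shifted]


# Two items with orthogonal features; the user initially prefers item 0.
items = [[1.0, 0.0], [0.0, 1.0]]
next_state = user_preference_dynamics([1.0, 0.0], 1, items)
```

After item 1 is recommended, the second component of `next_state` becomes positive while the vector keeps unit length, so repeated exposure gradually shifts the simulated user toward that item.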

abstract reward_function(state, action, item_feature_vector)

Reward function.

Parameters:

state (array-like of shape (user_feature_dim, )) – A vector representing user preference. The preference changes over time within an episode depending on the items presented by the RL agent. When the true state is unobservable, an observation is obtained instead of the state.
action (int or array-like of shape (1, )) – Index of the item to present to the user.
item_feature_vector (ndarray of shape (n_items, item_feature_dim), default=None) – Feature vectors that characterize each item.

Returns:

reward – User engagement signal used as the reward. Either binary or continuous.

Return type:

bool or float
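
The sketch below shows how a concrete user model could implement both abstract methods. To stay runnable without the library installed, it defines a local stand-in `BaseUserModel` that only mirrors the two documented method signatures; with the library available, one would presumably subclass `recgym.BaseUserModel` instead. The `InnerProductUserModel` class, its `reward_type` and `seed` arguments, and the inner-product and sigmoid rules are all hypothetical choices made for illustration.

```python
import math
import random
from abc import ABC, abstractmethod


class BaseUserModel(ABC):
    """Local stand-in mirroring the documented interface (not the library class)."""

    @abstractmethod
    def user_preference_dynamics(self, state, action, item_feature_vector):
        """Return the next user preference vector."""

    @abstractmethod
    def reward_function(self, state, action, item_feature_vector):
        """Return the user engagement signal (bool or float)."""


class InnerProductUserModel(BaseUserModel):
    """Hypothetical model: static preferences, inner-product reward."""

    def __init__(self, reward_type="continuous", seed=0):
        if reward_type not in ("continuous", "binary"):
            raise ValueError("reward_type must be 'continuous' or 'binary'")
        self.reward_type = reward_type
        self.rng = random.Random(seed)

    def user_preference_dynamics(self, state, action, item_feature_vector):
        # Kept trivial in this sketch: the preference does not change.
        return list(state)

    def reward_function(self, state, action, item_feature_vector):
        # Assumes `action` is a plain int index into the item features.
        item = item_feature_vector[action]
        score = sum(s * x for s, x in zip(state, item))
        if self.reward_type == "continuous":
            return float(score)
        # Binary reward: the user engages with probability sigmoid(score).
        engage_prob = 1.0 / (1.0 + math.exp(-score))
        return self.rng.random() < engage_prob


items = [[1.0, 0.0], [0.0, 1.0]]
model = InnerProductUserModel()
reward = model.reward_function([0.6, 0.8], 1, items)
```

Because both methods are abstract, instantiating the base class directly raises `TypeError`; only subclasses that override both can be created. Switching `reward_type` to `"binary"` makes `reward_function` return a `bool`, matching the documented return type of bool or float.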