recgym.envs.simulator.function.UserModel

class recgym.envs.simulator.function.UserModel(user_feature_dim, item_feature_dim, reward_type='continuous', reward_std=0.0, random_state=None)

Class to define a user model based on user_preference_dynamics and reward_function.

Bases: recgym.BaseUserModel

Imported as: recgym.envs.UserModel

Tip

Use BaseUserModel to define a custom UserModel.

Parameters:

user_feature_dim (int) – Dimension of the user feature vectors. (Included for API consistency.)
item_feature_dim (int) – Dimension of the item feature vectors.
reward_type ({"continuous", "binary"}, default="continuous") – Reward type.
reward_std (float, default=0.0 (>=0)) – Noise level of the reward. Applicable only when reward_type is “continuous”.
random_state (int, default=None (>= 0)) – Random state.

References

Sarah Dean, Jamie Morgenstern. “Preference Dynamics Under Personalized Recommendations.” 2022.

Attributes:

random_state

Methods

`reward_function`(state, action, ...)	Reward function.
`user_preference_dynamics`(state, action, ...)	Function that determines the user state transition (i.e., user preference) based on the recommended item.

user_preference_dynamics(state, action, item_feature_vector, alpha=1.0)

Function that determines the user state transition (i.e., user preference) based on the recommended item. The user feature vector is amplified in the direction of the recommended item's feature vector.

Parameters:

state (array-like of shape (user_feature_dim, )) – A vector representing user preference. The preference changes over time in an episode depending on the actions presented by the RL agent. When the true state is unobservable, the agent receives an observation in place of the state.
action (int or array-like of shape (1, )) – Index of the item to present to the user.
item_feature_vector (array-like of shape (n_items, item_feature_dim)) – Feature vectors that characterize each item.
alpha (float, default=1.0 (0 <= alpha <= 1)) – Step size controlling how fast the user preference evolves over time.

Returns:

state – A vector representing the updated user preference. The preference changes over time in an episode depending on the actions presented by the RL agent. When the true state is unobservable, the agent receives an observation in place of the state.

Return type:

array-like of shape (user_feature_dim, )
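
The description above does not spell out the update rule, so the following is only a minimal sketch of one plausible reading of "amplified by the recommended item". It assumes the state is pushed along the chosen item's feature vector in proportion to their inner product and to alpha, then rescaled to unit L2 norm. The function name sketch_preference_update, the update formula, and the normalization step are all assumptions for illustration and are not taken from recgym.

```python
import numpy as np


def sketch_preference_update(state, action, item_feature_vector, alpha=1.0):
    """Hypothetical stand-in (not recgym's code) for a preference update."""
    state = np.asarray(state, dtype=float)
    # Accept either an int or an array-like of shape (1,), as documented.
    idx = int(np.asarray(action).reshape(-1)[0])
    item = np.asarray(item_feature_vector, dtype=float)[idx]
    # Assumed rule: amplify the state along the item direction, scaled by
    # the current affinity (inner product) and the step size alpha.
    new_state = state + alpha * (state @ item) * item
    # Assumed: rescale so the preference vector keeps unit L2 norm.
    return new_state / np.linalg.norm(new_state)


items = np.array([[1.0, 0.0], [0.0, 1.0]])  # two items, item_feature_dim = 2
state = np.array([0.6, 0.8])                # unit-norm user preference
next_state = sketch_preference_update(state, action=0, item_feature_vector=items)
```

Under these assumptions, recommending item 0 increases the first component of the preference vector relative to the second, and setting alpha=0.0 leaves a unit-norm state unchanged.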

reward_function(state, action, item_feature_vector)

Reward function. The reward is the inner product of the state and the feature vector of the recommended item.

Parameters:

state (array-like of shape (user_feature_dim, )) – A vector representing user preference. The preference changes over time in an episode depending on the actions presented by the RL agent. When the true state is unobservable, the agent receives an observation in place of the state.
action (int or array-like of shape (1, )) – Index of the item to present to the user.
item_feature_vector (array-like of shape (n_items, item_feature_dim)) – Feature vectors that characterize each item.

Returns:

reward – User engagement signal as a reward. Either binary or continuous.

Return type:

float
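
As a rough illustration of the continuous case, the sketch below computes the inner product of the state and the chosen item's features and, when reward_std is positive, adds zero-mean Gaussian noise. The function name sketch_reward, the rng argument, and the Gaussian form of the noise are assumptions for illustration. The binary reward type is not covered, because its exact handling is not described here.

```python
import numpy as np


def sketch_reward(state, action, item_feature_vector, reward_std=0.0, rng=None):
    """Hypothetical stand-in (not recgym's code) for a continuous reward."""
    state = np.asarray(state, dtype=float)
    # Accept either an int or an array-like of shape (1,), as documented.
    idx = int(np.asarray(action).reshape(-1)[0])
    item = np.asarray(item_feature_vector, dtype=float)[idx]
    # Reward is the inner product of user preference and item features.
    reward = float(state @ item)
    if reward_std > 0.0:
        # Assumed noise model: zero-mean Gaussian with scale reward_std.
        rng = np.random.default_rng() if rng is None else rng
        reward += float(rng.normal(loc=0.0, scale=reward_std))
    return reward


items = np.array([[1.0, 0.0], [0.0, 1.0]])
state = np.array([0.6, 0.8])
noiseless = sketch_reward(state, 1, items)
noisy = sketch_reward(state, 1, items, reward_std=0.1, rng=np.random.default_rng(0))
```

With reward_std left at 0.0 the result is deterministic, which mirrors the documented default of a noise-free reward.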
