rtbgym.envs.simulator.bidder.Bidder

class rtbgym.envs.simulator.bidder.Bidder(simulator, objective='conversion', reward_predictor=None, scaler=None, random_state=None)

Class that determines the bid price.

Imported as: rtbgym.envs.simulator.Bidder

Note

Intended to be initialized and called from the RTBEnv class in env.py.

The bid price is determined by the following formula.

\[\text{bid price}_{t,i} = \text{adjust rate}_{t} \times \text{predicted reward}_{t,i} \; (\times \text{const.})\]
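As a rough illustration, the formula above can be written in plain Python for a single timestep t over a batch of auctions i. The helper name bid_price is hypothetical, and the optional constant is taken to be the scaler parameter (treated as 1.0 when omitted); this is a sketch under those assumptions, not the rtbgym implementation.

```python
from typing import List, Optional, Sequence


def bid_price(
    adjust_rate: float,
    predicted_rewards: Sequence[float],
    scaler: Optional[float] = None,
) -> List[float]:
    """Hypothetical helper: bid_{t,i} = adjust_rate_t * reward_{t,i} (* const.)."""
    if adjust_rate < 0:
        raise ValueError("adjust_rate must be >= 0")
    # Assumption: the optional constant is the scaler; use 1.0 when it is not given.
    const = 1.0 if scaler is None else scaler
    # One bid per auction i at the current timestep t.
    return [adjust_rate * reward * const for reward in predicted_rewards]


# Usage: an adjust rate of 2.0 doubles every scaled bid.
example_bids = bid_price(2.0, [0.1, 0.5], scaler=10.0)
```

Because the adjust rate multiplies every bid at timestep t uniformly, it acts as a single knob that raises or lowers the whole bidding level while the per-auction reward keeps the relative ordering of bids.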

Parameters:

simulator (BaseSimulator) – Auction simulator.
objective ({"click", "conversion"}, default="conversion") – Objective outcome (i.e., reward) of the auction.
reward_predictor (BaseEstimator, default=None) – A machine learning model used to predict the reward from which the bid price is determined. If None, the ground-truth (expected) reward is used instead of the predicted one.
scaler ({int, float}, default=None (> 0)) – Scaling factor (constant value) used for bid price determination. If None, auto_fit_scaler() should be called to fit it.
random_state (int, default=None (>= 0)) – Random state.

References

Di Wu, Xiujun Chen, Xun Yang, Hao Wang, Qing Tan, Xiaoxun Zhang, Jian Xu, and Kun Gai. “Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising.” 2018.

Jun Zhao, Guang Qiu, Ziyu Guan, Wei Zhao, and Xiaofei He. “Deep Reinforcement Learning for Sponsored Search Real-time Bidding.” 2018.

Attributes:

random_state
reward_predictor
scaler
standard_bid_price

Methods

auto_fit_scaler(step_per_episode[, n_samples]) – Fit the scaling factor used for bid price calculation.
custom_set_reward_predictor(reward_predictor) – Set the reward predictor used for bid price calculation.
custom_set_scaler(scaler) – Set the scaling factor used for bid price calculation.
determine_bid_price(timestep, adjust_rate, ...) – Determine the bid price using the given adjust rate and the predicted/ground-truth rewards.
fit_reward_predictor(step_per_episode[, ...]) – Fit (pre-train) the reward predictor in advance so that its predictions can be used to determine the bid price.

determine_bid_price(timestep, adjust_rate, ad_ids, user_ids)

Determine the bid price using the given adjust rate and the predicted/ground-truth rewards.

Note

The bid price is determined as follows.

\[\text{bid price}_{t,i} = \text{adjust rate}_{t} \times \text{predicted reward}_{t,i} \; (\times \text{const.})\]

Parameters:

timestep (int (> 0)) – Timestep of the RL environment.
adjust_rate (float (>= 0)) – Adjust rate parameter for the bidding price.
ad_ids (array-like of shape (search_volume, )) – IDs of the ads.
user_ids (array-like of shape (search_volume, )) – IDs of the users.

Returns:

bid_prices – Bid price for each auction.

Return type:

ndarray of shape (search_volume, )
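For a single timestep, the computation over search_volume auctions can be sketched as below. reward_fn is a hypothetical stand-in for either the fitted reward predictor or the ground-truth expected reward, looked up per (ad, user) pair; the function name, the toy reward values, and passing the scaler explicitly are illustrative assumptions, not rtbgym code.

```python
from typing import Callable, List, Sequence


def determine_bid_price_sketch(
    timestep: int,
    adjust_rate: float,
    ad_ids: Sequence[int],
    user_ids: Sequence[int],
    reward_fn: Callable[[int, int, int], float],
    scaler: float,
) -> List[float]:
    """Hypothetical sketch: one bid per auction, i.e. shape (search_volume,)."""
    # Each auction pairs one ad with one user, so both ID arrays share a length.
    if len(ad_ids) != len(user_ids):
        raise ValueError("ad_ids and user_ids must have the same length")
    # reward_fn stands in for the predicted or ground-truth (expected) reward.
    rewards = [reward_fn(timestep, ad, user) for ad, user in zip(ad_ids, user_ids)]
    # bid_{t,i} = adjust_rate_t * reward_{t,i} * scaler
    return [adjust_rate * reward * scaler for reward in rewards]


def toy_reward(timestep: int, ad_id: int, user_id: int) -> float:
    # Toy values: the expected reward grows with the ad ID and ignores the user.
    return 0.01 * (ad_id + 1)


bids = determine_bid_price_sketch(
    timestep=1,
    adjust_rate=1.5,
    ad_ids=[0, 1, 2],
    user_ids=[5, 5, 7],
    reward_fn=toy_reward,
    scaler=100.0,
)
```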

custom_set_scaler(scaler)

Set the scaling factor used for bid price calculation.

Parameters:

scaler ({int, float} (> 0)) – Scaling factor (constant value) used in bid price calculation.

auto_fit_scaler(step_per_episode, n_samples=100000)

Fit the scaling factor used for bid price calculation.

Note

scaler is set to approximately the reciprocal of the mean predicted/ground-truth reward:

\[\text{scaler} \approx \frac{1}{\text{mean of predicted/ground-truth rewards}}\]

Parameters:

step_per_episode (int (> 0)) – Number of timesteps in an episode.
n_samples (int, default=100000 (> 0)) – Number of samples used to fit the scaler.
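The reciprocal-of-the-mean rule can be sketched as follows. Here sample_reward is a hypothetical callable returning one predicted or ground-truth reward at a given timestep, and timesteps are assumed to be drawn uniformly from range(step_per_episode); both are illustrative assumptions rather than rtbgym's actual sampling procedure.

```python
import random
import statistics
from typing import Callable


def auto_fit_scaler_sketch(
    step_per_episode: int,
    sample_reward: Callable[[random.Random, int], float],
    n_samples: int = 100_000,
    seed: int = 0,
) -> float:
    """Hypothetical sketch: scaler ~= 1 / mean(reward)."""
    if step_per_episode <= 0 or n_samples <= 0:
        raise ValueError("step_per_episode and n_samples must be > 0")
    rng = random.Random(seed)
    # Assumption: timesteps are sampled uniformly over the episode.
    rewards = [
        sample_reward(rng, rng.randrange(step_per_episode)) for _ in range(n_samples)
    ]
    mean_reward = statistics.fmean(rewards)
    if mean_reward <= 0:
        raise ValueError("mean reward must be positive to take its reciprocal")
    return 1.0 / mean_reward


# Toy sampler: rewards uniform on [0, 0.02], so the mean is about 0.01.
fitted_scaler = auto_fit_scaler_sketch(
    step_per_episode=7,
    sample_reward=lambda rng, t: rng.uniform(0.0, 0.02),
)
```

With this choice, a bid at adjust rate 1.0 for an auction with an average reward comes out near 1.0, so the adjust rate can be read as a multiplier relative to a typical bid. Fixing the seed keeps the fitted value reproducible, in the same spirit as the class's random_state parameter.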

custom_set_reward_predictor(reward_predictor)

Set the reward predictor used for bid price calculation.

Parameters:

reward_predictor (BaseEstimator) – A machine learning model used to predict the reward from which the bid price is determined. If None, the ground-truth (expected) reward is used instead of the predicted one.
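The None fallback can be sketched with a small stand-in class. The class name RewardSource and the ground_truth_fn argument are hypothetical, and the predictor is assumed to expose a scikit-learn-style predict(X) method, which fits the BaseEstimator type but is not confirmed by this page.

```python
from typing import Callable, List, Optional, Sequence


class RewardSource:
    """Hypothetical stand-in for how a bidder might choose its reward signal."""

    def __init__(
        self, ground_truth_fn: Callable[[Sequence[Sequence[float]]], List[float]]
    ) -> None:
        self.ground_truth_fn = ground_truth_fn
        self.reward_predictor: Optional[object] = None

    def custom_set_reward_predictor(self, reward_predictor: Optional[object]) -> None:
        self.reward_predictor = reward_predictor

    def rewards(self, X: Sequence[Sequence[float]]) -> List[float]:
        # No predictor set: fall back to the ground-truth (expected) reward.
        if self.reward_predictor is None:
            return list(self.ground_truth_fn(X))
        # Assumption: scikit-learn-style predict(X) returning one value per row.
        return list(self.reward_predictor.predict(X))


class ConstantPredictor:
    """Toy predictor used only for the usage example."""

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [0.5 for _ in X]


source = RewardSource(ground_truth_fn=lambda X: [row[0] for row in X])
before = source.rewards([[0.1], [0.2]])
source.custom_set_reward_predictor(ConstantPredictor())
after = source.rewards([[0.1], [0.2]])
```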

fit_reward_predictor(step_per_episode, n_samples=100000)

Fit (pre-train) the reward predictor in advance so that its predictions can be used to determine the bid price.

Note

Intended to be used only with the use_reward_predictor=True option.

X and y of the prediction model are given as follows.

X: array-like of shape (search_volume, ad_feature_dim + user_feature_dim + 1): Concatenated vector of contexts (ad_feature_vector + user_feature_vector) and timestep.
y: array-like of shape (search_volume, ): Reward (i.e., auction outcome) obtained in each auction.
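The layout of X and y above can be sketched by concatenating the ad features, user features, and timestep for each auction. The function name build_training_rows and the toy feature values are hypothetical; the sketch only builds the training arrays and leaves fitting to whichever reward_predictor is supplied.

```python
from typing import List, Sequence, Tuple


def build_training_rows(
    ad_features: Sequence[Sequence[float]],
    user_features: Sequence[Sequence[float]],
    timesteps: Sequence[int],
    rewards: Sequence[float],
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical sketch of X: (n, ad_dim + user_dim + 1) and y: (n,)."""
    n = len(rewards)
    if not (len(ad_features) == len(user_features) == len(timesteps) == n):
        raise ValueError("all inputs must have the same number of auctions")
    # Each row: ad_feature_vector + user_feature_vector + [timestep].
    X = [
        [*ad, *user, float(t)]
        for ad, user, t in zip(ad_features, user_features, timesteps)
    ]
    # Each target: the observed auction outcome (e.g., click or conversion).
    y = [float(r) for r in rewards]
    return X, y


X, y = build_training_rows(
    ad_features=[[1.0, 2.0], [3.0, 4.0]],
    user_features=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    timesteps=[4, 5],
    rewards=[0, 1],
)
```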

Parameters:

step_per_episode (int (> 0)) – Number of timesteps in an episode.
n_samples (int, default=100000 (> 0)) – Number of samples to fit reward predictor.
