rtbgym.envs.simulator.bidder.Bidder

class rtbgym.envs.simulator.bidder.Bidder(simulator, objective='conversion', reward_predictor=None, scaler=None, random_state=None)

Class to determine bid price.

Imported as: rtbgym.envs.simulator.Bidder

Note

Intended to be initialized and called from the RTBEnv class in env.py.

The bid price is determined by the following formula, where const. denotes the scaling factor (scaler).

\[{bid price}_{t, i} = {adjust rate}_{t} \times {predicted reward}_{t,i} ( \times {const.})\]
Parameters:
  • simulator (BaseSimulator) – Auction simulator.

  • objective ({"click", "conversion"}, default="conversion") – Objective outcome (i.e., reward) of the auction.

  • reward_predictor (BaseEstimator, default=None) – A machine learning model to predict the reward to determine the bidding price. If None, the ground-truth (expected) reward is used instead of the predicted one.

  • scaler ({int, float}, default=None (> 0)) – Scaling factor (constant value) used for bid price determination. If None, one should call auto_fit_scaler().

  • random_state (int, default=None (>= 0)) – Random state.

References

Di Wu, Xiujun Chen, Xun Yang, Hao Wang, Qing Tan, Xiaoxun Zhang, Jian Xu, and Kun Gai. “Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising.” 2018.

Jun Zhao, Guang Qiu, Ziyu Guan, Wei Zhao, and Xiaofei He. “Deep Reinforcement Learning for Sponsored Search Real-time Bidding.” 2018.

Attributes:
random_state
reward_predictor
scaler
standard_bid_price

Methods

auto_fit_scaler(step_per_episode[, n_samples])

Fit scaling factor used for bid price calculation.

custom_set_reward_predictor(reward_predictor)

Set reward predictor used for bid price calculation.

custom_set_scaler(scaler)

Set scaling factor used for bid price calculation.

determine_bid_price(timestep, adjust_rate, ...)

Determine the bidding price using given adjust rate and the predicted/ground-truth rewards.

fit_reward_predictor(step_per_episode[, ...])

Fit reward predictor in advance (pre-train) to use prediction in bidding price determination.

determine_bid_price(timestep, adjust_rate, ad_ids, user_ids)

Determine the bidding price using given adjust rate and the predicted/ground-truth rewards.

Note

The bid price is determined as follows.

\[{bid price}_{t, i} = {adjust rate}_{t} \times {predicted reward}_{t,i} ( \times {const.})\]
Parameters:
  • timestep (int (> 0)) – Timestep of the RL environment.

  • adjust_rate (float (>= 0)) – Adjust rate parameter for the bidding price.

  • ad_ids (array-like of shape (search_volume, )) – IDs of the ads.

  • user_ids (array-like of shape (search_volume, )) – IDs of the users.

Returns:

bid_prices – Bid price for each auction.

Return type:

ndarray of shape (search_volume, )
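The formula above can be illustrated with a minimal, library-independent sketch. The helper name `sketch_bid_prices` and the sample values are hypothetical; the actual method looks up rewards from ad_ids and user_ids and may apply further processing (for example rounding) that is not documented here.

```python
def sketch_bid_prices(adjust_rate, predicted_rewards, scaler=1.0):
    """Hypothetical helper, not part of rtbgym.

    Applies: bid price = adjust rate * predicted reward * const. (scaler)
    to every auction in the batch.
    """
    if adjust_rate < 0:
        raise ValueError("adjust_rate must be non-negative")
    if scaler <= 0:
        raise ValueError("scaler must be positive")
    return [adjust_rate * reward * scaler for reward in predicted_rewards]


# Assumed predicted (or ground-truth) rewards for three auctions.
predicted_rewards = [0.5, 0.25, 0.0]
bid_prices = sketch_bid_prices(
    adjust_rate=2.0, predicted_rewards=predicted_rewards, scaler=100.0
)
print(bid_prices)  # [100.0, 50.0, 0.0]
```

One bid price is produced per auction, matching the documented return shape (search_volume, ).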

custom_set_scaler(scaler)

Set scaling factor used for bid price calculation.

Parameters:

scaler ({int, float} (> 0)) – Scaling factor (constant value) used in bid price calculation.

auto_fit_scaler(step_per_episode, n_samples=100000)

Fit scaling factor used for bid price calculation.

Note

The scaler is set to approximately the reciprocal of the mean predicted/ground-truth reward.

scaler ~= 1 / mean of predicted/ground-truth rewards

Parameters:
  • step_per_episode (int (> 0)) – Number of timesteps in an episode.

  • n_samples (int, default=100000 (> 0)) – Number of samples used to fit the scaler.
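The relation in the note can be sketched as follows. This is only an illustration of "scaler ~= 1 / mean reward"; the helper name `sketch_fit_scaler` and the sampled rewards are hypothetical, and how the library actually draws its n_samples samples is not shown here.

```python
from statistics import mean


def sketch_fit_scaler(sampled_rewards):
    """Hypothetical helper, not part of rtbgym.

    Returns the reciprocal of the mean sampled reward, so that
    reward * scaler is about 1 on average.
    """
    average_reward = mean(sampled_rewards)
    if average_reward <= 0:
        raise ValueError("mean reward must be positive to fit a scaler")
    return 1.0 / average_reward


# Assumed rewards sampled from the simulator (predicted or ground-truth).
sampled_rewards = [0.25, 0.5, 0.125, 0.125]
scaler = sketch_fit_scaler(sampled_rewards)
print(scaler)  # 4.0
```

With this choice, an adjust rate of 1 yields an average bid price of roughly 1, which keeps the adjust rate on an interpretable scale.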

custom_set_reward_predictor(reward_predictor)

Set reward predictor used for bid price calculation.

Parameters:

reward_predictor (BaseEstimator, default=None) – A machine learning model to predict the reward to determine the bidding price. If None, the ground-truth (expected) reward is used instead of the predicted one.

fit_reward_predictor(step_per_episode, n_samples=100000)

Fit reward predictor in advance (pre-train) to use prediction in bidding price determination.

Note

Intended to be used only when the use_reward_predictor=True option is set.

X and y of the prediction model are given as follows.
X: array-like of shape (search_volume, ad_feature_dim + user_feature_dim + 1)

Concatenated vector of contexts (ad_feature_vector + user_feature_vector) and timestep.

y: array-like of shape (search_volume, )

Reward (i.e., auction outcome) obtained in each auction.

Parameters:
  • step_per_episode (int (> 0)) – Number of timesteps in an episode.

  • n_samples (int, default=100000 (> 0)) – Number of samples to fit reward predictor.
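The layout of X and y described in the note can be sketched as follows. The helper name `sketch_build_features`, the feature dimensions, and the sample values are hypothetical; only the column order (ad features, then user features, then timestep) is taken from the note above.

```python
def sketch_build_features(ad_features, user_features, timestep):
    """Hypothetical helper, not part of rtbgym.

    Builds one row per auction:
    [ad_feature_vector..., user_feature_vector..., timestep]
    """
    if len(ad_features) != len(user_features):
        raise ValueError("ad_features and user_features must have the same length")
    return [
        list(ad) + list(user) + [timestep]
        for ad, user in zip(ad_features, user_features)
    ]


# Assumed contexts: ad_feature_dim=2, user_feature_dim=3, search_volume=2.
ad_features = [[0.1, 0.2], [0.3, 0.4]]
user_features = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

X = sketch_build_features(ad_features, user_features, timestep=3)
y = [1, 0]  # assumed auction outcomes (e.g., conversion or no conversion)

# Each row has ad_feature_dim + user_feature_dim + 1 = 6 columns.
print(len(X), len(X[0]))  # 2 6
```

Any estimator that follows the scikit-learn fit/predict convention could then be trained on such an (X, y) pair.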
