Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes

Video link: https://www.youtube.com/watch?v=V0VZEBqCiPw



Duration: 55:16


For slides and more information on the paper, visit https://aisc.ai.science/events/2019-07-29

Discussion lead: Larkin Liu


Motivation:
We survey several Multi-Armed Bandit (MAB) strategies and examine their performance under non-stationary stochastic reward functions combined with delayed feedback. We run MAB simulations modeling an online eCommerce platform for grocery pick-up, optimizing for product availability. We evaluate several popular MAB strategies, including ϵ-greedy, UCB1, and Thompson Sampling, and compare their respective performance in terms of regret minimization. The analysis covers the scenario where the reward function is non-stationary and the process additionally experiences delayed feedback, i.e., the reward is not observed immediately after an arm is played. We devise a new adaptive technique (AG1) tailored to non-stationary reward functions under delayed feedback. The simulation results show that AG1 achieves superior performance, in terms of regret minimization, compared to traditional MAB strategies.
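
As a rough illustration of the setting described above, the sketch below (not code from the paper; AG1 is not reproduced here) simulates a Bernoulli bandit whose arm means drift sinusoidally (non-stationarity) and whose rewards arrive after a fixed delay, then compares the cumulative regret of ϵ-greedy, UCB1, and Thompson Sampling. The drift schedule, delay length, horizon, and ϵ value are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
K, T, DELAY = 3, 5000, 50          # arms, horizon, feedback delay (steps) -- assumed values

def drifting_means(t):
    # Slowly oscillating Bernoulli means -> non-stationary reward function.
    return 0.5 + 0.4 * np.sin(2 * np.pi * (t / T) + np.arange(K))

def run(select):
    counts = np.zeros(K)           # observed pulls per arm
    sums = np.zeros(K)             # observed reward sums per arm
    pending = []                   # (arrival_time, arm, reward): delayed-feedback buffer
    regret = 0.0
    for t in range(T):
        # Release feedback whose delay has elapsed (constant delay -> FIFO order).
        while pending and pending[0][0] <= t:
            _, a, r = pending.pop(0)
            counts[a] += 1
            sums[a] += r
        arm = select(t, counts, sums)
        mu = drifting_means(t)
        reward = rng.random() < mu[arm]
        pending.append((t + DELAY, arm, reward))
        regret += mu.max() - mu[arm]   # expected per-step regret vs. best arm now
    return regret

def eps_greedy(t, counts, sums, eps=0.1):
    # Explore with probability eps (or while some arm is unobserved), else exploit.
    if rng.random() < eps or counts.min() == 0:
        return int(rng.integers(K))
    return int(np.argmax(sums / counts))

def ucb1(t, counts, sums):
    # Empirical mean plus confidence radius; pull each arm once first.
    if counts.min() == 0:
        return int(np.argmin(counts))
    return int(np.argmax(sums / counts + np.sqrt(2 * np.log(t + 1) / counts)))

def thompson(t, counts, sums):
    # Sample each arm's mean from a Beta posterior and play the best sample.
    return int(np.argmax(rng.beta(sums + 1, counts - sums + 1)))

for name, fn in [("eps-greedy", eps_greedy), ("UCB1", ucb1), ("Thompson", thompson)]:
    print(f"{name:>10}: cumulative regret = {run(fn):.1f}")

Note that all three baselines above keep running averages over the full history, which is exactly what hurts them when the reward function drifts; a common adaptation is to replace those running sums with discounted or sliding-window estimates so that stale observations are forgotten. The paper's AG1 technique targets the same issue adaptively, with the delayed-feedback buffer compounding the staleness problem.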




Other Videos By LLMs Explained - Aggregate Intellect - AI.SCIENCE


2019-09-04 Overview of Reinforcement Learning | AISC
2019-09-03 Ernie 2.0: A Continual Pre-Training Framework for Language Understanding | AISC
2019-08-28 Consistency by Agreement in Zero-shot Neural Machine Translation | AISC
2019-08-26 TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing | AISC
2019-08-21 Science of science: Identifying Fundamental Drivers of Science | AISC
2019-08-19 AI Product Stream Meet and Greet | AISC
2019-08-12 [Original ResNet paper] Deep Residual Learning for Image Recognition | AISC
2019-08-11 [GAT] Graph Attention Networks | AISC Foundational
2019-08-06 XLNet: Generalized Autoregressive Pretraining for Language Understanding | AISC
2019-07-31 Overview of Generative Adversarial Networks | AISC
2019-07-29 Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes
2019-07-22 AISC Abstract Night
2019-07-15 The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words & Sentences From Natural Supervision
2019-07-10 TF-Encrypted: Private machine learning in tensorflow with secure computing | AISC Lunch & Learn
2019-07-08 Unsupervised Data Augmentation | AISC
2019-07-04 Mathematics of Deep Learning Overview | AISC Lunch & Learn
2019-07-02 Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling
2019-06-26 Neural Models of Text Normalization for Speech Applications | AISC Author Speaking
2019-06-24 Assessing Modeling Variability in Autonomous Vehicle Accelerated Evaluation
2019-06-20 AISC Abstract Night June 20 2019
2019-06-17 Learnability can be undecidable | AISC