Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes

Video link: https://www.youtube.com/watch?v=V0VZEBqCiPw



Duration: 55:16


For slides and more information on the paper, visit https://aisc.ai.science/events/2019-07-29

Discussion lead: Larkin Liu


Motivation:
We survey several Multi-Armed Bandit (MAB) strategies and examine their performance under non-stationary stochastic reward functions combined with delayed feedback. We run MAB simulations modeling an online eCommerce platform for grocery pick-up, optimizing for product availability. We evaluate several popular MAB strategies, including ϵ-greedy, UCB1, and Thompson Sampling, and compare their respective performance in terms of regret minimization. The analysis covers the scenario where the reward function is non-stationary and the process additionally experiences delayed feedback, i.e., the reward is not observed immediately after an arm is played. We devise a new adaptive technique (AG1) tailored to non-stationary reward functions under delayed feedback. The simulation results show that AG1 achieves superior performance, in terms of regret minimization, compared to traditional MAB strategies.
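
As a rough illustration of the setting described above, the sketch below (not code from the paper; AG1 is not reproduced here) simulates a Bernoulli bandit whose arm means drift sinusoidally (non-stationarity) and whose rewards arrive after a fixed delay, then compares the cumulative regret of ϵ-greedy, UCB1, and Thompson Sampling. The drift schedule, delay length, horizon, and ϵ value are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
K, T, DELAY = 3, 5000, 50          # arms, horizon, feedback delay (steps) -- assumed values

def drifting_means(t):
    # Slowly oscillating Bernoulli means -> non-stationary reward function.
    return 0.5 + 0.4 * np.sin(2 * np.pi * (t / T) + np.arange(K))

def run(select):
    counts = np.zeros(K)           # observed pulls per arm
    sums = np.zeros(K)             # observed reward sums per arm
    pending = []                   # (arrival_time, arm, reward): delayed-feedback buffer
    regret = 0.0
    for t in range(T):
        # Release feedback whose delay has elapsed (constant delay -> FIFO order).
        while pending and pending[0][0] <= t:
            _, a, r = pending.pop(0)
            counts[a] += 1
            sums[a] += r
        arm = select(t, counts, sums)
        mu = drifting_means(t)
        reward = rng.random() < mu[arm]
        pending.append((t + DELAY, arm, reward))
        regret += mu.max() - mu[arm]   # expected per-step regret vs. best arm now
    return regret

def eps_greedy(t, counts, sums, eps=0.1):
    # Explore with probability eps (or while some arm is unobserved), else exploit.
    if rng.random() < eps or counts.min() == 0:
        return int(rng.integers(K))
    return int(np.argmax(sums / counts))

def ucb1(t, counts, sums):
    # Empirical mean plus confidence radius; pull each arm once first.
    if counts.min() == 0:
        return int(np.argmin(counts))
    return int(np.argmax(sums / counts + np.sqrt(2 * np.log(t + 1) / counts)))

def thompson(t, counts, sums):
    # Sample each arm's mean from a Beta posterior and play the best sample.
    return int(np.argmax(rng.beta(sums + 1, counts - sums + 1)))

for name, fn in [("eps-greedy", eps_greedy), ("UCB1", ucb1), ("Thompson", thompson)]:
    print(f"{name:>10}: cumulative regret = {run(fn):.1f}")

Note that all three baselines above keep running averages over the full history, which is exactly what hurts them when the reward function drifts; a common adaptation is to replace those running sums with discounted or sliding-window estimates so that stale observations are forgotten. The paper's AG1 technique targets the same issue adaptively, with the delayed-feedback buffer compounding the staleness problem.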




Other Videos By LLMs Explained - Aggregate Intellect - AI.SCIENCE


2019-09-04 Overview of Reinforcement Learning | AISC
2019-09-03 Ernie 2.0: A Continual Pre-Training Framework for Language Understanding | AISC
2019-08-28 Consistency by Agreement in Zero-shot Neural Machine Translation | AISC
2019-08-26 TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing | AISC
2019-08-21 Science of science: Identifying Fundamental Drivers of Science | AISC
2019-08-19 AI Product Stream Meet and Greet | AISC
2019-08-12 [Original ResNet paper] Deep Residual Learning for Image Recognition | AISC
2019-08-11 [GAT] Graph Attention Networks | AISC Foundational
2019-08-06 XLNet: Generalized Autoregressive Pretraining for Language Understanding | AISC
2019-07-31 Overview of Generative Adversarial Networks | AISC
2019-07-29 Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes
2019-07-22 AISC Abstract Night
2019-07-15 The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words & Sentences From Natural Supervision
2019-07-10 TF-Encrypted: Private machine learning in tensorflow with secure computing | AISC Lunch & Learn
2019-07-08 Unsupervised Data Augmentation | AISC
2019-07-04 Mathematics of Deep Learning Overview | AISC Lunch & Learn
2019-07-02 Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling
2019-06-26 Neural Models of Text Normalization for Speech Applications | AISC Author Speaking
2019-06-24 Assessing Modeling Variability in Autonomous Vehicle Accelerated Evaluation
2019-06-20 AISC Abstract Night June 20 2019
2019-06-17 Learnability can be undecidable | AISC