Multi-Layered Perceptron (Part 2: Optimizers, Cross Entropy & TensorFlow)
Video Link: https://www.youtube.com/watch?v=J5ST4xMbD-o
Notebook and Slides can be found at: https://github.com/tanchongmin/TensorFlow-Implementations
Correction to my Explanation: Batch Gradient Descent actually gets stuck in local optima more easily, because its updates are smoother than those of Stochastic Gradient Descent and therefore less likely to "bump out" of local optima.
The reason Batch Gradient Descent may still be preferred is that it requires less computation time on the backward pass and, in general, heads towards the minimum more directly. With unlimited compute time, it would be ideal to use Stochastic Gradient Descent instead, as its noisier updates help it escape local optima more easily. Minibatch Gradient Descent is a compromise between Batch and Stochastic Gradient Descent and is what is typically used in practice.
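As a rough sketch of how the three variants differ in practice: in Keras the choice comes down to the batch_size argument of model.fit. The toy data, model, and hyperparameters below are placeholders for illustration only, not the ones from the notebook.

import numpy as np
import tensorflow as tf

# Toy regression data (hypothetical, for illustration only)
x = np.random.rand(1000, 8).astype("float32")
y = np.random.rand(1000, 1).astype("float32")

def make_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")
    return model

# Batch Gradient Descent: one smooth update per epoch using the full dataset
make_model().fit(x, y, batch_size=len(x), epochs=5, verbose=0)

# Stochastic Gradient Descent: one noisy update per example (can escape local optima)
make_model().fit(x, y, batch_size=1, epochs=5, verbose=0)

# Minibatch Gradient Descent: the usual compromise (e.g. 32 examples per update)
make_model().fit(x, y, batch_size=32, epochs=5, verbose=0)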