Optimization Algorithms for Deep Learning
Deep Learning is a branch of Artificial Intelligence based on the architecture of Neural Networks. When the number of hidden layers in a neural network is increased, it becomes a Deep Learning Neural Network. Prominent Deep Learning architectures include the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN).

The Gradient Descent optimization technique has been used successfully in many Machine Learning models. However, Gradient Descent is slow to converge for Deep Learning models such as CNNs and RNNs. Convergence means iteratively moving towards the minimum point of the cost function.

Recently, many new optimization algorithms based on Momentum have been introduced, and these converge faster than Gradient Descent. Other optimization algorithms are based on slowing down the learning rate as training approaches convergence. These algorithms include AdaGrad (Adaptive Gradient), RMSProp (Root Mean Square Propagation), and Adam (Adaptive Moments). This talk will cover the details of these optimization algorithms and discuss the advantages they offer over the Gradient Descent algorithm.
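The update rules behind these optimizers can be sketched in a few lines of NumPy. The following is a minimal illustration, not any library's implementation: each function applies one parameter update, the hyperparameter defaults (`lr`, `beta`, `eps`) are conventional choices assumed for the example, and the closing loop simply minimizes f(w) = w² with Adam.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain gradient descent: step against the gradient."""
    return w - lr * grad

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    """Momentum: accumulate a velocity so consistent gradients speed up."""
    v = beta * v + grad
    return w - lr * v, v

def adagrad_step(w, grad, s, lr=0.01, eps=1e-8):
    """AdaGrad: accumulate squared gradients, shrinking the effective rate."""
    s = s + grad ** 2
    return w - lr * grad / (np.sqrt(s) + eps), s

def rmsprop_step(w, grad, s, lr=0.01, beta=0.9, eps=1e-8):
    """RMSProp: exponential moving average of squared gradients instead."""
    s = beta * s + (1 - beta) * grad ** 2
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam_step(w, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: Momentum plus RMSProp, with bias correction for both moments."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # correct the zero-initialized first moment
    v_hat = v / (1 - b2 ** t)   # correct the zero-initialized second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy run: minimize f(w) = w^2 (gradient 2w) with Adam, starting at w = 5.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.1)
```

In a real network, `w` and `grad` would be weight and gradient tensors, but the same scalar recurrences apply element-wise.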