Deep learning is an exceptionally iterative process. One needs to evaluate different combinations of hyperparameters to find out which mix works best. It is therefore important that a deep learning model trains quickly without defeating the purpose of the model. In this article you'll learn about the most important optimization algorithms in deep learning.
WHAT ARE OPTIMIZATION ALGORITHMS AND WHY ARE THEY NEEDED?
Optimization algorithms are procedures executed iteratively, comparing successive results until a satisfactory one is found. For example, given a function g(x), an optimization algorithm can help either minimize or maximize the value of g(x). In deep learning, optimization algorithms are used to train a neural network by optimizing its cost function C.
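To make the idea concrete, here is a minimal sketch of iteratively minimizing a function g(x) by gradient descent. The function, starting point, and learning rate are illustrative choices, not from the article:

```python
# Minimal sketch: gradient descent minimizing g(x) = (x - 3)^2 + 1.
# The minimum is at x = 3, where g(x) = 1.

def g(x):
    return (x - 3) ** 2 + 1

def g_grad(x):
    return 2 * (x - 3)          # derivative of g

x = 0.0                         # illustrative starting point
lr = 0.1                        # illustrative learning rate (step size)
for _ in range(100):
    x -= lr * g_grad(x)         # step against the gradient

print(round(x, 4))              # converges toward the minimizer x = 3
```

Each iteration compares the current result to the previous one implicitly: the update moves x in the direction that decreases g(x), and it stops improving once the gradient is near zero.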
The value of the cost function C is computed as the mean of the loss L between the estimated value y' and the actual value y. The value y' is obtained during the forward propagation step and depends on the parameters of the network, the weights W and biases b. With the help of optimization algorithms, one can minimize the cost function C by adjusting the values of the trainable parameters W and b.
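This loop — forward propagation to get y', a mean loss C, then gradient updates to W and b — can be sketched for the simplest possible "network", a single linear layer with squared-error loss. The data shapes, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

# Sketch: train a single linear layer y' = X @ W + b by minimizing
# C = mean((y' - y)^2). All data and constants are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # inputs
true_W = np.array([1.0, -2.0, 0.5])        # parameters used to make targets
true_b = 0.7
y = X @ true_W + true_b                    # actual values

W = np.zeros(3)                            # trainable weights
b = 0.0                                    # trainable bias
lr = 0.1
for _ in range(500):
    y_pred = X @ W + b                     # forward propagation -> y'
    err = y_pred - y
    C = np.mean(err ** 2)                  # cost = mean of per-example losses
    grad_W = 2 * X.T @ err / len(y)        # dC/dW
    grad_b = 2 * err.mean()                # dC/db
    W -= lr * grad_W                       # update trainable parameters
    b -= lr * grad_b
```

After enough iterations the learned W and b approach the parameters that generated the data, and C approaches its minimum.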
IMPORTANT OPTIMIZATION ALGORITHMS
The most important optimization algorithms currently are those that can solve constrained, non-linear, non-smooth, large-scale optimization problems, as these challenging problems are of growing importance in modern machine learning.
These are mainly first-order (i.e. gradient-based) methods that solve a relaxed problem, such as an augmented Lagrangian formulation. Some of these methods are:
*The alternating direction methods / coordinate descent with augmented Lagrange multipliers
*The proximal gradient methods such as ISTA and, more importantly, their (Nesterov) accelerated versions such as FISTA
*The primal/dual methods
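As an illustration of the proximal gradient family, here is a sketch of ISTA applied to the lasso problem, minimize 0.5·||Ax − b||² + λ·||x||₁. The smooth part is handled by a gradient step and the non-smooth L1 term by its proximal operator (soft thresholding). The data, λ, and iteration count are illustrative assumptions:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each entry toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    # ISTA: gradient step on the smooth part, then a proximal step.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)           # gradient of 0.5*||Ax - b||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Illustrative sparse-recovery setup.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.0, 0.5]              # only three non-zero coefficients
b = A @ x_true
x_hat = ista(A, b, lam=0.1)                # recovers a sparse estimate
```

The accelerated variant (FISTA) adds a Nesterov momentum term between iterations, improving the convergence rate from O(1/k) to O(1/k²) without changing the per-iteration cost.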
The other major categories can be classified as:
*Gradient-based methods that use first-order information, e.g. batch gradient descent and stochastic gradient descent (SGD)
*Gradient-based methods that use second-order information, either by computing the Hessian or approximating it, e.g. Newton's method, conjugate gradient, and scaled conjugate gradient
*Search-based techniques, for example genetic algorithms and simulated annealing. These techniques usually do not require the function being optimized to be differentiable; they try to find a solution by sampling from a probability distribution.
*Backtracking gradient descent and its modifications, which offer some of the strongest convergence guarantees among gradient descent methods and other iterative methods
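Backtracking gradient descent replaces a fixed learning rate with a line search: at each iteration the step size is shrunk until a sufficient-decrease (Armijo) condition holds. A minimal one-dimensional sketch, with the test function and constants chosen purely for illustration:

```python
def backtracking_gd(f, grad, x0, alpha=0.3, beta=0.5, n_iter=50):
    # Gradient descent with backtracking (Armijo) line search in 1-D.
    # alpha: sufficient-decrease parameter; beta: step-shrink factor.
    x = x0
    for _ in range(n_iter):
        g = grad(x)
        t = 1.0
        # Shrink the step until the Armijo condition holds:
        #   f(x - t*g) <= f(x) - alpha * t * g^2
        while f(x - t * g) > f(x) - alpha * t * g * g:
            t *= beta
        x -= t * g
    return x

# Illustrative test function with its minimum at x = 2.
f = lambda x: (x - 2.0) ** 2
grad = lambda x: 2 * (x - 2.0)
x_min = backtracking_gd(f, grad, x0=0.0)
```

Because each accepted step is guaranteed to decrease f, this scheme converges without any manual tuning of the learning rate, which is the source of its strong theoretical guarantees.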
There are other optimization algorithms that also work effectively.
Optimization is one of the major components of deep learning. Most deep learning algorithms aim to build an optimization model that learns the parameters of the objective function from the given data. This practice can be extremely helpful in getting the desired output.
The algorithms above are some of the most important optimization algorithms in deep learning.