Charan H U
2 min read · Jan 14, 2023

Batch SGD with Momentum.

Let's say you have a big bag of marbles. Each marble represents one training example, like a picture or a row of data. We want to use all of these marbles to train a machine learning model that can make predictions.

But it's hard to use all the marbles at once: it would take a long time and a lot of computational power. So instead, we take a small handful of marbles from the bag at a time and use just those to update our model. We call this small handful a "batch" (or, more precisely, a mini-batch). We keep drawing batches from the bag and using them to update the model until we've used every marble; one full pass through the bag is called an epoch. This is what we call "Batch Stochastic Gradient Descent", as in the sketch below.
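As a rough sketch of the batching idea (in Python with NumPy; grad_fn here is a hypothetical function, not from the article, that returns the gradient of the loss on a batch), one epoch of batch SGD might look like this:

```python
import numpy as np

def minibatch_sgd_epoch(X, y, w, grad_fn, lr=0.01, batch_size=32):
    """One pass (epoch) over the whole bag of marbles, batch by batch."""
    order = np.random.permutation(len(X))        # shuffle the marbles
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]    # grab a small handful
        w = w - lr * grad_fn(X[idx], y[idx], w)  # update from this batch only
    return w
```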

Now, imagine you're on a giant trampoline trying to land on a specific spot. As you jump, you might overshoot the spot you're aiming for. You'd want to take the momentum of your previous jumps into account and use it to adjust your trajectory, so that you land closer to your target.

Something similar happens when training a machine learning model. The updates can oscillate around or overshoot the optimal solution, especially when the learning rate is large. So we add an extra term to the update rule, called the "momentum term", which nudges each update in the direction the model was already moving. This dampens the oscillations and can help the model converge to the optimal solution more quickly.
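In code, the momentum idea is just one extra line: keep a running "velocity" that blends the new gradient with the direction we were already moving. A minimal sketch of one common formulation (beta, often around 0.9, controls how much of the past direction is kept):

```python
def momentum_step(w, v, g, lr=0.01, beta=0.9):
    """One update with momentum: w = weights, v = velocity, g = current gradient."""
    v = beta * v + g   # blend the new gradient with the previous direction
    w = w - lr * v     # step along the accumulated velocity, not just g
    return w, v
```

With beta = 0 this reduces to plain SGD; a larger beta makes the updates smoother and harder to knock off course.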

So Batch SGD with momentum is a way of training a machine learning model where we use a small batch of data at a time and take the previous updates into account to make each new update more effective.
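Putting the two pieces together, here is a toy end-to-end sketch (fitting a small linear model with a squared-error loss; all numbers and names are illustrative assumptions, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # the "bag of marbles"
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)   # noisy targets

w = np.zeros(5)           # model weights
v = np.zeros(5)           # momentum velocity
lr, beta, batch_size = 0.05, 0.9, 32

for epoch in range(20):
    order = rng.permutation(len(X))            # shuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]  # one small batch
        err = X[idx] @ w - y[idx]
        g = 2 * X[idx].T @ err / len(idx)      # gradient of mean squared error
        v = beta * v + g                       # momentum: remember the direction
        w -= lr * v                            # update along the velocity

print(np.round(w, 2))  # should land close to true_w
```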

#BatchSGD #StochasticGradientDescent #MomentumOptimization #MachineLearningOptimization #DeepLearningOptimization #NeuralNetworksTraining


Written by Charan H U

Applied AI Engineer | Internet Content Creator
