Epochs and batching

The terms "epoch" and "batch" are used in the context of training. You can configure the number of epochs by changing the value of the input parameter "epochs" when starting a training session, and you can configure the batch size in the left panel of the visual editor.

One epoch is defined as one iteration over all available training data. That is, one epoch has passed when the model has been trained on the entire input dataset once. A single epoch is often not enough to achieve the best possible accuracy, so we may want to train for 10, 100 or even more epochs to improve the performance of our final model. Training time grows approximately linearly with the number of epochs.
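
On decthings the training loop lives in your own model code, so the exact details depend on your implementation. As a minimal sketch of the concept, assuming a PyTorch-style setup (the model, data and learning rate below are hypothetical placeholders):

```python
import torch
from torch import nn, optim

# Hypothetical toy data: 1000 samples with 10 features each.
inputs = torch.randn(1000, 10)
targets = torch.randn(1000, 1)

model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

epochs = 100  # plays the role of the "epochs" input parameter
for epoch in range(epochs):
    # One epoch = one full pass over the entire dataset.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```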

The batch size, which can be changed in the left panel of the visual editor, configures how many input samples to train on at a time. When using a batch size greater than one, multiple input elements are loaded and processed simultaneously. So how do we know what batch size to choose? First, a large batch size can make training faster: modern computers can often process many things in parallel, so by loading just a single element at a time we may be wasting some computing power. Second, and perhaps more importantly, it can affect the convergence rate of our model, meaning the number of training steps required may increase or decrease. It can also affect the final model accuracy, either positively or negatively. A clear downside of increasing the batch size is memory usage: since the whole batch is loaded into memory, you need to run your model on a launcher with at least as much memory as one batch occupies (and probably a bit more, to account for overhead).
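
To make the relationship between batch size, steps per epoch and memory concrete, here is a small sketch, again assuming a hypothetical PyTorch-style setup:

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset of 1000 samples.
dataset = TensorDataset(torch.randn(1000, 10), torch.randn(1000, 1))

batch_size = 32
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# A larger batch size means fewer optimizer steps per epoch:
# ceil(1000 / 32) = 32 steps instead of 1000 with batch_size=1.
steps_per_epoch = math.ceil(len(dataset) / batch_size)

for batch_inputs, batch_targets in loader:
    # Each iteration holds one whole batch in memory, so peak memory
    # usage grows roughly linearly with batch_size.
    ...
```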

The reason batching can affect the convergence rate, and perhaps also the final accuracy, is that the optimizer minimizes the mean loss over the entire batch, not the loss of each sample individually. Remember, the optimizer's job is to minimize our loss function. By taking the mean over multiple samples, random noise that is inevitably present in the input dataset gets smoothed out. With less random noise, the optimizer can "see more clearly" and do its job more efficiently. But there is also a downside: by reducing the random noise, the final model may not handle edge cases as well. That is, the model may not generalize as well, and may therefore achieve a worse final accuracy.
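
A small sketch of what "optimizing the mean of the batch" looks like, assuming a PyTorch-style loss with the default mean reduction (all names below are hypothetical):

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss(reduction="mean")  # default: average over the batch

batch = torch.randn(32, 10)   # 32 samples processed together
targets = torch.randn(32, 1)

# The optimizer never sees the 32 individual losses, only their mean,
# so per-sample noise is averaged away before the gradient step.
per_sample = ((model(batch) - targets) ** 2).mean(dim=1)  # 32 values
mean_loss = per_sample.mean()                             # 1 value
assert torch.allclose(mean_loss, loss_fn(model(batch), targets))
```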

If you use a dataset where different samples have different sizes, for example images with different widths and heights, then internally each batch will be split into multiple tensors. The reason is that our machine learning implementation can only handle tensors with a consistent shape. This decreases compute performance, because only one tensor is processed at a time. It does not affect the final loss, however, because the tensors are merged again before the optimizer step.
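
The decthings internals are not shown here, but the idea can be sketched in a hypothetical PyTorch-style setup: each variable-size sample is processed as its own tensor, and the losses are merged into one value before the single optimizer step:

```python
import torch
from torch import nn

# Hypothetical samples with different sizes (different numbers of rows)
# that cannot be stacked into one fixed-shape tensor.
model = nn.Linear(8, 1)
samples = [torch.randn(5, 8), torch.randn(12, 8), torch.randn(3, 8)]
targets = [torch.randn(5, 1), torch.randn(12, 1), torch.randn(3, 1)]

loss_fn = nn.MSELoss(reduction="sum")
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Each sample is processed on its own (slower), but the losses are
# merged into one overall mean before the single optimizer step, so
# the gradient matches what a fully batched pass would produce.
optimizer.zero_grad()
total_elements = sum(y.numel() for y in targets)
loss = sum(loss_fn(model(x), y) for x, y in zip(samples, targets)) / total_elements
loss.backward()
optimizer.step()
```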
