BatchNorm

This node normalizes the entire batch so that the total mean is close to zero and the standard deviation is close to one. The mean and standard deviation are calculated per feature in the first dimension of the input. That is, given an input of shape (C, X, Y), the normalization is performed separately for each feature in dimension C. When training, the node learns the mean and standard deviation using a moving average, and when evaluating, this moving average is applied.

Additionally, a learnable weight \(\gamma\) and a bias \(\beta\) are applied after normalization. For an input of shape (C, X, Y), these parameters have size C, meaning they operate per feature in dimension C. The mathematical expression is:

\(y = \frac{x - E[x]}{\sqrt{V[x] + \epsilon}} \cdot \gamma + \beta\)
where \(E[x]\) is the mean, \(V[x]\) is the variance (the standard deviation squared), and \(\epsilon\) is a small configurable constant.
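
To make this concrete, here is a minimal NumPy sketch of the expression above for a batch of inputs that each have shape (C, X, Y); the function name and the eps default are illustrative, not part of decthings:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x is a batch of inputs, shape (N, C, X, Y). E[x] and V[x] are computed
    # per feature in dimension C, over the batch and remaining dimensions.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # E[x], shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)    # V[x], shape (1, C, 1, 1)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta have size C and are applied per feature.
    return x_hat * gamma.reshape(1, -1, 1, 1) + beta.reshape(1, -1, 1, 1)

# Example: a batch of 8 inputs, each of shape (C, X, Y) = (3, 2, 2).
x = np.random.randn(8, 3, 2, 2).astype(np.float32)
y = batch_norm(x, np.ones(3, np.float32), np.zeros(3, np.float32))
```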

Batch normalization is described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. The paper shows that normalizing inputs can significantly reduce training time and can also improve model performance.

The only supported input data type is Float32, and the input must have at least one dimension. In order to determine the sizes of the weight and bias parameters, the size of the first dimension must be known. For example, the shape (?, 2, 2) cannot be used.

The output will have the same shape and data type as the input.

By clicking the node, the following parameters can be configured in the right panel:

  • Epsilon: A small constant added to the variance to avoid dividing by zero.
  • Momentum: Momentum for computing the moving average. A larger value means that the mean and standard deviation are adjusted faster (see the sketch after this list).
  • Use moving average: If false, the moving average is not used when evaluating; instead, the actual mean and standard deviation are computed from the input.
  • Number of connections: Increases the number of connections on this node. The first input leads to the first output, the second input to the second output, and so on. This is useful because even though the connections are separate, they share the same learned parameters, i.e. the learned mean and standard deviation.
  • Weight initializer: Initial value for the weight \(\gamma\). If unsure, use a value of 1.
  • Bias initializer: Initial value for the bias \(\beta\). If unsure, use a value of 0.
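
To illustrate the Momentum and Use moving average parameters, here is one common convention for updating the moving statistics during training. decthings does not publish its exact update rule, so the function name and formula below are assumptions:

```python
# Hypothetical update rule: decthings does not document the exact formula,
# but this convention matches the description above, where a larger
# momentum adjusts the moving mean and variance faster.
def update_moving_stats(running_mean, running_var, batch_mean, batch_var, momentum):
    running_mean = (1.0 - momentum) * running_mean + momentum * batch_mean
    running_var = (1.0 - momentum) * running_var + momentum * batch_var
    return running_mean, running_var

# When evaluating with "Use moving average" enabled, the node would normalize
# using running_mean and running_var instead of the current batch statistics.
```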
