decthings

Training

In machine learning, training is the process of letting a model look at real-world data and learn to recognize, replicate or otherwise make use of the patterns in that data. To train a model created in the visual editor, use an Optimizer node to define your loss function, then go to the model's page, click "train" and provide the necessary input parameters. Let's look in detail at how training works in the visual editor.

The Optimizer node takes one or two inputs. Either way, the values provided to these inputs define what is called the loss function. During training, the optimizer's goal is to make the loss function output as small as possible. For example, if we create a super-simple network with just a single Parameter node connected directly to an Optimizer, training will decrease the value stored in the Parameter node. If we also connect the Parameter node to an Output node, evaluating the model after training shows that the output is a smaller number than before training. This is because the optimizer has reached into the parameter and modified its stored value. The longer you train your model, the smaller the number in the parameter becomes.

Simple parameter setup
A super simple model which when training decreases the value stored in the parameter.
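As an illustration, what the optimizer does in this single-parameter setup can be sketched in plain Python. Here the loss is simply the parameter's value, so its gradient with respect to the parameter is 1, and each gradient-descent step shrinks the stored value. The `train` function, learning rate and step count are hypothetical choices for this sketch, not decthings' actual implementation:

```python
# Hypothetical sketch: in the single-parameter setup, the loss *is* the
# parameter, so d(loss)/d(parameter) = 1 and every gradient-descent step
# decreases the stored value by learning_rate.
def train(parameter, learning_rate=0.1, steps=10):
    for _ in range(steps):
        grad = 1.0                       # d(loss)/d(parameter), since loss = parameter
        parameter -= learning_rate * grad
    return parameter

print(train(5.0))  # prints a value close to 4.0: the parameter has shrunk
```

Training longer (more steps) keeps pushing the value down, which matches what you observe when evaluating the model before and after training.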

If we now connect our parameter not directly to the optimizer, but to a Mul node that multiplies the value by the constant -1, and then connect the output of the Mul node to the optimizer, we will see that the optimizer instead increases the value stored in the parameter. This is because we have changed the loss function to be the negative of what is stored in the parameter, so decreasing the loss function now corresponds to increasing the parameter.

Simple parameter with multiplication setup
A super simple model which when training increases the value stored in the parameter. The value of the constant is -1. Notice that the line from the parameter is blue, but the output from the constant is gray. This is because the value sent by the parameter includes gradients (as described below), while the value sent by the constant does not. The output of the Mul node also includes gradients, because the Mul node forwards incoming gradients.
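The Mul-by-minus-one variant can be sketched the same way: the loss is now the negative of the parameter, its gradient with respect to the parameter is -1, and subtracting that gradient therefore increases the stored value. Again a hypothetical sketch, not the editor's actual implementation:

```python
# Hypothetical sketch: loss = -parameter, so d(loss)/d(parameter) = -1.
# Subtracting a negative gradient *increases* the parameter each step.
def train_negated(parameter, learning_rate=0.1, steps=10):
    for _ in range(steps):
        grad = -1.0                      # d(-parameter)/d(parameter)
        parameter -= learning_rate * grad
    return parameter

print(train_negated(5.0))  # prints a value close to 6.0: the parameter has grown
```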



The natural question to ask now is: how does the optimizer actually know what to modify in order to decrease the loss function? A network could be extremely complex, with convolutional layers, tensors being concatenated and then split into parts, and millions of parameters that each serve a specific purpose in the model. Even in the most complex case, the optimizer knows precisely how to modify the parameters of each node so that the loss function becomes smaller. It turns out the magic lies in mathematics, in the concept of gradients.

A gradient is the mathematical concept that, in machine learning and AI, actually allows us to train a model. In the visual editor, values (also called "tensors") propagate through nodes: starting at an input, constant, or other node, travelling through "lines", being transformed by mathematical operations along the way, and finally ending up at an output or optimizer. During training, not only do values flow from one node to the next; gradients travel along with them. The gradients "record" everything that happens along the way, giving the optimizer an understanding of everything that has happened to the value. The optimizer can therefore modify the appropriate parameters in order to minimize the loss function.
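This "recording" idea can be sketched as a tiny reverse-mode autodiff in plain Python: each value remembers which values it came from and the local derivative of the operation that produced it, and `backward` replays that record with the chain rule. The `Value` class and its API are illustrative only, not decthings' actual implementation:

```python
# Illustrative sketch of gradients "recording" operations, in the style
# of reverse-mode automatic differentiation.
class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # the values this value came from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # record both parents and the local derivatives of a product
        return Value(self.data * other.data,
                     parents=(self, other),
                     local_grads=(other.data, self.data))

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data,
                     parents=(self, other),
                     local_grads=(1.0, 1.0))

    def backward(self, upstream=1.0):
        # chain rule: accumulate the upstream gradient here, then push it
        # to each parent scaled by the recorded local derivative
        self.grad += upstream
        for parent, local in zip(self._parents, self._local_grads):
            parent.backward(upstream * local)

p = Value(3.0)        # a "parameter" node
loss = p * -1.0       # the Mul node with constant -1 from the example above
loss.backward()
print(p.grad)         # prints -1.0: the optimizer now knows to increase p
```

Because the gradient of the loss with respect to `p` is -1, a gradient-descent step on this loss increases `p`, exactly as described for the Mul setup above.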

Some "lines" in the visual editor send only values, and some send both values and gradients. You can tell whether gradients are being sent by checking whether a "line" (a connection between two nodes) is blue or gray: blue means gradients are included in that tensor, and gray means they are not. If gradients are included, you can connect that value to an optimizer and it will train your model. If gradients are not included, there is in effect nothing to optimize for that value. For example, it is not possible to connect a constant node to an optimizer, because a constant does not include gradients with its output. Such a connection would not make sense, because for a constant there is nothing that can be changed. For there to be something to change, you must either use a Parameter node, or wire your tensors through a node that contains parameters, such as Linear, Conv2D or BatchNorm.

Some nodes, such as Add, Reshape and Activation, propagate incoming gradients to their output. This means the values output from these nodes can be optimized, provided the input to the node has gradients. Other nodes, such as Compare, do not output any gradients even if their input has them. It is therefore possible to "lose" gradients by using nodes that cannot propagate them forward. Most nodes can propagate gradients, however; the reason Compare cannot is that it handles Boolean values. Make sure to check whether your connections are blue or gray to see how gradients flow through your network.
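The difference between gradient-propagating nodes and gradient-dropping nodes can be sketched the same way, with a flag carried along each connection. The `Tensor`, `add` and `compare` names below are hypothetical, not the editor's actual API:

```python
# Illustrative sketch of gradient propagation: Add forwards gradients,
# while Compare drops them because its output is Boolean.
class Tensor:
    def __init__(self, data, has_gradients):
        self.data = data
        self.has_gradients = has_gradients  # blue (True) or gray (False)

def add(a, b):
    # Add propagates gradients: output is blue if either input is blue
    return Tensor(a.data + b.data, a.has_gradients or b.has_gradients)

def compare(a, b):
    # Compare outputs a Boolean, which cannot carry gradients, so the
    # gradient chain stops here no matter how blue the inputs were
    return Tensor(a.data > b.data, has_gradients=False)

param = Tensor(2.0, has_gradients=True)    # blue
const = Tensor(1.0, has_gradients=False)   # gray

print(add(param, const).has_gradients)     # prints True: still optimizable
print(compare(param, const).has_gradients) # prints False: gradients were lost
```

If the gray output of `compare` were wired toward an optimizer, there would be nothing left to train, which is exactly what a gray connection in the editor warns you about.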
