decthings

Create an image classifier using the visual editor

In this guide, we will create an image classifier using the visual editor. We will build the model from basic building blocks and then train it on the CIFAR-10 dataset, which contains 50,000 images, each with an assigned class. There are 10 different classes in the dataset: "airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship" and "truck". Here are three samples:

Truck

Cat

Airplane

You do not need to use the CIFAR-10 dataset to follow along with this guide. You can apply the same steps to train a model on your own dataset.

We will apply the same steps as in the guide Train a pre-made image classifier using the visual editor, but instead of using the pre-made model EfficientNet, we will create our model entirely from scratch.

1. Create your model

To create a new model, go to the models page, click "create" and select "visual editor" as the model type. Once it is ready, click "Go to the visual editor" on the model page.

2. Create a new version

Before we can begin editing, we need to add a new version. Click the button "New version" at the top of the page. Then, select this new version.

3. Clear all nodes

By default, your model contains a few nodes. Let's start with a fresh canvas - delete each node by clicking on it and pressing delete on the panel on the right.

4. Define inputs and outputs

Let's start building the model! The first thing we need to do is create input and output points. The input to our network is an image, so create an input by clicking "Input" on the panel on the left. Click on the input node and, on the panel on the right, select "Image" as the data type and give it the name "data". We also need an input for the class. To create this input, click "Train-input" on the panel on the left. Click on your new node, select the input type "String" and give it the name "class".

A train-input is an input node that is only used when training, not when evaluating the model. When training, we know the classes because they are included in the dataset, but when evaluating, we don't have that information - it is what we want to calculate!

Important! Name your inputs exactly "data" and "class". This is required because the CIFAR-10 dataset is defined with the names "data" and "class", so you need to use the exact same names. If your dataset has other entries, you should name the inputs appropriately.

Lastly, create an output node by clicking on "Output" on the panel on the left, and give it any name you'd like. Set the data type of this output to dictionary, and assign it two entries - one of type Float32, and one of type String. Optionally, name the float entry "probability" and the string entry "class". Then, set the "Dimensions" field to 1. This is what we have so far:

Input output customization
A little explanation on what we see here:

The text "Float32" on the image input means that the input image is converted into 32-bit floating point numbers between 0 and 1, where 0 is the darkest color and 1 is the brightest.

The text "(3, ?, ?)" is the "shape" of the value. A shape is a list of numbers, where each number is the number of elements in that dimension. For example, the shape "(3)" tells us that the value is a list containing 3 numbers, and the shape "(3, 5)" tells us that the value is a list containing 3 lists that each contain 5 numbers. A question mark, as we have here, means that we cannot know in advance how many elements there will be.

The shape "(3, ?, ?)" represents (channels, height, width): our image contains three channels (red, green and blue), and the height and width of the image are unknown.
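To make the shape concept concrete, here is a small sketch in TypeScript (purely illustrative, not part of the editor) that computes how many elements a shape describes, treating "?" as an unknown dimension:

```typescript
// Illustrative only: represent a shape as a list of dimension sizes,
// where null stands for an unknown size (shown as "?" in the editor).
type Shape = (number | null)[];

// Total number of elements, or null if any dimension is unknown.
function elementCount(shape: Shape): number | null {
    let count = 1;
    for (const dim of shape) {
        if (dim === null) return null;
        count *= dim;
    }
    return count;
}

console.log(elementCount([3, 5]));          // 15: three lists of five numbers
console.log(elementCount([3, null, null])); // null: height and width unknown
```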

Here is the configuration of the output node:

Output properties
5. Create the convolutional layers

Connect your image input to a resize image node and enter the size 32x32. To connect, click the output port and drag. We know that the images in our dataset already have this size, but the resize ensures that evaluation inputs are that size as well.

We will now create the structure of our neural network. We will use two convolutional layers, and two linear (also known as "dense" or "fully connected") layers. Convolutional layers are very useful for detecting features in images. Linear layers are useful for converting these detected features into a final classification.

Each layer will consist of either a 2d convolution node or a linear node, followed by one activation node of type ReLU. In the convolutional layers, we will also use one pooling node of type max pooling. An activation function is used to introduce non-linearity into our network. ReLU is a very simple, and often effective, activation function. For each element in the input, it outputs zero if that element is negative, or the value itself if it is positive:

ReLU (x) = max(x, 0)

Max pooling works by splitting the data into chunks and, for each chunk, outputting only the maximum value in that chunk. This produces an output that is smaller than the input, and makes the network less sensitive to small changes in the input. For example, if a single pixel moves, we do not expect a completely different output - pooling helps ensure that this is not the case.
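As a rough sketch (plain TypeScript on one-dimensional lists, not the editor's tensor implementation), ReLU and max pooling could look like this:

```typescript
// ReLU: for each element, output zero if negative, the value itself otherwise.
function relu(x: number[]): number[] {
    return x.map(v => Math.max(v, 0));
}

// 1-D max pooling: split the input into chunks of `size` and keep only
// the maximum value of each chunk, producing a smaller output.
function maxPool1d(x: number[], size: number): number[] {
    const out: number[] = [];
    for (let i = 0; i + size <= x.length; i += size) {
        out.push(Math.max(...x.slice(i, i + size)));
    }
    return out;
}

console.log(relu([-2, -0.5, 0, 3]));     // [0, 0, 0, 3]
console.log(maxPool1d([1, 4, 2, 8], 2)); // [4, 8]
```

The 2-D versions used in our network work the same way, but on chunks of pixels rather than chunks of a list.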

Connect the output from the resize image node to a Conv2D node, with filters set to 8 and kernel size 5 for both width and height. Connect this to an activation node and select ReLU, and lastly to a MaxPool2D with the default settings. Then, for the second convolutional layer, create a Conv2D node with 16 filters and kernel size 5 for both width and height. Connect this to another ReLU activation and MaxPool2D. We have now created two layers:

Visual editor
6. Create the linear layers

We will now create the two final layers of our network, which will use the linear node. The linear node operates only on the last dimension of the data, but our data is three-dimensional. For this reason, we need to reshape it from the shape "(16, 5, 5)" to a shape with just a single dimension. The value with shape "(16, 5, 5)" has 400 elements (16 * 5 * 5 = 400), so let's reshape it to the shape "(400)". To do this, add a "Reshape" node and then click on it and enter "400" in the "Shape" field. The reshape node only works if the number of elements is the same as before.
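The constraint that reshaping preserves the element count can be illustrated with a small sketch that flattens a (16, 5, 5) value into a single dimension of 400 elements:

```typescript
// Illustrative only: reshaping (16, 5, 5) to (400) is just flattening
// the nested lists, which is valid because 16 * 5 * 5 = 400.
function flatten(x: number[][][]): number[] {
    return x.flat(2);
}

// Build a (16, 5, 5) value filled with zeros.
const value = Array.from({ length: 16 }, () =>
    Array.from({ length: 5 }, () => Array.from({ length: 5 }, () => 0)));

console.log(flatten(value).length); // 400
```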

Now, add a "Linear" node and set "Units" to 130. Connect this to an activation of type ReLU. Finally, connect this to another linear node with 10 units.

Visual editor

We now have our entire neural network! The final linear layer outputs 10 numbers, and our aim is for each of these to represent the probability for each of the 10 output classes.

7. Convert the classes to numbers

In order to use our text classes in our network we need to convert them to numbers. We can do this using a "StringLookup" node. This node uses a vocabulary to convert each input text to an integer. In our case, the text "airplane" would be converted to 0, "automobile" to 1, "bird" to 2, and so on.

Start by creating the vocabulary. Add a "Constant" node, select the data type "String" and set the shape to "10". Add all the labels of your dataset to the constant, as shown in the image below. Then, add a "StringLookup" node and connect the class input and vocabulary.

Visual editor

The output from the string lookup node will now be an integer between 0 and 9, where the first vocabulary entry ("airplane") will correspond to the integer 0, the second ("automobile") to 1, etc.
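The lookup can be sketched in plain TypeScript - the vocabulary below matches the constant we created, and the function itself is purely illustrative:

```typescript
// The vocabulary, in the same order as the entries of our constant node.
const vocabulary = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                    'dog', 'frog', 'horse', 'ship', 'truck'];

// Convert a class name to its index in the vocabulary.
function stringLookup(label: string): number {
    const index = vocabulary.indexOf(label);
    if (index === -1) throw new Error(`Unknown class: ${label}`);
    return index;
}

console.log(stringLookup('airplane')); // 0
console.log(stringLookup('bird'));     // 2
```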

8. Connect optimizer

In order to train our network, we need an "Optimizer" node. The optimizer node defines what to train - in this case, we want to compare the output from our neural network to the output from our string lookup. The difference between these two is called the "loss". When training, the optimizer will automatically attempt to modify our neural network such that the loss becomes as small as possible. The lower the loss, the better our neural network will be at classifying our images.
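For intuition, the cross-entropy loss for a single sample can be sketched as the negative log of the probability the network assigned to the correct class (an illustration only, not the optimizer's actual implementation):

```typescript
// Cross-entropy for one sample: -log(probability of the correct class).
// The more confident and correct the prediction, the smaller the loss.
function crossEntropy(probabilities: number[], targetIndex: number): number {
    return -Math.log(probabilities[targetIndex]);
}

// Confident, correct prediction: small loss.
console.log(crossEntropy([0.9, 0.05, 0.05], 0)); // ≈ 0.105
// Unconfident prediction: larger loss.
console.log(crossEntropy([0.34, 0.33, 0.33], 0)); // ≈ 1.079
```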

Create an "Optimizer" node, select the "Loss" called "Cross entropy" and set the learning rate to 0.001. Connect the output of the last linear node to the "input" port of the optimizer, and connect the output of the string lookup to the "target" of the optimizer.

Also, configure the batch size on the left panel. You can experiment with different values, but for now I will set it to 20. The batch size defines how many input elements to process simultaneously. A larger batch size allows for more efficient computing and faster convergence of our model. That is, the model will train faster. However, a larger batch size can cause the final model to have a lower accuracy. Also, a larger batch size will use up more memory (RAM) because more input elements are loaded at the same time.
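As a quick sanity check on what the batch size means in practice, a sketch: with 50,000 training images and a batch size of 20, each pass through the dataset takes 2,500 optimizer steps.

```typescript
// Number of batches (optimizer steps) needed for one pass over the dataset.
function batchesPerEpoch(datasetSize: number, batchSize: number): number {
    return Math.ceil(datasetSize / batchSize);
}

console.log(batchesPerEpoch(50000, 20)); // 2500
```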

9. Connect output

We will now connect the output node. The output node defines what will be calculated when evaluating the model. We have two values to provide to the output node - a list of probabilities, and a list of labels. Let's start with the probabilities. Add an activation node of type Softmax. The softmax activation function converts numbers into probabilities that sum to 1. Connect the output of the last linear node to our activation function, and then connect the activation function to the Float32 output.
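A minimal sketch of softmax (illustrative only; the activation node handles this for us):

```typescript
// Softmax: exponentiate each number and divide by the total, so the outputs
// are all positive and sum to 1. Subtracting the maximum first keeps the
// exponentials from overflowing for large inputs.
function softmax(logits: number[]): number[] {
    const max = Math.max(...logits);
    const exps = logits.map(v => Math.exp(v - max));
    const total = exps.reduce((a, b) => a + b, 0);
    return exps.map(v => v / total);
}

const probs = softmax([2, 1, 0.5]);
console.log(probs.reduce((a, b) => a + b, 0)); // 1
```

The largest input always gets the largest probability, which is why we can read the classification off the highest output.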

Next, connect the constant to the other port of the output node. This is our final model:

Visual editor
10. Compile and test it

Before we can use the model, we need to "compile" it. First save your edits by clicking "save" at the top of the page, and then click "compile". Compiling will start up the model and call the method "createModelState" within the model code. This function creates a new state, which is then stored on the model. This new state will contain all the weights and biases of our neural network.

Once the compilation is complete, click the new state that was created and then click "set as active". By doing so, the state will be used when we evaluate or train the model. You will see a dialog appear - click okay. The dialog warns you that the input/output types of our model have changed. This is expected.

Before we begin training, let's test the model by performing an evaluation. At this point, we expect the evaluation to just return some random output, because it is not yet trained.

Exit the visual editor, go to the "Deploy" tab on the model page, and press the "Perform evaluation" button. Upload an image, hit evaluate, and see if it works!

11. Train it

On the model page, go to the "Train" tab and press the button "Train" to bring up the training dialog.

The result of the training will be placed in a new state on the model. The first input field defines the name of this new state. Then, configure the launcher CPU, memory and disk. It is a good idea to specify at least one CPU core, since our training will be doing a lot of work. Specifying more CPU cores will lead to faster training, but only if all cores are utilized. When the training session is running, you can check the CPU usage to see if they are being used. Ideally, CPU usage should be close to 100% - this means all the cores we are paying for are actually working.

As for memory, the model will crash and the training will fail if you don't specify enough memory. You can always start by specifying a high amount, and then when your training session is running, you can check the memory usage and restart the training with a lower amount if you specified too much. Specifying more memory than necessary does not improve the training speed - it is only more expensive. In this case, I will select 1 CPU core and 512 MiB of memory.

Next, it is highly recommended that you select at least as much disk space as you have input data. In our case, we can see on the CIFAR-10 page that the dataset is roughly 125 MiB, so I will select 200 MiB. There is always some overhead, so select a bit more than your input size. We need disk because the input data will be loaded into our launcher and stored on disk. If we don't specify enough disk space, the input data will instead be stored in memory.

You want to increase the maximum duration to something like 48 hours. The maximum duration is there to prevent you from getting a huge monthly cost if the training session takes a very long time and you forget about it. In this case, we know that training will take a while and we know that we will keep an eye on it.

The final thing to do is to provide input parameters to our training session. For the data, I will select the dataset CIFAR-10. This is done by pressing "use dataset", and selecting the dataset from the list. You can also choose any other dataset, such as one that you have created yourself, as long as the data types match the expected input data types of our model. Then, enter a value for the "Epochs" parameter. This parameter tells our model how many times to go through the training data. A higher value will increase the training time. You can experiment with the number of epochs as well, but I will go for 50 for now.

Now hit train!

After a while, you will see that the model has sent some metrics. One graph displays the accuracy, and another the loss. If our model is improving, we expect the loss to decrease and the accuracy to increase. The loss function was defined on our optimizer - it is equal to the error between the output of our neural network and the labels defined in our dataset. If the loss decreases, the error is getting smaller and the model is improving. However, the loss does not give us an understanding of the actual, real-world performance. For that, it is better to look at the accuracy graph.

Loss

Accuracy

If your training was successful, a new state was created and added to the model. This new state contains the updated weights and biases of our neural network. To use it, go to the "State" panel on the model page and click "Manage". There, select the state that was created and hit "Set as active". When we now evaluate our model, the trained state will be used, so we expect it to perform its task with good accuracy.

To use this model from your own system, install an API client by following the API documentation. The following code example shows how to use the model. Replace the /path/to/my/api/key.txt with the path to your API key, /path/to/my/image.png with the path to your image, and <place-your-model-id-here> with the model id, which can be found at the top of the model page.

// Run this code on your own machine and it will communicate with Decthings servers and execute the model in the cloud.
import * as fs from 'fs';
import { DecthingsClient, Data, DataElement } from '@decthings/api-client';

async function main() {
    let apiKey = fs.readFileSync('/path/to/my/api/key.txt').toString();
    let client = new DecthingsClient({ apiKey });

    let image = fs.readFileSync('/path/to/my/image.png');

    let inputData = new Data([DataElement.image('png', image)]);
    let response = await client.model.evaluate("<place-your-model-id-here>", [{ name: 'data', data: inputData }]);

    // Do something with the result!
    if (response.error) {
        console.log('Could not start evaluation:', response.error);
    }
    else if (response.result.failed) {
        console.log('The evaluation failed:', response.result.failed);
    }
    else {
        let outputs = response.result.success.outputs;
        // Outputs is a list of all the output parameters. The model provides just a single output parameter, so read it using [0].
        let data = outputs[0].data;
        for (let dictionary of data.values()) {
            let className = dictionary.get('class').getString();
            let probability = dictionary.get('probability').getNumber();
            console.log(`${className}: ${probability}`);
        }
    }
}

main()
