decthings

Train a pre-made image classifier using the visual editor

In this guide, we will train an image classifier by using the Decthings visual editor and the EfficientNet pre-made model. We will train the model on the dataset CIFAR-10, which contains 50 thousand images, but you can use any other dataset, or create your own.

Each image in the dataset has an assigned class. There are 10 different classes in the dataset - "airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship" and "truck". Here are three samples:

[Images: three sample images, labeled Truck, Cat and Airplane]

As you can see, the images are quite small - 32x32 pixels to be exact. You could of course use larger images!

Transfer learning

The technique we'll use is known as transfer learning. It means that we take a model which has been trained on another dataset, and then re-train that model on our own dataset. The EfficientNet model that we'll use has been pre-trained on the ImageNet dataset. It is a very large dataset, containing over 14 million images.

In transfer learning, we take advantage of the fact that the original dataset the model was trained on and the new dataset that we want to train on share many low-level features - for example edges, patterns, colors, etc. This is true even if the labels are different. Using transfer learning, you can get good accuracy even with far fewer training samples. The dataset we'll use in this guide contains 50 thousand images, but you could probably get good results with your own dataset of far fewer images.

A neural network consists of many "layers". The final layer, or the "top" layer, is responsible for turning the features detected by our neural network into predictions. Because we want other labels than what EfficientNet was trained on originally, we will completely replace the final layer with a new one. Our new final layer will still have access to all the features detected by the previous layers.

We might also want to "freeze" some of the layers in the EfficientNet model. Freezing means that we do not update the weights and biases of these layers when training. This is a good idea because these layers have already been trained to a high accuracy, so any modification can actually decrease our model performance. You will be able to experiment with how many layers to freeze.
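To make freezing concrete, here is a toy sketch in TypeScript. It is not how Decthings or EfficientNet implement training - the `Layer` type and the `sgdStep` function are hypothetical names used only for illustration, and the weights and gradients are made up. The update step simply skips every layer that is marked as frozen, so only the new final layer changes.

```typescript
// Hypothetical toy model: each layer holds a few weights and a "frozen" flag.
interface Layer {
    layerName: string;
    weights: number[];
    frozen: boolean;
}

// One gradient descent step: w <- w - learningRate * gradient.
// Frozen layers are returned unchanged.
function sgdStep(layers: Layer[], gradients: number[][], learningRate: number): Layer[] {
    return layers.map((layer, i) => {
        if (layer.frozen) {
            return layer;
        }
        const layerGradients = gradients[i];
        return {
            ...layer,
            weights: layer.weights.map((w, j) => w - learningRate * layerGradients[j]),
        };
    });
}

// Two pre-trained layers that we freeze, and one new final layer that we train.
const toyLayers: Layer[] = [
    { layerName: 'pretrained-1', weights: [0.5, -0.2], frozen: true },
    { layerName: 'pretrained-2', weights: [0.1, 0.3], frozen: true },
    { layerName: 'new-final-layer', weights: [0, 0], frozen: false },
];
const toyGradients = [[1, 1], [1, 1], [2, -4]];

const updatedLayers = sgdStep(toyLayers, toyGradients, 0.5);
console.log(updatedLayers.map(layer => `${layer.layerName}: ${layer.weights.join(', ')}`));
```

Setting "frozen" to false on more layers corresponds to freezing fewer layers: more of the network is allowed to adapt to the new dataset, at the risk of disturbing features that were already well trained.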

1. Create your model

To create a new model, go to the models page, click "create" and select "visual editor" as the model type. Once it is ready, click "Go to the visual editor" on the model page.

2. Create a new state

Before we can begin editing, we need to add a new state. Click the button "New version" on the top of the page. Then, select this new version.

3. Clear all nodes

By default, your model contains a few nodes. Let's start with a fresh canvas - delete each node by clicking on it and pressing delete on the panel on the right.

4. Define inputs and outputs

Let's start building the model! The first thing we need is to create input and output points. The input for our network is an image, so create an input by clicking "Input" on the panel on the left. Click on the input node, and on the panel on the right, select "Image" as the data type, and give it the name "data". We also need an input for the class. To create this input, click on "Train-input" on the panel on the left. Click on your new node, select the input type "String" and give it the name "class".

A train-input is an input node which only is used when training, and not when evaluating the model. When training, we know the classes because they are included in the dataset, but when evaluating, we don't have that information. That is what we want to calculate!

Important! Name your inputs exactly "data" and "class". This is required because the CIFAR-10 dataset is defined with the names "data" and "class", so you need to use the exact same names. If your dataset has other entries, you should name the inputs appropriately.
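The naming rule can be pictured as a simple check. The sketch below is only an illustration, not part of the Decthings API - `findUnmatchedEntries` is a hypothetical helper that lists the dataset entries for which the model has no input with exactly the same name.

```typescript
// Returns the dataset entry names that have no model input with the same name.
// The comparison is exact, so capitalization matters.
function findUnmatchedEntries(datasetEntries: string[], modelInputs: string[]): string[] {
    return datasetEntries.filter(entry => modelInputs.indexOf(entry) === -1);
}

// Entry names of the CIFAR-10 dataset, as stated in this guide.
const cifar10Entries = ['data', 'class'];

// Correctly named inputs: nothing is unmatched.
const unmatchedForGoodInputs = findUnmatchedEntries(cifar10Entries, ['data', 'class']);

// A wrong name ("image") and a wrong capitalization ("Class"): both entries are unmatched.
const unmatchedForBadInputs = findUnmatchedEntries(cifar10Entries, ['image', 'Class']);

console.log(unmatchedForGoodInputs, unmatchedForBadInputs);
```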

Lastly, create an output node by clicking on "Output" on the panel on the left, and give it any name you'd like. Set the data type of this output to dictionary, and assign it two entries - one of type Float32, and one of type String. Optionally, name the float entry "probability" and the string entry "class". Then, set the "Dimensions" field to 1. This is what we have so far:

[Image: Visual editor]

A little explanation on what we see here:

The text "Float32" on the image input means that the input image is converted into 32-bit floating point numbers between 0 and 1, where 0 is the darkest color and 1 is the brightest. The text "(3, ?, ?)" is the "shape" of the value. A shape is a list of numbers, where each number represents the number of elements in the dimension. For example, the shape "(3)" tells us that the value is a list containing 3 numbers, and the shape "(3, 5)" tells us that the value is a list that contains 3 lists that each contains 5 numbers. A question mark, which we have in this case, tells us that we cannot know how many elements there will be. The shape we have here, "(3, ?, ?)" means that our image contains three channels (red, green and blue), and that we do not know the height and width of the image. The shape represents (channels, height, width).
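If shapes are new to you, the following sketch may help. It computes the shape of a nested list of numbers in plain TypeScript. The `NestedArray` type and the `shapeOf` function are names made up for this example, and the function assumes that all sub-lists on the same level have the same length.

```typescript
// A value is either a single number or a (possibly nested) list of values.
type NestedArray = number | NestedArray[];

// Returns the shape of a rectangular nested list.
// A single number has the empty shape, a flat list of 3 numbers has the shape [3], and so on.
function shapeOf(value: NestedArray): number[] {
    if (!Array.isArray(value)) {
        return [];
    }
    if (value.length === 0) {
        return [0];
    }
    return [value.length, ...shapeOf(value[0])];
}

// A list containing 3 numbers: shape (3).
const flatList: NestedArray = [1, 2, 3];

// A list that contains 3 lists that each contain 5 numbers: shape (3, 5).
const grid: NestedArray = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
];

// A tiny "image" with 3 channels, a height of 2 and a width of 2: shape (3, 2, 2).
const tinyImage: NestedArray = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.5, 0.6], [0.7, 0.8]],
    [[0.9, 1.0], [0.0, 0.5]],
];

console.log(shapeOf(flatList), shapeOf(grid), shapeOf(tinyImage));
```

The question marks in "(3, ?, ?)" correspond to the two last numbers of the image shape, which are only known once an actual image is provided.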

Here is the configuration of the output node:

[Image: Visual editor]

5. Add the EfficientNet image classifier

Scroll down on the left panel and add the nodes "EfficientNetPreProcess" and "EfficientNet". Click the EfficientNet node, and make sure the number of classes is set to 10.

You can experiment with the setting "Number of frozen layers", as described above. I am going to set it to the maximum value. This means that the entire network will be frozen, except for the final layer.

Connect the "data" input to the preprocess node, and then connect the preprocess node to the EfficientNet node.

[Image: Visual editor]

6. Convert the classes to numbers

In order to use our text classes in our network we need to convert them to numbers. We can do this using a "StringLookup" node. This node uses a vocabulary to convert each input text to an integer. In our case, the text "airplane" would be converted to 0, "automobile" to 1, "bird" to 2, and so on.

Start by creating the vocabulary. Add a "Constant" node, select the data type "String" and set the shape to "10". Add all the labels of your dataset to the constant, as shown in the image below. Then, add a "StringLookup" node and connect the class input and vocabulary.

[Images: two visual editor screenshots]

The output from the string lookup node will now be an integer between 0 and 9, where the first vocabulary entry ("airplane") will correspond to the integer 0, the second ("automobile") to 1, etc.
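As a rough sketch of what the lookup does, the mapping can be written in a few lines of TypeScript. The `stringLookup` function below is a hypothetical stand-in and not the implementation of the node. In particular, throwing an error for unknown labels is an assumption of this example - the guide does not say how the node handles them.

```typescript
// The vocabulary, in the same order as the labels of CIFAR-10.
const vocabulary = [
    'airplane', 'automobile', 'bird', 'cat', 'deer',
    'dog', 'frog', 'horse', 'ship', 'truck',
];

// Converts a label to its position in the vocabulary.
// Assumption of this sketch: labels that are not in the vocabulary are rejected.
function stringLookup(vocab: string[], label: string): number {
    const index = vocab.indexOf(label);
    if (index === -1) {
        throw new Error(`Label "${label}" is not in the vocabulary`);
    }
    return index;
}

const airplaneIndex = stringLookup(vocabulary, 'airplane');
const catIndex = stringLookup(vocabulary, 'cat');
const truckIndex = stringLookup(vocabulary, 'truck');
console.log(airplaneIndex, catIndex, truckIndex);
```

Note that the order of the vocabulary matters: the same constant is later connected to the output node, so the position of a label is what links a probability back to its class name.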

7. Connect optimizer

In order to train our network, we need an "Optimizer" node. The optimizer node defines what to train - in this case, we want to compare the output from our neural network to the output from our string lookup. The difference between these two is called the "loss". When training, the optimizer will automatically attempt to modify our neural network such that the loss becomes as small as possible. The lower the loss, the better our neural network will be at classifying our images.

Create an "Optimizer" node, select the loss called "Cross entropy" and set the learning rate to 0.001. Connect the output of the EfficientNet node to the "input" port of the optimizer, and connect the output of the string lookup to the "target" of the optimizer.
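For the curious, here is a minimal sketch of how a cross entropy loss can be computed for a single image. It assumes that the loss receives the raw scores of the network (before any softmax) together with the integer class from the string lookup - the exact formulation used by the optimizer node may differ. The function name `crossEntropy` and the example scores are made up for illustration.

```typescript
// Cross entropy for one sample, computed from raw scores ("logits"):
// loss = log(sum(exp(scores))) - scores[target]
// Subtracting the maximum score first keeps exp() from overflowing.
function crossEntropy(scores: number[], target: number): number {
    const maxScore = Math.max(...scores);
    const sumOfExps = scores.reduce((sum, score) => sum + Math.exp(score - maxScore), 0);
    const logSumExp = maxScore + Math.log(sumOfExps);
    return logSumExp - scores[target];
}

// An untrained network that scores all 10 classes equally: the loss is ln(10), about 2.3.
const untrainedLoss = crossEntropy([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 3);

// A confident and correct prediction gives a loss close to 0...
const confidentCorrectLoss = crossEntropy([10, 0, 0], 0);

// ...while a confident but wrong prediction gives a large loss.
const confidentWrongLoss = crossEntropy([10, 0, 0], 1);

console.log(untrainedLoss, confidentCorrectLoss, confidentWrongLoss);
```

This also gives a useful sanity check when training: with 10 classes, a loss of about 2.3 means the model is not yet doing better than guessing.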

Also, configure the batch size on the left panel. You can experiment with different values, but for now I will set it to 10. The batch size defines how many input elements to process simultaneously. A larger batch size allows for more efficient computing and faster convergence of our model. That is, the model will train faster. However, a larger batch size can cause the final model to have a lower accuracy. Also, a larger batch size will use up more memory (RAM) because more input elements are loaded at the same time. EfficientNet uses quite a bit of memory, so make sure to use enough memory on your launcher when training and evaluating.
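To illustrate what a batch is, the sketch below splits a list of elements into batches. The `toBatches` function is a hypothetical helper written for this guide. Keeping a smaller final batch when the data does not divide evenly is an assumption of the example, not a statement about how Decthings handles it.

```typescript
// Splits a list of elements into consecutive batches of at most batchSize elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
    if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
        throw new Error('The batch size must be a positive integer');
    }
    const batches: T[][] = [];
    for (let start = 0; start < items.length; start += batchSize) {
        batches.push(items.slice(start, start + batchSize));
    }
    return batches;
}

// Seven elements with a batch size of 3 give two full batches and one batch with the remainder.
const exampleBatches = toBatches([1, 2, 3, 4, 5, 6, 7], 3);
console.log(exampleBatches);
```

All elements of one batch are held in memory at the same time, which is why memory usage grows with the batch size.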

8. Connect output

We will now connect the output node. The output node defines what will be calculated when evaluating the model. We have two values to provide to the output node - a list of probabilities, and a list of labels. Let's start with the probabilities. Add an activation node of type Softmax. The softmax activation function converts numbers into probabilities: each output lies between 0 and 1, and all outputs sum to 1. Connect the output of the EfficientNet node to our activation function, and then connect the activation function to the Float32 output.
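Here is a small sketch of the softmax function, followed by how the probabilities can be paired with labels to form a list of dictionaries like the one our output node produces. The scores and the shortened label list are made-up example values, and the entry names "class" and "probability" assume that you chose the optional names suggested in step 4.

```typescript
// Softmax: exponentiate every score and divide by the sum of all exponentials.
// Subtracting the maximum score first does not change the result, but avoids overflow.
function softmax(scores: number[]): number[] {
    const maxScore = Math.max(...scores);
    const exps = scores.map(score => Math.exp(score - maxScore));
    const total = exps.reduce((sum, value) => sum + value, 0);
    return exps.map(value => value / total);
}

// Made-up raw scores for a shortened vocabulary of three labels.
const exampleLabels = ['airplane', 'automobile', 'bird'];
const exampleScores = [2.0, 1.0, 0.1];

const exampleProbabilities = softmax(exampleScores);

// One dictionary per class, pairing each label with its probability.
const exampleOutput = exampleLabels.map((label, i) => ({
    class: label,
    probability: exampleProbabilities[i],
}));
console.log(exampleOutput);
```

The largest score always receives the largest probability, so the order of the classes is preserved.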

Next, connect the constant to the other port of the output node. This is our final model:

[Image: Visual editor]

9. Compile and test it

Before we can use the model, we need to "compile" it. First save your edits by clicking "save" on the top of the page, and then click "compile". Compiling will start up the model and call the method "createModelState" within the model code. This function will create a new state, and the new state will then be stored on the model. This new state will contain all the weights and biases of our neural network. You probably need to increase the amount of memory to about 1024 MiB, because the EfficientNet model takes up quite a bit of memory.

Once the compilation is complete, click the new state that was created and then click "set as active". By doing so, the state will be used when we evaluate or train the model. You will see a dialog appear - click okay. The dialog warns you that the input/output types of our model have been changed. This is expected.

Before we begin training, let's test the model by performing an evaluation. At this point, we expect the evaluation to just return some random output, because it is not yet trained. Exit the visual editor, go to the "Deploy" tab on the model page, and scroll down to the "Settings" panel. There, set the amount of memory for the default launcher to 512 MiB.

Now, press the "Perform evaluation" button, upload an image, hit evaluate, and see if it works!

10. Train it

On the model page, go to the "Train" tab and press the button "Train" to bring up the training dialog.

The result of the training will be placed in a new state on the model. The first input field defines the name of this new state. Then, configure launcher CPU, memory and disk. It is a good idea to specify at least one CPU core, since our training will be doing a lot of work. Specifying more CPU cores will lead to faster training, but only if all cores are being utilized. When the training session is running, you can check the CPU usage to see if the cores are being used. Ideally, CPU usage should be close to 100% - this means all the cores that we are paying for are actually working.

As for memory, the model will crash and the training will fail if you don't specify enough memory. You can always start by specifying a high amount, and then when your training session is running, you can check the memory usage and restart the training with a lower amount if you specified too much. Specifying more memory than necessary does not improve the training speed - it is only more expensive. In this case, I will select 2 CPU cores and 6 GiB of memory.

Next, it is highly recommended that you select at least as much disk space as you have input data. In our case, we can see on the CIFAR-10 page that the dataset is roughly 125 MiB, so I will select 200 MiB. There is always some overhead, so select a bit more than what your input size is. We need disk because the input data will be loaded into our launcher, and stored on disk. If we don't specify enough disk space, the input data will be stored in memory.

You want to increase the maximum duration to something like 48 hours. The maximum duration is there to prevent you from getting a huge monthly cost if the training session takes a very long time and you forget about it. In this case, we know that training will take a while and we know that we will keep an eye on it.

The final thing to do is to provide input parameters to our training session. For the data, I will select the dataset CIFAR-10. This is done by pressing "use dataset", and selecting the dataset from the list. You can also choose any other dataset, such as one that you have created yourself, as long as the data types match the expected input data types of our model. Then, enter a value for the "Epochs" parameter. This parameter tells our model how many times to go through the training data. A higher value will increase the training time. The CIFAR-10 dataset contains 50 thousand elements, and I happen to know that with that many elements it is enough to run a single epoch to get good results.
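To get a feeling for how the number of epochs and the batch size interact, the sketch below counts how many batches the model processes during a training session. It assumes one weight update per batch and that a final, smaller batch is still processed - both are assumptions of this example. The function name `totalTrainingSteps` is made up.

```typescript
// Number of batches processed during training:
// batches per epoch (rounded up) multiplied by the number of epochs.
function totalTrainingSteps(datasetSize: number, batchSize: number, epochs: number): number {
    const stepsPerEpoch = Math.ceil(datasetSize / batchSize);
    return stepsPerEpoch * epochs;
}

// The settings used in this guide: 50 thousand elements, a batch size of 10 and a single epoch.
const stepsInThisGuide = totalTrainingSteps(50000, 10, 1);

// Doubling the number of epochs doubles the number of steps, and roughly the training time.
const stepsWithTwoEpochs = totalTrainingSteps(50000, 10, 2);

console.log(stepsInThisGuide, stepsWithTwoEpochs);
```

With the settings in this guide, that is 5000 batches for the single epoch.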

Now hit train!

After a while, you will see that the model has sent some metrics. One graph displays the accuracy, and another the loss. If our model is improving, we expect the loss to decrease and the accuracy to increase. The loss function was defined on our optimizer - it is equal to the error between the output of our neural network and the labels that are defined on our dataset. If the loss decreases, it means that the error is getting smaller, and the model is improving. The loss does not, however, give us an understanding of the actual, real-world performance. For that, it is better to look at the accuracy graph.
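Accuracy is simply the fraction of images for which the class with the highest probability is the correct one. The sketch below shows the calculation in plain TypeScript. The function names and the example numbers are made up, and this is not necessarily how the metric is computed inside the model.

```typescript
// Returns the index of the largest value in the list.
function argmax(values: number[]): number {
    let best = 0;
    for (let i = 1; i < values.length; i++) {
        if (values[i] > values[best]) {
            best = i;
        }
    }
    return best;
}

// Fraction of samples where the most probable class equals the target class.
function accuracy(predictions: number[][], targets: number[]): number {
    let correct = 0;
    for (let i = 0; i < predictions.length; i++) {
        if (argmax(predictions[i]) === targets[i]) {
            correct += 1;
        }
    }
    return correct / predictions.length;
}

// Made-up probabilities for four images and three classes.
const examplePredictions = [
    [0.7, 0.2, 0.1], // predicts class 0
    [0.1, 0.8, 0.1], // predicts class 1
    [0.4, 0.5, 0.1], // predicts class 1
    [0.2, 0.2, 0.6], // predicts class 2
];
const exampleTargets = [0, 1, 0, 2];

// Three of the four predictions are correct.
const exampleAccuracy = accuracy(examplePredictions, exampleTargets);
console.log(exampleAccuracy);
```

Notice that the third image counts as fully wrong even though the correct class received a probability of 0.4. The loss captures such near misses, while the accuracy only counts right and wrong answers.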

[Images: two visual editor screenshots]

11. Use the trained model

If your training was successful, a new state was created and added to the model. This new state contains the updated weights and biases of our neural network. To use it, go to the "State" panel on the model page, and click "Manage". There, select the state that was created and hit "Set as active". When we now evaluate our model, the trained state will be used, so we expect it to be able to perform its task with good accuracy.

To use this model from your own system, install an API client by following the API documentation. The following code example shows how to use the model. Replace /path/to/my/api/key.txt with the path to your API key, /path/to/my/image.png with the path to your image, and <place-your-model-id-here> with the model ID, which can be found at the top of the model page.

// Run this code on your own machine and it will communicate with Decthings servers and execute the model in the cloud.
import * as fs from 'fs';
import { DecthingsClient, Data, DataElement } from '@decthings/api-client';

async function main() {
    let apiKey = fs.readFileSync('/path/to/my/api/key.txt').toString();
    let client = new DecthingsClient({ apiKey });

    let image = fs.readFileSync('/path/to/my/image.png');

    let inputData = new Data([DataElement.image('png', image)]);
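    // The parameter name must match the name of the model's input node, which we named "data" in step 4.
    let response = await client.model.evaluate("<place-your-model-id-here>", [{ name: 'data', data: inputData }]);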

    // Do something with the result!
    if (response.error) {
        console.log('Could not start evaluation:', response.error);
    }
    else if (response.result.failed) {
        console.log('The evaluation failed:', response.result.failed);
    }
    else {
        let outputs = response.result.success.outputs;
        // Outputs is a list of all the output parameters. The model provides just a single output parameter, so read it using [0].
        let data = outputs[0].data;
        for (let dictionary of data.values()) {
            let className = dictionary.get('class').getString();
            let probability = dictionary.get('probability').getNumber();
            console.log(`${className}: ${probability}`);
        }
    }
}

main()
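The example above prints the probability of every class. Often you only care about the most likely classes. The sketch below shows one way to pick them once the values have been read into plain objects. The `Prediction` type, the `topK` function and the numbers are hypothetical and not part of the Decthings API client.

```typescript
// A plain object holding one entry of the model output.
interface Prediction {
    className: string;
    probability: number;
}

// Returns the k predictions with the highest probability, most likely first.
// The input list is copied before sorting, so it is left unchanged.
function topK(predictions: Prediction[], k: number): Prediction[] {
    return [...predictions]
        .sort((a, b) => b.probability - a.probability)
        .slice(0, k);
}

// Made-up output values for a few of the classes.
const examplePredictionList: Prediction[] = [
    { className: 'airplane', probability: 0.05 },
    { className: 'cat', probability: 0.7 },
    { className: 'dog', probability: 0.2 },
    { className: 'truck', probability: 0.05 },
];

const bestTwo = topK(examplePredictionList, 2);
for (const prediction of bestTwo) {
    console.log(`${prediction.className}: ${(prediction.probability * 100).toFixed(1)}%`);
}
```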
