
Create a dataset and upload data using the API

When training models we often need large amounts of data. Decthings Datasets allow you to upload data, either in the browser or via the API, and then access that data within a model. In this guide we will use the API to create a dataset containing images and labels, as a way to demonstrate how you can automate uploading data to Decthings. The dataset we will create is the MNIST database of handwritten digits by Yann LeCun (Courant Institute, NYU), Corinna Cortes (Google Labs, New York) and Christopher J.C. Burges (Microsoft Research, Redmond). This dataset is already available within the Decthings platform, but we will recreate it for demonstration purposes. A similar process can be used to create any dataset you'd like.

Uploading data through the browser user interface can be quite time-consuming. If your data is on your local computer, you can instead write a script that iterates through the data and uploads all of it to Decthings. If you have a sensor that continuously collects data, for example a camera connected to a computer, you can write a script that uploads new data as soon as the sensor collects it.
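For the continuous case, one common pattern is to buffer readings and upload them in batches instead of sending one request per reading. The sketch below assumes nothing about the Decthings client: BatchBuffer and demo are hypothetical helpers written for this example, and the callback is where a real script would convert the batch to DataElements and call the addEntries function used later in this guide.

```javascript
// Hypothetical helper: collects items and hands them to `onBatch` in groups of `batchSize`.
class BatchBuffer {
  constructor(batchSize, onBatch) {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
      throw new RangeError('batchSize must be a positive integer')
    }
    this.batchSize = batchSize
    this.onBatch = onBatch
    this.items = []
  }

  // Add one item. Returns a promise that resolves once any flush it triggered has finished.
  add(item) {
    this.items.push(item)
    if (this.items.length >= this.batchSize) {
      return this.flush()
    }
    return Promise.resolve()
  }

  // Send whatever is buffered, even if it is fewer than batchSize items.
  async flush() {
    if (this.items.length === 0) return
    const batch = this.items
    this.items = []
    await this.onBatch(batch)
  }
}

// Example: collect 7 simulated sensor readings and "upload" them 3 at a time.
async function demo() {
  const uploaded = []
  const buffer = new BatchBuffer(3, async (batch) => {
    // A real script would build DataElements from `batch` and upload them here
    uploaded.push(batch)
  })
  for (let reading = 1; reading <= 7; reading++) {
    await buffer.add(reading)
  }
  await buffer.flush() // send the final, partial batch
  return uploaded
}
```

Awaiting each add call keeps uploads sequential; without it, several batches could be in flight at once, which this sketch does not try to limit.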

We will be using the Node.js API client to communicate from our computer with Decthings servers.

1. Install Node.js and NPM

Node.js is a runtime for the JavaScript programming language. Using Node, you can write programs in JavaScript which execute on your local computer. NPM is a package manager that is installed when you install Node. NPM allows you to install dependencies that you want to use in your code.

Go to nodejs.org and follow the instructions there to install the latest version of Node.js. This should also install the latest version of NPM for you.

2. Set up a new Node project

Create a new folder anywhere on your computer. This is where we will put the code. Then, open a terminal. On Windows, press the Windows key + R, type "cmd" and click OK. On macOS, press ⌘ + Space and search for "Terminal". In the terminal, navigate to your folder by typing "cd" followed by the path to the folder.

Type "npm init", press Enter and follow the prompts. This will set up a new Node project in your folder. Next, run "npm install @decthings/api-client". This will install the latest version of the Decthings API client.

Create a new file in your folder called "index.js". This file will contain our script.

3. Download the MNIST dataset

The dataset can be downloaded here:

Download the MNIST dataset

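This downloads a ZIP file. Extract it into your Node.js project folder so that the files train-images.idx3-ubyte and train-labels.idx1-ubyte sit directly in that folder, since the script in the next step reads them from there.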
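To check that you extracted the right files, you can read their headers. The MNIST website describes the IDX format: two zero bytes, a type byte (0x08 for unsigned bytes), a byte giving the number of dimensions, and then one 32-bit big-endian size per dimension. readIdxHeader is a helper written for this example, based on that description; it is not part of the Decthings client.

```javascript
// Parse the header of an IDX file (the format used by the MNIST files).
// `bytes` is a Buffer, e.g. the result of fs.readFileSync.
function readIdxHeader(bytes) {
  if (bytes.length < 4 || bytes[0] !== 0 || bytes[1] !== 0) {
    throw new Error('Not an IDX file: bad magic number')
  }
  const typeCode = bytes[2] // 0x08 means each value is an unsigned byte
  const numDims = bytes[3]
  // The header is 4 bytes of magic number plus 4 bytes per dimension
  const dataOffset = 4 + 4 * numDims
  if (bytes.length < dataOffset) {
    throw new Error('IDX header is truncated')
  }
  const dims = []
  for (let d = 0; d < numDims; d++) {
    dims.push(bytes.readUInt32BE(4 + 4 * d))
  }
  return { typeCode, dims, dataOffset }
}
```

For example, readIdxHeader(fs.readFileSync('./train-images.idx3-ubyte')) should report dims [60000, 28, 28] with a dataOffset of 16, and the labels file should report dims [60000] with a dataOffset of 8. These offsets are where the parsing script starts reading.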

4. Read the data in Node

The following code reads the file data, loops through it byte by byte to construct the images and labels, and converts the images to PNG format. The file format is described on the MNIST website, and I have written this script according to that specification. Before you run the script, install the dependency pngjs, which allows us to create PNG images in Node.js, by running the following command in your terminal: "npm install pngjs".

Note: This process will differ widely depending on your dataset, since different datasets are stored in different formats. The simplest case is when you already have PNG images, because these can be loaded as they are, without any parsing.

const fs = require('fs')
const decthings = require('@decthings/api-client')
const PNG = require('pngjs').PNG

async function getImagesAndLabels() {
    // Read the two files that we downloaded
    const imagesBytes = fs.readFileSync('./train-images.idx3-ubyte')
    const labelsBytes = fs.readFileSync('./train-labels.idx1-ubyte')

    let parsed = []

    // Skip the file headers: the image file has a 16-byte header and the label file an 8-byte header
    let imagesBytesPos = 16
    let labelsBytesPos = 8
    while (imagesBytesPos < imagesBytes.byteLength) {
        // Construct a new PNG object
        const image = new PNG({ width: 28, height: 28 })

        // For each pixel value, write RGBA values to the PNG object.
        for (let i = 0; i < 28 * 28; i++) {
            let brightness = imagesBytes[imagesBytesPos + i]

            // The data is laid out as red, green, blue and alpha. We set R, G and B to the same brightness, and alpha to 255 (fully opaque).
            image.data[i * 4] = brightness
            image.data[i * 4 + 1] = brightness
            image.data[i * 4 + 2] = brightness
            image.data[i * 4 + 3] = 255
        }

        // Encode the image as PNG. pack() returns a stream, so collect its chunks until it ends.
        let packed = image.pack()
        let chunks = []
        packed.on('data', chunk => chunks.push(chunk))
        await new Promise((resolve, reject) => {
            packed.on('end', resolve)
            packed.on('error', reject)
        })
        let pngData = Buffer.concat(chunks)

        // Read the label
        let label = labelsBytes[labelsBytesPos]

        // Done! Store the PNG data and label in the list.
        parsed.push({ png: pngData, label: label })
        
        imagesBytesPos += 28 * 28;
        labelsBytesPos += 1
    }

    return parsed
}

async function main() {
    const imagesAndLabels = await getImagesAndLabels()
    fs.writeFileSync(__dirname + '/entry1.png', imagesAndLabels[0].png)
    fs.writeFileSync(__dirname + '/entry2.png', imagesAndLabels[1].png)
    fs.writeFileSync(__dirname + '/entry3.png', imagesAndLabels[2].png)
}

main()

If you run the script above by executing the command "node index.js", you will see that it writes the following three image files to your folder:

(Images: entry1.png, entry2.png and entry3.png, the first three handwritten digits of the training set.)

Looks good!
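If you would rather inspect a digit without opening image files, you can print it as text in the terminal. toAscii is a hypothetical helper for this example: it takes one brightness byte per pixel, row by row, which is how the MNIST image file stores each 28 × 28 image after the header.

```javascript
// Render a grayscale image as text. `pixels` holds one brightness value (0-255) per pixel, row by row.
function toAscii(pixels, width, height) {
  const shades = ' .:-=+*#%@' // from darkest to brightest
  const lines = []
  for (let y = 0; y < height; y++) {
    let line = ''
    for (let x = 0; x < width; x++) {
      const brightness = pixels[y * width + x]
      // Map 0-255 onto an index from 0 to shades.length - 1
      line += shades[Math.floor((brightness * (shades.length - 1)) / 255)]
    }
    lines.push(line)
  }
  return lines.join('\n')
}
```

For example, after reading the image file with fs.readFileSync('./train-images.idx3-ubyte') into imagesBytes, console.log(toAscii(imagesBytes.subarray(16, 16 + 28 * 28), 28, 28)) prints the first training image.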

5. Create an API key

In order to use the Decthings API, you need to generate an API key. This allows your script to identify itself as your user.

Go to the API keys page and click "Create". When the API key has been created, copy it and paste it into a new file in your folder called "auth.txt".

Keep your API key safe! If you suspect that your API key has been leaked, log in to your Decthings account and delete the key. This completely disables it. We at Decthings will never ask for your API key.
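One way to keep the key out of your code is to read it from an environment variable and fall back to auth.txt. The sketch below assumes a hypothetical variable name, DECTHINGS_API_KEY, chosen for this example; Decthings does not define it. It also trims whitespace, so a trailing newline added by a text editor does not become part of the key.

```javascript
const fs = require('fs')

// Load the API key from an environment variable if set, otherwise from a file.
// DECTHINGS_API_KEY is a hypothetical name used only in this example.
function loadApiKey(envName = 'DECTHINGS_API_KEY', filePath = 'auth.txt') {
  const fromEnv = process.env[envName]
  if (fromEnv && fromEnv.trim() !== '') {
    return fromEnv.trim()
  }
  // Trim so a trailing newline in the file is not sent as part of the key
  const fromFile = fs.readFileSync(filePath, 'utf8').trim()
  if (fromFile === '') {
    throw new Error(`No API key found in $${envName} or ${filePath}`)
  }
  return fromFile
}
```

You could then create the client with new decthings.DecthingsClient({ apiKey: loadApiKey() }). If your project is under version control, listing auth.txt in .gitignore keeps the key out of the repository.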

6. Create the dataset

We can create our dataset from code using the following script. It creates a dataset called "MNIST" and sets its data type. First, the data type has a shape of "[]". A shape is a list of numbers, for example "[2, 3]", where each number gives the size of that dimension. This lets each element of a dataset be a list. For example, the shape "[3]" means that each element in the dataset is a list containing 3 elements, and the shape "[3, 5]" means that each element is a list of 3 lists that each contain 5 elements. The shape we selected here, "[]", means that the elements of our dataset are not lists.

Then, we specify one "allowedType", which we set to type "dict". A "dict", short for dictionary, is a type of data made up of keys and values. Here we specify two entries: one named "data" whose value is of type image, and one named "label" whose value is of type u8, an unsigned 8-bit integer, that is, an integer between 0 and 255. We know that our labels only range from 0 to 9, but an 8-bit integer is the smallest integer type we can choose.

const fs = require('fs')
const decthings = require('@decthings/api-client')

// trim() removes a trailing newline that a text editor may have added to auth.txt
let apiKey = fs.readFileSync('auth.txt').toString().trim()
let client = new decthings.DecthingsClient({ apiKey })

async function createDataset() {
    let createResponse = await client.dataset.createDataset(
        'MNIST',
        'The MNIST dataset',
        {
            shape: [],
            allowedTypes: [
                {
                    type: 'dict',
                    entries: [
                        {
                            name: 'data',
                            rules: { shape: [], allowedTypes: ['image'] }
                        },
                        {
                            name: 'label',
                            rules: { shape: [], allowedTypes: ['u8'] }
                        },
                    ]
                }
            ]
        }
    )
    if (createResponse.error) {
        console.log('Got an error from the Decthings API: ' + JSON.stringify(createResponse.error))
        process.exit(1)
    }
    console.log('Dataset created!')
    return createResponse.result.datasetId
}

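async function main() {
    let datasetId = await createDataset()
    // Exit explicitly; an open API client connection may otherwise keep the process running
    process.exit(0)
}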

main()
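The shape rules described above can be illustrated with a small local function. matchesShape is a hypothetical helper for this example that applies the same definition to plain JavaScript arrays: "[]" means the value is not a list, and each number in the shape is the length required at that level of nesting. It only illustrates the concept; it is not the validation Decthings performs.

```javascript
// Return true if `value` matches `shape`:
// []     -> the value is not a list
// [3]    -> a list of 3 non-list elements
// [3, 5] -> a list of 3 lists that each contain 5 elements
function matchesShape(value, shape) {
  if (shape.length === 0) {
    return !Array.isArray(value)
  }
  const [size, ...rest] = shape
  return (
    Array.isArray(value) &&
    value.length === size &&
    value.every((element) => matchesShape(element, rest))
  )
}
```

With the shape "[]" used in our dataset, each entry (a dict holding an image and a label) is a single value rather than a list.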

7. Upload the data

The following function uploads data to our dataset. It takes the dataset ID and the data to upload as parameters, and sends the entries in batches of 50. For each batch, it converts the images and labels into a list of instances of the decthings.DataElement class by calling "decthings.DataElement.dict()". This function creates a new DataElement of type "dict", which we can then upload to our dataset. It takes a Map as input, containing the entries of our dictionary. The entry named "data" is created with "decthings.DataElement.image", which produces a DataElement of type image, and the entry named "label" is created with "decthings.DataElement.u8", which produces the unsigned 8-bit integer to upload. Each batch is then sent with "client.dataset.addEntries".

async function uploadData(datasetId, imagesAndLabels) {
    for (let i = 0; i < imagesAndLabels.length; i += 50) {
        let chunk = imagesAndLabels.slice(i, i + 50)
        let dataElements = chunk.map(element => {
            return decthings.DataElement.dict(new Map([
                ['data', decthings.DataElement.image('png', element.png)],
                ['label', decthings.DataElement.u8(element.label)]
            ]))
        })

        const uploadResponse = await client.dataset.addEntries(datasetId, dataElements)

        if (uploadResponse.error) {
            console.log('Got an error from the Decthings API: ' + JSON.stringify(uploadResponse.error))
            process.exit(1)
        }
        console.log('Uploaded ' + (i + chunk.length) + ' of ' + imagesAndLabels.length + ' entries')
    }
}
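Two small local checks can make this step easier to reason about. chunk and findInvalidLabels are hypothetical helpers for this example: chunk reproduces the slicing that uploadData does with imagesAndLabels.slice(i, i + 50), making it explicit that the last batch can be shorter, and findInvalidLabels checks that every label fits the u8 type (and, for MNIST, is a digit from 0 to 9). Running the check before uploading catches a parsing mistake, such as a wrong byte offset, before anything is sent.

```javascript
// Split an array into consecutive batches of at most `size` items.
// The final batch is shorter when the length is not a multiple of `size`.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer')
  }
  const batches = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Return the indices of entries whose label is not an integer between 0 and `max`.
// A u8 allows up to 255; MNIST labels should be digits, so the default is 9.
function findInvalidLabels(entries, max = 9) {
  const invalid = []
  entries.forEach((entry, index) => {
    const label = entry.label
    if (!Number.isInteger(label) || label < 0 || label > max) {
      invalid.push(index)
    }
  })
  return invalid
}
```

For example, calling findInvalidLabels(imagesAndLabels) before uploadData and stopping if it returns a non-empty list avoids uploading a partially wrong dataset.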

8. Putting it all together

Here is our final script:

const fs = require('fs')
const decthings = require('@decthings/api-client')
const PNG = require('pngjs').PNG

async function getImagesAndLabels() {
    // Read the two files that we downloaded
    const imagesBytes = fs.readFileSync('./train-images.idx3-ubyte')
    const labelsBytes = fs.readFileSync('./train-labels.idx1-ubyte')

    let parsed = []

    // Skip the file headers: the image file has a 16-byte header and the label file an 8-byte header
    let imagesBytesPos = 16
    let labelsBytesPos = 8
    while (imagesBytesPos < imagesBytes.byteLength) {
        // Construct a new PNG object
        const image = new PNG({ width: 28, height: 28 })

        // For each pixel value, write RGBA values to the PNG object.
        for (let i = 0; i < 28 * 28; i++) {
            let brightness = imagesBytes[imagesBytesPos + i]

            // The data is laid out as red, green, blue and alpha. We set R, G and B to the same brightness, and alpha to 255 (fully opaque).
            image.data[i * 4] = brightness
            image.data[i * 4 + 1] = brightness
            image.data[i * 4 + 2] = brightness
            image.data[i * 4 + 3] = 255
        }

        // Encode the image as PNG. pack() returns a stream, so collect its chunks until it ends.
        let packed = image.pack()
        let chunks = []
        packed.on('data', chunk => chunks.push(chunk))
        await new Promise((resolve, reject) => {
            packed.on('end', resolve)
            packed.on('error', reject)
        })
        let pngData = Buffer.concat(chunks)

        // Read the label
        let label = labelsBytes[labelsBytesPos]

        // Done! Store the PNG data and label in the list.
        parsed.push({ png: pngData, label: label })
        
        imagesBytesPos += 28 * 28;
        labelsBytesPos += 1
    }

    return parsed
}

// trim() removes a trailing newline that a text editor may have added to auth.txt
let apiKey = fs.readFileSync('auth.txt').toString().trim()
let client = new decthings.DecthingsClient({ apiKey })

async function createDataset() {
    let createResponse = await client.dataset.createDataset(
        'MNIST',
        'The MNIST dataset',
        {
            shape: [],
            allowedTypes: [
                {
                    type: 'dict',
                    entries: [
                        {
                            name: 'data',
                            rules: { shape: [], allowedTypes: ['image'] }
                        },
                        {
                            name: 'label',
                            rules: { shape: [], allowedTypes: ['u8'] }
                        },
                    ]
                }
            ]
        }
    )
    if (createResponse.error) {
        console.log('Got an error from the Decthings API: ' + JSON.stringify(createResponse.error))
        process.exit(1)
    }
    console.log('Dataset created!')
    return createResponse.result.datasetId
}

async function uploadData(datasetId, imagesAndLabels) {
    for (let i = 0; i < imagesAndLabels.length; i += 50) {
        let chunk = imagesAndLabels.slice(i, i + 50)
        let dataElements = chunk.map(element => {
            return decthings.DataElement.dict(new Map([
                ['data', decthings.DataElement.image('png', element.png)],
                ['label', decthings.DataElement.u8(element.label)]
            ]))
        })

        const uploadResponse = await client.dataset.addEntries(datasetId, dataElements)

        if (uploadResponse.error) {
            console.log('Got an error from the Decthings API: ' + JSON.stringify(uploadResponse.error))
            process.exit(1)
        }
        console.log('Uploaded ' + (i + chunk.length) + ' of ' + imagesAndLabels.length + ' entries')
    }
}

async function main() {
    let imagesAndLabels = await getImagesAndLabels()
    let datasetId = await createDataset()
    await uploadData(datasetId, imagesAndLabels)
    console.log('Everything complete! The dataset was created and our entries uploaded.')
    process.exit(0)
}

main()

If you place the contents of this script in your "index.js" file, and then run it by executing "node index.js" in your terminal, you should see that a dataset is created on your Decthings account and that the MNIST entries are being added.
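With 60,000 training images and batches of 50, the script makes 1,200 addEntries calls, and any single error stops it with process.exit(1). A retry wrapper is one way to ride out temporary failures. withRetry is a hypothetical helper for this example: it treats a thrown error, or a response with an error property (the way the responses in this guide report errors), as a failed attempt, and waits longer after each failure. This guide does not say whether a failed addEntries call can have partially succeeded, so retrying may risk duplicate entries.

```javascript
// Call `operation` up to `attempts` times, waiting delayMs, then 2 * delayMs, and so on between tries.
// An attempt fails if it throws or resolves to an object with an `error` property.
async function withRetry(operation, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await operation()
      if (!response || !response.error) {
        return response // success
      }
      lastError = response.error
    } catch (err) {
      lastError = err
    }
    if (attempt < attempts) {
      // Exponential backoff before the next attempt
      await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** (attempt - 1)))
    }
  }
  const detail = lastError instanceof Error ? lastError.message : JSON.stringify(lastError)
  throw new Error('Operation failed after ' + attempts + ' attempts: ' + detail)
}
```

Inside uploadData, the call could then become await withRetry(() => client.dataset.addEntries(datasetId, dataElements)), which throws instead of returning an error once all attempts are used up.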

9. Conclusion

Creating a dataset using the API does not have to be difficult, although it requires some programming knowledge. In this example, about half of the code we wrote parses the MNIST binary files that we downloaded. If your images are already PNG files on your local computer, the job is simpler, because you can read them with the Node.js fs module without any parsing.
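As a sketch of that simpler case, the function below reads PNG files from a folder and returns objects shaped like the ones getImagesAndLabels produces ({ png, label }), so they could be passed to uploadData. It assumes a hypothetical naming scheme in which each file name starts with its label, such as "7_0001.png"; loadLabeledPngs and that scheme are illustrations only, so adapt them to however your labels are stored.

```javascript
const fs = require('fs')
const path = require('path')

// Load every PNG in `dir`, taking the label from the file name.
// Assumes the hypothetical naming scheme "<label>_<anything>.png", e.g. "7_0001.png".
function loadLabeledPngs(dir) {
  const entries = []
  for (const name of fs.readdirSync(dir).sort()) {
    const match = /^(\d+)_.*\.png$/i.exec(name)
    if (!match) continue // skip files that do not follow the naming scheme
    entries.push({
      png: fs.readFileSync(path.join(dir, name)), // raw PNG bytes, no parsing needed
      label: Number(match[1]),
    })
  }
  return entries
}
```

For example, await uploadData(datasetId, loadLabeledPngs('./my-images')) would upload every matching image in a folder named my-images, where the folder name is also just an example.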

Check out our other guides to learn how to create a model with which you can actually utilize your new dataset. Good luck!
