decthings

Tensors

Some methods, such as Model.evaluate and Dataset.addEntries, accept or return model input and output data. Decthings has a type system, which means that this data must follow specified rules. This document describes the format of both the rules and the data.

DecthingsTensor

A DecthingsTensor is a multi-dimensional array of uniform values, similar to a NumPy array, TensorFlow Tensor or PyTorch Tensor. Multi-dimensional means that a DecthingsTensor can contain a single value, a list of values, a 2D grid of values, and so on. Uniform means that all elements in a DecthingsTensor have the same data type.

DecthingsTensorRules

The DecthingsTensorRules object represents rules which a DecthingsTensor must adhere to. The rules can be used to limit the allowed inputs to a model, for example to only images in the case of an image classifier.

The rules contain two different limits - data type and shape.

Each DecthingsTensor has a data type - the possible values are given below. The rules contain a list of allowed data types.

In addition to limiting the possible data types, the rules also limit the shape of the array. A shape is an array of numbers where each number represents the number of elements in that level of nesting, also called a "dimension". The empty shape, [], means that the DecthingsTensor contains just a single value. The shape [2] means that the DecthingsTensor is a list containing two values, and the shape [2, 2, 1] means that the DecthingsTensor contains 2 lists which each contain 2 lists which each contain 1 value.
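To make the relationship between nesting and shape concrete, here is a minimal sketch that derives the shape of a nested array of numbers and the total number of elements a shape implies. The names Nested, shapeOf and elementCount are hypothetical and not part of any Decthings client, and the sketch assumes the nested array is regular (every list at the same depth has the same length).

```typescript
// Hypothetical helper types and functions for illustration only.
type Nested = number | Nested[];

// Walks down the first element at each level of nesting and records the
// length found there. Assumes a regular (non-ragged) nested array.
function shapeOf(value: Nested): number[] {
    const shape: number[] = [];
    let current: Nested = value;
    while (Array.isArray(current)) {
        shape.push(current.length);
        current = current[0];
    }
    return shape;
}

// The number of elements is the product of all dimension sizes.
// The empty shape [] yields 1, matching a single value.
function elementCount(shape: number[]): number {
    return shape.reduce((product, size) => product * size, 1);
}

console.log(shapeOf(7));                              // shape []
console.log(shapeOf([1, 2]));                         // shape [2]
console.log(shapeOf([[[1], [2]], [[3], [4]]]));       // shape [2, 2, 1]
console.log(elementCount([2, 2, 1]));                 // 4 elements
```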

We first define some JSON types which describe the rules of data. These are used to put constraints on what type and shape of data can be provided to a method. For example, a dataset can be set to only accept images as entries.

/**
 * All available data types.
 */
export type DecthingsElementType =
    // 32-bit and 64-bit floats
    | 'f32'
    | 'f64'

    // 8-bit, 16-bit, 32-bit and 64-bit signed integers
    | 'i8'
    | 'i16'
    | 'i32'
    | 'i64'

    // 8-bit, 16-bit, 32-bit and 64-bit unsigned integers
    | 'u8'
    | 'u16'
    | 'u32'
    | 'u64'

    | 'string'
    | 'boolean'
    | 'binary'
    | 'image'
    | 'audio'
    | 'video'

/**
 * Specifies rules for the shape and allowed data types of a DecthingsTensor.
 */
export type DecthingsTensorRules = {
    /**
     * Shape of the tensor.
     *
     * DecthingsTensors are multi-dimensional arrays. The shape defines how many dimensions there are, and
     * the number of elements in each dimension.
     *
     * [] would mean a scalar.
     * [1] would mean an array that contains just a single element, [2] would mean two elements and so on.
     * [2, 1] would mean an array that contains two arrays that each contain one element.
     * [2, 2, 1] would mean an array that contains two arrays that each contain two arrays that each contain
     * one element, and so on.
     *
     * Providing a value of -1 in any place would allow that dimension to be of any length.
     */
    shape: number[]
    /**
     * A list of the allowed types of elements in the data array.
     */
    allowedTypes: DecthingsElementType[]
}
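As an illustration of how such rules might be applied, the sketch below checks an element type and shape against a rules object. The function matchesRules is hypothetical and not part of the Decthings API. It assumes that the number of dimensions must match exactly and that -1 acts as a wildcard for a single dimension, which is one reading of the comments above.

```typescript
type DecthingsElementType =
    | 'f32' | 'f64'
    | 'i8' | 'i16' | 'i32' | 'i64'
    | 'u8' | 'u16' | 'u32' | 'u64'
    | 'string' | 'boolean' | 'binary' | 'image' | 'audio' | 'video';

type DecthingsTensorRules = {
    shape: number[];
    allowedTypes: DecthingsElementType[];
};

// Hypothetical helper: returns true if a tensor with the given element type
// and shape satisfies the rules.
function matchesRules(
    type: DecthingsElementType,
    shape: number[],
    rules: DecthingsTensorRules
): boolean {
    // The element type must be one of the allowed types.
    if (rules.allowedTypes.indexOf(type) === -1) return false;
    // Assumption: the number of dimensions must match exactly.
    if (shape.length !== rules.shape.length) return false;
    // Each dimension must match, unless the rule is -1 (any length).
    return shape.every((size, i) => rules.shape[i] === -1 || rules.shape[i] === size);
}

// Rules for a list of images of any length.
const imageListRules: DecthingsTensorRules = { shape: [-1], allowedTypes: ['image'] };

console.log(matchesRules('image', [5], imageListRules));    // true
console.log(matchesRules('string', [5], imageListRules));   // false: type not allowed
console.log(matchesRules('image', [5, 2], imageListRules)); // false: too many dimensions
```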

Binary format


Note: We will now describe the binary representation of DecthingsTensors. If you just want to get started, you can use an API client which performs all the binary serialization and deserialization for you. If you're interested in the details, keep reading!


First, we define a varint. A varint is a way to encode an integer so that small values take up less space. In this protocol, a 64-bit unsigned varint which encodes the 64-bit unsigned integer x is defined as:

  • If x < 253: Encode a single byte with value x.
  • Otherwise, if x < 2^16: Encode a single byte with value 253 followed by a 16-bit big-endian unsigned integer with value x.
  • Otherwise, if x < 2^32: Encode a single byte with value 254 followed by a 32-bit big-endian unsigned integer with value x.
  • Otherwise: Encode a single byte with value 255 followed by a 64-bit big-endian unsigned integer with value x.

For example, the value x = 18 would be encoded as the byte sequence [18]. The value x = 819 would be encoded as the byte sequence [253 3 51], because [3 51] is the 16-bit big-endian encoding of 819 (3 × 256 + 51 = 819).
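The varint rules above can be sketched as a small encoder. The function encodeVarint is a hypothetical helper, not part of any Decthings client. Because a JavaScript number cannot represent every 64-bit integer exactly, this sketch only accepts integers up to 2^53 - 1; a full implementation would likely use bigint.

```typescript
// Hypothetical helper for illustration. Accepts integers in [0, 2^53 - 1].
function encodeVarint(x: number): Uint8Array {
    if (Math.floor(x) !== x || x < 0 || x > 9007199254740991) {
        throw new RangeError('expected an integer between 0 and 2^53 - 1');
    }
    if (x < 253) {
        // Small values are stored directly in a single byte.
        return new Uint8Array([x]);
    }
    if (x < 0x10000) {
        // Marker byte 253, then a 16-bit big-endian integer.
        const out = new Uint8Array(3);
        out[0] = 253;
        new DataView(out.buffer).setUint16(1, x, false); // false = big-endian
        return out;
    }
    if (x < 0x100000000) {
        // Marker byte 254, then a 32-bit big-endian integer.
        const out = new Uint8Array(5);
        out[0] = 254;
        new DataView(out.buffer).setUint32(1, x, false);
        return out;
    }
    // Marker byte 255, then a 64-bit big-endian integer written as two
    // 32-bit halves (high half first).
    const out = new Uint8Array(9);
    out[0] = 255;
    const view = new DataView(out.buffer);
    view.setUint32(1, Math.floor(x / 0x100000000), false);
    view.setUint32(5, x % 0x100000000, false);
    return out;
}

console.log(encodeVarint(18));  // bytes: 18
console.log(encodeVarint(819)); // bytes: 253, 3, 51
```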

The following table lists byte literals which are used to encode the type of data within the binary representation:

Type       Type specifier
f32        1
f64        2
i8         3
i16        4
i32        5
i64        6
u8         7
u16        8
u32        9
u64        10
string     11
binary     12
boolean    13
image      14
audio      15
video      16

The serialized DecthingsTensor starts with a single byte encoding the data type, using the type specifier from the table above.

Next, the shape is encoded. The number of dimensions (length of the shape array) is encoded as a single byte, followed by the size of each dimension encoded as 64-bit unsigned varints.

Finally, the tensor data is encoded. Numeric values are encoded in little-endian format (unlike the varints, which use big-endian integers). A boolean is encoded as the byte value 0 for "false", and 1 for "true". Strings are encoded as valid UTF-8. Binaries are not modified; they are encoded as-is. Images, audio and video are encoded by first encoding exactly three bytes which contain the file format, given as a file extension in ASCII encoding, followed by the binary media content, such as the PNG, MP3 or MP4 data.

For fixed-sized data types (f32, f64, i8, i16, i32, i64, u8, u16, u32, u64 and boolean), there is no padding or additional bytes between the elements. All the encoded values are placed one after another.
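As a rough sketch of the layout for a fixed-size type, the function below serializes an f32 tensor: the type specifier, the number of dimensions, the dimension sizes, then the values packed back to back in little-endian format. The function serializeF32Tensor is hypothetical. To stay short it only supports dimension sizes below 253, where the varint is a single byte, and it takes the values as a flat list in the order they should be written, since the ordering of elements across dimensions is not specified here.

```typescript
// Hypothetical helper for illustration. `values` is a flat list whose length
// must equal the product of the dimension sizes.
function serializeF32Tensor(shape: number[], values: number[]): Uint8Array {
    if (shape.length > 255) {
        throw new RangeError('the number of dimensions must fit in a single byte');
    }
    let count = 1;
    shape.forEach((size) => {
        // Simplification: sizes below 253 are encoded as a one-byte varint.
        if (Math.floor(size) !== size || size < 0 || size >= 253) {
            throw new RangeError('this sketch only supports dimension sizes 0..252');
        }
        count *= size;
    });
    if (values.length !== count) {
        throw new Error('number of values does not match the shape');
    }

    const headerLength = 2 + shape.length;
    const out = new Uint8Array(headerLength + 4 * count);
    const view = new DataView(out.buffer);

    out[0] = 1;            // type specifier for f32
    out[1] = shape.length; // number of dimensions
    shape.forEach((size, i) => {
        out[2 + i] = size; // one-byte varint per dimension
    });
    values.forEach((value, i) => {
        // Four bytes per element, no padding, little-endian.
        view.setFloat32(headerLength + 4 * i, value, true);
    });
    return out;
}

// A tensor of shape [2] holding 1.0 and -2.0.
console.log(serializeF32Tensor([2], [1, -2]));
// bytes: 1, 1, 2, 0, 0, 128, 63, 0, 0, 0, 192
```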

For variable-sized data types (string, binary, image, audio and video), the byte-length of the encoded tensor element is encoded as a 64-bit unsigned varint before the element data. For strings and binaries, this is just the byte-length of the value. For images, audio and video, this is the byte-length of the media plus 3, because of the three bytes for the format.

For example, for a DecthingsTensor with shape [2], containing the two strings "hello" and ", world!", the serialized tensor would be:

  • The byte literal 11 (for string)
  • The byte literal 1 (for 1 dimension)
  • The byte literal 2 (the size 2 of the first dimension, encoded as a 64-bit unsigned varint)
  • The byte literal 5 (for the length of the first string, encoded as a 64-bit unsigned varint)
  • The UTF-8 encoded string "hello"
  • The byte literal 8 (for the length of the second string, encoded as a 64-bit unsigned varint)
  • The UTF-8 encoded string ", world!"
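Going the other way, the following sketch decodes the bytes from the example above back into strings. The functions readVarint and decodeStringTensor are hypothetical helpers, not part of any Decthings client. The decoded values are returned as a flat list in the order they appear in the byte stream, and varint values above 2^53 - 1 would lose precision because a JavaScript number is used.

```typescript
// Reads a varint at `offset`. Returns the value and the offset just past it.
function readVarint(view: DataView, offset: number): [number, number] {
    const first = view.getUint8(offset);
    if (first < 253) return [first, offset + 1];
    if (first === 253) return [view.getUint16(offset + 1, false), offset + 3];
    if (first === 254) return [view.getUint32(offset + 1, false), offset + 5];
    // Marker 255: combine two big-endian 32-bit halves.
    const high = view.getUint32(offset + 1, false);
    const low = view.getUint32(offset + 5, false);
    return [high * 0x100000000 + low, offset + 9];
}

// Hypothetical decoder for tensors whose type specifier is 11 (string).
function decodeStringTensor(bytes: Uint8Array): { shape: number[]; values: string[] } {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    if (view.getUint8(0) !== 11) {
        throw new Error('not a string tensor');
    }
    const numDims = view.getUint8(1);
    let offset = 2;
    const shape: number[] = [];
    let count = 1;
    for (let i = 0; i < numDims; i++) {
        const [size, next] = readVarint(view, offset);
        shape.push(size);
        count *= size;
        offset = next;
    }
    const decoder = new TextDecoder(); // defaults to UTF-8
    const values: string[] = [];
    for (let i = 0; i < count; i++) {
        // Each element is prefixed with its byte-length as a varint.
        const [length, next] = readVarint(view, offset);
        values.push(decoder.decode(bytes.subarray(next, next + length)));
        offset = next + length;
    }
    return { shape, values };
}

// The serialized example: type 11, 1 dimension of size 2, "hello", ", world!".
const exampleBytes = new Uint8Array([
    11, 1, 2,
    5, 104, 101, 108, 108, 111,
    8, 44, 32, 119, 111, 114, 108, 100, 33,
]);
console.log(decodeStringTensor(exampleBytes));
```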
