DDO Metadata

Specification of the DDO subset dedicated to asset metadata

Overview

This page defines the schema for asset metadata. Metadata is the subset of an Ocean DDO that holds information about the asset.

The schema is based on the public schema.org Dataset schema.

Standardizing labels is key to effective searching, sorting and filtering (discovery).

This page specifies which metadata attributes must be included and which may be included. These attributes are organized hierarchically, from top-level attributes like "main" to sub-level attributes like "main.type". This page also provides DDO metadata examples.

Rules for Metadata Storage and Control in Ocean

The publisher publishes an asset DDO (including metadata) onto the chain.

The publisher may be the asset owner, or a marketplace acting on behalf of the owner.

Most metadata fields may be modified after creation. The blockchain records the provenance of changes.

DDOs (including metadata) are found in two places:

  • Remote - main storage, on-chain. File URLs are always encrypted. One may actually encrypt all metadata, at a severe cost to discoverability.
  • Local - local cache. All fields are in plaintext.

Ocean Aquarius helps manage metadata. It can write DDOs to the chain, read them from the chain, and keep a local plaintext cache of DDOs with fast search.

Fields for Metadata

An asset represents a resource in Ocean, e.g. a dataset or an algorithm.

A metadata object has the following attributes. Some are required only in local or only in remote metadata, and are marked as such.

Attribute | Required | Description
main | Yes | Main attributes.
encryptedFiles | Remote | Encrypted string of the attributes.main.files object.
encryptedServices | Remote | Encrypted string of the attributes.main.services object.
status | No | Status attributes.
additionalInformation | No | Optional attributes.

The main and additionalInformation attributes are independent of the asset type.
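As a minimal sketch, assuming metadata is handled as a parsed JSON object (a Python dict), the Required column of the table above can be checked like this. The function name missing_top_level and the remote flag are illustrative, not part of any Ocean library.

```python
# "Yes" in the Required column applies to local and remote metadata;
# "Remote" applies only to the on-chain (remote) copy.
REQUIRED_EVERYWHERE = ("main",)
REQUIRED_REMOTE_ONLY = ("encryptedFiles", "encryptedServices")


def missing_top_level(metadata: dict, remote: bool) -> list:
    """Return the required top-level attributes that `metadata` lacks."""
    required = list(REQUIRED_EVERYWHERE)
    if remote:
        required.extend(REQUIRED_REMOTE_ONLY)
    return [name for name in required if name not in metadata]


local_metadata = {"main": {}, "additionalInformation": {}}
print(missing_top_level(local_metadata, remote=False))  # []
print(missing_top_level(local_metadata, remote=True))   # ['encryptedFiles', 'encryptedServices']
```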

Fields for attributes.main

The main object has the following attributes.

Attribute | Type | Required | Description
name | Text | Yes | Descriptive name or title of the asset.
type | Text | Yes | Asset type. Includes "dataset" (e.g. a CSV file) and "algorithm" (e.g. a Python script). Each type needs a different subset of metadata attributes.
author | Text | Yes | Name of the entity generating this data (e.g. TfL, Disney Corp, etc.).
license | Text | Yes | Short name referencing the license of the asset (e.g. Public Domain, CC-0, CC-BY, No License Specified, etc.). If not specified, it defaults to “No License Specified”.
files | Array of File objects | Yes | Array of File objects, including the file URLs (encrypted in remote metadata).
dateCreated | DateTime | Yes | The date on which the asset was created by the originator, in ISO 8601 format and Coordinated Universal Time (UTC), e.g. 2019-01-31T08:38:32Z.
datePublished | DateTime | Remote | The date on which the asset DDO is registered in the metadata store (Aquarius).
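A minimal sketch of applying these rules to a main object. It assumes the license default is filled in before the required-attribute check (the table marks license as required but also specifies a default), and it only accepts the UTC "Z" suffix used in this page's examples. normalize_main is a hypothetical helper.

```python
from datetime import datetime

MAIN_REQUIRED = ("name", "type", "author", "license", "files", "dateCreated")


def normalize_main(main: dict) -> dict:
    """Return a copy of `main` with the license default applied.

    Raises ValueError if a required attribute is missing or dateCreated
    is not an ISO 8601 UTC timestamp such as 2019-01-31T08:38:32Z.
    """
    main = dict(main)  # shallow copy so the caller's dict is not modified
    main.setdefault("license", "No License Specified")
    missing = [key for key in MAIN_REQUIRED if key not in main]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    created = main["dateCreated"]
    if not created.endswith("Z"):
        raise ValueError("dateCreated must be in UTC, with a trailing 'Z'")
    datetime.fromisoformat(created[:-1])  # raises ValueError if malformed
    return main


main = normalize_main({
    "name": "Madrid Weather forecast",
    "type": "dataset",
    "author": "Norwegian Meteorological Institute",
    "files": [],
    "dateCreated": "2019-05-16T12:36:14.535Z",
})
print(main["license"])  # No License Specified
```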

Fields for attributes.main.files

The files attribute is an array of file objects.

Each file object has the following attributes, with the details necessary to consume and validate the data.

Attribute | Required | Description
index | Yes | Zero-based index of the file.
contentType | Yes | File format (e.g. text/xml).
url | Local | Content URL. Omitted from the remote metadata. Supports http(s):// and ipfs:// URLs.
name | No | File name.
checksum | No | Checksum of the file in your preferred format (e.g. MD5), as specified in checksumType. If it is not provided, there is no way to verify that the file was not modified after registration.
checksumType | No | Format of the provided checksum. Can vary according to the server (e.g. Amazon vs. Azure).
contentLength | No | Size of the file in bytes.
encoding | No | File encoding (e.g. UTF-8).
compression | No | File compression (e.g. none, gzip, bzip2, etc.).
encrypted | No | Boolean. Whether the file is encrypted. If not set, the file is assumed to be unencrypted.
encryptionMode | No | Encryption mode used. Only valid if encrypted is true.
resourceId | No | Remote identifier of the file in the external provider, typically its ID in the cloud provider.
attributes | No | Key-value map with additional attributes describing the asset file, e.g. the Amazon S3 bucket, region, etc.
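The rules in this table can be expressed as a small check. This is a sketch under assumptions: file objects are dicts, the remote flag says which copy is being checked, and file_problems is a hypothetical name. The URL check only inspects the scheme, following the http(s):// and ipfs:// note.

```python
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"http", "https", "ipfs"}


def file_problems(file_obj: dict, remote: bool) -> list:
    """Return a list of rule violations for one entry of main.files."""
    problems = []
    for key in ("index", "contentType"):
        if key not in file_obj:
            problems.append(f"missing {key}")
    url = file_obj.get("url")
    if remote and url is not None:
        problems.append("url must be omitted from remote metadata")
    if not remote and url is None:
        problems.append("url is required in local metadata")
    if url is not None and urlparse(url).scheme not in SUPPORTED_SCHEMES:
        problems.append(f"unsupported URL scheme: {url}")
    # encrypted defaults to False; encryptionMode is only valid when it is True.
    if "encryptionMode" in file_obj and not file_obj.get("encrypted", False):
        problems.append("encryptionMode set but encrypted is not true")
    return problems


local_file = {
    "index": 0,
    "url": "https://example-url.net/weather/forecast/madrid/350750305731.xml",
    "contentType": "text/xml",
    "compression": "none",
}
print(file_problems(local_file, remote=False))  # []
```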

Fields for attributes.status

A status object has the following attributes.

Attribute | Type | Required | Description
isListed | Boolean | No | Flags unsuitable content. True by default. If false, the content must not be returned.
isRetired | Boolean | No | Flags retired content. False by default. If true, the content may either not be returned, or be returned with a note about its retirement.
isOrderDisabled | Boolean | No | Temporarily disables ordering of the asset, e.g. while the file host is under maintenance. False by default. If true, no ordering of the asset for download or compute should be allowed.
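A marketplace could apply these flags, with their defaults, roughly as follows. The function names and the "hidden", "retired" and "listed" labels are illustrative assumptions. For retired assets the table allows either hiding them or showing a retirement note, so the sketch leaves that choice to the caller.

```python
def search_visibility(status: dict) -> str:
    """Classify an asset for search results using the status flags and their defaults."""
    if not status.get("isListed", True):
        return "hidden"  # unlisted content must not be returned
    if status.get("isRetired", False):
        return "retired"  # hide it, or return it with a retirement note
    return "listed"


def can_order(status: dict) -> bool:
    """Ordering for download or compute is allowed unless isOrderDisabled is true."""
    return not status.get("isOrderDisabled", False)


print(search_visibility({}))                   # listed
print(search_visibility({"isRetired": True}))  # retired
print(can_order({"isOrderDisabled": True}))    # False
```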

Fields for attributes.additionalInformation

All the additional information will be stored as part of the additionalInformation section.

Attribute | Type | Required | Description
tags | Array of Text | No | Array of keywords or tags describing this content. Empty by default.
description | Text | No | Details of what the resource is. For a dataset, this attribute explains what the data represents and what it can be used for.
copyrightHolder | Text | No | The party holding the legal copyright. Empty by default.
workExample | Text | No | Example of the concept of this asset. This example is part of the metadata, not an external link.
links | Array of Link | No | Links to data samples, or to further information. Links may point to either a URL or another asset. We expect marketplaces to converge on typical formats for linked data; Ocean Protocol itself does not mandate any specific formats, as these requirements are likely to be domain-specific. The links array may be empty, but every link object in it must include a “url”.
inLanguage | Text | No | The language of the content, as a language code from the IETF BCP 47 standard.
categories | Array of Text | No | Optional array of categories associated with the asset. Using "tags" instead is recommended.
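The one hard rule on links (the array may be empty, but each link object needs a url) can be checked as in this sketch. link_problems is a hypothetical helper, and the name field in the usage example is an assumption, since this page only requires url.

```python
def link_problems(additional_information: dict) -> list:
    """Return one message per link object that lacks a "url" key."""
    links = additional_information.get("links", [])  # absent or empty is allowed
    return [f"links[{i}] has no url" for i, link in enumerate(links) if "url" not in link]


info = {
    "links": [
        {"name": "sample", "url": "https://example.com/sample.csv"},
        {"name": "no url here"},
    ]
}
print(link_problems(info))  # ['links[1] has no url']
```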

Fields - Other Suggestions

Here are example attributes to help an asset’s discoverability.

Attribute | Description
updateFrequency | An indication of update latency, i.e. how often updates are expected (seldom, annually, quarterly, etc.), or whether the resource is static and never expected to be updated.
structuredMarkup | A link to machine-readable structured markup (such as ttl/json-ld/rdf) describing the dataset.

DDO Metadata Example - Local

This is what DDO metadata looks like in the local cache, with all fields in plaintext: either before it is stored on-chain, or after it is retrieved and decrypted.

{
  "main": {
    "name": "Madrid Weather forecast",
    "dateCreated": "2019-05-16T12:36:14.535Z",
    "author": "Norwegian Meteorological Institute",
    "type": "dataset",
    "license": "Public Domain",
    "price": "123000000000000000000",
    "files": [
      {
        "index": 0,
        "url": "https://example-url.net/weather/forecast/madrid/350750305731.xml",
        "contentLength": "0",
        "contentType": "text/xml",
        "compression": "none"
      }
    ]
  },
  "additionalInformation": {
    "description": "Weather forecast of Europe/Madrid in XML format",
    "copyrightHolder": "Norwegian Meteorological Institute",
    "categories": ["Other"],
    "links": [],
    "tags": [],
    "updateFrequency": null,
    "structuredMarkup": []
  },
  "status": {
    "isListed": true,
    "isRetired": false,
    "isOrderDisabled": false
  }
}

DDO Metadata Example - Remote

The previous example was for a local cache, with all fields in plaintext.

Here’s the same example, for remote on-chain storage. That is, it’s how metadata looks as a response to querying Aquarius (remote metadata).

Compared to local, remote metadata differs as follows:

  • url is removed from all objects in the files array.
  • encryptedFiles is added.
  • datePublished is added to main.

{
  "service": [
    {
      "index": 0,
      "serviceEndpoint": "http://aquarius:5000/api/v1/aquarius/assets/ddo/{did}",
      "type": "metadata",
      "attributes": {
        "main": {
          "type": "dataset",
          "name": "Madrid Weather forecast",
          "dateCreated": "2019-05-16T12:36:14.535Z",
          "author": "Norwegian Meteorological Institute",
          "license": "Public Domain",
          "files": [
            {
              "contentLength": "0",
              "contentType": "text/xml",
              "compression": "none",
              "index": 0
            }
          ],
          "datePublished": "2019-05-16T12:41:01Z"
        },
        "encryptedFiles": "0x7a0d1c66ae861…df43aa9",
        "additionalInformation": {
          "description": "Weather forecast of Europe/Madrid in XML format",
          "copyrightHolder": "Norwegian Meteorological Institute",
          "categories": ["Other"],
          "links": [],
          "tags": [],
          "updateFrequency": null,
          "structuredMarkup": []
        },
        "status": {
          "isListed": true,
          "isRetired": false,
          "isOrderDisabled": false
        }
      }
    }
  ]
}
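The local-to-remote changes above can be sketched as a transformation on the attributes object. This sketch assumes the files array is serialized as JSON before encryption and takes encryption as a caller-supplied function; to_remote and fake_encrypt are hypothetical, and fake_encrypt only hex-encodes its input. It does not set datePublished, which the main table describes as the date of registration in Aquarius.

```python
import copy
import json
from typing import Callable


def to_remote(local_attributes: dict, encrypt: Callable[[str], str]) -> dict:
    """Build remote metadata attributes from a local (plaintext) copy."""
    remote = copy.deepcopy(local_attributes)  # leave the local copy intact
    files = remote["main"]["files"]
    # Encrypt the complete files array, URLs included, before stripping the URLs.
    remote["encryptedFiles"] = encrypt(json.dumps(files))
    for file_obj in files:
        file_obj.pop("url", None)
    return remote


def fake_encrypt(plaintext: str) -> str:
    """Stand-in for real encryption: it only hex-encodes the text."""
    return "0x" + plaintext.encode("utf-8").hex()


local_attributes = {
    "main": {
        "name": "Madrid Weather forecast",
        "files": [{
            "index": 0,
            "url": "https://example-url.net/weather/forecast/madrid/350750305731.xml",
            "contentType": "text/xml",
        }],
    }
}
remote_attributes = to_remote(local_attributes, fake_encrypt)
print(remote_attributes["main"]["files"])  # [{'index': 0, 'contentType': 'text/xml'}]
```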

Fields when attributes.main.type = algorithm

An asset of type algorithm has the following additional attributes under main.algorithm:

Attribute | Type | Required | Description
container | Object | Yes | Object describing the Docker container image.
language | string | No | Language used to implement the software.
format | string | No | Packaging format of the software.
version | string | No | Version of the software.

The container object has the following attributes:

Attribute | Type | Required | Description
entrypoint | string | Yes | The command to execute, or script to run, inside the Docker image.
image | string | Yes | Name of the Docker image.
tag | string | Yes | Tag of the Docker image.
checksum | string | Yes | Checksum of the Docker image.

Example of an algorithm metadata service

{
  "index": 0,
  "serviceEndpoint": "http://localhost:5000/api/v1/aquarius/assets/ddo/{did}",
  "type": "metadata",
  "attributes": {
    "main": {
      "author": "John Doe",
      "dateCreated": "2019-02-08T08:13:49Z",
      "license": "CC-BY",
      "name": "My super algorithm",
      "type": "algorithm",
      "algorithm": {
        "language": "scala",
        "format": "docker-image",
        "version": "0.1",
        "container": {
          "entrypoint": "node $ALGO",
          "image": "node",
          "tag": "10",
          "checksum": "efb2c764274b745f5fc37f97c6b0e761"
        }
      },
      "files": [
        {
          "name": "build_model",
          "url": "https://raw.githubusercontent.com/oceanprotocol/test-algorithm/master/javascript/algo.js",
          "index": 0,
          "checksum": "efb2c764274b745f5fc37f97c6b0e761",
          "contentLength": "4535431",
          "contentType": "text/plain",
          "encoding": "UTF-8",
          "compression": "zip"
        }
      ]
    },
    "additionalInformation": {
      "description": "Workflow to aggregate weather information",
      "tags": ["weather", "uk", "2011", "workflow", "aggregation"],
      "copyrightHolder": "John Doe"
    }
  }
}
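For an asset whose main.type is "algorithm", the required container fields can be checked as below. This is a sketch: algorithm_problems and image_reference are hypothetical helpers, and joining image and tag as image:tag simply follows the usual Docker image-reference convention (node:10 for the example above).

```python
CONTAINER_REQUIRED = ("entrypoint", "image", "tag", "checksum")


def algorithm_problems(main: dict) -> list:
    """Check main.algorithm when main.type is "algorithm"."""
    if main.get("type") != "algorithm":
        return []  # the extra attributes apply only to algorithms
    algorithm = main.get("algorithm")
    if not isinstance(algorithm, dict):
        return ["main.algorithm is missing"]
    container = algorithm.get("container")
    if not isinstance(container, dict):
        return ["main.algorithm.container is missing"]
    return [
        f"main.algorithm.container.{key} is missing"
        for key in CONTAINER_REQUIRED
        if key not in container
    ]


def image_reference(container: dict) -> str:
    """Join image and tag in the usual Docker form, e.g. node:10."""
    return f"{container['image']}:{container['tag']}"


algorithm_main = {
    "type": "algorithm",
    "algorithm": {
        "language": "scala",
        "format": "docker-image",
        "version": "0.1",
        "container": {
            "entrypoint": "node $ALGO",
            "image": "node",
            "tag": "10",
            "checksum": "efb2c764274b745f5fc37f97c6b0e761",
        },
    },
}
print(algorithm_problems(algorithm_main))  # []
print(image_reference(algorithm_main["algorithm"]["container"]))  # node:10
```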

Fields when service type = compute

An asset with a service of type compute has the following additional attributes under main.privacy:

Attribute | Type | Required | Description
allowRawAlgorithm | boolean | Yes | If true, a raw (drag-and-drop) algorithm can be run.
allowNetworkAccess | boolean | Yes | If true, the algorithm job has network access (still a work in progress).
publisherTrustedAlgorithms | Array of objects | Yes | If empty, any published algorithm is allowed (see below).

The publisherTrustedAlgorithms is an array of objects with the following structure:

Attribute | Type | Required | Description
did | string | Yes | The DID of the algorithm trusted by the publisher.
filesChecksum | string | Yes | SHA-256 hash of the algorithm’s encryptedFiles concatenated with its files section serialized as a string.
containerSectionChecksum | string | Yes | SHA-256 hash of the algorithm’s container section serialized as a string.

To produce filesChecksum:

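// Node.js; algorithmDdo is the algorithm's DDO, and a hex digest is assumed
const { createHash } = require('crypto')
const metadata = algorithmDdo.service.find((s) => s.type === 'metadata')
const filesChecksum = createHash('sha256')
  .update(metadata.attributes.encryptedFiles + JSON.stringify(metadata.attributes.main.files))
  .digest('hex')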

To produce containerSectionChecksum:

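// Node.js; algorithmDdo is the algorithm's DDO, and a hex digest is assumed
const { createHash } = require('crypto')
const metadata = algorithmDdo.service.find((s) => s.type === 'metadata')
const containerSectionChecksum = createHash('sha256')
  .update(JSON.stringify(metadata.attributes.main.algorithm.container))
  .digest('hex')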
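A Python sketch of the same two checksums, for tooling that does not run JavaScript. It assumes hex-encoded SHA-256 digests and approximates JSON.stringify with json.dumps using compact separators and ensure_ascii=False. The two serializers can still differ in edge cases such as float formatting (JavaScript prints 1.0 as 1), so results should be cross-checked against a JavaScript implementation before relying on them. All function names are hypothetical.

```python
import hashlib
import json


def js_stringify(value) -> str:
    """Approximate JavaScript's JSON.stringify: no whitespace, key order
    preserved, non-ASCII characters written as-is."""
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)


def metadata_attributes(algorithm_ddo: dict) -> dict:
    """Return the attributes of the DDO's service whose type is "metadata"."""
    service = next(s for s in algorithm_ddo["service"] if s["type"] == "metadata")
    return service["attributes"]


def files_checksum(algorithm_ddo: dict) -> str:
    """sha256(encryptedFiles + JSON.stringify(main.files)), hex-encoded."""
    attributes = metadata_attributes(algorithm_ddo)
    payload = attributes["encryptedFiles"] + js_stringify(attributes["main"]["files"])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def container_section_checksum(algorithm_ddo: dict) -> str:
    """sha256(JSON.stringify(main.algorithm.container)), hex-encoded."""
    container = metadata_attributes(algorithm_ddo)["main"]["algorithm"]["container"]
    return hashlib.sha256(js_stringify(container).encode("utf-8")).hexdigest()


# Hypothetical DDO holding only the fields the two checksums read.
algorithm_ddo = {
    "service": [
        {
            "type": "metadata",
            "attributes": {
                "encryptedFiles": "0xabc",
                "main": {
                    "files": [{"index": 0, "contentType": "text/plain"}],
                    "algorithm": {
                        "container": {"entrypoint": "node $ALGO", "image": "node", "tag": "10"}
                    },
                },
            },
        }
    ]
}
print(files_checksum(algorithm_ddo))
print(container_section_checksum(algorithm_ddo))
```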

Example of a compute service

{
  "type": "compute",
  "index": 1,
  "serviceEndpoint": "https://provider.oceanprotocol.com",
  "attributes": {
    "main": {
      "name": "dataAssetComputingService",
      "creator": "0xA32C84D2B44C041F3a56afC07a33f8AC5BF1A071",
      "datePublished": "2021-02-17T06:31:33Z",
      "cost": "1",
      "timeout": 3600,
      "privacy": {
        "allowRawAlgorithm": true,
        "allowNetworkAccess": false,
        "publisherTrustedAlgorithms": [
          {
            "did": "0xxxxx",
            "filesChecksum": "1234",
            "containerSectionChecksum": "7676"
          },
          {
            "did": "0xxxxx",
            "filesChecksum": "1232334",
            "containerSectionChecksum": "98787"
          }
        ]
      }
    }
  }
}
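Given the checksums of a candidate algorithm, checking it against publisherTrustedAlgorithms could look like the sketch below. It illustrates the rules in the tables above (an empty list allows any published algorithm; otherwise the DID and both checksums must match one entry) and is not the check performed by any particular provider. is_algorithm_trusted is a hypothetical name.

```python
def is_algorithm_trusted(privacy: dict, did: str, files_checksum: str, container_checksum: str) -> bool:
    """Apply the publisherTrustedAlgorithms rules from main.privacy."""
    trusted = privacy["publisherTrustedAlgorithms"]  # required attribute
    if not trusted:
        return True  # an empty list allows any published algorithm
    return any(
        entry["did"] == did
        and entry["filesChecksum"] == files_checksum
        and entry["containerSectionChecksum"] == container_checksum
        for entry in trusted
    )


privacy = {
    "allowRawAlgorithm": True,
    "allowNetworkAccess": False,
    "publisherTrustedAlgorithms": [
        {"did": "0xxxxx", "filesChecksum": "1234", "containerSectionChecksum": "7676"},
        {"did": "0xxxxx", "filesChecksum": "1232334", "containerSectionChecksum": "98787"},
    ],
}
print(is_algorithm_trusted(privacy, "0xxxxx", "1234", "7676"))   # True
print(is_algorithm_trusted(privacy, "0xxxxx", "1234", "98787"))  # False
```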