
Accepted DCAT and data.json fields

When using DCAT or data.json, it’s common to use additional fields from other vocabularies. This guidance explains the fields data.gov.uk will accept.

data.json field DCAT predicate Example value Comments
title dct:title
rdfs:label
Spend over £500 Mandatory
description dct:description
rdfs:comment
Spend transactions published monthly according to the Treasury transparency guidelines. Mandatory
identifier (equivalent to RDF object’s URI or) dct:identifier https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/timeseries/lf24/lms or http://dx.doi.org/10.7927/H4PZ56R2 Mandatory. A string that identifies the dataset now and in the future, ideally even if the dataset’s title changes. If the dataset is in a data catalogue, then supply the URL of the dataset page, or some other catalogue identifier. If it is not already catalogued, you could assign it a random hexadecimal string of, say, 24 digits. Ideally the identifier should be globally unique - not just unique to the organisation - so a URI is highly recommended.
license dct:license http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/ or Open Government Licence Mandatory. Either a license URI or a title. If you give a title, it must match exactly or it may not be recognized as open.
keyword dcat:keyword data.json:
"keyword": ["geochemistry", "geology"]
DCAT: <dcat:keyword>geochemistry</dcat:keyword> <dcat:keyword>geology</dcat:keyword>
Not displayed by data.gov.uk but helps theme categorization
issued dct:issued 2012-05 Date the dataset was first published. Formats allowed are: ‘YYYY-MM-DD’, ‘YYYY-MM’, ‘YYYY’ or ‘YYYY-MM-DDTHH:MM:SS.mmmmmm’ (according to ISO8601)
modified dct:modified 2012-06-10 Date the dataset was most recently updated. Formats allowed are: ‘YYYY-MM-DD’, ‘YYYY-MM’, ‘YYYY’ or ‘YYYY-MM-DDTHH:MM:SS.mmmmmm’ (according to ISO8601)
publisher dct:publisher data.json:
"publisher": {"name": "Geological Society", "mbox": "info@gs.org"}
DCAT:
<dct:publisher>
<foaf:Organization>
<foaf:name>Geological Society</foaf:name>
<foaf:mbox>info@gs.org</foaf:mbox>
</foaf:Organization>
</dct:publisher>
Mandatory. However, the harvester will attach the dataset to the publisher selected in the harvest source.
distribution dcat:distribution A location for accessing/obtaining the data. See “Distribution” table below
landingPage dcat:landingPage https://www.gov.uk/government/statistical-data-sets/oil-and-petroleum-products-weekly-statistics URL of a web page that is specifically about this dataset and probably has direct links to its resources and further information.
references foaf:Document ["http://data.dft.gov.uk/road-accidents-safety-data/Brief-guide-to%20road-accidents-and-safety-data.doc"]
OR in data.json you can specify more fields like a distribution, for example
[{"url": "https://indicators.ic.nhs.uk/framework.pdf", "title": "Framework specification", "format": "application/pdf"}]
URL of a web page, PDF or other documentation that gives more information about the dataset. (Use landingPage instead if that is more appropriate for the URL.) Should be an array, to allow multiple references to be specified.
language dct:language ["en"] or ["http://id.loc.gov/vocabulary/iso639-1/en"] Language of the data. Should be an array of values ["en", ...].
frequency dct:accrualPeriodicity R/P1Y (=annual)
R/P1W (=weekly)
The frequency at which the dataset is published. Format: ISO 8601 repeating duration (or irregular). See: https://project-open-data.cio.gov/iso8601_guidance/#accrualperiodicity
temporal dct:temporal 2000-01-15/2000-01-20 2010-01/2010-03 2010/2010 The date period that the dataset applies to. Formatted as two ISO 8601 dates (or datetimes) separated by a slash. If the period in question is a whole year or whole month, just put the same value for start and finish - for example 2010/2010 or 2010-06/2010-06.
spatial dct:spatial {\"type\":\"Polygon\",\"coordinates\":[[[2.072, 49.943],[2.072, 55.816], [-6.236, 55.816], [-6.236, 49.943], [2.072, 49.943]]]} The geographic location that the dataset applies to. If not specified, then it is inherited from the dataset’s publisher. Formatted as a GeoJSON point, bounding box or polygon.
theme dcat:theme http://eurovoc.europa.eu/209416 - Police - Eurovoc
COFOG/03.1.0 Police services - COFOG
http://id.esd.org.uk/function/20 - Police services - ESD
Main thematic category of the dataset. Preferably expressed as a URI from a known vocabulary:
Eurovoc
COFOG
ESD Service/Function
More than one can be specified using a [] list.
NB This field is currently ignored by data.gov.uk - use other fields as described in Themes

Local authorities should ensure they add an ESD service or function URI as the dataset’s theme, to ensure good classification.
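The date layouts listed for issued and modified, and the slash-separated pair used by temporal, can be checked before a file is submitted. The following is a minimal Python sketch, not the harvester's own validation: the function names is_accepted_date and is_accepted_temporal are hypothetical, and it assumes exactly six fractional digits in the datetime layout, as the 'mmmmmm' pattern suggests.

```python
import re
from datetime import datetime

# Shape of each accepted layout, paired with the strptime format that
# checks the values are real calendar dates (for example, no month 13).
_DATE_FORMATS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}"), "%Y-%m-%dT%H:%M:%S.%f"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
    (re.compile(r"\d{4}-\d{2}"), "%Y-%m"),
    (re.compile(r"\d{4}"), "%Y"),
]


def is_accepted_date(value: str) -> bool:
    """Return True if value matches one of the four listed date layouts."""
    for shape, fmt in _DATE_FORMATS:
        if shape.fullmatch(value):
            try:
                datetime.strptime(value, fmt)
                return True
            except ValueError:
                return False
    return False


def is_accepted_temporal(value: str) -> bool:
    """Return True if value is two accepted dates separated by one slash."""
    parts = value.split("/")
    return len(parts) == 2 and all(is_accepted_date(p) for p in parts)
```

With this sketch, the example values 2012-05 and 2010/2010 pass their respective checks, while a bare 2010 is rejected as a temporal value because it has no slash.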

Distribution

You should use the following fields for datasets, not documentation.

data.json field DCAT predicate Example value Comments
downloadURL dcat:downloadURL http://site.gov.uk/river-levels/dec2012.csv The direct URL that downloads a file with the data
accessURL dcat:accessURL http://www.site.gov.uk/api/sparql
http://site.gov.uk/river-level-data.html
If there is not a downloadURL, specify the accessURL, which is the URL of an API or other non-downloadable data. Do not use documentation or other web pages as distributions.
title dct:title Spend transactions, Dec 2018
description dct:description Not currently displayed on data.gov.uk
format dcat:mediaType text/csv Currently recognised mime-types are listed here: DGU formats
conformsTo dct:conformsTo http://schemas.opendata.esd.org.uk/publictoilets/PublicToilets.json?v=0.41 URL of the machine-readable schema that the dataset conforms to. Refer to the Local Authority data schemas.
temporal dct:temporal 2000-01-15/2000-01-20 2010-01/2010-03 2010/2010 The date period that the dataset applies to. Formatted as 2 ISO 8601 dates (or datetimes) separated by a slash. If the period in question is a whole year or whole month, put the same value for start and finish, for example 2010/2010 or 2010-06/2010-06. Data.gov.uk only stores the first date of the pair in the “date” field, and if the 2 dates are different, it adds the range into the “name” field.
spatial dct:spatial {\"type\":\"Polygon\",\"coordinates\":[[[2.072, 49.943],[2.072, 55.816], [-6.236, 55.816], [-6.236, 49.943], [2.072, 49.943]]]} The geographic location that the dataset applies to. If not specified, then it is inherited from the dataset or, failing that, from its organisation. Formatted as a GeoJSON point, bounding box or polygon.
identifier (equivalent to RDF object’s URI or) dct:identifier https://data.some.org/catalog/datasets/9df8df51-63db-37a8-e044-0003ba9b0d98/resource/4252f71c-4f2f-4e37-ab4e-b58b4ac255bb Optional. Use it to help CKAN keep track of a distribution that has changing properties (for example, a new URL or format), or for DCAT compliance. Must be globally unique - not just unique to the organisation. A URI is highly recommended (ideally one that returns a web page about the distribution).
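The rule that accessURL is used only when there is no downloadURL can be expressed in a few lines. This is an illustrative Python sketch; the helper name distribution_url is hypothetical and the code does not describe how data.gov.uk itself resolves the two fields.

```python
from typing import Optional


def distribution_url(distribution: dict) -> Optional[str]:
    """Prefer downloadURL, fall back to accessURL, return None if neither is set."""
    return distribution.get("downloadURL") or distribution.get("accessURL") or None
```

A distribution for which this returns None has no location for the data, so it would be worth checking before the file is published.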

data.json file structure

The data.json file should be structured as an array [ ... ] of dataset objects { ... }.

For example:

[
  {"title": "Live traffic information from the Highways Agency",
  "license": "No license provided",
  "publisher": {"name": "Highways Agency", "mbox": "test@test.com"},
  "distribution": [
    {
      "downloadURL": "https://s3-eu-west-1.amazonaws.com/lmtesting2810/HATRIS_15MinuteMIDAS_YYYY-MM-DD_0.csv.zip",
      "title": "Hatris 15 Min Midas 0",
      "format": "application/zip"
    }]
  },
  {"title": "Roadworks locations",
    ...
  }
]

Alternatively, you can put this information inside a catalogue object under the key datasets. Although data.gov.uk does not harvest the catalogue data, allowing this structure gives reasonable compatibility with Project Open Data.
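Both layouts can be handled by one small loader. The Python sketch below is an assumption-laden illustration rather than the harvester's logic: extract_datasets and missing_mandatory_fields are hypothetical helpers, the key name datasets is taken from the sentence above, and the list of mandatory fields is read off the rows marked "Mandatory" in the dataset table.

```python
# Fields marked "Mandatory" in the dataset table.
MANDATORY_FIELDS = ("title", "description", "identifier", "license", "publisher")


def extract_datasets(document):
    """Return the list of dataset objects from either accepted layout."""
    if isinstance(document, list):
        # Plain array of dataset objects.
        return document
    if isinstance(document, dict) and isinstance(document.get("datasets"), list):
        # Catalogue object holding the array under the "datasets" key.
        return document["datasets"]
    raise ValueError("expected an array of datasets or an object with a 'datasets' key")


def missing_mandatory_fields(dataset):
    """List the mandatory fields that are absent or empty in one dataset."""
    return [name for name in MANDATORY_FIELDS if not dataset.get(name)]
```

Passing the result of json.loads on a data.json file to extract_datasets, then calling missing_mandatory_fields on each item, gives a quick pre-publication check.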

Character encoding

The data.json file should have ASCII or UTF-8 character encoding (as per the JSON standard). The harvester now detects this and returns an error if the file is not acceptable.
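An encoding problem can be caught locally before the harvester reports it. The following Python sketch decodes the raw bytes as UTF-8, which also covers ASCII because ASCII is a subset of UTF-8. The function name parse_data_json is hypothetical, and the harvester's own detection may work differently.

```python
import json


def parse_data_json(raw: bytes):
    """Decode bytes as UTF-8 (which also covers ASCII) and parse them as JSON."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"data.json is not valid UTF-8: {exc}") from exc
    return json.loads(text)
```

A file saved in another encoding, such as Latin-1, will typically fail here as soon as it contains a non-ASCII character like the pound sign in "Spend over £500".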

Namespaces

Prefix Namespace
dcat http://www.w3.org/ns/dcat#
dct http://purl.org/dc/terms/
rdfs http://www.w3.org/2000/01/rdf-schema#
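These prefixes are what a parser needs in order to find DCAT elements in an RDF/XML file. The sketch below uses Python's standard xml.etree.ElementTree and is only an illustration. The rdf and foaf namespace URIs are not in the table above and are added as assumptions because the publisher example uses foaf elements. The sample document, its title and the helper read_dataset are made up for the example.

```python
import xml.etree.ElementTree as ET

# dcat, dct and rdfs come from the table above; rdf and foaf are the
# standard RDF and FOAF namespaces, added here as assumptions.
NAMESPACES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Illustrative document only; the title is invented for this example.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <dcat:Dataset>
    <dct:title>Example dataset</dct:title>
    <dcat:keyword>geochemistry</dcat:keyword>
    <dcat:keyword>geology</dcat:keyword>
    <dct:publisher>
      <foaf:Organization>
        <foaf:name>Geological Society</foaf:name>
      </foaf:Organization>
    </dct:publisher>
  </dcat:Dataset>
</rdf:RDF>"""


def read_dataset(xml_text: str) -> dict:
    """Pull title, keywords and publisher name from the first dcat:Dataset."""
    root = ET.fromstring(xml_text)
    dataset = root.find("dcat:Dataset", NAMESPACES)
    return {
        "title": dataset.findtext("dct:title", namespaces=NAMESPACES),
        "keyword": [k.text for k in dataset.findall("dcat:keyword", NAMESPACES)],
        "publisher": dataset.findtext(
            "dct:publisher/foaf:Organization/foaf:name", namespaces=NAMESPACES
        ),
    }
```

Calling read_dataset(SAMPLE) returns a dictionary whose keys mirror the data.json field names, which makes it easier to compare the two representations of the same dataset.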