Datasets#

About#

Datasets

See also

OIH focuses on generic documents, which can cover reports, data and other resources. Where the resource being described is of type Dataset, you may wish to review the patterns developed for geoscience datasets by the ESIP Science on Schema community.

Datasets#

Documents will include maps, reports, guidance and other creative works. For this reason, OIH starts from a generic schema.org/CreativeWork example and then provides examples for more specific types of creative work, such as the schema.org/Dataset record below.

 1{
 2    "@context": {
 3        "@vocab": "https://schema.org/"
 4    },
 5    "@type": "Dataset",
 6    "@id": "https://example.org/permanentUrlToThisJsonDoc",
 7    "name": "A concise but descriptive name of the dataset",
 8    "description": "An extended, free-text description of what's in the dataset, who created it, and other attributes",
 9    "url": "https://urlToTheDatasetOrLandingPage.org/",
10    "sameAs": [
11        "http://alternativeUrlToTheDatasetOrLandingPage.org"
12    ],
 13    "license": "This work is licensed under a Creative Commons Attribution (CC-BY) 4.0 License",
14    "citation": [
15        "Citation to other work relevant to this dataset",
16        "Citation to other work relevant to this dataset",
17        "Citation to other work relevant to this dataset"
18    ],
19    "version": "2021-04-24T06:34:56.000Z",
20    "keywords": [
21        "Keyword 1",
22        "Keyword 2",
23        "Keyword 3"
24    ],
25    "measurementTechnique": "The URL to or text about the methods, technique or technology used to generate this Dataset",
26    "variableMeasured": [
27        {
28            "@type": "PropertyValue",
29            "name": "Name of a variable in the dataset",
30            "description": "Extended description of this variable"
31        },
32        {
33            "@type": "PropertyValue",
34            "name": "Name of a variable in the dataset",
35            "url": "http://ontology.org/uriToSemanticDescriptorOfThisVariable",
 36            "description": "Extended description of this variable"
37        },
38        {
39            "@type": "PropertyValue",
40            "name": "SamplingDeviceApertureSurfaceArea",
41            "url": "http://ontology.org/uriToSemanticDescriptorOfThisVariable",
42            "description": "Extended description of this variable"
43        }
44    ],
45    "includedInDataCatalog": {
46        "@id": "https://registryOfCatalogs.org/permanentUrlIdentifiyingCatalog",
47        "@type": "DataCatalog",
48        "url": "https://urlOfDataCatalog.org"
49    },
50    "temporalCoverage": "2007/2007",
51    "distribution": {
52        "@type": "DataDownload",
53        "contentUrl": "http://urlToDirectDownloadOfThisDataset.org/",
54        "encodingFormat": "text/csv"
55    },
56    "spatialCoverage": {
57        "@type": "Place",
58        "geo": {
59            "@type": "GeoShape",
60            "description": "schema.org expects lat long (Y X) coordinate order",
61            "polygon": "10.161667 142.014,18.033833 142.014,18.033833 147.997833,10.161667 147.997833,10.161667 142.014"
62        },
63        "additionalProperty": {
64            "@type": "PropertyValue",
65            "propertyID": "https://dbpedia.org/page/Spatial_reference_system",
66            "value": "https://www.w3.org/2003/01/geo/wgs84_pos"
67        }
68    },
69    "provider": [
70        {
71            "@type": "Organization",
72            "legalName": "Legal Name of Organisation which generated the dataset",
73            "name": "Other Name of Organisation which generated the dataset",
74            "url": "https://organisationWebsite.org/"
75        }
76    ],
77    "subjectOf": {
78        "@type": "Event",
79        "description": "Describe the event which is the subject of this dataset. For example, a cruise ID.",
80        "name": "Concise and descriptive name of the Event",
81        "potentialAction": {
82            "@type": "Action",
83            "name": "Concise but descriptive name of action that was part of an Event. For example, the name of a CTD cast",
84            "agent": [
85                "Name or permanent ID of person or thing that performed this action",
86                "Name or permanent ID of person or thing that performed this action",
87                "Name or permanent ID of person or thing that performed this action"
88            ],
 89            "startTime": "2007-03-11T14:45:00Z",
 90            "endTime": "2007-03-11T15:42:00Z",
91            "instrument": {
92                "@type": "Thing",
93                "name": "The name of the instrument used in the action. For example, the specific model of a CTD, a glider, a moored sensor",
94                "url": "http://ontology.org/uriToSemanticDescriptorOfThisInstrument",
95                "description": "Extended description of the sampling instrument"
96            }
97        }
98    }
99}
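
Before publishing a record like the one above, it can help to check it programmatically. The following is a minimal sketch, not an official OIH or schema.org validator: the `EXPECTED` list is an assumption, hand-picked from the properties used in the example, and `missing_properties` is a hypothetical helper name.

```python
import json

# Assumption: a hand-picked subset of the example's properties,
# not a normative list of required fields.
EXPECTED = ["@id", "@type", "name", "description", "url", "license"]


def missing_properties(record, expected=EXPECTED):
    """Return the expected top-level properties that are absent or empty."""
    return [key for key in expected if not record.get(key)]


# A trimmed copy of the example record, with two properties left out.
raw = """
{
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "Dataset",
    "@id": "https://example.org/permanentUrlToThisJsonDoc",
    "name": "A concise but descriptive name of the dataset",
    "url": "https://urlToTheDatasetOrLandingPage.org/"
}
"""
record = json.loads(raw)
print(missing_properties(record))  # ['description', 'license']
```

A check like this only confirms that keys are present; it says nothing about whether the values are well formed.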

Tip

The @id on line 6 should point to whatever eventually resolves to the JSON-LD. If you only have an external JSON-LD file (not embedded in an HTML <script> tag), the @id should point to the .json file itself. Otherwise, the @id should point to the record's landing page (the HTML page) that embeds the JSON-LD.
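
The rule in this tip can be sketched in Python. This is only an illustration, not OIH tooling: the helper names `choose_id` and `embed_in_html`, and the example.org URLs, are hypothetical.

```python
import json


def choose_id(landing_page_url=None, json_url=None):
    """Pick the @id: prefer the landing page that embeds the JSON-LD,
    otherwise fall back to the standalone .json file."""
    if landing_page_url:
        return landing_page_url
    if json_url:
        return json_url
    raise ValueError("need a landing page URL or a JSON-LD file URL")


def embed_in_html(record):
    """Wrap a JSON-LD record in the script tag used to embed it in HTML."""
    body = json.dumps(record, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


record = {
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "Dataset",
    # Hypothetical landing page that will carry the embedded JSON-LD.
    "@id": choose_id(landing_page_url="https://example.org/dataset/123"),
    "name": "A concise but descriptive name of the dataset",
}
snippet = embed_in_html(record)
print(snippet)
```

If the record is published only as a standalone file, calling `choose_id(json_url=...)` with the file's URL gives the other branch of the tip.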

Note

schema.org expects lat long (Y X) coordinate order, so keep this in mind when defining the polygon or box of the GeoShape in your spatialCoverage.
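
Many GIS tools and GeoJSON store coordinates as (lon, lat), so the order usually has to be swapped when writing a GeoShape. The following is a minimal sketch under that assumption; `to_schema_polygon` is a hypothetical helper, and the comma-separated output simply follows the format of the polygon in the example record.

```python
def to_schema_polygon(lon_lat_points):
    """Convert (lon, lat) pairs into a GeoShape polygon string
    written in "lat lon" order."""
    points = list(lon_lat_points)
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    # schema.org polygons are closed: first and last point are identical.
    if points[0] != points[-1]:
        points.append(points[0])
    return ",".join(f"{lat} {lon}" for lon, lat in points)


# Corners of the example record's polygon, given here as (lon, lat).
corners = [
    (142.014, 10.161667),
    (142.014, 18.033833),
    (147.997833, 18.033833),
    (147.997833, 10.161667),
]
polygon = to_schema_polygon(corners)
print(polygon)
```

The printed string matches the `polygon` value used in the Dataset example.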

A bounding box is recommended for spatialCoverage because it is easy to query and display downstream. For example:

    "spatialCoverage": {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            "description": "schema.org expects lat long (Y X) coordinate order.  Box syntax is: miny minx maxy maxx",
            "box": "-90 -180 90 180"
        },
        "additionalProperty": {
            "@type": "PropertyValue",
            "propertyID": "https://dbpedia.org/page/Spatial_reference_system",
            "value": "https://www.w3.org/2003/01/geo/wgs84_pos"
        }
    },
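
A box string in the "miny minx maxy maxx" syntax can be derived from a set of points. The following is a rough sketch, assuming the input is a list of (lon, lat) pairs and that the data does not cross the antimeridian; `to_schema_box` is a hypothetical helper.

```python
def to_schema_box(lon_lat_points):
    """Return a GeoShape box string, "miny minx maxy maxx",
    that is: south west north east, in lat long order."""
    points = list(lon_lat_points)
    if not points:
        raise ValueError("need at least one point")
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    # Simple min/max; this is wrong for data spanning the antimeridian.
    return f"{min(lats)} {min(lons)} {max(lats)} {max(lons)}"


# Hypothetical sample points, given as (lon, lat).
samples = [(142.014, 10.161667), (147.997833, 18.033833), (145.5, 12.0)]
box = to_schema_box(samples)
print(box)  # 10.161667 142.014 18.033833 147.997833
```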

Demo area please ignore#

This area is being used to test out a new repository structure where the data graphs, frames and SHACL shapes are kept in a discrete location.

import contextlib
import json
import os
import sys
import urllib.request
import warnings

from pyld import jsonld

# Silence deprecation warnings for the rest of this cell.
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Make the repository's local helper package importable.
currentdir = os.path.dirname(os.path.abspath(''))
parentdir = os.path.dirname(currentdir)
sys.path.insert(0, parentdir)
from lib import jbutils

# Load the data graph and the frame from the odis-in repository.
url = "https://raw.githubusercontent.com/iodepo/odis-in/master/dataGraphs/thematics/docs/graphs/map.json"
with urllib.request.urlopen(url) as dgraph:
    doc = json.load(dgraph)

furl = "https://raw.githubusercontent.com/iodepo/odis-in/master/frames/mapFrameID.json"
with urllib.request.urlopen(furl) as fgraph:
    frame = json.load(fgraph)

context = {
    "@vocab": "https://schema.org/",
}

# Compact against the schema.org vocabulary, then apply the frame.
compacted = jsonld.compact(doc, context)
framed = jsonld.frame(compacted, frame)
print(json.dumps(framed, indent=4))

# Keep stderr noise out of the page while the graph is rendered.
with open(os.devnull, "w") as devnull, contextlib.redirect_stderr(devnull):
    jbutils.show_graph(framed)

{
    "@context": {
        "@vocab": "https://schema.org/"
    },
    "@id": "https://example.org/permanentUrlToThisJsonDoc",
    "@type": "Map",
    "identifier": {
        "@id": "https://doi.org/10.5066/F7VX0DMQ",
        "@type": "PropertyValue",
        "propertyID": "https://registry.identifiers.org/registry/doi",
        "url": "https://doi.org/10.5066/F7VX0DMQ",
        "value": "doi:10.5066/F7VX0DMQ"
    }
}
[Figure: graph visualization of the framed result]