---
jupytext:
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  language: python
  name: python3
execution:
  allow_errors: true
---


# OIH SPARQL

## About

This page holds information about the SPARQL queries we use and how they connect
with the profile guidance in this document. It shows how the queries relate to and
depend on the Gleaner provenance (prov) triples as well as the Authoritative Reference
elements of the patterns. The Gleaner prov is expected to be present, though this can
be made optional in case other indexing systems are used that do not provide this
prov shape. The SPARQL looks for both the Gleaner prov and the Authoritative
Reference elements.

This differs between patterns. For example, it might relate to the publisher and
provider elements for CreativeWorks, but to the identity element for People and
Organizations.


```{literalinclude} ./graphs/basic.rq
:linenos:
:emphasize-lines: 12-14, 18-21, 23-28, 30-32
```

## Lines 12-14

Note that the above SPARQL is not standards compliant. It leverages some
vendor-specific syntax that is not part of the SPARQL standard. This is not uncommon,
as groups often add their own syntax to offer additional functionality.

A common extension, seen here, is a full-text index that allows more complex and
faster searches than can be done with FILTER regex. These three lines will only work
in the current OIH triplestore (Blazegraph). Other triplestores, such as Jena, provide
similar built-in extensions.
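
For comparison, a portable fallback can be written with FILTER regex alone. The sketch
below is not part of the OIH query; the function name `regex_search_query`, the
`schema:name` property, and the `https://schema.org/` prefix are assumptions chosen for
illustration. It only builds the query text, so it runs without a triplestore.

```python
def regex_search_query(term: str, limit: int = 10) -> str:
    """Build a standards-compliant SPARQL query that matches `term` with FILTER regex."""
    # Escape backslashes and double quotes so the term stays a valid SPARQL string literal.
    # Note: regex metacharacters in `term` are still interpreted as a pattern.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX schema: <https://schema.org/>
SELECT DISTINCT ?s ?name
WHERE {{
  ?s schema:name ?name .
  FILTER regex(str(?name), "{escaped}", "i")
}}
LIMIT {limit}
""".strip()


print(regex_search_query("coral"))
```

A query like this should work on any SPARQL 1.1 endpoint, but it scans every candidate
literal and returns no relevance score, which is why a full-text index is attractive.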

## Lines 18-21

These lines demonstrate the use of the OPTIONAL keyword. The triples matched here are
not required to be present in a resource. If they are present, their values are
returned; if not, the resource is still returned with those variables left unbound.
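
The behaviour of OPTIONAL resembles a left outer join. The following is a minimal
sketch of that idea in plain Python, not how a triplestore implements it; the function
`optional_join` and the sample rows are hypothetical.

```python
from collections import defaultdict


def optional_join(required, optional, key):
    """Keep every required row, extending it with optional bindings when they exist."""
    matches = defaultdict(list)
    for row in optional:
        matches[row[key]].append(row)

    results = []
    for row in required:
        extras = matches.get(row[key])
        if extras:
            # One result per matching optional row, as SPARQL would produce.
            results.extend({**row, **extra} for extra in extras)
        else:
            # No match: the row survives with the optional variable unbound.
            results.append(dict(row))
    return results


required = [
    {"s": "ex:doc1", "name": "Reef survey"},
    {"s": "ex:doc2", "name": "Tide tables"},
]
optional = [{"s": "ex:doc1", "description": "Annual coral reef survey"}]

for row in optional_join(required, optional, "s"):
    print(row["s"], row.get("description"))
```

Here `ex:doc2` has no description, yet it still appears in the results, which is the
difference between an OPTIONAL pattern and a required one.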

## Lines 23-28

These lines are standard SPARQL, but they search across triples that do not come from
the provider graphs. Instead, they look at triples generated by Gleaner, the indexing
program used by OIH.

Note that Gleaner is not a dependency of this project; other indexing approaches and
software could be used. As pointed out in the documentation, this approach is based on
structured data on the web and web architecture approaches, so any indexing system
following those approaches can be used.

These triples are used to track the indexing event and the sources indexed. They
provide some additional provenance for the information collected, but do not change or
even extend what the providers are publishing.

As such, these statements could be removed, and all that would be lost is the indexing
activity information.

## Lines 30-32

These lines represent three SPARQL solution modifiers.

First is the ORDER BY directive. This is used to order the results by one of the
returned variables. In this case we are using the ?score variable, which comes from
the vendor-specific syntax noted in lines 12-14. This score is the ranking score
for a resource search against the full-text index. However, this could be any
variable coming from standards-compliant SPARQL calls too. Sorting can be done
on alphanumeric values in ascending (ASC) or descending (DESC) order.

LIMIT is used to limit the number of results returned. We follow this with OFFSET,
which is used to skip the first n results. These two are useful for pagination when
combined with the ORDER BY directive.
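
As an illustration of pagination, the sketch below appends the three modifiers to a
query body for a given page. The helper `paged_query`, its parameters, and the sample
query body are hypothetical and only build the query text.

```python
def paged_query(body: str, page: int, page_size: int = 20,
                order_var: str = "score", descending: bool = True) -> str:
    """Append ORDER BY, LIMIT and OFFSET to `body` for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    direction = "DESC" if descending else "ASC"
    # Skip the results that belong to earlier pages.
    offset = (page - 1) * page_size
    return (
        f"{body.rstrip()}\n"
        f"ORDER BY {direction}(?{order_var})\n"
        f"LIMIT {page_size}\n"
        f"OFFSET {offset}"
    )


body = "SELECT ?s ?name WHERE { ?s <https://schema.org/name> ?name }"
print(paged_query(body, page=3, page_size=10, order_var="name", descending=False))
```

Without ORDER BY the result order is not guaranteed, so the same LIMIT and OFFSET
values could return different rows between requests.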