
Ontology Stack — Runtime & Tooling

This page is the runtime/tooling companion to Ontology Foundations. Where the foundations page covers the theory (BFO, UFO, gUFO, OntoUML, Common Core Ontologies), this one covers the boxes and arrows: which triple store actually runs, which reasoner answers the query, which rule engine fires in the browser, which validator gates the ingestion path. Every choice is grounded in an ADR or a shipped image manifest in this repo — when a claim is load-bearing, the link goes to the source, not a marketing page.

The headline: we run Apache Jena Fuseki as the SPARQL endpoint, Oxigraph embedded inside sandboxed runners, N3.js in the browser/agent for edge inference, and SHACL as the runtime contract on every ingestion path. The visualization layer is React Flow (ARC-ADR-040). The self-model uses this exact stack as a living instance (ARC-ADR-072).


1. Stack at a glance

Layer | Our choice | Alternative considered | Why ours won
Triple store (server) | Apache Jena Fuseki 5.x (TDB2) | Blazegraph, GraphDB, Stardog, Virtuoso | ASF-governed, OWL 2 + SPARQL 1.1 + SHACL out of the box, container-friendly, sieve manifest contract (template)
Triple store (embedded) | Oxigraph (Rust, pyoxigraph / oxigraph crate) | rdflib pure-Python, RDF4J embedded | Rust → tiny binary, ships inside the runner image (PR #522), SPARQL 1.1 Query + Update + GSP, no JVM in the sandbox
SPARQL endpoint | Fuseki /knowledge dataset + Oxigraph /sparql in runner | embedded-only, no HTTP | Federation + multi-tenant via Fuseki; per-sandbox isolated query via Oxigraph
OWL reasoner | ELK for EL profile, HermiT/Openllet for full DL, OWL-RL as a streaming fallback | RacerPro, FaCT++ | ELK is ~100× faster on the OBO-style EL ontologies we author; HermiT for tableau-complete DL; OWL-RL gives forward-chaining over arbitrary triples
Rules engine (server) | Jena Rules (GenericRuleReasoner) + SHACL-AF rules | Drools, Stardog ICV | In-process with Fuseki, no extra service; SHACL-AF unifies shape + rule under one W3C contract
Rules engine (edge) | N3.js + eyereasoner | Send every inference back to Fuseki | Browser/agent can fire production rules locally (ADR-040, ADR-072); the disambiguator (ADR-046) avoids the round-trip
SHACL validator | Apache Jena SHACL (server) + shacl-engine / rdf-validate-shacl (JS) | TopBraid SHACL API (commercial) | Pure ASF Java on the server matches the Fuseki stack; rdf-validate-shacl mirrors it in the agent
Shape / schema authoring | Hand-authored Turtle + the F# ontology compiler (ARC-ADR-033) | TopBraid Composer (GUI), Stardog Studio | We treat shapes as source; the F# compiler lets us prove projections are functors and generators are catamorphisms
Ontology IDE | Protégé 5.6 (occasional), VS Code + ttl/oxigraph extensions (daily) | TopBraid Composer, WebProtégé | Protégé is the OBO de-facto standard but desktop-only; daily editing is in VS Code beside the F# compiler
ETL / ingestion to ontology | dlt sources → R2RML / RML mappings → SHACL gate → Fuseki | Custom Python, Kettle, Airbyte | dlt gives schema-on-read source connectors; RML/R2RML are the W3C mapping standards; SHACL is the runtime contract (ARC-ADR-030)
Visualization | React Flow + dagre/elk.js layout | Cytoscape.js, vis.js, D3 force | ARC-ADR-040: typed nodes/edges, headless layout, React-native

The rest of this page walks each row.


2. Apache Jena Fuseki — why a real SPARQL endpoint

Apache Jena is the ASF-stewarded Java RDF stack. Fuseki is its HTTP server: a SPARQL 1.1 Query + Update + Graph Store Protocol endpoint backed by TDB2 (the on-disk triple store) or in-memory datasets.

We could have stayed embedded (Jena library only, or Oxigraph everywhere). We chose to run Fuseki as a first-class platform-tier container (ARC-ADR-023) for three reasons.

2.1 Federation and multi-graph reality

A SPARQL endpoint is a protocol surface, not just a library. Once Fuseki is up, any agent in the fleet (Claude Code, Copilot, GitHub Actions, a self-hosted runner, an Azure Container App) can hit POST /knowledge/sparql with a query body and get results back: variable bindings for SELECT, a boolean for ASK, RDF for CONSTRUCT and DESCRIBE. That single fact dissolves a class of integration glue we'd otherwise hand-write per consumer.

SERVICE keyword federation also becomes free: a query in the self-model dataset can pull context from a separate Fuseki dataset (or any other SPARQL 1.1 endpoint — DBpedia, an OBO mirror, a Wikidata SPARQL endpoint) via SERVICE <https://other/sparql>. The cost of federation collapses to "do you have the URL?".
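As a rough illustration of that protocol surface, the sketch below builds (but does not send) a SPARQL 1.1 Protocol POST carrying a federated query. The host http://fuseki.internal:3030 and the urn:example: class IRI are placeholders invented for the example; only the /knowledge/sparql path and the SERVICE <https://other/sparql> form come from the text above.

```python
import urllib.request

# Hypothetical base URL; substitute the address of your Fuseki container.
FUSEKI_BASE = "http://fuseki.internal:3030"

# A federated query: a local pattern plus a SERVICE block evaluated remotely.
FEDERATED_QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing ?label WHERE {
  ?thing a <urn:example:Component> .
  SERVICE <https://other/sparql> {
    ?thing rdfs:label ?label .
  }
}
"""


def build_sparql_request(query: str, base: str = FUSEKI_BASE) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol 'query via POST directly' request."""
    return urllib.request.Request(
        url=f"{base}/knowledge/sparql",
        data=query.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
    )


request = build_sparql_request(FEDERATED_QUERY)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Any HTTP client works the same way; the point is that the consumer needs a URL and a query string, nothing store-specific.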

Why not GraphDB or Stardog?

Both are excellent commercial-leaning OWL stores. GraphDB has the best built-in reasoner ergonomics; Stardog has the strongest ICV story. We avoided them because AgentArmy is a template repo that fleets fork — pinning a closed-source license into the platform tier would make every fork inherit that constraint. ASF licensing keeps the door open.

2.2 The sieve manifest contract

Fuseki ships in this repo as templates/fuseki-ontology-image/ — a deliverable container image with an image.json manifest in the Image Standard. The manifest declares three doctor checks the image must pass to be considered healthy:

  • sieve-accepts-conformant — load a known-good Turtle file via the Graph Store Protocol (PUT /knowledge/data?graph=…) and confirm the SHACL validator returns sh:conforms true.
  • sieve-rejects-violating — load a known-bad Turtle file and confirm the validator returns sh:conforms false with the expected sh:resultPath.
  • construct-emits — fire a known CONSTRUCT query and confirm the inferred triples appear in the result set.

Those three checks are the runtime contract for "this Fuseki is actually a sieve, not just a triple bucket." agentarmy-doctor image fuseki-ontology runs them on every CI build and every local rebuild. The image is not published until all three are green.
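The pass/fail logic of the three checks can be sketched in a few lines. This is an illustrative stand-in, not the agentarmy-doctor implementation: ValidationReport and the function names are hypothetical, and a real check would first parse the SHACL report and CONSTRUCT result returned by Fuseki.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class ValidationReport:
    """Minimal stand-in for a parsed SHACL validation report (hypothetical type)."""
    conforms: bool
    result_paths: FrozenSet[str] = frozenset()


def sieve_accepts_conformant(report: ValidationReport) -> bool:
    # Known-good data must validate cleanly.
    return report.conforms


def sieve_rejects_violating(report: ValidationReport, expected_path: str) -> bool:
    # Known-bad data must fail, and fail on the path we expect.
    return (not report.conforms) and expected_path in report.result_paths


def construct_emits(result: Set[Triple], expected: Set[Triple]) -> bool:
    # Every expected inferred triple must appear in the CONSTRUCT result.
    return expected <= result


REQUIRED_CHECKS = {"sieve-accepts-conformant", "sieve-rejects-violating", "construct-emits"}


def image_is_publishable(outcomes: Dict[str, bool]) -> bool:
    # The image ships only if all three checks ran and all are green.
    return REQUIRED_CHECKS.issubset(outcomes) and all(outcomes[n] for n in REQUIRED_CHECKS)


outcomes = {
    "sieve-accepts-conformant": sieve_accepts_conformant(ValidationReport(True)),
    "sieve-rejects-violating": sieve_rejects_violating(
        ValidationReport(False, frozenset({":birthDate"})), ":birthDate"
    ),
    "construct-emits": construct_emits({(":a", ":q", ":b")}, {(":a", ":q", ":b")}),
}
```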

2.3 CORS, GSP, and the /knowledge vs admin split

Fuseki's default shiro.ini locks down the /$/ admin endpoints behind basic auth — that's why direct PUT /$/datasets/... returns 403 without credentials. The /knowledge data endpoints are deliberately open inside the fleet's private network so any agent can call GSP (GET/PUT/POST/DELETE /knowledge/data) and SPARQL (POST /knowledge/sparql, POST /knowledge/update) without juggling tokens.

CORS is configured in fuseki-config.ttl so that the data endpoints answer with Access-Control-Allow-Origin: *; the admin endpoints keep the default same-origin policy. That posture matches our threat model: fleet repos are private and trusted (memory: threat-model-no-forks), so we'd rather have ergonomic edge-agent inference than per-call token plumbing.

The Graph Store Protocol is the under-appreciated part. GSP says: a named graph is a resource at a URL, and you load it with HTTP verbs. That maps cleanly to our dlt → SHACL → Fuseki pipeline: each dlt source becomes a named graph, ingestion is PUT /knowledge/data?graph=urn:source:foo, and the SHACL validator runs on the graph before it's committed to TDB2.
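A minimal sketch of that landing call, assuming a hypothetical host (http://fuseki.internal:3030) and a one-triple placeholder payload. It only builds the request; the SHACL step that runs before the graph is committed is not shown here.

```python
import urllib.parse
import urllib.request

FUSEKI_BASE = "http://fuseki.internal:3030"  # hypothetical host


def build_gsp_put(graph_iri: str, turtle: str, base: str = FUSEKI_BASE) -> urllib.request.Request:
    """Build a Graph Store Protocol PUT that replaces one named graph."""
    # The graph IRI travels as a percent-encoded ?graph= query parameter.
    query = urllib.parse.urlencode({"graph": graph_iri})
    return urllib.request.Request(
        url=f"{base}/knowledge/data?{query}",
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle; charset=utf-8"},
    )


request = build_gsp_put(
    "urn:source:foo",
    "<urn:example:s> <urn:example:p> <urn:example:o> .",
)
```

PUT replaces the whole named graph, which is what makes re-running a source idempotent; POST to the same URL would merge into the existing graph instead.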


3. Oxigraph — embedded RDF in the sandbox

Oxigraph is an RDF store written in Rust, with a SPARQL 1.1 Query + Update + GSP implementation, Python bindings (pyoxigraph), and a standalone oxigraph_server binary. It is the embedded counterpart to Fuseki.

Per PR #522 ("embed oxigraph_server inside runner image & add guest agent SPARQL query endpoint"), every sandboxed runner ships with oxigraph_server already listening on 127.0.0.1:7878. The guest agent has its own private SPARQL endpoint that nobody outside the sandbox can reach.

When to use Oxigraph vs Fuseki

Use Oxigraph when | Use Fuseki when
The triples are sandbox-private (per-run scratch state) | The triples are fleet-shared (self-model, contracts registry)
You want zero JVM in the runner image | You need TDB2 durability and the admin UI
The query is over a small graph (< 10M triples) | You need federation across datasets, or SERVICE
You want the agent to mutate the store and discard it on exit | You need the store to survive container restarts
You're prototyping a rule set locally | You're publishing a shared ontology to consumers

The two stores are not redundant — they sit at different sandbox boundaries. The self-model is in Fuseki because every agent needs to query it. A scratch graph that a disambiguator builds for one inference run is in Oxigraph because nobody else needs it and the agent can throw it away.

Both speak SPARQL 1.1, so the query syntax is identical. The runtime decides which endpoint to hit based on whether the query is over shared state or sandbox state.
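The routing decision can be as small as a lookup keyed on scope. In this sketch the Scope enum, the Fuseki host name, and the exact URL paths are assumptions for illustration; only the 127.0.0.1:7878 address and the shared/sandbox split come from this page.

```python
from enum import Enum


class Scope(Enum):
    SHARED = "shared"    # fleet-wide state, e.g. the self-model
    SANDBOX = "sandbox"  # per-run scratch state inside one runner


# Endpoint URLs are illustrative; adjust hosts and paths to your deployment.
ENDPOINTS = {
    Scope.SHARED: "http://fuseki.internal:3030/knowledge/sparql",
    Scope.SANDBOX: "http://127.0.0.1:7878/sparql",
}


def endpoint_for(scope: Scope) -> str:
    """Return the SPARQL endpoint that owns data of the given scope."""
    return ENDPOINTS[scope]
```

Because the query text is the same SPARQL 1.1 either way, the caller only swaps the URL.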

Why not skip Fuseki entirely and run Oxigraph server everywhere?

We considered it. The deal-breaker is the Apache Jena ecosystem: SHACL-AF rules, the Jena Rules language, the text:query full-text extension, and the geosparql: spatial extension are all Java-only and live inside Fuseki's process. Oxigraph's SHACL support is improving but isn't at parity yet. So: Oxigraph for the sandbox, Fuseki for the shared knowledge layer.


4. N3.js — Notation3 rules at the edge

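N3.js is the RDF.js library for parsing and writing Turtle-family syntaxes, including Notation3, and eyereasoner (the eye-js project) packages the EYE reasoner for JavaScript runtimes via WebAssembly. Together they let us run Notation3 rules inside the browser, inside a Node agent, or inside any other JavaScript runtime with WebAssembly support.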

This matters because of two ADRs:

  • ARC-ADR-046 — the disambiguator streaming service fires rules as tokens arrive. Round-tripping every candidate to a server SPARQL endpoint would add 50–200ms per token. N3.js + eye-js evaluate the rule set in-process; the latency budget stays in the single-digit milliseconds.
  • ARC-ADR-072 — the self-model viewer in ontology/platform-self-model/viz/ runs a small N3 rule set client-side to derive secondary facts (e.g., "a surface that exposes a capability that is realized-by a component running in tier X is itself a tier-X-adjacent surface"). Pushing that derivation to the browser lets the viewer render the right colours without a server roundtrip.

N3 is a superset of Turtle that adds rules ({ ?s :p ?o } => { ?s :q ?o }.), graph literals (treating a graph as a term), and quantification. It's the rule language EYE was built for; eyereasoner is the production-grade implementation.

Why N3 over SPARQL CONSTRUCT for edge inference

SPARQL CONSTRUCT queries can also derive new triples. The reason we reach for N3 at the edge is iteration: N3 rules naturally fire to a fixed point (forward-chain until no new triples are produced). To get the same behaviour from SPARQL you have to wrap CONSTRUCT in an external loop and detect quiescence yourself. N3 reasoners do it for you.
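To make the "external loop" concrete, here is a minimal sketch of what you would have to write around a CONSTRUCT-style rule to reach a fixed point. It uses plain Python tuples rather than a real SPARQL engine, and the :dependsOn transitivity rule is an invented example, not one of our production rules.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]


def depends_on_transitive(graph: Set[Triple]) -> Iterable[Triple]:
    """One CONSTRUCT-like pass: chain two :dependsOn edges into one."""
    for (a, p1, b) in graph:
        if p1 != ":dependsOn":
            continue
        for (b2, p2, c) in graph:
            if p2 == ":dependsOn" and b2 == b:
                yield (a, ":dependsOn", c)


def run_to_fixed_point(graph: Set[Triple], rules: Iterable[Rule]) -> Set[Triple]:
    """Re-apply every rule until a pass produces no new triples (quiescence)."""
    closed = set(graph)
    rule_list = list(rules)
    while True:
        derived = {t for rule in rule_list for t in rule(closed)}
        new = derived - closed
        if not new:
            return closed
        closed |= new


chain = {(":a", ":dependsOn", ":b"), (":b", ":dependsOn", ":c"), (":c", ":dependsOn", ":d")}
closure = run_to_fixed_point(chain, [depends_on_transitive])
```

On the three-edge chain the loop needs two productive passes and a third to confirm nothing changed; an N3 reasoner performs the same bookkeeping internally.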

That said: server-side, we use CONSTRUCT for one-shot derivations and Jena Rules for the fixed-point work, because they live in the same JVM as Fuseki and have direct access to TDB2 indexes. The choice is "edge: N3, server: Jena Rules / SHACL-AF" — same paradigm, two runtimes.


5. Reasoners considered

OWL reasoning is a spectrum. Picking one reasoner for everything is the wrong shape — ontologies span OWL 2 profiles (EL, QL, RL) and full DL, and each profile has a class of reasoner that's dramatically faster than the general-purpose ones.

5.1 ELK — the EL profile workhorse

ELK is a Java reasoner for the OWL 2 EL profile: subsumption, classification, and instance checking only, but in polynomial time and with brutally parallelizable algorithms. The Gene Ontology, SNOMED CT, and most OBO Foundry ontologies are deliberately authored within EL so ELK can classify them in seconds.

We use ELK for: the BFO/CCO interop projection (ADR linked in ontology-foundations), any OBO-style ontology we ingest, and the self-model's structural classification. The trade-off is that ELK only covers EL: universal restrictions (owl:allValuesFrom), cardinality restrictions, class unions and complements, and inverse properties all fall outside the profile, so ELK cannot reason with axioms that use them.

5.2 HermiT and Openllet — full OWL 2 DL

HermiT (Glimm et al., 2014) is a hypertableau reasoner for OWL 2 DL, sound and complete for the full language. We use it when an ontology uses cardinality restrictions, complex class expressions, or other constructs outside EL that ELK can't handle.

Openllet is the actively maintained fork of Pellet (originally from Clark & Parsia). Same OWL 2 DL coverage as HermiT, different algorithm (tableau with optimizations). We keep both in the runner image because one occasionally outperforms the other on specific ontology shapes; the runtime picks per workload.

Why not RacerPro or FaCT++?

RacerPro is commercial and not freely redistributable in our images. FaCT++ is C++, excellent on the right shape, but its JNI integration with Jena has bit-rotted; we'd rather pay the small perf gap to stay on a pure-JVM reasoner.

5.3 OWL-RL — rule-based fallback

The OWL 2 RL profile is designed to be implementable by a rule engine over RDF triples — forward-chaining production rules can materialize all OWL-RL entailments. We use Jena's GenericRuleReasoner configured with the OWL-RL rule set as a streaming fallback on the ingestion path: triples come in via dlt, OWL-RL closure is computed forward, SHACL validates the result, and the closed graph lands in Fuseki.

This is the cheapest OWL inference we run, and the only one that works incrementally over a firehose. The trade-off is that OWL 2 RL is a strict syntactic subset of OWL 2 DL: an ontology that uses features outside RL (e.g., owl:someValuesFrom on the superclass side of a subclass axiom) gets incomplete inference. We document per ontology which profile it targets and which reasoner it expects.
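The incremental behaviour can be sketched with a worklist: each arriving triple is joined only against what is already stored, and its consequences are queued in turn. The sketch below is plain Python, not Jena's GenericRuleReasoner, and it implements just two rules from the OWL 2 RL rule set (cax-sco and scm-sco); the class name IncrementalClosure is hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


class IncrementalClosure:
    """Worklist materialization of two OWL 2 RL rules (cax-sco, scm-sco)."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, triple: Triple) -> List[Triple]:
        """Insert one triple; return every triple newly stored as a result."""
        added: List[Triple] = []
        queue = deque([triple])
        while queue:
            t = queue.popleft()
            if t in self.triples:
                continue  # already known, nothing new to derive
            self.triples.add(t)
            added.append(t)
            queue.extend(self._consequences(t))
        return added

    def _consequences(self, t: Triple) -> List[Triple]:
        """Join the new triple against the store under each rule."""
        s, p, o = t
        out: List[Triple] = []
        for (s2, p2, o2) in self.triples:
            if p == SUBCLASS and p2 == TYPE and o2 == s:
                out.append((s2, TYPE, o))          # cax-sco, new subclass axiom
            if p == TYPE and p2 == SUBCLASS and s2 == o:
                out.append((s, TYPE, o2))          # cax-sco, new instance
            if p == SUBCLASS and p2 == SUBCLASS:
                if s2 == o:
                    out.append((s, SUBCLASS, o2))  # scm-sco, extend upward
                if o2 == s:
                    out.append((s2, SUBCLASS, o))  # scm-sco, extend downward
        return out


closure = IncrementalClosure()
closure.add((":Dog", SUBCLASS, ":Mammal"))
second = closure.add((":Mammal", SUBCLASS, ":Animal"))
third = closure.add((":rex", TYPE, ":Dog"))
```

Each add touches only the consequences of the new triple, which is why this style suits a stream; a tableau reasoner would have to re-check the whole ontology.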

5.4 When we run which

Workload | Reasoner | Why
OBO Foundry ingestion (Gene Ontology, ChEBI, etc.) | ELK | EL profile, scales to millions of classes
BFO/CCO interop projection | ELK | EL classification of the upper ontology
Self-model classification | ELK + N3.js (edge derivation) | EL on the server, N3 in the viewer
Ad-hoc DL question ("does this ontology entail X?") | HermiT or Openllet | Full DL completeness
dlt firehose ingestion | OWL-RL (Jena Rules) | Streaming, incremental
Disambiguator streaming | N3.js + eye-js | Single-digit ms per token, browser/agent

6. SHACL — the runtime validation contract

SHACL (Shapes Constraint Language, W3C Rec 2017) is the contract language for RDF. A SHACL shape says "a node of type :Person must have exactly one :birthDate of datatype xsd:date"; a SHACL validator answers "does this graph conform?" with a precise report (sh:conforms, sh:result, sh:resultPath, sh:value).
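To show the shape of a validation report without pulling in an engine, the sketch below hand-checks the ":Person must have exactly one :birthDate of datatype xsd:date" constraint over an in-memory mapping. It is not a SHACL implementation (the runtime uses Jena SHACL); the Result type and check_birth_date are hypothetical names, and the fields only mirror sh:focusNode, sh:resultPath and sh:conforms.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Literal = Tuple[str, str]  # (lexical form, datatype IRI)
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Result:
    focus_node: str   # mirrors sh:focusNode
    result_path: str  # mirrors sh:resultPath
    message: str


def check_birth_date(people: Dict[str, List[Literal]]) -> Tuple[bool, List[Result]]:
    """Check 'exactly one :birthDate typed xsd:date' for each :Person node."""
    results: List[Result] = []
    for node, values in people.items():
        if len(values) != 1:  # min and max count of 1
            results.append(Result(node, ":birthDate", f"expected 1 value, found {len(values)}"))
        for _lexical, datatype in values:  # datatype constraint
            if datatype != XSD_DATE:
                results.append(Result(node, ":birthDate", f"unexpected datatype {datatype}"))
    return (not results, results)


ok, ok_results = check_birth_date({":alice": [("1990-04-01", XSD_DATE)]})
bad, bad_results = check_birth_date({":bob": [], ":carol": [("yesterday", XSD_STRING)]})
```

The useful property is the same as in real SHACL: a failure says which node and which path broke, not just that something did.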

We use SHACL three ways:

  1. Ingestion gate: every dlt source has a SHACL shapes graph. The ingestion pipeline loads source data into a candidate named graph, runs SHACL Core + SHACL-AF, and only promotes the graph to the shared Fuseki dataset if sh:conforms true. (ARC-ADR-030 is the anchor.)
  2. Contract registration: when a new ontology is registered in docs/contracts.md, its SHACL shapes graph is part of the contract. Consumers can fetch the shapes and validate locally before sending data.
  3. SHACL-AF rules: SHACL Advanced Features adds sh:rule (triple/SPARQL rules) and custom targets. We use SHACL-AF for derivations that are contractual: "if a node has property X, the validator MUST add property Y before reporting conformance." This collapses the shape-vs-rule split into a single W3C spec, which is exactly what we want for a contract surface.

Why SHACL over ShEx

ShEx is the other shape language; Wikidata uses it heavily. We picked SHACL because:

  • It has first-class W3C Recommendation status; ShEx has a Community Group spec only.
  • SHACL-AF unifies shape and rule under one runtime; ShEx + rules requires a separate engine.
  • Jena ships a mature SHACL implementation; the Jena ShEx implementation is community-maintained and lags.

We still read ShEx

OBO ontologies occasionally publish ShEx schemas alongside SHACL. We convert at ingestion time using shexer or hand-port the constraints. There's no philosophical objection — we just don't want two validators in the runtime hot path.


7. Authoring tooling

The boring secret of ontology engineering is that most editing happens in a text editor. The fancy GUI tools matter only at specific moments.

7.1 Protégé — the canonical desktop editor

Protégé (Stanford BMIR) is the OBO de-facto standard for authoring OWL ontologies. We use it occasionally for:

  • Visualizing class hierarchies of someone else's ontology before deciding to ingest it.
  • Authoring complex class expressions where the OWL Manchester syntax in a text editor is fiddly (e.g., (hasPart some Wheel) and (hasPart some Engine)).
  • Running HermiT interactively against an in-development ontology to catch DL errors.

We do not use Protégé as the canonical source format. The canonical format is Turtle in git, hand-authored or compiled.

7.2 ROBOT — the OBO release tool

ROBOT (Jackson et al., 2019) is the command-line tool the OBO Foundry uses for ontology release pipelines: extract, merge, reason, convert, validate, diff. It's a Java CLI with a stable interface.

We adopted ROBOT verbatim for any pipeline step that's already a ROBOT recipe — robot reason --reasoner ELK, robot convert --format ofn, robot diff. There's no point reimplementing it; the OBO community has hardened these commands over a decade.
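When a pipeline step shells out to ROBOT, building the argument list in one place keeps the recipe reviewable. The sketch below only constructs argv lists; the helper names and file paths are hypothetical, while the subcommands and flags are the standard ROBOT ones.

```python
from typing import List


def robot_reason(input_path: str, output_path: str, reasoner: str = "ELK") -> List[str]:
    """argv for classifying an ontology and writing the reasoned result."""
    return ["robot", "reason", "--reasoner", reasoner,
            "--input", input_path, "--output", output_path]


def robot_convert(input_path: str, output_path: str, fmt: str = "ofn") -> List[str]:
    """argv for converting an ontology to another serialization."""
    return ["robot", "convert", "--input", input_path,
            "--format", fmt, "--output", output_path]


def robot_diff(left_path: str, right_path: str) -> List[str]:
    """argv for comparing two ontology files."""
    return ["robot", "diff", "--left", left_path, "--right", right_path]


# Hypothetical file names, for illustration only.
reason_cmd = robot_reason("self-model.owl", "self-model.reasoned.owl")
```

A caller would pass the list to subprocess.run(reason_cmd, check=True); keeping it as a list rather than a shell string avoids quoting problems with paths.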

7.3 TopBraid Composer — deliberately avoided

TopBraid Composer is the commercial SHACL/SPARQL/OWL IDE from TopQuadrant. It's excellent. We avoided it because:

  • License cost would have to propagate to every fork of this template.
  • Its SHACL engine has TopQuadrant-specific extensions that wouldn't validate against Jena's pure W3C implementation; ingestion would diverge between authoring and runtime.
  • The contract surface for AgentArmy is open W3C standards, not vendor extensions.

We still read TopBraid's SHACL examples (their docs are some of the best in the ecosystem). We just don't author against the IDE.

7.4 The F# ontology compiler

ARC-ADR-033 is the architectural decision to build an F# ontology compiler as the canonical authoring layer for the fleet's own ontologies (as distinct from third-party ingested ones).

The compiler takes a typed F# source — algebraic data types, computation expressions, category-theoretic combinators — and emits:

  • OWL 2 (Functional and Manchester syntax)
  • Turtle (for SHACL shapes and the runtime store)
  • gUFO-aligned OWL projection (for the UFO authoring discipline, ontology-foundations)
  • BFO/CCO-aligned OWL projection (for the realist interop discipline)
  • TypeScript type definitions (for the frontend-core consumer)
  • C# record types (for backend-core consumers)

This is the generator-first pattern recorded in the project memory: the source is one F# AST; everything else is a deterministic projection. Per category theory for the FP compiler, each projection is a functor, generators are catamorphisms over the AST, the gUFO↔BFO mapping is a natural transformation, and the sift validation loop is monadic.
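The "generators are catamorphisms" idea can be shown on a toy AST. This is a Python stand-in for illustration, not the F# compiler: the Field/Record types, the fold helper, and both emitters are invented here, and the emitted text is far simpler than real output would be.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, TypeVar, Union


@dataclass(frozen=True)
class Field:
    name: str
    datatype: str  # "string" or "integer" in this toy AST


@dataclass(frozen=True)
class Record:
    name: str
    fields: Tuple[Field, ...]


Node = Union[Field, Record]
R = TypeVar("R")


def fold(node: Node,
         on_field: Callable[[Field], R],
         on_record: Callable[[Record, List[R]], R]) -> R:
    """Catamorphism: replace each constructor with a function, bottom-up."""
    if isinstance(node, Field):
        return on_field(node)
    return on_record(node, [fold(f, on_field, on_record) for f in node.fields])


TS_TYPES = {"string": "string", "integer": "number"}
XSD_TYPES = {"string": "xsd:string", "integer": "xsd:integer"}


def to_typescript(node: Node) -> str:
    return fold(
        node,
        lambda f: f"  {f.name}: {TS_TYPES[f.datatype]};",
        lambda r, parts: "interface " + r.name + " {\n" + "\n".join(parts) + "\n}",
    )


def to_shacl_turtle(node: Node) -> str:
    return fold(
        node,
        lambda f: f"  sh:property [ sh:path :{f.name} ; sh:datatype {XSD_TYPES[f.datatype]} ]",
        lambda r, parts: (f":{r.name}Shape a sh:NodeShape ;\n"
                          f"  sh:targetClass :{r.name} ;\n" + " ;\n".join(parts) + " ."),
    )


surface = Record("Surface", (Field("label", "string"), Field("tier", "integer")))
```

Both emitters reuse the same fold and differ only in the two functions they pass in, which is the sense in which every projection is determined by the one source AST.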

The F# compiler lives next to the rest of the ontology runtime — it is not a build-time afterthought, it's a peer of Fuseki and Oxigraph. When you change a source ADT, the compiler re-emits OWL + SHACL + gUFO + BFO, the doctor checks rerun, and the projections either still validate or they don't.


8. Ingestion — RML, R2RML, and dlt

Most data in the world is not RDF. The ingestion path turns non-RDF source data into ontology-conformant triples, validates them with SHACL, and lands them in Fuseki.

8.1 dlt — source connectors

dlt (data load tool) is the Python library we use for source connectors. dlt gives us schema-on-read extraction from REST APIs, databases, S3, GCS, filesystem dumps, SaaS APIs (Stripe, HubSpot, Salesforce, etc.), with retry/backoff/state tracking included.

See docs/dlt-pipelines.md and docs/dlt-sources.md for the inventory of sources and pipelines we run. The dlt output is typed tabular data — DuckDB tables in the staging area — not RDF. The next step lifts it.

8.2 R2RML — relational to RDF

R2RML (W3C Rec 2012) is the standard mapping language for turning relational data into RDF. A rr:TriplesMap says "this SQL query produces rows; each row is a subject of type :Foo; column bar becomes object of :hasBar."
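What a triples map does to each row can be sketched in plain Python. This is not Ontop or an R2RML processor: apply_triples_map and the example.org IRIs are hypothetical, literals are simplified to strings, and the IRI-safe encoding that R2RML applies to template values is omitted.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def apply_triples_map(
    rows: Iterable[Dict[str, Optional[object]]],
    subject_template: str,
    class_iri: str,
    column_predicates: Dict[str, str],
) -> Iterator[Triple]:
    """Emit a type triple plus one triple per mapped, non-NULL column for each row."""
    for row in rows:
        subject = subject_template.format(**row)  # e.g. ".../foo/{id}"
        yield (subject, RDF_TYPE, class_iri)
        for column, predicate in column_predicates.items():
            value = row.get(column)
            if value is not None:  # a NULL column produces no triple
                yield (subject, predicate, str(value))


triples = list(apply_triples_map(
    rows=[{"id": 1, "bar": "x"}, {"id": 2, "bar": None}],
    subject_template="http://example.org/foo/{id}",
    class_iri="http://example.org/Foo",
    column_predicates={"bar": "http://example.org/hasBar"},
))
```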

For sources that land in DuckDB (most of them), R2RML mappings via the Ontop engine are the path. Ontop also supports virtual R2RML — answering SPARQL queries by translating to SQL against the live source, no materialization needed. We use materialization for batch ingestion and virtual for ad-hoc federated queries.

8.3 RML — generalized mapping

RML (rml.io, currently a W3C Community Group specification rather than a Recommendation) generalizes R2RML to non-relational sources: JSON, XML, CSV, MongoDB. An rml:LogicalSource points at, say, a JSON file with a JSONPath iterator expression; the rest of the mapping looks like R2RML.

For dlt sources that don't naturally project to a SQL view (deeply nested JSON, XML feeds), RML is the path. We run RML mappings through RMLMapper in the ingestion container.

8.4 The end-to-end shape

dlt source         R2RML / RML        SHACL gate         Fuseki GSP
─────────────  →   ──────────────  →  ──────────────  →  ──────────────
  JSON/SQL/CSV     Turtle triples      sh:conforms?       PUT /knowledge/
  → DuckDB          (named graph)      true → promote     data?graph=...
                                       false → quarantine

The SHACL gate is the runtime contract from §6. The Fuseki named graph is the GSP target from §2.3. The whole pipeline is the embodiment of ARC-ADR-030.
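The promote-or-quarantine decision in the diagram reduces to a small function once validation and the GSP write are injected as callables. In this sketch gate_and_land is a hypothetical name, the urn:quarantine: graph prefix is an assumption (the page only names urn:source:), and validating in memory before any write is a simplification of the candidate-graph flow.

```python
from typing import Callable, Dict, Tuple


def gate_and_land(
    source: str,
    turtle: str,
    validate: Callable[[str], bool],
    put_graph: Callable[[str, str], None],
) -> Tuple[str, str]:
    """Validate mapped triples, then write them to the promoted or quarantine graph."""
    conforms = validate(turtle)  # stands in for the SHACL gate
    graph = f"urn:source:{source}" if conforms else f"urn:quarantine:{source}"
    put_graph(graph, turtle)     # stands in for a GSP PUT
    return ("promote" if conforms else "quarantine", graph)


# Stub store and validators so the flow can be exercised without Fuseki.
store: Dict[str, str] = {}
good = gate_and_land("foo", "<urn:s> <urn:p> <urn:o> .", lambda t: True, store.__setitem__)
bad = gate_and_land("bar", "<urn:s> <urn:p> 'oops' .", lambda t: False, store.__setitem__)
```

Passing the validator and writer in as parameters keeps the gate testable and leaves the choice of SHACL engine and HTTP client to the caller.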


9. Storage shapes — pure RDF vs property graph vs hybrid

We run both an RDF triple store (Fuseki/Oxigraph) and a property graph (ArcadeDB). This is deliberate, and is the architectural shape memorialized in ARC-ADR-016 (reification + hyperedges) and ARC-ADR-055 (hybrid object query system).

9.1 What RDF gives us

  • Standards: OWL, SHACL, SPARQL, GSP, federation, PROV-O, RML/R2RML — none of which have full property-graph equivalents.
  • Logical semantics: an OWL reasoner can answer "does this graph entail X?" against a formal model theory.
  • Open-world assumption: a missing statement is unknown, not false. This is the right default for an ingestion pipeline that pulls from many sources.

9.2 What property graphs give us

  • Performance on path queries: "find all 4-hop neighbours via edges labelled :dependsOn" is a graph-native operation in ArcadeDB; in SPARQL it's a property path that the optimizer often gets wrong.
  • Edge properties: an edge can carry attributes (weight, timestamp, confidence) without reification. In RDF, every edge attribute requires either reification, RDF-star, or a named-graph dance.
  • Fast traversal indices: ArcadeDB's RID-based indexing makes traversal latency predictable; Fuseki's TDB2 is excellent for query but not built for traversal-heavy workloads.

9.3 Why hybridize

The fleet's data has both shapes. The contract surface — what an ontology says about a domain — wants RDF + OWL + SHACL. The operational surface — who depends on whom, what runs where, which agent is in which state — wants traversal and edge properties.

We hybridize via two patterns:

  1. Reification for canonical, property-graph for operational. The canonical representation in Fuseki uses standard reification or RDF-star for edge attributes; a denormalized property-graph view in ArcadeDB carries the same edges with native edge properties for query performance. The Fuseki copy is the source of truth; the ArcadeDB copy is a derived materialization.
  2. Hyperedges for n-ary relations. Per ADR-016, n-ary relations (a contract has a producer, a consumer, a version, a mock URL, a status) are reified as a single :Hyperedge node in RDF with role properties pointing to participants. The same hyperedge becomes an Edge collection in ArcadeDB with role properties as edge attributes.
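The two representations of one n-ary relation can be sketched side by side. Everything named here is illustrative: reify_hyperedge and to_property_graph_edge are hypothetical helpers, the role names follow the contract example above, and treating producer and consumer as the edge endpoints is an assumption. No ArcadeDB API is used; the edge is just a dictionary.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def reify_hyperedge(edge_id: str, roles: Dict[str, str]) -> List[Triple]:
    """RDF side: one :Hyperedge node with a role property per participant."""
    triples: List[Triple] = [(edge_id, "rdf:type", ":Hyperedge")]
    triples += [(edge_id, f":{role}", value) for role, value in sorted(roles.items())]
    return triples


def to_property_graph_edge(
    edge_id: str, roles: Dict[str, str], source_role: str, target_role: str
) -> Dict[str, object]:
    """Property-graph side: two roles become endpoints, the rest edge attributes."""
    attributes = {k: v for k, v in roles.items() if k not in (source_role, target_role)}
    return {
        "id": edge_id,
        "from": roles[source_role],
        "to": roles[target_role],
        "properties": attributes,
    }


roles = {"producer": ":svcA", "consumer": ":svcB", "version": "1.2.0", "status": "active"}
rdf_view = reify_hyperedge(":contract42", roles)
graph_view = to_property_graph_edge(":contract42", roles, "producer", "consumer")
```

Both views are derived from the same role dictionary, which matches the rule that the RDF copy is the source of truth and the property-graph copy is a materialization of it.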

The hybrid object query system adds a federation layer above both stores so a single object query can pull canonical facts from Fuseki and traversal results from ArcadeDB without the caller knowing which store served which fact.


10. Standards index

Every choice on this page is anchored in a W3C or de-facto standard. This is the index.

Standard | Spec URL | What we use it for
RDF 1.1 Concepts & Abstract Syntax | w3.org/TR/rdf11-concepts | The data model (triples, IRIs, literals, datatypes) for every graph in the stack
RDF 1.1 Turtle | w3.org/TR/turtle | The canonical authoring syntax for shapes, ontologies, and the self-model
RDF 1.1 N-Triples | w3.org/TR/n-triples | Line-oriented format for streaming ingestion and bulk loads
RDF 1.1 N-Quads | w3.org/TR/n-quads | Named-graph form for multi-graph dumps
JSON-LD 1.1 | w3.org/TR/json-ld11 | RDF-on-the-wire for any HTTP API that returns linked data
RDF-star | w3c.github.io/rdf-star | Edge-attribute representation without full reification (used selectively)
OWL 2 Structural Spec | w3.org/TR/owl2-syntax | The abstract syntax our F# compiler targets
OWL 2 Profiles (EL, QL, RL) | w3.org/TR/owl2-profiles | The profile we target per ontology; drives reasoner choice (§5)
OWL 2 Manchester Syntax | w3.org/TR/owl2-manchester-syntax | Human-readable syntax for class expressions in Protégé
OWL 2 RDF Mapping | w3.org/TR/owl2-mapping-to-rdf | How OWL axioms encode as RDF triples in Fuseki
SPARQL 1.1 Query | w3.org/TR/sparql11-query | Every read against Fuseki and Oxigraph
SPARQL 1.1 Update | w3.org/TR/sparql11-update | Every write against Fuseki and Oxigraph
SPARQL 1.1 Protocol | w3.org/TR/sparql11-protocol | The HTTP wire format for POST /sparql
SPARQL 1.1 Graph Store Protocol | w3.org/TR/sparql11-http-rdf-update | Named-graph load/replace via HTTP verbs; the dlt → Fuseki landing surface
SPARQL 1.1 Federation | w3.org/TR/sparql11-federated-query | SERVICE keyword for cross-endpoint queries
SHACL Core | w3.org/TR/shacl | The runtime validation contract on every ingestion path
SHACL Advanced Features | w3.org/TR/shacl-af | sh:rule and custom targets for contractual derivations
Notation3 (N3) | w3c.github.io/N3/spec | The edge rule language; N3.js + eyereasoner in the browser and agent
R2RML | w3.org/TR/r2rml | Relational → RDF mapping (Ontop, materialized + virtual)
RML | rml.io/specs/rml | Generalized non-RDF → RDF mapping (JSON, XML, CSV)
PROV-O | w3.org/TR/prov-o | Provenance vocabulary; every triple lands with a prov:Activity and prov:wasDerivedFrom
SKOS | w3.org/TR/skos-reference | Concept schemes for controlled vocabularies (e.g. tags, taxonomies)
DCAT 3 | w3.org/TR/vocab-dcat-3 | Dataset catalog vocabulary; how dlt sources self-describe
OWL Time | w3.org/TR/owl-time | Temporal vocabulary (see Persistence & Time)
BFO 2020 (ISO/IEC 21838-2) | iso.org/standard/74572.html | Realist upper ontology; see ontology-foundations



Closing notes

This stack is intentionally boring at each layer. Fuseki has been ASF-stewarded since 2010. SPARQL 1.1 has been a Recommendation since 2013. SHACL since 2017. Oxigraph and N3.js are newer but live inside well-defined sandboxes where their failure modes are contained.

The interesting work is not in any single component — it's in how they compose. The generator-first F# compiler (§7.4) emits the OWL + SHACL + projections; the ingestion pipeline (§8) lands data through the SHACL gate; Fuseki (§2) serves the canonical graph; Oxigraph (§3) gives the sandbox a private endpoint; N3.js (§4) does edge inference; the reasoners (§5) materialize OWL entailments; the property-graph hybridization (§9) covers the operational surface. Each piece is replaceable in isolation — but the whole only works because each piece speaks the same W3C standards (§10).

That's the bet of the stack: invest in the standards, not the implementations. Any of these implementations could be swapped (Fuseki → GraphDB, Oxigraph → RDF4J embedded, N3.js → a server-side reasoner) without changing the contract. The standards are the contract.


See also: Ontology Foundations · Persistence & Time · Generative Pipeline · Intellectual Foundations (Bibliography)