Ontology Stack — Runtime & Tooling
This page is the runtime/tooling companion to Ontology Foundations. Where the foundations page covers the theory (BFO, UFO, gUFO, OntoUML, Common Core Ontologies), this one covers the boxes and arrows: which triple store actually runs, which reasoner answers the query, which rule engine fires in the browser, which validator gates the ingestion path. Every choice is grounded in an ADR or a shipped image manifest in this repo — when a claim is load-bearing, the link goes to the source, not a marketing page.
The headline: we run Apache Jena Fuseki as the SPARQL endpoint, Oxigraph embedded inside sandboxed runners, N3.js in the browser/agent for edge inference, and SHACL as the runtime contract on every ingestion path. The visualization layer is React Flow (ARC-ADR-040). The self-model uses this exact stack as a living instance (ARC-ADR-072).
1. Stack at a glance
| Layer | Our choice | Alternative considered | Why ours won |
|---|---|---|---|
| Triple store (server) | Apache Jena Fuseki 5.x (TDB2) | Blazegraph, GraphDB, Stardog, Virtuoso | ASF-governed, OWL-2 + SPARQL 1.1 + SHACL out of the box, container-friendly, sieve manifest contract (template) |
| Triple store (embedded) | Oxigraph (Rust, pyoxigraph / oxigraph crate) | rdflib pure-Python, RDF4J embedded | Rust → tiny binary, ships inside the runner image (PR #522), SPARQL 1.1 Query + Update + GSP, no JVM in the sandbox |
| SPARQL endpoint | Fuseki /knowledge dataset + Oxigraph /sparql in runner | embedded-only, no HTTP | Federation + multi-tenant via Fuseki; per-sandbox isolated query via Oxigraph |
| OWL reasoner | ELK for EL profile, HermiT/Openllet for full DL, OWL-RL as a streaming fallback | RacerPro, FaCT++ | ELK is ~100× faster on the OBO-style EL ontologies we author; HermiT for tableau-complete DL; OWL-RL gives forward-chaining over arbitrary triples |
| Rules engine (server) | Jena Rules (GenericRuleReasoner) + SHACL-AF rules | Drools, Stardog ICV | In-process with Fuseki, no extra service; SHACL-AF unifies shape + rule under one W3C contract |
| Rules engine (edge) | N3.js + eyereasoner | Send every inference back to Fuseki | Browser/agent can fire production rules locally (ADR-040, ADR-072) — disambiguator (ADR-046) avoids the round-trip |
| SHACL validator | Apache Jena SHACL (server) + shacl-engine / rdf-validate-shacl (JS) | TopBraid SHACL API (commercial) | Pure ASF Java on the server matches the Fuseki stack; rdf-validate-shacl mirrors it in the agent |
| Shape / schema authoring | Hand-authored Turtle + the F# ontology compiler (ARC-ADR-033) | TopBraid Composer (GUI), Stardog Studio | We treat shapes as source; the F# compiler lets us prove projections are functors and generators are catamorphisms |
| Ontology IDE | Protégé 5.6 (occasional), VS Code + ttl/oxigraph extensions (daily) | TopBraid Composer, WebProtégé | Protégé is the OBO de-facto standard but desktop-only; daily editing is in VS Code beside the F# compiler |
| ETL / ingestion to ontology | dlt sources → R2RML / RML mappings → SHACL gate → Fuseki | Custom Python, Kettle, Airbyte | dlt gives schema-on-read source connectors; RML/R2RML are the W3C mapping standards; SHACL is the runtime contract (ARC-ADR-030) |
| Visualization | React Flow + dagre/elk.js layout | Cytoscape.js, vis.js, D3 force | ARC-ADR-040 — typed nodes/edges, headless layout, React-native |
The rest of this page walks each row.
2. Apache Jena Fuseki — why a real SPARQL endpoint
Apache Jena is the ASF-stewarded Java RDF stack. Fuseki is its HTTP server: a SPARQL 1.1 Query + Update + Graph Store Protocol endpoint backed by TDB2 (the on-disk triple store) or in-memory datasets.
We could have stayed embedded (Jena library only, or Oxigraph everywhere). We chose to run Fuseki as a first-class platform-tier container (ARC-ADR-023) for three reasons.
2.1 Federation and multi-graph reality
A SPARQL endpoint is a protocol surface, not just a library. Once Fuseki is up, any agent
in the fleet — Claude Code, Copilot, GitHub Actions, a self-hosted runner, an Azure
Container App — can hit POST /knowledge/sparql with a query body and get RDF back. That
single fact dissolves a class of integration glue we'd otherwise hand-write per consumer.
SERVICE keyword federation also becomes free: a query in the self-model dataset can pull
context from a separate Fuseki dataset (or any other SPARQL 1.1 endpoint — DBpedia, an OBO
mirror, a Wikidata SPARQL endpoint) via SERVICE <https://other/sparql>. The cost of
federation collapses to "do you have the URL?".
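As a rough illustration of that protocol surface, the sketch below builds (but does not send) a SPARQL 1.1 Protocol request using only the Python standard library. The host name `fuseki.internal:3030` and the query text are assumptions made up for the example; only the `/knowledge/sparql` path and the `SERVICE <https://other/sparql>` clause come from this page.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical in-fleet host; only the /knowledge/sparql path comes from this page.
FUSEKI_SPARQL = "http://fuseki.internal:3030/knowledge/sparql"

# Illustrative federated query: a local pattern plus a SERVICE clause
# that pulls labels from a second SPARQL 1.1 endpoint.
FEDERATED_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label WHERE {
  ?s a ?type .
  SERVICE <https://other/sparql> { ?s rdfs:label ?label }
}
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query as a URL-encoded POST (not sent here)."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


sparql_request = build_sparql_request(FUSEKI_SPARQL, FEDERATED_QUERY)
```

Passing `sparql_request` to `urllib.request.urlopen` would send it; any consumer that can make an HTTP POST can do the same, which is the point of having an endpoint rather than a library.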
Why not GraphDB or Stardog?
Both are excellent commercial-leaning OWL stores. GraphDB has the best built-in reasoner ergonomics; Stardog has the strongest ICV story. We avoided them because AgentArmy is a template repo that fleets fork — pinning a closed-source license into the platform tier would make every fork inherit that constraint. ASF licensing keeps the door open.
2.2 The sieve manifest contract
Fuseki ships in this repo as templates/fuseki-ontology-image/ — a deliverable container
image with an image.json manifest in the Image Standard. The
manifest declares three doctor checks the image must pass to be considered healthy:
- sieve-accepts-conformant: load a known-good Turtle file via the Graph Store Protocol (PUT /knowledge/data?graph=…) and confirm the SHACL validator returns sh:conforms true.
- sieve-rejects-violating: load a known-bad Turtle file and confirm the validator returns sh:conforms false with the expected sh:resultPath.
- construct-emits: fire a known CONSTRUCT query and confirm the inferred triples appear in the result set.
Those three checks are the runtime contract for "this Fuseki is actually a sieve, not
just a triple bucket." agentarmy-doctor image fuseki-ontology runs them on every CI build
and every local rebuild. The image is not published until all three are green.
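The shape of those three checks can be sketched as a small harness. This is not the agentarmy-doctor implementation: `run_doctor_checks`, `Report`, the fixture keys, and the two stand-in callables are all hypothetical names, and the real checks talk to a running Fuseki rather than to in-memory fakes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Report:
    """The two SHACL report fields the checks look at."""
    conforms: bool
    result_paths: Tuple[str, ...] = ()


def run_doctor_checks(
    validate: Callable[[str], Report],
    construct: Callable[[str], Set[Triple]],
    fixtures: dict,
) -> Dict[str, bool]:
    """Run the three sieve checks and return a pass/fail flag per check name."""
    good = validate(fixtures["good_ttl"])
    bad = validate(fixtures["bad_ttl"])
    derived = construct(fixtures["construct_query"])
    return {
        "sieve-accepts-conformant": good.conforms,
        "sieve-rejects-violating": (
            not bad.conforms and fixtures["expected_path"] in bad.result_paths
        ),
        "construct-emits": fixtures["expected_triples"] <= derived,
    }


# Stand-ins for the real validator and CONSTRUCT call (assumptions for the demo).
def fake_validate(turtle: str) -> Report:
    if ":birthDate" in turtle:
        return Report(conforms=True)
    return Report(conforms=False, result_paths=(":birthDate",))


def fake_construct(query: str) -> Set[Triple]:
    return {(":alice", ":isAdult", "true")}


doctor_fixtures = {
    "good_ttl": ':alice a :Person ; :birthDate "1990-01-01" .',
    "bad_ttl": ":bob a :Person .",
    "expected_path": ":birthDate",
    "construct_query": "CONSTRUCT { ?p :isAdult true } WHERE { ?p a :Person }",
    "expected_triples": {(":alice", ":isAdult", "true")},
}

doctor_results = run_doctor_checks(fake_validate, fake_construct, doctor_fixtures)
publishable = all(doctor_results.values())  # publish only when all three are green
```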
2.3 CORS, GSP, and the /knowledge vs admin split
Fuseki's default shiro.ini locks down the /$/ admin endpoints behind basic auth — that's
why direct PUT /$/datasets/... returns 403 without credentials. The /knowledge data
endpoints are deliberately open inside the fleet's private network so any agent can call
GSP (GET/PUT/POST/DELETE /knowledge/data) and SPARQL (POST /knowledge/sparql,
POST /knowledge/update) without juggling tokens.
CORS is configured in fuseki-config.ttl to allow any origin (Access-Control-Allow-Origin: *) on the data endpoints only;
the admin endpoints carry the default same-origin policy. That posture matches our threat
model — fleet repos are private and trusted (memory: threat-model-no-forks),
so we'd rather pay for ergonomic edge-agent inference than per-call token plumbing.
The Graph Store Protocol is the
under-appreciated part. GSP says: a named graph is a resource at a URL, and you load it
with HTTP verbs. That maps cleanly to our dlt → SHACL → Fuseki pipeline: each dlt source
becomes a named graph, ingestion is PUT /knowledge/data?graph=urn:source:foo, and the
SHACL validator runs on the graph before it's committed to TDB2.
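To make the "named graph is a resource at a URL" idea concrete, here is a minimal sketch of the GSP load as an HTTP PUT, built with the standard library and not sent. The host name is an assumption; the `/knowledge/data` path and the `urn:source:foo` graph name are the ones used above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical in-fleet host; the /knowledge/data path comes from this page.
FUSEKI_DATA = "http://fuseki.internal:3030/knowledge/data"


def build_gsp_put(graph_iri: str, turtle: str) -> Request:
    """Build a Graph Store Protocol PUT that replaces one named graph."""
    # The graph IRI travels as a percent-encoded ?graph= query parameter.
    url = f"{FUSEKI_DATA}?{urlencode({'graph': graph_iri})}"
    return Request(
        url,
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle; charset=utf-8"},
    )


put_request = build_gsp_put("urn:source:foo", "<urn:a> <urn:p> <urn:b> .\n")
```

PUT replaces the whole named graph, while POST to the same URL merges into it, so a re-run of one source overwrites only that source's graph.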
3. Oxigraph — embedded RDF in the sandbox
Oxigraph is an RDF store written in Rust, with a
SPARQL 1.1 Query + Update + GSP implementation, Python bindings (pyoxigraph), and a
standalone oxigraph_server binary. It is the embedded counterpart to Fuseki.
Per PR #522 ("embed oxigraph_server inside runner image & add guest agent SPARQL query
endpoint"), every sandboxed runner ships with oxigraph_server already listening on
127.0.0.1:7878. The guest agent has its own private SPARQL endpoint that nobody outside
the sandbox can reach.
When to use Oxigraph vs Fuseki
| Use Oxigraph when | Use Fuseki when |
|---|---|
| The triples are sandbox-private (per-run scratch state) | The triples are fleet-shared (self-model, contracts registry) |
| You want zero JVM in the runner image | You need TDB2 durability and the admin UI |
| The query is over a small graph (< 10M triples) | You need federation across datasets, or SERVICE |
| You want the agent to mutate the store and discard it on exit | You need the store to survive container restarts |
| You're prototyping a rule set locally | You're publishing a shared ontology to consumers |
The two stores are not redundant — they sit at different sandbox boundaries. The self-model is in Fuseki because every agent needs to query it. A scratch graph that a disambiguator builds for one inference run is in Oxigraph because nobody else needs it and the agent can throw it away.
Both speak SPARQL 1.1, so the query syntax is identical. The runtime decides which endpoint to hit based on whether the query is over shared state or sandbox state.
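The routing decision can be sketched in a few lines. Everything here is an assumption made for illustration: the function name, the Fuseki host, and the naive text match used to spot a SERVICE clause. The loopback port is the one from PR #522 and the `/sparql` path is taken from the stack table.

```python
# Hypothetical Fuseki host; the dataset path comes from this page.
FUSEKI_SPARQL = "http://fuseki.internal:3030/knowledge/sparql"
# Loopback port from this page; the /sparql path is taken from the stack table.
OXIGRAPH_SPARQL = "http://127.0.0.1:7878/sparql"


def pick_endpoint(shared_state: bool, query: str) -> str:
    """Route shared-state or federated queries to Fuseki, the rest to Oxigraph."""
    # Naive check: a real router would parse the query instead of matching text,
    # since "SERVICE" could also appear inside an IRI or a string literal.
    needs_federation = "SERVICE" in query.upper()
    if shared_state or needs_federation:
        return FUSEKI_SPARQL
    return OXIGRAPH_SPARQL
```

Because both stores speak SPARQL 1.1, the query string passes through unchanged; only the URL differs.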
Why not skip Fuseki entirely and run Oxigraph server everywhere?
We considered it. The deal-breaker is the Apache Jena ecosystem: SHACL-AF rules, the
Jena Rules language, the text:query full-text extension, and the geosparql: spatial
extension are all Java-only and live inside Fuseki's process. Oxigraph's SHACL support
is improving but isn't at parity yet. So: Oxigraph for the sandbox, Fuseki for the
shared knowledge layer.
4. N3.js — Notation3 rules at the edge
N3.js is the RDF.js Notation3 parser/serializer, and eyereasoner is the JavaScript port of the EYE reasoner — together they let us run Notation3 rules inside the browser, inside a Node agent, inside any V8-hosted runtime.
This matters because of two ADRs:
- ARC-ADR-046: the disambiguator streaming service fires rules as tokens arrive. Round-tripping every candidate to a server SPARQL endpoint would add 50–200 ms per token. N3.js + eye-js evaluate the rule set in-process; the latency budget stays in the single-digit milliseconds.
- ARC-ADR-072: the self-model viewer in ontology/platform-self-model/viz/ runs a small N3 rule set client-side to derive secondary facts (e.g., "a surface that exposes a capability that is realized-by a component running in tier X is itself a tier-X-adjacent surface"). Pushing that derivation to the browser lets the viewer render the right colours without a server roundtrip.
N3 is a superset of Turtle that adds rules ({ ?s :p ?o } => { ?s :q ?o }.), graph
literals (treating a graph as a term), and quantification. It's the rule language EYE
was built for; eyereasoner is the production-grade implementation.
Why N3 over SPARQL CONSTRUCT for edge inference
SPARQL CONSTRUCT queries can also derive new triples. The reason we reach for N3 at the
edge is iteration: N3 rules naturally fire to a fixed point (forward-chain until no new
triples are produced). To get the same behaviour from SPARQL you have to wrap CONSTRUCT in
an external loop and detect quiescence yourself. N3 reasoners do it for you.
That said: server-side, we use CONSTRUCT for one-shot derivations and Jena Rules for the
fixed-point work, because they live in the same JVM as Fuseki and have direct access to
TDB2 indexes. The choice is "edge: N3, server: Jena Rules / SHACL-AF" — same paradigm, two
runtimes.
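The external loop that a CONSTRUCT-based approach needs can be sketched in plain Python. This is only an illustration over tuples, not N3.js, eyereasoner, or Jena code: `construct_once` stands in for one CONSTRUCT pass of a made-up transitivity rule on a hypothetical `:partOf` property, and `fixed_point` is the quiescence detection an N3 reasoner performs for you.

```python
Triple = tuple[str, str, str]


def construct_once(graph: set[Triple]) -> set[Triple]:
    """One CONSTRUCT-like pass of { ?a :partOf ?b . ?b :partOf ?c } => { ?a :partOf ?c }."""
    return {
        (a, ":partOf", c)
        for (a, p1, b) in graph
        if p1 == ":partOf"
        for (b2, p2, c) in graph
        if p2 == ":partOf" and b2 == b
    }


def fixed_point(graph: set[Triple]) -> tuple[set[Triple], int]:
    """Re-run the rule until a pass adds nothing new; return closure and pass count."""
    closed = set(graph)
    passes = 0
    while True:
        passes += 1
        new = construct_once(closed) - closed
        if not new:  # quiescence: the last pass derived nothing new
            return closed, passes
        closed |= new


chain = {
    (":a", ":partOf", ":b"),
    (":b", ":partOf", ":c"),
    (":c", ":partOf", ":d"),
    (":d", ":partOf", ":e"),
}
closure, passes = fixed_point(chain)
```

On this four-link chain the loop needs three passes: two that add triples and a final one that confirms nothing is left to derive.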
5. Reasoners considered
OWL reasoning is a spectrum. Picking one reasoner for everything is the wrong shape — ontologies span OWL 2 profiles (EL, QL, RL) and full DL, and each profile has a class of reasoner that's dramatically faster than the general-purpose ones.
5.1 ELK — the EL profile workhorse
ELK is a Java reasoner for the OWL 2 EL profile: subsumption, classification, and instance checking only, but in polynomial time and with brutally parallelizable algorithms. The Gene Ontology, SNOMED CT, and most OBO Foundry ontologies are deliberately authored within EL so ELK can classify them in seconds.
We use ELK for: the BFO/CCO interop projection (ADR linked in
ontology-foundations), any OBO-style ontology we ingest, and the
self-model's structural classification. The trade-off is that ELK can't reason about
constructs outside EL, such as universal or cardinality restrictions, inverse properties, or class disjunction and negation.
5.2 HermiT and Openllet — full OWL 2 DL
HermiT (Glimm et al., 2014) is a hypertableau reasoner for OWL 2 DL — complete on the full DL profile. We use it when an ontology uses cardinality restrictions, complex class expressions, or property chains that ELK can't handle.
Openllet is the actively maintained fork of Pellet (originally from Clark & Parsia). Same OWL 2 DL coverage as HermiT, different algorithm (tableau with optimizations). We keep both in the runner image because one occasionally outperforms the other on specific ontology shapes; the runtime picks per workload.
Why not RacerPro or FaCT++?
RacerPro is commercial and not freely redistributable in our images. FaCT++ is C++, excellent on the right shape, but its JNI integration with Jena has bit-rotted; we'd rather pay the small perf gap to stay on a pure-JVM reasoner.
5.3 OWL-RL — rule-based fallback
The OWL 2 RL profile is designed to be
implementable by a rule engine over RDF triples — forward-chaining production rules can
materialize all OWL-RL entailments. We use Jena's GenericRuleReasoner configured with the
OWL-RL rule set as a streaming fallback on the ingestion path: triples come in via dlt,
OWL-RL closure is computed forward, SHACL validates the result, and the closed graph lands
in Fuseki.
This is the cheapest OWL inference we run, and the only one that works incrementally over a
firehose. The trade-off is that OWL-RL is a strict subset of DL; an ontology that uses
features outside RL (e.g., owl:someValuesFrom in non-subclass position) gets incomplete
inference. We document per-ontology which profile it targets and which reasoner it expects.
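What "incremental" means here can be shown with a small worklist sketch. It is plain Python over tuples, not Jena's GenericRuleReasoner, and it covers only three rules from the OWL 2 RL rule set (scm-sco, cax-sco, prp-dom); the class name `IncrementalRL` and the sample vocabulary are hypothetical.

```python
from collections import deque

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


class IncrementalRL:
    """Keeps a graph closed under three OWL 2 RL rules as triples arrive one by one."""

    def __init__(self):
        self.graph = set()

    def add(self, triple):
        """Insert one triple; return it plus everything it newly entails."""
        added = []
        queue = deque([triple])
        while queue:
            t = queue.popleft()
            if t in self.graph:
                continue  # already known, nothing new to derive from it
            self.graph.add(t)
            added.append(t)
            queue.extend(self._consequences(t))
        return added

    def _consequences(self, t):
        """Join the new triple against the existing graph only (no full re-scan of rules)."""
        s, p, o = t
        g = self.graph
        out = []
        if p == SUBCLASS:
            # scm-sco: rdfs:subClassOf is transitive (join on either side).
            out += [(s, SUBCLASS, c) for (b, q, c) in g if q == SUBCLASS and b == o]
            out += [(a, SUBCLASS, o) for (a, q, b) in g if q == SUBCLASS and b == s]
            # cax-sco: existing instances of s are also instances of o.
            out += [(x, RDF_TYPE, o) for (x, q, c) in g if q == RDF_TYPE and c == s]
        elif p == RDF_TYPE:
            # cax-sco: the new instance inherits every known superclass.
            out += [(s, RDF_TYPE, c) for (b, q, c) in g if q == SUBCLASS and b == o]
        elif p == DOMAIN:
            # prp-dom: data triples that arrived before the domain axiom.
            out += [(x, RDF_TYPE, o) for (x, q, _) in g if q == s]
        # prp-dom: the domain axiom arrived before this data triple.
        out += [(s, RDF_TYPE, c) for (q, d, c) in g if d == DOMAIN and q == p]
        return out


rl = IncrementalRL()
rl.add((":Dog", SUBCLASS, ":Mammal"))
rl.add((":Mammal", SUBCLASS, ":Animal"))
rl.add((":hasOwner", DOMAIN, ":Pet"))
from_type = rl.add((":rex", RDF_TYPE, ":Dog"))
from_owner = rl.add((":rex", ":hasOwner", ":alice"))
```

Each arriving triple is joined only against what is already stored, which is why this style of closure suits a stream where a tableau reasoner would have to start over.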
5.4 When we run which
| Workload | Reasoner | Why |
|---|---|---|
| OBO Foundry ingestion (Gene Ontology, ChEBI, etc.) | ELK | EL profile, scales to millions of classes |
| BFO/CCO interop projection | ELK | EL classification of the upper ontology |
| Self-model classification | ELK + N3.js (edge derivation) | EL on the server, N3 in the viewer |
| Ad-hoc DL question ("does this ontology entail X?") | HermiT or Openllet | Full DL completeness |
| dlt firehose ingestion | OWL-RL (Jena Rules) | Streaming, incremental |
| Disambiguator streaming | N3.js + eye-js | Single-digit ms per token, browser/agent |
6. SHACL — the runtime validation contract
SHACL (Shapes Constraint Language, W3C Rec 2017) is the
contract language for RDF. A SHACL shape says "a node of type :Person must have exactly
one :birthDate of datatype xsd:date"; a SHACL validator answers "does this graph
conform?" with a precise report (sh:conforms, sh:result, sh:resultPath, sh:value).
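To show what such a report looks like for the :Person example, here is a hand-rolled check in plain Python. It is not a SHACL engine and does not parse a shapes graph: the constraint is hard-coded, and `Literal`, `ValidationReport`, and `validate_person_shape` are hypothetical names whose fields only mirror sh:conforms, sh:resultPath, and sh:value.

```python
from dataclasses import dataclass, field

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: str


@dataclass
class ValidationReport:
    conforms: bool
    results: list = field(default_factory=list)


def validate_person_shape(graph) -> ValidationReport:
    """Hard-coded check: every :Person has exactly one :birthDate typed xsd:date."""
    results = []
    people = {s for (s, p, o) in graph if p == "rdf:type" and o == ":Person"}
    for person in sorted(people):
        dates = [o for (s, p, o) in graph if s == person and p == ":birthDate"]
        if len(dates) != 1:
            results.append({
                "focusNode": person,
                "resultPath": ":birthDate",
                "message": f"expected exactly 1 value, found {len(dates)}",
            })
        for value in dates:
            if not (isinstance(value, Literal) and value.datatype == XSD_DATE):
                results.append({
                    "focusNode": person,
                    "resultPath": ":birthDate",
                    "value": value,
                    "message": "value is not an xsd:date literal",
                })
    return ValidationReport(conforms=not results, results=results)


conformant = {
    (":alice", "rdf:type", ":Person"),
    (":alice", ":birthDate", Literal("1990-01-01", XSD_DATE)),
}
violating = {(":bob", "rdf:type", ":Person")}

ok_report = validate_person_shape(conformant)
bad_report = validate_person_shape(violating)
```

A real validator derives the same kind of result entries from the shapes graph itself, so the constraint lives in data rather than in code.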
We use SHACL three ways:
- Ingestion gate: every dlt source has a SHACL shapes graph. The ingestion pipeline loads source data into a candidate named graph, runs SHACL Core + SHACL-AF, and only promotes the graph to the shared Fuseki dataset if sh:conforms true. (ARC-ADR-030 is the anchor.)
- Contract registration: when a new ontology is registered in docs/contracts.md, its SHACL shapes graph is part of the contract. Consumers can fetch the shapes and validate locally before sending data.
- SHACL-AF rules: SHACL Advanced Features adds sh:rule (triple/SPARQL rules) and custom targets. We use SHACL-AF for derivations that are contractual: "if a node has property X, the validator MUST add property Y before reporting conformance." This collapses the shape-vs-rule split into a single W3C spec, which is exactly what we want for a contract surface.
Why SHACL over ShEx
ShEx is the other shape language; Wikidata uses it heavily. We picked SHACL because:
- It has first-class W3C Recommendation status; ShEx has a Community Group spec only.
- SHACL-AF unifies shape and rule under one runtime; ShEx + rules requires a separate engine.
- Jena ships a mature SHACL implementation; the Jena ShEx implementation is community-maintained and lags.
We still read ShEx
OBO ontologies occasionally publish ShEx schemas alongside SHACL. We convert at
ingestion time using shexer or hand-port the constraints. There's no philosophical
objection — we just don't want two validators in the runtime hot path.
7. Authoring tooling
The boring secret of ontology engineering is that editing is in a text editor. The fancy GUI tools matter at specific moments.
7.1 Protégé — the canonical desktop editor
Protégé (Stanford BMIR) is the OBO de-facto standard for authoring OWL ontologies. We use it occasionally for:
- Visualizing class hierarchies of someone else's ontology before deciding to ingest it.
- Authoring complex class expressions where the OWL Manchester syntax in a text editor is fiddly (e.g., (hasPart some Wheel) and (hasPart some Engine)).
- Running HermiT interactively against an in-development ontology to catch DL errors.
We do not use Protégé as the canonical source format. The canonical format is Turtle in git, hand-authored or compiled.
7.2 ROBOT — the OBO release tool
ROBOT (Jackson et al., 2019) is the command-line tool the OBO Foundry uses for ontology release pipelines: extract, merge, reason, convert, validate, diff. It's a Java CLI with a stable interface.
We adopted ROBOT verbatim for any pipeline step that's already a ROBOT recipe — robot
reason --reasoner ELK, robot convert --format ofn, robot diff. There's no point
reimplementing it; the OBO community has hardened these commands over a decade.
7.3 TopBraid Composer — deliberately avoided
TopBraid Composer is the commercial SHACL/SPARQL/OWL IDE from TopQuadrant. It's excellent. We avoided it because:
- License cost would have to propagate to every fork of this template.
- Its SHACL engine has TopQuadrant-specific extensions that wouldn't validate against Jena's pure W3C implementation; ingestion would diverge between authoring and runtime.
- The contract surface for AgentArmy is open W3C standards, not vendor extensions.
We still read TopBraid's SHACL examples (their docs are some of the best in the ecosystem). We just don't author against the IDE.
7.4 The F# ontology compiler
ARC-ADR-033 is the architectural decision to build an F# ontology compiler as the canonical authoring layer for the fleet's own ontologies (as distinct from third-party ingested ones).
The compiler takes a typed F# source — algebraic data types, computation expressions, category-theoretic combinators — and emits:
- OWL 2 (Functional and Manchester syntax)
- Turtle (for SHACL shapes and the runtime store)
- gUFO-aligned OWL projection (for the UFO authoring discipline, ontology-foundations)
- BFO/CCO-aligned OWL projection (for the realist interop discipline)
- TypeScript type definitions (for the frontend-core consumer)
- C# record types (for backend-core consumers)
This is the generator-first pattern recorded in the project memory: the source is one F# AST; everything else is a deterministic projection. Per category theory for the FP compiler, each projection is a functor, generators are catamorphisms over the AST, the gUFO↔BFO mapping is a natural transformation, and the sift validation loop is monadic.
The F# compiler lives next to the rest of the ontology runtime — it is not a build-time afterthought, it's a peer of Fuseki and Oxigraph. When you change a source ADT, the compiler re-emits OWL + SHACL + gUFO + BFO, the doctor checks rerun, and the projections either still validate or they don't.
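The "one AST, many deterministic projections" idea can be illustrated with a toy in Python. This is not the F# compiler and does not reflect its actual AST: `Field`, `Entity`, the type map, and both generator functions are hypothetical, and the emitted Turtle omits prefix declarations for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    xsd_type: str  # local name in the xsd: namespace, e.g. "string" or "date"


@dataclass(frozen=True)
class Entity:
    name: str
    fields: tuple


# Illustrative mapping from XSD datatypes to TypeScript types.
TS_TYPES = {"string": "string", "date": "string", "integer": "number"}


def to_shacl_turtle(entity: Entity) -> str:
    """Project the AST to a SHACL node shape (prefix declarations omitted)."""
    head = [
        f":{entity.name}Shape a sh:NodeShape ;",
        f"    sh:targetClass :{entity.name} ;",
    ]
    props = [
        f"    sh:property [ sh:path :{f.name} ; sh:datatype xsd:{f.xsd_type} ; sh:minCount 1 ]"
        for f in entity.fields
    ]
    return "\n".join(head + [" ;\n".join(props) + " ."])


def to_typescript(entity: Entity) -> str:
    """Project the same AST to a TypeScript interface."""
    body = "\n".join(f"  {f.name}: {TS_TYPES[f.xsd_type]};" for f in entity.fields)
    return f"export interface {entity.name} {{\n{body}\n}}"


PERSON = Entity("Person", (Field("name", "string"), Field("birthDate", "date")))
person_shape = to_shacl_turtle(PERSON)
person_interface = to_typescript(PERSON)
```

Both generators are pure folds over the same value, so changing `PERSON` changes every projection in lockstep and nothing can drift by hand.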
8. Ingestion — RML, R2RML, and dlt
Most data in the world is not RDF. The ingestion path turns non-RDF source data into ontology-conformant triples, validates them with SHACL, and lands them in Fuseki.
8.1 dlt — source connectors
dlt (data load tool) is the Python library we use for source connectors. dlt gives us schema-on-read extraction from REST APIs, databases, S3, GCS, filesystem dumps, SaaS APIs (Stripe, HubSpot, Salesforce, etc.), with retry/backoff/state tracking included.
See docs/dlt-pipelines.md and docs/dlt-sources.md for the inventory of sources and pipelines we run. The dlt output is typed tabular data — DuckDB tables in the staging area — not RDF. The next step lifts it.
8.2 R2RML — relational to RDF
R2RML (W3C Rec 2012) is the standard mapping language for
turning relational data into RDF. A rr:TriplesMap says "this SQL query produces rows;
each row is a subject of type :Foo; column bar becomes object of :hasBar."
For sources that land in DuckDB (most of them), R2RML mappings via the Ontop engine are the path. Ontop also supports virtual R2RML — answering SPARQL queries by translating to SQL against the live source, no materialization needed. We use materialization for batch ingestion and virtual for ad-hoc federated queries.
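What a triples map does to each row can be sketched without any mapping engine. The function below is a plain-Python illustration of the semantics, not Ontop or an R2RML processor; `map_rows`, the example IRI template, and the sample rows are assumptions.

```python
def map_rows(rows, subject_template, rdf_class, column_predicates):
    """Turn tabular rows into triples the way a single triples map would."""
    triples = []
    for row in rows:
        # Like rr:template, {column} placeholders are filled from the row.
        subject = subject_template.format(**row)
        triples.append((subject, "rdf:type", rdf_class))
        for column, predicate in column_predicates.items():
            value = row.get(column)
            if value is not None:  # a NULL column value produces no triple
                triples.append((subject, predicate, str(value)))
    return triples


foo_rows = [{"id": 1, "bar": "x"}, {"id": 2, "bar": None}]
foo_triples = map_rows(
    foo_rows,
    subject_template="http://example.org/foo/{id}",
    rdf_class=":Foo",
    column_predicates={"bar": ":hasBar"},
)
```

The second row yields only a type triple because its `bar` column is NULL, which matches how R2RML treats missing values.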
8.3 RML — generalized mapping
RML (rml.io, currently a W3C Community Group specification)
generalizes R2RML to non-relational sources: JSON, XML, CSV, MongoDB. An rml:LogicalSource
points at a JSON file with a JSONPath expression; the rest of the mapping looks like R2RML.
For dlt sources that don't naturally project to a SQL view (deeply nested JSON, XML feeds), RML is the path. We run RML mappings through RMLMapper in the ingestion container.
8.4 The end-to-end shape
dlt source R2RML / RML SHACL gate Fuseki GSP
───────────── → ────────────── → ────────────── → ──────────────
JSON/SQL/CSV Turtle triples sh:conforms? PUT /knowledge/
→ DuckDB (named graph) true → promote data?graph=...
false → quarantine
The SHACL gate is the runtime contract from §6. The Fuseki named graph is the GSP target from §2.3. The whole pipeline is the embodiment of ARC-ADR-030.
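The promote-or-quarantine decision in that diagram can be sketched as one function with injected collaborators. The names are hypothetical, the validator is a toy stand-in for SHACL, and the two dictionaries stand in for the Fuseki dataset and a quarantine area; only the `urn:source:` graph naming comes from this page.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def ingest(
    source: str,
    triples: Iterable[Triple],
    validate: Callable[[Set[Triple]], bool],
    promote: Callable[[str, Set[Triple]], None],
    quarantine: Callable[[str, Set[Triple]], None],
) -> str:
    """Validate a candidate named graph, then promote it or set it aside."""
    graph_iri = f"urn:source:{source}"
    candidate = set(triples)
    if validate(candidate):
        promote(graph_iri, candidate)  # in the real pipeline: GSP PUT to Fuseki
        return "promoted"
    quarantine(graph_iri, candidate)
    return "quarantined"


def every_subject_is_typed(graph: Set[Triple]) -> bool:
    """Toy stand-in for the SHACL gate: each subject needs an rdf:type."""
    subjects = {s for (s, _, _) in graph}
    typed = {s for (s, p, _) in graph if p == "rdf:type"}
    return subjects == typed


shared_store = {}  # stand-in for the shared Fuseki dataset
quarantine_area = {}  # stand-in for wherever rejected graphs are kept

good_status = ingest(
    "crm",
    [(":acme", "rdf:type", ":Customer"), (":acme", ":name", "Acme")],
    every_subject_is_typed,
    shared_store.__setitem__,
    quarantine_area.__setitem__,
)
bad_status = ingest(
    "legacy",
    [(":x", ":name", "untyped")],
    every_subject_is_typed,
    shared_store.__setitem__,
    quarantine_area.__setitem__,
)
```

Keeping validation ahead of the write means the shared dataset never holds a graph that failed its contract.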
9. Storage shapes — pure RDF vs property graph vs hybrid
We run both an RDF triple store (Fuseki/Oxigraph) and a property graph (ArcadeDB). This is deliberate, and is the architectural shape memorialized in ARC-ADR-016 (reification + hyperedges) and ARC-ADR-055 (hybrid object query system).
9.1 What RDF gives us
- Standards: OWL, SHACL, SPARQL, GSP, federation, PROV-O, RML/R2RML — none of which have full property-graph equivalents.
- Logical semantics: an OWL reasoner can answer "does this graph entail X?" against a formal model theory.
- Open-world assumption: missing data isn't treated as false; it's just unknown. This is the right default for an ingestion pipeline that pulls from many sources.
9.2 What property graphs give us
- Performance on path queries: "find all 4-hop neighbours via edges labelled :dependsOn" is a graph-native operation in ArcadeDB; in SPARQL it's a property path that the optimizer often gets wrong.
- Edge properties: an edge can carry attributes (weight, timestamp, confidence) without reification. In RDF, every edge attribute requires either reification, RDF-star, or a named-graph dance.
- Fast traversal indices: ArcadeDB's RID-based indexing makes traversal latency predictable; Fuseki's TDB2 is excellent for query but not built for traversal-heavy workloads.
9.3 Why hybridize
The fleet's data has both shapes. The contract surface — what an ontology says about a domain — wants RDF + OWL + SHACL. The operational surface — who depends on whom, what runs where, which agent is in which state — wants traversal and edge properties.
We hybridize via two patterns:
- Reification for canonical, property-graph for operational. The canonical representation in Fuseki uses standard reification or RDF-star for edge attributes; a denormalized property-graph view in ArcadeDB carries the same edges with native edge properties for query performance. The Fuseki copy is the source of truth; the ArcadeDB copy is a derived materialization.
- Hyperedges for n-ary relations. Per ADR-016, n-ary relations (a contract has a producer, a consumer, a version, a mock URL, a status) are reified as a single :Hyperedge node in RDF with role properties pointing to participants. The same hyperedge becomes an Edge collection in ArcadeDB with role properties as edge attributes.
The hybrid object query system adds a federation layer above both stores so a single object query can pull canonical facts from Fuseki and traversal results from ArcadeDB without the caller knowing which store served which fact.
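A small sketch of the two representations of one hyperedge, in plain Python. Both functions and the record layout are hypothetical: the first mirrors the RDF reification described above, and the second is one possible projection to a property-graph edge, not ArcadeDB's actual API or schema.

```python
def hyperedge_triples(edge_id, roles):
    """RDF side: one :Hyperedge node with a role property per participant."""
    triples = [(edge_id, "rdf:type", ":Hyperedge")]
    triples += [
        (edge_id, f":{role}", participant)
        for role, participant in sorted(roles.items())
    ]
    return triples


def to_property_edge(edge_id, roles, source_role, target_role):
    """Property-graph side: two roles become endpoints, the rest edge attributes."""
    properties = {
        role: value
        for role, value in roles.items()
        if role not in (source_role, target_role)
    }
    return {
        "id": edge_id,
        "from": roles[source_role],
        "to": roles[target_role],
        "properties": properties,
    }


contract_roles = {
    "producer": ":billing",
    "consumer": ":checkout",
    "version": "1.2.0",
    "status": "active",
}
contract_triples = hyperedge_triples(":contract42", contract_roles)
contract_edge = to_property_edge(":contract42", contract_roles, "producer", "consumer")
```

Deriving the edge record from the same role dictionary keeps the Fuseki copy as the source of truth and the property-graph copy as a materialization of it.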
10. Standards index
Every choice on this page is anchored in a W3C or de-facto standard. This is the index.
| Standard | Spec URL | What we use it for |
|---|---|---|
| RDF 1.1 Concepts & Abstract Syntax | w3.org/TR/rdf11-concepts | The data model — triples, IRIs, literals, datatypes — for every graph in the stack |
| RDF 1.1 Turtle | w3.org/TR/turtle | The canonical authoring syntax for shapes, ontologies, and the self-model |
| RDF 1.1 N-Triples | w3.org/TR/n-triples | Line-oriented format for streaming ingestion and bulk loads |
| RDF 1.1 N-Quads | w3.org/TR/n-quads | Named-graph form for multi-graph dumps |
| JSON-LD 1.1 | w3.org/TR/json-ld11 | RDF-on-the-wire for any HTTP API that returns linked data |
| RDF-star | w3c.github.io/rdf-star | Edge-attribute representation without full reification (used selectively) |
| OWL 2 Structural Spec | w3.org/TR/owl2-syntax | The abstract syntax our F# compiler emits to |
| OWL 2 Profiles (EL, QL, RL) | w3.org/TR/owl2-profiles | The profile we target per ontology — drives reasoner choice (§5) |
| OWL 2 Manchester Syntax | w3.org/TR/owl2-manchester-syntax | Human-readable syntax for class expressions in Protégé |
| OWL 2 RDF Mapping | w3.org/TR/owl2-mapping-to-rdf | How OWL axioms encode as RDF triples in Fuseki |
| SPARQL 1.1 Query | w3.org/TR/sparql11-query | Every read against Fuseki and Oxigraph |
| SPARQL 1.1 Update | w3.org/TR/sparql11-update | Every write against Fuseki and Oxigraph |
| SPARQL 1.1 Protocol | w3.org/TR/sparql11-protocol | The HTTP wire format for POST /sparql |
| SPARQL 1.1 Graph Store Protocol | w3.org/TR/sparql11-http-rdf-update | Named-graph load/replace via HTTP verbs — the dlt → Fuseki landing surface |
| SPARQL 1.1 Federation | w3.org/TR/sparql11-federated-query | SERVICE keyword for cross-endpoint queries |
| SHACL Core | w3.org/TR/shacl | The runtime validation contract on every ingestion path |
| SHACL Advanced Features | w3.org/TR/shacl-af | sh:rule and custom targets for contractual derivations |
| Notation3 (N3) | w3c.github.io/N3/spec | The edge rule language; N3.js + eyereasoner in the browser and agent |
| R2RML | w3.org/TR/r2rml | Relational → RDF mapping (Ontop, materialized + virtual) |
| RML | rml.io/specs/rml | Generalized non-RDF → RDF mapping (JSON, XML, CSV) |
| PROV-O | w3.org/TR/prov-o | Provenance vocabulary — every triple lands with a prov:Activity and prov:wasDerivedFrom |
| SKOS | w3.org/TR/skos-reference | Concept schemes for controlled vocabularies (e.g. tags, taxonomies) |
| DCAT 3 | w3.org/TR/vocab-dcat-3 | Dataset catalog vocabulary — how dlt sources self-describe |
| OWL Time | w3.org/TR/owl-time | Temporal vocabulary (see Persistence & Time) |
| BFO 2020 (ISO/IEC 21838-2) | iso.org/standard/74572.html | Realist upper ontology — see ontology-foundations |
External non-W3C references used above:
- Apache Jena Fuseki — server documentation
- TDB2 — on-disk store
- Oxigraph — embedded Rust RDF store
- N3.js — RDF.js Notation3 parser
- eyereasoner / eye-js — JS port of the EYE reasoner
- ELK — OWL 2 EL reasoner
- HermiT — OWL 2 DL hypertableau reasoner
- Openllet — actively maintained Pellet fork
- Protégé — Stanford OWL editor
- ROBOT — OBO release tool
- Ontop — R2RML virtual SPARQL engine
- RMLMapper — RML execution engine
- dlt — data load tool
- React Flow — visualization (per ADR-040)
- dagre / elk.js — layout engines
Closing notes
This stack is intentionally boring at each layer. Fuseki has been ASF-stewarded since 2010. SPARQL 1.1 has been a Recommendation since 2013. SHACL since 2017. Oxigraph and N3.js are newer but live inside well-defined sandboxes where their failure modes are contained.
The interesting work is not in any single component — it's in how they compose. The generator-first F# compiler (§7.4) emits the OWL + SHACL + projections; the ingestion pipeline (§8) lands data through the SHACL gate; Fuseki (§2) serves the canonical graph; Oxigraph (§3) gives the sandbox a private endpoint; N3.js (§4) does edge inference; the reasoners (§5) materialize OWL entailments; the property-graph hybridization (§9) covers the operational surface. Each piece is replaceable in isolation — but the whole only works because each piece speaks the same W3C standards (§10).
That's the bet of the stack: invest in the standards, not the implementations. Any of these implementations could be swapped (Fuseki → GraphDB, Oxigraph → RDF4J embedded, N3.js → a server-side reasoner) without changing the contract. The standards are the contract.
See also: Ontology Foundations · Persistence & Time · Generative Pipeline · Intellectual Foundations (Bibliography)