
Ontology Foundations

Why untool.ai grounds its canonical model in two foundational ontologies (UFO and BFO), how the W3C semantic stack (RDF/RDFS/OWL 2/SHACL) is layered, and the trade-offs behind every reasoner, reification pattern, and methodology choice we make.

This page is the reference companion to the platform's ontology ADRs. It is written for technical reviewers (architects, ontologists, auditors) who want the trade-space behind the decisions, not just the decisions themselves. The ADRs say what we picked; this page says what else exists, why we didn't pick it, and where the literature stands today.

Where this fits


1. Why two foundational ontologies (UFO ⊕ BFO)

A foundational (or upper / top-level) ontology fixes the most general categories any domain ontology may inherit from — what kinds of things exist at all. Choosing one is a philosophical commitment, not a stylistic preference. Two traditions dominate the field, and they disagree at their roots:

| | UFO (Unified Foundational Ontology) | BFO (Basic Formal Ontology) |
| --- | --- | --- |
| Lineage | Guizzardi et al. (NEMO group, UFES) | Smith et al. (Buffalo / IFOMIS) |
| Telos | Descriptive: model how humans conceptualize a domain | Prescriptive realist: model what exists in mind-independent reality |
| Modeling language | OntoUML (UML profile); gUFO as a lightweight OWL 2 implementation | Native OWL 2 + Common Logic (CLIF) axiomatization |
| Standard | Active research programme, ~20 years of refereed publications | ISO/IEC 21838-2:2021 (formally standardized) |
| Killer feature | Rigidity / sortality / identity machinery: «kind», «role», «phase», «relator», «mode», «quality» | OBO Foundry alignment: used by hundreds of biomedical ontologies (Gene Ontology, ChEBI, …) and the U.S. DoD's Common Core Ontologies (CCO) |
| Best at | Conceptual modeling, requirements analysis, inter-stakeholder consensus | Scientific data integration, regulatory interoperability, evidence-grounded knowledge |

1.1 They are not interchangeable

The temptation — and our own original sin, captured in ARC-ADR-039 — is to treat the two as flat synonyms: tag every concept with both a bfoUpper and a gufoArchetype and assume someone, somewhere, will produce a clean mapping. The literature is unambiguous that this fails:

  • Opposing telos. UFO is descriptive (Guizzardi et al., Applied Ontology 2022); BFO is prescriptive realism ("beyond concepts" — Smith, FOIS 2004).
  • Constructs do not correspond. UFO's rigidity / sortality / identity machinery has no BFO primitive. UFO «role» (an anti-rigid sortal whose instances are objects) is not the same thing as BFO role (a realizable dependent continuant). A UFO relator has no native BFO kind.
  • No turnkey mapper. The Ontology Alignment Evaluation Initiative (OAEI) shows foundational-ontology alignment is an open, benchmarked problem. SUGOI — the most general interchange tool — covers DOLCE / BFO / GFO but not UFO, and clean equivalence captures roughly 36% of entities on average (as low as 2%). Even bfo:Continuant ≡ dolce:Endurant proves unsatisfiable on merge.

The 'pick one' temptation, rejected

A single foundational ontology is the cheapest path. We rejected it because:

  • Picking UFO only loses the regulatory / scientific interop story (no CCO, no OBO).
  • Picking BFO only loses the modeling machinery (no relators, no phases, no clean conceptual semantics for stakeholder workshops).
  • The platform is the system of record for both the design-time conceptual model and the runtime evidence graph. Each side has its native foundation.

1.2 Foundations as Perspectives — our resolution

Rather than asserting a false equivalence, the platform holds each foundation as a DSRP Perspective (Cabrera 2015) — a sovereign point of view with its own commitments. Per-concept commitments and divergences are recorded explicitly in a divergence registry. The full mechanics are in ARC-ADR-039; the relevant takeaway here:

  • UFO is the authoring perspective. Domain experts and modelers think in kinds, roles, phases, and relators. The model factory (ontology-pipeline.md) emits gUFO-aligned OWL from a model.yaml IR.
  • BFO is the realist-interop projection. When the same content has to slot into a CCO/OBO-aligned analytic pipeline — regulatory submissions, evidence chains, scientific datasets — we project to BFO with explicit bridge axioms, not a lookup table.
  • Divergences are first-class information, not failures. A UFO «role» and a BFO role are recorded as related but irreducible; the registry carries the modeling decision and the bridge axiom (if any).
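To make the registry idea concrete, here is a minimal sketch of what one entry and a guarded projection might look like. The class names, field names, relation labels, and the bridge-axiom wording are all hypothetical illustrations, not the schema defined in ARC-ADR-039.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Divergence:
    """One registry entry: what a concept commits to under each perspective."""
    concept: str
    ufo_commitment: str                 # e.g. an OntoUML stereotype
    bfo_commitment: Optional[str]       # None when BFO has no native counterpart
    relation: str                       # hypothetical labels: "related-irreducible", "unmapped"
    bridge_axiom: Optional[str] = None  # informal statement of the bridge, if any


class DivergenceRegistry:
    def __init__(self) -> None:
        self._entries: dict[str, Divergence] = {}

    def record(self, entry: Divergence) -> None:
        self._entries[entry.concept] = entry

    def project_to_bfo(self, concept: str) -> str:
        """Project only through an explicit bridge axiom, never a bare lookup."""
        entry = self._entries[concept]
        if entry.bfo_commitment is None or entry.bridge_axiom is None:
            raise ValueError(f"{concept}: no bridge axiom recorded, cannot project")
        return entry.bfo_commitment


registry = DivergenceRegistry()
registry.record(Divergence(
    concept="Employee",
    ufo_commitment="«role» (anti-rigid sortal; instances are objects)",
    bfo_commitment="role (realizable dependent continuant)",
    relation="related-irreducible",
    # Illustrative wording only; a real bridge axiom would be formal.
    bridge_axiom="x is a UFO Employee iff x bears some BFO employee role",
))
registry.record(Divergence(
    concept="Employment",
    ufo_commitment="«relator»",
    bfo_commitment=None,
    relation="unmapped",
))
```

In this sketch the projection fails loudly for `Employment`, which mirrors the point that a missing bridge is recorded information and not something to paper over with a default mapping.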

Why DSRP and not 'just' meta-modeling

DSRP (Distinctions, Systems, Relationships, Perspectives) supplies an empirically-grounded organizing grammar for holding multiple commitments side-by-side. The category-theoretic / institution-theoretic machinery (Spivak's functorial data migration; the DOL / Ontohub line) does the logical bridging. Grammar organizes; logic bridges. The two are complementary, not in competition.

1.3 Further reading on the foundations

  • Guizzardi, G. (2005). Ontological Foundations for Structural Conceptual Models. PhD thesis, U. Twente — the canonical UFO reference.
  • Guizzardi, G. et al. (2022). UFO: Unified Foundational Ontology. Applied Ontology 17(1).
  • Smith, B. (2004). Beyond Concepts: Ontology as Reality Representation. In FOIS 2004.
  • Arp, R., Smith, B., Spear, A. D. (2015). Building Ontologies with Basic Formal Ontology. MIT Press.
  • ISO/IEC 21838-2:2021 — Information technology — Top-level ontologies (TLO) — Part 2: Basic Formal Ontology (BFO). iso.org/standard/74572.html.

2. The W3C semantic stack — what each layer actually does

untool.ai's external semantic surface is W3C-standard end-to-end. Concretely:

                ┌────────────────────────────────────────────┐
                │           SHACL  (constraints / A-Box gate)│
                ├────────────────────────────────────────────┤
                │           OWL 2  (T-Box / classification)  │
                ├────────────────────────────────────────────┤
                │           RDFS   (subClassOf, domain/range)│
                ├────────────────────────────────────────────┤
                │           RDF    (s-p-o triples / IRIs)    │
                └────────────────────────────────────────────┘

2.1 RDF — the wire format

RDF 1.1 gives us named subjects, predicates, and objects addressable by IRI, plus literal values typed by XSD datatypes. Everything else in the stack is layered semantics on top of triples. RDF is intentionally minimal: no schema, no constraints, no inference.

2.2 RDFS — the schema layer

RDF Schema adds the lightweight vocabulary you need to declare classes and properties: rdfs:Class, rdfs:subClassOf, rdfs:domain, rdfs:range, rdfs:subPropertyOf. RDFS entailment is decidable, and computable in polynomial time for graphs without blank nodes. RDFS cannot express disjointness, cardinality, or property characteristics; those need OWL.
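As a rough illustration of what RDFS entailment adds on top of bare triples, the following sketch applies four of the standard RDFS rules (rdfs2, rdfs3, rdfs9, rdfs11) to a set of string tuples until nothing new appears. It uses plain Python and no RDF library; the `ex:` names are made up, and the sketch assumes no literals or blank nodes occur in the data.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply rdfs2, rdfs3, rdfs9 and rdfs11 until a fixpoint is reached."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p2 == DOMAIN and s2 == p:                      # rdfs2
                    new.add((s, RDF_TYPE, o2))
                if p2 == RANGE and s2 == p:                       # rdfs3
                    new.add((o, RDF_TYPE, o2))
                if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:  # rdfs9
                    new.add((s, RDF_TYPE, o2))
                if p == SUBCLASS and p2 == SUBCLASS and s2 == o:  # rdfs11
                    new.add((s, SUBCLASS, o2))
        if new <= graph:
            return graph
        graph |= new


data = {
    ("ex:worksFor", DOMAIN, "ex:Employee"),
    ("ex:Employee", SUBCLASS, "ex:Person"),
    ("ex:maya", "ex:worksFor", "ex:lab"),
}
inferred = rdfs_closure(data)
```

After the closure, `inferred` contains both `("ex:maya", "rdf:type", "ex:Employee")` (from the domain axiom) and `("ex:maya", "rdf:type", "ex:Person")` (from the subclass axiom). The nested loop is quadratic per pass, which is fine for a sketch but not how a production reasoner would index its rules.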

2.3 OWL 2 — the T-Box

OWL 2 Web Ontology Language gives us description-logic semantics: equivalence, disjointness, property restrictions, cardinality, transitivity, inverse, functional/inverse-functional, hasKey, and the rest. The standard defines three profiles (EL, QL, RL) that trade expressivity for tractability; full OWL 2 DL is listed alongside them for comparison:

| Language | DL fragment | Reasoning cost | Used for |
| --- | --- | --- | --- |
| OWL 2 EL | EL++ | PTime | Very large biomedical terminologies (SNOMED CT, Gene Ontology). |
| OWL 2 QL | DL-Lite | AC⁰ in data complexity | Query-rewriting over relational data; ontology-based data access. |
| OWL 2 RL | Rule-expressible subset | PTime; implementable as forward-chaining rules | Pragmatic enterprise reasoning; our seed reasoner profile. |
| OWL 2 DL (not a profile) | SROIQ(D) | N2ExpTime worst case (HermiT, Pellet) | Full classification, used when we need it (e.g. relator-range checks). |

Our pick. The seed reasoner is rdflib + owlrl (OWL 2 RL, forward-chaining, pure Python, no Java) — see ARC-ADR-019. We escalate to OWL 2 DL via HermiT / Openllet when a domain T-Box demands it (relator ranges, complex disjointness chains).

Why RL, not DL, by default

Most enterprise reasoning is not hard DL; it's "if X works for Y, infer X is a Person." OWL 2 RL captures that as forward-chaining rules, runs in polynomial time, and degrades gracefully (a missed inference is a missed inference, not an unbounded search). Full OWL 2 DL exists for the cases where you genuinely need it; defaulting to it exposes you to its N2ExpTime worst case on data that never needed it.
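The forward-chaining style can be sketched in a few lines. The block below implements only two OWL 2 RL rule families, prp-trp (transitive properties) and prp-inv1 / prp-inv2 (inverse properties), over plain string tuples. It is an illustration of the technique and not how owlrl is implemented; the `ex:` identifiers are invented for the example.

```python
TYPE = "rdf:type"


def rl_materialize(triples):
    """Forward-chain prp-trp and prp-inv1/prp-inv2 until no new triple appears."""
    graph = set(triples)
    while True:
        transitive = {s for s, p, o in graph
                      if p == TYPE and o == "owl:TransitiveProperty"}
        inverses = {(s, o) for s, p, o in graph if p == "owl:inverseOf"}
        new = set()
        for s, p, o in graph:
            if p in transitive:                      # prp-trp
                new |= {(s, p, o2) for s2, p2, o2 in graph
                        if p2 == p and s2 == o}
            for q1, q2 in inverses:                  # prp-inv1 / prp-inv2
                if p == q1:
                    new.add((o, q2, s))
                if p == q2:
                    new.add((o, q1, s))
        if new <= graph:
            return graph
        graph |= new


facts = {
    ("ex:partOf", TYPE, "owl:TransitiveProperty"),
    ("ex:hasPart", "owl:inverseOf", "ex:partOf"),
    ("ex:well-A1", "ex:partOf", "ex:plate-7"),
    ("ex:plate-7", "ex:partOf", "ex:batch-2"),
}
closure = rl_materialize(facts)
```

Each pass can only add triples built from terms already in the graph, so the loop must terminate. That bounded behaviour is the "degrades gracefully" property described above: a rule that is not implemented simply contributes nothing.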

2.4 SHACL — the constraint layer

The T-Box says what can exist. SHACL (Shapes Constraint Language) says what must hold for any A-Box (instance data) before you'll accept it. The two are doing different jobs:

| | OWL 2 | SHACL |
| --- | --- | --- |
| Semantics | Open-world, monotonic | Closed-world, non-monotonic |
| Question | "What can I infer?" | "Does this data conform?" |
| Failure mode | Inconsistency (a logical impossibility) | Validation report (a list of violations) |
| When | T-Box authoring; classification | A-Box gating; ingestion; API contracts |

We use both. OWL 2 gives us classification and inference; SHACL gives us gates on incoming data — the relator vertex must bind ≥ 2 role-binding edges; a valid_from must precede valid_to; a relator's participants must satisfy its declared role types. The model factory emits SHACL shapes from the same model.yaml IR that emits OWL.
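The three gates named above can be mimicked in plain Python to show the shape of a closed-world check. This is not SHACL and not the shapes the model factory emits; the dictionary layout and key names are assumptions made for the example.

```python
from datetime import date


def validate_relator(relator: dict) -> list[str]:
    """Return violation messages; an empty list means the record conforms."""
    violations = []
    bindings = relator.get("bindings", [])        # (role, participant_type) pairs
    if len(bindings) < 2:
        violations.append("relator must bind at least 2 role-binding edges")
    valid_from, valid_to = relator.get("valid_from"), relator.get("valid_to")
    if valid_from is not None and valid_to is not None and not valid_from < valid_to:
        violations.append("valid_from must precede valid_to")
    declared = relator.get("declared_roles", {})  # role -> required participant type
    for role, participant_type in bindings:
        if declared.get(role) != participant_type:
            violations.append(
                f"{role}: expected {declared.get(role)}, got {participant_type}"
            )
    return violations


good = {
    "declared_roles": {"sampleRole": "Sample", "analystRole": "Analyst"},
    "bindings": [("sampleRole", "Sample"), ("analystRole", "Analyst")],
    "valid_from": date(2026, 4, 12),
    "valid_to": date(2026, 4, 19),
}
bad = {
    "declared_roles": {"sampleRole": "Sample", "analystRole": "Analyst"},
    "bindings": [("analystRole", "Sample")],
    "valid_from": date(2026, 4, 19),
    "valid_to": date(2026, 4, 12),
}
```

`validate_relator(good)` returns an empty list, while `validate_relator(bad)` returns three messages, one per failed gate. Returning a list instead of raising on the first failure follows the SHACL habit of producing a full validation report.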

2.5 SPARQL, RDF-star, and the gaps

  • SPARQL 1.1 (W3C Recommendation) is our standard query surface for the realist-interop projection. Internally we use openCypher over ArcadeDB for the native LPG path.
  • RDF-star (now RDF 1.2 Working Draft) addresses statement-level metadata cleanly — a triple as a subject. We track it but do not use it as the canonical model: our reified relators carry metadata as proper vertex properties (and project to RDF-star or classical n-ary patterns depending on consumer).

3. N-ary relations, reification, and hyperedges

3.1 The modeling problem

RDF is binary: every statement is (subject, predicate, object). Real-world facts are not. "On 2026-04-12, the lab analyst Maya certified, against protocol P-17 and with 0.92 confidence, that sample S-201 belongs to lineage L-3." That single semantic unit ties four participants together (analyst, protocol, sample, lineage), carries metadata about the relationship as a whole (the date and the confidence), and must be referenceable by further statements ("the certification was retracted on 2026-04-19"). A naive binary projection, (S-201, certifiedAs, L-3) plus a sidecar of (S-201, certifiedBy, Maya) and (S-201, certifiedOn, 2026-04-12), loses the unit: the analyst, the protocol, and the confidence are no longer tied to each other, only to the sample.

3.2 W3C patterns

The W3C Working Group Note Defining N-ary Relations on the Semantic Web describes the first two patterns below and contrasts them with RDF reification, listed third:

  1. Class for the relation + binary properties for its arguments. The relation becomes a node with one property per participant. This is the standard idiom and what we implement.
  2. Lists / ordered sequences. Acceptable when arity is intrinsic to the predicate (a recipe's ingredient list); poor for general n-ary relations.
  3. RDF reification (deprecated style). Quoting the triple. Verbose, conceptually fuzzy; superseded in practice by Pattern 1 and (newer) RDF-star.
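A small experiment shows why Pattern 1 matters. The sketch below stores two certifications of the same sample, first as naive binary triples and then with one node per certification. The second certification (analyst Ravi, protocol P-18, lineage L-4) and all predicate names are invented for the illustration.

```python
# Naive binary projection: everything hangs off the sample.
naive = {
    ("S-201", "certifiedAs", "L-3"),
    ("S-201", "certifiedBy", "Maya"),
    ("S-201", "certifiedUnder", "P-17"),
    ("S-201", "certifiedAs", "L-4"),
    ("S-201", "certifiedBy", "Ravi"),
    ("S-201", "certifiedUnder", "P-18"),
}

# Pattern 1: the relation is a node with one property per participant.
pattern1 = {
    ("cert-1", "rdf:type", "Certification"),
    ("cert-1", "sample", "S-201"),
    ("cert-1", "lineage", "L-3"),
    ("cert-1", "analyst", "Maya"),
    ("cert-1", "protocol", "P-17"),
    ("cert-2", "rdf:type", "Certification"),
    ("cert-2", "sample", "S-201"),
    ("cert-2", "lineage", "L-4"),
    ("cert-2", "analyst", "Ravi"),
    ("cert-2", "protocol", "P-18"),
}


def naive_protocols(graph, sample):
    """All protocols attached to the sample; the analyst link is lost."""
    return {o for s, p, o in graph if s == sample and p == "certifiedUnder"}


def protocols_used_by(graph, analyst):
    """Protocols of exactly those certifications the analyst took part in."""
    certs = {s for s, p, o in graph if p == "analyst" and o == analyst}
    return {o for s, p, o in graph if s in certs and p == "protocol"}
```

With the naive graph, asking which protocol Maya used can only return both `P-17` and `P-18`. With the Pattern 1 graph, `protocols_used_by(pattern1, "Maya")` narrows to `P-17` because the certification node keeps the participants together.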

3.3 UFO relators as the conceptual primitive

In UFO, a relator is a mediating endurant that connects participants through typed roles. An Employment relator mediates employer and employee roles; a Marriage relator mediates two spouse roles. Relators are not predicates with extra fields tacked on — they are first-class entities with identity, lifecycle, and the capacity to be participated in by other relations.

This conceptual move is exactly what Pattern 1 above asks for on the semantic-web side. The W3C pattern says "make the relation a class"; UFO says "yes, and here is the ontology of what that class is." They compose.

3.4 Our materialization — relator-vertex + typed role-binding

Concretely, in the canonical LPG store (ArcadeDB):

                 ┌──────────────────┐
                 │ Sample S-201     │
                 └────────▲─────────┘
                 sampleRole│
                          │
┌──────────────┐  evidenceRole  ┌─────────────────────────────┐  protocolRole  ┌────────────────┐
│ Reading R-3 ◀───────────────  Certification (relator vertex) ───────────────▶ Protocol P-17  │
└──────────────┘                ├─────────────────────────────┤                └────────────────┘
                                │ valid_from / valid_to        │
                                │ recorded_at / superseded_at  │      analystRole
                                │ confidence: 0.92             │            │
                                │ prov:wasDerivedFrom: …       │            ▼
                                └─────────────────────────────┘    ┌──────────────┐
                                                                   │ Analyst Maya │
                                                                   └──────────────┘
  • The relator vertex carries all whole-relation metadata (bitemporal stamps, PROV, confidence).
  • Role-binding edges are typed (sampleRole, protocolRole, analystRole, evidenceRole) — never untyped pointers.
  • A binary relationship is a degenerate 2-role relator — existing binary edges remain valid (additive, not breaking; ARC-ADR-016 D5).
  • Bitemporal + PROV live on the relator, never on participants (audit-safe; no in-place participant edits).
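As a data-structure sketch of the diagram and bullets above, the relator vertex and its typed role bindings could be modeled like this. The class and field names are illustrative and are not the ArcadeDB schema; `sourceRole` and `targetRole` are hypothetical names for the degenerate binary case.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RoleBinding:
    role: str          # typed edge label, e.g. "sampleRole"
    participant: str   # id of the participant vertex


@dataclass
class RelatorVertex:
    kind: str
    bindings: list[RoleBinding]
    # Whole-relation metadata lives here, never on the participants.
    valid_from: Optional[date] = None
    valid_to: Optional[date] = None
    confidence: Optional[float] = None
    provenance: dict = field(default_factory=dict)

    def participants(self, role: str) -> list[str]:
        return [b.participant for b in self.bindings if b.role == role]


def from_binary_edge(source: str, label: str, target: str) -> RelatorVertex:
    """View a plain binary edge as a degenerate 2-role relator."""
    return RelatorVertex(
        label,
        [RoleBinding("sourceRole", source), RoleBinding("targetRole", target)],
    )


certification = RelatorVertex(
    kind="Certification",
    bindings=[
        RoleBinding("sampleRole", "S-201"),
        RoleBinding("protocolRole", "P-17"),
        RoleBinding("analystRole", "Maya"),
        RoleBinding("evidenceRole", "R-3"),
    ],
    valid_from=date(2026, 4, 12),
    confidence=0.92,
)
```

Here `certification.participants("analystRole")` yields `["Maya"]`, and `from_binary_edge` shows how an existing binary edge fits the same shape without any change to the participant vertices.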

3.5 The "reify judiciously" rule

Reification is not free: every reified relation adds a vertex and n edges to the graph. The rule (D8 in ARC-ADR-016):

Reify only material relations with identity / lifecycle, > 2 participants, whole-relation metadata, or onward participation. Never reify formal relations (parthood, subset, identity).

Operationally, the model factory rejects reified hyperedges with fewer than 2 roles (the RelOver anti-pattern guard from the OntoUML literature) and warns on ORM ≥(n-1) uniqueness smells.
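The rule can be read as a small decision function. The sketch below encodes one reading of it (a relation must be material and meet at least one of the four triggers); the `RelationSpec` type and its field names are hypothetical and do not come from ARC-ADR-016.

```python
from dataclasses import dataclass

# Formal relations named in the rule; these are never reified.
FORMAL_RELATIONS = {"parthood", "subset", "identity"}


@dataclass
class RelationSpec:
    name: str
    participants: int
    has_identity_or_lifecycle: bool = False
    has_whole_relation_metadata: bool = False
    participates_onward: bool = False


def should_reify(rel: RelationSpec) -> bool:
    """Reify a material relation only if at least one trigger applies."""
    if rel.name in FORMAL_RELATIONS:
        return False
    return (
        rel.has_identity_or_lifecycle
        or rel.participants > 2
        or rel.has_whole_relation_metadata
        or rel.participates_onward
    )
```

Under this reading, a four-participant certification is reified, a plain two-participant edge with no metadata stays binary, and parthood is never reified even if someone attaches metadata to it.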

RDF-star alternative

RDF-star quoted triples are a viable lightweight reification idiom for statement-level metadata. They are not a replacement for relators — a quoted triple has no identity-bearing class, no role typing, no lifecycle. We project to RDF-star when consumers want it; we do not author in it.


4. Reasoners surveyed

The reasoner is the engine that turns a T-Box (OWL) + an A-Box (RDF graph) into inferred facts and classifications. The market has consolidated around a small number of mature implementations, each with a sweet spot.

| Reasoner | Profile | Implementation | Sweet spot | Caveats |
| --- | --- | --- | --- | --- |
| HermiT | OWL 2 DL (SROIQ(D)) | Java; hypertableau | Reference DL reasoner; full classification; canonical for academic benchmarks | Java runtime; memory-hungry on large A-Boxes |
| ELK | OWL 2 EL | Java | Massive EL terminologies (SNOMED CT, GO); concurrent; very fast classification | EL profile only: no inverse, no nominals |
| Pellet / Openllet | OWL 2 DL + SWRL | Java (Openllet is the maintained fork) | DL + rule reasoning; explanation services; SPARQL-DL | Maintenance has waxed/waned; HermiT is more active for pure DL |
| owlrl (rdflib) | OWL 2 RL | Pure Python; forward-chaining over rdflib | Pragmatic enterprise inference, no JVM, embeddable; our seed reasoner | RL profile only: misses some DL inferences |
| Oxigraph | RDF + SPARQL 1.1 (no DL) | Rust; embedded or HTTP | High-performance RDF/SPARQL store + endpoint; great as a side-store | Not a reasoner; pair with owlrl or external classifier |
| N3.js | N3 / Notation3 rules | TypeScript / JavaScript | In-browser rule-based reasoning; eye-style proof; lightweight derived views | Not a DL classifier; rule discipline required |
| RDFox | Datalog + OWL 2 RL+ | Commercial C++ | Highest-throughput materialization; incremental maintenance | Commercial licence; heavyweight infra |
| GraphDB | OWL 2 RL/QL (+ proprietary profiles) | Commercial Java | Enterprise triple store with reasoning baked in | Vendor lock-in; opt-in only if data outgrows in-process reasoning |
| Z3 | SMT (beyond DL) | C++ with bindings | The BFO Common-Logic axioms that fall outside OWL 2 DL | Not a triple-store reasoner; bolted on for specific BFO checks |

4.1 Our picks, by surface

  • Authoring loop (model factory, ARC-ADR-019, ARC-ADR-032) — rdflib + owlrl for OWL 2 RL forward-chaining; HermiT when full DL classification is required for a profile.
  • Self-model runtime store (ARC-ADR-072) — Oxigraph for the embedded RDF + SPARQL surface; N3.js for in-browser rule-based derived views over the self-model graph (e.g. live capability-→-surface inferences).
  • Graph visualization (ARC-ADR-040) — React Flow for the interactive editor surface; not a reasoner but the layer that consumes materialized inferences.
  • BFO interop projection — owlrl for the OWL 2 RL fragment; Z3 for the beyond-DL Common-Logic axioms (ISO/IEC 21838-2 ships both an OWL 2 and a CLIF axiomatization; the CLIF version is strictly more expressive).

Why we don't standardize on one reasoner

Reasoners have profiles, and a profile is a contract. Locking the platform to a single reasoner couples us to a single OWL 2 fragment. The architecture instead exposes a ReasonerCapable UDA capability (ARC-ADR-019, D2) — the reasoner runtime is pluggable behind it. Today owlrl; tomorrow ELK; the day BFO needs CLIF, Z3 slots in alongside.
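A pluggable capability of this kind is commonly expressed as a structural interface. The sketch below borrows the `ReasonerCapable` name from the text, but the `profile` attribute, the `materialize` method, and both toy implementations are assumptions; the real capability contract is defined in ARC-ADR-019.

```python
from typing import Iterable, Protocol

Triple = tuple[str, str, str]


class ReasonerCapable(Protocol):
    """Assumed shape of the capability: a profile label plus one method."""
    profile: str

    def materialize(self, triples: Iterable[Triple]) -> set[Triple]: ...


class NullReasoner:
    profile = "none"

    def materialize(self, triples: Iterable[Triple]) -> set[Triple]:
        return set(triples)


class SubclassReasoner:
    """Toy reasoner: propagates rdf:type along rdfs:subClassOf only."""
    profile = "toy-subclass"

    def materialize(self, triples: Iterable[Triple]) -> set[Triple]:
        graph = set(triples)
        while True:
            new = {
                (s, "rdf:type", sup)
                for s, p, cls in graph if p == "rdf:type"
                for sub, q, sup in graph if q == "rdfs:subClassOf" and sub == cls
            }
            if new <= graph:
                return graph
            graph |= new


def classify(reasoner: ReasonerCapable, triples: Iterable[Triple]) -> set[Triple]:
    """Callers depend on the capability, not on a concrete reasoner."""
    return reasoner.materialize(triples)


abox = {
    ("ex:maya", "rdf:type", "ex:Analyst"),
    ("ex:Analyst", "rdfs:subClassOf", "ex:Person"),
}
```

Swapping `NullReasoner()` for `SubclassReasoner()` in a call to `classify` changes which inferences appear without touching the caller, which is the decoupling the paragraph above argues for.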


5. Competency questions — ontology acceptance tests

A foundational ontology can be elegant and still wrong for your domain. The discipline that catches that gap is the competency question (CQ) methodology, introduced by Grüninger & Fox (1995) and operationalized as part of the NeOn methodology (Suárez-Figueroa et al., 2012).

The shape:

  1. Elicit the questions the ontology must be able to answer in business / domain language.

    "Given a sample, what is the full evidence chain (analyst, protocol, instrument, reading) that supports its current lineage classification?"

  2. Formalize each CQ as a SPARQL query (or, for inference-bearing questions, SPARQL over the materialized A-Box after reasoning).
  3. Test. A CQ fails when (a) the query doesn't parse against the T-Box (the vocabulary is missing) or (b) it parses but returns nothing on a canonical scenario (the A-Box can't represent the answer).
  4. Iterate. A failed CQ is a defect on the ontology, not on the data.

We treat CQs as first-class acceptance tests on the model factory. Every ADR that introduces or changes a relator, role, or property must come with at least one CQ and a scenario that exercises it. The sift/sort authoring loop (ARC-ADR-032) explicitly composes CQ runs into the iteration.
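The two failure modes in step 3 can be captured in a tiny test harness. The sketch below uses a Python function over string tuples in place of a SPARQL query, purely so that it runs without an RDF library; the function names, status labels, and predicate names are all hypothetical.

```python
def run_cq(tbox_predicates, scenario, required, query):
    """Classify a CQ run as 'fail-vocabulary', 'fail-empty', or 'pass'."""
    missing = sorted(set(required) - set(tbox_predicates))
    if missing:
        # Failure mode (a): the question cannot even be phrased.
        return "fail-vocabulary", missing
    rows = query(scenario)
    # Failure mode (b): phrased, but the canonical scenario yields nothing.
    return ("pass", rows) if rows else ("fail-empty", [])


def evidence_chain(sample):
    """CQ: which analyst and protocol support the certification of a sample?"""
    def query(graph):
        certs = sorted(s for s, p, o in graph if p == "sampleRole" and o == sample)
        rows = []
        for cert in certs:
            analysts = sorted(o for s, p, o in graph
                              if s == cert and p == "analystRole")
            protocols = sorted(o for s, p, o in graph
                               if s == cert and p == "protocolRole")
            rows += [(a, pr) for a in analysts for pr in protocols]
        return rows
    return query


tbox = {"sampleRole", "analystRole", "protocolRole"}
scenario = {
    ("cert-1", "sampleRole", "S-201"),
    ("cert-1", "analystRole", "Maya"),
    ("cert-1", "protocolRole", "P-17"),
}
needed = ["sampleRole", "analystRole", "protocolRole"]
```

Running `run_cq(tbox, scenario, needed, evidence_chain("S-201"))` passes with one row; removing `protocolRole` from `tbox` produces a vocabulary failure, and an empty scenario produces an empty-result failure. Both failures are reported against the ontology, in line with step 4.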

5.1 Methodology references

  • Grüninger, M., Fox, M. S. (1995). Methodology for the design and evaluation of ontologies. IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing.
  • Suárez-Figueroa, M. C., Gómez-Pérez, A., Motta, E., Gangemi, A. (eds.) (2012). Ontology Engineering in a Networked World (the NeOn book). Springer.
  • Bezerra, C., Freitas, F. (2017). Verifying Ontologies with Competency Questions. WI 2017.

6. OBO Foundry principles and the CCO/BFO interop projection

The OBO Foundry is the de facto governance body for biomedical ontologies built on BFO. Its principles — orthogonality, openness, common format, URI naming convention, versioning, documentation, scope, formalization — are the interop contract that makes hundreds of independent ontologies (GO, ChEBI, Uberon, Mondo, …) compose without conflict.

Three principles are load-bearing for our realist-interop projection:

  • Orthogonality (historically its own principle, now covered under Scope, FP-005): ontologies cover non-overlapping content; cross-ontology relations use shared Relation Ontology (RO) predicates.
  • Naming convention (FP-003 / FP-012): IRIs follow the http://purl.obolibrary.org/obo/<PREFIX>_<NUMERIC_ID> pattern; labels are domain-controlled.
  • Open licence (FP-001): Creative Commons attribution-style licences; we keep ours compatible.
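A projection step can check the IRI pattern mechanically. The sketch below is a minimal validator for the `<PREFIX>_<NUMERIC_ID>` shape quoted above; the exact characters allowed in a prefix are an assumption here, so treat the regular expression as a starting point.

```python
import re
from typing import Optional

# Assumption: a prefix is a letter followed by letters or digits.
OBO_IRI = re.compile(
    r"^http://purl\.obolibrary\.org/obo/([A-Za-z][A-Za-z0-9]*)_(\d+)$"
)


def parse_obo_iri(iri: str) -> Optional[tuple[str, str]]:
    """Return (prefix, numeric_id) for a well-formed OBO IRI, else None."""
    match = OBO_IRI.match(iri)
    return (match.group(1), match.group(2)) if match else None
```

For example, `parse_obo_iri("http://purl.obolibrary.org/obo/GO_0008150")` splits into the `GO` prefix and the identifier `0008150`, while an IRI from another namespace or with a non-numeric local part returns `None`. The identifier is kept as a string so that leading zeros survive.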

6.1 The Common Core Ontologies (CCO)

The U.S. DoD's Common Core Ontologies are a BFO-extension suite covering agents, events, information, artifacts, time, geospatial regions, and quality. They form the mid-level layer between BFO's ~40 categories and any domain ontology. The IKW-GraphEngine BFO/CCO interop projection (see ARC-ADR-039 §1 and ARC-ADR-043) aligns our canonical model to these layers when consumers need it: domain → CCO → BFO.

Why CCO over plain BFO

BFO alone is too abstract for most analytics. CCO supplies the named middle-ground categories (e.g. Act of Information Transfer, Designative Information Content Entity) that a downstream consumer can actually query against. Targeting CCO + BFO is what "BFO-aligned" means in practice for any non-toy interop.


7. Anti-patterns we explicitly avoid

A surprising amount of ontology engineering is negative — knowing the failure modes the literature has already cataloged and refusing to recreate them. We track the following anti-patterns explicitly:

7.1 OntoUML anti-patterns (Guizzardi / Sales catalogue)

Sales & Guizzardi catalog dozens of recurring OntoUML modeling smells. The platform's model-factory validation enforces guards against the high-frequency ones:

  • RelOver — a relator with fewer than two role bindings (a relator that mediates nothing). Rejected at IR validation.
  • RWOR (Relator Without Roles) — a relator whose participants are declared without typed role attachments.
  • AssRel — a binary relationship where the modeler should have introduced a relator but didn't. Flagged by the sift loop (ARC-ADR-032).
  • WSRT (Wrong Sortal Relation Target) — a sortal role whose target type fails the rigidity check.
  • DepPhase — a phase whose specialization condition contradicts its supertype's identity criterion.

7.2 Naive RDF binary modeling

The classic mistake is "promote every relation to a binary predicate." It fails for n-ary relations (§3.1) and produces graphs where temporal and provenance metadata are scattered across participant vertices. Our rule, restated: n-ary, metadata-bearing, or referenceable relations are relator-reified; binary edges remain binary only when they carry no whole-relation metadata.

7.3 Conflating taxonomy with ontology

A taxonomy is a hierarchy of terms organized by is-a. An ontology is a logical theory of a domain — classes, properties, axioms, constraints, inference. A taxonomy lifted into RDFS is not an ontology even if the file extension is .owl. See the taxonomy → ontology gradient in glossary.md for our controlled vocabulary on this.

| | Taxonomy | Controlled vocabulary | Thesaurus (SKOS) | Ontology |
| --- | --- | --- | --- | --- |
| Defines terms? | Yes | Yes | Yes | Yes |
| Hierarchical? | Yes | Optional | Yes (broader / narrower) | Yes |
| Logical axioms? | No | No | No | Yes |
| Inference? | No | No | No | Yes |

The agent roster reflects this: taxonomist → ontologist-generalist → ontologist-ufo / ontologist-bfo → knowledge-engineer is a formality gradient, not a synonym list.

7.4 The "single foundation will fit everything" assumption

The assumption that one foundational ontology will serve both descriptive-conceptual and prescriptive-realist needs is itself an anti-pattern (§1.1, ARC-ADR-039). The platform's Perspectives-plus-divergence-registry pattern exists precisely to refuse it.

7.5 Mistaking SHACL for OWL (and vice versa)

Using OWL where SHACL is the right tool produces inferences nobody asked for ("the engine added a triple because it was logically implied — wait, that's now in our compliance report"). Using SHACL where OWL is the right tool produces validation reports that miss everything the data implies. The split (§2.4) is non-negotiable: OWL for the T-Box and inference; SHACL for A-Box gating.
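The difference is easiest to see when the same statement is read both ways. In the sketch below, "values of ex:analyst are ex:Analyst" is first treated as an open-world range axiom and then as a closed-world class constraint. The functions and `ex:` names are illustrative stand-ins for an OWL reasoner and a SHACL validator.

```python
TYPE = "rdf:type"

# The analyst is referenced but never explicitly typed.
data = {("ex:cert-1", "ex:analyst", "ex:maya")}


def infer_range(graph, prop, cls):
    """Open-world reading: a range axiom licenses new type triples."""
    return graph | {(o, TYPE, cls) for s, p, o in graph if p == prop}


def check_class(graph, prop, cls):
    """Closed-world reading: report values not already typed as cls."""
    return [
        f"{o} is not asserted to be a {cls}"
        for s, p, o in sorted(graph)
        if p == prop and (o, TYPE, cls) not in graph
    ]


as_inference = infer_range(data, "ex:analyst", "ex:Analyst")
as_validation = check_class(data, "ex:analyst", "ex:Analyst")
after_inference = check_class(as_inference, "ex:analyst", "ex:Analyst")
```

Read as an axiom, the statement silently adds `("ex:maya", "rdf:type", "ex:Analyst")`. Read as a constraint, it reports that the type is missing. Validating only after inference (`after_inference`) reports nothing, which is how an ingestion gap can disappear when the two roles are mixed up.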


8. The compiler core and the generative pipeline — where this all lands

The foundations don't sit in a .owl file as documentation. They drive a generator-first pipeline that emits the entire downstream stack from one model.yaml IR:

  • F# compiler core (ARC-ADR-033) — algebraic data types and discriminated unions match the categorical structure (kinds as sums; relators as records of roles); the pipeline projects as a category-theoretic functor (cf. Spivak, Functorial Data Migration).
  • Sift / sort authoring loop (ARC-ADR-032) — iterative validation: competency-question runs, anti-pattern guards, divergence-registry reconciliation, holographic-frontier of unconverged candidates.
  • Generative pipeline (ARC-ADR-043) — one model emits OWL (T-Box), SHACL (gates), RDF fixtures, LinkML, C# / TypeScript contract types, hyperedge enforcement APIs, model summaries, and the realist-interop CCO/BFO projection.
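The "one model, many artifacts" idea can be sketched as two emitters that read the same in-memory IR. The sketch is in Python, not the F# of the compiler core, and the IR layout, the `ex:` namespace, and the emitted axioms are simplified assumptions; the real `model.yaml` schema and gUFO alignment are not reproduced here.

```python
# Hypothetical IR: one relator with two typed roles.
model = {
    "relators": [
        {"name": "Certification", "roles": ["sampleRole", "analystRole"]},
    ]
}

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix ex: <http://example.org/> .\n"
)


def emit_owl(model: dict) -> str:
    """Emit Turtle declaring each relator as a class and each role as a property."""
    lines = [PREFIXES]
    for relator in model["relators"]:
        lines.append(f"ex:{relator['name']} a owl:Class .")
        for role in relator["roles"]:
            lines.append(f"ex:{role} a owl:ObjectProperty .")
    return "\n".join(lines)


def emit_shacl(model: dict) -> str:
    """Emit a node shape requiring every declared role to be bound at least once."""
    lines = [PREFIXES]
    for relator in model["relators"]:
        lines.append(f"ex:{relator['name']}Shape a sh:NodeShape ;")
        lines.append(f"    sh:targetClass ex:{relator['name']} ;")
        props = [
            f"    sh:property [ sh:path ex:{role} ; sh:minCount 1 ]"
            for role in relator["roles"]
        ]
        lines.append(" ;\n".join(props) + " .")
    return "\n".join(lines)
```

Because both emitters read the same `model` value, adding a role to the IR changes the T-Box declaration and the A-Box gate together, which is the drift-avoidance argument for a generator-first pipeline.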

The narrative version of the pipeline lives in ontology-pipeline.md; the visualization layer that consumes the materialized graph is in ARC-ADR-040; the runtime self-model store in ARC-ADR-072.


9. External references and further reading

Standards

Foundational ontologies

Reasoners and stores

Books and key papers

  • Allemang, D., Hendler, J., Gandon, F. (2020). Semantic Web for the Working Ontologist, 3rd ed. ACM Books. — the field's standard pragmatic reference.
  • Guizzardi, G. (2005). Ontological Foundations for Structural Conceptual Models. PhD thesis, U. Twente.
  • Guizzardi, G., Wagner, G., Almeida, J. P. A., Guizzardi, R. S. S. (2022). UFO: Unified Foundational Ontology. Applied Ontology 17(1): 167–210.
  • Arp, R., Smith, B., Spear, A. D. (2015). Building Ontologies with Basic Formal Ontology. MIT Press.
  • Smith, B. (2004). Beyond Concepts: Ontology as Reality Representation. In FOIS 2004.
  • Guizzardi, G., Sales, T. P. (2014). Detection, Simulation and Elimination of Semantic Anti-patterns in Ontology-Driven Conceptual Models. In ER 2014.
  • Grüninger, M., Fox, M. S. (1995). Methodology for the design and evaluation of ontologies. IJCAI-95 Workshop on Basic Ontological Issues in Knowledge Sharing.
  • Suárez-Figueroa, M. C. et al. (eds.) (2012). Ontology Engineering in a Networked World. Springer (the NeOn book).
  • Cabrera, D., Cabrera, L. (2015). Systems Thinking Made Simple — the canonical DSRP reference.
  • Spivak, D. I. (2014). Category Theory for the Sciences. MIT Press — functorial data migration grounding.

See also