ARC-ADR-073 — The Omnological Project, Implemented: Continuous Published-Research Ingestion as a First-Class Platform Capability

Field Value
ID ARC-ADR-073
Status Proposed
Date 2026-06-06
Deciders Hub owner (Nicky Clarke)
Supersedes
Superseded by
Tags research-ingestion, omnology, omnologist, consilience, transdisciplinarity, ontology, self-building, sift-loop, science-domains, humanities, classics, knowledge-pipeline, howard-bloom, dsrp, cabrera, clarke-2024, education, relational-learning, category-theory

A note on the moment, and on the lineage. This ADR is recognized at authoring time as a strategic inflection in the platform's self-conception. Until now, the platform's ontology has been grown endogenously — by the operator and the agent fleet from internal reasoning. From this point onward, the platform treats the global scientific record as a first-class input to its own ontology — continuous, sift-loop-governed, provenance-bearing ingestion of published research that refines the omnology and re-emits new platform capabilities through the forge. Science becomes fuel.

The intellectual lineage this ADR stands in. The word omnology — and its companion role, the omnologist — was articulated and championed by Howard Bloom (cf. The Omnologist's Manifesto; The Lucifer Principle, 1995; Global Brain, 2000; The God Problem, 2012). Bloom's omnology is the deliberate study of everything across disciplines — the transdisciplinary inquiry that refuses to be siloed because nature isn't siloed. The integration of Bloom's omnology with the foundational ontologies (Barry Smith's BFO and Giancarlo Guizzardi's UFO) and with DSRP (Derek and Laura Cabrera's systems-thinking framework, NSF-funded research at Cornell) as a single architectural stack — DSRP as cognitive scaffolding, BFO + UFO as foundational ground, omnology as the meta-disciplinary capstone — was named explicitly by the hub owner in Our Children's Education (Clarke, 2024). This ADR is the first attempt the authors are aware of to implement that integrated stack as a software platform capability: a real, running, governed pipeline that ingests the world's structured knowledge across fields and synthesizes it into a single, queryable, foundationally-grounded knowledge structure that the platform reasons over and acts through. We are not coining the terms — we are honoring them by giving them computational substance. We hope to be useful to the broader community of omnologists, generalists, transdisciplinarians, consilience-minded scientists, DSRP educators, and contemplative scholars for whom this project's success would be their project's success.


Context and problem statement

untool.ai is designed as a self-building, model-driven, ontology-first AI delivery platform.

The platform-self-model (ontology/platform-self-model/) is a digital twin of the fleet, and the forge materializes ontology into typed, byte-identically-emitted code. The sift loop refines the ontology under human-in-the-loop ratification. The foundations-as-perspectives commitment lets us project the same ontology through both UFO (design-oriented, Guizzardi et al.) and BFO (realist, ISO/IEC 21838-2) lenses without picking one.

All of this growth has, until this ADR, been endogenous — the ontology grows from operator insight, agent reasoning, and the platform's own self-observation. That is necessary but not sufficient. The world's structured knowledge — peer-reviewed and preprint scientific literature — is the exogenous source the platform must learn to ingest if it is to become a general-intelligence fabric in any honest sense. A self-building platform that cannot ingest the scientific record is a closed system; closed systems are bounded above by the operator's own knowledge. We do not want that bound.

Two recent landings make the moment ripe:

  1. The HF Papers helper and bibliography workflow (using-hf-papers.md, tools/hf-papers.mjs) gave the fleet a programmatic, frictionless surface over Hugging Face Papers — a curated, structured, searchable view of the AI/CS scientific record. The helper handles search, metadata, abstracts, markdown body, and linked-artifact discovery in one-liners.
  2. The Research & References section (docs/research/) gave us a public altitude above the ADRs at which bibliographies are first-class citizens — the Formal-methods adjacent literature section is the worked example.

These two pieces together let us see a third thing that had been invisible: the same pipeline that grew our ontology from operator insight can grow it from published research, given the right governance. That third thing is what this ADR proposes to build.

The operator has a substantial corpus of papers to ingest and many more incoming. We need a system, not a habit.

Scope

This ADR specifies:

  • A continuous research-ingestion pipeline as a first-class platform capability — named, registered in the system capability model, sift-loop-governed, forge-emittable.
  • The omnology construct — the platform's evolving meta-ontology, the union of all ingested domain ontologies under the foundations-as-perspectives projection, with the disambiguator streaming service bridging cross-ontology term collisions.
  • The science-domain ontology expansion roadmap — the next ontological frontiers (life sciences, materials, physics, social science, formal methods) and the order we tackle them.
  • The governance model — what gets ratified, by whom, with what evidence — using the existing HITL Decision Pattern (ADR-001).
  • The provenance contract — every claim derived from published research is durably linked back to its source paper and its ratification record, satisfying W3C PROV-O shape.

Out of scope (deferred to follow-on ADRs):

  • The specific reasoner schedule for materializing inferred axioms across the omnology (touches ADR-019, ADR-051).
  • The licensing surface for redistributing derived ontology fragments (we ingest open-access and preprint; downstream re-emission needs a separate license-engineering pass).
  • Federation with external ontology registries (BioPortal, OLS, OntoBee) — touched on in the roadmap, designed separately.

Decision drivers

  • Exogenous input becomes a first-class platform feature, not a researcher's side-quest. If the operator has a stack of papers to load and share, the platform must be the place they land — durably, queryably, and as ontology, not just text.
  • The self-building thesis demands it. A platform that calls itself self-building but cannot learn from the literature is self-deceiving. The honest version of self-building includes assimilating the structured external record.
  • Science domains are the obvious next frontier. Our existing ontology emphasizes platform-self-model and software engineering. Life sciences (BFO/OBO heritage is here), materials, physics, and social science are the natural extensions — each has decades of formalized ontology work we can stand on rather than reinvent.
  • The sift loop already exists. ADR-032 gave us LLM-assisted refinement with HITL ratification. We extend it, we don't reinvent it.
  • The forge already exists. ADR-029 and ADR-043 gave us byte-identical, multi-target code emission from ontology. Capabilities derived from ingested research become real platform code through this path.
  • The HF Papers service is already wired. tools/hf-papers.mjs and using-hf-papers.md gave us the source connector. This ADR turns that connector into the front of a real pipeline.
  • Provenance is non-negotiable. Every claim derived from an ingested paper must trace back to its source. This is both an epistemic requirement (we are not in the business of unverifiable assertions) and a defensive one (knowing where a claim came from is how we re-verify when the literature updates).

Considered options

Option A — Status quo: papers in a folder, ad-hoc citation

Keep doing what we do today: when a question prompts a literature look, an agent runs the HF helper, hand-curates a bibliography row, pastes it. Papers live in the operator's filesystem; their structure is whatever a reader extracts in the moment.

Rejected. Does not scale beyond a handful of papers. Knowledge extracted in one session is invisible to the next. No provenance. No path from "I read this" to "the platform knows this." The operator's stated experience — "I have so many papers to start loading and sharing" — is the exact failure mode this option produces.

Option B — A flat literature database — papers stored as records, queryable

Build a structured paper store (metadata + extracted facts) on top of ArcadeDB. Search and retrieval improve; bibliographies become easier to maintain.

Rejected as insufficient. A flat store is a library, not a learner. It does not feed the ontology. It does not surface as platform capability. It does not participate in the sift loop. We would still have a closed self-building platform with a separate, parallel literature surface — exactly the wrong shape.

Option C — Continuous research-ingestion pipeline, sift-loop-governed, ontology-emitting, forge-materializing (chosen)

Build the literature surface as a first-class platform capability that:

  1. Ingests papers via the HF Papers service (and direct PDF / arXiv / publisher-API channels) into a versioned content store.
  2. Extracts structured claims, entities, relations, and competency-question answers via LLM-mediated parsers under SHACL constraints derived from the omnology.
  3. Routes each candidate fragment through the sift loop (ADR-032), where it is judged against existing ontology, ratified by HITL, and either merged, queued for refinement, or rejected with explanation.
  4. Merges ratified fragments into the omnology — the platform's evolving meta-ontology — under foundations-as-perspectives projection (UFO and BFO lenses both materialize where the source frames it).
  5. Re-runs the forge (ADR-029 / ADR-043) over the updated omnology, producing new typed objects, new SHACL shapes, new SPARQL queries, new API contracts, and new platform capabilities — byte-identically emitted, drift-guarded, PR-ratified.
  6. Records the complete chain — paper → extracted fragment → sift decision → omnology delta → forge emission → emitted capability — as durable provenance per W3C PROV-O, addressable through the HVFS substrate when it lands.

The literature is no longer a sidecar. It is a load-bearing input to the self-building loop. Every paper the operator loads, properly ratified, is one increment of the platform's capability surface area.

Chosen. This is the only option that satisfies the self-building thesis honestly while standing on the load-bearing infrastructure we already have.


Decision outcome — the Continuous Research Ingestion Pipeline

Capability registration

A new system-ontology capability is declared:

  • Capability ID: cap-research-ingest
  • Verb: ingest published research into the omnology under sift-loop governance and re-emit downstream capabilities through the forge.
  • Realized by: a new research-ingest repo (function-tier per ADR-023), an extended sift-loop surface inside Crucible, and the existing forge.
  • Exposed through: the Crucible surface (the existing corpus → ontology surface — this capability is a natural extension of Crucible's existing Corpus → Ground → Derive → Materialize flow, with published research as the corpus type).
  • Standards anchors: W3C PROV-O for provenance, FAIR Data Principles for ingestion ethics, Schema.org ScholarlyArticle for paper metadata interop, SPAR Ontologies (FaBiO, CiTO, PRO) for bibliographic and citation modeling.

The capability is added to ontology/platform-self-model/model/instances.yaml under the standard cap-* naming, and the system capability model is regenerated.

Pipeline architecture — five stages

flowchart LR
    A[Source connectors<br/>HF Papers / arXiv / PDF / publisher API] --> B[Content store<br/>versioned in HVFS]
    B --> C[Structured extraction<br/>LLM + SHACL parsers]
    C --> D[Sift loop<br/>ADR-032 + HITL]
    D --> E[Omnology<br/>UFO &#124; BFO projections]
    E --> F[Forge re-emit<br/>ADR-029/043]
    F --> G[Emitted platform capabilities<br/>typed objects, queries, contracts]
    C -.provenance.-> P[PROV-O ledger]
    D -.provenance.-> P
    E -.provenance.-> P
    F -.provenance.-> P

Stage 1 — Source connectors. The HF Papers helper (tools/hf-papers.mjs) is the primary connector and the model for others. Direct arXiv, publisher-API (where licensed), bulk-PDF, and operator-upload paths are added incrementally. Each connector emits a normalized IngestedDocument record with content hash, source URI, license tag, and ingestion timestamp.
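As a rough illustration of the normalized record, here is a minimal sketch. The field names, the choice of SHA-256 for the content hash, and the `normalize` helper are assumptions made for illustration; the ADR only states that the record carries a content hash, source URI, license tag, and ingestion timestamp.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class IngestedDocument:
    """Normalized connector output (field names are illustrative, not the platform's schema)."""
    content_hash: str  # SHA-256 hex digest of the raw bytes (hash algorithm is an assumption)
    source_uri: str    # where the connector fetched the document
    license_tag: str   # license label; the vocabulary used here is an assumption
    ingested_at: str   # ISO 8601 timestamp in UTC


def normalize(body: bytes, source_uri: str, license_tag: str) -> IngestedDocument:
    """Hypothetical helper: build the record so the hash depends only on the content."""
    return IngestedDocument(
        content_hash=hashlib.sha256(body).hexdigest(),
        source_uri=source_uri,
        license_tag=license_tag,
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )


doc = normalize(b"%PDF-1.7 example bytes", "https://example.org/paper.pdf", "CC-BY-4.0")
```

Because the hash is computed from the bytes alone, the same paper fetched through two connectors yields the same `content_hash`, which is what lets the content store deduplicate.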

Stage 2 — Content store. Documents land in a content-addressed store keyed by content hash, with metadata in ArcadeDB and bodies versioned through HVFS when it lands (today, the file-first registry per ADR-071). Re-ingest of an already-stored hash is a no-op. The store is the durable source for re-extraction when parsers improve.
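The no-op behavior on re-ingest can be sketched with an in-memory stand-in. The `ContentStore` class and its methods are hypothetical; the real store would sit on ArcadeDB and the file-first registry, neither of which is modeled here.

```python
import hashlib


class ContentStore:
    """In-memory stand-in for a content-addressed store (hypothetical API)."""

    def __init__(self) -> None:
        self._bodies: dict[str, bytes] = {}

    def put(self, body: bytes) -> tuple[str, bool]:
        """Store the body under its SHA-256 hash and return (hash, was_new)."""
        key = hashlib.sha256(body).hexdigest()
        if key in self._bodies:
            return key, False  # already stored: re-ingest is a no-op
        self._bodies[key] = body
        return key, True

    def get(self, key: str) -> bytes:
        """Fetch the original bytes, e.g. for re-extraction with improved parsers."""
        return self._bodies[key]

    def __len__(self) -> int:
        return len(self._bodies)


store = ContentStore()
first_key, first_new = store.put(b"paper body")
second_key, second_new = store.put(b"paper body")
```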

Stage 3 — Structured extraction. Each document passes through a chain of extractors that produce candidate ontology fragments — entities, reified relations (relator-vertices per ADR-016), competency-question answers, claims, definitions, methods, results, and bibliographic links. Extractors are LLM-mediated but constrained by SHACL shapes derived from the omnology: the LLM proposes, SHACL gates, and only shape-conformant candidates proceed. This is the same trick our ADR-030 data-to-ontology ingestion pipeline uses for arbitrary data — applied here to scientific prose.
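The "LLM proposes, SHACL gates" step can be sketched as a filter. To keep the example self-contained, a plain-Python shape check stands in for real SHACL validation; the `Shape` class, the property names, and the sample candidates are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    """Tiny stand-in for a SHACL node shape (not real SHACL)."""
    target_type: str
    required: dict  # property name -> expected Python type


def violations(candidate: dict, shape: Shape) -> list:
    """Return readable violations; an empty list means the candidate conforms."""
    found = []
    if candidate.get("type") != shape.target_type:
        found.append(f"expected type {shape.target_type}")
    for prop, expected in shape.required.items():
        if prop not in candidate:
            found.append(f"missing property {prop}")
        elif not isinstance(candidate[prop], expected):
            found.append(f"property {prop} is not {expected.__name__}")
    return found


def gate(candidates: list, shape: Shape) -> list:
    """Keep only conformant candidates; the rest never reach the sift loop."""
    return [c for c in candidates if not violations(c, shape)]


claim_shape = Shape("Claim", {"text": str, "source_hash": str})
proposed = [  # toy output of an extractor, invented for this sketch
    {"type": "Claim", "text": "X inhibits Y", "source_hash": "ab12"},
    {"type": "Claim", "text": "No source given"},
    {"type": "Method", "text": "Assay", "source_hash": "ab12"},
]
passed = gate(proposed, claim_shape)
```

Only the first candidate passes: the second lacks `source_hash` and the third has the wrong type.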

Stage 4 — Sift loop. Candidates enter the existing ADR-032 sift loop: score against the omnology, surface human-decidable forks via HITL, accept / refine / reject. Crucially, the sift loop is where the disambiguator (ADR-046) earns its keep — papers from different domains will use the same term to mean different things (cell in biology vs. cell in spreadsheets vs. cell in cellular automata) and the disambiguator resolves the collision before merge.
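A minimal sketch of the routing described above, assuming a numeric score and two thresholds (both hypothetical). Here `ACCEPT` means "eligible for human ratification", not an automatic merge, and the sense table is toy data built from the cell example.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REFINE = "refine"
    REJECT = "reject"


# Known senses per colliding term (toy data taken from the example in the text).
SENSES = {"cell": {"biology", "spreadsheets", "cellular-automata"}}


def disambiguate(term: str, domain: str):
    """Return a domain-qualified key, or None when the collision cannot be resolved."""
    senses = SENSES.get(term)
    if senses is None:
        return term  # no known collision for this term
    if domain in senses:
        return f"{term}@{domain}"
    return None


def sift(term: str, domain: str, score: float,
         accept_at: float = 0.8, reject_below: float = 0.3):
    """Route a candidate; the thresholds are illustrative assumptions."""
    if score < reject_below:
        return Decision.REJECT, None
    key = disambiguate(term, domain)
    if key is None or score < accept_at:
        return Decision.REFINE, key  # surfaced to a human as a decidable fork
    return Decision.ACCEPT, key


resolved = sift("cell", "biology", 0.9)
unresolved = sift("cell", "finance", 0.9)
```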

Stage 5 — Omnology merge + forge re-emit. Ratified fragments merge into the omnology. The merge respects foundations-as-perspectives: a fragment authored in OntoUML stays OntoUML and projects to BFO/CCO where the source frames it; a fragment authored in BFO does the opposite. The forge (ADR-029 / ADR-043) re-emits over the changed omnology — byte-identical where unchanged, freshly emitted where the merge introduced new types/relations. New emissions surface as PRs through the existing forge PR-opener, and ratification proceeds through normal review.
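One way to picture "byte-identical where unchanged" is a deterministic emitter whose output is compared before and after a merge. This is a sketch under assumptions: the JSON serialization, the `emit` function, and the toy type definitions are illustrative and say nothing about how the actual forge emits code.

```python
import json


def emit(type_name: str, definition: dict) -> bytes:
    """Deterministic emission: sorted keys and fixed separators give stable bytes."""
    payload = json.dumps(
        {"type": type_name, "definition": definition},
        sort_keys=True,
        separators=(",", ":"),
    )
    return payload.encode("utf-8")


def emit_all(omnology: dict) -> dict:
    """Emit one artifact per type, in a stable order."""
    return {name: emit(name, omnology[name]) for name in sorted(omnology)}


def changed(before: dict, after: dict) -> list:
    """Names whose emitted bytes are new or differ; only these need a PR."""
    return sorted(name for name, body in after.items() if before.get(name) != body)


old_omnology = {"Process": {"parent": "Occurrent"}}
new_omnology = {"Process": {"parent": "Occurrent"}, "Assay": {"parent": "Process"}}
before, after = emit_all(old_omnology), emit_all(new_omnology)
delta = changed(before, after)
```

The untouched `Process` type emits the same bytes on both runs, so only `Assay` shows up in the delta.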

Cross-cutting — provenance. Every transition is recorded as a PROV-O activity with wasDerivedFrom chains back to the source document. The ledger is queryable: "which papers grounded capability X?", "which papers contributed to the current omnology definition of Process?", "what literature would I need to re-read to defend the current state of the platform on topic Y?" — all become SPARQL queries.
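The first of those questions amounts to a transitive walk over `wasDerivedFrom` edges. The sketch below does that walk over an in-memory dictionary rather than a triple store; the node identifiers and the `paper:` prefix convention are invented for illustration.

```python
# Toy ledger: each entity maps to the entities it was derived from.
WAS_DERIVED_FROM = {
    "capability:X": ["emission:1"],
    "emission:1": ["delta:1"],
    "delta:1": ["decision:1", "decision:2"],
    "decision:1": ["fragment:1"],
    "decision:2": ["fragment:2"],
    "fragment:1": ["paper:A"],
    "fragment:2": ["paper:B"],
}


def grounding_papers(entity: str, edges: dict) -> list:
    """Follow wasDerivedFrom edges transitively and collect the source papers."""
    seen, stack, papers = set(), [entity], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against revisiting shared ancestors
        seen.add(node)
        if node.startswith("paper:"):
            papers.add(node)
        stack.extend(edges.get(node, []))
    return sorted(papers)


papers_for_x = grounding_papers("capability:X", WAS_DERIVED_FROM)
```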

A note on the term omnology, and the lineage we stand in

We use omnology in the sense Howard Bloom has articulated it for decades: not merely the study of everything, but the deliberate hunt for the patterns across the sciences, meaning the fundamental commonalities of the theorems and the deep isomorphisms that the disciplinary silos hide from view. An omnologist, in Bloom's framing, is a generalist who reads across biology, physics, anthropology, music, theology, neuroscience, and history because the same shapes appear in all of them (power laws, attractors, selection, symmetry-breaking, networks, hierarchies of emergence), and the disciplinary walls are an institutional convenience, not a property of nature. The practice is older than the word: Aristotle, Ibn Sina, Leibniz, and the Encyclopédistes practiced it long before it had a name.

The platform's mission, under this ADR, is not just to ingest the literature — it is to surface the cross-disciplinary patterns inside it. Ingestion is a means; pattern-finding across domains is the end. The omnology is valuable in direct proportion to how many cross-domain isomorphisms it makes queryable.

Bloom's term sits in a serious modern intellectual lineage that we acknowledge and align with:

  • E. O. Wilson — Consilience: The Unity of Knowledge (1998). The most widely cited formal articulation of the same vision: a single fabric of explanation spanning the natural sciences, social sciences, and humanities. Consilience and omnology are not synonyms but they are siblings — Wilson stresses convergent explanation, Bloom stresses ranging curiosity. We honor both.
  • Edgar Morin — Method (six volumes, 1977–2004), On Complexity (2008). Morin's la pensée complexe — complex thought — is the methodological backbone of transdisciplinary inquiry in the European tradition. His work names why the silos fail: complexity does not decompose cleanly along departmental lines.
  • Basarab Nicolescu — Manifesto of Transdisciplinarity (1996), Charter of Transdisciplinarity (1994, with Morin and Lima de Freitas). The formal disciplinary instrument for what Bloom calls omnology: explicit principles, levels of reality, the "included middle" between disciplines.
  • R. Buckminster Fuller — Operating Manual for Spaceship Earth (1969). Fuller's comprehensive anticipatory design science is the engineering-leaning sibling of the omnological project — generalist by deliberate practice.
  • The encyclopedist tradition — Diderot & d'Alembert's Encyclopédie (1751–1772) as the institutional ancestor; Leibniz's characteristica universalis (1666 onward) as the philosophical ancestor of every attempt to give cross-domain knowledge a uniform representation.

We treat all of the above as multi-word containers for what omnology names with one word. When communicating with audiences for whom omnology is unfamiliar, we use whichever container fits — transdisciplinary synthesis, consilient knowledge structure, unified meta-ontology under foundations-as-perspectives projection. For audiences who know Bloom's work, we say omnology and mean exactly what he means.

Category theory as the formal language of omnology

Howard Bloom saw, by sustained reading and pattern-recognition, that the same structures recur across the sciences. Category theory is the mathematics that makes that observation rigorous. If the omnological project has a native formal substrate, it is category theory — and this platform takes that bridge seriously.

Category theory was developed (Eilenberg & Mac Lane, 1945; Mac Lane, Categories for the Working Mathematician, 1971) to formalize one specific kind of statement: "this structure in domain A is the same shape as that structure in domain B." It does this through three primitive moves that are, not coincidentally, exactly what an omnologist does informally on every page of Bloom's books:

  • Functors — structure-preserving maps between categories. "Whatever true thing I can say about objects-and-arrows in biology, I can systematically translate into a true thing about objects-and-arrows in this thermodynamic model, because there is a functor between them." Functoriality is the formal name for what Bloom calls the deep commonality.
  • Natural transformations — principled maps between functors. "There are two ways to translate biology into thermodynamics, and there's a third structure — a natural transformation — that says how those two translations relate to each other systematically, not by coincidence." This is the formal name for what omnologists notice when two different cross-domain analogies turn out to be the same analogy seen from different angles.
  • Universal constructions and adjunctions — the recognition that a pattern is not just a pattern but is forced by the structure. "Selection in evolutionary biology, gradient descent in optimization, free energy minimization in physics, and Bayesian updating in cognitive science are not metaphors for each other — they instantiate the same universal construction." This is what makes omnological pattern-finding predictive, not just suggestive.

The platform already commits to this framing internally. ARC-ADR-033 — F# Ontology Compiler Core frames the compilation pipeline categorically: projections are functors (the UFO and BFO views of one omnology are functors out of the same base category, preserving congruence); generators are catamorphisms (folds over algebraic data types — the canonical categorical structure for tree-shaped emit); the gUFO ⟷ BFO bridge is a natural transformation (the disambiguator implements it computationally); the sift loop is monadic (effectful refinement composed via Kleisli arrows — accept/refine/reject as the monad's bind).
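To make the monadic reading of the sift loop concrete, here is a small sketch of Kleisli composition over an outcome type with three cases. It is written in Python for illustration only and is not taken from the F# compiler core; the outcome classes and the two example steps are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accepted:
    value: object


@dataclass(frozen=True)
class Refine:
    reason: str


@dataclass(frozen=True)
class Rejected:
    reason: str


def bind(outcome, step):
    """Run the next step only on Accepted; Refine and Rejected short-circuit."""
    return step(outcome.value) if isinstance(outcome, Accepted) else outcome


def kleisli(*steps):
    """Compose steps of type value -> outcome into a single such step."""
    def composed(value):
        outcome = Accepted(value)
        for step in steps:
            outcome = bind(outcome, step)
        return outcome
    return composed


def check_label(fragment):
    return Accepted(fragment) if "label" in fragment else Rejected("missing label")


def check_domain(fragment):
    return Accepted(fragment) if fragment.get("domain") else Refine("domain needed")


pipeline = kleisli(check_label, check_domain)
```

A fragment with both fields comes out `Accepted`; one without a domain stops at `Refine`; one without a label is `Rejected` before the second step runs.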

This is the bridge from Bloom's omnology to the platform's omnology: what Bloom does by reading widely and noticing patterns, the platform does by storing every ingested ontology fragment in a categorical structure where cross-domain functors and natural transformations are first-class queryable objects. The omnology is not just a knowledge graph — it is a knowledge graph whose edges include functors and natural transformations between domain ontologies, making "same-shape-as" a SPARQL-queryable relation.
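A functor candidate between two domain ontologies can be checked mechanically in the simplest case. The sketch below treats each domain as a directed graph of labeled arrows and verifies that a proposed mapping sends every arrow to an arrow between the mapped objects. For categories freely generated by a graph, such a graph homomorphism extends to a functor; richer categories would also need composition and identities checked. The domain contents are toy data.

```python
def preserves_arrows(source_arrows, target_arrows, on_objects, on_arrows) -> bool:
    """True when every source arrow f: a -> b maps to a target arrow F(f): F(a) -> F(b)."""
    target = set(target_arrows)
    for a, f, b in source_arrows:
        if a not in on_objects or b not in on_objects or f not in on_arrows:
            return False  # the mapping is not defined on this arrow
        if (on_objects[a], on_arrows[f], on_objects[b]) not in target:
            return False  # the image is not an arrow in the target domain
    return True


# Arrows are (source object, arrow label, target object); invented for illustration.
biology = [("variant", "selected_into", "population")]
optimization = [
    ("candidate", "descends_to", "iterate"),
    ("iterate", "evaluated_by", "loss"),
]

arrow_map = {"selected_into": "descends_to"}
good = preserves_arrows(
    biology, optimization,
    {"variant": "candidate", "population": "iterate"}, arrow_map,
)
bad = preserves_arrows(
    biology, optimization,
    {"variant": "iterate", "population": "candidate"}, arrow_map,
)
```

The first mapping preserves structure; the second reverses the arrow's endpoints and is rejected.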

The promise this gives us, concretely:

  1. Cross-disciplinary isomorphism is a first-class query. "What categorical structures appear in both X and Y?" becomes answerable, where X and Y are any two ingested domains. The omnology has the functors between domains as data, not as commentary.
  2. Patterns become predictions. When a universal construction is recognized in three domains, the omnology predicts its presence in adjacent domains and queues directed-ingestion competency questions to confirm or refute.
  3. The omnological discovery loop closes. An omnologist (human or computational) proposes a cross-domain isomorphism; the platform formalizes it as a functor candidate; the sift loop verifies, refines, or rejects; verified functors become permanent omnology axioms with PROV-O chains back to the contributing literature.
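The second promise, patterns becoming predictions, can be sketched as a small rule: once a pattern is confirmed in at least three domains, generate competency questions for adjacent domains where it has not been confirmed. The adjacency table, the question wording, and the function name are assumptions for illustration; the threshold of three follows the text.

```python
def directed_questions(pattern: str, confirmed: list, adjacency: dict,
                       threshold: int = 3) -> list:
    """Queue competency questions for unconfirmed domains adjacent to confirmed ones."""
    if len(confirmed) < threshold:
        return []  # not enough evidence to predict yet
    candidates = set()
    for domain in confirmed:
        candidates.update(adjacency.get(domain, ()))
    candidates -= set(confirmed)
    return [f"Does '{pattern}' appear in {domain}?" for domain in sorted(candidates)]


# Toy adjacency between domains, invented for this sketch.
adjacency = {
    "biology": ["ecology", "medicine"],
    "physics": ["chemistry"],
    "economics": ["sociology", "ecology"],
}
questions = directed_questions("power law", ["biology", "physics", "economics"], adjacency)
```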

The deeper claim, and the reason this ADR treats category theory as more than a tooling choice: the omnological project and category theory have been pointed at the same target for eighty years, from opposite ends. Bloom works from the empirical-synthetic side (read everything, notice the patterns); category theory works from the formal-mathematical side (axiomatize what "same pattern" means). This platform is, as far as we can tell, the first attempt to wire those two ends together into a running system. If we are right that the two are converging on the same target, then every pattern Bloom or any other omnologist has noticed informally is a functor waiting to be formalized, and every formal categorical equivalence is a cross-disciplinary insight waiting to be cited from the literature that grounded it.

This is also why the platform's category-theoretic posture is not an aesthetic choice we might revise. It is load-bearing for the omnological mission.

DSRP as the cognitive scaffolding paired with the foundational ontologies

Before introducing the omnology's platform construct, we record a third layer of intellectual grounding that the hub owner has explicitly committed the platform to: DSRP — Distinctions, Systems, Relationships, Perspectives — the systems-thinking framework developed by Derek Cabrera and Laura Cabrera of Cornell University under National Science Foundation support (Cabrera & Cabrera, Systems Thinking Made Simple, 2015; ongoing NSF-funded research on DSRP-integrated educational systems).

DSRP names the four universal cognitive moves any thinker makes when organizing knowledge:

  • D — Distinctions. What is this, and what is it not? The act of drawing a boundary. In the omnology this surfaces as entity types, class hierarchies, and the SHACL shapes that gate ingestion.
  • S — Systems. What parts compose this, and what whole does this compose? The act of seeing structure across scale. In the omnology this surfaces as mereological relations, part-whole hyperedges (per ARC-ADR-016), and the holonic structure of the HUB.
  • R — Relationships. What connects this to that, and how? The act of seeing edges. In the omnology this is the reified n-ary relations (relator-vertices), the functors between domain ontologies, and the natural transformations between functors.
  • P — Perspectives. Whose view of this, from where? The act of seeing through a frame. In the omnology this is foundations-as-perspectives made literal: the UFO view, the BFO view, the domain-specialist view, and — as the platform grows — every contributing omnologist's view, recorded as a first-class projection.

The hub owner's essay Our Children's Education (Clarke, 2024) makes the integration explicit:

"Derek's model of distinctions, systems, relationships, and perspectives is universal in its ability to structure knowledge and foster deeper understanding. Paired with foundational ontologies like those championed by Barry Smith, Basic Formal Ontology (BFO) and Giancarlo Guizzardi, Unified Foundational Ontology (UFO), DSRP becomes not just a cognitive tool but a scaffolding for building interdisciplinary connections. And I see this architecture, undergirded by omnology, as the key to creating systems that span disciplines while remaining rooted in the historical and contextual basis of knowledge." — Nicky Clarke, 2024.

We adopt this stack as the platform's named architectural posture:

DSRP is the cognitive scaffolding. BFO + UFO are the foundational ontologies. Category theory is the formal substrate of cross-domain pattern. Omnology is the meta-disciplinary capstone where all of it converges. The forge materializes the convergence into running platform capability.

This is not a metaphor stack — it is a build pipeline with named owners at every layer. DSRP's four moves map onto the omnology's storage shapes; BFO/UFO ground every entity in one (or both) foundations; functors and natural transformations carry the cross-domain pattern Bloom and the omnologists saw informally; the forge emits typed code from the result.

The DSRP integration also commits the platform to the collaborative knowledge-building research tradition the same essay surveys (Stahl 2006; Hmelo-Silver & Barrows 2008; Pennington 2016; Beers & Bots 2007; Cress & Kimmerle 2008; Zhang et al. 2009; Claus & Wiese 2019; Zou et al. 2019). The sift loop is structurally an instance of this tradition's central claim: knowledge is built across collaborative relationships, not delivered through them.

Classics, historical contextuality, and a globally inclusive canon

A second commitment from the same essay: the humanities are not subordinate to the sciences in the omnology — they are a peer dimension, and historical contextuality is how the omnology resists becoming a frozen abstract structure detached from human heritage.

The hub owner's essay argues that the classics — Socratic dialogues, Homeric epics, Roman treatises, Confucian Analects, the Bhagavad Gita, the Mahabharata and Ramayana, the Epic of Gilgamesh, Murasaki Shikibu's Tale of Genji, African oral traditions, Aboriginal Australian storytelling — are load-bearing context for any system that claims to integrate knowledge. The platform's omnology will not be only a structure of formal scientific ontology; it will carry, as a first-class layer, the historical-contextual record of human thought across cultures, with the same provenance discipline as the scientific record.

This translates into three concrete platform commitments:

  1. Western and global classics as ingestion targets. When the science-domain waves complete their initial passes, a parallel humanities track ingests classical and global-classical texts and their scholarly commentary. Confucian ethics sits alongside Stoic ethics; the Bhagavad Gita's account of duty sits alongside Cicero's De Officiis; the Mahabharata's narrative cosmology sits alongside Homer's. The omnology gains the cross-cultural functors between them as queryable structure.
  2. Historical contextuality as a typed dimension. Every ingested concept carries not only its formal ontology grounding but its historical context — the period, the milieu, the audience the original work addressed. Following Ahrensdorf (1994) on Plato's historical milieu and How (2011) on Gadamer's hermeneutics, the omnology distinguishes the timeless claim from the period-specific claim within the same source — and the disambiguator surfaces the distinction at query time. The historicized study of classics (Irwin 2000; Evans & Midford 2021) becomes the omnology's discipline for the humanities track.
  3. No Eurocentric closure. The Western canon is honored but not privileged; expansion to global classical traditions is explicit, intentional, and structural. This is a value commitment recorded here so it can be audited later.

These commitments connect the omnology back to Bloom's own range, which has always traversed Western science, Sufism, Tibetan Buddhism, evolutionary biology, sociology, and music history without granting any of them precedence.

Relational dynamics — how human and computational omnologists work together

A third commitment from the essay, and the one that most shapes how the platform behaves: the bidirectional, collaborative student-teacher relationship is the model for human-computational omnologist collaboration. AI does not replace the human relationship that grows understanding; it removes the administrative burden that crowds out the relationship.

The hub owner's framing:

"AI can't replace the bond between a student and a teacher, but it can create conditions where that bond has room to grow. It can streamline grading, offer personalized feedback, and provide students with tools to explore on their own, empowering teachers to focus on guiding, mentoring, and inspiring." — Nicky Clarke, 2024.

The platform's HITL design (ARC-ADR-001) already commits to human-in-the-loop ratification. This essay clarifies what kind of human-machine interaction we are designing for:

  • The platform automates the burden, not the judgment. Sift-loop scoring, candidate-extraction, cross-paper consistency checks, provenance bookkeeping — these are the platform's work. Choosing what to read next, recognizing a new cross-disciplinary pattern, naming a new functor candidate, deciding what the omnology owes to a culture's heritage — these remain human work, augmented by but not delegated to the platform.
  • Lifelong intellectual relationships are first-class. The platform models the long-term collaborations between contributing omnologists — both human (Howard Bloom, the Cabreras, Barry Smith, Giancarlo Guizzardi, and every scientist or scholar the operator brings into the conversation) and computational (the agents, the disambiguator, the forge) — as durable, addressable relationships in the omnology itself, not transient sessions. Following Li Jun (2010), Boyles (2018), and Cropley (1977), the educator-learner pattern is the relational shape we use.
  • Co-creation, not delivery. Knowledge is built across the relationship, not delivered through it. The sift loop is structurally an instance of collaborative knowledge-building between the proposing party (human or LLM) and the ratifying party — every ratification adds to the shared structure.

The omnology, as platform construct

In this platform, the omnology is the concrete computational artifact that embodies the omnological project: the platform's evolving meta-ontology, the union of all natively authored and ingested domain ontologies under foundations-as-perspectives projection.

The omnology is not a replacement for OWL/SHACL/RDF; it is an OWL ontology materialized in Apache Jena Fuseki (ADR-019) with six properties that, taken together, justify treating it as a coherent object worth a single name:

  • Multi-foundational. Every concept can carry both a UFO/OntoUML stereotype (Guizzardi lineage, design-oriented) and a BFO/CCO grounding (ISO/IEC 21838-2 lineage, realist) where the source frames either. The disambiguator handles cross-foundation projection.
  • Domain-spanning. It is explicitly designed to hold concepts from every domain the platform ingests — software engineering, life sciences, materials, physics, social science, formal methods, business — and to surface the bridges across them. This is the implementation surface of consilience.
  • Self-aware. It contains the platform-self-model as a first-class fragment. The omnology models the platform that grows it. This recursion is the point: a learning system that does not model itself cannot deliberately improve.
  • Provenance-bearing. Every axiom has a W3C PROV-O chain back to the artifact (paper, operator note, agent reasoning trace) that introduced it. "Show me every paper that grounds capability X" is a SPARQL query, not a memory exercise.
  • Continuously re-emitted. Changes flow through the forge (ADR-029 / ADR-043) into typed code, SHACL shapes, SPARQL queries, API contracts, and platform capabilities. The omnology is not a document — it is a compiled artifact in a build pipeline. It does something.
  • Categorically structured. Beyond entities and relations, the omnology stores functors between domain ontologies and natural transformations between functors as first-class objects, per the framing above. "Where else does this pattern appear?" is a query, not an essay.
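
To make "functors as first-class objects" concrete, here is a minimal sketch, assuming a domain ontology can be approximated as a set of typed edges (source concept, relation, target concept). The names Ontology, Functor, and broken_edges are hypothetical and are not platform APIs. The check covers only edge preservation; the identity and composition laws of a full categorical treatment are left out.

```python
from dataclasses import dataclass

Edge = tuple[str, str, str]  # (source concept, relation, target concept)


@dataclass(frozen=True)
class Ontology:
    name: str
    edges: frozenset[Edge]


@dataclass
class Functor:
    """A candidate mapping between two domain ontologies (hypothetical shape)."""
    source: Ontology
    target: Ontology
    on_concepts: dict[str, str]
    on_relations: dict[str, str]


def broken_edges(f: Functor) -> list[Edge]:
    """Return the source edges whose image under f is absent from the target."""
    missing = []
    for s, r, t in sorted(f.source.edges):
        # An unmapped concept or relation yields None, which never matches.
        image = (f.on_concepts.get(s), f.on_relations.get(r), f.on_concepts.get(t))
        if image not in f.target.edges:
            missing.append((s, r, t))
    return missing


# Illustrative toy ontologies, not real platform fragments.
biology = Ontology("biology", frozenset({("cell", "part_of", "tissue")}))
orgs = Ontology("organizations", frozenset({("employee", "member_of", "team")}))

good = Functor(biology, orgs,
               {"cell": "employee", "tissue": "team"},
               {"part_of": "member_of"})
# Same concept map, but the relation is left unmapped.
bad = Functor(biology, orgs, {"cell": "employee", "tissue": "team"}, {})

print(broken_edges(good))  # empty: the pattern carries over
print(broken_edges(bad))   # the part_of edge has no image in the target
```

A functor candidate with no broken edges is one that could be proposed to the sift loop; one with broken edges shows exactly where the claimed cross-domain pattern fails.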

The omnologist, as platform role

An omnologist in the platform context is a role, not a job title: any human or agent contributor whose work spans disciplines and feeds the omnology's growth. The operator is an omnologist. Howard Bloom is an omnologist. Every scientist whose paper is ingested becomes, posthumously or in absentia, a contributing omnologist whose work is now part of the platform's reasoning. The agents that author, refine, disambiguate, and verify omnology fragments are computational omnologists. The role earns its own HITL Decision Pattern label — assignee:omnologist — for sift-loop decisions that require generalist judgment specifically (as opposed to specialist domain expertise).

Honoring the lineage in practice

Beyond attribution, we commit to four concrete practices that make this implementation worthy of the term:

  1. Bibliographic and conceptual credit follows every axiom. When the omnology's axiom-provenance ledger surfaces a concept derived from a paper, an essay, or a book that an omnologist of the lineage above authored, the citation is rendered prominently — not buried. Bloom, Wilson, Morin, Nicolescu, and their intellectual descendants are first-class citations in the platform.
  2. The platform's omnology is open for reading. The full omnology (sans private operator material) is published as RDF/Turtle at a stable URL. Other omnologists can read it, query it via SPARQL, and propose fragments. We will not own omnology; we will run an instance of it.
  3. The omnological project's open questions get sift-loop slots. When Howard or any contributing omnologist surfaces a question that the omnology cannot yet answer — "how does the music of the medieval troubadours relate to the social structure of the bonobo?" — the question enters the sift loop as a competency question (per Grüninger & Fox 1995) and drives a directed ingestion pass across the relevant domains. The platform learns in response to omnologists' questions, not only the operator's.
  4. The term omnology is used precisely. Inside the platform, omnology refers specifically to the platform's meta-ontology under foundations-as-perspectives projection. When we mean Bloom's broader programmatic vision, we say the omnological project. When we mean the discipline of practicing omnology, we say omnology, as a discipline. The platform's omnology is one implementation of the discipline, not a redefinition of it.

Science-domain expansion roadmap

The order matters. We start where the foundational ontology heritage is strongest, then expand. In the spirit of the omnological project, the long-run domain list is all of them — biology, physics, chemistry, materials, mathematics, computer science, social science, economics, anthropology, linguistics, history, music, religion, art, literature. The roadmap below is a priority ordering, not a scope limit. Every domain the platform's contributing omnologists care about is a candidate; the order reflects only where the engineering risk is lowest at each step.

| Wave | Domain | Anchor | Why this order |
| --- | --- | --- | --- |
| W1 — Life sciences | BFO/OBO heritage is biomedical; CCO and the OBO Foundry give us decades of formalized ontology to stand on. | OBO Foundry, BFO 2020, GO, ChEBI, Uberon | Start where the road is paved. Validates the pipeline end-to-end against a mature ontology landscape before tackling thinner ones. |
| W2 — Formal methods & computer science | Existing ADRs already cite this literature heavily. Verifiable safety, capability gating, LTL specs, neuro-symbolic AI. | arXiv cs.LO/cs.AI/cs.PL, SPAR | Closes the loop on our own AI/platform work — we ingest the literature that grounds our ADRs. |
| W3 — Materials & chemistry | Mature ontology heritage (ChEBI, EMMO), high signal-to-noise in published methods. | EMMO, ChEBI, Matterhorn | Strong external structure; near-term commercial applicability. |
| W4 — Physics & mathematics | Less ontology heritage but very clean formalisms. Strong fit for our category-theory framing of the F# compiler core (ADR-033). | Mathematics Subject Classification, PhySH | Tests whether our pipeline can ingest highly formal sources cleanly. |
| W5 — Social science, economics, organizational theory | Sparse but ontology-receptive (e.g. FRBR, BIBO, org ontologies). Anchors the platform's business side of self-modeling. | FIBO, W3C Org Ontology | Closes the platform's understanding of the systems it operates inside. |
| W6 — Humanities, classics, and the historical-contextual layer | Western and global classical canons — Socratic dialogues, Homeric epics, Confucian Analects, Bhagavad Gita, Mahabharata, Ramayana, Epic of Gilgamesh, Murasaki Shikibu's Tale of Genji, African oral traditions, Aboriginal Australian storytelling — plus their scholarly commentary. Historical contextuality as a typed dimension on every ingested concept. | FRBR, BIBO, FaBiO, CiTO, CIDOC CRM (cultural-heritage), SPAR | Connects the formal sciences to the heritage of human thought; resists Eurocentric closure by explicit canon expansion (Clarke, 2024). The omnology gains the cross-cultural functors between traditions as queryable structure. |
| W7 — Educational and pedagogical research | DSRP-integrated educational systems (Cabrera NSF program); collaborative knowledge-building (Stahl 2006; Hmelo-Silver & Barrows 2008; Cress & Kimmerle 2008); interdisciplinary competency frameworks (Pennington 2016; Claus & Wiese 2019). | DSRP framework, IMS Caliper Analytics, W3C Open Annotation | The platform is intended, eventually, to support the kind of relational learning the hub owner describes in Our Children's Education. Ingesting the pedagogical research is how the platform becomes self-aware as a learning environment for its human collaborators. |

Each wave produces:

  • A wave-charter ADR declaring the domain entry, the anchor ontologies, and the acceptance criteria for "ingested enough."
  • A set of competency questions the omnology must answer post-ingestion (per Grüninger & Fox 1995).
  • A measurable coverage delta — how much of the anchor ontology's terms are present in the omnology with provenance.
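
As an illustration of the coverage delta, here is a minimal sketch, assuming the omnology can be summarized as a mapping from term identifier to the list of provenance sources for that term. The function names and the sample identifiers are illustrative assumptions; a term with no provenance is deliberately not counted as covered.

```python
def coverage(anchor_terms: set[str], omnology: dict[str, list[str]]) -> float:
    """Fraction of anchor terms present in the omnology with at least one source."""
    if not anchor_terms:
        return 0.0
    covered = {term for term in anchor_terms if omnology.get(term)}
    return len(covered) / len(anchor_terms)


def coverage_delta(anchor_terms: set[str],
                   before: dict[str, list[str]],
                   after: dict[str, list[str]]) -> float:
    """Change in anchor coverage produced by one ingestion wave."""
    return coverage(anchor_terms, after) - coverage(anchor_terms, before)


# Illustrative anchor slice and omnology snapshots.
anchor = {"GO:0008150", "GO:0003674", "GO:0005575", "CHEBI:15377"}
paper = "urn:untool:paper:arxiv:2310.08535"

before = {"GO:0008150": [paper]}
after = {
    "GO:0008150": [paper],
    "GO:0003674": [paper],
    "GO:0005575": [],  # present but without provenance, so not counted
}

print(coverage(anchor, before))               # 1 of 4 terms covered
print(coverage(anchor, after))                # 2 of 4 terms covered
print(coverage_delta(anchor, before, after))
```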

Governance — the sift loop, extended

The existing sift loop (ADR-032) handles ontology refinement under HITL. The research-ingestion pipeline extends it with three additional gates:

  1. Source-credibility gate — every paper carries a credibility score derived from: venue (peer-reviewed > preprint), citation count, replication evidence, retraction status. Below threshold = quarantined for explicit operator ratification; above threshold = enters sift loop normally.
  2. Cross-paper consistency check — when a new fragment contradicts an existing omnology axiom that traces back to a different paper, the disambiguator surfaces the conflict as a Decision Artifact (ADR-001) with both source papers attached.
  3. Reproducibility-required gate — claims that the platform will use to act (i.e., that flow through the forge into emitted capabilities) require either reproducibility evidence or explicit operator ratification of the unreproduced claim. Claims used only for background (bibliographies, ADR context) do not require this gate.
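
The source-credibility gate could be sketched roughly as follows. The four inputs come from the gate description above; the weights, the log scaling of citations, the 0.5 threshold, and the choice to treat a score exactly at the threshold as passing are all assumptions made for illustration, not platform values.

```python
import math
from dataclasses import dataclass

QUARANTINE_THRESHOLD = 0.5  # assumed value; the ADR does not fix one


@dataclass
class Paper:
    peer_reviewed: bool
    citations: int
    replicated: bool
    retracted: bool


def credibility(p: Paper) -> float:
    """Combine venue, citations, replication, and retraction into a 0..1 score."""
    if p.retracted:
        return 0.0  # a retraction overrides every other signal
    venue = 0.4 if p.peer_reviewed else 0.2  # peer-reviewed > preprint
    # Log-scaled so that citation counts saturate at roughly 1000.
    cites = 0.3 * min(1.0, math.log10(1 + p.citations) / 3)
    replication = 0.3 if p.replicated else 0.0
    return round(venue + cites + replication, 3)


def gate(p: Paper) -> str:
    """Route a paper to the normal sift loop or to operator quarantine."""
    return "sift-loop" if credibility(p) >= QUARANTINE_THRESHOLD else "quarantine"


fresh_preprint = Paper(peer_reviewed=False, citations=0, replicated=False, retracted=False)
established = Paper(peer_reviewed=True, citations=50, replicated=True, retracted=False)
withdrawn = Paper(peer_reviewed=True, citations=900, replicated=True, retracted=True)

for paper in (fresh_preprint, established, withdrawn):
    print(credibility(paper), gate(paper))
```

Keeping the score a pure function of the paper's metadata means the value recorded at ingest can be recomputed later and compared, for example after a retraction notice arrives.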

Provenance contract

Every emitted artifact derivable from research carries a provenance manifest: a small JSON-LD document that links the artifact to its contributing source papers (PROV-O wasDerivedFrom), to the sift-loop decision that ratified it (PROV-O wasGeneratedBy, because a decision is an activity and wasDerivedFrom relates entities only), and to the omnology axioms involved. The manifest format:

{
  "@context": "https://www.w3.org/ns/prov-o.jsonld",
  "@id": "urn:untool:capability:cap-life-science-entity-resolution",
  "@type": "prov:Entity",
  "prov:wasDerivedFrom": [
    { "@id": "urn:untool:paper:arxiv:2310.08535", "@type": "prov:Entity" }
  ],
  "prov:wasGeneratedBy": { "@id": "urn:untool:siftDecision:2026-06-12-T14:23", "@type": "prov:Activity" },
  "untool:omnologyAxioms": [ "urn:untool:omnology:axiom:0001234" ],
  "untool:credibilityScoreAtIngest": 0.87,
  "untool:lastRatifiedAt": "2026-06-12T14:25:00Z"
}

This is the artifact reviewers and auditors interrogate when they ask "why does the platform believe this?".
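
A reviewer's first question of a manifest is usually "which papers and which decisions stand behind this?". The sketch below shows one way to answer it, assuming the manifest has already been parsed into a Python dict and that papers and decisions can be told apart by their URN prefix. The helper names are hypothetical. The lookup reads both prov:wasDerivedFrom and prov:wasGeneratedBy (the PROV-O property whose range is an activity), so it does not depend on which property carries the decision.

```python
def _as_list(value):
    """Normalize a JSON-LD value that may be absent, a single node, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def resolve(manifest: dict) -> dict[str, list[str]]:
    """Split a provenance manifest into paper, decision, and axiom identifiers."""
    nodes = (_as_list(manifest.get("prov:wasDerivedFrom"))
             + _as_list(manifest.get("prov:wasGeneratedBy")))
    ids = [n["@id"] if isinstance(n, dict) else n for n in nodes]
    return {
        "papers": [i for i in ids if i.startswith("urn:untool:paper:")],
        "decisions": [i for i in ids if i.startswith("urn:untool:siftDecision:")],
        "axioms": list(manifest.get("untool:omnologyAxioms", [])),
    }


def is_complete(manifest: dict) -> bool:
    """A manifest is usable only if it names at least one paper and one decision."""
    resolved = resolve(manifest)
    return bool(resolved["papers"]) and bool(resolved["decisions"])


sample = {
    "@id": "urn:untool:capability:cap-life-science-entity-resolution",
    "prov:wasDerivedFrom": [{"@id": "urn:untool:paper:arxiv:2310.08535"}],
    "prov:wasGeneratedBy": {"@id": "urn:untool:siftDecision:2026-06-12-T14:23"},
    "untool:omnologyAxioms": ["urn:untool:omnology:axiom:0001234"],
}

print(resolve(sample))
print(is_complete(sample))
print(is_complete({"@id": "urn:untool:capability:orphan"}))
```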


Consequences

Positive

  • The self-building thesis becomes honest. The platform learns from the structured external record, not only from itself.
  • The operator's paper backlog becomes platform value. Every paper loaded, ratified, becomes a permanent increment of the platform's capability surface — not a document in a folder.
  • Science-domain expansion has a path. Each new wave is a well-scoped engagement, not a green-field rewrite.
  • Bibliographies maintain themselves. Ratified ingestion automatically refreshes the Research & References section's literature surface; ADRs gain an auto-linked "papers grounding this decision" footer.
  • Provenance becomes a queryable asset. "Show me every paper that grounds capability X" is a SPARQL query, not a memory exercise.
  • The disambiguator earns its keep. Cross-domain ingestion is the load-bearing use case for ADR-046.
  • The omnology becomes a real artifact, not aspirational. It will be measurable: term count, foundation coverage, domain breadth, provenance density.

Negative

  • Real engineering cost. This is a multi-quarter buildout — connectors, extractors, sift extensions, omnology storage, forge integration, provenance ledger. Initial waves will be operator-paced.
  • LLM hallucination risk in extraction. Constrained by SHACL and gated by HITL, but the extractor will produce false positives. The credibility and reproducibility gates limit blast radius; they do not eliminate it.
  • License surface grows. Ingesting research means tracking redistribution rights of derived ontology fragments. We mitigate by starting with open-access (arXiv preprints, OBO Foundry, OA-licensed papers) and deferring closed-access until license-engineering is in place.
  • Sift-loop human throughput becomes the bottleneck. Ingestion will produce candidates faster than any one human can ratify. We mitigate with credibility-based auto-promotion bands (high-confidence + reproducibility + non-contradictory = auto-accept with notification; everything else = sift-loop HITL).
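
The auto-promotion band described in the last bullet can be sketched as a single routing predicate. The three conditions are taken from that bullet; the Candidate shape, the route labels, and the 0.85 cut-off for "high-confidence" are assumptions for illustration.

```python
from dataclasses import dataclass

AUTO_ACCEPT_MIN_SCORE = 0.85  # assumed cut-off for the high-confidence band


@dataclass
class Candidate:
    fragment_id: str
    credibility: float
    has_reproducibility_evidence: bool
    contradicts_existing_axiom: bool


def route(c: Candidate) -> str:
    """Auto-accept only when all three conditions hold; otherwise ask a human."""
    if (c.credibility >= AUTO_ACCEPT_MIN_SCORE
            and c.has_reproducibility_evidence
            and not c.contradicts_existing_axiom):
        return "auto-accept-with-notification"
    return "sift-loop-hitl"


def partition(candidates: list[Candidate]) -> dict[str, list[str]]:
    """Group fragment ids by route so the HITL queue size is visible."""
    queues: dict[str, list[str]] = {
        "auto-accept-with-notification": [],
        "sift-loop-hitl": [],
    }
    for c in candidates:
        queues[route(c)].append(c.fragment_id)
    return queues


batch = [
    Candidate("frag-1", 0.91, True, False),   # clears every condition
    Candidate("frag-2", 0.91, False, False),  # no reproducibility evidence
    Candidate("frag-3", 0.95, True, True),    # contradicts an existing axiom
    Candidate("frag-4", 0.60, True, False),   # below the confidence band
]

print(partition(batch))
```

Because the default branch is the human queue, any candidate that fails to satisfy the predicate, including one with missing or unexpected field values, falls back to HITL rather than being promoted.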

Neutral

  • The forge interface does not change. The forge already takes an ontology and emits code. We are widening the source of changes to the ontology, not changing the forge's contract.
  • No new top-level surface. Crucible (the corpus → ontology surface) is the right surface. Published research is a corpus type; we extend Crucible, we do not introduce a sixth surface.
  • The HF Papers helper does not change. It remains the source connector. Stage 2 onwards is the new build.

Verification — how we'll know it works

A capability is real when its image manifest proves an explicit list of properties. For cap-research-ingest, the initial proof set is:

  • ingest-hf-paper-roundtrip — given an arXiv ID, the pipeline ingests, extracts ≥1 candidate fragment, and the candidate appears in the sift queue with full PROV-O provenance.
  • shacl-gates-malformed-extraction — an LLM extraction that violates the omnology's SHACL shape is rejected before reaching the sift queue, with a structured error.
  • disambiguator-catches-cross-domain-collision — a fixture pair (two papers using the same term for different concepts) produces a Decision Artifact, not a silent merge.
  • forge-reemits-on-omnology-change — ratifying a sift-queue candidate that introduces a new entity type triggers a forge run that produces a PR with the new typed object.
  • provenance-roundtrip — given an emitted capability artifact, the provenance manifest resolves to the contributing paper(s) and the sift-loop decision(s).

Under the container-tiering scheme, these are the properties that the image manifest for the new research-ingest image must prove.
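
As one example of what such a proof might exercise, the sketch below models the disambiguator-catches-cross-domain-collision property: two fixture papers use the same term for different concepts, and the merge step must return a Decision Artifact instead of silently overwriting. The Fragment and DecisionArtifact shapes, the merge_or_flag helper, and all identifiers are hypothetical stand-ins, not the platform's actual types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fragment:
    term: str
    concept_iri: str
    paper: str


@dataclass(frozen=True)
class DecisionArtifact:
    term: str
    papers: tuple[str, ...]
    concept_iris: tuple[str, ...]


def merge_or_flag(index: dict[str, Fragment],
                  incoming: Fragment) -> Optional[DecisionArtifact]:
    """Add a fragment to the index, or surface a conflict for human decision."""
    # setdefault inserts the fragment only when the term is new.
    current = index.setdefault(incoming.term, incoming)
    if current.concept_iri == incoming.concept_iri:
        return None  # new term, or same concept restated: nothing to decide
    # Same label, different concept: keep the existing entry and flag it.
    return DecisionArtifact(
        term=incoming.term,
        papers=(current.paper, incoming.paper),
        concept_iris=(current.concept_iri, incoming.concept_iri),
    )


# Fixture pair: "plasma" in physics versus "plasma" in haematology.
physics = Fragment("plasma", "urn:untool:omnology:concept:ionized-gas",
                   "urn:untool:paper:fixture:physics-001")
biology = Fragment("plasma", "urn:untool:omnology:concept:blood-plasma",
                   "urn:untool:paper:fixture:biology-001")

index: dict[str, Fragment] = {}
first = merge_or_flag(index, physics)
second = merge_or_flag(index, biology)

print(first)   # None: the first fragment is simply indexed
print(second)  # a DecisionArtifact naming both fixture papers
```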


Compliance with platform-wide principles

  • MECE: cap-research-ingest is mutually exclusive with cap-data-ingest (the data→ontology pipeline of ADR-030). One ingests prose evidence about the world; the other ingests structured records of the world. Both terminate in the same omnology under the same sift loop, but their extractors and credibility gates differ.
  • Down-hierarchy delegation — the pipeline delegates to existing capabilities (sift, disambiguator, forge); it does not invent parallel surfaces.
  • Observable decisions — every sift decision and every credibility-gate decision is logged and queryable.
  • Contract-first AND mock-first (CLAUDE.md) — the pipeline emits an OpenAPI contract for the ingestion endpoint and an AsyncAPI contract for the platform.research.* event stream, both published + mocked in Postman from day one.
  • System-ontology vocabulary — the new capability and its components follow cap-* / ctr-* / repo-* naming per glossary.

External references

The omnological lineage

  • Howard Bloom, The Lucifer Principle: A Scientific Expedition Into the Forces of History (Atlantic Monthly Press, 1995); Global Brain: The Evolution of Mass Mind from the Big Bang to the 21st Century (Wiley, 2000); The God Problem: How a Godless Cosmos Creates (Prometheus, 2012); How I Accidentally Started the Sixties (Rare Bird, 2019); and the essay tradition collected at howardbloom.net — including The Omnologist's Manifesto and Bloom's recurring articulations of omnology as a discipline. The originator of the term we adopt, and the long-time champion of the omnological project this ADR implements.
  • E. O. Wilson, Consilience: The Unity of Knowledge (Knopf, 1998). The most widely cited formal articulation of the same unification project, from a different starting point (biology, sociobiology) and arriving at a sibling conclusion.
  • Edgar Morin, La Méthode (six volumes, Seuil, 1977–2004); On Complexity (Hampton Press, 2008); Seven Complex Lessons in Education for the Future (UNESCO, 1999). The methodological backbone of transdisciplinary thought.
  • Basarab Nicolescu, Manifesto of Transdisciplinarity (SUNY Press, 2002, trans. Karen-Claire Voss); the Charter of Transdisciplinarity (with Lima de Freitas and Morin, 1994). The formal disciplinary instrument for transdisciplinary inquiry.
  • R. Buckminster Fuller, Operating Manual for Spaceship Earth (Southern Illinois University Press, 1969); Synergetics (Macmillan, 1975 / 1979). The engineering-leaning generalist tradition.
  • Diderot & d'Alembert, Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers (1751–1772). The institutional ancestor of cross-domain knowledge synthesis.
  • G. W. Leibniz, characteristica universalis and calculus ratiocinator programmes (1666 onward). The philosophical ancestor of every attempt to give cross-domain knowledge a uniform formal representation — including, recursively, this one.

Provenance, FAIR, and bibliographic standards

Anchor ontologies for the science-domain waves

Source connectors

DSRP, collaborative knowledge-building, and the educational tradition

The hub owner's essay Our Children's Education (Clarke, 2024) integrates the DSRP framework with the foundational ontologies and omnology — and grounds the platform's collaborative-knowledge-building posture in the following literature, which we adopt as primary references:

  • Nicky Clarke, Our Children's Education (LinkedIn, 19 November 2024). The hub owner's articulation of the DSRP + BFO/UFO + omnology + classics + relational AI stack as the architecture for both education and the platform.
  • Derek Cabrera & Laura Cabrera, Systems Thinking Made Simple: New Hope for Solving Wicked Problems (Plectica, 2nd ed., 2018) and ongoing NSF-funded research on DSRP-integrated educational systems via Cabrera Research. The foundational source for DSRP (Distinctions, Systems, Relationships, Perspectives).
  • Gerry Stahl, Group Cognition: Computer Support for Building Collaborative Knowledge (MIT Press, 2006). The dual nature of knowledge-building as individual and socio-cultural — anchors the sift-loop's collaborative-knowledge-building framing.
  • Cindy Hmelo-Silver & Howard Barrows, Facilitating Collaborative Knowledge Building (Cognition and Instruction 26(1), 2008, pp. 48–94).
  • Deana Pennington, A Conceptual Model for Knowledge Integration in Interdisciplinary Teams (Journal of Environmental Studies and Sciences 6(2), 2016, pp. 300–312).
  • Pieter Beers & Pieter Bots, Eliciting Conceptual Models to Support Interdisciplinary Research (Journal of Information Science 35(3), 2009, pp. 259–278).
  • Ulrike Cress & Joachim Kimmerle, A Systemic and Cognitive View on Collaborative Knowledge Building with Wikis (IJCSCL 3(2), 2008, pp. 105–122).
  • Jianwei Zhang, Marlene Scardamalia, Richard Reeve & Richard Messina, Designs for Collective Cognitive Responsibility in Knowledge-Building Communities (Journal of the Learning Sciences 18(1), 2009).
  • Andrea Claus & Bettina Wiese, Development and Test of a Model of Interdisciplinary Competencies (European Journal of Work and Organizational Psychology 28(2), 2019, pp. 191–205).
  • Xiaoping Zou, Shi Zou & Xiaowei Wang, The Strategy of Constructing an Interdisciplinary Knowledge Center (2019).

Classics, historical contextuality, and the humanities track

Adopted under the W6 humanities ingestion wave, following Clarke (2024):

  • Peter J. Ahrensdorf, The Question of Historical Context and the Study of Plato (Polity 27(1), 1994, pp. 113–135). The model for distinguishing timeless from period-specific claims within a single classical source.
  • Alan R. How, Hermeneutics and the 'Classic' Problem in the Human Sciences (History of the Human Sciences 24(4), 2011, pp. 47–63). Gadamerian hermeneutics on classics as dual artifacts — historical and perennially insightful.
  • Andrew Irwin, Historical Case Studies: Teaching the Nature of Science in Context (Science Education 84(1), 2000, pp. 5–26). The historicized study of scientific knowledge — adopted as the methodological discipline for the humanities track.
  • Robyn Evans & Sarah Midford, Teaching Historical Literacies to Digital Learners via Popular Culture (Arts and Humanities in Higher Education 21(3), 2021, pp. 285–301).
  • Confucius, Analects; Vyasa (attrib.), Mahabharata and Bhagavad Gita; Valmiki (attrib.), Ramayana; Anonymous, Epic of Gilgamesh; Murasaki Shikibu, The Tale of Genji; Homer, Iliad and Odyssey; Plato, Republic; Cicero, De Officiis; Marcus Aurelius, Meditations; the African oral traditions; the Aboriginal Australian storytelling traditions. These are target sources for ingestion under W6, not commentary references — the omnology will carry them as first-class entries.

Student–teacher relational dynamics

Grounds the platform's HITL philosophy in the educational research tradition:

  • Li Jun, On the Construction of a Good Teacher-Student Relationship (2010).
  • Lauren Boyles, Educational Teamwork: Making Lifelong Changes Through Student & Teacher Collaboration (2018).
  • Arthur J. Cropley, Lifelong Education and the Training of Teachers (Pergamon, 1977). The earliest sustained argument that lifelong learning requires lifelong teaching, including teachers continuously learning themselves.

Ontology engineering references

Category theory — the formal substrate of cross-disciplinary pattern

  • Samuel Eilenberg & Saunders Mac Lane, General Theory of Natural Equivalences (Transactions of the AMS, 1945). The founding paper. Introduces categories, functors, and natural transformations as the language for stating "structure-preserving" rigorously.
  • Saunders Mac Lane, Categories for the Working Mathematician (Springer, 2nd ed., 1998). The standard graduate reference.
  • Steve Awodey, Category Theory (Oxford, 2nd ed., 2010). The accessible, formal introduction we cite as our practitioner reference.
  • Bird & de Moor, Algebra of Programming (Prentice Hall, 1997). Catamorphisms, anamorphisms, and the categorical structure of program transformation. Anchors ADR-033.
  • David I. Spivak, Category Theory for the Sciences (MIT Press, 2014). The single most readable bridge from category theory to scientific knowledge representation; explicit treatment of ontologies as categories and ologs as a graphical syntax. Recommended starting point for any omnologist or scientist who wants the categorical view of cross-domain pattern.
  • Brendan Fong & David I. Spivak, An Invitation to Applied Category Theory: Seven Sketches in Compositionality (Cambridge, 2019). Open access. Covers databases, signal flow, networks, and resource theories through the categorical lens — directly relevant to the platform's multi-domain ingestion.
  • Robert Rosen, Life Itself (Columbia, 1991); Essays on Life Itself (Columbia, 2000). Categorical biology; modeling relations and (M,R)-systems. The single most direct prior attempt to use category theory as the formal language of cross-disciplinary biology, with implications Bloom's omnology gestures at empirically.

Internal references


Acknowledgement of the moment, and of the lineage

This ADR is consciously authored as a strategic inflection — the moment the platform turns from a self-reasoning system into a system that learns from the structured record of human knowledge across all fields. The operator has the corpus to feed it. The fleet has the substrate to absorb it. The science domains are the obvious next ontological frontier. From this ADR forward, the platform reads the literature with us — under sift-loop governance, with PROV-O provenance, materialized through the forge, surfaced as capability.

On naming. We adopt the word omnology in the sense Howard Bloom has articulated it for decades — the deliberate hunt for the patterns across the sciences, the deep commonalities of the theorems. We are honoring the term by giving it computational substance, not appropriating it. To the broader community of omnologists, consilience-minded scientists, transdisciplinary scholars, generalists, complexity-thought practitioners, and applied category theorists — including, gratefully, Howard himself and the thousands of scientists in his circle — this platform is offered as one implementation of the omnological project, with the hope that it serves the long inquiry rather than enclosing it. The omnology will be open for reading. Its open questions will be sift-loop slots. Its citations will credit the lineage prominently. The work of unifying knowledge does not belong to any one tool or any one team; we are simply trying to be useful.

The bridge we propose. Bloom showed, by reading widely for decades, that the patterns are there. Eilenberg and Mac Lane showed, by axiomatic mathematics, that there is a rigorous language for what "the same pattern" means. We claim — and this ADR commits the platform to — that these two ends meet: every pattern an omnologist notices informally is a functor candidate waiting to be formalized; every formal categorical equivalence is a cross-disciplinary insight waiting to be cited from the literature that grounded it. The omnology is the place those two streams converge into a single queryable structure.

Science as fuel. Category theory as the formal substrate. DSRP as the cognitive scaffolding. BFO and UFO as the foundational grounds. The classics — Western and global — as the historical-contextual heritage. The omnology as the meta-ontology that holds it all together under foundations-as-perspectives projection. The omnologist — human and computational — as the role that grows it. The student–teacher relational dynamic as the model for human-machine collaboration. The platform as the learner.

And the lineage as the shoulders we stand on: from Leibniz, Diderot, and Confucius — through Eilenberg and Mac Lane, Rosen and Spivak — through Stahl, Hmelo-Silver, and the collaborative-knowledge-building tradition — through the Cabreras' DSRP — through Barry Smith and Giancarlo Guizzardi on BFO and UFO — through Fuller, Wilson, Morin, Nicolescu — through Howard Bloom — through Nicky Clarke's Our Children's Education (2024) which named the integration this ADR formalizes — and through every reader, every teacher, every scientist, every student who has noticed that the same shape keeps coming back across the disciplines and across the cultures.