
Persistence & Time

A research/reference page for how the untool.ai platform persists data, how it stamps time, and how it lets the same facts be re-shaped for many access patterns without ever rewriting the source of truth.

This page is the persistence + time companion to the ontology and coordination pages. It covers the temporal stance (ADR-042), the warehouse methodology (Data Vault 2.1, ADR-026), the wire format (Apache Arrow, ADR-009), the pace-layered projection model (ADR-041), the unified process-and-time architecture (ADR-038), and the hybrid object query system (ADR-055) — plus the supporting standards: Snodgrass bitemporal SQL, ISO SQL:2011 system-versioned tables, Kimball dimensional modeling, Arrow, Parquet, and W3C PROV-O.

One-line summary

Stamp every record once at write time, store it in whichever shape the access pattern needs, and let pace-layered projections rebuild the fast layers from the slow source of truth. Time is a contract, not a store; the data vault is the slow layer; information marts and the operational graph are the fast layers.


1. The temporal stance — "stamp first, store by access pattern"

ARC-ADR-042 is the supreme law of time on this platform. It answered an audit question that surfaces on every platform that grows past one database: "now that we have a time standard, do we need a dedicated time-series engine?" The answer was a polite, evidence-backed no.

1.1 The reframe

"This was never a time-series store decision. It is a time-stamping decision." — the five-seat panel that ratified ADR-042 (2026-05-30)

A time standard exists to decouple order from storage. Once every record carries a trustworthy temporal envelope at the moment it is born, the choice of where the bytes live becomes a cheap two-way door. The store may change; the time stays correct.

1.2 The trade-space — bitemporal as the baseline

The intellectual backbone is the bitemporal model from Richard Snodgrass's Developing Time-Oriented Database Applications in SQL (Morgan Kaufmann, 1999) and the earlier work of C. J. Date, Hugh Darwen, and Nikos Lorentzos in Temporal Data and the Relational Model (Morgan Kaufmann, 2003). Two axes are kept separate and never collapsed:

| Axis | Meaning | Synonyms |
|---|---|---|
| Transaction time | When the system knew the fact (the row's birth, never edited) | system time, recorded-at |
| Valid time | When the fact is true in the world (independent of when we learned it) | application time, business time, world time |

This baseline is now codified in ISO SQL:2011 as system-versioned tables (transaction time) and application-time period tables (valid time). Mainstream engines implement only subsets. SQL Server supports system-versioned tables natively (SYSTEM_VERSIONING = ON). PostgreSQL is typically used with the tstzrange + EXCLUDE … WITH && pattern to emulate valid-time tables.
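The sketch below shows what keeping the two axes apart buys: an "as-of" lookup that answers "what did the system believe at transaction time T about the world at valid time V?". It is an in-memory illustration, not the platform's ledger code. The row shape reuses the `valid_from` / `valid_to` / `recorded_at` / `superseded_at` property names from section 4.3. Treating every interval as half-open, with `None` as an open end, is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Assertion:
    key: str
    value: str
    valid_from: datetime               # valid time: the fact starts being true
    valid_to: Optional[datetime]       # None = still true in the world
    recorded_at: datetime              # transaction time: the system learned it
    superseded_at: Optional[datetime]  # None = still the system's current belief


def _within(point: datetime, start: datetime, end: Optional[datetime]) -> bool:
    """Half-open interval test: start <= point < end, where end=None is open."""
    return start <= point and (end is None or point < end)


def as_of(rows: Iterable[Assertion], key: str,
          valid_at: datetime, known_at: datetime) -> Optional[str]:
    """Value of `key` at valid time `valid_at`, as believed at transaction time `known_at`."""
    for row in rows:
        if (row.key == key
                and _within(valid_at, row.valid_from, row.valid_to)
                and _within(known_at, row.recorded_at, row.superseded_at)):
            return row.value
    return None


def june(day: int) -> datetime:
    return datetime(2026, 6, day)


rows = [
    # Recorded on June 1: the June price is 10.
    Assertion("price", "10", june(1), datetime(2026, 7, 1), june(1), june(10)),
    # Recorded on June 10: a correction, the June price was 12 all along.
    Assertion("price", "12", june(1), datetime(2026, 7, 1), june(10), None),
]

print(as_of(rows, "price", valid_at=june(15), known_at=june(5)))   # 10 (before the correction)
print(as_of(rows, "price", valid_at=june(15), known_at=june(20)))  # 12 (after the correction)
```

If the two axes were collapsed into one timestamp, the first question could not be answered: the correction would overwrite what the system believed on June 5.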

ADR-042 layers two more axes on top of the bitemporal pair for distributed-system honesty:

  • Decision time — when a human or agent made the judgment that produced this assertion (separate from when we recorded it).
  • Process time — when a workflow step executed (separate again — DBOS-durable runs inherit a different clock than the actor that scheduled them).

A Hybrid Logical Clock (HLC, from Kulkarni et al., Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases, 2014) gives causal order across containers without requiring tight wall-clock sync. NTP/chrony is the physical baseline; the HLC is the ordering contract.
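The HLC update rules from Kulkarni et al. fit in a few lines. This sketch keeps each timestamp as a (physical, counter) pair, with the physical component in milliseconds. The wall clock is injectable so that skew can be simulated. The class and method names are hypothetical, and the node ID that the platform's tag also carries is omitted here.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (physical component in ms, logical counter)


def wall_clock_ms() -> int:
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    """HLC update rules after Kulkarni et al. (2014); node ID omitted."""

    def __init__(self, now_ms: Callable[[], int] = wall_clock_ms) -> None:
        self._now_ms = now_ms  # injectable so tests can simulate skewed clocks
        self.physical = 0      # largest physical time observed so far
        self.counter = 0       # logical counter; resets when `physical` advances

    def tick(self) -> Timestamp:
        """Stamp a local or send event."""
        previous = self.physical
        self.physical = max(previous, self._now_ms())
        self.counter = self.counter + 1 if self.physical == previous else 0
        return (self.physical, self.counter)

    def receive(self, remote: Timestamp) -> Timestamp:
        """Merge the timestamp carried by an incoming message and stamp the receive."""
        remote_physical, remote_counter = remote
        previous = self.physical
        self.physical = max(previous, remote_physical, self._now_ms())
        if self.physical == previous == remote_physical:
            self.counter = max(self.counter, remote_counter) + 1
        elif self.physical == previous:
            self.counter += 1
        elif self.physical == remote_physical:
            self.counter = remote_counter + 1
        else:
            self.counter = 0
        return (self.physical, self.counter)


# Node B's wall clock runs 50 ms behind node A's.
node_a = HybridLogicalClock(now_ms=lambda: 1_000)
node_b = HybridLogicalClock(now_ms=lambda: 950)

sent = node_a.tick()             # A stamps a send event
received = node_b.receive(sent)  # B stamps the receive
print(sent, received)            # (1000, 0) (1000, 1)
```

B's own clock reads 950, yet the receive is still stamped after the send. That is the property that stops skew from inverting causation.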

Why bitemporal first, store choice second

Wall-clock skew can invert causation permanently. If you persist before the HLC seam is wired, every row carries skew-prone time forever and the audit trail is unrecoverable. The one near-irreversible move in ADR-042 is wiring the stamp before the first persist. Everything else is reversible.

1.3 Three classes, three homes

ADR-042 splits "time data" into three classes that are routinely conflated, and routes each to the store that already owns its access pattern:

| Class | What | Home | Why |
|---|---|---|---|
| A: Ops time | metrics, span latency, skew SLI | OTel / Prometheus / Tempo / Loki (ARC-ADR-010) | This is a TSDB. Never a domain store. |
| B: Semantic bitemporal ledger | pinned elements, relators, valid + transaction time | append-only ledger behind the IPinStore / ISerializationClock seam | The actual contested store. Decided on a measured trigger. |
| C: Value/drift telemetry | pace-layer ICEs (Information Content Entities), drift notes | NATS JetStream log → materialized view | An ordered causal event log. |

The crucial discipline is the anti-dual-write invariant: the temporal envelope is written once by the stamping authority; every other store copies only the slice it owns and never back-writes another store's slice. This is enforced as a contract row, not folklore.

The append-only invariant is enforced with an insert-only DB role, not developer discipline. Every world-time assertion is written once, never UPDATEd, and closed by superseded_at. Violate it once and the audit trail is unrecoverable.
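The insert-only rule raises a practical question: if stored rows can never be UPDATEd, how does a row get its superseded_at? This sketch assumes one answer, which is not taken from ADR-042. It derives superseded_at the way Data Vault satellites commonly derive a virtual end date: the next assertion for the same key closes the previous one. The class is a hypothetical in-memory stand-in for the ledger, and its only write path is `append()`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)  # frozen: a stored assertion cannot be mutated in place
class Assertion:
    key: str
    value: str
    recorded_at: datetime  # transaction time, stamped once by the stamping authority


class AppendOnlyLedger:
    """Insert-only store: deliberately no update() or delete() method."""

    def __init__(self) -> None:
        self._rows: List[Assertion] = []

    def append(self, row: Assertion) -> None:
        if self._rows and row.recorded_at < self._rows[-1].recorded_at:
            raise ValueError("transaction time must not go backwards")
        self._rows.append(row)

    def history(self, key: str) -> List[Tuple[Assertion, Optional[datetime]]]:
        """Pair each assertion with a derived superseded_at.

        superseded_at is the recorded_at of the next assertion for the same key,
        so closing a version never touches the stored row.
        """
        versions = [r for r in self._rows if r.key == key]
        closers: List[Optional[datetime]] = [r.recorded_at for r in versions[1:]]
        closers.append(None)  # the latest version is still open
        return list(zip(versions, closers))


ledger = AppendOnlyLedger()
ledger.append(Assertion("customer-42/tier", "silver", datetime(2026, 6, 1)))
ledger.append(Assertion("customer-42/tier", "gold", datetime(2026, 6, 10)))

for version, superseded_at in ledger.history("customer-42/tier"):
    print(version.value, version.recorded_at.date(), superseded_at)
```

In the database, the same guarantee comes from the insert-only role. The missing methods here are only the in-memory analogue.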


2. Data Vault 2.1 — the slow layer of the warehouse

ARC-ADR-026 adopts Data Vault 2.1 (Daniel Linstedt, Michael Olschimke, Building a Scalable Data Warehouse with Data Vault 2.0, Morgan Kaufmann 2015 + the 2.1 supplement from the Data Vault Alliance) as the enterprise warehouse methodology for every analytics-bearing spoke. The methodology is loader-agnostic; the reference loader is dbt + Datavault4dbt.

2.1 The three layers

SOURCE SYSTEMS (CRM, billing, events, files, APIs)
       │  ELT  (dlt, Fivetran, Kafka Connect, custom)
       ▼
┌─────────────────┐
│   Raw Vault     │ insert-only · history-preserving · hash-keyed
│  hubs/links/sat │ load_date + record_source on every row
└────────┬────────┘
         │ business rules in dbt models
         ▼
┌─────────────────┐
│ Business Vault  │ same DV constructs · _bv suffix
│ same-as · PIT   │ computed sats · bridge tables
└────────┬────────┘
         │ virtualize where possible, materialize where SLAs demand
         ▼
┌─────────────────┐
│ Information     │ star · snowflake · OBT · graph
│ Marts           │ shaped per consumer
└─────────────────┘

2.2 The three constructs

| Construct | Holds | Audit columns | Hash columns |
|---|---|---|---|
| Hub | a unique business key (e.g. customer_id) and the moment it first appeared | load_date, record_source | `<hub>_hk` (SHA-256 of the canonical key) |
| Link | the association between two or more hubs (e.g. order_customer) | load_date, record_source | `<link>_hk` (SHA-256 of concatenated hub keys) |
| Satellite | descriptive context for a hub or link, change-tracked over time | load_date, load_end_date, record_source, hash_diff | `<sat>_hk` (inherits parent), `<sat>_hd` (hash of payload) |

A hub is the noun. A link is the verb. A satellite is the adjective stream over time. That separation — and only that separation — gives the warehouse its three superpowers: schema-on-write integration, full audit by default, and parallel idempotent loads.
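The noun / verb / adjective split can be made concrete with a hypothetical sketch that turns one source order row into two hub rows, one link row and one satellite row. Normalization is simplified here: trailing whitespace is trimmed and the business key is upper-cased. The full rules are in section 3.2. This sketch reads "concatenated hub keys" as the participating hubs' business keys joined with `||`. It leaves out hash_diff and load_end_date.

```python
import hashlib
from datetime import datetime, timezone


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def business_key(raw: str) -> str:
    # Simplified normalization: trim trailing whitespace, upper-case.
    return raw.rstrip().upper()


def split_order_record(rec: dict, record_source: str, load_date: datetime) -> dict:
    """Split one source row into hub (noun), link (verb) and satellite (adjective) rows."""
    customer_bk = business_key(rec["customer_id"])
    order_bk = business_key(rec["order_id"])
    customer_hk, order_hk = sha256_hex(customer_bk), sha256_hex(order_bk)
    audit = {"load_date": load_date, "record_source": record_source}
    return {
        "hub_customer": {"customer_hk": customer_hk, "customer_id": customer_bk, **audit},
        "hub_order": {"order_hk": order_hk, "order_id": order_bk, **audit},
        "link_order_customer": {
            "order_customer_hk": sha256_hex(customer_bk + "||" + order_bk),
            "customer_hk": customer_hk,
            "order_hk": order_hk,
            **audit,
        },
        # Descriptive context hangs off the order hub and changes over time.
        "sat_order_details": {"order_hk": order_hk, "status": rec["status"],
                              "amount": rec["amount"], **audit},
    }


rows = split_order_record(
    {"customer_id": "c-1001 ", "order_id": "o-77", "status": "open", "amount": "19.99"},
    record_source="crm",
    load_date=datetime(2026, 6, 6, tzinfo=timezone.utc),
)
print(rows["link_order_customer"]["order_customer_hk"][:16])
```

Because the keys are hashes of business keys, a second loader that sees "C-1001" from billing computes the same customer_hk without any coordination.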

2.3 Raw vault vs business vault

  • Raw vault is sacred. Insert-only. Never edited. Never deleted. It is the integration layer — its job is to faithfully record what each source said, when it said it, exactly as it was said. No business rules. No transforms beyond hash-keying and column passthrough.
  • Business vault is where interpretation lives. Same hub/link/sat constructs with a _bv suffix, but the columns are computed. Same-as links resolve identity. PIT (Point-In-Time) tables snapshot the "current as of T" state. Bridge tables flatten link-chains for query.

This split is the warehouse equivalent of ADR-041's pace separation: raw vault is the slow layer (changes only when the source schema changes), business vault is the medium-fast layer (changes when interpretation changes), information marts are the fast layer (change when the consumer's question changes).
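A PIT table stores, per hub key and snapshot date, a pointer (the load_date) into each satellite. A "current as of T" query then becomes a set of equi-joins instead of a max(load_date) search per satellite. Below is a hypothetical in-memory sketch that computes one PIT row. The satellite names and row shape are illustrative.

```python
from datetime import datetime
from typing import Dict, List, Optional


def pit_row(hub_hk: str, snapshot: datetime,
            satellites: Dict[str, List[dict]]) -> Dict[str, object]:
    """Build one PIT row: for each satellite, the load_date of the version of
    `hub_hk` that was current at `snapshot`, or None if nothing had loaded yet."""
    row: Dict[str, object] = {"hub_hk": hub_hk, "snapshot_date": snapshot}
    for sat_name, sat_rows in satellites.items():
        loaded: List[datetime] = [r["load_date"] for r in sat_rows
                                  if r["hk"] == hub_hk and r["load_date"] <= snapshot]
        latest: Optional[datetime] = max(loaded, default=None)
        row[f"{sat_name}_load_date"] = latest
    return row


satellites = {
    "sat_customer_crm": [
        {"hk": "h1", "load_date": datetime(2026, 6, 1), "tier": "silver"},
        {"hk": "h1", "load_date": datetime(2026, 6, 15), "tier": "gold"},
    ],
    "sat_customer_billing": [
        {"hk": "h1", "load_date": datetime(2026, 6, 10), "plan": "annual"},
    ],
}

print(pit_row("h1", datetime(2026, 6, 12), satellites))
```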

2.4 Why DV fits a sandboxed-spoke fleet

The fleet's spokes are sandboxed — each runs in its own container with no shared filesystem, no cross-mount, and writes only through contracts. Data Vault's parallelism is the exact match: every producer writes its own satellite for its hub/link participation, keyed by SHA-256 of the business key. There is no shared sequence, no surrogate-key coordination, no write-side contention. Two spokes loading "the same" customer record from two different source systems land in two different satellites attached to the same hub. Identity reconciliation happens later, in the business vault, via same-as links.

Same-as links — the identity-resolution primitive

A same_as_link records the assertion "hub X in CRM and hub Y in billing are the same real-world entity." It is itself a hub-pair with a satellite carrying the rule that produced the assertion, a confidence, and a valid-time window. Identity is declared, not assumed, and the declaration is itself bitemporal.

2.5 Streaming as a first-class load shape

DV 2.1 (vs 2.0) formalizes streaming. Late-arriving keys, micro-batch and continuous loads, and idempotency under retry are first-class methodology concerns. The platform's NATS JetStream bus (ARC-ADR-022) is the streaming substrate; CloudEvents v1.0 is the envelope; an idempotency key (the message ID) plus the hub's hash key together make every load deterministic on replay.
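This sketch shows why the message ID plus the hub hash key makes replay deterministic. Hub rows stay unique per business key. Satellite rows are recorded per (message id, hub key) pair, and repeats are skipped, so re-delivering a batch after a retry adds nothing. The events use the CloudEvents v1.0 attribute names (`id`, `source`, `specversion`, `type`, `data`). The payload fields, the class, and the in-memory set standing in for a durable dedupe store are all assumptions of this sketch.

```python
import hashlib
from typing import Dict, Iterable, List, Set, Tuple


class IdempotentLoader:
    """Hypothetical hub + satellite loader that is safe to replay."""

    def __init__(self) -> None:
        self.hub: Dict[str, str] = {}               # hub hash key -> business key (unique)
        self.applied: Set[Tuple[str, str]] = set()  # (CloudEvents id, hub hash key)
        self.sat_rows: List[dict] = []

    def load(self, events: Iterable[dict]) -> int:
        """Apply a batch; return how many new satellite rows were inserted."""
        inserted = 0
        for event in events:
            business_key = event["data"]["customer_id"].rstrip().upper()
            hub_hk = hashlib.sha256(business_key.encode("utf-8")).hexdigest()
            self.hub.setdefault(hub_hk, business_key)  # a hub holds each key once
            dedupe_key = (event["id"], hub_hk)
            if dedupe_key in self.applied:
                continue  # a redelivery or replay: already applied
            self.applied.add(dedupe_key)
            self.sat_rows.append({"customer_hk": hub_hk, "source_event_id": event["id"],
                                  "email": event["data"]["email"]})
            inserted += 1
        return inserted


def customer_event(event_id: str, customer_id: str, email: str) -> dict:
    return {"specversion": "1.0", "id": event_id, "source": "crm",
            "type": "customer.upserted", "data": {"customer_id": customer_id, "email": email}}


batch = [
    customer_event("evt-1", "c-1001", "a@example.com"),
    customer_event("evt-2", "c-1002", "b@example.com"),
    customer_event("evt-3", "c-1001", "a2@example.com"),
]

loader = IdempotentLoader()
print(loader.load(batch))  # 3: first delivery
print(loader.load(batch))  # 0: replaying the same batch changes nothing
```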


3. Hash keys & hash diffs

tools/data-vault/hash.mjs and its Python sibling implement the canonical hashing rules. Hash-based business keys are the rule, not surrogate sequences, for four reasons:

  1. Parallel load. A SHA-256 of the canonical business key is computable independently in every loader, on every node, with no central coordination. Surrogate sequences require a lock-step counter, which is the antithesis of horizontal scale.
  2. Idempotency. Reloading the same row produces the same hash, so dedupe is implicit. With sequences, replay creates a new surrogate every time and you need an out-of-band reconcile.
  3. Cross-system referential integrity. Two systems hashing the same business key produce the same hub key, with no negotiation. Sequences can't do this without a central registry.
  4. Reproducibility. A reload from cold storage produces the same warehouse, byte for byte (modulo load_date). With sequences, the warehouse is path-dependent.

3.1 SHA-256 over MD5

DV 2.0 sometimes specified MD5. DV 2.1 (and ADR-026) require SHA-256:

  • MD5 is cryptographically broken (collision attacks since 2004). For business keys this rarely matters in practice, but the cost differential is tiny and the collision surface matters at planet-scale warehouses.
  • A SHA-256 hex digest is 64 characters, exactly 2× MD5's 32. On a billion-row hub the extra 32 bytes per row come to about 32 GB (10⁹ × 32 B), which on cloud storage is rounding error.
  • All target warehouses (Snowflake, Postgres, BigQuery, Databricks) have a native SHA-256. No vendor lock-in.

3.2 Canonical ordering, normalization, separator, null sentinel

A hash is only as deterministic as its input. ADR-026's hashing rules:

| Rule | Value |
|---|---|
| Field order | Canonical (alphabetical by attribute name, source-system-independent) |
| Unicode | NFC normalization before hashing |
| Separator between fields | `\|\|` (double pipe; configurable but consistent within a model) |
| Null sentinel | `^^` (configurable; never the empty string, which is a valid value) |
| Trimming | Trailing whitespace trimmed; leading whitespace preserved |
| Casing | UPPER() on business keys; payload casing preserved |
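Here is a Python sketch of these rules applied to a hub or link key. It is not a transcription of tools/data-vault/hash.mjs or its Python sibling. The separator and null sentinel take the default values from the table, and applying NFC after upper-casing is this sketch's own choice.

```python
import hashlib
import unicodedata
from typing import Mapping, Optional

SEPARATOR = "||"
NULL_SENTINEL = "^^"


def _canonical(value: Optional[str], upper: bool) -> str:
    """Normalize one field value according to the hashing rules."""
    if value is None:
        return NULL_SENTINEL            # never "", which is a legitimate value
    text = value.rstrip()               # trailing whitespace only; leading is kept
    if upper:
        text = text.upper()             # business keys are case-insensitive
    return unicodedata.normalize("NFC", text)


def hash_key(business_key: Mapping[str, Optional[str]]) -> str:
    """SHA-256 hash key over the business-key fields in alphabetical order."""
    parts = [_canonical(business_key[name], upper=True) for name in sorted(business_key)]
    return hashlib.sha256(SEPARATOR.join(parts).encode("utf-8")).hexdigest()


# Two sources spelling the same key differently converge on one hub key:
# different field order, case, trailing space, and decomposed vs precomposed "É".
crm = hash_key({"customer_id": "Cafe\u0301-1 ", "tenant": "eu"})
billing = hash_key({"tenant": "EU", "customer_id": "CAF\u00c9-1"})
print(crm == billing)  # True
```

Python's `str.upper()` follows Unicode case mapping, which can differ from a given warehouse's UPPER() for some characters. A cross-engine implementation has to pin that behaviour down.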

The hash_diff on a satellite is the SHA-256 of the payload (descriptive columns), so a satellite load can decide "is this a new version?" in a single hash comparison. Cheap on every warehouse engine; portable across them.
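The "is this a new version?" check can be sketched as a satellite loader. It compares the incoming payload's hash_diff with the latest stored version for the same parent key. Several things here are assumptions: the in-memory list, the field names, and the simplified normalization (trailing trim, `^^` null sentinel, `||` separator, NFC omitted). Out-of-order and late-arriving loads are not handled.

```python
import hashlib
from datetime import datetime
from typing import Dict, List, Optional


def hash_diff(payload: Dict[str, Optional[str]]) -> str:
    """SHA-256 over the descriptive columns in alphabetical order; casing preserved."""
    parts = ["^^" if payload[name] is None else str(payload[name]).rstrip()
             for name in sorted(payload)]
    return hashlib.sha256("||".join(parts).encode("utf-8")).hexdigest()


def load_satellite(sat: List[dict], parent_hk: str,
                   payload: Dict[str, Optional[str]], load_date: datetime) -> bool:
    """Append a new satellite version only when the payload changed."""
    incoming = hash_diff(payload)
    current = max((row for row in sat if row["parent_hk"] == parent_hk),
                  key=lambda row: row["load_date"], default=None)
    if current is not None and current["hash_diff"] == incoming:
        return False  # identical to the current version: nothing to record
    sat.append({"parent_hk": parent_hk, "load_date": load_date,
                "hash_diff": incoming, "payload": dict(payload)})
    return True


sat: List[dict] = []
print(load_satellite(sat, "h1", {"email": "a@example.com", "tier": "gold"},
                     datetime(2026, 6, 1)))  # True: first version
print(load_satellite(sat, "h1", {"email": "a@example.com", "tier": "gold"},
                     datetime(2026, 6, 2)))  # False: unchanged payload
print(load_satellite(sat, "h1", {"email": "a@example.com", "tier": "platinum"},
                     datetime(2026, 6, 3)))  # True: tier changed
```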


4. ArcadeDB — multi-model engine for the platform plane

The platform plane (the slow layer for live operational reads) runs on ArcadeDB, the multi-model engine that speaks document + graph + key-value + time series + vector in one process. The fleet runs it locally (Docker container in templates/local-stack/) and on Azure ACI (rg-arcadedb-test).

4.1 Why one multi-model engine, not three specialist engines

The naive instinct is: Neo4j for graph, Postgres for relational, Pinecone (or pgvector) for vectors. ADR-026 and ARC-ADR-055 both push back on this for the platform plane:

  • Cross-store joins are the dominant query pattern for an agent stack. A search like "give me the documents semantically similar to X, in the same project, authored by anyone who has reviewed Y" spans vector + graph + relational. Three engines means three round-trips, three transaction boundaries, three failure modes.
  • Operational tier budget. Per ARC-ADR-023, every stateful store is backup / HA / upgrade / monitoring surface. Three stores is three upgrade cycles and three on-call runbooks. One is one.
  • Schema congruence. Multi-model means one schema (the ontology's RDF/LPG projection) maps to one engine. Multiple engines means multiple shadow schemas drift apart.

ArcadeDB's trade-off: it is good enough at every model, best in class at none. That trade is correct for the platform plane (the operational read-and-write layer for live agents). For the analytics plane, the data vault sits on a real OLAP warehouse (Snowflake / BigQuery / Databricks), and that's where star-schema marts get materialized for BI.

4.2 ADR-055 — the hybrid object query system

ARC-ADR-055 decided the read seam: the Universal Data Adapter (UDA) orchestrates a three-phase retrieval pipeline:

  1. Vector phase — semantic similarity over Chunk elements in the vector store.
  2. Graph phase — context expansion (e.g. fetch the chunk's Document, its parent Repository, related Decision references).
  3. Relational/axiomatic phase — security clearance, project scope, SHACL/OWL constraints.

The three result sets are merged via Reciprocal Rank Fusion (RRF, Cormack et al. 2009) and hydrated into IHyperElement Object Model instances. Agents never call databases directly; they invoke Search(QueryString) on the UDA and receive a fully entity-resolved graph sub-network.
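Reciprocal Rank Fusion needs only each phase's ranked list of IDs, not comparable scores, which is why it suits merging vector, graph and relational results. Below is a minimal sketch. `k = 60` is the constant used by Cormack et al. (2009); whether the UDA uses the same constant is not stated here. The chunk IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank(d)), ranks from 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)  # absent from a list: contributes nothing
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


vector_hits = ["c3", "c1", "c7"]   # semantic-similarity order
graph_hits = ["c1", "c9"]          # context-expansion order
relational_hits = ["c1", "c3"]     # after clearance / scope filtering
print(reciprocal_rank_fusion([vector_hits, graph_hits, relational_hits]))  # ['c1', 'c3', 'c9', 'c7']
```

c1 wins because it appears near the top of all three lists, even though the vector phase alone ranked c3 first.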

4.3 Multi-model also means multi-temporal

Time fits ArcadeDB's mixed-mode posture: the bitemporal ledger (Class B from ADR-042) lives as vertices with valid_from / valid_to / recorded_at / superseded_at properties, with relator vertices (ARC-ADR-016) carrying time-indexed roles. Time-series buckets are not used — that was an explicit ADR-042 decision. ArcadeDB's TS-bucket feature is its least-proven; metrics belong in Prometheus.


5. Apache Arrow — the canonical wire format

ARC-ADR-009 settled the type vocabulary across the Universal Data Adapter: every connector normalizes to Apache Arrow record batches at the boundary.

5.1 Why Arrow over JSON or Protobuf

| Dimension | Arrow | JSON | Protobuf |
|---|---|---|---|
| Representation | Columnar | Row (text) | Row (binary) |
| Zero-copy reads | Yes | No | No (deserialization required) |
| Cross-language | C++, Rust, Python, Java, Go, JS, C#, Julia | Universal text | Code-gen per language |
| Decimal / temporal precision | First-class (decimal128, timestamp(unit, tz)) | Lossy (strings) | Schema-dependent |
| Nested / repeated types | First-class (struct, list, map) | Native | Schema-dependent |
| Analytical scan cost | Optimal (vectorized) | Worst | Middling |
| Streaming chunks | Native (RecordBatch) | Manual framing | Manual framing |

The decisive driver is analytical workload shape. The UDA's primary load is bulk reads from BigQuery, Snowflake, Postgres, and Parquet object stores. JSON imposes row-by-row parse and string-typed decimals/timestamps; Protobuf imposes per-message deserialize. Arrow's columnar batches are read once, scanned vectorized, and shipped over Arrow Flight RPC with zero copy.
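The precision row of the table can be checked with the Python standard library alone. By default, a JSON number with a fractional part decodes to a binary float, so a high-precision decimal silently loses digits. Avoiding that requires every consumer to opt in to `Decimal`, or the producer to encode the value as a string. Arrow sidesteps the problem by declaring `decimal128(precision, scale)` in the schema. The amount below is an arbitrary example.

```python
import json
from decimal import Decimal

payload = '{"amount": 12345678901234567.89}'

# Default decoding turns every JSON number with a fraction into a binary float.
as_float = json.loads(payload)["amount"]
# Opting in to Decimal keeps every digit, but only if each consumer remembers to.
as_decimal = json.loads(payload, parse_float=Decimal)["amount"]

print(repr(as_float))
print(repr(as_decimal))
print(Decimal(as_float) == as_decimal)  # False: the float kept only ~17 significant digits
```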

5.2 Arrow + ADBC + dlt

The connector substrate is ADBC (Arrow Database Connectivity) — the Arrow-native answer to JDBC/ODBC. Drivers exist for Postgres, BigQuery, Snowflake, DuckDB, and SQLite, all returning Arrow record batches directly without an intermediate row-tuple layer.

For ingestion, dlt (Data Load Tool) is the framework of choice. dlt pipelines emit Arrow batches to staging, where the data vault loaders pick them up. The whole flow stays columnar from source to staging to raw vault.

5.3 Arrow + Parquet — the cold-storage twin

Apache Arrow's in-memory format is paired with Apache Parquet for on-disk storage. The two projects share many contributors and are designed to interoperate: Parquet column chunks decode directly into Arrow arrays without an intermediate row representation. The data vault's staging layer is typically Parquet in object storage (Azure Blob, S3, GCS). That gives cheap cold storage that the raw vault loader can scan vectorized at high throughput.


6. Pace-layered projection — Stewart Brand as architecture

ARC-ADR-041 borrows pace layering from Stewart Brand's The Clock of the Long Now (Basic Books, 1999), which extends the "shearing layers" idea from his earlier How Buildings Learn (Viking, 1994). The thesis: a healthy complex system has layers that change at different speeds. They are loosely coupled, so the fast layers can churn without destabilizing the slow ones.

| Pace | Layer | Cadence | Job |
|---|---|---|---|
| Slow | Canonical ontology (RDF + OWL + SHACL) | Quarterly | Meaning, rules, rigor |
| Slow | Raw data vault | Source-schema changes only | History, integration, audit |
| Medium | Business vault | Interpretation changes | Same-as resolution, computed sats |
| Medium | LPG CANON zone | Slow-layer changes ratchet down | Operational graph reads |
| Fast | Information marts | Per-question, per-sprint | BI, ML features, OBT |
| Fast | LPG FRONTIER zone | Operator-namespaced extensions | Operational metadata, drift telemetry |
| Faster | Trace + metric layer | Per-second | Observability |

6.1 Down-projection (slow → fast)

The canonical RDF is projected deterministically into the LPG CANON zone. Every LPG node carries the source iri; every LPG edge is the binarized projection of a relator vertex. The projection is asymmetrically lossy — binarized edges drop the time index — so all relation reasoning stays in RDF (and in ArcadeDB's relator vertices), never in the LPG shadow.

The information marts work the same way for the analytics plane: PIT tables, bridge tables, and star schemas are projections of the raw + business vault, materialized when an SLA demands it, virtualized otherwise. The marts can be rebuilt at any time from the vault; the vault never gets rebuilt from the marts.

6.2 Up-graduation (fast → slow) — the gated ratchet

The harder direction is up: when useful structure emerges in the fast layer (an operator adds an ops: property, a vector search consistently surfaces a cluster, a same-as link is proposed by a heuristic), can it graduate into canon?

ADR-041's answer is the two-stage graduation:

  1. Nominate (fast) — a statistical signal flags a candidate. Popularity, frequency, RRF rank — any fast measurement may nominate.
  2. Classify → dispose (slow) — only an ontological classifier (a slow-layer agent running the ARC-ADR-032 sift-sort loop) may decide which BFO/UFO category the candidate joins. Categories have disjointness axioms; mis-categorization is rejected, not silently flattened into "class."

This is the platform's enforcement of the propose-dispose maxim: operations propose meaning; the formal layer disposes; nothing auto-promotes. Popularity nominates; classification decides.


7. Unified process & time architecture (ADR-038)

ARC-ADR-038 is the binding layer. It fuses the four time axes (transaction, valid, decision, process), together with the HLC ordering tag, into one canonical temporal envelope that rides on every CloudEvent, every pin and every workflow step.

7.1 The five-axis envelope

```jsonc
{
  "transaction_time": "2026-06-06T12:34:56.789Z",  // when the system recorded it
  "valid_time": {                                   // when it is true in the world
    "from": "2026-06-01T00:00:00Z",
    "to":   "2026-06-30T23:59:59Z"
  },
  "decision_time": "2026-06-06T12:30:00Z",          // when the human/agent judged
  "process_time": "2026-06-06T12:34:55.000Z",       // when the workflow step ran
  "hlc": "2026-06-06T12:34:56.789Z|42|node-7"       // causal-order tag (HLC)
}
```

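The HLC tag combines the (physical timestamp, logical counter) pair from Kulkarni et al. (2014) with the originating node ID. Two events with identical physical components are still ordered by the counter, and the node ID breaks any remaining tie. The tags therefore form a total order that is consistent with causality: if A happened before B, A's tag sorts first (the converse need not hold). The counter is bounded because it resets whenever the physical component advances, so unlike a Lamport clock it does not grow without limit.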
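The counter in the `physical|counter|node` tag shown in the envelope is an unpadded decimal, so tags cannot be ordered as plain strings: "42" sorts before "7". This small sketch parses a tag into a tuple whose natural ordering is physical time, then counter, then node ID. The `HlcTag` and `parse_hlc` names are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class HlcTag(NamedTuple):
    # Field order defines tuple ordering: physical time, then counter, then node ID.
    physical: datetime
    counter: int
    node: str


def parse_hlc(tag: str) -> HlcTag:
    """Parse 'physical|counter|node' into a sortable HlcTag."""
    physical, counter, node = tag.split("|")
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    return HlcTag(datetime.fromisoformat(physical.replace("Z", "+00:00")), int(counter), node)


tags = [
    "2026-06-06T12:34:56.789Z|42|node-7",
    "2026-06-06T12:34:56.789Z|7|node-3",
    "2026-06-06T12:34:56.788Z|99|node-1",
]
for tag in sorted(tags, key=parse_hlc):  # counter 7 correctly precedes counter 42
    print(tag)
```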

7.2 Process time and event-sourcing

The "process time" axis is inspired by event-sourcing (Greg Young, Martin Fowler, Event Sourcing) and the CACAO 2.0 playbook standard (OASIS), along with BPMN 2.0 (OMG). A workflow step has a process time (when DBOS-durable execution actually ran it) that is distinct from the transaction time of the pin it produced and the decision time of the human/agent that scheduled it. Conflating them destroys the audit trail.

The fleet's durable runtime is DBOS Transact (ARC-ADR-018) — durable workflows with checkpoint/replay, durable queues, scheduled workflows, and list / cancel / resume / fork. BPMN/CACAO playbooks parse to one intermediate representation, executed by a ~400-line kernel inside DBOS (ARC-ADR-031).

7.3 Provenance as a primitive — W3C PROV-O

Time alone is not provenance. The platform encodes provenance per the W3C PROV-O standard (W3C Recommendation 2013), with OWL-Time (W3C Recommendation 2017) for temporal intervals. Every write to the bitemporal ledger emits PROV-O statements linking the activity (the workflow step), the agent (the human or AI) and the entity (the pin) to the temporal envelope. This is what makes "evidence as a primitive" (ADR-041) more than a slogan.
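This is a hypothetical sketch of the statements one ledger write might emit, serialized as N-Triples. The PROV-O terms used (`prov:Entity`, `prov:Activity`, `prov:Agent`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`, `prov:wasAttributedTo`, `prov:startedAtTime`, `prov:generatedAtTime`) are standard. Three choices are assumptions of this sketch: the example.org IRIs, mapping process time to `prov:startedAtTime`, and mapping transaction time to `prov:generatedAtTime`. OWL-Time intervals are left out.

```python
from typing import List

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(value: str) -> str:
    return f"<{value}>"


def timestamp(value: str) -> str:
    return f'"{value}"^^{iri(XSD_DATETIME)}'


def prov_statements(entity: str, activity: str, agent: str,
                    transaction_time: str, process_time: str) -> List[str]:
    """N-Triples lines tying one ledger write to its activity, agent and times."""
    triples = [
        (iri(entity), iri(RDF_TYPE), iri(PROV + "Entity")),
        (iri(activity), iri(RDF_TYPE), iri(PROV + "Activity")),
        (iri(agent), iri(RDF_TYPE), iri(PROV + "Agent")),
        (iri(entity), iri(PROV + "wasGeneratedBy"), iri(activity)),
        (iri(activity), iri(PROV + "wasAssociatedWith"), iri(agent)),
        (iri(entity), iri(PROV + "wasAttributedTo"), iri(agent)),
        # Assumed mapping: process time -> activity start, transaction time -> generation.
        (iri(activity), iri(PROV + "startedAtTime"), timestamp(process_time)),
        (iri(entity), iri(PROV + "generatedAtTime"), timestamp(transaction_time)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = prov_statements(
    entity="https://example.org/pin/42",
    activity="https://example.org/run/7/step/3",
    agent="https://example.org/agent/reviewer",
    transaction_time="2026-06-06T12:34:56.789Z",
    process_time="2026-06-06T12:34:55.000Z",
)
print("\n".join(lines))
```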


8. Information marts — the fast consumption layer

The vault's job is integration and history. Information marts shape that history for particular consumers. Four mart patterns are first-class in ADR-026:

8.1 Kimball star schema

Ralph Kimball's The Data Warehouse Toolkit (3rd ed., Wiley 2013) is the canon for dimensional modeling. A star schema is one fact table (the measure events) surrounded by dimension tables (the descriptive context, with slowly-changing-dimension Type 2 history). Fast for BI tools, easy for analysts, well-supported across every OLAP engine.

In a DV 2.1 architecture, the star is built from PIT and bridge tables in the business vault. The mart materializes when an SLA demands it; otherwise it stays a virtual view.
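For reference, the classic Kimball Type 2 update expires the current dimension row and appends a new one whenever a tracked attribute changes. This is a hypothetical in-memory sketch. The column names (`row_start`, `row_end`, `is_current`) and the 9999-12-31 open-end convention are common choices, not ADR-026 requirements. In the DV 2.1 layout above, the same history can instead be derived from satellite load dates. The in-place edit is tolerable only because a mart can be rebuilt from the vault.

```python
from datetime import date
from typing import Dict, List, Optional

OPEN_END = date(9999, 12, 31)  # conventional "still current" end date


def scd2_apply(dim: List[dict], natural_key: str,
               attrs: Dict[str, str], effective: date) -> bool:
    """Apply one source snapshot to a Type 2 dimension; return True if a row was added."""
    current: Optional[dict] = next(
        (row for row in dim if row["natural_key"] == natural_key and row["is_current"]),
        None,
    )
    if current is not None:
        if all(current.get(name) == value for name, value in attrs.items()):
            return False                # nothing changed: keep the current row
        current["row_end"] = effective  # expire the old version (mart only, never the vault)
        current["is_current"] = False
    dim.append({
        "surrogate_key": len(dim) + 1,  # mart-local surrogate; the vault uses hash keys
        "natural_key": natural_key,
        **attrs,
        "row_start": effective,
        "row_end": OPEN_END,
        "is_current": True,
    })
    return True


customer_dim: List[dict] = []
scd2_apply(customer_dim, "C-1001", {"tier": "silver"}, date(2026, 1, 1))
scd2_apply(customer_dim, "C-1001", {"tier": "gold"}, date(2026, 6, 1))
scd2_apply(customer_dim, "C-1001", {"tier": "gold"}, date(2026, 7, 1))  # no change
for row in customer_dim:
    print(row["surrogate_key"], row["tier"], row["row_start"], row["row_end"], row["is_current"])
```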

8.2 Snowflake schema

A snowflake is a star with normalized dimensions (dimension tables that point to sub-dimension tables, instead of flattened). Use it when a dimension is genuinely multi-level and the duplication cost of flattening exceeds the join cost of normalizing. In practice, BI tools prefer flat stars; snowflakes show up most often as an intermediate build step.

8.3 OBT — One Big Table

The OBT mart is one denormalized table with every column the consumer needs, joined and flattened at build time. It is a common pattern on modern columnar warehouses (Snowflake, BigQuery, Databricks). Columnar compression and partition pruning keep the storage overhead of denormalization small, and the join cost moves out of query time. ML feature stores especially favor OBT.

8.4 Graph projection

The fourth mart pattern is the graph projection: hubs become nodes, links become edges, satellites become time-indexed property streams. This mart materializes into the operational graph on the platform plane (ArcadeDB) and is the input to the LPG CANON zone (ADR-041).

8.5 Materialize vs virtualize

The methodology choice is not which mart shape to build (all four are used, as consumer demand warrants) but whether to materialize or virtualize each one:

| Choice | When | Cost |
|---|---|---|
| Virtualize (view) | Latency SLA permits compute on read; data volume moderate | Compute per query |
| Materialize | Latency SLA strict; query frequency high; data volume large | Storage + refresh job |

The default is virtualize. Materialize only when measurement says virtualization fails the SLA.


9. Vector + embedding storage

9.1 The embedder

The fleet's canonical embedder is Cohere embed-v-4-0 served via Azure AI Foundry (fndry-01 in subscription AASub1), producing 1536-dimensional dense vectors. The LLM Gateway (ARC-ADR-021) provides a single REST surface so spokes never hardcode a vendor SDK.

A local-embedder image is pinned at hub issue #184 — a CPU/NPU/iGPU local model serving embeddings without the Foundry round-trip, for offline development and for the platform plane when sub-100ms embedding latency matters.

9.2 Storing vectors — property vs sidecar

Two patterns coexist, chosen by access pattern:

Pattern A — vector as ArcadeDB property. The vector is stored as a LIST<FLOAT> or ARRAY<FLOAT> property on the node. Search is via ArcadeDB's HNSW index or via in-process cosine. Best for: small-to-medium corpora (≤ 1M vectors), tight graph-context queries (the node-and-its-vector come back in one read).

Pattern B — sidecar vector index (pgvector / Chroma). The vector lives in a dedicated ANN-optimized store, and only an opaque ID joins back to the graph. Best for: large corpora, ANN index types the graph engine lacks (e.g. pgvector's IVFFlat), and independent scale.

ADR-055's hybrid query system orchestrates both patterns under the same Search() seam. The UDA's query planner decides per query which substrate to hit, then merges via RRF. The caller never sees the topology.

9.3 Embedding versioning

A neglected but critical point: every stored vector is the output of a specific embedder version. When the embedder upgrades (Cohere ships v5; the local model retrains), the vector space shifts and old vectors are no longer comparable to new ones. The platform records the embedder version as a satellite attribute on the embedded entity, and re-embeds on a versioned schedule rather than mixing vector generations in the same index.
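This hypothetical sketch shows what "never mix vector generations" means in practice. Vectors are partitioned by embedder version, and a query is compared only with vectors from its own version. It uses brute-force cosine (Pattern A's in-process option) with 3-dimensional toy vectors in place of the 1536-dimensional Cohere output. `local-v1` is an invented version label.

```python
import math
from typing import Dict, List, Tuple


class VersionedVectorIndex:
    """Keeps one vector space per embedder version so generations never mix."""

    def __init__(self) -> None:
        self._spaces: Dict[str, Dict[str, List[float]]] = {}

    def add(self, embedder_version: str, item_id: str, vector: List[float]) -> None:
        self._spaces.setdefault(embedder_version, {})[item_id] = vector

    def search(self, embedder_version: str, query: List[float],
               k: int = 3) -> List[Tuple[str, float]]:
        """Brute-force cosine top-k within the query's own embedder version."""
        space = self._spaces.get(embedder_version)
        if not space:
            raise KeyError(f"no vectors for {embedder_version!r}; re-embed before searching")
        q_norm = math.sqrt(sum(x * x for x in query))
        scored = []
        for item_id, vec in space.items():
            norm = math.sqrt(sum(x * x for x in vec))
            dot = sum(a * b for a, b in zip(query, vec))
            scored.append((item_id, dot / (q_norm * norm) if q_norm and norm else 0.0))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = VersionedVectorIndex()
index.add("embed-v-4-0", "doc-a", [1.0, 0.0, 0.0])
index.add("embed-v-4-0", "doc-b", [0.0, 1.0, 0.0])
index.add("local-v1", "doc-c", [1.0, 0.0, 0.0])  # hypothetical local-embedder version
print(index.search("embed-v-4-0", [0.9, 0.1, 0.0], k=2))
```

doc-c never appears in an `embed-v-4-0` search, even though its vector is identical to doc-a's. The spaces are not comparable, so the index does not pretend they are.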


10. Standards & references

10.1 Temporal data

  • Snodgrass, R.T. (1999). Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann.
  • Date, C.J., Darwen, H., Lorentzos, N. (2003). Temporal Data and the Relational Model. Morgan Kaufmann.
  • ISO/IEC 9075:2011 (SQL:2011) — system-versioned tables, application-time period tables.
  • Kulkarni, S. et al. (2014). Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases.
  • Lamport, L. (1978). Time, Clocks, and the Ordering of Events in a Distributed System.

10.2 Data Vault

  • Linstedt, D. & Olschimke, M. (2015). Building a Scalable Data Warehouse with Data Vault 2.0. Morgan Kaufmann.
  • Data Vault Alliance — DV 2.1 standard documents.
  • Datavault4dbt — the reference loader.

10.3 Dimensional modeling

  • Kimball, R. & Ross, M. (2013). The Data Warehouse Toolkit (3rd ed.). Wiley.
  • Kimball Group — design tips, articles, the official reference.

10.4 Wire & storage formats

10.5 Provenance & process

10.6 Pace layering

  • Brand, S. (1999). The Clock of the Long Now: Time and Responsibility. Basic Books.
  • Brand, S. (1994). How Buildings Learn. Viking.

11. End-to-end flow — one diagram

```mermaid
flowchart LR
    SRC["Source systems<br/>(CRM · billing · events · APIs · files)"]
    DLT["dlt pipeline<br/>(Arrow record batches)"]
    STG["Staging layer<br/>(Parquet on object store)"]
    RV["Raw Vault<br/>hubs · links · sats<br/>load_date · record_source"]
    BV["Business Vault<br/>same-as · PIT · bridge<br/>computed sats"]
    MART["Information Marts<br/>star · snowflake · OBT · graph"]
    LPG["LPG CANON zone<br/>(ArcadeDB)"]
    AGENT["Agents · BI · ML<br/>via UDA Search()"]

    SRC -->|extract| DLT
    DLT -->|Arrow batches| STG
    STG -->|hash-key + sat-load| RV
    RV -->|business rules<br/>same-as resolution| BV
    BV -->|materialize or virtualize| MART
    BV -->|down-project<br/>A-Box only| LPG
    MART --> AGENT
    LPG --> AGENT

    subgraph SLOW [Slow pace · meaning · history]
      RV
      BV
    end
    subgraph FAST [Fast pace · operations · queries]
      MART
      LPG
    end

    classDef slow fill:#1e293b,stroke:#60a5fa,color:#e2e8f0
    classDef fast fill:#0f172a,stroke:#38bdf8,color:#e2e8f0
    class RV,BV slow
    class MART,LPG fast
```

The flow is one-way at the layer boundary: the raw vault is built from staging, the business vault from the raw vault, and the marts from the business vault. Nothing back-writes. The fast layers can be torn down and rebuilt at any time without losing a single fact, because every fact's authoritative home is the raw vault and every fact carries its bitemporal envelope from the moment it was born.


12. Invariants — the short list

Five rules that bind every layer regardless of engine choice:

  1. Stamp before persist. The HLC seam is wired before the first persistent write. Skew-stamped rows are permanent disorder.
  2. One writer of the envelope. The temporal envelope is written once by the stamping authority. Every other store copies only the slice it owns and never back-writes.
  3. Append-only ledger. Every world-time assertion is written once, never UPDATEd, closed by superseded_at. Enforced with an insert-only DB role, not developer discipline.
  4. Valid time and transaction time are separate column pairs. Never collapsed into a single timestamp.
  5. Popularity nominates; classification decides. Fast-layer signals may flag candidates for graduation; only the slow ontological classifier may decide their category.

These five are the contract. Stores may change beneath them; the contract does not.


See also: Ontology Foundations · Ontology Stack · Coordination & VFS · Intellectual Foundations (Bibliography)