Middle-Core Knowledge Loop (Loop 5)

Status: design note + deterministic skeleton

Why

Today middle-core is an open-loop compiler:

model.yaml -> generator -> contracts -> runtime -> evidence

Evidence is terminal. Knowledge landed by the knowledge-drop scenario becomes searchable knowledge-chunks and a knowledge-graph-snapshot, and then nothing flows back. The Labs north-star — "One Model, Many Projections" — implies the inverse arc too: what the platform ingests should be able to reshape the platform's own governed model. That closes the loop:

knowledge -> (extract) -> observations -> (propose) -> model delta -> (govern) -> model.yaml -> regenerate -> contracts -> ...

This note defines that loop and ships the deterministic middle arc (propose) as a skeleton. The non-deterministic arc (extract) is an explicit, pluggable boundary so the rest of the system keeps its deterministic, gate-testable guarantees.

The loop, arc by arc

| Arc | Input → Output | Determinism | Owner |
|---|---|---|---|
| 1. Ingest (exists) | `knowledge-source` → `knowledge-chunk`s, `knowledge-graph-snapshot`, `evidence-pack` | deterministic runtime | `KnowledgeDropScenarioRunner` |
| 2. Extract (boundary, new) | chunks → `observations.json` (candidate concepts / object types / relationships) | non-deterministic (NLP/LLM), pluggable | `nlp-engineer` → `knowledge-engineer` |
| 3. Propose (skeleton, new) | `observations.json` + `model.yaml` → `proposal.json` (model delta) | deterministic | `tools/modelgen/propose_model_evolution.py` |
| 4. Govern (vocabulary exists) | `proposal.json` → `decision-record` (proposed → accepted) | human/agent gated | `ontologist-ufo` / `knowledge-engineer` + reviewer |
| 5. Apply + regenerate (exists) | accepted delta merged into `model.yaml` → regenerate | deterministic | generator + drift gate |

The determinism boundary

The only non-deterministic step is Extract. Everything downstream of it is a pure function of structured data, so the existing gates (model validation, drift, SHACL, OWL) still hold. propose never sees free text — it consumes a structured observations.json that an extractor (or a human, or an ontology agent) produced. This keeps the loop honest: a model change is only ever proposed by deterministic, reviewable diffing, and only ever applied through governance.
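A minimal sketch of how the Extract boundary could be typed. The `Extractor` protocol, the `HandWrittenExtractor` class, and the `is_structured` check are hypothetical names, and the three-list shape of the observations is an assumption that mirrors the proposal's field names; none of this is taken from the repository. It only illustrates that anything producing structured observations can sit behind the boundary, while free text never crosses it.

```python
from typing import Protocol

# Assumed top-level keys of observations.json (not confirmed by the source).
KINDS = ("ontology_concepts", "object_types", "relationship_types")


class Extractor(Protocol):
    """Hypothetical boundary: anything that turns chunks into observations."""

    def extract(self, chunks: list[str]) -> dict[str, list[dict]]:
        ...


class HandWrittenExtractor:
    """A human-curated 'extractor': ignores the text and returns fixed data."""

    def __init__(self, observations: dict[str, list[dict]]) -> None:
        self._observations = observations

    def extract(self, chunks: list[str]) -> dict[str, list[dict]]:
        return self._observations


def is_structured(observations: dict) -> bool:
    """Check the assumed contract: lists of dicts, each with a string id."""
    return all(
        isinstance(observations.get(kind, []), list)
        and all(
            isinstance(item, dict) and isinstance(item.get("id"), str)
            for item in observations.get(kind, [])
        )
        for kind in KINDS
    )


extractor: Extractor = HandWrittenExtractor(
    {
        "ontology_concepts": [{"id": "Supplier"}],
        "object_types": [],
        "relationship_types": [],
    }
)
observations = extractor.extract(["free text stays on this side of the boundary"])
```

An NLP or LLM extractor would implement the same `extract` method, so the downstream steps would not need to know which kind produced the observations.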

Governed, never auto-applied

propose does not mutate model.yaml. It emits a proposal.json shaped like the input to a decision-record: it carries a PROV-O-aligned provenance header (agent_id, activity_id, schema_version, recorded_at) that mirrors ProvenanceStamp. A reviewer (or an evidence gate; see Loop 1) must accept the proposal before any model edit is made. This reuses the governance primitives the model already defines (decision-record, evidence-pack, capability-exercise) rather than inventing a side channel.
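As an illustration of the provenance header, here is a small sketch of how the four fields could be assembled with an injectable clock. The function name `build_provenance`, its `clock` parameter, and the `"1.0.0"` schema version are assumptions made for the example, not the skeleton's actual code or values.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def build_provenance(
    agent_id: str,
    activity_id: str,
    schema_version: str,
    recorded_at: Optional[str] = None,
    clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> dict[str, str]:
    """Build the four-field header; an explicit recorded_at wins over the clock."""
    if recorded_at is None:
        # Only consult the clock when no timestamp was supplied.
        recorded_at = clock().strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "agent_id": agent_id,
        "activity_id": activity_id,
        "schema_version": schema_version,
        "recorded_at": recorded_at,
    }


header = build_provenance(
    "knowledge-engineer",
    "knowledge-drop",
    "1.0.0",  # placeholder; the note says the real value is read from the model
    recorded_at="2026-05-25T00:00:00Z",
)
```

Passing either a fixed `recorded_at` or a fixed `clock` makes the header reproducible, which is what a byte-identical proposal would need.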

The skeleton: `propose_model_evolution.py`

A deterministic "lift" that diffs structured observations against the current model and proposes only the genuinely new elements.

python tools/modelgen/propose_model_evolution.py \
  --model model/middle-core/model.yaml \
  --observations model/middle-core/examples/knowledge-loop-observations.example.json \
  --agent knowledge-engineer \
  --activity knowledge-drop \
  --recorded-at 2026-05-25T00:00:00Z      # optional; omit to stamp "now"

Output (proposal.json):

- `provenance`: `agent_id`, `activity_id`, `schema_version` (read from the model), and `recorded_at` (injected clock, like `ISerializationClock`, so output is reproducible).
- `proposed_additions`: the `ontology_concepts`, `object_types`, and `relationship_types` that are not already in the model (sorted by id).
- `already_present`: observed ids the model already has. The proposal is idempotent: observations ⊆ model ⇒ empty proposal.
- `conflicts`: observed elements that cannot be added consistently (e.g. a relationship whose from/to is neither in the model nor in the same proposal, or an object type referencing an unknown ontology concept).
- `status`: always `proposed`.
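The fields above can be illustrated with a simplified diff. This is a sketch under stated assumptions, not the code in `propose_model_evolution.py`: it assumes both inputs are dicts holding three lists of elements with an `id`, that an object type names its concept in a hypothetical `concept` field, and that a relationship's `from`/`to` refer to object type ids. The sample ids other than `Actor` and `agent` are made up.

```python
KINDS = ("ontology_concepts", "object_types", "relationship_types")


def ids_of(doc: dict, kind: str) -> set[str]:
    """Collect the ids of one element kind."""
    return {item["id"] for item in doc.get(kind, [])}


def propose(model: dict, observations: dict) -> dict:
    """Diff observations against the model without mutating either input."""
    additions = {kind: [] for kind in KINDS}
    already_present = {kind: [] for kind in KINDS}
    conflicts = []

    # Pass 1: split observed elements into new vs. already in the model.
    for kind in KINDS:
        known = ids_of(model, kind)
        for item in observations.get(kind, []):
            if item["id"] in known:
                already_present[kind].append(item["id"])
            else:
                additions[kind].append(item)

    # Pass 2: a new object type must reference a known or proposed concept.
    concepts = ids_of(model, "ontology_concepts") | ids_of(additions, "ontology_concepts")
    kept_types = []
    for obj in additions["object_types"]:
        concept = obj.get("concept")
        if concept is not None and concept not in concepts:
            conflicts.append({"id": obj["id"], "reason": f"unknown ontology concept: {concept}"})
        else:
            kept_types.append(obj)
    additions["object_types"] = kept_types

    # Pass 3: both endpoints must be in the model or in this proposal.
    types = ids_of(model, "object_types") | ids_of(additions, "object_types")
    kept_rels = []
    for rel in additions["relationship_types"]:
        missing = sorted({rel["from"], rel["to"]} - types)
        if missing:
            conflicts.append({"id": rel["id"], "reason": "unknown endpoint: " + ", ".join(missing)})
        else:
            kept_rels.append(rel)
    additions["relationship_types"] = kept_rels

    return {
        "proposed_additions": {k: sorted(v, key=lambda i: i["id"]) for k, v in additions.items()},
        "already_present": {k: sorted(v) for k, v in already_present.items()},
        "conflicts": sorted(conflicts, key=lambda c: c["id"]),
        "status": "proposed",
    }


model = {
    "ontology_concepts": [{"id": "Actor"}],
    "object_types": [{"id": "agent", "concept": "Actor"}],
    "relationship_types": [],
}
observations = {
    "ontology_concepts": [{"id": "Actor"}, {"id": "Supplier"}],
    "object_types": [
        {"id": "supplier", "concept": "Supplier"},
        {"id": "invoice", "concept": "Document"},  # "Document" is unknown
    ],
    "relationship_types": [
        {"id": "agent-contacts-supplier", "from": "agent", "to": "supplier"},
        {"id": "supplier-issues-invoice", "from": "supplier", "to": "invoice"},
    ],
}
proposal = propose(model, observations)
```

In this example `Supplier`, `supplier`, and `agent-contacts-supplier` are proposed, `Actor` is reported as already present, and `invoice` plus the relationship that depends on it land in `conflicts`. Calling `propose(model, model)` yields empty additions, which is the idempotence property described above.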

Determinism: ids are validated (kebab-case for object/relationship ids, PascalCase for concepts), all lists are sorted, and the output is serialized with json.dumps(..., sort_keys=True, indent=2) plus a trailing newline. Re-running with the same inputs yields byte-identical output.
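The two mechanisms named here, id validation and canonical serialization, could look roughly like the following. The regular expressions are one plausible reading of "kebab-case" and "PascalCase" and are assumptions, as are the helper names `validate_id` and `render`; only the `json.dumps` arguments come from the note.

```python
import json
import re

# Assumed patterns; the skeleton's actual rules may be stricter or looser.
KEBAB_CASE = re.compile(r"[a-z][a-z0-9]*(-[a-z0-9]+)*")
PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")


def validate_id(kind: str, element_id: str) -> None:
    """Raise ValueError if the id does not match the convention for its kind."""
    pattern = PASCAL_CASE if kind == "ontology_concepts" else KEBAB_CASE
    if not pattern.fullmatch(element_id):
        raise ValueError(f"invalid id for {kind}: {element_id!r}")


def render(proposal: dict) -> str:
    """Serialize with sorted keys, fixed indentation, and a trailing newline."""
    return json.dumps(proposal, sort_keys=True, indent=2) + "\n"


validate_id("object_types", "knowledge-chunk")
validate_id("ontology_concepts", "EventType")

# Key order in the input dict does not affect the bytes written.
first = render({"status": "proposed", "conflicts": []})
second = render({"conflicts": [], "status": "proposed"})
```

Because `sort_keys=True` fixes key order and the lists are sorted before serialization, the rendered text depends only on the content of the proposal, not on how it was built.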

How this composes with the other loops

- Loop 1 (evidence-gated promotion) is the natural Govern gate: a proposal can be required to carry a passing capability-exercise and a complete evidence-pack before a decision-record may move proposed → accepted.
- Loop 4a (single-source gUFO stereotypes) is what lets a proposed ontology_concept carry a real stereotype (Kind/SubKind/EventType/Relator), so a lifted concept lands as a first-class gUFO commitment, not a bare string.
- Agent visibility (Loop 6, delivered): the model now declares an Actor gUFO Kind with agent as its SubKind, plus the provenance links agent --performs--> work-packet and evidence-pack --attributed-to--> agent (PROV-O wasAttributedTo). The proposal's agent_id and ProvenanceStamp.AgentId now correspond to a first-class ontology node. Remaining: wiring the runtime to attach an agent node and an attributed-to edge to the pinned scenario graph. That change alters the UI node/edge counts, so it ships with the UI update.
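The Loop 1 gate from the first item above could be sketched as a pure state transition. The `DecisionRecord` fields and the two boolean inputs are hypothetical simplifications; the real decision-record, capability-exercise, and evidence-pack are model-defined object types whose shape this note does not spell out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical, reduced stand-in for the model's decision-record."""

    proposal_id: str
    status: str = "proposed"


def accept(
    record: DecisionRecord,
    exercise_passed: bool,
    evidence_complete: bool,
) -> DecisionRecord:
    """Move proposed -> accepted only when both evidence conditions hold."""
    if record.status != "proposed":
        raise ValueError(f"cannot accept a record in status {record.status!r}")
    if not (exercise_passed and evidence_complete):
        # Gate not satisfied: the record stays in "proposed".
        return record
    return replace(record, status="accepted")


record = DecisionRecord(proposal_id="proposal-001")
blocked = accept(record, exercise_passed=True, evidence_complete=False)
accepted = accept(record, exercise_passed=True, evidence_complete=True)
```

Keeping the record immutable and returning a new one means the gate itself stays deterministic and easy to test, in line with the rest of the loop.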

What is intentionally not here

- Extraction. No NLP/LLM. The extractor is a boundary; observations.json is its contract. A reference extractor can be added later behind that contract without touching the deterministic core.
- Auto-apply. Merging an accepted delta into model.yaml is deliberately left to the governance step (human/agent + the existing drift gate), not automated by the skeleton.

Future arcs

- A reference extractor (extract_observations.py) over knowledge-chunk excerpts.
- Round-trip lift from external OWL/LinkML contributions into observations.json (the LinkML projection is already the structural IR).
- An apply_proposal.py that merges an accepted proposal and runs the generator + gates in one governed step, emitting the decision-record and evidence-pack as it goes.