RT5 PIN-S1 — ArcadeDB UPSERT / Idempotency / Batching Semantics¶
Spike: PIN-S1 — ArcadeDB transaction/UPSERT/idempotency semantics
Informed by: Live execution against arcadedata/arcadedb:26.5.1
Informs: PIN-F4 — ArcadeDB adapter (ArcadeDbPinBackend)
Verdict: All critical hypotheses confirmed live. See implementation patterns below.
Evidence: docs/synthesis/spike_arcadedb_test.py — 35/36 assertions passed; the 1 non-pass is a gotcha (see section 6).
Spike Context¶
The adapter (PIN-F4) must implement:
- An idempotent live row — one
OntologyElementvertex perontology_iri, updated in-place on re-pin. - An immutable append-only ledger (
PinLedgerEntry) — re-pinning identical content must be a no-op (guarded byUNIQUEoncontent_hash). - Both operations must be atomic.
Unknowns before this spike: does UPDATE...UPSERT deduplicate correctly, how does ArcadeDB behave on a duplicate INSERT against a UNIQUE index, and how do you batch both operations atomically?
1. HTTP Command API Shape¶
LIVE VERIFIED
All SQL and script execution goes through one endpoint:
POST /api/v1/command/{database}
Content-Type: application/json
Authorization: Basic <base64(user:password)>
{ "language": "sql", "command": "<sql>" }
Response shape:
{ "user": "root", "result": [ ...rows ] }
Key points:
- result is always an array (empty array [] on DDL statements, not null).
- HTTP 4xx/5xx on any error — check detail field in the error body for the message.
- For sqlscript, set "language": "sqlscript". All other fields identical.
- No separate transaction endpoint; transactions are managed inside sqlscript.
2. DDL Idempotency¶
LIVE VERIFIED — DDL-01 through IDX-04 all passed
All CREATE DDL statements support IF NOT EXISTS. Use it everywhere in EnsureReadyAsync:
CREATE VERTEX TYPE OntologyElement IF NOT EXISTS;
CREATE DOCUMENT TYPE PinLedgerEntry IF NOT EXISTS;
CREATE PROPERTY OntologyElement.ontology_iri IF NOT EXISTS STRING;
CREATE PROPERTY PinLedgerEntry.content_hash IF NOT EXISTS STRING;
CREATE INDEX IF NOT EXISTS ON OntologyElement (ontology_iri) UNIQUE;
CREATE INDEX IF NOT EXISTS ON PinLedgerEntry (content_hash) UNIQUE;
Calling these a second time returns HTTP 200 with an empty result array — no error. EnsureReadyAsync can be called repeatedly without guard state.
3. UPDATE...UPSERT — Live Row Semantics¶
LIVE VERIFIED — UPSERT-01 through UPSERT-06 all passed
Syntax¶
UPDATE <Type>
SET <field1> = <value1>, <field2> = <value2>
UPSERT
WHERE <unique_key_field> = '<key_value>'
The UPSERT keyword is placed between the SET clause and the WHERE clause.
Behavior confirmed¶
| Scenario | Result |
|---|---|
| First call — no matching record | Creates 1 vertex; returns [{"count": 1}] |
| Second call — same key, same content | Updates the same record; returns [{"count": 1}] |
| Call with same key, changed content | Updates in-place; returns [{"count": 1}] |
| Row count after N UPSERTs on same key | Always exactly 1 — no phantom duplicates |
Return shape¶
[{ "count": 1 }]
count is the number of records affected (created or updated). This is always 1 for a single-key UPSERT.
PIN-F4 live-row UPSERT¶
UPDATE OntologyElement
SET identity_hash = ':identity_hash',
content_hash = ':content_hash',
kind = ':kind',
object_type = ':object_type',
object_id = ':object_id',
ontology_concept = ':ontology_concept',
ufo_stereotype = ':ufo_stereotype',
state = ':state',
valid_from = ':valid_from',
valid_to = ':valid_to',
recorded_at = ':recorded_at',
canonical_payload = ':canonical_payload',
schema_version = ':schema_version'
UPSERT
WHERE ontology_iri = ':ontology_iri'
Pass values via the "params" object in the HTTP payload to avoid SQL injection:
{
"language": "sql",
"command": "UPDATE OntologyElement SET label = :label UPSERT WHERE ontology_iri = :iri",
"params": { "label": "Invoice", "iri": "urn:agentarmy:mc:invoice/Invoice/001" }
}
4. UNIQUE-Guarded INSERT — Ledger Semantics¶
LIVE VERIFIED — LEDGER-01 through LEDGER-06 all passed
Behavior confirmed¶
| Scenario | Result |
|---|---|
First INSERT with a new content_hash |
Succeeds; returns the created record |
Second INSERT with the same content_hash |
HTTP 500, error body contains "Duplicated key [<hash>] found on index" |
| Row count after 2 attempts with same hash | Exactly 1 — no duplicate committed |
INSERT with a different content_hash, same identity_hash |
Succeeds — appends a new ledger entry |
Error body shape on duplicate¶
{
"error": "Error on transaction commit",
"detail": "Duplicated key [sha256-abc123] found on index 'PinLedgerEntry[content_hash]' already assigned to record #4:0",
"exception": "com.arcadedb.index.IndexException"
}
The .detail field carries the message; .exception is the Java class name. The string "Duplicated key" is a reliable signal to distinguish a duplicate-hash no-op from a real error.
PIN-F4 ledger INSERT¶
INSERT INTO PinLedgerEntry
SET content_hash = ':content_hash',
ontology_iri = ':ontology_iri',
identity_hash = ':identity_hash',
kind = ':kind',
object_type = ':object_type',
object_id = ':object_id',
ufo_stereotype = ':ufo_stereotype',
valid_from = ':valid_from',
valid_to = ':valid_to',
recorded_at = ':recorded_at',
superseded_at = ':superseded_at',
canonical_payload = ':canonical_payload',
provenance = ':provenance',
schema_version = ':schema_version'
5. Batching and Transactions¶
LIVE VERIFIED — BATCH-01, BATCH-02, ATOMIC-01, ATOMIC-02, PIN-01 through PIN-04 all passed
The batching mechanism: sqlscript¶
ArcadeDB's sqlscript language runs a multi-statement script in a single HTTP call. It is the only way to execute the UPSERT + INSERT atomically. Plain sql language does not support multiple statements separated by semicolons (the second statement is silently ignored or errors).
POST /api/v1/command/{database}
{ "language": "sqlscript", "command": "<script>" }
Script syntax¶
BEGIN;
LET <var> = <sql_statement>;
LET <var2> = <sql_statement>;
COMMIT;
RETURN $<var>;
BEGIN/COMMITare mandatory to wrap the statements transactionally.LETassigns the result of a statement to a named variable.RETURNsends the named variable back as the HTTP responseresult.- Semicolons after
LETlines are required.
Atomic UPSERT + INSERT — the canonical PIN-F4 pattern¶
BEGIN;
LET live = UPDATE OntologyElement
SET identity_hash = :identity_hash, content_hash = :content_hash,
label = :label, state = :state, recorded_at = :recorded_at
UPSERT WHERE ontology_iri = :ontology_iri;
LET ledger = INSERT INTO PinLedgerEntry
SET content_hash = :content_hash, ontology_iri = :ontology_iri,
identity_hash = :identity_hash, recorded_at = :recorded_at;
COMMIT;
RETURN $live;
Transaction rollback on UNIQUE violation — confirmed¶
When the INSERT violates the UNIQUE index on content_hash, the entire transaction rolls back. This was confirmed live:
- The
UPDATE...UPSERThad already executed within the transaction. - The subsequent
INSERTtriggered a UNIQUE violation. - ArcadeDB returned HTTP 500 with
"Duplicated key"in the detail. - A follow-up
SELECTconfirmed the UPSERT's write was also rolled back.
This means the naive pattern (wrap both in one script, treat duplicate-key as idempotent) does not work. See section 6 for the correct approach.
6. Critical Gotcha: Re-pin Rolls Back the Live Row Update¶
LIVE VERIFIED — PIN-02, PIN-03, PIN-04 all passed (confirmed the gotcha)
When you re-pin with identical content using the atomic sqlscript:
- UPSERT succeeds inside the transaction (the live row is "pending").
- INSERT fails on the UNIQUE index.
- The entire transaction rolls back, including the UPSERT.
- The live row is left in its pre-call state (the first-pin value, not rolled back to nothing).
This is safe for the immutable-ledger invariant (no duplicate ledger entry is written), but it means the live row is NOT updated on a content-identical re-pin — which is actually the correct semantic (content has not changed, live row need not change).
However: if you want recorded_at or state on the live row to update on every pin call regardless of content change, you cannot do that atomically with the simple UPSERT+INSERT pattern. You need the check-first pattern below.
7. Recommended Idempotent Pin Pattern for PIN-F4¶
Based on the spike results, PIN-F4 PinObjectAsync should use a check-before-script pattern:
Step 1 (read): SELECT count(*) AS n FROM PinLedgerEntry WHERE content_hash = :content_hash
Step 2a (skip): if n > 0 → content already pinned, return existing live row (no-op)
Step 2b (write): if n = 0 → execute atomic sqlscript (UPSERT live + INSERT ledger)
This avoids the transaction rollback entirely on re-pin, and the two-step sequence is safe because:
- The UNIQUE index on content_hash prevents a race condition from inserting duplicates even if two callers pass the read check simultaneously.
- The UPSERT is idempotent by design.
C# pseudocode sketch¶
// Step 1: guard read
var n = await CountAsync($"SELECT count(*) AS n FROM PinLedgerEntry WHERE content_hash = :h",
new { h = element.ContentHash.Value });
if (n > 0)
return; // idempotent no-op
// Step 2: atomic write
var scriptResult = await ScriptAsync("""
BEGIN;
LET live = UPDATE OntologyElement
SET identity_hash = :identity_hash, content_hash = :content_hash, ...
UPSERT WHERE ontology_iri = :ontology_iri;
LET ledger = INSERT INTO PinLedgerEntry
SET content_hash = :content_hash, ...;
COMMIT;
RETURN $live;
""", params);
// If we still get a "Duplicated key" error here (lost race), treat as no-op.
// Do NOT rethrow duplicate-key as an exception — it means another caller won the race.
8. Composite Index Gotcha: Validate Data First¶
LIVE DISCOVERED — CIDX-01 failed
Creating a UNIQUE composite index on an existing type fails if existing data contains duplicates on those columns. ArcadeDB validates all existing records during index creation:
Error on creating index 'PinLedgerEntry_0_...' ->
Duplicated key [hash-script, 2026-01-01T00:00:00Z] found on index
Implication for PIN-F4: The index DDL in EnsureReadyAsync must run before any data is written. The DDL must be part of initial schema setup, not added to an existing populated database without checking. The recommended indexes from the PIN-F4 spec:
-- Primary dedup guard (must exist before any PIN write)
CREATE INDEX IF NOT EXISTS ON PinLedgerEntry (content_hash) UNIQUE;
-- Composite lookup indexes (data must be unique on these before adding)
CREATE INDEX IF NOT EXISTS ON PinLedgerEntry (identity_hash, recorded_at) UNIQUE;
CREATE INDEX IF NOT EXISTS ON OntologyElement (ontology_iri) UNIQUE;
For the composite (identity_hash, recorded_at) index: since recorded_at is the serialization clock timestamp (unique per clock tick per object), duplicates should be practically impossible in production. But in the test harness the BATCH-01 test deliberately inserted two rows with the same (identity_hash, recorded_at) pair, which blocked the later index creation. Schema-first order is mandatory.
9. Index and Schema Setup Order for EnsureReadyAsync¶
The correct initialization order to avoid the composite-index gotcha:
-- 1. Types
CREATE VERTEX TYPE OntologyElement IF NOT EXISTS;
CREATE DOCUMENT TYPE PinLedgerEntry IF NOT EXISTS;
-- (and HyperEdge types, ProvenanceVertex, BINDS_ROLE as needed by PIN-F4)
-- 2. Properties (required before index creation)
CREATE PROPERTY OntologyElement.ontology_iri IF NOT EXISTS STRING;
CREATE PROPERTY OntologyElement.identity_hash IF NOT EXISTS STRING;
CREATE PROPERTY OntologyElement.content_hash IF NOT EXISTS STRING;
-- ... all other OntologyElement fields ...
CREATE PROPERTY PinLedgerEntry.content_hash IF NOT EXISTS STRING;
CREATE PROPERTY PinLedgerEntry.identity_hash IF NOT EXISTS STRING;
CREATE PROPERTY PinLedgerEntry.recorded_at IF NOT EXISTS STRING;
-- ... all other PinLedgerEntry fields ...
-- 3. Indexes (after types and properties exist, before any data write)
CREATE INDEX IF NOT EXISTS ON OntologyElement (ontology_iri) UNIQUE;
CREATE INDEX IF NOT EXISTS ON PinLedgerEntry (content_hash) UNIQUE;
CREATE INDEX IF NOT EXISTS ON PinLedgerEntry (identity_hash, recorded_at) UNIQUE;
-- composite indexes from PIN-F4 spec: (identity_hash, superseded_at, recorded_at), etc.
10. Full Verified SQL Shapes for PIN-F4¶
Live row — UPSERT on ontology_iri¶
UPDATE OntologyElement
SET identity_hash = :identity_hash,
content_hash = :content_hash,
kind = :kind,
object_type = :object_type,
object_id = :object_id,
ontology_concept = :ontology_concept,
ontology_iri = :ontology_iri,
ufo_stereotype = :ufo_stereotype,
state = :state,
valid_from = :valid_from,
valid_to = :valid_to,
recorded_at = :recorded_at,
canonical_payload = :canonical_payload,
schema_version = :schema_version
UPSERT
WHERE ontology_iri = :ontology_iri
Immutable ledger — guarded INSERT¶
INSERT INTO PinLedgerEntry
SET content_hash = :content_hash,
ontology_iri = :ontology_iri,
identity_hash = :identity_hash,
kind = :kind,
object_type = :object_type,
object_id = :object_id,
ufo_stereotype = :ufo_stereotype,
valid_from = :valid_from,
valid_to = :valid_to,
recorded_at = :recorded_at,
superseded_at = :superseded_at,
canonical_payload = :canonical_payload,
provenance = :provenance,
schema_version = :schema_version
Atomic sqlscript wrapping both¶
BEGIN;
LET live = UPDATE OntologyElement
SET identity_hash = :identity_hash, content_hash = :content_hash,
kind = :kind, object_type = :object_type, object_id = :object_id,
ontology_concept = :ontology_concept, ufo_stereotype = :ufo_stereotype,
state = :state, valid_from = :valid_from, valid_to = :valid_to,
recorded_at = :recorded_at, canonical_payload = :canonical_payload,
schema_version = :schema_version
UPSERT WHERE ontology_iri = :ontology_iri;
LET ledger = INSERT INTO PinLedgerEntry
SET content_hash = :content_hash, ontology_iri = :ontology_iri,
identity_hash = :identity_hash, kind = :kind, object_type = :object_type,
object_id = :object_id, ufo_stereotype = :ufo_stereotype,
valid_from = :valid_from, valid_to = :valid_to, recorded_at = :recorded_at,
superseded_at = :superseded_at, canonical_payload = :canonical_payload,
provenance = :provenance, schema_version = :schema_version;
COMMIT;
RETURN $live;
11. Open Questions and Residual Risk¶
| Question | Status | Risk level |
|---|---|---|
HyperEdge vertex type for relator pinning (PIN-F4 uses HyperEdge subtype for ingest-evidence) |
Not spiked — no relator DDL tested | Medium — PIN-S2 may address query patterns; PIN-F4 needs relator schema before coding |
BINDS_ROLE edge type creation and traversal |
Not spiked | Medium |
Parameter binding with large canonical_payload JSON blob (>64KB) |
Not tested | Low — ArcadeDB document type handles arbitrary JSON; confirm max document size |
| Concurrent writes (race between check and script in high-throughput scenarios) | Partially addressed — UNIQUE index is the backstop; duplicate-key on INSERT is the race signal | Low for current throughput; document as known behavior |
| PostgreSQL wire protocol as alternative to HTTP command API | Not spiked — the cockpit contract notes ArcadeDB supports pg wire | Low — HTTP command API is sufficient for the adapter; pg wire adds driver dependency |
12. Test Evidence Summary¶
ArcadeDB version: 26.5.1 (build b6d08fee549dd5930303b362223869901aab0fac)
Container: arcadedata/arcadedb:26.5.1 via docker run --name spike-arcadedb-test -p 12480:2480
Test script: docs/synthesis/spike_arcadedb_test.py
Result: 35 PASS / 1 NOTED (CIDX-01 — gotcha documented above)
| Group | Tests | Pass | Notes |
|---|---|---|---|
| DDL idempotency | 4 | 4 | IF NOT EXISTS works on all DDL |
| UNIQUE index | 4 | 4 | IF NOT EXISTS works; repeatable |
| UPSERT live row | 6 | 6 | count=1 always; no phantom rows |
| Guarded INSERT | 6 | 6 | Duplicate fails; error is "Duplicated key" |
| sqlscript batch | 2 | 2 | BEGIN/COMMIT wraps multiple statements |
| Transaction atomicity | 2 | 2 | UNIQUE violation rolls back entire tx |
| PIN-F4 full pattern | 4 | 4 | First pin OK; re-pin rolls back correctly |
| Check-first re-pin | 5 | 5 | Skip on existing hash; new content appends |
| Composite index | 1 | 0 | Fails on pre-existing duplicates — gotcha |
| HTTP API shape | 2 | 2 | result=[...rows]; detail field on error |
All claims above are LIVE VERIFIED unless explicitly marked otherwise.