JSPG: JSON Schema Postgres

JSPG is a high-performance PostgreSQL extension written in Rust (using pgrx) that transforms Postgres into a pre-compiled Semantic Engine. It serves as the core engine for the "Punc" architecture, where the database is the single source of truth for all data models, API contracts, validations, and reactive queries.

1. Overview & Architecture

JSPG operates by deeply integrating the JSON Schema Draft 2020-12 specification directly into the Postgres session lifecycle. It is built around three core pillars:

  • Validator: In-memory, near-instant JSON structural validation and type polymorphism routing.
  • Merger: Automatically traverse and UPSERT deeply nested JSON graphs into normalized relational tables.
  • Queryer: Compile JSON Schemas into static, cached SQL SPI SELECT plans for fetching full entities or isolated "Stems".

🎯 Goals

  1. Draft 2020-12 Compliance: Attempt to adhere to the official JSON Schema Draft 2020-12 specification.
  2. Ultra-Fast Execution: Compile schemas into optimized in-memory validation trees and cached SQL SPIs to bypass Postgres Query Builder overheads.
  3. Connection-Bound Caching: Leverage the PostgreSQL session lifecycle using an Atomic Swap pattern. Schemas are frozen once compiled, so read access needs no lock beyond the brief clone of the shared pointer.
  4. Structural Inheritance: Support object-oriented schema design via Implicit Keyword Shadowing and virtual $family references natively mapped to Postgres table constraints.
  5. Reactive Beats: Provide natively generated "Stems" (isolated payload fragments) for dynamic websocket reactivity.

Concurrency & Threading ("Immutable Graphs")

To support high-throughput operations while allowing for runtime updates (e.g., during hot-reloading), JSPG uses an Atomic Swap pattern:

  1. Parser Phase: Schema JSONs are parsed into ordered Schema structs.
  2. Compiler Phase: The database iterates all parsed schemas and pre-computes native optimization maps (Descendants Map, Depths Map, Variations Map).
  3. Immutable Validator: The Validator struct immutably owns the Database registry and all its global maps. Schemas themselves are completely frozen; $ref strings are resolved dynamically at runtime using pre-computed O(1) maps.
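  4. Lock-Free Reads: Incoming operations acquire a read lock just long enough to clone the Arc inside an RwLock<Option<Arc<Validator>>>, then run against that snapshot without holding any lock. Readers therefore wait only for the brief moment in which a schema update swaps the pointer.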
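The swap described in the steps above can be sketched with the standard library alone. This is a minimal illustration, not the actual JSPG code: the `ValidatorSlot` wrapper, its method names, and the `depths` field are assumptions made for the example; only the `RwLock<Option<Arc<Validator>>>` shape comes from the text.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Stand-in for the real Validator: a registry that is never mutated after
/// construction. The `depths` field is illustrative only.
pub struct Validator {
    pub depths: HashMap<String, usize>,
}

/// Hypothetical wrapper around the shared slot named in the text.
pub struct ValidatorSlot {
    current: RwLock<Option<Arc<Validator>>>,
}

impl ValidatorSlot {
    pub fn new() -> Self {
        Self { current: RwLock::new(None) }
    }

    /// Readers hold the read lock only long enough to clone the Arc.
    pub fn snapshot(&self) -> Option<Arc<Validator>> {
        let guard = self.current.read().unwrap();
        guard.as_ref().map(Arc::clone)
    }

    /// The new validator is fully built before the write lock is taken,
    /// so the lock covers only the pointer replacement.
    pub fn swap(&self, next: Validator) {
        let next = Arc::new(next);
        *self.current.write().unwrap() = Some(next);
    }

    /// Clears the slot; readers that already hold a snapshot keep using it.
    pub fn clear(&self) {
        *self.current.write().unwrap() = None;
    }
}
```

A reader that took a snapshot before a swap or a clear keeps working against the old validator until it drops its `Arc`, which is what makes hot-reloading safe in this pattern.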

2. Validator

The Validator provides strict, schema-driven evaluation for the "Punc" architecture.

API Reference

  • jspg_setup(database jsonb) -> jsonb: Loads and compiles the entire registry (types, enums, puncs, relations) atomically.
  • mask_json_schema(schema_id text, instance jsonb) -> jsonb: Validates and prunes unknown properties dynamically, returning masked data.
  • jspg_validate(schema_id text, instance jsonb) -> jsonb: Returns boolean-like success or structured errors.
  • jspg_teardown() -> jsonb: Clears the current session's schema cache.

Custom Features & Deviations

JSPG implements specific extensions to the Draft 2020-12 standard to support the Punc architecture's object-oriented needs while heavily optimizing for zero-runtime lookups.

  • Caching Strategy: The Validator caches the pre-compiled Database registry in memory upon initialization (jspg_setup). This registry holds the comprehensive graph of schema boundaries, Types, ENUMs, and Foreign Key relationships, acting as the Single Source of Truth for all validation operations without polling Postgres.

A. Polymorphism & Referencing ($ref, $family, and Native Types)

  • Native Type Discrimination (variations): Schemas defined inside a Postgres type are Entities. The validator securely and implicitly manages their "type" property. If an entity inherits from user, incoming JSON can safely define {"type": "person"} without errors, thanks to compiled_variations inheritance.
  • Structural Inheritance & Viral Infection ($ref): $ref is used exclusively for structural inheritance, never for union creation. A Punc request schema that $refs an Entity virally inherits all physical database polymorphism rules for that target.
  • Shape Polymorphism ($family): Auto-expands polymorphic API lists based on an abstract Descendants Graph. If {"$family": "widget"} is used, JSPG evaluates the JSON against every schema that $refs widget.
  • Strict Matches & Depth Heuristic: Polymorphic structures MUST match exactly one schema permutation. If multiple inherited struct permutations pass, JSPG applies the Depth Heuristic Tie-Breaker, selecting the candidate deepest in the inheritance tree.
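The Depth Heuristic can be illustrated with a small selection function over the pre-computed depths map. This is a sketch under assumptions: the function name `pick_deepest` is hypothetical, and treating a tie at the deepest level as "still ambiguous" is a guess about the intended behavior rather than something the text confirms.

```rust
use std::collections::HashMap;

/// Picks the winner among schema ids that all passed validation.
/// Returns None when there are no candidates, or when the deepest level is
/// shared by more than one candidate (assumed to remain an ambiguity error).
pub fn pick_deepest<'a>(
    passing: &[&'a str],
    depths: &HashMap<String, usize>,
) -> Option<&'a str> {
    let mut best: Option<(&'a str, usize)> = None;
    let mut tied = false;
    for &id in passing {
        // Unknown ids are treated as roots (depth 0) in this sketch.
        let depth = depths.get(id).copied().unwrap_or(0);
        let best_depth = best.map(|(_, d)| d);
        if best_depth.map_or(true, |d| depth > d) {
            best = Some((id, depth));
            tied = false;
        } else if best_depth == Some(depth) {
            tied = true;
        }
    }
    if tied {
        None
    } else {
        best.map(|(id, _)| id)
    }
}
```

With depths such as `user = 2` and `person = 3`, an instance that passes both resolves to `person`, the candidate deepest in the inheritance tree.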

B. Dot-Notation Schema Resolution & Database Mapping

  • The Dot Convention: When a schema represents a specific variation or shape of an underlying physical database Type (e.g., a "summary" of a "person"), its $id must adhere to a dot-notation suffix convention (e.g., summary.person or full.person).
  • Entity Resolution: The framework (Validator, Queryer, Merger) dynamically determines the backing PostgreSQL table structure by splitting the schema's $id (or $ref) by . and extracting the last segment (next_back()). If the last segment matches a known Database Type (like person), the framework natively applies that table's inheritance rules, variations, and physical foreign keys to the schema graph, regardless of the prefix.
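The resolution rule is small enough to show directly. The sketch below assumes a set of known Database Type names; the function name `resolve_entity_type` is hypothetical, while the split on `.` and the use of `next_back()` follow the description above.

```rust
use std::collections::HashSet;

/// Returns the backing Database Type for a schema id, if its last
/// dot-separated segment names a known type.
pub fn resolve_entity_type<'a>(
    schema_id: &'a str,
    known_types: &HashSet<String>,
) -> Option<&'a str> {
    // `split` always yields at least one segment, even for ids without a dot.
    let last = schema_id.split('.').next_back()?;
    if known_types.contains(last) {
        Some(last)
    } else {
        None
    }
}
```

Under this rule `summary.person`, `full.person`, and plain `person` all resolve to the `person` table, regardless of the prefix.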

C. Strict by Default & Extensibility

  • Strictness: By default, any property not explicitly defined in the schema causes a validation error (effectively enforcing additionalProperties: false globally).
  • Extensibility (extensible: true): To allow a free-for-all of undefined properties, schemas must explicitly declare "extensible": true.
  • Structured Additional Properties: If additionalProperties: {...} is defined as a schema, arbitrary keys are allowed so long as their values match the defined type constraint.
  • Inheritance Boundaries: Strictness resets when crossing $ref boundaries. A schema extending a strict parent remains strict unless it explicitly overrides with "extensible": true.
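A rough sketch of the strictness check for a single object level follows. The `ObjectSchema` struct and its field names are assumptions for illustration. The sketch only reports unknown keys; validating the values of extra keys against an `additionalProperties` schema is left out.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified view of one compiled object schema.
pub struct ObjectSchema {
    pub properties: BTreeSet<String>,
    /// Mirrors `"extensible": true`.
    pub extensible: bool,
    /// True when `additionalProperties: {...}` is declared as a schema.
    pub has_additional_properties_schema: bool,
}

/// Returns the instance keys that would be rejected as unknown properties.
pub fn unknown_properties<'a>(
    schema: &ObjectSchema,
    instance_keys: &[&'a str],
) -> Vec<&'a str> {
    if schema.extensible || schema.has_additional_properties_schema {
        // Extra keys are permitted; their values are checked elsewhere.
        return Vec::new();
    }
    instance_keys
        .iter()
        .copied()
        .filter(|key| !schema.properties.contains(*key))
        .collect()
}
```

A non-empty result corresponds to the default strict behavior; an empty result means the instance carries no undeclared properties or the schema opted out of strictness.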

D. Implicit Keyword Shadowing

  • Inheritance ($ref + properties): Unlike standard JSON Schema, when a schema uses $ref alongside local properties, JSPG implements Smart Merge. Local constraints natively take precedence over (shadow) inherited constraints for the same keyword.
    • Example: If entity has type: {const: "entity"}, but person defines type: {const: "person"}, the local person const cleanly overrides the inherited one.
  • Composition (allOf): When evaluating allOf, standard intersection rules apply seamlessly. No shadowing occurs, meaning all constraints from all branches must pass.
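The difference between shadowing and intersection can be sketched with keyword maps. This is a simplified model, not the real data structure: constraints are kept as plain strings, and the names `Keywords`, `smart_merge`, and `all_of` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Keyword name -> constraint, with constraints kept as strings for brevity.
pub type Keywords = BTreeMap<String, String>;

/// `$ref` plus local keywords: a local entry shadows the inherited entry
/// for the same keyword, and everything else is inherited unchanged.
pub fn smart_merge(inherited: &Keywords, local: &Keywords) -> Keywords {
    let mut merged = inherited.clone();
    for (keyword, constraint) in local {
        merged.insert(keyword.clone(), constraint.clone());
    }
    merged
}

/// `allOf`: nothing is shadowed, so every constraint from every branch is
/// kept and must pass independently.
pub fn all_of(branches: &[Keywords]) -> Vec<(String, String)> {
    branches
        .iter()
        .flat_map(|branch| branch.iter().map(|(k, v)| (k.clone(), v.clone())))
        .collect()
}
```

Applied to the example above, merging an inherited `const: "entity"` with a local `const: "person"` leaves only the `person` constraint, whereas `all_of` would keep both and make the instance unsatisfiable.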

E. Format Leniency for Empty Strings

To simplify frontend form validation, format validators specifically for uuid, date-time, and email explicitly allow empty strings (""), treating them as "present but unset".
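One way to picture this leniency is as a wrapper around any format check. The sketch below is illustrative only: `lenient_format` is a hypothetical name, and `looks_like_email` is a deliberately naive stand-in, not the validator JSPG actually uses.

```rust
/// Accepts the empty string as "present but unset"; otherwise defers to
/// the supplied format check.
pub fn lenient_format(value: &str, check: fn(&str) -> bool) -> bool {
    value.is_empty() || check(value)
}

/// Naive placeholder check: a non-empty local part and a dotted domain.
pub fn looks_like_email(value: &str) -> bool {
    match value.split_once('@') {
        Some((local, domain)) => !local.is_empty() && domain.contains('.'),
        None => false,
    }
}
```

With this wrapper, `""` passes for any format, while a non-empty value such as `nope` still fails the email check.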


3. Merger

The Merger provides an automated, high-performance graph synchronization engine via the jspg_merge(cue JSONB) API. It orchestrates the complex mapping of nested JSON objects into normalized Postgres relational tables, honoring all inheritance and graph constraints.

Core Features

  • Caching Strategy: The Merger leverages the Validator's in-memory Database registry to instantly resolve Foreign Key mapping graphs. It additionally utilizes the concurrent GLOBAL_JSPG application memory (DashMap) to cache statically constructed SQL SELECT strings used during deduplication (lk_) and difference tracking calculations.
  • Deep Graph Merging: The Merger walks arbitrary levels of deeply nested JSON schemas (e.g. tracking an order, its customer, and an array of its lines). It intelligently discovers the correct parent-to-child or child-to-parent Foreign Keys stored in the registry and automatically maps the UUIDs across the relationships during UPSERT.
  • Prefix Foreign Key Matching: Handles scenarios where multiple relations point to the same table by using database Foreign Key constraint prefixes (fk_). For example, if a schema has shipping_address and billing_address, the Merger resolves against fk_shipping_address_entity vs fk_billing_address_entity automatically to correctly route object properties.
  • Dynamic Deduplication & Lookups: If a nested object is provided without an id, the Merger utilizes Postgres lk_ index constraints defined in the schema registry (e.g. lk_person mapped to first_name and last_name). It dynamically queries these unique matching constraints to discover the correct UUID to perform an UPDATE, preventing data duplication.
  • Hierarchical Table Inheritance: The Punc system uses distributed table inheritance (e.g. person inherits user inherits organization inherits entity). The Merger splits the incoming JSON payload and performs atomic row updates across all relevant tables in the lineage map.
  • The Archive Paradigm: Data is never deleted in the Punc system. The Merger securely enforces referential integrity by toggling the archived Boolean flag on the base entity table rather than issuing SQL DELETE commands.
  • Change Tracking & Reactivity: The Merger diffs the incoming JSON against the existing database row (utilizing static, DashMap-cached lk_ SELECT string templates). Every detected change is recorded into the agreego.change audit table, tracking the user mapping. It then natively uses pg_notify to broadcast a completely flat row-level diff out to the Go WebSocket server for O(1) routing.
  • Flat Structural Beats (Unidirectional Flow): The Merger purposefully DOES NOT trace or hydrate outbound Foreign Keys or nested parent structures during writes. It emits completely flat structural deltas via pg_notify representing only the exact Postgres rows that changed. This keeps the write path free of read-side hydration work, so its cost depends only on the rows actually written. It is the strict responsibility of the upstream Punc Framework (the Go Speaker) to intercept these flat beats, evaluate them against active Websocket Schema Topologies, and dynamically issue targeted jspg_query reads to hydrate the exact contextual subgraphs required by listening clients.
  • Pre-Order Notification Traversal: To support proper topological hydration on the upstream Go Framework, the Merger decouples the pg_notify execution from the physical database write execution. The engine collects structural changes and explicitly fires pg_notify SQL statements in strict Pre-Order (Parent -> Relations -> Children). This guarantees that WebSocket clients receive the parent entity Beat prior to any nested child entities, ensuring stable unidirectional data flows without hydration race conditions.
  • Many-to-Many Graph Edge Management: Operates seamlessly with the global agreego.relationship table, allowing the system to represent and merge arbitrary reified M:M relationships directionally between any two entities.
  • Sparse Updates: Empty JSON strings "" are directly bound as explicit SQL NULL directives to clear data, whilst omitted (missing) properties skip UPDATE execution entirely, ensuring partial UI submissions do not wipe out sibling fields.
  • Unified Return Structure: To eliminate UI hydration race conditions and multi-user duplication, jspg_merge explicitly strips the response graph and returns only the root { "id": "uuid" } (or an array of IDs for list insertions). External APIs can then explicitly call read APIs to fetch the resulting graph, while the UI relies 100% implicitly on the flat pg_notify pipeline for reactive state synchronization.
  • Decoupled SQL Generation: Because Writes (INSERT/UPDATE) are inherently highly dynamic based on partial payload structures, the Merger generates raw SQL strings dynamically per execution without caching, guaranteeing a minimal memory footprint while scaling optimally.
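The Sparse Updates rule lends itself to a short sketch of per-execution SQL generation. Everything here is an assumption for illustration: the function name `build_set_clause`, the string-only values, and the parameter representation. Column names are interpolated directly, which is acceptable only because in this model they would come from the schema registry and never from user input.

```rust
/// Builds the SET clause for the properties that were present in the payload.
/// Omitted properties are simply not in `present`, so they are never touched.
/// An empty string is bound as SQL NULL (represented here as `None`).
pub fn build_set_clause(present: &[(&str, &str)]) -> (String, Vec<Option<String>>) {
    let mut assignments = Vec::new();
    let mut params = Vec::new();
    for (column, value) in present {
        params.push(if value.is_empty() {
            None
        } else {
            Some(value.to_string())
        });
        // Positional placeholders follow the order in which values are bound.
        assignments.push(format!("{} = ${}", column, params.len()));
    }
    (assignments.join(", "), params)
}
```

A caller in this sketch would skip the UPDATE entirely when the returned clause is empty, which matches the behavior described for payloads that omit every column of a table.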

4. Queryer

The Queryer transforms Postgres into a pre-compiled Semantic Query Engine via the jspg_query(schema_id text, cue jsonb) API, designed to serve the exact shape of Punc responses directly via SQL.

Core Features

  • Caching Strategy (DashMap SQL Caching): The Queryer securely caches its compiled, static SQL string templates per schema permutation inside the GLOBAL_JSPG concurrent DashMap. This eliminates recursive AST schema crawling on consecutive requests. Furthermore, it evaluates the strings via Postgres SPI (Server Programming Interface) Prepared Statements, leveraging native database caching of execution plans for extreme performance.
  • Schema-to-SQL Compilation: Compiles JSON Schema ASTs spanning deep arrays directly into static, pre-planned SQL multi-JOIN queries. This explicitly features the Smart Merge evaluation engine which natively translates properties through allOf and $ref inheritances, mapping JSON fields specifically to their physical database table aliases during translation.
  • Dynamic Filtering: Binds parameters natively through cue.filters objects. The queryer enforces a strict, structured, MongoDB-style operator syntax to map incoming JSON request paths directly to their originating structural table columns.
    • Equality / Inequality: {"$eq": value}, {"$ne": value} automatically map to = and !=.
    • Comparison: {"$gt": ...}, {"$gte": ...}, {"$lt": ...}, {"$lte": ...} directly compile to Postgres comparison operators (>, >=, <, <=).
    • Array Inclusion: {"$in": [values]}, {"$nin": [values]} use native jsonb_array_elements_text() bindings to enforce IN and NOT IN logic without runtime SQL injection risks.
    • Text Matching (ILIKE): Evaluates $eq or $ne against string fields containing the % character natively into Postgres ILIKE and NOT ILIKE partial substring matches.
    • Type Casting: Safely resolves dynamic combinations by casting values instantly into the physical database types mapped in the schema (e.g. parsing uuid bindings to ::uuid, formatting DateTimes to ::timestamptz, and numbers to ::numeric).
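The operator mapping above can be sketched as a small lookup plus a predicate formatter. This is a hedged illustration: the function names, the string-only value, and the exact placeholder layout are assumptions, and `$in` / `$nin` are left out because their `jsonb_array_elements_text()` binding needs a different SQL shape.

```rust
/// Maps a filter operator to its SQL operator. `$eq` / `$ne` switch to
/// ILIKE / NOT ILIKE when the value contains a `%` wildcard.
pub fn sql_operator(op: &str, value: &str) -> Option<&'static str> {
    let wildcard = value.contains('%');
    Some(match (op, wildcard) {
        ("$eq", true) => "ILIKE",
        ("$ne", true) => "NOT ILIKE",
        ("$eq", false) => "=",
        ("$ne", false) => "!=",
        ("$gt", _) => ">",
        ("$gte", _) => ">=",
        ("$lt", _) => "<",
        ("$lte", _) => "<=",
        _ => return None, // unknown operators are rejected, never passed through
    })
}

/// Renders one predicate. The value itself is never interpolated; only a
/// positional placeholder and a cast to the column's physical type appear.
pub fn compile_predicate(
    column: &str,
    op: &str,
    value: &str,
    cast: &str,
    param_index: usize,
) -> Option<String> {
    let operator = sql_operator(op, value)?;
    Some(format!("{} {} ${}::{}", column, operator, param_index, cast))
}
```

For instance, a `$gte` filter on a numeric column would render as `t1.age >= $1::numeric`, where `t1.age` is a made-up alias standing in for whatever table alias the compiler assigned.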

The Stem Engine

Rather than over-fetching heavy Entity payloads and trimming them, Punc Framework Websockets depend on isolated subgraphs defined as Stems. A Stem is a declaration of an Entity Type boundary that exists somewhere within the compiled JSON Schema graph, expressed using gjson multipath syntax (e.g., contacts.#.phone_numbers.#).

Because pg_notify (Beats) fire rigidly from physical Postgres tables (e.g. {"type": "phone_number"}), the Go Framework only ever needs to know: "Does the schema with_contacts.person contain the phone_number Entity anywhere inside its tree, and if so, what is the gjson path to iterate its payload?"

  • Initialization: During startup (jspg_stems()), the database crawls all Schemas and maps out every physical Entity Type it references. It builds a highly optimized HashMap<String, HashMap<String, Arc<Stem>>> providing strictly O(1) memory lookups mapping Schema ID -> { Stem Path -> Entity Type }.
  • GJSON Pathing: Unlike standard JSON Pointers, stems utilize .# array iterator syntax. The Go web server applies this native path (e.g. lines.#) directly to the raw Postgres JSON byte payload, extracting all active UUIDs in a single pass without unmarshaling the payload into Go structures.
  • Polymorphic Condition Selectors: When trailing paths would otherwise collide because of abstract polymorphic type definitions (e.g., a target property bounded by a oneOf taking either phone_number or email_address), JSPG natively appends evaluated gjson type conditions into the path (e.g. contacts.#.target#(type=="phone_number")). This guarantees O(1) key uniqueness in the HashMap while retaining extreme array extraction speeds natively without runtime AST evaluation.
  • Identifier Prioritization: When determining if a nested object boundary is an Entity, JSPG natively prioritizes defined $id tags over $ref inheritance pointers to prevent polymorphic boundaries from devolving into their generic base classes.
  • Cyclical Deduplication: Because Punc relationships often reference back on themselves via deeply nested classes, the Stem Engine applies intelligent path deduplication. If the active current_path already ends with the target entity string, it traverses the inheritance properties without appending the entity to the stem path again, eliminating infinite powerset loops.
  • Relationship Path Squashing: When calculating string paths structurally, JSPG intentionally omits properties natively named target or source if they belong to a native database relationship table override.
  • The Go Router: The Golang Punc framework uses this exact mapping to register WebSocket Beat frequencies exclusively on the Entity types discovered.
  • The Queryer Execution: When the Go framework asks JSPG to hydrate a partial phone_number stem for the with_contacts.person schema, the SQL Compiler does not walk string paths. It reaches into the Schema's AST using the phone_number Type string, pulls out exactly that entity's mapping rules, and returns a fully correlated SELECT block. This handles nested array properties injected via oneOf or array references efficiently, bypassing runtime powerset expansion.
  • Performance: These Stem execution structures are fully statically compiled via SPI and map perfectly to O(1) real-time routing logic on the application tier.
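The stem index and the router's question ("does this schema contain this Entity Type, and at which path?") can be sketched as follows. Only the `HashMap<String, HashMap<String, Arc<Stem>>>` shape comes from the text; the `Stem` fields and the `register` and `paths_for` helpers are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical minimal stem: just the Entity Type found at a path.
pub struct Stem {
    pub entity_type: String,
}

/// Schema ID -> { Stem Path -> Stem }.
pub type StemIndex = HashMap<String, HashMap<String, Arc<Stem>>>;

pub fn register(index: &mut StemIndex, schema_id: &str, path: &str, entity_type: &str) {
    index
        .entry(schema_id.to_string())
        .or_default()
        .insert(
            path.to_string(),
            Arc::new(Stem { entity_type: entity_type.to_string() }),
        );
}

/// Lists, in sorted order, the gjson paths in `schema_id` whose stem is the
/// given Entity Type. Empty when the schema or the type is not present.
pub fn paths_for<'a>(index: &'a StemIndex, schema_id: &str, entity_type: &str) -> Vec<&'a str> {
    let mut paths: Vec<&str> = index
        .get(schema_id)
        .into_iter()
        .flat_map(|stems| stems.iter())
        .filter(|(_, stem)| stem.entity_type == entity_type)
        .map(|(path, _)| path.as_str())
        .collect();
    paths.sort();
    paths
}
```

Looking up the inner map by schema id is a constant-time hash lookup; the scan over that schema's stems in `paths_for` is a simplification of this sketch, and a real router could just as well keep a reverse index keyed by Entity Type.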

5. Testing & Execution Architecture

JSPG implements a strict separation of concerns to bypass the need to boot a full PostgreSQL cluster for unit and integration testing. Because pgrx::spi::Spi links directly against PostgreSQL C headers, building and running the library natively with cargo test on macOS normally results in fatal dyld crashes.

To solve this, JSPG introduces the DatabaseExecutor trait inside src/database/executors/:

  • SpiExecutor (pgrx.rs): The production evaluator that is conditionally compiled (#[cfg(not(test))]). It unwraps standard pgrx::spi connections to the database.
  • MockExecutor (mock.rs): The testing evaluator that is conditionally compiled (#[cfg(test)]). It absorbs SQL calls and captures parameter bindings in memory without executing them.
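The executor split can be sketched as a trait with a recording mock. The method name `execute`, its signature, and the `archive_entity` caller are assumptions; the actual trait in src/database/executors/ may look different. The `#[cfg(test)]` gating mentioned above is omitted so the sketch stands alone.

```rust
use std::cell::RefCell;

/// Hypothetical shape of the trait; real method names may differ.
pub trait DatabaseExecutor {
    fn execute(&self, sql: &str, params: &[String]) -> Result<u64, String>;
}

/// Records every call instead of touching a database.
#[derive(Default)]
pub struct MockExecutor {
    pub calls: RefCell<Vec<(String, Vec<String>)>>,
}

impl DatabaseExecutor for MockExecutor {
    fn execute(&self, sql: &str, params: &[String]) -> Result<u64, String> {
        self.calls
            .borrow_mut()
            .push((sql.to_string(), params.to_vec()));
        Ok(0) // no rows are affected because nothing is executed
    }
}

/// Example caller: it depends only on the trait, so it never links against
/// SPI. The SQL text is illustrative.
pub fn archive_entity(db: &dyn DatabaseExecutor, id: &str) -> Result<u64, String> {
    db.execute(
        "UPDATE entity SET archived = true WHERE id = $1::uuid",
        &[id.to_string()],
    )
}
```

A test can then assert on the captured SQL and bindings in `calls`, which is the kind of in-memory verification the MockExecutor is described as enabling.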

Universal Test Harness (src/tests/)

JSPG abandons the standard cargo pgrx test model in favor of native OS testing for a >1000x speed increase (~0.05s execution).

  1. JSON Fixtures: All core interactions are defined abstractly as JSON arrays in fixtures/. Each file contains suites of TestCase objects with an action flag (compile, validate, merge, query).
  2. build.rs Generator: The build script traverses the JSON fixtures, extracts their structural identities, and generates standard #[test] blocks into src/tests/fixtures.rs.
  3. Modular Test Dispatcher: The src/tests/types/ module deserializes the abstract JSON test payloads into Suite, Case, and Expect data structures.
    • The compile action natively asserts the exact output shape of jspg_stems, allowing structural and relationship mapping logic to be tested purely through JSON without writing brute-force manual tests in Rust.
  4. Unit Context Execution: When cargo test executes, the runner iterates the JSON payloads. Because the tests run natively inside the module under #[cfg(test)], the SPI-backed executor and its pgrx C-linkage are compiled out, the MockExecutor is instantiated instead, and complex database logic is evaluated structurally, entirely in memory and in parallel.