massively improves the jspg validator by removing mathematical composition keywords like allOf, anyOf, $ref, etc. in order to rely on discriminators and OOP-style types to determine valid paths, with no intersections, unions, or guesswork; added cases to replace the former conditionals

This commit is contained in:
2026-04-08 13:08:24 -04:00
parent e4286ac6a9
commit 7c8df22709
30 changed files with 2526 additions and 4816 deletions

182
GEMINI.md

@ -10,7 +10,7 @@ JSPG operates by deeply integrating the JSON Schema Draft 2020-12 specification
* **Queryer**: Compile JSON Schemas into static, cached SQL SPI `SELECT` plans for fetching full entities or isolated ad-hoc object boundaries.
### 🎯 Goals
1. **Draft 2020-12 Based**: Attempt to adhere to the official JSON Schema Draft 2020-12 specification, while heavily augmenting it for strict structural typing.
2. **Ultra-Fast Execution**: Compile schemas into optimized in-memory validation trees and cached SQL SPIs to bypass Postgres Query Builder overheads.
3. **Connection-Bound Caching**: Leverage the PostgreSQL session lifecycle using an **Atomic Swap** pattern. Compiled schemas are fully frozen, so read access never contends on schema data; the only lock is the brief read lock taken to clone the shared pointer.
4. **Structural Inheritance**: Support object-oriented schema design via Implicit Keyword Shadowing and virtual `$family` references natively mapped to Postgres table constraints.
@ -20,9 +20,132 @@ JSPG operates by deeply integrating the JSON Schema Draft 2020-12 specification
To support high-throughput operations while allowing for runtime updates (e.g., during hot-reloading), JSPG uses an **Atomic Swap** pattern:
1. **Parser Phase**: Schema JSONs are parsed into ordered `Schema` structs.
2. **Compiler Phase**: The database iterates all parsed schemas and pre-computes native optimization maps (Descendants Map, Depths Map, Variations Map).
3. **Immutable AST Caching**: The `Validator` struct immutably owns the `Database` registry. Schemas themselves are frozen structurally, but utilize `OnceLock` interior mutability during the Compilation Phase to permanently cache resolved `type` inheritances, properties, and `compiled_edges` directly onto their AST nodes. This guarantees strict `O(1)` relationship and property validation execution at runtime without locking or recursive DB polling.
4. **Lock-Free Reads**: Incoming operations acquire a read lock just long enough to clone the `Arc` inside an `RwLock<Option<Arc<Validator>>>`, so a schema update never blocks readers for longer than the pointer swap itself.
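The swap described in the steps above can be sketched with std types alone. This is a minimal illustration, not JSPG's code: `Engine`, `Validator { version }`, `publish`, `current`, and `clear` are hypothetical names. It assumes compilation happens before the write lock is taken, so the lock only guards the pointer swap.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the fully compiled, immutable validator.
pub struct Validator {
    pub version: u64,
}

/// Holds the slot shaped like `RwLock<Option<Arc<Validator>>>` from the text.
pub struct Engine {
    slot: RwLock<Option<Arc<Validator>>>,
}

impl Engine {
    pub const fn new() -> Self {
        Engine { slot: RwLock::new(None) }
    }

    /// Reader path: the read guard lives only for this statement, just long
    /// enough to clone the Arc; validation then runs without any lock held.
    pub fn current(&self) -> Option<Arc<Validator>> {
        self.slot.read().unwrap().clone()
    }

    /// Writer path: the caller compiles first, then the pointer is swapped in.
    /// Readers still holding the old Arc keep using it until they drop it.
    pub fn publish(&self, compiled: Validator) {
        let fresh = Arc::new(compiled);
        *self.slot.write().unwrap() = Some(fresh);
    }

    /// Mirrors a teardown: drops this session's engine instance.
    pub fn clear(&self) {
        *self.slot.write().unwrap() = None;
    }
}

/// Hypothetical process-wide instance, analogous in role to `GLOBAL_JSPG`.
pub static ENGINE: Engine = Engine::new();
```

A reader that cloned the `Arc` before a `publish` keeps a consistent snapshot of the old schemas. That is what makes it safe to leave compiled schemas immutable.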
### Global API Reference
These functions operate on the global `GLOBAL_JSPG` engine instance and provide administrative boundaries:
* `jspg_setup(database jsonb) -> jsonb`: Initializes the engine. Deserializes the full database schema registry (types, enums, puncs, relations) from Postgres and compiles them into memory atomically.
* `jspg_teardown() -> jsonb`: Clears the current session's engine instance from `GLOBAL_JSPG`, resetting the cache.
* `jspg_schemas() -> jsonb`: Exports the fully compiled AST snapshot (including all inherited dependencies) out of `GLOBAL_JSPG` into standard JSON Schema representations.
---
## 2. Schema Modeling (Punc Developer Guide)
JSPG augments standard JSON Schema 2020-12 to provide an opinionated, strict, and highly ergonomic Object-Oriented paradigm. Developers defining Punc Data Models should follow these conventions.
### Types of Types
* **Table-Backed (Entity Types)**: Primarily defined in root type schemas. These represent physical Postgres tables.
  * They absolutely **require** an `$id`.
  * The schema conceptually requires a `type` discriminator at runtime so the engine knows what physical variation to interact with.
  * Can inherit other entity types to build lineage (e.g. `person` -> `organization` -> `entity`).
* **Field-Backed (JSONB Bubbles)**: These are shapes that live entirely inside a Postgres JSONB column without being tied to a top-level table constraint.
  * **Global `$id` Promotion**: Utilizing explicit `$id` declarations promotes the schema to the Global Registry. This effectively creates strictly-typed code-generator universes (e.g., generating an `InvoiceNotificationMetadata` Dart class) operating cleanly inside unstructured Postgres JSONB columns.
  * They can re-use the standard `type` discriminator locally for `oneOf` polymorphism without conflicting with global Postgres Table constraints.
### Discriminators & The Dot Convention (A.B)
In Punc, polymorphic targets like explicit tagged unions or STI (Single Table Inheritance) rely on discriminators. Because Punc favors universal consistency, a schema's data contract must be explicit and identical regardless of the routing context through which an endpoint consumes it.
**The 2-Tier Paradigm**: The system inherently prevents "God Tables" by restricting routing to exactly two dimensions, guaranteeing absolute $O(1)$ lookups without ambiguity:
1. **Vertical Routing (`type`)**: Identifies the specific Postgres Table lineage (e.g. `person` vs `organization`).
2. **Horizontal Routing (`kind.type`)**: Natively evaluates Single Table Inheritance. The runtime dynamically concatenates `$kind.$type` to yield the namespace-protected schema `$id` (e.g. `light.person`), maintaining collision-free schema registration.
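The two routing tiers reduce to string construction and splitting. The sketch below assumes the `$kind.$type` concatenation for horizontal routing and the last-dot-segment rule for recovering the table type. `route_id` and `split_id` are hypothetical helper names, not JSPG functions.

```rust
/// Builds the schema `$id` a payload routes to.
/// Vertical tier: `type` alone (e.g. "person").
/// Horizontal tier: "kind.type" (e.g. "light.person").
pub fn route_id(type_: &str, kind: Option<&str>) -> String {
    match kind {
        Some(k) if !k.is_empty() => format!("{k}.{type_}"),
        _ => type_.to_string(),
    }
}

/// Splits a schema `$id` back into (kind, type).
/// The last dot-separated segment is always the table type.
pub fn split_id(id: &str) -> (Option<&str>, &str) {
    match id.rsplit_once('.') {
        Some((kind, ty)) => (Some(kind), ty),
        None => (None, id),
    }
}
```

Both directions are a single string operation, which is why the lookup that follows is a plain hash-map probe rather than a search.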
Therefore, any schema that participates in polymorphic discrimination MUST explicitly define its discriminator properties natively inside its `properties` block. However, to stay DRY and maintain flexible APIs, you **DO NOT** need to hardcode `const` values, nor should you add them to your `required` array. The Punc engine treats `type` and `kind` as **magic properties**.
**Magic Validation Constraints**:
* **Dynamically Required**: You never manually list `type` or `kind` in a schema's `required` block. The Validator expects the discriminators, and emits a `MISSING_TYPE` error, only when a polymorphic router (`$family` / `oneOf`) needs them to resolve a path.
* **Implicit Resolution**: When wrapped in `$family` or `oneOf`, the polymorphic router parses the schema `$id` (e.g. `light.person`) and validates that `type` equals `"person"` and `kind` equals `"light"`. It emits `CONST_VIOLATED` on a mismatch, without you ever hardcoding `const` constraints.
* **Generator Explicitness**: Because Postgres is the Single Source of Truth, requiring the explicit definition in `properties` guarantees that the downstream Dart/Go code generators see the fields and can serialize them back to the server.
For example, a schema representing `$id: "light.person"` must natively define its own structural boundaries:
```json
{
  "$id": "light.person",
  "type": "person",
  "properties": {
    "type": { "type": "string" },
    "kind": { "type": "string" }
  },
  "required": ["type", "kind"]
}
```
* **The Object Contract (Presence)**: The object enforces its own structural integrity. Standard `required` validation ensures that `type` and `kind` are present, emitting `REQUIRED_FIELD_MISSING` if either is omitted.
* **The Dynamic Values (`db.types`)**: Because the `type` and `kind` properties are declared, the Punc engine intercepts them during `validate_object`. It parses the schema `$id` (e.g. `light.person`) and validates that `type` equals `"person"` (or a valid descendant in `db.types`) and `kind` equals `"light"`, emitting `CONST_VIOLATED` on a mismatch.
* **The Routing Contract**: When wrapped in `$family` or `oneOf`, the polymorphic router takes an $O(1)$ fast path by reading the payload's `type`/`kind` identifiers. If they are omitted, it falls back to standard structural failure.
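The value check in "The Dynamic Values" can be sketched as follows, under stated assumptions:

* A `descendants` map stands in for the lineage held in `db.types`. Each type maps to itself plus every type inheriting from it.
* Presence is left to `required`, so absent values are skipped.
* `check_discriminators` and `DiscriminatorError` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum DiscriminatorError {
    ConstViolated { field: &'static str, expected: String, found: String },
}

/// Validates payload `type`/`kind` against the coordinates encoded in `schema_id`.
pub fn check_discriminators(
    schema_id: &str,
    payload_type: Option<&str>,
    payload_kind: Option<&str>,
    descendants: &HashMap<&str, HashSet<&str>>,
) -> Result<(), DiscriminatorError> {
    // "light.person" -> (Some("light"), "person"); "entity" -> (None, "entity").
    let (kind, ty) = match schema_id.rsplit_once('.') {
        Some((k, t)) => (Some(k), t),
        None => (None, schema_id),
    };
    if let Some(found) = payload_type {
        // Accept the exact type or any registered descendant of it.
        let ok = descendants
            .get(ty)
            .map_or(found == ty, |set| set.contains(found));
        if !ok {
            return Err(DiscriminatorError::ConstViolated {
                field: "type",
                expected: ty.to_string(),
                found: found.to_string(),
            });
        }
    }
    if let (Some(expected), Some(found)) = (kind, payload_kind) {
        if expected != found {
            return Err(DiscriminatorError::ConstViolated {
                field: "kind",
                expected: expected.to_string(),
                found: found.to_string(),
            });
        }
    }
    Ok(())
}
```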
### Composition & Inheritance (The `type` keyword)
Punc completely abandons the standard JSON Schema `$ref` keyword. Instead, it overloads the same `type` keyword used for primitives. A `"type"` in Punc resolves to either a Native Primitive (`"string"`, `"null"`) or a Custom Object Pointer (`"budget"`, `"user"`).
* **Single Inheritance**: Setting `"type": "user"` acts exactly like an `extends` keyword. The schema borrows all fields and constraints from the `user` identity. During `jspg_setup`, the compiler recursively crawls the dependencies to map the physical Postgres table, permanently mapping its type restriction to `"object"` under the hood so JSON standards remain unbroken.
* **Implicit Keyword Shadowing**: Unlike standard JSON Schema inheritance, local property definitions natively override and shadow inherited properties.
* **Primitive Array Shorthand (Optionality)**: The `type` array syntax is optimized for nullable fields. Defining `"type": ["budget", "null"]` builds a nullable struct, generating `Budget? budget;` in Dart. You can freely mix primitives like `["string", "number", "null"]`.
* **Strict Array Constraint**: To prevent ambiguous Multiple Inheritance, a `type` array may contain at most **ONE** Custom Object Pointer. Defining `"type": ["person", "organization"]` intentionally triggers a fatal database compilation error that instructs developers to build a proper tagged union (`oneOf`) instead.
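The rules above can be sketched as a single pass over a `type` array. Any entry that is not a standard JSON Schema primitive counts as a Custom Object Pointer, and a second pointer is rejected. `resolve_type_array`, `ResolvedType`, and the error text are hypothetical; the real compiler's messages and data structures may differ.

```rust
/// The JSON Schema primitive type names; anything else is treated as an object pointer.
const PRIMITIVES: [&str; 7] = ["string", "number", "integer", "boolean", "object", "array", "null"];

#[derive(Debug, PartialEq)]
pub struct ResolvedType {
    pub pointer: Option<String>, // at most one custom object pointer
    pub primitives: Vec<String>,
    pub nullable: bool,
}

pub fn resolve_type_array(types: &[&str]) -> Result<ResolvedType, String> {
    let mut pointer: Option<String> = None;
    let mut primitives = Vec::new();
    for &t in types {
        if PRIMITIVES.contains(&t) {
            primitives.push(t.to_string());
        } else if let Some(existing) = &pointer {
            // A second pointer would mean multiple inheritance: abort compilation.
            return Err(format!(
                "type array mixes `{existing}` and `{t}`; use a tagged union (oneOf) instead"
            ));
        } else {
            pointer = Some(t.to_string());
        }
    }
    let nullable = primitives.iter().any(|p| p == "null");
    Ok(ResolvedType { pointer, primitives, nullable })
}
```

A result of `pointer: Some("budget"), nullable: true` is the case a generator would render as an optional type such as `Budget?`.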
### Polymorphism (`$family` and `oneOf`)
Polymorphism is how an object boundary can dynamically take on entirely different shapes based on the payload provided at runtime.
* **`$family` (Target-Based Polymorphism)**: An explicit Punc compiler macro instructing the database compiler to dynamically search its internal `db.descendants` registry and find all physical schemas that mathematically resolve to the target.
  * *Across Tables (Vertical)*: If `$family: entity` is requested, the payload's `type` field acts as the discriminator, dynamically routing to standard variations like `organization` or `person` spanning multiple Postgres tables.
  * *Single Table (Horizontal)*: If `$family: widget` is requested, the router evaluates the Dot Convention. If the payload has `"type": "widget"` and `"kind": "stock"`, the router resolves the string `"stock.widget"` and routes exclusively to that explicit `JSPG` schema.
* **`oneOf` (Strict Tagged Unions)**: A hardcoded array of JSON Schema candidate options. Punc strictly bans mathematical "Union of Sets" evaluation. Every `oneOf` candidate item MUST either be a pure primitive (`{ "type": "null" }`) or a user-defined Object Pointer providing a specific discriminator (e.g., `{ "type": "invoice_metadata" }`). This ensures validations remain pure $O(1)$ fast-paths and allows the Dart generator to emit pristine `sealed classes`.
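A `$family` node can be compiled into a candidate list once and then routed by lookup rather than by trying every branch. The sketch below makes two assumptions. First, the registry is a `descendants` map plus the set of registered schema ids. Second, a schema's base type is its last dot segment. `Registry`, `family`, and `route` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical compiled registry.
pub struct Registry {
    /// Each type mapped to itself plus every type inheriting from it.
    pub descendants: HashMap<String, HashSet<String>>,
    /// Every registered schema `$id`, e.g. "person" or "stock.widget".
    pub schema_ids: HashSet<String>,
}

impl Registry {
    /// `$family` expansion: every registered schema whose base type descends from `target`.
    pub fn family(&self, target: &str) -> Vec<String> {
        let Some(types) = self.descendants.get(target) else {
            return Vec::new();
        };
        let mut out: Vec<String> = self
            .schema_ids
            .iter()
            .filter(|id| {
                let base = id.rsplit('.').next().unwrap_or("");
                types.contains(base)
            })
            .cloned()
            .collect();
        out.sort();
        out
    }

    /// Discriminator routing: build the id from `type`/`kind` and probe the registry.
    pub fn route(&self, target: &str, ty: &str, kind: Option<&str>) -> Option<String> {
        let allowed = self.descendants.get(target)?;
        if !allowed.contains(ty) {
            return None; // type is outside this family
        }
        let id = match kind {
            Some(k) => format!("{k}.{ty}"),
            None => ty.to_string(),
        };
        self.schema_ids.get(&id).cloned()
    }
}
```

Design note: `family` is the compile-time expansion. `route` is the runtime fast path, whose cost is two hash lookups regardless of how many candidates the family has.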
### Conditionals (`cases`)
Standard JSON Schema forces developers to write deeply nested `allOf` -> `if` -> `properties` blocks just to express conditional branching. **JSPG completely abandons `allOf` and this practice.** For declarative business logic and structural requirements that depend on property values, use the top-level `cases` array.
It evaluates as an **Independent Declarative Rules Engine**. Every `Case` block within the array is evaluated independently and in parallel. For a given rule, if the `when` condition evaluates to true, its `then` schema is applied; if it evaluates to false, its `else` schema is applied (if present). To stay compatible with standard JSON Schema internally, the `when` block uses plain JSON Schema `properties` definitions (e.g. `enum`, `const`) rather than non-standard MongoDB-style operators. Because `when`, `then`, and `else` are themselves standard schemas, they support nested `cases` for mutually exclusive `else if` chains.
```json
{
  "$id": "save_external_account",
  "cases": [
    {
      "when": {
        "properties": {
          "status": { "const": "unverified" }
        },
        "required": ["status"]
      },
      "then": {
        "required": ["amount_1", "amount_2"]
      }
    },
    {
      "when": {
        "properties": { "kind": { "const": "credit" } },
        "required": ["kind"]
      },
      "then": {
        "required": ["details"]
      },
      "else": {
        "cases": [
          {
            "when": { "properties": { "kind": { "const": "checking" } }, "required": ["kind"] },
            "then": { "required": ["routing_number"] }
          }
        ]
      }
    }
  ]
}
```
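The evaluation order of `cases` can be sketched with a deliberately reduced schema model. The model covers only `const` properties, `required`, and nested `cases`, and the payload is a flat map of string fields. This is a hypothetical illustration of the independent when/then/else semantics, not JSPG's validator. `Schema`, `Case`, `matches`, and `missing_fields` are invented names.

```rust
use std::collections::HashMap;

/// Heavily simplified schema: const-valued properties, `required`, and nested `cases`.
#[derive(Default)]
pub struct Schema {
    pub consts: Vec<(&'static str, &'static str)>,
    pub required: Vec<&'static str>,
    pub cases: Vec<Case>,
}

pub struct Case {
    pub when: Schema,
    pub then: Option<Schema>,
    pub else_: Option<Schema>, // `else` is a Rust keyword
}

pub type Payload = HashMap<&'static str, &'static str>;

/// `when` holds if its required fields exist and any present const fields match,
/// mirroring JSON Schema: `properties` only constrains keys that are present.
fn matches(when: &Schema, payload: &Payload) -> bool {
    when.required.iter().all(|f| payload.contains_key(f))
        && when.consts.iter().all(|(k, v)| payload.get(k).map_or(true, |got| got == v))
}

/// Collects missing required fields. Every case is evaluated on its own,
/// and nested `cases` inside `else` form an `else if` chain.
pub fn missing_fields(schema: &Schema, payload: &Payload, out: &mut Vec<&'static str>) {
    for &field in &schema.required {
        if !payload.contains_key(field) {
            out.push(field);
        }
    }
    for case in &schema.cases {
        let branch = if matches(&case.when, payload) { &case.then } else { &case.else_ };
        if let Some(next) = branch {
            missing_fields(next, payload, out);
        }
    }
}
```

Encoding the `save_external_account` example in this model changes the result with the payload. A payload of `status: "unverified"` and `kind: "checking"` reports `amount_1` and `amount_2` from the first rule. It also reports `routing_number` from the nested rule under the second rule's `else` branch.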
### Strict by Default & Extensibility
* **Strictness**: By default, any property not explicitly defined in the schema causes a validation error (effectively enforcing `additionalProperties: false` globally).
* **Extensibility (`extensible: true`)**: To allow a free-for-all of undefined properties, schemas must explicitly declare `"extensible": true`.
* **Structured Additional Properties**: If `additionalProperties: {...}` is defined as a schema, arbitrary keys are allowed so long as their values match the defined type constraint.
* **Inheritance Boundaries**: Strictness resets when crossing non-primitive `type` boundaries. A schema extending a strict parent remains strict unless it explicitly overrides with `"extensible": true`.
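The three modes for undeclared properties can be sketched as follows, under stated assumptions:

* The payload is a flat list of string key/value pairs.
* `additionalProperties: {...}` is modeled as a plain predicate on the value.
* `Additional` and `undeclared_errors` are hypothetical names, and the error strings are illustrative.

```rust
use std::collections::HashSet;

/// How a schema treats keys it does not declare.
pub enum Additional {
    /// Default: any undeclared key is an error.
    Strict,
    /// `"extensible": true`: undeclared keys are accepted as-is.
    Extensible,
    /// `additionalProperties: {...}`: undeclared keys pass if the value passes the check.
    Schema(fn(&str) -> bool),
}

pub fn undeclared_errors(
    declared: &HashSet<&str>,
    mode: &Additional,
    payload: &[(&str, &str)],
) -> Vec<String> {
    let mut errors = Vec::new();
    for &(key, value) in payload {
        if declared.contains(key) {
            continue; // declared properties are validated by their own schemas
        }
        match mode {
            Additional::Strict => errors.push(format!("{key}: property not allowed")),
            Additional::Extensible => {}
            Additional::Schema(check) => {
                if !check(value) {
                    errors.push(format!("{key}: invalid additional property"));
                }
            }
        }
    }
    errors
}
```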
### Format Leniency for Empty Strings
To simplify frontend form validation, format validators specifically for `uuid`, `date-time`, and `email` explicitly allow empty strings (`""`), treating them as "present but unset".
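The leniency rule amounts to a short-circuit before the real format check. In this sketch the `uuid` and `email` checks are simplified stand-ins, not JSPG's validators. `date-time` parsing is omitted, so a non-empty `date-time` falls through unchecked here. `check_format` and its helpers are hypothetical.

```rust
/// Returns true when `value` is acceptable for `format`.
/// Empty strings are treated as "present but unset" for the lenient formats.
pub fn check_format(format: &str, value: &str) -> bool {
    let lenient = matches!(format, "uuid" | "date-time" | "email");
    if lenient && value.is_empty() {
        return true;
    }
    match format {
        "uuid" => is_uuid(value),
        "email" => looks_like_email(value),
        // date-time and other formats are not modeled in this sketch
        _ => true,
    }
}

/// Canonical 8-4-4-4-12 hex layout with hyphens at fixed positions.
fn is_uuid(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 36
        && b.iter().enumerate().all(|(i, c)| match i {
            8 | 13 | 18 | 23 => *c == b'-',
            _ => c.is_ascii_hexdigit(),
        })
}

/// Very loose shape check: non-empty local part and a dotted domain.
fn looks_like_email(s: &str) -> bool {
    match s.split_once('@') {
        Some((local, domain)) => !local.is_empty() && domain.contains('.'),
        None => false,
    }
}
```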
---
## 3. Database
The Database module manages the core execution graphs and structural compilation of the Postgres environment.
### Relational Edge Resolution
When compiling nested object graphs or arrays, the JSPG engine must dynamically infer which Postgres Foreign Key constraint correctly bridges the parent to the nested schema. To guarantee deterministic SQL generation, it uses a strict, multi-step deduction process applied during the `OnceLock` Compilation phase:
@ -33,16 +156,14 @@ When compiling nested object graphs or arrays, the JSPG engine must dynamically
5. **Implicit Base Fallback (1:M)**: If no explicit prefix matches, and M:M deduction fails, the compiler filters for exactly one remaining relation with a `null` prefix (e.g. `fk_invoice_line_invoice` -> `prefix: null`). A `null` prefix mathematically denotes the core structural parent-child ownership edge and is used safely as a fallback.
6. **Deterministic Abort**: If the engine exhausts all deduction pathways and the edge remains ambiguous, it explicitly aborts schema compilation (`returns None`) rather than silently generating unpredictable SQL.
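Steps 5 and 6 can be sketched in isolation. Steps 1 to 4 fall outside this excerpt and are not modeled. The sketch assumes the input is the set of relations that survived those earlier steps. `Relation`, `implicit_base_fallback`, and the constraint name `fk_invoice_line_source_invoice` are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Relation {
    pub constraint: String,
    pub prefix: Option<String>,
}

/// Step 5: accept exactly one null-prefix relation as the ownership edge.
/// Step 6: zero or several candidates is ambiguous, so return None (abort).
pub fn implicit_base_fallback(remaining: &[Relation]) -> Option<&Relation> {
    let mut base = remaining.iter().filter(|r| r.prefix.is_none());
    match (base.next(), base.next()) {
        (Some(only), None) => Some(only),
        _ => None,
    }
}
```

Pulling at most two items is enough to tell "exactly one" from "none or many" without scanning every relation.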
### Ad-Hoc Schema Promotion
To seamlessly support deeply nested, inline Object definitions that don't declare an explicit `$id`, JSPG aggressively promotes them to standalone topological entities during the database compilation phase.
* **Hash Generation:** While evaluating the unified graph, if the compiler enters an `Object` or `Array` structure that lacks an `$id`, it calculates a localized hash alias representing exactly its structural constraints.
* **Promotion:** This inline chunk is registered under its own `$id` in the `db.schemas` cache registry. WebSocket subscriptions or isolated queries can then target any sub-object of a large database topology directly in $O(1)$, without re-parsing the parent's AST block on every read.
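Promotion can be sketched as hashing a canonical text form of the inline schema and registering it under a derived id. Several details here are assumptions for illustration, not JSPG's scheme:

* The `parent#hash` id format.
* Taking the canonical string as input.
* Using std's `DefaultHasher`, whose algorithm is not guaranteed stable across Rust releases. A persistent registry would want a fixed hash function.

`promote` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Registers an inline schema under a hash-derived alias and returns the alias.
/// Identical structures under the same parent map to the same id.
pub fn promote(registry: &mut HashMap<String, String>, parent_id: &str, canonical: &str) -> String {
    let mut h = DefaultHasher::new();
    canonical.hash(&mut h);
    let id = format!("{parent_id}#{:016x}", h.finish());
    // Insert only once; later promotions of the same shape reuse the entry.
    registry.entry(id.clone()).or_insert_with(|| canonical.to_string());
    id
}
```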
---
## 4. Validator
The Validator provides strict, schema-driven evaluation for the "Punc" architecture.
@ -53,35 +174,13 @@ The Validator provides strict, schema-driven evaluation for the "Punc" architect
JSPG implements specific extensions to the Draft 2020-12 standard to support the Punc architecture's object-oriented needs while heavily optimizing for zero-runtime lookups.
* **Caching Strategy**: The Validator caches the pre-compiled `Database` registry in memory upon initialization (`jspg_setup`). This registry holds the comprehensive graph of schema boundaries, Types, ENUMs, and Foreign Key relationships, acting as the Single Source of Truth for all validation operations without polling Postgres.
#### A. Polymorphism & Referencing (`$ref`, `$family`, and Native Types)
* **Native Type Discrimination (`variations`)**: Schemas defined inside a Postgres `type` are Entities. The validator securely and implicitly manages their `"type"` property. If an entity inherits from `user`, incoming JSON can safely define `{"type": "person"}` without errors, thanks to `compiled_variations` inheritance.
* **Structural Inheritance & Viral Infection (`$ref`)**: `$ref` is used exclusively for structural inheritance and explicit composition, *never* for union creation. A `$ref` ALWAYS targets a specific, *single* schema struct (e.g., `full.person`). It represents an explicit, known structural shape. A Punc request schema that `$ref`s an Entity virally inherits all physical database polymorphism rules for that target.
* **Shape Polymorphism (`$family`)**: Unlike `$ref`, `$family` ALWAYS targets an abstract *table lineage* (e.g., `organization` or `widget`). It instructs the engine to dynamically expand the response payload into multiple possible schema shapes based on the row's physical database `type`. If `{"$family": "widget"}` is used, the Validator dynamically identifies *every* schema in the registry that `$ref`s `widget` (e.g., `stock.widget`, `task.widget`) and recursively evaluates the JSON against all of them.
* **Strict Matches & Depth Heuristic**: Polymorphic structures MUST match exactly **one** schema permutation. If multiple inherited struct permutations pass, JSPG applies the **Depth Heuristic Tie-Breaker**, selecting the candidate deepest in the inheritance tree.
#### B. Dot-Notation Schema Resolution & Database Mapping
* **The Dot Convention**: When a schema represents a specific variation or shape of an underlying physical database `Type` (e.g., a "summary" of a "person"), its `$id` must adhere to a dot-notation suffix convention (e.g., `summary.person` or `full.person`).
* **Entity Resolution**: The framework (Validator, Queryer, Merger) dynamically determines the backing PostgreSQL table structure by splitting the schema's `$id` (or `$ref`) by `.` and extracting the **last segment** (`next_back()`). If the last segment matches a known Database Type (like `person`), the framework natively applies that table's inheritance rules, variations, and physical foreign keys to the schema graph, regardless of the prefix.
#### C. Strict by Default & Extensibility
* **Strictness**: By default, any property not explicitly defined in the schema causes a validation error (effectively enforcing `additionalProperties: false` globally).
* **Extensibility (`extensible: true`)**: To allow a free-for-all of undefined properties, schemas must explicitly declare `"extensible": true`.
* **Structured Additional Properties**: If `additionalProperties: {...}` is defined as a schema, arbitrary keys are allowed so long as their values match the defined type constraint.
* **Inheritance Boundaries**: Strictness resets when crossing `$ref` boundaries. A schema extending a strict parent remains strict unless it explicitly overrides with `"extensible": true`.
#### D. Implicit Keyword Shadowing
* **Inheritance (`$ref` + properties)**: Unlike standard JSON Schema, when a schema uses `$ref` alongside local properties, JSPG implements **Smart Merge**. Local constraints natively take precedence over (shadow) inherited constraints for the same keyword.
* *Example*: If `entity` has `type: {const: "entity"}`, but `person` defines `type: {const: "person"}`, the local `person` const cleanly overrides the inherited one.
* **Composition (`allOf`)**: When evaluating `allOf`, standard intersection rules apply seamlessly. No shadowing occurs, meaning all constraints from all branches must pass.
#### E. Format Leniency for Empty Strings
To simplify frontend form validation, format validators specifically for `uuid`, `date-time`, and `email` explicitly allow empty strings (`""`), treating them as "present but unset".
* **Discriminator Fast Paths & Extraction**: When executing a polymorphic node (`oneOf` or `$family`), the engine inspects the incoming JSON payload for the literal `type` and `kind` string values. It routes evaluation directly to the matching candidates in $O(1)$ and returns a `MISSING_TYPE` error immediately when a required discriminator is absent.
* **Missing Type Ultimatum**: If an entity requires a discriminator and the JSON payload omits it, JSPG short-circuits branch execution entirely. It returns a single `MISSING_TYPE` error, with the exact path, to the UI, preventing confusing cascading failures.
* **Golden Match Context**: When exactly one structural candidate perfectly maps a discriminator, the Validator exclusively cascades that specific structural error context directly to the user, stripping away all noise generated by other parallel schemas.
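The three behaviors above can be sketched as one dispatch function. The following are hypothetical:

* The `Outcome` enum and the `dispatch` function.
* The slash-style error path and the `NoMatch` variant.
* The `validate` callback, which stands in for running a single candidate schema.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Discriminator absent: one error pointing at the missing field.
    MissingType { path: String },
    /// Exactly one candidate matched: only its errors are reported.
    Golden { schema_id: String, errors: Vec<String> },
    /// Discriminator present but not a candidate of this node.
    NoMatch { path: String, found: String },
}

/// `candidates` maps a discriminator value to the schema id it selects.
pub fn dispatch<F>(
    path: &str,
    payload_type: Option<&str>,
    candidates: &HashMap<String, String>,
    validate: F,
) -> Outcome
where
    F: Fn(&str) -> Vec<String>,
{
    // Ultimatum: no other branch is evaluated when the discriminator is missing.
    let Some(found) = payload_type else {
        return Outcome::MissingType { path: format!("{path}/type") };
    };
    match candidates.get(found) {
        // Golden match: errors from other candidates are never produced.
        Some(schema_id) => Outcome::Golden {
            schema_id: schema_id.clone(),
            errors: validate(schema_id.as_str()),
        },
        None => Outcome::NoMatch { path: format!("{path}/type"), found: found.to_string() },
    }
}
```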
---
## 5. Merger
The Merger provides an automated, high-performance graph synchronization engine. It orchestrates the complex mapping of nested JSON objects into normalized Postgres relational tables, honoring all inheritance and graph constraints.
@ -106,7 +205,7 @@ The Merger provides an automated, high-performance graph synchronization engine.
---
## 6. Queryer
The Queryer transforms Postgres into a pre-compiled Semantic Query Engine, designed to serve the exact shape of Punc responses directly via SQL.
@ -116,7 +215,8 @@ The Queryer transforms Postgres into a pre-compiled Semantic Query Engine, desig
### Core Features
* **Caching Strategy (DashMap SQL Caching)**: The Queryer securely caches its compiled, static SQL string templates per schema permutation inside the `GLOBAL_JSPG` concurrent `DashMap`. This eliminates recursive AST schema crawling on consecutive requests. Furthermore, it evaluates the strings via Postgres SPI (Server Programming Interface) Prepared Statements, leveraging native database caching of execution plans for extreme performance.
* **Schema-to-SQL Compilation**: Compiles JSON Schema ASTs spanning deep arrays directly into static, pre-planned SQL multi-JOIN queries. This explicitly features the `Smart Merge` evaluation engine which natively translates properties through `type` inheritances, mapping JSON fields specifically to their physical database table aliases during translation.
* **Root Null-Stripping Optimization**: Unlike traditional nested document builders, the Queryer defers Postgres' recursive `jsonb_strip_nulls` call to the outermost level of the compiled query. The compiler nests the `jsonb_build_object()` sub-queries freely and wraps the result in a single stripping pass. That pass removes all null optionals uniformly before the result leaves the database, instead of being repeated at every nesting level.
* **Dynamic Filtering**: Binds parameters natively through `cue.filters` objects. The queryer enforces a strict, structured, MongoDB-style operator syntax to map incoming JSON request constraints directly to their originating structural table columns. Filters support both flat path notation (e.g., `"contacts/is_primary": {...}`) and deeply nested recursive JSON structures (e.g., `{"contacts": {"is_primary": {...}}}`). The queryer recursively traverses and flattens these structures at AST compilation time.
  * **Equality / Inequality**: `{"$eq": value}`, `{"$ne": value}` automatically map to `=` and `!=`.
  * **Comparison**: `{"$gt": ...}`, `{"$gte": ...}`, `{"$lt": ...}`, `{"$lte": ...}` directly compile to Postgres comparison operators (`>`, `>=`, `<`, `<=`).
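Flattening and operator mapping can be sketched with a small filter tree. The sketch shows only the six operators listed above, so the full operator list, which continues past this excerpt, is not covered. It also assumes parameters arrive as already-bound placeholders such as `$1`. `Filter`, `sql_operator`, and `flatten` are hypothetical names.

```rust
/// Hypothetical filter tree: nested keys, or an operator object like {"$eq": value}.
pub enum Filter {
    Field(Vec<(String, Filter)>),
    Ops(Vec<(String, String)>), // (operator, bound parameter placeholder)
}

pub fn sql_operator(op: &str) -> Option<&'static str> {
    Some(match op {
        "$eq" => "=",
        "$ne" => "!=",
        "$gt" => ">",
        "$gte" => ">=",
        "$lt" => "<",
        "$lte" => "<=",
        _ => return None,
    })
}

/// Flattens nested keys into slash paths and emits (path, SQL operator, parameter).
/// Flat keys such as "contacts/is_primary" and nested objects produce the same paths.
pub fn flatten(
    prefix: &str,
    filter: &Filter,
    out: &mut Vec<(String, &'static str, String)>,
) -> Result<(), String> {
    match filter {
        Filter::Field(children) => {
            for (key, child) in children {
                let path = if prefix.is_empty() { key.clone() } else { format!("{prefix}/{key}") };
                flatten(&path, child, out)?;
            }
        }
        Filter::Ops(ops) => {
            for (op, param) in ops {
                let sql = sql_operator(op).ok_or_else(|| format!("unsupported operator {op}"))?;
                out.push((prefix.to_string(), sql, param.clone()));
            }
        }
    }
    Ok(())
}
```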
@ -128,13 +228,9 @@ The Queryer transforms Postgres into a pre-compiled Semantic Query Engine, desig
* **Multi-Table Branching**: If the Physical Table is a parent to other tables (e.g. `organization` has variations `["organization", "bot", "person"]`), the compiler generates a dynamic `CASE WHEN type = '...' THEN ...` query, expanding into `JOIN`s for each variation.
* **Single-Table Bypass**: If the Physical Table is a leaf node with only one variation (e.g. `person` has variations `["person"]`), the compiler cleanly bypasses `CASE` generation and compiles a simple `SELECT` across the base table, as all schema extensions (e.g. `light.person`, `full.person`) are guaranteed to reside in the exact same physical row.
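The choice between the two rules above can be sketched as a small SQL string builder. This is not the Queryer's generator. The per-variation `JOIN`s are omitted, and the `branch` callback is a hypothetical stand-in for the per-variation projection. `compile_select` is an invented name, and the sketch assumes at least one variation.

```rust
/// Emits a plain SELECT for a leaf table, or a CASE over `type` for a parent table.
/// Assumes `variations` is non-empty.
pub fn compile_select(table: &str, variations: &[&str], branch: impl Fn(&str) -> String) -> String {
    if let &[only] = variations {
        // Leaf table: every schema extension lives in the same physical row.
        return format!("SELECT {} FROM {table}", branch(only));
    }
    let arms: Vec<String> = variations
        .iter()
        .map(|&v| format!("WHEN {table}.type = '{v}' THEN {}", branch(v)))
        .collect();
    format!("SELECT CASE {} END FROM {table}", arms.join(" "))
}
```

For example, `compile_select("person", &["person"], ...)` yields a single `SELECT` with no `CASE`. A parent table like `organization` gets one `WHEN` arm per variation.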
---
## 7. Testing & Execution Architecture
JSPG implements a strict separation of concerns to avoid booting a full PostgreSQL cluster for unit and integration testing. Because `pgrx::spi::Spi` links directly to PostgreSQL's C headers, building the library with `cargo test` on macOS normally results in fatal `dyld` crashes.