jspg progress

2026-02-17 17:41:54 -05:00
parent 6e06b6fdc2
commit 32ed463df8
188 changed files with 36654 additions and 15058 deletions
--- a/GEMINI.md
+++ b/GEMINI.md
@@ -1,135 +1,99 @@
-# Gemini Project Overview: `jspg`
+# JSPG: JSON Schema Postgres

-This document outlines the purpose of the `jspg` project, its architecture, and the specific modifications made to the vendored `boon` JSON schema validator crate.
+**JSPG** is a high-performance PostgreSQL extension for in-memory JSON Schema validation, specifically targeting **Draft 2020-12**.

-## What is `jspg`?
+It is designed to serve as the validation engine for the "Punc" architecture, where the database is the single source of truth for all data models and API contracts.

-`jspg` is a PostgreSQL extension written in Rust using the `pgrx` framework. Its primary function is to provide fast, in-database JSON schema validation against the 2020-12 draft of the JSON Schema specification.
+## 🎯 Goals

-### How It Works
+1.  **Draft 2020-12 Compliance**: Attempt to adhere to the official JSON Schema Draft 2020-12 specification.
+2.  **Ultra-Fast Validation**: Compile schemas into an optimized in-memory representation for near-instant validation during high-throughput workloads.
+3.  **Connection-Bound Caching**: Leverage the PostgreSQL session lifecycle to maintain a per-connection schema cache, eliminating the need for repetitive parsing.
+4.  **Structural Inheritance**: Support object-oriented schema design via Implicit Keyword Shadowing and virtual `.family` schemas.
+5.  **Punc Integration**: Validation is aware of the "Punc" context (request/response) and can validate `cue` objects efficiently.

-The extension is designed for high-performance scenarios where schemas are defined once and used many times for validation. It achieves this through an in-memory cache.
+## 🔌 API Reference

-1.  **Caching and Pre-processing:** A user first calls the `cache_json_schemas(enums, types, puncs)` SQL function. This function takes arrays of JSON objects representing different kinds of schemas:
-    -   `enums`: Standalone enum schemas (e.g., for a `task_priority` list).
-    -   `types`: Schemas for core application data models (e.g., `person`, `organization`). These may contain a `hierarchy` array for inheritance information.
-    -   `puncs`: Schemas for API/function-specific requests and responses.
+The extension exposes the following functions to PostgreSQL:

-    Before compiling, `jspg` performs a crucial **pre-processing step** for type hierarchies. It inspects each definition in the `types` array. If a type includes a `hierarchy` array (e.g., a `person` type with `["entity", "organization", "user", "person"]`), `jspg` uses this to build a map of "type families."
+### `cache_json_schemas(enums jsonb, types jsonb, puncs jsonb) -> jsonb`

-    From this map, it generates new, virtual schemas on the fly. For example, for the `organization` type, it will generate a schema with `$id: "organization.family"` that contains an `enum` of all its descendant types, such as `["organization", "user", 'person"]`.
+Loads and compiles the entire schema registry into the session's memory.

-    This allows developers to write more flexible schemas. Instead of strictly requiring a `const` type, you can validate against an entire inheritance chain:
+*   **Inputs**:
+    *   `enums`: Array of enum definitions.
+    *   `types`: Array of type definitions (core entities).
+    *   `puncs`: Array of punc (function) definitions with request/response schemas.
+*   **Behavior**:
+    *   Parses all inputs into an internal schema graph.
+    *   Resolves all internal references (`$ref`).
+    *   Generates virtual `.family` schemas for type hierarchies.
+    *   Compiles schemas into validators.
+*   **Returns**: `{"response": "success"}` or an error object.

-    ```json
-    // In an "organization" schema definition
-    "properties": {
-      "type": {
-        // Allows the 'type' field to be "organization", "user", or "person"
-        "$ref": "organization.family",
-        "override": true
-      }
-    }
-    ```
+### `validate_json_schema(schema_id text, instance jsonb) -> jsonb`

-    Finally, all user-defined schemas and the newly generated `.family` schemas are passed to the vendored `boon` crate, compiled into an efficient internal format, and stored in a static, in-memory `SCHEMA_CACHE`. This cache is managed by a `RwLock` to allow for high-performance, concurrent reads during validation.
+Validates a JSON instance against a pre-compiled schema.

-2.  **Validation:** The `validate_json_schema(schema_id, instance)` SQL function is then used to validate a JSONB `instance` against a specific, pre-cached schema identified by its `$id`. This function looks up the compiled schema in the cache and runs the validation, returning a success response or a detailed error report.
+*   **Inputs**:
+    *   `schema_id`: The `$id` of the schema to validate against (e.g., `person`, `save_person.request`).
+    *   `instance`: The JSON data to validate.
+*   **Returns**:
+    *   On success: `{"response": "success"}`
+    *   On failure: A JSON object containing structured errors (e.g., `{"errors": [...]}`).

-3.  **Custom Logic:** `jspg` uses a locally modified (vendored) version of the `boon` crate. This allows for powerful, application-specific validation logic that goes beyond the standard JSON Schema specification, such as runtime-based strictness.
+### `json_schema_cached(schema_id text) -> bool`

-### Error Handling
+Checks if a specific schema ID is currently present in the cache.

-When validation fails, `jspg` provides a detailed error report in a consistent JSON format, which we refer to as a "DropError". This process involves two main helper functions in `src/lib.rs`:
+### `clear_json_schemas() -> jsonb`

-1.  **`collect_errors`**: `boon` returns a nested tree of `ValidationError` objects. This function recursively traverses that tree to find the most specific, underlying causes of the failure. It filters out structural errors (like `allOf` or `anyOf`) to create a flat list of concrete validation failures.
+Clears the current session's schema cache, freeing memory.

-2.  **`format_errors`**: This function takes the flat list of errors and transforms each one into the final DropError JSON format. It also de-duplicates errors that occur at the same JSON Pointer path, ensuring a cleaner output if a single value violates multiple constraints.
+### `show_json_schemas() -> jsonb`

-#### DropError Format
+Returns a debug dump of the currently cached schemas (for development/debugging).
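
The cache-related functions above can be pictured as thin wrappers around a per-session map. The following is a minimal sketch under stated assumptions, not the extension's actual code: `CompiledSchema`, `cache_schema`, `schema_cached`, and `clear_schemas` are hypothetical names, and `thread_local!` stands in for whatever storage the extension really uses (a PostgreSQL backend serves one connection per process, so process-local state behaves as session state).

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical stand-in for a compiled validator; the real type is not shown here.
#[derive(Clone, Debug, PartialEq)]
struct CompiledSchema {
    id: String,
}

thread_local! {
    // Assumption: one cache per backend, keyed by schema `$id`.
    static SCHEMA_CACHE: RefCell<HashMap<String, CompiledSchema>> =
        RefCell::new(HashMap::new());
}

// Store a (pretend) compiled schema under its id.
fn cache_schema(id: &str) {
    SCHEMA_CACHE.with(|cache| {
        cache
            .borrow_mut()
            .insert(id.to_string(), CompiledSchema { id: id.to_string() });
    });
}

// Mirrors the idea behind `json_schema_cached`: is this id present?
fn schema_cached(id: &str) -> bool {
    SCHEMA_CACHE.with(|cache| cache.borrow().contains_key(id))
}

// Mirrors the idea behind `clear_json_schemas`: drop every cached entry.
fn clear_schemas() {
    SCHEMA_CACHE.with(|cache| cache.borrow_mut().clear());
}
```

Calling `cache_schema("person")` makes `schema_cached("person")` return `true` until `clear_schemas()` is called.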

-A DropError object provides a clear, structured explanation of a validation failure:
+## ✨ Custom Features & Deviations

-```json
-{
-  "code": "ADDITIONAL_PROPERTIES_NOT_ALLOWED",
-  "message": "Property 'extra' is not allowed",
-  "details": {
-    "path": "/extra",
-    "context": "not allowed",
-    "cause": {
-      "got": [
-        "extra"
-      ]
-    },
-    "schema": "basic_strict_test.request"
-  }
-}
-```
+JSPG implements specific extensions to the Draft 2020-12 standard to support the Punc architecture's object-oriented needs.

-   `code` (string): A machine-readable error code (e.g., `ADDITIONAL_PROPERTIES_NOT_ALLOWED`, `MIN_LENGTH_VIOLATED`).
-   `message` (string): A human-readable summary of the error.
-   `details` (object):
-    -   `path` (string): The JSON Pointer path to the invalid data within the instance.
-    -   `context` (any): The actual value that failed validation.
-    -   `cause` (any): The low-level reason from the validator, often including the expected value (`want`) and the actual value (`got`).
-    -   `schema` (string): The `$id` of the schema that was being validated.
+### 1. Implicit Keyword Shadowing
+Standard JSON Schema composition (`allOf`) is additive (Intersection), meaning constraints can only be tightened, not replaced. However, JSPG treats `$ref` differently when it appears alongside other properties to support object-oriented inheritance.

---
+*   **Inheritance (`$ref` + `properties`)**: When a schema uses `$ref` *and* defines its own properties, JSPG implements **Smart Merge** (or Shadowing). If a property is defined in the current schema, its constraints take precedence over the inherited constraints for that specific keyword.
+    *   *Example*: If `Entity` defines `type: { const: "entity" }` and `Person` (which refs Entity) defines `type: { const: "person" }`, validation passes for "person". The local `const` shadows the inherited `const`.
+    *   *Granularity*: Shadowing is per-keyword. If `Entity` defined `type: { const: "entity", minLength: 5 }`, `Person` would shadow `const` but still inherit `minLength: 5`.

-## `boon` Crate Modifications
+*   **Composition (`allOf`)**: When using `allOf`, standard intersection rules apply. No shadowing occurs; all constraints from all branches must pass. This is used for mixins or interfaces.
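
The per-keyword shadowing rule can be illustrated with a small merge over one property's keywords. This is a sketch only: `Keywords`, `shadow_merge`, and `keywords` are hypothetical helpers, and constraint values are kept as plain strings rather than real schema nodes.

```rust
use std::collections::BTreeMap;

// Keyword name -> constraint value, kept as strings for illustration only.
type Keywords = BTreeMap<String, String>;

/// Merge one property's inherited keywords with the locally defined ones.
/// A local keyword replaces the inherited keyword of the same name;
/// every other inherited keyword is kept.
fn shadow_merge(inherited: &Keywords, local: &Keywords) -> Keywords {
    let mut merged = inherited.clone();
    for (keyword, value) in local {
        merged.insert(keyword.clone(), value.clone());
    }
    merged
}

/// Convenience constructor for the example data.
fn keywords(pairs: &[(&str, &str)]) -> Keywords {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

With `Entity` contributing `const: "entity"` and `minLength: 5` for `type`, and `Person` contributing only `const: "person"`, the merged keyword set holds `const: "person"` and `minLength: 5`, matching the granularity described above.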

-The version of `boon` located in the `validator/` directory has been significantly modified to support application-specific validation logic that goes beyond the standard JSON Schema specification.
+### 2. Virtual Family Schemas (`.family`)
+To support polymorphic fields (e.g., a field that accepts any "User" type), JSPG generates virtual schemas representing type hierarchies.

-### 1. Property-Level Overrides for Inheritance
+*   **Mechanism**: When caching types, if a type defines a `hierarchy` (e.g., `["entity", "organization", "person"]`), JSPG generates a schema like `organization.family` which is a `oneOf` containing refs to all valid descendants.
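
One way to derive family membership from `hierarchy` arrays is sketched below. It assumes each hierarchy lists ancestors from the root down to the type itself, and that a family includes the type it is named after; `build_families` is a hypothetical helper that only computes member names and does not emit the `oneOf` schema.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Build family membership from (type name, hierarchy) pairs.
/// Every ancestor named in a type's hierarchy gains that type as a member
/// of its `<ancestor>.family` set.
fn build_families(types: &[(&str, Vec<&str>)]) -> BTreeMap<String, BTreeSet<String>> {
    let mut families: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (name, hierarchy) in types {
        for ancestor in hierarchy.iter() {
            families
                .entry(format!("{ancestor}.family"))
                .or_default()
                .insert(name.to_string());
        }
    }
    families
}
```

Given `entity`, `organization` (`["entity", "organization"]`), and `person` (`["entity", "organization", "person"]`), the `organization.family` set contains `organization` and `person`.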

-   **Problem:** A primary use case for this project is validating data models that use `$ref` to create inheritance chains (e.g., a `person` schema `$ref`s a `user` schema, which `$ref`s an `entity` schema). A common pattern is to use a `const` keyword on a `type` property to identify the specific model (e.g., `"type": {"const": "person"}`). However, standard JSON Schema composition with `allOf` (which is implicitly used by `$ref`) treats these as a logical AND. This creates an impossible condition where an instance's `type` property would need to be "person" AND "user" AND "entity" simultaneously.
+### 3. Strict by Default & Extensibility
+JSPG enforces a "Secure by Default" philosophy. All schemas are treated as if `unevaluatedProperties: false` (and `unevaluatedItems: false`) were set, unless explicitly overridden.

-   **Solution:** We've implemented a custom, explicit override mechanism. A new keyword, `"override": true`, can be added to any property definition within a schema.
+*   **Strictness**: By default, any property in the instance data that is not explicitly defined in the schema causes a validation error. This prevents clients from sending undeclared fields.
+*   **Extensibility (`extensible: true`)**: To allow additional, undefined properties, you must add `"extensible": true` to the schema. This is useful for types that are designed to be open for extension.
+*   **Ref Boundaries**: Strictness is reset when crossing `$ref` boundaries. The referenced schema's strictness is determined by its own definition (strict by default unless `extensible: true`), ignoring the caller's state.
+*   **Inheritance**: Strictness is inherited. A schema extending a strict parent will also be strict unless it declares itself `extensible: true`. Conversely, a schema extending a loose parent will also be loose unless it declares itself `extensible: false`.
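
The basic strict-versus-extensible decision for a single object can be sketched as follows. This deliberately ignores `$ref` boundaries and inheritance, and `undeclared_properties` is a hypothetical helper rather than part of JSPG.

```rust
use std::collections::BTreeSet;

/// Return the instance keys that a strict schema would reject.
/// `extensible` models the `"extensible": true` opt-out: when it is set,
/// no property is reported.
fn undeclared_properties(
    declared: &[&str],
    instance_keys: &[&str],
    extensible: bool,
) -> Vec<String> {
    if extensible {
        return Vec::new();
    }
    let declared: BTreeSet<&str> = declared.iter().copied().collect();
    instance_keys
        .iter()
        .filter(|key| !declared.contains(*key))
        .map(|key| key.to_string())
        .collect()
}
```

For a schema declaring `name` and `type`, an instance with keys `name`, `type`, and `extra` yields `["extra"]` in strict mode and an empty list when the schema is extensible.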

-    ```json
-    // person.json
-    {
-      "$id": "person",
-      "$ref": "user",
-      "properties": {
-        "type": { "const": "person", "override": true }
-      }
-    }
-    ```

-    This signals to the validator that this definition of the `type` property should be the *only* one applied, and any definitions for `type` found in base schemas (like `user` or `entity`) should be ignored for the duration of this validation.
+### 4. Format Leniency for Empty Strings
+To simplify frontend form logic, the format validators for `uuid`, `date-time`, and `email` explicitly allow empty strings (`""`). This treats an empty string as "present but unset" rather than "invalid format".
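
The leniency rule amounts to short-circuiting the format check on an empty string. The sketch below uses `uuid` as the example; `is_uuid` is a simplified structural check written for illustration (8-4-4-4-12 hexadecimal groups) and is not the validator JSPG uses.

```rust
/// Plain structural UUID check: 36 characters, hyphens at the usual
/// positions, hexadecimal digits everywhere else.
fn is_uuid(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.len() == 36
        && bytes.iter().enumerate().all(|(i, b)| match i {
            8 | 13 | 18 | 23 => *b == b'-',
            _ => b.is_ascii_hexdigit(),
        })
}

/// Lenient wrapper: an empty string is accepted as "present but unset";
/// any non-empty string must pass the normal format check.
fn is_uuid_lenient(s: &str) -> bool {
    s.is_empty() || is_uuid(s)
}
```

The same wrapper shape would apply to `date-time` and `email`: only the inner check changes.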

-#### Key Changes
+## 🏗️ Architecture

-This was achieved by making the validator stateful, using a pattern already present in `boon` for handling `unevaluatedProperties`.
+The extension is written in Rust using `pgrx` and structures its schema parser to mirror the Punc Generator's design:

-1.  **Meta-Schema Update**: The meta-schema for Draft 2020-12 was modified to recognize `"override": true` as a valid keyword within a schema object, preventing the compiler from rejecting our custom schemas.
+*   **Single `Schema` Struct**: A unified struct representing the exact layout of a JSON Schema object, including standard keywords and custom vocabularies (`form`, `display`, etc.).
+*   **Compiler Phase**: Schema JSON documents are parsed into this struct, linked (references resolved), and then compiled into an efficient validation tree.
+*   **Validation Phase**: The compiled validators traverse the JSON instance using `serde_json::Value`.

-2.  **Compiler Modification**: The schema compiler in `validator/src/compiler.rs` was updated. It now inspects sub-schemas within a `properties` keyword and, if it finds `"override": true`, it records the name of that property in a new `override_properties` `HashSet` on the compiled `Schema` struct.
+## 🧪 Testing

-3.  **Stateful Validator with `Override` Context**: The core `Validator` in `validator/src/validator.rs` was modified to carry an `Override` context (a `HashSet` of property names) throughout the validation process.
-    -   **Initialization**: When validation begins, the `Override` context is created and populated with the names of any properties that the top-level schema has marked with `override`.
-    -   **Propagation**: As the validator descends through a `$ref` or `allOf`, this `Override` context is cloned and passed down. The child schema adds its own override properties to the set, ensuring that higher-level overrides are always maintained.
-    -   **Enforcement**: In `obj_validate`, before a property is validated, the validator first checks if the property's name exists in the `Override` context it has received. If it does, it means a parent schema has already claimed responsibility for validating this property, so the child validator **skips** it entirely. This effectively achieves the "top-level wins" inheritance model.
+Testing is driven by standard Rust unit tests that load JSON fixtures.

-This approach cleanly integrates our desired inheritance behavior directly into the validator with minimal and explicit deviation from the standard, avoiding the need for a complex, post-processing validation function like the old `walk_and_validate_refs`.
-
-### 2. Recursive Runtime Strictness Control
-
-   **Problem:** The `jspg` project requires that certain schemas (specifically those for public `puncs` and global `type`s) enforce a strict "no extra properties" policy. This strictness needs to be decided at runtime and must cascade through the entire validation hierarchy, including all nested objects and `$ref` chains. A compile-time flag was unsuitable because it would incorrectly apply strictness to shared, reusable schemas.
-
-   **Solution:** A runtime validation option was implemented to enforce strictness recursively. This required several coordinated changes to the `boon` validator.
-
-#### Key Changes
-
-1.  **`ValidationOptions` Struct**: A new `ValidationOptions { be_strict: bool }` struct was added to `validator/src/lib.rs`. The `jspg` code in `src/lib.rs` determines if a validation run should be strict and passes this struct to the validator.
-
-2.  **Strictness Check in `uneval_validate`**: The original `boon` only checked for unevaluated properties if the `unevaluatedProperties` keyword was present in the schema. We added an `else if be_strict` block to `uneval_validate` in `validator/src/validator.rs`. This block triggers a check for any leftover unevaluated properties at the end of a validation pass and reports them as errors, effectively enforcing our runtime strictness rule.
-
-3.  **Correct Context Propagation**: The most complex part of the fix was ensuring the set of unevaluated properties was correctly maintained across different validation contexts (especially `$ref` and nested property validations). Three critical changes were made:
-    -   **Inheriting Context in `_validate_self`**: When validating keywords that apply to the same instance (like `$ref` or `allOf`), the sub-validator must know what properties the parent has already evaluated. We changed the creation of the `Validator` inside `_validate_self` to pass a clone of the parent's `uneval` state (`uneval: self.uneval.clone()`) instead of creating a new one from scratch. This allows the context to flow downwards.
-    -   **Isolating Context in `validate_val`**: Conversely, when validating a property's value, that value is a *different* part of the JSON instance. The sub-validation should not affect the parent's list of unevaluated properties. We fixed this by commenting out the `self.uneval.merge(...)` call in the `validate_val` function.
-    -   **Simplifying `Uneval::merge`**: The original logic for merging `uneval` state was different for `$ref` keywords. This was incorrect. We simplified the `merge` function to *always* perform an intersection (`retain`), which correctly combines the knowledge of evaluated properties from different schema parts that apply to the same instance.
-
-4.  **Removing Incompatible Assertions**: The changes to context propagation broke several `debug_assert!` macros in the `arr_validate` function, which were part of `boon`'s original design. Since our new validation flow is different but correct, these assertions were removed.
+The fixtures are located in `tests/fixtures/*.json`, and the tests are executed via `cargo test`.