# JSPG: JSON Schema Postgres

**JSPG** is a high-performance PostgreSQL extension for in-memory JSON Schema validation, specifically targeting **Draft 2020-12**.

It is designed to serve as the validation engine for the "Punc" architecture, where the database is the single source of truth for all data models and API contracts.

## 🎯 Goals

1. **Draft 2020-12 Compliance**: Attempt to adhere to the official JSON Schema Draft 2020-12 specification.
2. **Ultra-Fast Validation**: Compile schemas into an optimized in-memory representation for near-instant validation during high-throughput workloads.
3. **Connection-Bound Caching**: Leverage the PostgreSQL session lifecycle to maintain a per-connection schema cache, eliminating the need for repetitive parsing.
4. **Structural Inheritance**: Support object-oriented schema design via Implicit Keyword Shadowing and virtual `$family` references.
5. **Punc Integration**: Validation is aware of the "Punc" context (request/response) and can validate `cue` objects efficiently.

## 🔌 API Reference

The extension exposes the following functions to PostgreSQL:

### `cache_json_schemas(enums jsonb, types jsonb, puncs jsonb) -> jsonb`

Loads and compiles the entire schema registry into the session's memory, atomically replacing the previous validator.

* **Inputs**:
    * `enums`: Array of enum definitions.
    * `types`: Array of type definitions (core entities).
    * `puncs`: Array of punc (function) definitions with request/response schemas.
* **Behavior**:
    * Parses all inputs into an internal schema graph.
    * Resolves all internal references (`$ref`).
    * Generates virtual union schemas for type hierarchies referenced via `$family`.
    * Compiles schemas into validators.
* **Returns**: `{"response": "success"}` or an error object.

### `mask_json_schema(schema_id text, instance jsonb) -> jsonb`

Validates a JSON instance and returns a new JSON object with unknown properties removed (pruned) based on the schema.
* **Inputs**:
    * `schema_id`: The `$id` of the schema to mask against.
    * `instance`: The JSON data to mask.
* **Returns**:
    * On success: A `Drop` containing the **masked data**.
    * On failure: A `Drop` containing validation errors.

### `validate_json_schema(schema_id text, instance jsonb) -> jsonb`

Validates a JSON instance against a pre-compiled schema.

* **Inputs**:
    * `schema_id`: The `$id` of the schema to validate against (e.g., `person`, `save_person.request`).
    * `instance`: The JSON data to validate.
* **Returns**:
    * On success: `{"response": "success"}`
    * On failure: A JSON object containing structured errors (e.g., `{"errors": [...]}`).

### `json_schema_cached(schema_id text) -> bool`

Checks if a specific schema ID is currently present in the cache.

### `clear_json_schemas() -> jsonb`

Clears the current session's schema cache, freeing memory.

### `show_json_schemas() -> jsonb`

Returns a debug dump of the currently cached schemas (for development/debugging).

## ✨ Custom Features & Deviations

JSPG implements specific extensions to the Draft 2020-12 standard to support the Punc architecture's object-oriented needs.

### 1. The Unified Semantic Graph & Native Inheritance

JSPG goes beyond Draft 2020-12 to natively understand Object-Oriented inheritance and polymorphism. During the `cache_json_schemas()` phase, JSPG builds a single Directed Acyclic Graph (DAG) using **only** the `$ref` keyword. Every schema that uses `$ref` establishes a parent-to-child relationship.

Furthermore, JSPG knows which schemas belong directly to database tables (Entities) versus which are ad-hoc API shapes.

* **Native `type` Discrimination**: For any schema that traces its ancestry back to the base `entity`, JSPG securely and implicitly manages the `type` property. You do **not** need to explicitly override `"type": {"const": "person"}` in entity subclasses.
If a schema `$ref`s `organization`, JSPG automatically allows the incoming `type` to be anything in the `organization` family tree (e.g., `person`, `bot`), but rigidly truncates/masks the data structure to the requested `organization` shape.
* **Ad-Hoc Objects**: If an ad-hoc schema `$ref`s a base object but does not trace back to `entity`, standard JSON Schema rules apply (no magical `type` tracking).

> [!NOTE]
> **`$ref` never creates a Union.** When you use `$ref`, you are asking for a single, concrete struct/shape. The schema's strict fields will be rigidly enforced, but the `type` property is permitted to match any valid descendant via the native discrimination.

### 2. Shape Polymorphism & Virtual Unions (`$family`)

To support polymorphic API contracts and deeply nested UI Unions without manually writing massive `oneOf` blocks, JSPG provides the `$family` macro. While `$ref` guarantees a single shape, `$family` asks the code generators for a true Polymorphic Union class.

When `{"$family": "organization.light"}` is encountered, JSPG:

1. Locates the base `organization` node in the Semantic Graph.
2. Recursively walks down to find all descendants via `$ref`.
3. **Strictly Filters** the descendants using the exact dot-notation suffix requested. It will only include descendants whose `$id` matches the shape modifier (e.g., `person.light`, `user.light`). If `bot` has no `.light` shape defined, it is securely omitted from the union.
4. Generates a virtual `oneOf` array containing those precise `$ref`s.

This cleanly separates **Database Ancestry** (managed entirely and implicitly by `$ref` for single shapes) from **Shape Variations** (managed explicitly by `$family` to build `oneOf` unions).

### 3. Strict by Default & Extensibility

JSPG enforces a "Secure by Default" philosophy. All schemas are treated as if `unevaluatedProperties: false` (and `unevaluatedItems: false`) is set, unless explicitly overridden.
* **Strictness**: By default, any property or array item in the instance data that is not explicitly defined in the schema causes a validation error. This prevents clients from sending undeclared fields or extra array elements.
* **Extensibility (`extensible: true`)**: To allow a free-for-all of additional, undefined properties or extra array items, you must add `"extensible": true` to the schema. This disables the strictness check for that object or array, which is useful for types designed to be completely open.
* **Structured Additional Properties (`additionalProperties: {...}`)**: Instead of a boolean free-for-all, you can define `additionalProperties` as a schema object (e.g., `{"type": "string"}`). This maintains strictness (no arbitrary keys) but allows any extra keys as long as their values match the defined structure.
* **Ref Boundaries**: Strictness is reset when crossing `$ref` boundaries. The referenced schema's strictness is determined by its own definition (strict by default unless `extensible: true`), ignoring the caller's state.
* **Inheritance**: Strictness is inherited. A schema extending a strict parent will also be strict unless it declares itself `extensible: true`. Conversely, a schema extending a loose parent will also be loose unless it declares itself `extensible: false`.

### 4. Format Leniency for Empty Strings

To simplify frontend form logic, the format validators for `uuid`, `date-time`, and `email` explicitly allow empty strings (`""`). This treats an empty string as "present but unset" rather than "invalid format".

### 5. Masking (Constructive Validation)

JSPG supports a "Constructive Validation" mode via `mask_json_schema`. This is designed for high-performance API responses where the schema dictates the exact shape of the returned data.

* **Mechanism**: The validator traverses the instance against the schema.
* **Valid Fields**: Kept in the output.
* **Unknown/Extra Fields**: Silently removed (pruned) if `extensible: false` (default).
* **Invalid Fields**: Still trigger standard validation errors.

This allows the database to return "raw" joined rows (e.g. `SELECT * FROM person JOIN organization ...`) and have JSPG automatically shape the result into the expected API response, removing any internal or unrelated columns not defined in the schema.

## 🏗️ Architecture

The extension is written in Rust using `pgrx` and structures its schema parser to mirror the Punc Generator's design:

* **Single `Schema` Struct**: A unified struct representing the exact layout of a JSON Schema object, including standard keywords and custom vocabularies (`form`, `display`, etc.).
* **Compiler Phase**: Schema JSON documents are parsed into this struct, linked (references resolved), and then compiled into an efficient validation tree.
* **Validation Phase**: The compiled validators traverse the JSON instance using `serde_json::Value`.

### Concurrency & Threading ("Atomic Swap")

To support high-throughput validation while allowing for runtime schema updates (e.g., during development or hot-reloading), JSPG uses an **Atomic Swap** pattern.

1. **Immutable Validator**: The `Validator` struct immutably owns the `Registry`. Once created, a validator instance (and its registry) never changes.
2. **Global Pointer**: A global `RwLock` wrapping a shared `Arc` pointer holds the current active validator.
3. **Lock-Free Reads**: Validation requests acquire a read lock just long enough to clone the `Arc` (incrementing a reference count), then release the lock immediately. Validation proceeds on the snapshot, ensuring no blocking during schema updates.
4. **Atomic Updates**: When schemas are reloaded (`cache_json_schemas`), a new `Registry` and `Validator` are built entirely on the stack. The global pointer is then atomically swapped to the new instance under a write lock.

## 🧪 Testing

Testing is driven by standard Rust unit tests that load JSON fixtures.
* **Isolation**: Each test file runs with its own isolated `Registry` and `Validator` instance, created on the stack. This eliminates global state interference and allows tests to run in parallel.
* **Fixtures**: The tests are located in `tests/fixtures/*.json` and are executed via `cargo test`.
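
The isolation described above follows from the Atomic Swap design: because a `Validator` owns its `Registry`, a test can build one locally and never touch the global pointer. The following is a minimal sketch of that idea using only the standard library, not JSPG's actual code. The `Registry` and `Validator` bodies, the `CURRENT` static and its `RwLock<Option<Arc<Validator>>>` type, and the `snapshot`, `swap_in`, and `is_cached` names are all assumptions made for illustration.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the registry: only tracks schema ids.
#[derive(Debug, Default)]
pub struct Registry {
    schema_ids: HashSet<String>,
}

impl Registry {
    pub fn new<I, S>(ids: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Registry {
            schema_ids: ids.into_iter().map(Into::into).collect(),
        }
    }
}

/// Hypothetical validator that immutably owns its registry.
#[derive(Debug)]
pub struct Validator {
    registry: Registry,
}

impl Validator {
    pub fn new(registry: Registry) -> Self {
        Validator { registry }
    }

    /// Illustrative lookup; real validation logic is out of scope here.
    pub fn is_cached(&self, schema_id: &str) -> bool {
        self.registry.schema_ids.contains(schema_id)
    }
}

/// Assumed shape of the global pointer to the active validator.
static CURRENT: RwLock<Option<Arc<Validator>>> = RwLock::new(None);

/// Readers hold the read lock only long enough to clone the `Arc`,
/// then keep working on that snapshot.
pub fn snapshot() -> Option<Arc<Validator>> {
    CURRENT.read().unwrap().clone()
}

/// Writers finish building the new validator before taking the write lock,
/// so the lock is held only for the pointer replacement itself.
pub fn swap_in(validator: Validator) {
    let next = Arc::new(validator);
    *CURRENT.write().unwrap() = Some(next);
}
```

In this sketch, a snapshot taken before `swap_in` keeps seeing the old registry until it is dropped, while a test that calls `Validator::new(Registry::new(["person"]))` directly gets an instance that is independent of `CURRENT`.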