# JSPG: JSON Schema Postgres

**JSPG** is a high-performance PostgreSQL extension for in-memory JSON Schema validation, specifically targeting **Draft 2020-12**.

It is designed to serve as the validation engine for the "Punc" architecture, where the database is the single source of truth for all data models and API contracts.

## 🎯 Goals

1. **Draft 2020-12 Compliance**: Adhere to the official JSON Schema Draft 2020-12 specification, except where JSPG deliberately deviates from it (see Custom Features & Deviations).
2. **Ultra-Fast Validation**: Compile schemas into an optimized in-memory representation so that high-throughput workloads pay no parsing or reference-resolution cost at validation time.
3. **Connection-Bound Caching**: Leverage the PostgreSQL session lifecycle to maintain a per-connection schema cache, so schemas are parsed once per session rather than on every call.
4. **Structural Inheritance**: Support object-oriented schema design via Implicit Keyword Shadowing and virtual `$family` references.
5. **Punc Integration**: Validation is aware of the "Punc" context (request/response) and can validate `cue` objects efficiently.

## 🔌 API Reference

The extension exposes the following functions to PostgreSQL:

### `cache_json_schemas(enums jsonb, types jsonb, puncs jsonb) -> jsonb`

Loads and compiles the entire schema registry into the session's memory, atomically replacing the previous validator.

* **Inputs**:
    * `enums`: Array of enum definitions.
    * `types`: Array of type definitions (core entities).
    * `puncs`: Array of punc (function) definitions with request/response schemas.
* **Behavior**:
    * Parses all inputs into an internal schema graph.
    * Resolves all internal references (`$ref`).
    * Expands `$family` references into virtual union (`oneOf`) schemas over the `$ref` descendants graph.
    * Compiles schemas into validators.
* **Returns**: `{"response": "success"}` or an error object.

### `mask_json_schema(schema_id text, instance jsonb) -> jsonb`

Validates a JSON instance and returns a new JSON object with unknown properties removed (pruned) based on the schema.

* **Inputs**:
    * `schema_id`: The `$id` of the schema to mask against.
    * `instance`: The JSON data to mask.
* **Returns**:
    * On success: A `Drop` containing the **masked data**.
    * On failure: A `Drop` containing validation errors.

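The pruning step can be pictured with a short, standard-library-only Rust sketch. It is not JSPG's implementation: the `Value` enum is a hypothetical stand-in for `serde_json::Value`, `ObjectSchema` and `mask` are illustrative names, and the sketch models only declared properties (no validation and no `Drop` wrapper).

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a JSON value (hypothetical; JSPG itself uses `serde_json::Value`).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Num(f64),
    Arr(Vec<Value>),
    Obj(BTreeMap<String, Value>),
}

/// Hypothetical schema shape: only the declared properties, each with its own sub-schema.
#[derive(Debug, Default)]
pub struct ObjectSchema {
    pub properties: BTreeMap<String, ObjectSchema>,
}

/// Returns a copy of `instance` with every property not declared in `schema` removed.
/// Declared properties are masked recursively against their sub-schema; array elements
/// are masked against the same schema (a simplification of `items`).
pub fn mask(schema: &ObjectSchema, instance: &Value) -> Value {
    match instance {
        Value::Obj(map) => {
            let mut kept = BTreeMap::new();
            for (key, value) in map {
                // Undeclared keys are skipped, i.e. pruned from the output.
                if let Some(sub) = schema.properties.get(key) {
                    kept.insert(key.clone(), mask(sub, value));
                }
            }
            Value::Obj(kept)
        }
        Value::Arr(items) => Value::Arr(items.iter().map(|item| mask(schema, item)).collect()),
        // Scalars carry no properties, so they pass through unchanged.
        other => other.clone(),
    }
}
```

Pruning happens at every nesting level, so a client cannot smuggle undeclared fields through a nested object.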
### `validate_json_schema(schema_id text, instance jsonb) -> jsonb`

Validates a JSON instance against a pre-compiled schema.

* **Inputs**:
    * `schema_id`: The `$id` of the schema to validate against (e.g., `person`, `save_person.request`).
    * `instance`: The JSON data to validate.
* **Returns**:
    * On success: `{"response": "success"}`
    * On failure: A JSON object containing structured errors (e.g., `{"errors": [...]}`).

### `json_schema_cached(schema_id text) -> bool`

Returns `true` if the given schema ID is present in the current session's cache, `false` otherwise.

### `clear_json_schemas() -> jsonb`

Clears the current session's schema cache, freeing its memory.

### `show_json_schemas() -> jsonb`

Returns a dump of the currently cached schemas, intended for development and debugging.

## ✨ Custom Features & Deviations

JSPG extends, and in places deviates from, the Draft 2020-12 standard to support the Punc architecture's object-oriented needs, resolving as much as possible at compile time so that validation needs no runtime lookups.

### 1. Polymorphism & Referencing (`$ref`, `$family`, and Native Types)

JSPG replaces standard JSON Schema's dynamic reference resolution (e.g., `$defs`, relative URIs, `$dynamicRef`, `$dynamicAnchor`) and conditional logic (`if`/`then`/`else`) with a strict, explicitly structured global `$id` system. This enables predictable code generation and fast runtime validation.

#### A. Global `$id` Conventions & Schema Buckets

Every schema lives in a single flat, globally addressable namespace. However, where a schema is defined in the database determines its physical boundaries:

* **Types (Entities)**: Schemas defined within a Postgres `type` represent entities. The `$id` must be exactly the type name (`person`) or the type name with a dot-separated prefix (`full.person`). All schemas in this bucket receive strict Native Type Discrimination based on the physical table hierarchy.
* **Puncs (APIs)**: Schemas defined within a `punc` are ad-hoc containers. The `$id` must be exactly `[punc_name].request` or `[punc_name].response`. They are never entities themselves.
* **Enums (Domains)**: Schemas defined within an `enum` represent enum definitions. The `$id` must be exactly the enum name (`job_status`) or the enum name with a dot-separated prefix (`short.job_status`).

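The naming rules above can be checked mechanically. The following is a minimal Rust sketch; `Bucket` and `is_valid_id` are hypothetical names, and it assumes a variant prefix may be any non-empty string before the final dot (e.g. `full.person`), which the rules above do not pin down.

```rust
/// Where a schema was defined (the three buckets described above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Bucket {
    Type,
    Punc,
    Enum,
}

/// Checks whether `id` is a legal `$id` for a schema defined in `bucket` under `name`
/// (the type, punc, or enum name). Hypothetical helper, not part of JSPG's API.
pub fn is_valid_id(bucket: Bucket, name: &str, id: &str) -> bool {
    match bucket {
        // Puncs: exactly `<punc>.request` or `<punc>.response`.
        Bucket::Punc => id == format!("{name}.request") || id == format!("{name}.response"),
        // Types and enums: the bare name, or `<prefix>.<name>` with a non-empty prefix.
        Bucket::Type | Bucket::Enum => {
            id == name
                || id
                    .strip_suffix(name)
                    .and_then(|rest| rest.strip_suffix('.'))
                    .map_or(false, |prefix| !prefix.is_empty())
        }
    }
}
```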
#### B. Native Type Discrimination (The `variations` Property)

Because `jspg` knows which schemas are Entities from their origin bucket (Types), it manages the `"type"` property implicitly by attaching `compiled_variations` to them.

If a schema belongs to the `user` type, the validator does *not* rigidly require `{"type": "user"}`. Instead, it consults the physical Postgres type inheritance graph (e.g. `[entity, organization, user]`) and automatically also accepts types that inherit from `user`, such as `{"type": "person"}` or `{"type": "bot"}`, enabling polymorphic APIs.

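A minimal sketch of how the accepted `"type"` values could be derived from the inheritance graph. It assumes a hypothetical `parent_of` map from each type to its direct parent, an acyclic hierarchy, and that `person` and `bot` inherit from `user` (as the example implies); `compiled_variations` here is an illustrative function, not JSPG's internal structure of the same name.

```rust
use std::collections::{BTreeSet, HashMap};

/// Computes the `"type"` values accepted by an entity schema for `base`: `base` itself
/// plus every type whose ancestor chain passes through `base`.
/// `parent_of` maps each type to its direct parent (hypothetical input shape) and is
/// assumed to be acyclic, like a table inheritance hierarchy.
pub fn compiled_variations(base: &str, parent_of: &HashMap<String, String>) -> BTreeSet<String> {
    let mut variations = BTreeSet::from([base.to_string()]);
    let all_types: BTreeSet<&String> = parent_of.keys().chain(parent_of.values()).collect();
    for ty in all_types {
        // Walk up from `ty`; if we meet `base`, `ty` is one of its descendants.
        let mut current = ty.as_str();
        while let Some(parent) = parent_of.get(current) {
            if parent == base {
                variations.insert(ty.clone());
                break;
            }
            current = parent.as_str();
        }
    }
    variations
}

/// Native type discrimination: an instance passes if its `"type"` is a known variation.
pub fn type_is_allowed(instance_type: &str, variations: &BTreeSet<String>) -> bool {
    variations.contains(instance_type)
}
```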
#### C. Structural Inheritance & Viral Infection (`$ref`)

`$ref` is used exclusively for structural inheritance.

* **Viral Infection**: If an anonymous schema or an ad-hoc schema (like a Punc Request) `$ref`s a strict Entity schema (like `light.person`), it *virally inherits* the `compiled_variations` of that target. A Punc request therefore gains exactly the polymorphic type boundaries of the Entity it points to.
* **`$ref` never creates a Union.** A `$ref` always resolves to a single, concrete struct/shape.

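Viral inheritance amounts to following `$ref` links until a schema with its own variations is found. The Rust sketch below assumes a hypothetical `Node` shape with at most one structural `$ref` per schema, and an illustrative `effective_variations` function; it also stops on missing targets and reference cycles.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical compiled-schema node. Entity schemas carry their own variations;
/// ad-hoc schemas (anonymous objects, punc requests) carry `None` and may `$ref` another schema.
pub struct Node {
    pub ref_target: Option<String>,
    pub variations: Option<BTreeSet<String>>,
}

/// Follows `$ref` links from `id` until it reaches a schema that has its own variations,
/// so an ad-hoc schema "virally" inherits the type boundaries of the entity it points to.
/// Returns `None` if the chain ends without an entity, hits a missing id, or loops.
pub fn effective_variations<'a>(
    id: &str,
    schemas: &'a HashMap<String, Node>,
) -> Option<&'a BTreeSet<String>> {
    let mut current = id;
    let mut seen = BTreeSet::new();
    loop {
        if !seen.insert(current.to_string()) {
            return None; // reference cycle
        }
        let node = schemas.get(current)?;
        if let Some(variations) = &node.variations {
            return Some(variations);
        }
        current = node.ref_target.as_deref()?;
    }
}
```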
#### D. Shape Polymorphism & Virtual Unions (`$family`)

To support polymorphic API contracts (e.g., heterogeneous arrays of generic widgets) without hand-writing large `oneOf` blocks, JSPG provides the `$family` macro.

While `$ref` defines rigid structure, `$family` relies on an abstract **Descendants Graph**.

During compilation, `jspg` temporarily tracks every `$ref` pointer globally to build a reverse-lookup graph of "Descendants".

When `{"$family": "widget"}` is encountered, JSPG:

1. Locates the `widget` schema in the Descendants graph.
2. Expands the macro by finding *every* schema in the entire database that structurally `$ref`s `widget`, directly or indirectly (e.g., `stock.widget`, an anonymous object, etc.).
3. Replaces the `$family` keyword with a standard `oneOf` array containing `$ref`s to those discovered descendants.

Likewise, `{"$family": "light.widget"}` expands to all schemas that `$ref` the generic abstract `light.widget` interface.

This cleanly separates **Database Physics** (derived from the Postgres `Types` bucket and viral `$ref` inheritance) from **Structural Polymorphism** (derived purely from the abstract `$ref` tree).

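The descendants graph and `$family` expansion can be sketched as a reverse index plus a graph walk. This assumes a hypothetical input in which each schema has at most one structural `$ref` (`refs` maps a schema `$id` to the `$id` it references); `descendants_graph` and `expand_family` are illustrative names, and the result is the set of `$id`s that the generated `oneOf` would reference.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Builds the reverse-lookup "Descendants" graph (parent -> direct children)
/// from every schema's structural `$ref` edge.
pub fn descendants_graph(refs: &BTreeMap<String, String>) -> BTreeMap<String, BTreeSet<String>> {
    let mut graph: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (child, parent) in refs {
        graph.entry(parent.clone()).or_default().insert(child.clone());
    }
    graph
}

/// Expands `{"$family": base}` into every schema that `$ref`s `base`,
/// directly or indirectly (depth-first walk over the Descendants graph).
pub fn expand_family(base: &str, graph: &BTreeMap<String, BTreeSet<String>>) -> BTreeSet<String> {
    let mut found = BTreeSet::new();
    let mut stack = vec![base.to_string()];
    while let Some(id) = stack.pop() {
        for child in graph.get(&id).into_iter().flatten() {
            // `insert` returns false for already-visited ids, which also guards against cycles.
            if found.insert(child.clone()) {
                stack.push(child.clone());
            }
        }
    }
    found
}
```

Keeping the walk separate from type inheritance mirrors the split described above: nothing here consults the Postgres `Types` bucket.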
### 2. Strict by Default & Extensibility

JSPG enforces a "Secure by Default" philosophy. All schemas are treated as if `unevaluatedProperties: false` (and `unevaluatedItems: false`) were set, unless explicitly overridden.

* **Strictness**: By default, any property or array item in the instance that is not explicitly defined in the schema causes a validation error. This prevents clients from sending undeclared fields or extra array elements.
* **Extensibility (`extensible: true`)**: To allow arbitrary additional properties or extra array items, add `"extensible": true` to the schema. This disables the strictness check for that object or array only, which is useful for types designed to be completely open.
* **Structured Additional Properties (`additionalProperties: {...}`)**: Instead of opening the object entirely, you can define `additionalProperties` as a schema object (e.g., `{"type": "string"}`). Undeclared keys are then accepted, but only if their values match that schema; arbitrary values are still rejected.
* **Ref Boundaries**: Strictness is reset when crossing `$ref` boundaries. The referenced schema's strictness is determined by its own definition (strict by default unless `extensible: true`), ignoring the caller's state.
* **Inheritance**: Strictness is inherited. A schema extending a strict parent is also strict unless it declares itself `extensible: true`. Conversely, a schema extending a loose parent is also loose unless it declares itself `extensible: false`.

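The core strictness rule for a flat object can be sketched as follows. The `Value`, `Kind`, and `ObjectSchema` types and the `unexpected_keys` function are hypothetical simplifications; `Kind` stands in for a structured `additionalProperties` schema such as `{"type": "string"}`, and `$ref` boundaries, inheritance, and arrays are left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Minimal value model for a flat object (hypothetical; stands in for JSON values).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Num(f64),
}

/// Hypothetical stand-in for a structured `additionalProperties` schema.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Text,
    Number,
}

/// Hypothetical object schema with only the fields this rule needs.
pub struct ObjectSchema {
    pub properties: BTreeSet<String>,
    /// `"extensible": true` switches the unknown-key check off for this object.
    pub extensible: bool,
    /// Structured `additionalProperties`: extra keys pass only if their value matches.
    pub additional: Option<Kind>,
}

fn value_matches(kind: Kind, value: &Value) -> bool {
    matches!((kind, value), (Kind::Text, Value::Str(_)) | (Kind::Number, Value::Num(_)))
}

/// Returns the keys that make `instance` invalid under strict-by-default rules.
pub fn unexpected_keys(schema: &ObjectSchema, instance: &BTreeMap<String, Value>) -> Vec<String> {
    if schema.extensible {
        return Vec::new(); // open object: any extra key is allowed
    }
    instance
        .iter()
        .filter(|(key, _)| !schema.properties.contains(*key))
        .filter(|(_, value)| match schema.additional {
            Some(kind) => !value_matches(kind, value),
            None => true, // strict: every undeclared key is an error
        })
        .map(|(key, _)| key.clone())
        .collect()
}
```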
### 3. Implicit Keyword Shadowing

Standard JSON Schema composition (`allOf`) is additive (Intersection), meaning constraints can only be tightened, not replaced. However, JSPG treats `$ref` differently when it appears alongside other properties to support object-oriented inheritance.

* **Inheritance (`$ref` + properties)**: When a schema uses `$ref` and defines its own properties, JSPG implements Smart Merge (or Shadowing). If a property is defined in the current schema, its constraints take precedence over the inherited constraints for that specific keyword.
    * **Example**: If Entity defines `type: { const: "entity" }` and Person (which refs Entity) defines `type: { const: "person" }`, validation passes for "person". The local const shadows the inherited const.
    * **Granularity**: Shadowing is per-keyword. If Entity defined `type: { const: "entity", minLength: 5 }`, Person would shadow `const` but still inherit `minLength: 5`.
* **Composition (`allOf`)**: When using `allOf`, standard intersection rules apply. No shadowing occurs; all constraints from all branches must pass. This is used for mixins or interfaces.

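Per-keyword shadowing is essentially a map merge in which local keywords overwrite inherited ones. In this Rust sketch, `Constraints`, `shadow`, and `shadow_properties` are hypothetical names, and keyword values are kept as raw JSON text for brevity.

```rust
use std::collections::BTreeMap;

/// Constraints on one property, as keyword -> raw JSON value (hypothetical representation).
pub type Constraints = BTreeMap<String, String>;

/// Smart merge for `$ref` + local properties: start from the inherited keywords,
/// then let each locally defined keyword replace the inherited one.
pub fn shadow(inherited: &Constraints, local: &Constraints) -> Constraints {
    let mut merged = inherited.clone();
    for (keyword, value) in local {
        merged.insert(keyword.clone(), value.clone()); // local keyword wins
    }
    merged
}

/// Applies `shadow` property by property; inherited properties the child does not
/// mention are kept unchanged.
pub fn shadow_properties(
    inherited: &BTreeMap<String, Constraints>,
    local: &BTreeMap<String, Constraints>,
) -> BTreeMap<String, Constraints> {
    let mut merged = inherited.clone();
    for (name, constraints) in local {
        let base = inherited.get(name).cloned().unwrap_or_default();
        merged.insert(name.clone(), shadow(&base, constraints));
    }
    merged
}
```

An `allOf` branch would instead keep both constraint sets and require both to pass, which is why it cannot replace an inherited `const`.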
### 4. Format Leniency for Empty Strings

To simplify frontend form logic, the format validators for `uuid`, `date-time`, and `email` explicitly allow empty strings (`""`). This treats an empty string as "present but unset" rather than "invalid format".

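As an illustration, a lenient `uuid` check might look like the sketch below. `is_valid_uuid_format` is a hypothetical function that accepts `""` and otherwise requires the canonical 8-4-4-4-12 hexadecimal form; `date-time` and `email` would apply the same empty-string exception to their own rules.

```rust
/// Lenient `uuid` format check: the empty string is accepted as "present but unset";
/// otherwise the canonical 8-4-4-4-12 hexadecimal layout is required.
/// A sketch of the rule described above, not JSPG's actual validator.
pub fn is_valid_uuid_format(s: &str) -> bool {
    if s.is_empty() {
        return true; // leniency for empty form fields
    }
    let groups: Vec<&str> = s.split('-').collect();
    let expected_lengths = [8, 4, 4, 4, 12];
    groups.len() == expected_lengths.len()
        && groups
            .iter()
            .zip(expected_lengths.iter())
            .all(|(group, &len)| group.len() == len && group.chars().all(|c| c.is_ascii_hexdigit()))
}
```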
## 🏗️ Architecture

The extension is written in Rust using `pgrx`, and its schema parser mirrors the Punc Generator's design:

* **Single `Schema` Struct**: A unified struct representing the exact layout of a JSON Schema object, including standard keywords and custom vocabularies (`form`, `display`, etc.).
* **Compiler Phase**: Schema JSON documents are parsed into this struct, linked (references resolved), and then compiled into an efficient validation tree.
* **Validation Phase**: The compiled validators traverse the JSON instance using `serde_json::Value`.

### Concurrency & Threading ("Atomic Swap")

To support high-throughput validation while allowing runtime schema updates (e.g., during development or hot-reloading), JSPG uses an **Atomic Swap** pattern.

1. **Immutable Validator**: The `Validator` struct immutably owns the `Registry`. Once created, a validator instance (and its registry) never changes.
2. **Global Pointer**: A global `RwLock<Option<Arc<Validator>>>` holds the current active validator.
3. **Minimal-Lock Reads**: Validation requests acquire a read lock just long enough to clone the `Arc` (incrementing a reference count), then release the lock immediately. Validation proceeds on that snapshot, so a schema update never blocks an in-flight validation.
4. **Atomic Updates**: When schemas are reloaded (`cache_json_schemas`), a new `Registry` and `Validator` are built locally, without holding the lock. The global pointer is then atomically swapped to the new instance under a write lock.

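The four steps can be sketched with the standard library alone. `Registry`, `Validator::has_schema`, `snapshot`, `install`, and `clear` are hypothetical stand-ins (this `Registry` maps `$id`s to placeholder strings rather than compiled validators); only the `RwLock<Option<Arc<Validator>>>` shape comes from the description above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the compiled schema registry.
pub struct Registry {
    pub schemas: HashMap<String, String>,
}

/// Immutable once built: it owns its registry and exposes only `&self` methods.
pub struct Validator {
    registry: Registry,
}

impl Validator {
    pub fn new(registry: Registry) -> Self {
        Validator { registry }
    }

    pub fn has_schema(&self, id: &str) -> bool {
        self.registry.schemas.contains_key(id)
    }
}

/// The global slot holding the active validator (`RwLock::new` is `const` since Rust 1.63).
static CURRENT: RwLock<Option<Arc<Validator>>> = RwLock::new(None);

/// Reader path: the read guard is a temporary, so the lock is held only for the `Arc` clone.
pub fn snapshot() -> Option<Arc<Validator>> {
    CURRENT.read().unwrap().clone()
}

/// Writer path: build the validator first, then swap it in under the write lock.
pub fn install(registry: Registry) {
    let fresh = Arc::new(Validator::new(registry));
    let previous = std::mem::replace(&mut *CURRENT.write().unwrap(), Some(fresh));
    // The write guard is already released here; readers still holding the old
    // snapshot keep it alive until they drop their `Arc`.
    drop(previous);
}

/// Clearing the cache is a swap to `None`.
pub fn clear() {
    *CURRENT.write().unwrap() = None;
}
```

Because PostgreSQL serves each connection from its own backend process, a process-global slot like `CURRENT` is effectively per-connection, which matches the Connection-Bound Caching goal.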
## 🧪 Testing

Testing is driven by standard Rust unit tests that load JSON fixtures.

* **Isolation**: Each test file runs with its own isolated `Registry` and `Validator` instance, created on the stack. This eliminates global state interference and allows tests to run in parallel.
* **Fixtures**: The fixtures live in `tests/fixtures/*.json`; the tests that load them run via `cargo test`.