JSPG: JSON Schema Postgres

JSPG is a high-performance PostgreSQL extension for in-memory JSON Schema validation, specifically targeting Draft 2020-12.

It is designed to serve as the validation engine for the "Punc" architecture, where the database is the single source of truth for all data models and API contracts.

🎯 Goals

  1. Draft 2020-12 Compliance: Adhere as closely as possible to the official JSON Schema Draft 2020-12 specification, except for the deliberate extensions described under Custom Features & Deviations.
  2. Ultra-Fast Validation: Compile schemas into an optimized in-memory representation for near-instant validation during high-throughput workloads.
  3. Connection-Bound Caching: Leverage the PostgreSQL session lifecycle to maintain a per-connection schema cache, eliminating the need for repetitive parsing.
  4. Structural Inheritance: Support object-oriented schema design via Implicit Keyword Shadowing and virtual $family references.
  5. Punc Integration: Validation is aware of the Punc request/response context and can validate cue objects efficiently.

🔌 API Reference

The extension exposes the following functions to PostgreSQL:

cache_json_schemas(enums jsonb, types jsonb, puncs jsonb) -> jsonb

Loads and compiles the entire schema registry into the session's memory, atomically replacing the previous validator.

  • Inputs:
    • enums: Array of enum definitions.
    • types: Array of type definitions (core entities).
    • puncs: Array of punc (function) definitions with request/response schemas.
  • Behavior:
    • Parses all inputs into an internal schema graph.
    • Resolves all internal references ($ref).
    • Generates virtual union schemas for type hierarchies referenced via $family.
    • Compiles schemas into validators.
  • Returns: {"response": "success"} or an error object.

mask_json_schema(schema_id text, instance jsonb) -> jsonb

Validates a JSON instance and returns a new JSON object with unknown properties removed (pruned) based on the schema.

  • Inputs:
    • schema_id: The $id of the schema to mask against.
    • instance: The JSON data to mask.
  • Returns:
    • On success: A Drop containing the masked data.
    • On failure: A Drop containing validation errors.

validate_json_schema(schema_id text, instance jsonb) -> jsonb

Validates a JSON instance against a pre-compiled schema.

  • Inputs:
    • schema_id: The $id of the schema to validate against (e.g., person, save_person.request).
    • instance: The JSON data to validate.
  • Returns:
    • On success: {"response": "success"}
    • On failure: A JSON object containing structured errors (e.g., {"errors": [...]}).

json_schema_cached(schema_id text) -> bool

Checks if a specific schema ID is currently present in the cache.

clear_json_schemas() -> jsonb

Clears the current session's schema cache, freeing memory.

show_json_schemas() -> jsonb

Returns a debug dump of the currently cached schemas (for development/debugging).

Custom Features & Deviations

JSPG implements specific extensions to the Draft 2020-12 standard to support the Punc architecture's object-oriented needs.

1. The Unified Semantic Graph & Native Inheritance

JSPG goes beyond Draft 2020-12 by natively understanding object-oriented inheritance and polymorphism. During the cache_json_schemas() phase, JSPG builds a single Directed Acyclic Graph (DAG) from the $ref keyword alone: every schema that uses $ref becomes a child of the schema it references.

JSPG also distinguishes schemas that map directly to database tables (entities) from ad-hoc API shapes.

  • Native type Discrimination: For any schema whose ancestry traces back to the base entity, JSPG implicitly manages the type property, so entity subclasses do not need an explicit "type": {"const": "person"} override. If a schema $refs organization, JSPG accepts any incoming type from the organization family tree (e.g., person, bot), but still truncates (masks) the data to the requested organization shape.
  • Ad-Hoc Objects: If an ad-hoc schema $refs a base object but does not trace back to entity, standard JSON Schema rules apply (no magical type tracking).

Note

$ref never creates a Union. When you use $ref, you are asking for a single, concrete struct/shape. The schema's strict fields will be rigidly enforced, but the type property is permitted to match any valid descendant via the native discrimination.
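A minimal sketch of the ancestry check behind native type discrimination, assuming each entity type has exactly one $ref parent and that the root is named `entity`. `EntityTree`, `descends_from`, and `type_allowed` are hypothetical names for illustration, not JSPG's internal API.

```rust
use std::collections::HashMap;

/// Hypothetical entity hierarchy: each type name maps to the type it $refs.
pub struct EntityTree {
    parent: HashMap<String, String>,
}

impl EntityTree {
    /// Build the tree from (child, parent) pairs, e.g. ("person", "organization").
    pub fn new(edges: &[(&str, &str)]) -> Self {
        let parent = edges
            .iter()
            .map(|(c, p)| (c.to_string(), p.to_string()))
            .collect();
        EntityTree { parent }
    }

    /// True if `ty` equals `ancestor` or reaches it by following parent links upward.
    fn descends_from(&self, ty: &str, ancestor: &str) -> bool {
        let mut current = ty;
        // The hop limit guards against malformed (cyclic) input; a true DAG never hits it.
        for _ in 0..=self.parent.len() {
            if current == ancestor {
                return true;
            }
            match self.parent.get(current) {
                Some(next) => current = next.as_str(),
                None => return false,
            }
        }
        false
    }

    /// `None`: `requested` is not an entity, so ordinary JSON Schema rules apply.
    /// `Some(ok)`: whether an instance with `"type": incoming` may satisfy a $ref to `requested`.
    pub fn type_allowed(&self, requested: &str, incoming: &str) -> Option<bool> {
        if !self.descends_from(requested, "entity") {
            return None;
        }
        Some(self.descends_from(incoming, requested))
    }
}
```

Walking upward from the incoming type means the check costs at most the depth of the hierarchy, and the shape being enforced is still the requested one; only the accepted `type` values widen.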

2. Shape Polymorphism & Virtual Unions ($family)

To support polymorphic API contracts and deeply nested UI Unions without manually writing massive oneOf blocks, JSPG provides the $family macro. While $ref guarantees a single shape, $family asks the code generators for a true Polymorphic Union class.

When {"$family": "organization.light"} is encountered, JSPG:

  1. Locates the base organization node in the Semantic Graph.
  2. Recursively walks down to find all descendants via $ref.
  3. Strictly Filters the descendants using the exact dot-notation suffix requested. It will only include descendants whose $id matches the shape modifier (e.g., person.light, user.light). If bot has no .light shape defined, it is securely omitted from the union.
  4. Generates a virtual oneOf array containing those precise $refs.

This cleanly separates Database Ancestry (managed entirely and implicitly by $ref for single shapes) from Shape Variations (managed explicitly by $family to build oneOf unions).
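The four $family expansion steps can be sketched as follows. The sketch assumes the graph is available as (child $id, $ref target) pairs and that the base name and shape modifier are separated by the first dot; `SchemaGraph`, `expand_family`, and `to_one_of` are hypothetical names, not JSPG's API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical view of the semantic graph: $ref target -> schemas that $ref it.
pub struct SchemaGraph {
    children: BTreeMap<String, Vec<String>>,
}

impl SchemaGraph {
    /// Build parent -> children edges from (child $id, $ref target) pairs.
    pub fn new(edges: &[(&str, &str)]) -> Self {
        let mut children: BTreeMap<String, Vec<String>> = BTreeMap::new();
        for (child, parent) in edges {
            children.entry(parent.to_string()).or_default().push(child.to_string());
        }
        SchemaGraph { children }
    }

    /// Step 2: every schema reachable below `base` by walking $ref edges downward.
    fn descendants(&self, base: &str) -> BTreeSet<String> {
        let mut seen = BTreeSet::new();
        let mut stack = vec![base.to_string()];
        while let Some(node) = stack.pop() {
            for kid in self.children.get(&node).into_iter().flatten() {
                if seen.insert(kid.clone()) {
                    stack.push(kid.clone());
                }
            }
        }
        seen
    }

    /// Steps 1-3: expand `{"$family": "organization.light"}` into the $ids of the union.
    pub fn expand_family(&self, family: &str) -> Vec<String> {
        // Step 1: split the base node from the shape modifier.
        let Some((base, shape)) = family.split_once('.') else {
            return Vec::new(); // this sketch only handles the dot-notation form
        };
        let suffix = format!(".{shape}");
        // Step 3: keep only descendants whose $id carries the requested shape.
        self.descendants(base)
            .into_iter()
            .filter(|id| id.ends_with(suffix.as_str()))
            .collect()
    }
}

/// Step 4: render the surviving $ids as a virtual oneOf array of $refs.
pub fn to_one_of(refs: &[String]) -> String {
    let items: Vec<String> = refs.iter().map(|r| format!("{{\"$ref\":\"{r}\"}}")).collect();
    format!("{{\"oneOf\":[{}]}}", items.join(","))
}
```

With person, bot, and user under organization, and only person.light and user.light defined, expanding "organization.light" yields refs to person.light and user.light; bot drops out because it has no .light shape.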

3. Strict by Default & Extensibility

JSPG enforces a "Secure by Default" philosophy. All schemas are treated as if unevaluatedProperties: false (and unevaluatedItems: false) is set, unless explicitly overridden.

  • Strictness: By default, any property or array item in the instance data that is not explicitly defined in the schema causes a validation error. This prevents clients from sending undeclared fields or extra array elements.
  • Extensibility (extensible: true): To allow arbitrary additional properties or extra array items, add "extensible": true to the schema. This disables the strictness check for that object or array only, which is useful for types designed to be completely open.
  • Structured Additional Properties (additionalProperties: {...}): Instead of an open free-for-all, you can define additionalProperties as a schema object (e.g., {"type": "string"}). Undeclared keys are then accepted only if their values match that schema, so the object stays constrained without being fully closed.
  • Ref Boundaries: Strictness is reset when crossing $ref boundaries. The referenced schema's strictness is determined by its own definition (strict by default unless extensible: true), ignoring the caller's state.
  • Inheritance: Strictness is inherited. A schema extending a strict parent will also be strict unless it declares itself extensible: true. Conversely, a schema extending a loose parent will also be loose unless it declares itself extensible: false.
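A simplified sketch of how the object-level strictness rules could combine. It assumes instance values are plain strings, an additionalProperties schema can be reduced to a predicate, and only the most-derived schema's additionalProperties applies (an assumption; the rules above do not say how it interacts with inheritance). Arrays and $ref boundaries are not modeled. `ObjectSchema`, `is_extensible`, and `rejected_keys` are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified object schema (not JSPG's real Schema struct).
pub struct ObjectSchema {
    /// Property names this schema declares.
    pub properties: BTreeSet<String>,
    /// Explicit `extensible` flag; `None` means "inherit from the $ref parent".
    pub extensible: Option<bool>,
    /// `additionalProperties` as a schema, reduced here to a predicate on the value.
    pub additional: Option<fn(&str) -> bool>,
}

/// Effective strictness for an inheritance chain ordered child-first:
/// the nearest explicit `extensible` wins, and with none set the schema is strict.
pub fn is_extensible(chain: &[&ObjectSchema]) -> bool {
    chain.iter().find_map(|s| s.extensible).unwrap_or(false)
}

/// Keys of `instance` that strict mode rejects for the most-derived schema `chain[0]`.
pub fn rejected_keys(chain: &[&ObjectSchema], instance: &BTreeMap<String, String>) -> Vec<String> {
    // `extensible: true` switches the unknown-key check off for this object.
    if is_extensible(chain) {
        return Vec::new();
    }
    // Assumption: only the most-derived schema's additionalProperties is consulted.
    let additional = chain.first().and_then(|s| s.additional);
    instance
        .iter()
        .filter(|(key, value)| {
            // Properties declared anywhere along the $ref chain count as evaluated.
            let declared = chain.iter().any(|s| s.properties.contains(key.as_str()));
            let allowed_extra = additional.map_or(false, |check| check(value.as_str()));
            !declared && !allowed_extra
        })
        .map(|(key, _)| key.clone())
        .collect()
}
```

Resolving the flag child-first mirrors the inheritance rule: a child of a strict parent stays strict unless it says extensible: true, and a child of a loose parent stays loose unless it says extensible: false.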

4. Format Leniency for Empty Strings

To simplify frontend form logic, the format validators for uuid, date-time, and email explicitly allow empty strings (""). This treats an empty string as "present but unset" rather than "invalid format".
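The empty-string exemption amounts to a short-circuit in front of the real format check. In this sketch, `format_ok` and `is_uuid` are hypothetical helpers, and `is_uuid` checks only the 8-4-4-4-12 hexadecimal layout, not version or variant bits.

```rust
/// Formats for which "" is treated as "present but unset".
const LENIENT_FORMATS: [&str; 3] = ["uuid", "date-time", "email"];

/// Layout-only UUID check: 36 chars, hyphens at 8/13/18/23, hex digits elsewhere.
pub fn is_uuid(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.len() == 36
        && bytes.iter().enumerate().all(|(i, &b)| match i {
            8 | 13 | 18 | 23 => b == b'-',
            _ => b.is_ascii_hexdigit(),
        })
}

/// Apply `check` unless the value is empty and the format is one of the lenient ones.
pub fn format_ok(format: &str, value: &str, check: impl Fn(&str) -> bool) -> bool {
    if value.is_empty() && LENIENT_FORMATS.contains(&format) {
        return true;
    }
    check(value)
}
```

Formats outside the lenient list still run their normal validator on "".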

5. Masking (Constructive Validation)

JSPG supports a "Constructive Validation" mode via mask_json_schema. This is designed for high-performance API responses where the schema dictates the exact shape of the returned data.

  • Mechanism: The validator traverses the instance against the schema.
  • Valid Fields: Kept in the output.
  • Unknown/Extra Fields: Silently removed (pruned) if extensible: false (default).
  • Invalid Fields: Still trigger standard validation errors.

This allows the database to return "raw" joined rows (e.g. SELECT * FROM person JOIN organization ...) and have JSPG automatically shape the result into the expected API response, removing any internal or unrelated columns not defined in the schema.
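A minimal sketch of constructive validation over a toy value model (strings, numbers, objects). `Value`, `Schema`, `mask`, and `mask_instance` are hypothetical, and errors are plain path strings rather than JSPG's actual error format.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Num(f64),
    Obj(BTreeMap<String, Value>),
}

/// Hypothetical schema subset: a leaf type or an object with declared properties.
pub enum Schema {
    Str,
    Num,
    Obj { properties: BTreeMap<String, Schema>, extensible: bool },
}

/// Return the pruned value; invalid fields push an error and are left out.
pub fn mask(schema: &Schema, value: &Value, path: &str, errors: &mut Vec<String>) -> Option<Value> {
    match (schema, value) {
        (Schema::Str, Value::Str(_)) | (Schema::Num, Value::Num(_)) => Some(value.clone()),
        (Schema::Obj { properties, extensible }, Value::Obj(fields)) => {
            let mut out = BTreeMap::new();
            for (key, field) in fields {
                match properties.get(key) {
                    // Declared field: validate and mask it recursively.
                    Some(sub) => {
                        if let Some(v) = mask(sub, field, &format!("{path}/{key}"), errors) {
                            out.insert(key.clone(), v);
                        }
                    }
                    // Unknown field: kept only if the object is extensible.
                    None if *extensible => {
                        out.insert(key.clone(), field.clone());
                    }
                    // Unknown field on a strict object: silently pruned.
                    None => {}
                }
            }
            Some(Value::Obj(out))
        }
        // Wrong type: a validation error, not a silent prune.
        _ => {
            errors.push(format!("{path}: type mismatch"));
            None
        }
    }
}

/// Masked data on success, or the collected validation errors.
pub fn mask_instance(schema: &Schema, value: &Value) -> Result<Value, Vec<String>> {
    let mut errors = Vec::new();
    match mask(schema, value, "", &mut errors) {
        Some(v) if errors.is_empty() => Ok(v),
        _ => Err(errors),
    }
}
```

For a joined row carrying an internal column such as password_hash, the output keeps only the fields the schema declares, while a declared field with the wrong type still fails.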

🏗️ Architecture

The extension is written in Rust using pgrx and structures its schema parser to mirror the Punc Generator's design:

  • Single Schema Struct: A unified struct representing the exact layout of a JSON Schema object, including standard keywords and custom vocabularies (form, display, etc.).
  • Compiler Phase: Schema JSON documents are parsed into this struct, linked (references resolved), and then compiled into an efficient validation tree.
  • Validation Phase: The compiled validators traverse the JSON instance using serde_json::Value.
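The link step can be pictured as replacing $ref strings with indices into a schema table and rejecting references that do not resolve. This is a hypothetical sketch: `Parsed`, `Linked`, and `link` are illustrative names, and only a single $ref per schema is modeled.

```rust
use std::collections::HashMap;

/// Hypothetical parsed schema: its $id plus an optional $ref target.
pub struct Parsed {
    pub id: String,
    pub reference: Option<String>,
}

/// Hypothetical linked schema: the $ref string replaced by an index into the table.
#[derive(Debug, PartialEq)]
pub struct Linked {
    pub id: String,
    pub parent: Option<usize>,
}

/// Resolve every $ref to a table index, failing on the first one that does not resolve.
pub fn link(parsed: Vec<Parsed>) -> Result<Vec<Linked>, String> {
    // Map each $id to its position so references become O(1) lookups.
    let index: HashMap<String, usize> = parsed
        .iter()
        .enumerate()
        .map(|(i, p)| (p.id.clone(), i))
        .collect();
    parsed
        .into_iter()
        .map(|p| -> Result<Linked, String> {
            let parent = match &p.reference {
                Some(target) => Some(
                    *index
                        .get(target)
                        .ok_or_else(|| format!("{}: unresolved $ref {target}", p.id))?,
                ),
                None => None,
            };
            Ok(Linked { id: p.id, parent })
        })
        .collect()
}
```

Resolving references once at load time means the validation phase can follow indices instead of looking up $id strings on every instance.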

Concurrency & Threading ("Atomic Swap")

To support high-throughput validation while allowing for runtime schema updates (e.g., during development or hot-reloading), JSPG uses an Atomic Swap pattern.

  1. Immutable Validator: The Validator struct immutably owns the Registry. Once created, a validator instance (and its registry) never changes.
  2. Global Pointer: A global RwLock<Option<Arc<Validator>>> holds the current active validator.
  3. Short-Lived Read Locks: Validation requests hold the read lock only long enough to clone the Arc (incrementing a reference count), then release it immediately. Validation runs on that snapshot without holding any lock, so a schema update never interrupts a validation in progress, and a writer waits only for these brief clones.
  4. Atomic Updates: When schemas are reloaded (cache_json_schemas), a new Registry and Validator are built first, without holding the lock. The global pointer is then atomically swapped to the new instance under a write lock. Validations already running keep their old snapshot, which is freed when the last Arc clone is dropped.
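A minimal sketch of the Atomic Swap pattern using only the standard library. `Registry` and `Validator` here are hypothetical stand-ins, and `install`, `snapshot`, and `clear` only loosely mirror cache_json_schemas, validation, and clear_json_schemas. Because PostgreSQL serves each connection from its own backend process, a process-wide static like this is effectively per-connection.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-ins: the registry maps schema $id -> schema source.
pub struct Registry {
    pub schemas: HashMap<String, String>,
}

/// Immutable once built: it owns its registry and is never mutated.
pub struct Validator {
    registry: Registry,
}

impl Validator {
    pub fn new(registry: Registry) -> Self {
        Validator { registry }
    }
    pub fn has_schema(&self, id: &str) -> bool {
        self.registry.schemas.contains_key(id)
    }
}

/// The single global pointer to the active validator.
static CURRENT: RwLock<Option<Arc<Validator>>> = RwLock::new(None);

/// Reader path: hold the read lock only long enough to clone the Arc.
pub fn snapshot() -> Option<Arc<Validator>> {
    CURRENT.read().unwrap().clone()
}

/// Writer path: build the new validator with no lock held, then swap it in.
pub fn install(registry: Registry) {
    let fresh = Arc::new(Validator::new(registry));
    // The write guard is a temporary, released at the end of this statement.
    let old = CURRENT.write().unwrap().replace(fresh);
    // Dropping `old` only decrements its count; readers still holding a clone keep it alive.
    drop(old);
}

/// Remove the active validator (analogous to clearing the cache).
pub fn clear() {
    let old = CURRENT.write().unwrap().take();
    drop(old);
}
```

A snapshot taken before `install` keeps answering from the old registry, which is what lets validation proceed without blocking on a reload.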

🧪 Testing

Testing is driven by standard Rust unit tests that load JSON fixtures.

  • Isolation: Each test file constructs its own Registry and Validator instance locally instead of going through the global pointer. This eliminates global state interference and allows tests to run in parallel.
  • Fixtures: Test cases live in tests/fixtures/*.json and are executed via cargo test.