JSPG: JSON Schema Postgres
JSPG is a high-performance PostgreSQL extension for in-memory JSON Schema validation, specifically targeting Draft 2020-12.
It is designed to serve as the validation engine for the "Punc" architecture, where the database is the single source of truth for all data models and API contracts.
🎯 Goals
- Draft 2020-12 Compliance: Attempt to adhere to the official JSON Schema Draft 2020-12 specification.
- Ultra-Fast Validation: Compile schemas into an optimized in-memory representation for near-instant validation during high-throughput workloads.
- Connection-Bound Caching: Leverage the PostgreSQL session lifecycle to maintain a per-connection schema cache, eliminating the need for repetitive parsing.
- Structural Inheritance: Support object-oriented schema design via Implicit Keyword Shadowing and virtual .family schemas.
- Punc Integration: Validation is aware of the "Punc" context (request/response) and can validate cue objects efficiently.
🔌 API Reference
The extension exposes the following functions to PostgreSQL:
cache_json_schemas(enums jsonb, types jsonb, puncs jsonb) -> jsonb
Loads and compiles the entire schema registry into the session's memory, atomically replacing the previous validator.
- Inputs:
  - enums: Array of enum definitions.
  - types: Array of type definitions (core entities).
  - puncs: Array of punc (function) definitions with request/response schemas.
- Behavior:
  - Parses all inputs into an internal schema graph.
  - Resolves all internal references ($ref).
  - Generates virtual .family schemas for type hierarchies.
  - Compiles schemas into validators.
- Returns: {"response": "success"} or an error object.
mask_json_schema(schema_id text, instance jsonb) -> jsonb
Validates a JSON instance and returns a new JSON object with unknown properties removed (pruned) based on the schema.
- Inputs:
  - schema_id: The $id of the schema to mask against.
  - instance: The JSON data to mask.
- Returns:
  - On success: A Drop containing the masked data.
  - On failure: A Drop containing validation errors.
validate_json_schema(schema_id text, instance jsonb) -> jsonb
Validates a JSON instance against a pre-compiled schema.
- Inputs:
  - schema_id: The $id of the schema to validate against (e.g., person, save_person.request).
  - instance: The JSON data to validate.
- Returns:
  - On success: {"response": "success"}
  - On failure: A JSON object containing structured errors (e.g., {"errors": [...]}).
json_schema_cached(schema_id text) -> bool
Checks if a specific schema ID is currently present in the cache.
clear_json_schemas() -> jsonb
Clears the current session's schema cache, freeing memory.
show_json_schemas() -> jsonb
Returns a debug dump of the currently cached schemas (for development/debugging).
✨ Custom Features & Deviations
JSPG implements specific extensions to the Draft 2020-12 standard to support the Punc architecture's object-oriented needs.
1. Implicit Keyword Shadowing
Standard JSON Schema composition (allOf) is additive (Intersection), meaning constraints can only be tightened, not replaced. However, JSPG treats $ref differently when it appears alongside other properties to support object-oriented inheritance.
- Inheritance ($ref + properties): When a schema uses $ref and defines its own properties, JSPG implements Smart Merge (or Shadowing). If a property is defined in the current schema, its constraints take precedence over the inherited constraints for that specific keyword.
  - Example: If Entity defines type: { const: "entity" } and Person (which refs Entity) defines type: { const: "person" }, validation passes for "person". The local const shadows the inherited const.
  - Granularity: Shadowing is per-keyword. If Entity defined type: { const: "entity", minLength: 5 }, Person would shadow const but still inherit minLength: 5.
- Composition (allOf): When using allOf, standard intersection rules apply. No shadowing occurs; all constraints from all branches must pass. This is used for mixins or interfaces.
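The per-keyword shadowing rule can be illustrated with a minimal sketch. It assumes a single property's keywords are held as a plain string map; the Keywords alias and the shadow_keywords and keywords helpers are hypothetical names for illustration, not JSPG's internal API.

```rust
use std::collections::BTreeMap;

/// Keyword name -> keyword value, kept as plain strings for illustration.
/// (Hypothetical representation; JSPG's real schema struct is richer.)
type Keywords = BTreeMap<String, String>;

/// Merge one property's keywords: start from the inherited set, then let
/// every locally defined keyword replace the inherited one of the same name.
/// Keywords that are not redefined locally are inherited unchanged.
fn shadow_keywords(inherited: &Keywords, local: &Keywords) -> Keywords {
    let mut merged = inherited.clone();
    for (keyword, value) in local {
        merged.insert(keyword.clone(), value.clone());
    }
    merged
}

/// Small helper to build a keyword map from string pairs.
fn keywords(pairs: &[(&str, &str)]) -> Keywords {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

With Entity's type keywords as const = "entity" and minLength = 5, and Person's as const = "person", the merged map carries Person's const and Entity's minLength, matching the Granularity example above.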
2. Virtual Family Schemas (.family)
To support polymorphic fields (e.g., a field that accepts any "User" type), JSPG generates virtual schemas representing type hierarchies.
- Mechanism: When caching types, if a type defines a hierarchy (e.g., ["entity", "organization", "person"]), JSPG generates a schema like organization.family which is a oneOf containing refs to all valid descendants.
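One way to derive family membership from hierarchies is sketched below. It rests on two assumptions that the text above does not confirm: that a type's hierarchy lists its ancestors from the root down to the type itself, and that a family includes the type it is named after. The build_families function is a hypothetical name.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Build "<name>.family" -> member type names from each type's hierarchy.
/// `hierarchies` maps a type name to its root-to-self path (an assumption).
/// Each member would become one $ref inside the family's oneOf.
fn build_families(
    hierarchies: &BTreeMap<String, Vec<String>>,
) -> BTreeMap<String, BTreeSet<String>> {
    let mut families: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (type_name, path) in hierarchies {
        // A type belongs to the family of every entry on its path.
        for ancestor in path {
            families
                .entry(format!("{ancestor}.family"))
                .or_default()
                .insert(type_name.clone());
        }
    }
    families
}
```

Under these assumptions, a person type with hierarchy ["entity", "organization", "person"] ends up as a member of entity.family, organization.family, and person.family.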
3. Strict by Default & Extensibility
JSPG enforces a "Secure by Default" philosophy. All schemas are treated as if unevaluatedProperties: false (and unevaluatedItems: false) is set, unless explicitly overridden.
- Strictness: By default, any property in the instance data that is not explicitly defined in the schema causes a validation error. This prevents clients from sending undeclared fields.
- Extensibility (extensible: true): To allow additional, undefined properties, you must add "extensible": true to the schema. This is useful for types that are designed to be open for extension.
- Ref Boundaries: Strictness is reset when crossing $ref boundaries. The referenced schema's strictness is determined by its own definition (strict by default unless extensible: true), ignoring the caller's state.
- Inheritance: Strictness is inherited. A schema extending a strict parent will also be strict unless it declares itself extensible: true. Conversely, a schema extending a loose parent will also be loose unless it declares itself extensible: false.
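The default and inheritance rules above reduce to a small decision function. This is a sketch only: is_strict is a hypothetical name, and modelling the extensible keyword as an Option<bool> is an assumption about representation.

```rust
/// Resolve whether a schema is strict.
/// `own_extensible` is the schema's own `extensible` keyword, if declared.
/// `parent_strict` is the resolved strictness of the schema it extends, if any.
fn is_strict(own_extensible: Option<bool>, parent_strict: Option<bool>) -> bool {
    match own_extensible {
        // An explicit declaration always wins over the parent.
        Some(extensible) => !extensible,
        // Otherwise inherit from the parent, falling back to strict by default.
        None => parent_strict.unwrap_or(true),
    }
}
```

For example, is_strict(None, None) yields true (strict by default), while is_strict(None, Some(false)) yields false because the loose parent is inherited.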
4. Format Leniency for Empty Strings
To simplify frontend form logic, the format validators for uuid, date-time, and email explicitly allow empty strings (""). This treats an empty string as "present but unset" rather than "invalid format".
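A minimal sketch of this leniency for the uuid format follows. The function names are hypothetical, and the strict check only tests the canonical 8-4-4-4-12 hexadecimal text shape; JSPG's actual format validators may differ.

```rust
/// Strict shape check for the canonical 8-4-4-4-12 hexadecimal UUID text form.
fn is_canonical_uuid(text: &str) -> bool {
    let bytes = text.as_bytes();
    bytes.len() == 36
        && bytes.iter().enumerate().all(|(i, b)| match i {
            // Hyphens sit at these fixed positions.
            8 | 13 | 18 | 23 => *b == b'-',
            // Every other position must be a hex digit.
            _ => b.is_ascii_hexdigit(),
        })
}

/// Lenient wrapper: an empty string is accepted as "present but unset".
fn is_valid_uuid_format(text: &str) -> bool {
    text.is_empty() || is_canonical_uuid(text)
}
```

The leniency lives in the wrapper rather than in the strict check, so the same pattern could wrap date-time and email checks as well.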
5. Masking (Constructive Validation)
JSPG supports a "Constructive Validation" mode via mask_json_schema. This is designed for high-performance API responses where the schema dictates the exact shape of the returned data.
- Mechanism: The validator traverses the instance against the schema.
  - Valid Fields: Kept in the output.
  - Unknown/Extra Fields: Silently removed (pruned) if extensible: false (default).
  - Invalid Fields: Still trigger standard validation errors.
This allows the database to return "raw" joined rows (e.g. SELECT * FROM person JOIN organization ...) and have JSPG automatically shape the result into the expected API response, removing any internal or unrelated columns not defined in the schema.
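The pruning step can be sketched for a flat object as follows. This is a simplification under stated assumptions: mask_fields is a hypothetical name, values are plain strings, and nested objects and the validation of kept fields are left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Keep only the fields the schema declares, unless the schema is extensible.
/// `declared` stands in for the schema's property names (illustrative only).
fn mask_fields(
    declared: &BTreeSet<&str>,
    extensible: bool,
    row: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    row.iter()
        // Extensible schemas keep everything; strict ones drop unknown keys.
        .filter(|(name, _)| extensible || declared.contains(name.as_str()))
        .map(|(name, value)| (name.clone(), value.clone()))
        .collect()
}
```

Given a joined row with id, name, and an internal password_hash column, masking against a strict schema that declares only id and name drops password_hash and keeps the other two.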
🏗️ Architecture
The extension is written in Rust using pgrx and structures its schema parser to mirror the Punc Generator's design:
- Single Schema Struct: A unified struct representing the exact layout of a JSON Schema object, including standard keywords and custom vocabularies (form, display, etc.).
- Compiler Phase: Schema JSONs are parsed into this struct, linked (references resolved), and then compiled into an efficient validation tree.
- Validation Phase: The compiled validators traverse the JSON instance using serde_json::Value.
Concurrency & Threading ("Atomic Swap")
To support high-throughput validation while allowing for runtime schema updates (e.g., during development or hot-reloading), JSPG uses an Atomic Swap pattern.
- Immutable Validator: The Validator struct immutably owns the Registry. Once created, a validator instance (and its registry) never changes.
- Global Pointer: A global RwLock<Option<Arc<Validator>>> holds the current active validator.
- Lock-Free Reads: Validation requests acquire a read lock just long enough to clone the Arc (incrementing a reference count), then release the lock immediately. Validation itself proceeds on the snapshot without holding any lock, so a long-running validation never delays a schema update.
- Atomic Updates: When schemas are reloaded (cache_json_schemas), a new Registry and Validator are built entirely on the stack. The global pointer is then atomically swapped to the new instance under a write lock.
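The pattern described above can be sketched with the standard library alone. The Validator here is a stand-in with a single field, and ACTIVE, install, and snapshot are hypothetical names; only the RwLock<Option<Arc<Validator>>> shape comes from the description above. A const RwLock::new in a static requires Rust 1.63 or later.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for the compiled validator; immutable once built.
struct Validator {
    schema_ids: Vec<String>,
}

/// Process-wide slot holding the currently active validator, if any.
static ACTIVE: RwLock<Option<Arc<Validator>>> = RwLock::new(None);

/// Build the replacement first, then swap it in under a short write lock.
/// Readers holding an older Arc keep using their snapshot until they drop it.
fn install(schema_ids: Vec<String>) {
    let next = Arc::new(Validator { schema_ids });
    // unwrap() panics if the lock was poisoned by a panicking writer.
    *ACTIVE.write().unwrap() = Some(next);
}

/// Clone the Arc under a short read lock; the caller then validates
/// against the returned snapshot without holding the lock.
fn snapshot() -> Option<Arc<Validator>> {
    ACTIVE.read().unwrap().clone()
}
```

A caller that took a snapshot before a reload continues to see the old schema set, while the next snapshot call returns the new one. The old validator is freed once its last Arc is dropped.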
🧪 Testing
Testing is driven by standard Rust unit tests that load JSON fixtures.
- Isolation: Each test file runs with its own isolated Registry and Validator instance, created on the stack. This eliminates global state interference and allows tests to run in parallel.
- Fixtures: The fixtures are located in tests/fixtures/*.json, and the tests that load them are executed via cargo test.