Gemini Project Overview: `jspg`

This document outlines the purpose of the jspg project, its architecture, and the specific modifications made to the vendored boon JSON schema validator crate.

What is `jspg`?

jspg is a PostgreSQL extension written in Rust using the pgrx framework. Its primary function is to provide fast, in-database JSON schema validation against the 2020-12 draft of the JSON Schema specification.

How It Works

The extension is designed for high-performance scenarios where schemas are defined once and used many times for validation. It achieves this through an in-memory cache.

Caching: A user first calls the cache_json_schemas(enums, types, puncs) SQL function. This function takes arrays of JSON objects representing different kinds of schemas within a larger application framework. It uses the vendored boon crate to compile all these schemas into an efficient internal format and stores them in a static, in-memory SCHEMA_CACHE. This cache is managed by a RwLock to allow concurrent reads during validation.
Validation: The validate_json_schema(schema_id, instance) SQL function is then used to validate a JSONB instance against a specific, pre-cached schema identified by its $id. This function looks up the compiled schema in the cache and runs the validation, returning a success response or a detailed error report.
Custom Logic: jspg uses a locally modified (vendored) version of the boon crate. This allows for powerful, application-specific validation logic that goes beyond the standard JSON Schema specification, such as runtime-based strictness.
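
The cache-then-validate flow above can be pictured with a minimal, std-only sketch. The `CompiledSchema` type, the `cache_schemas` and `validate` helpers, and the required-property check are hypothetical stand-ins for illustration. The real extension stores schemas compiled by boon and exposes its SQL functions through pgrx, neither of which is shown here.

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

/// Stand-in for a schema compiled by the vendored validator (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct CompiledSchema {
    pub id: String,
    pub required: Vec<String>,
}

/// Process-wide cache, created lazily on first access.
static SCHEMA_CACHE: OnceLock<RwLock<HashMap<String, CompiledSchema>>> = OnceLock::new();

fn cache() -> &'static RwLock<HashMap<String, CompiledSchema>> {
    SCHEMA_CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Write path: insert schemas while holding the exclusive lock.
pub fn cache_schemas(schemas: Vec<CompiledSchema>) {
    let mut guard = cache().write().unwrap();
    for schema in schemas {
        guard.insert(schema.id.clone(), schema);
    }
}

/// Read path: look up a schema by id under the shared lock, then check the instance.
/// The "validation" here only checks required property names (an assumption for brevity).
pub fn validate(schema_id: &str, instance_keys: &[&str]) -> Result<(), String> {
    let guard = cache().read().unwrap();
    let schema = guard
        .get(schema_id)
        .ok_or_else(|| format!("schema '{schema_id}' not found in cache"))?;
    for name in &schema.required {
        if !instance_keys.contains(&name.as_str()) {
            return Err(format!("missing required property '{name}'"));
        }
    }
    Ok(())
}
```

Because validation only takes the read lock, concurrent validations do not block each other; only a caching call needs exclusive access.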

Error Handling

When validation fails, jspg provides a detailed error report in a consistent JSON format, which we refer to as a "DropError". This process involves two main helper functions in src/lib.rs:

collect_errors: boon returns a nested tree of ValidationError objects. This function recursively traverses that tree to find the most specific, underlying causes of the failure. It filters out structural errors (like allOf or anyOf) to create a flat list of concrete validation failures.
format_errors: This function takes the flat list of errors and transforms each one into the final DropError JSON format. It also de-duplicates errors that occur at the same JSON Pointer path, ensuring a cleaner output if a single value violates multiple constraints.
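
The two-step flattening and de-duplication described above might look roughly like the following sketch. The `ValidationError` shape, the reduced two-field `DropError`, the list of structural keywords, the `error_code` mapping, and the "first error per path wins" rule are all assumptions made for illustration; the actual types in boon and src/lib.rs carry more information.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the validator's nested error tree (hypothetical shape).
#[derive(Debug, Clone)]
pub struct ValidationError {
    pub keyword: String,       // e.g. "allOf", "minLength"
    pub instance_path: String, // JSON Pointer into the instance
    pub causes: Vec<ValidationError>,
}

/// Flat, user-facing error, reduced to two fields for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct DropError {
    pub code: String,
    pub path: String,
}

/// Keywords treated as purely structural (assumed list).
const STRUCTURAL: [&str; 3] = ["allOf", "anyOf", "oneOf"];

/// Depth-first walk that keeps only concrete leaf errors.
pub fn collect_errors<'a>(err: &'a ValidationError, out: &mut Vec<&'a ValidationError>) {
    if !err.causes.is_empty() {
        for cause in &err.causes {
            collect_errors(cause, out);
        }
    } else if !STRUCTURAL.contains(&err.keyword.as_str()) {
        out.push(err);
    }
}

/// Maps a schema keyword to a machine-readable code (fallback code is hypothetical).
fn error_code(keyword: &str) -> String {
    match keyword {
        "additionalProperties" => "ADDITIONAL_PROPERTIES_NOT_ALLOWED",
        "minLength" => "MIN_LENGTH_VIOLATED",
        _ => "VALIDATION_FAILED",
    }
    .to_string()
}

/// Converts leaf errors to DropErrors, keeping the first error seen at each path.
pub fn format_errors(errors: &[&ValidationError]) -> Vec<DropError> {
    let mut seen = HashSet::new();
    errors
        .iter()
        .filter(|e| seen.insert(e.instance_path.clone()))
        .map(|e| DropError {
            code: error_code(&e.keyword),
            path: e.instance_path.clone(),
        })
        .collect()
}
```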

DropError Format

A DropError object provides a clear, structured explanation of a validation failure:

```
{
  "code": "ADDITIONAL_PROPERTIES_NOT_ALLOWED",
  "message": "Property 'extra' is not allowed",
  "details": {
    "path": "/extra",
    "context": "not allowed",
    "cause": {
      "got": [
        "extra"
      ]
    },
    "schema": "basic_strict_test.request"
  }
}
```

code (string): A machine-readable error code (e.g., ADDITIONAL_PROPERTIES_NOT_ALLOWED, MIN_LENGTH_VIOLATED).
message (string): A human-readable summary of the error.
details (object):
- path (string): The JSON Pointer path to the invalid data within the instance.
- context (any): The actual value that failed validation.
- cause (any): The low-level reason from the validator, often including the expected value (want) and the actual value (got).
- schema (string): The $id of the schema that was being validated.

`boon` Crate Modifications

The version of boon located in the validator/ directory has been significantly modified to support application-specific validation logic that goes beyond the standard JSON Schema specification.

1. Property-Level Overrides for Inheritance

Problem: A primary use case for this project is validating data models that use $ref to create inheritance chains (e.g., a person schema $refs a user schema, which $refs an entity schema). A common pattern is to use a const keyword on a type property to identify the specific model (e.g., "type": {"const": "person"}). However, in standard JSON Schema a $ref is evaluated together with its sibling keywords, and the instance must satisfy all of them, just as with allOf. This creates an impossible condition where an instance's type property would need to be "person" AND "user" AND "entity" simultaneously.
Solution: We've implemented a custom, explicit override mechanism. A new keyword, "override": true, can be added to any property definition within a schema.
```
// person.json
{
  "$id": "person",
  "$ref": "user",
  "properties": {
    "type": { "const": "person", "override": true }
  }
}
```
This signals to the validator that this definition of the type property should be the only one applied, and any definitions for type found in base schemas (like user or entity) should be ignored for the duration of this validation.

Key Changes

This was achieved by making the validator stateful, using a pattern already present in boon for handling unevaluatedProperties.

Meta-Schema Update: The meta-schema for Draft 2020-12 was modified to recognize "override": true as a valid keyword within a schema object, preventing the compiler from rejecting our custom schemas.
Compiler Modification: The schema compiler in validator/src/compiler.rs was updated. It now inspects sub-schemas within a properties keyword and, if it finds "override": true, it records the name of that property in a new override_properties HashSet on the compiled Schema struct.
Stateful Validator with Override Context: The core Validator in validator/src/validator.rs was modified to carry an Override context (a HashSet of property names) throughout the validation process.
- Initialization: When validation begins, the Override context is created and populated with the names of any properties that the top-level schema has marked with override.
- Propagation: As the validator descends through a $ref or allOf, this Override context is cloned and passed down. The child schema adds its own override properties to the set, ensuring that higher-level overrides are always maintained.
- Enforcement: In obj_validate, before a property is validated, the validator first checks if the property's name exists in the Override context it has received. If it does, it means a parent schema has already claimed responsibility for validating this property, so the child validator skips it entirely. This effectively achieves the "top-level wins" inheritance model.

This approach cleanly integrates our desired inheritance behavior directly into the validator with minimal and explicit deviation from the standard, avoiding the need for a complex, post-processing validation function like the old walk_and_validate_refs.
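
The "top-level wins" behavior can be illustrated with a toy model of the initialization, propagation, and enforcement steps. This is a sketch under simplifying assumptions, not the code in validator/src/validator.rs: the `Schema` struct here is hypothetical, each property is constrained only by a const string, and `$ref` is modeled as a single optional `base` schema.

```rust
use std::collections::{HashMap, HashSet};

/// Toy schema: every property is constrained by a `const` string (hypothetical model).
pub struct Schema {
    pub base: Option<Box<Schema>>,            // target of `$ref`
    pub properties: HashMap<String, String>,  // property name -> required const value
    pub override_properties: HashSet<String>, // names marked `"override": true`
}

/// Validates `instance` against `schema` and its `$ref` chain.
/// `inherited` holds the properties already claimed by schemas higher in the chain.
pub fn validate(
    schema: &Schema,
    instance: &HashMap<String, String>,
    inherited: &HashSet<String>,
) -> Vec<String> {
    let mut errors = Vec::new();
    for (name, expected) in &schema.properties {
        // Enforcement: skip properties that a parent schema has overridden.
        if inherited.contains(name) {
            continue;
        }
        match instance.get(name) {
            Some(actual) if actual != expected => {
                errors.push(format!("{name}: expected '{expected}', got '{actual}'"));
            }
            _ => {}
        }
    }
    // Propagation: clone the context, add this schema's overrides, then descend.
    if let Some(base) = &schema.base {
        let mut ctx = inherited.clone();
        ctx.extend(schema.override_properties.iter().cloned());
        errors.extend(validate(base, instance, &ctx));
    }
    errors
}
```

Called with an empty `inherited` set on a person → user → entity chain, an instance with `"type": "person"` passes when each level marks `type` as an override, and fails against both base schemas when no level does.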

2. Recursive Runtime Strictness Control

Problem: The jspg project requires that certain schemas (specifically those for public puncs and global types) enforce a strict "no extra properties" policy. This strictness needs to be decided at runtime and must cascade through the entire validation hierarchy, including all nested objects and $ref chains. A compile-time flag was unsuitable because it would incorrectly apply strictness to shared, reusable schemas.
Solution: A runtime validation option was implemented to enforce strictness recursively. This required several coordinated changes to the boon validator.

Key Changes

ValidationOptions Struct: A new ValidationOptions { be_strict: bool } struct was added to validator/src/lib.rs. The jspg code in src/lib.rs determines if a validation run should be strict and passes this struct to the validator.
Strictness Check in uneval_validate: The original boon only checked for unevaluated properties if the unevaluatedProperties keyword was present in the schema. We added an else if be_strict block to uneval_validate in validator/src/validator.rs. This block triggers a check for any leftover unevaluated properties at the end of a validation pass and reports them as errors, effectively enforcing our runtime strictness rule.
Correct Context Propagation: The most complex part of the fix was ensuring the set of unevaluated properties was correctly maintained across different validation contexts (especially $ref and nested property validations). Three critical changes were made:
- Inheriting Context in _validate_self: When validating keywords that apply to the same instance (like $ref or allOf), the sub-validator must know what properties the parent has already evaluated. We changed the creation of the Validator inside _validate_self to pass a clone of the parent's uneval state (uneval: self.uneval.clone()) instead of creating a new one from scratch. This allows the context to flow downwards.
- Isolating Context in validate_val: Conversely, when validating a property's value, that value is a different part of the JSON instance. The sub-validation should not affect the parent's list of unevaluated properties. We fixed this by commenting out the self.uneval.merge(...) call in the validate_val function.
- Simplifying Uneval::merge: The original logic for merging uneval state was different for $ref keywords. This was incorrect. We simplified the merge function to always perform an intersection (retain), which correctly combines the knowledge of evaluated properties from different schema parts that apply to the same instance.
Removing Incompatible Assertions: The changes to context propagation broke several debug_assert! macros in the arr_validate function, which were part of boon's original design. Since our new validation flow is different but correct, these assertions were removed.
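
The interaction between the inherited uneval state, the intersection-based merge, and the be_strict check can be sketched in isolation. The code below is an illustrative model rather than the vendored implementation: `Uneval` is reduced to a set of property names, `validate_object` is a hypothetical helper, and each sub-schema is represented only by the list of property names it declares.

```rust
use std::collections::HashSet;

/// Runtime option mirroring the `ValidationOptions { be_strict }` struct described above.
#[derive(Debug, Clone, Copy, Default)]
pub struct ValidationOptions {
    pub be_strict: bool,
}

/// Names of instance properties that no schema keyword has evaluated yet.
#[derive(Debug, Clone, Default)]
pub struct Uneval {
    pub props: HashSet<String>,
}

impl Uneval {
    /// Intersection: a property stays unevaluated only if `other` also left it unevaluated.
    pub fn merge(&mut self, other: &Uneval) {
        self.props.retain(|p| other.props.contains(p));
    }
}

/// `schemas` lists the properties declared by each sub-schema applied to the
/// same instance (for example a schema and the target of its `$ref`).
pub fn validate_object(
    instance_keys: &[&str],
    schemas: &[&[&str]],
    options: ValidationOptions,
) -> Vec<String> {
    let mut uneval = Uneval {
        props: instance_keys.iter().map(|k| k.to_string()).collect(),
    };
    for declared in schemas {
        // Each sub-validator starts from a clone of the parent's state ...
        let mut child = uneval.clone();
        child.props.retain(|p| !declared.contains(&p.as_str()));
        // ... and its result is folded back in with an intersection.
        uneval.merge(&child);
    }
    let mut errors = Vec::new();
    if options.be_strict {
        // Strict mode: anything still unevaluated is reported as an error.
        let mut leftover: Vec<String> = uneval.props.into_iter().collect();
        leftover.sort();
        for name in leftover {
            errors.push(format!("Property '{name}' is not allowed"));
        }
    }
    errors
}
```

Keeping strictness in a runtime option rather than in the compiled schema means the same cached schema can be validated strictly in one call and leniently in another.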
