Gemini Project Overview: jspg
This document outlines the purpose of the jspg project, its architecture, and the specific modifications made to the vendored boon JSON schema validator crate.
What is jspg?
jspg is a PostgreSQL extension written in Rust using the pgrx framework. Its primary function is to provide fast, in-database JSON schema validation against the 2020-12 draft of the JSON Schema specification.
How It Works
The extension is designed for high-performance scenarios where schemas are defined once and used many times for validation. It achieves this through an in-memory cache.
- Caching: A user first calls the cache_json_schemas(enums, types, puncs) SQL function. This function takes arrays of JSON objects representing different kinds of schemas within a larger application framework. It uses the vendored boon crate to compile all these schemas into an efficient internal format and stores them in a static, in-memory SCHEMA_CACHE. This cache is guarded by an RwLock to allow concurrent reads during validation.
- Validation: The validate_json_schema(schema_id, instance) SQL function is then used to validate a JSONB instance against a specific, pre-cached schema identified by its $id. This function looks up the compiled schema in the cache and runs the validation, returning a success response or a detailed error report.
- Custom Logic: jspg uses a locally modified (vendored) version of the boon crate. This allows application-specific validation logic that goes beyond the standard JSON Schema specification, such as runtime-based strictness.
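The cache-then-validate flow described above can be sketched with the standard library alone. This is a minimal illustration, not the extension's actual code: CompiledSchema, schema_cache, cache_schemas, and validate are hypothetical names, the "compiled schema" is reduced to a list of required keys, and the real extension stores schemas compiled by boon and exposes the functions through pgrx.

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

/// Hypothetical stand-in for a compiled schema (assumption: the real type comes from boon).
#[derive(Debug)]
struct CompiledSchema {
    id: String,
    required: Vec<String>,
}

/// Process-wide cache keyed by schema `$id`, created lazily on first use.
fn schema_cache() -> &'static RwLock<HashMap<String, CompiledSchema>> {
    static CACHE: OnceLock<RwLock<HashMap<String, CompiledSchema>>> = OnceLock::new();
    CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Stores schemas under the write lock; this is the rare, "define once" path.
fn cache_schemas(schemas: Vec<CompiledSchema>) {
    let mut cache = schema_cache().write().unwrap();
    for schema in schemas {
        cache.insert(schema.id.clone(), schema);
    }
}

/// Looks up a schema under a read lock, so many validations can run concurrently.
fn validate(schema_id: &str, instance_keys: &[&str]) -> Result<(), String> {
    let cache = schema_cache().read().unwrap();
    let schema = cache
        .get(schema_id)
        .ok_or_else(|| format!("schema '{schema_id}' not cached"))?;
    for key in &schema.required {
        if !instance_keys.contains(&key.as_str()) {
            return Err(format!("missing required property '{key}'"));
        }
    }
    Ok(())
}

fn main() {
    cache_schemas(vec![CompiledSchema {
        id: "person".to_string(),
        required: vec!["name".to_string()],
    }]);
    println!("{:?}", validate("person", &["name", "type"]));
    println!("{:?}", validate("person", &["type"]));
    println!("{:?}", validate("unknown", &[]));
}
```

The read/write split is the point of the design: writers block only while schemas are being cached, and every validation call after that takes the cheaper shared read lock.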
Error Handling
When validation fails, jspg provides a detailed error report in a consistent JSON format, which we refer to as a "DropError". This process involves two main helper functions in src/lib.rs:
- collect_errors: boon returns a nested tree of ValidationError objects. This function recursively traverses that tree to find the most specific, underlying causes of the failure. It filters out structural errors (like allOf or anyOf) to create a flat list of concrete validation failures.
- format_errors: This function takes the flat list of errors and transforms each one into the final DropError JSON format. It also de-duplicates errors that occur at the same JSON Pointer path, ensuring a cleaner output if a single value violates multiple constraints.
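The two-step flattening can be sketched as follows. The types here are assumptions made for illustration: ErrorNode is a simplified stand-in for boon's ValidationError, FlatError is a reduced DropError, and the rule "keep non-structural leaves, keep the first error per path" is one plausible reading of the description above rather than the exact logic in src/lib.rs.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified error node (assumption: boon's ValidationError carries more detail).
#[derive(Debug, Clone)]
struct ErrorNode {
    keyword: String,       // schema keyword that failed, e.g. "allOf" or "minLength"
    instance_path: String, // JSON Pointer into the instance
    causes: Vec<ErrorNode>,
}

/// Hypothetical flat result, standing in for a DropError.
#[derive(Debug, PartialEq)]
struct FlatError {
    keyword: String,
    path: String,
}

/// Convenience constructor for a leaf node with no nested causes.
fn leaf(keyword: &str, path: &str) -> ErrorNode {
    ErrorNode { keyword: keyword.to_string(), instance_path: path.to_string(), causes: vec![] }
}

/// Keywords that only group other errors and carry no concrete failure themselves.
fn is_structural(keyword: &str) -> bool {
    matches!(keyword, "allOf" | "anyOf" | "oneOf" | "$ref")
}

/// Depth-first walk that keeps only the most specific (leaf) non-structural errors.
fn collect_errors(node: &ErrorNode, out: &mut Vec<ErrorNode>) {
    if node.causes.is_empty() {
        if !is_structural(&node.keyword) {
            out.push(node.clone());
        }
    } else {
        for cause in &node.causes {
            collect_errors(cause, out);
        }
    }
}

/// Converts to the flat format, keeping the first error seen for each instance path.
fn format_errors(errors: Vec<ErrorNode>) -> Vec<FlatError> {
    let mut seen = HashSet::new();
    errors
        .into_iter()
        .filter(|e| seen.insert(e.instance_path.clone()))
        .map(|e| FlatError { keyword: e.keyword, path: e.instance_path })
        .collect()
}

fn main() {
    let tree = ErrorNode {
        keyword: "allOf".to_string(),
        instance_path: String::new(),
        causes: vec![
            leaf("minLength", "/name"),
            leaf("pattern", "/name"),
            ErrorNode {
                keyword: "anyOf".to_string(),
                instance_path: "/age".to_string(),
                causes: vec![leaf("type", "/age")],
            },
        ],
    };
    let mut flat = Vec::new();
    collect_errors(&tree, &mut flat);
    println!("{:?}", format_errors(flat));
}
```

In this sketch the two errors at /name collapse into one entry and the anyOf wrapper at /age is replaced by its underlying type error.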
DropError Format
A DropError object provides a clear, structured explanation of a validation failure:
{
  "code": "ADDITIONAL_PROPERTIES_NOT_ALLOWED",
  "message": "Property 'extra' is not allowed",
  "details": {
    "path": "/extra",
    "context": "not allowed",
    "cause": {
      "got": [
        "extra"
      ]
    },
    "schema": "basic_strict_test.request"
  }
}
- code (string): A machine-readable error code (e.g., ADDITIONAL_PROPERTIES_NOT_ALLOWED, MIN_LENGTH_VIOLATED).
- message (string): A human-readable summary of the error.
- details (object):
  - path (string): The JSON Pointer path to the invalid data within the instance.
  - context (any): The actual value that failed validation.
  - cause (any): The low-level reason from the validator, often including the expected value (want) and the actual value (got).
  - schema (string): The $id of the schema that was being validated.
boon Crate Modifications
The version of boon located in the validator/ directory has been significantly modified to support application-specific validation logic that goes beyond the standard JSON Schema specification.
1. Property-Level Overrides for Inheritance
- Problem: A primary use case for this project is validating data models that use $ref to create inheritance chains (e.g., a person schema $refs a user schema, which $refs an entity schema). A common pattern is to use a const keyword on a type property to identify the specific model (e.g., "type": {"const": "person"}). However, standard JSON Schema applies $ref alongside any sibling keywords, just as allOf does, so every schema in the chain must hold at once (a logical AND). This creates an impossible condition where an instance's type property would need to be "person" AND "user" AND "entity" simultaneously.
- Solution: We've implemented a custom, explicit override mechanism. A new keyword, "override": true, can be added to any property definition within a schema.

      // person.json
      {
        "$id": "person",
        "$ref": "user",
        "properties": {
          "type": { "const": "person", "override": true }
        }
      }

  This signals to the validator that this definition of the type property should be the only one applied, and any definitions for type found in base schemas (like user or entity) should be ignored for the duration of this validation.
Key Changes
This was achieved by making the validator stateful, using a pattern already present in boon for handling unevaluatedProperties.
- Meta-Schema Update: The meta-schema for Draft 2020-12 was modified to recognize "override": true as a valid keyword within a schema object, preventing the compiler from rejecting our custom schemas.
- Compiler Modification: The schema compiler in validator/src/compiler.rs was updated. It now inspects sub-schemas within a properties keyword and, if it finds "override": true, it records the name of that property in a new override_properties HashSet on the compiled Schema struct.
- Stateful Validator with Override Context: The core Validator in validator/src/validator.rs was modified to carry an Override context (a HashSet of property names) throughout the validation process.
  - Initialization: When validation begins, the Override context is created and populated with the names of any properties that the top-level schema has marked with override.
  - Propagation: As the validator descends through a $ref or allOf, this Override context is cloned and passed down. The child schema adds its own override properties to the set, ensuring that higher-level overrides are always maintained.
  - Enforcement: In obj_validate, before a property is validated, the validator first checks if the property's name exists in the Override context it has received. If it does, it means a parent schema has already claimed responsibility for validating this property, so the child validator skips it entirely. This effectively achieves the "top-level wins" inheritance model.
This approach cleanly integrates our desired inheritance behavior directly into the validator with minimal and explicit deviation from the standard, avoiding the need for a complex, post-processing validation function like the old walk_and_validate_refs.
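The propagate-and-skip behavior can be sketched on a toy model. Everything here is an assumption made for illustration: Schema, PropRule, schema, and validate are hypothetical, each schema holds only const rules for its properties, and base stands in for $ref. The vendored validator works on compiled boon schemas and is considerably more involved.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical property rule: a `const` value plus the custom `override` flag.
struct PropRule {
    constant: String,
    is_override: bool,
}

/// Hypothetical toy schema; `base` stands in for a `$ref` to a parent schema.
struct Schema {
    base: Option<Box<Schema>>,
    properties: BTreeMap<String, PropRule>,
}

/// Builds a schema whose only property is `type` with the given const.
fn schema(type_const: &str, is_override: bool, base: Option<Schema>) -> Schema {
    let mut properties = BTreeMap::new();
    properties.insert(
        "type".to_string(),
        PropRule { constant: type_const.to_string(), is_override },
    );
    Schema { base: base.map(Box::new), properties }
}

/// Validates `instance`, skipping any property already claimed by a schema higher up.
fn validate(
    schema: &Schema,
    instance: &BTreeMap<String, String>,
    inherited: &HashSet<String>,
) -> Vec<String> {
    let mut errors = Vec::new();
    // Propagation: start from the inherited context and add this schema's overrides.
    let mut ctx = inherited.clone();
    for (name, rule) in &schema.properties {
        // Enforcement: a parent claimed this property, so this schema's rule is ignored.
        if inherited.contains(name) {
            continue;
        }
        if let Some(actual) = instance.get(name) {
            if *actual != rule.constant {
                errors.push(format!("/{name}: expected '{}', got '{actual}'", rule.constant));
            }
        }
        if rule.is_override {
            ctx.insert(name.clone());
        }
    }
    if let Some(base) = &schema.base {
        errors.extend(validate(base, instance, &ctx));
    }
    errors
}

fn main() {
    let chain = |ov: bool| {
        schema("person", ov, Some(schema("user", false, Some(schema("entity", false, None)))))
    };
    let mut instance = BTreeMap::new();
    instance.insert("type".to_string(), "person".to_string());

    println!("with override: {:?}", validate(&chain(true), &instance, &HashSet::new()));
    println!("without override: {:?}", validate(&chain(false), &instance, &HashSet::new()));
}
```

With the flag set on person, the user and entity rules for type are skipped and the instance passes; without it, both base schemas reject the value "person".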
2. Recursive Runtime Strictness Control
- Problem: The jspg project requires that certain schemas (specifically those for public puncs and global types) enforce a strict "no extra properties" policy. This strictness needs to be decided at runtime and must cascade through the entire validation hierarchy, including all nested objects and $ref chains. A compile-time flag was unsuitable because it would incorrectly apply strictness to shared, reusable schemas.
- Solution: A runtime validation option was implemented to enforce strictness recursively. This required several coordinated changes to the boon validator.
Key Changes
- ValidationOptions Struct: A new ValidationOptions { be_strict: bool } struct was added to validator/src/lib.rs. The jspg code in src/lib.rs determines whether a validation run should be strict and passes this struct to the validator.
- Strictness Check in uneval_validate: The original boon only checked for unevaluated properties if the unevaluatedProperties keyword was present in the schema. We added an else if be_strict block to uneval_validate in validator/src/validator.rs. This block triggers a check for any leftover unevaluated properties at the end of a validation pass and reports them as errors, effectively enforcing our runtime strictness rule.
- Correct Context Propagation: The most complex part of the fix was ensuring the set of unevaluated properties was correctly maintained across different validation contexts (especially $ref and nested property validations). Three critical changes were made:
  - Inheriting Context in _validate_self: When validating keywords that apply to the same instance (like $ref or allOf), the sub-validator must know what properties the parent has already evaluated. We changed the creation of the Validator inside _validate_self to pass a clone of the parent's uneval state (uneval: self.uneval.clone()) instead of creating a new one from scratch. This allows the context to flow downwards.
  - Isolating Context in validate_val: Conversely, when validating a property's value, that value is a different part of the JSON instance. The sub-validation should not affect the parent's list of unevaluated properties. We fixed this by commenting out the self.uneval.merge(...) call in the validate_val function.
  - Simplifying Uneval::merge: The original logic for merging uneval state was different for $ref keywords. This was incorrect for our use case. We simplified the merge function to always perform an intersection (retain), which correctly combines the knowledge of evaluated properties from different schema parts that apply to the same instance.
- Removing Incompatible Assertions: The changes to context propagation broke several debug_assert! macros in the arr_validate function, which were part of boon's original design. Because the new validation flow intentionally differs from the one those assertions guarded, they were removed.
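The interaction between inherited uneval state, merge-by-intersection, and the be_strict check can be sketched on a toy model. The names below mirror the description above, but the shapes are assumptions: Uneval is reduced to a set of property names, Schema is a hypothetical struct with a list of declared properties and an optional base standing in for $ref, and nested values (the validate_val isolation case) are not modeled.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in: the set of instance properties no schema has evaluated yet.
#[derive(Clone, Debug)]
struct Uneval {
    props: BTreeSet<String>,
}

impl Uneval {
    /// Every property of the instance starts out unevaluated.
    fn from_instance(keys: &[&str]) -> Self {
        Uneval { props: keys.iter().map(|k| k.to_string()).collect() }
    }

    /// Removes properties that a schema's `properties` keyword has covered.
    fn mark_evaluated(&mut self, declared: &[&str]) {
        self.props.retain(|p| !declared.contains(&p.as_str()));
    }

    /// Intersection: a property stays unevaluated only if both sides left it unevaluated.
    fn merge(&mut self, other: &Uneval) {
        self.props.retain(|p| other.props.contains(p));
    }
}

struct ValidationOptions {
    be_strict: bool,
}

/// Hypothetical toy schema; `base` stands in for `$ref`.
struct Schema {
    declared: Vec<&'static str>,
    base: Option<Box<Schema>>,
}

/// Applies a schema and its base to the same instance, sharing uneval state.
fn validate_self(schema: &Schema, uneval: &mut Uneval) {
    uneval.mark_evaluated(&schema.declared);
    if let Some(base) = &schema.base {
        // The sub-validator inherits a clone of the parent's state...
        let mut child = uneval.clone();
        validate_self(base, &mut child);
        // ...and its findings are folded back in by intersection.
        uneval.merge(&child);
    }
}

/// Runs validation and, in strict mode, reports whatever is still unevaluated.
fn validate(schema: &Schema, keys: &[&str], options: &ValidationOptions) -> Vec<String> {
    let mut uneval = Uneval::from_instance(keys);
    validate_self(schema, &mut uneval);
    if options.be_strict {
        uneval.props.iter().map(|p| format!("/{p}: additional property not allowed")).collect()
    } else {
        Vec::new()
    }
}

fn main() {
    let person = Schema {
        declared: vec!["name"],
        base: Some(Box::new(Schema { declared: vec!["id"], base: None })),
    };
    let keys = ["id", "name", "extra"];
    println!("{:?}", validate(&person, &keys, &ValidationOptions { be_strict: true }));
    println!("{:?}", validate(&person, &keys, &ValidationOptions { be_strict: false }));
}
```

Because the base schema receives the parent's state rather than a fresh one, id and name are each accounted for by a different schema in the chain, and only extra survives to be reported in strict mode.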