114 lines
9.7 KiB
Markdown
114 lines
9.7 KiB
Markdown
# Gemini Project Overview: `jspg`
|
|
|
|
This document outlines the purpose of the `jspg` project, its architecture, and the specific modifications made to the vendored `boon` JSON schema validator crate.
|
|
|
|
## What is `jspg`?
|
|
|
|
`jspg` is a PostgreSQL extension written in Rust using the `pgrx` framework. Its primary function is to provide fast, in-database JSON schema validation against the 2020-12 draft of the JSON Schema specification.
|
|
|
|
### How It Works
|
|
|
|
The extension is designed for high-performance scenarios where schemas are defined once and used many times for validation. It achieves this through an in-memory cache.
|
|
|
|
1. **Caching:** A user first calls the `cache_json_schemas(enums, types, puncs)` SQL function. This function takes arrays of JSON objects representing different kinds of schemas within a larger application framework. It uses the vendored `boon` crate to compile all these schemas into an efficient internal format and stores them in a static, in-memory `SCHEMA_CACHE`. This cache is managed by a `RwLock` to allow concurrent reads during validation.
|
|
|
|
2. **Validation:** The `validate_json_schema(schema_id, instance)` SQL function is then used to validate a JSONB `instance` against a specific, pre-cached schema identified by its `$id`. This function looks up the compiled schema in the cache and runs the validation, returning a success response or a detailed error report.
|
|
|
|
3. **Custom Logic:** `jspg` uses a locally modified (vendored) version of the `boon` crate. This allows for powerful, application-specific validation logic that goes beyond the standard JSON Schema specification, such as runtime-based strictness.
|
|
|
|
### Error Handling
|
|
|
|
When validation fails, `jspg` provides a detailed error report in a consistent JSON format, which we refer to as a "DropError". This process involves two main helper functions in `src/lib.rs`:
|
|
|
|
1. **`collect_errors`**: `boon` returns a nested tree of `ValidationError` objects. This function recursively traverses that tree to find the most specific, underlying causes of the failure. It filters out structural errors (like `allOf` or `anyOf`) to create a flat list of concrete validation failures.
|
|
|
|
2. **`format_errors`**: This function takes the flat list of errors and transforms each one into the final DropError JSON format. It also de-duplicates errors that occur at the same JSON Pointer path, ensuring a cleaner output if a single value violates multiple constraints.
|
|
|
|
#### DropError Format
|
|
|
|
A DropError object provides a clear, structured explanation of a validation failure:
|
|
|
|
```json
|
|
{
|
|
"code": "ADDITIONAL_PROPERTIES_NOT_ALLOWED",
|
|
"message": "Property 'extra' is not allowed",
|
|
"details": {
|
|
"path": "/extra",
|
|
"context": "not allowed",
|
|
"cause": {
|
|
"got": [
|
|
"extra"
|
|
]
|
|
},
|
|
"schema": "basic_strict_test.request"
|
|
}
|
|
}
|
|
```
|
|
|
|
- `code` (string): A machine-readable error code (e.g., `ADDITIONAL_PROPERTIES_NOT_ALLOWED`, `MIN_LENGTH_VIOLATED`).
|
|
- `message` (string): A human-readable summary of the error.
|
|
- `details` (object):
|
|
- `path` (string): The JSON Pointer path to the invalid data within the instance.
|
|
- `context` (any): The actual value that failed validation.
|
|
- `cause` (any): The low-level reason from the validator, often including the expected value (`want`) and the actual value (`got`).
|
|
- `schema` (string): The `$id` of the schema that was being validated.
|
|
|
|
---
|
|
|
|
## `boon` Crate Modifications
|
|
|
|
The version of `boon` located in the `validator/` directory has been significantly modified to support application-specific validation logic that goes beyond the standard JSON Schema specification.
|
|
|
|
### 1. Property-Level Overrides for Inheritance
|
|
|
|
- **Problem:** A primary use case for this project is validating data models that use `$ref` to create inheritance chains (e.g., a `person` schema `$ref`s a `user` schema, which `$ref`s an `entity` schema). A common pattern is to use a `const` keyword on a `type` property to identify the specific model (e.g., `"type": {"const": "person"}`). However, standard JSON Schema composition with `allOf` (which is implicitly used by `$ref`) treats these as a logical AND. This creates an impossible condition where an instance's `type` property would need to be "person" AND "user" AND "entity" simultaneously.
|
|
|
|
- **Solution:** We've implemented a custom, explicit override mechanism. A new keyword, `"override": true`, can be added to any property definition within a schema.
|
|
|
|
```json
|
|
// person.json
|
|
{
|
|
"$id": "person",
|
|
"$ref": "user",
|
|
"properties": {
|
|
"type": { "const": "person", "override": true }
|
|
}
|
|
}
|
|
```
|
|
|
|
This signals to the validator that this definition of the `type` property should be the *only* one applied, and any definitions for `type` found in base schemas (like `user` or `entity`) should be ignored for the duration of this validation.
|
|
|
|
#### Key Changes
|
|
|
|
This was achieved by making the validator stateful, using a pattern already present in `boon` for handling `unevaluatedProperties`.
|
|
|
|
1. **Meta-Schema Update**: The meta-schema for Draft 2020-12 was modified to recognize `"override": true` as a valid keyword within a schema object, preventing the compiler from rejecting our custom schemas.
|
|
|
|
2. **Compiler Modification**: The schema compiler in `validator/src/compiler.rs` was updated. It now inspects sub-schemas within a `properties` keyword and, if it finds `"override": true`, it records the name of that property in a new `override_properties` `HashSet` on the compiled `Schema` struct.
|
|
|
|
3. **Stateful Validator with `Override` Context**: The core `Validator` in `validator/src/validator.rs` was modified to carry an `Override` context (a `HashSet` of property names) throughout the validation process.
|
|
- **Initialization**: When validation begins, the `Override` context is created and populated with the names of any properties that the top-level schema has marked with `override`.
|
|
- **Propagation**: As the validator descends through a `$ref` or `allOf`, this `Override` context is cloned and passed down. The child schema adds its own override properties to the set, ensuring that higher-level overrides are always maintained.
|
|
- **Enforcement**: In `obj_validate`, before a property is validated, the validator first checks if the property's name exists in the `Override` context it has received. If it does, it means a parent schema has already claimed responsibility for validating this property, so the child validator **skips** it entirely. This effectively achieves the "top-level wins" inheritance model.
|
|
|
|
This approach cleanly integrates our desired inheritance behavior directly into the validator with minimal and explicit deviation from the standard, avoiding the need for a complex, post-processing validation function like the old `walk_and_validate_refs`.
|
|
|
|
### 2. Recursive Runtime Strictness Control
|
|
|
|
- **Problem:** The `jspg` project requires that certain schemas (specifically those for public `puncs` and global `type`s) enforce a strict "no extra properties" policy. This strictness needs to be decided at runtime and must cascade through the entire validation hierarchy, including all nested objects and `$ref` chains. A compile-time flag was unsuitable because it would incorrectly apply strictness to shared, reusable schemas.
|
|
|
|
- **Solution:** A runtime validation option was implemented to enforce strictness recursively. This required several coordinated changes to the `boon` validator.
|
|
|
|
#### Key Changes
|
|
|
|
1. **`ValidationOptions` Struct**: A new `ValidationOptions { be_strict: bool }` struct was added to `validator/src/lib.rs`. The `jspg` code in `src/lib.rs` determines if a validation run should be strict and passes this struct to the validator.
|
|
|
|
2. **Strictness Check in `uneval_validate`**: The original `boon` only checked for unevaluated properties if the `unevaluatedProperties` keyword was present in the schema. We added an `else if be_strict` block to `uneval_validate` in `validator/src/validator.rs`. This block triggers a check for any leftover unevaluated properties at the end of a validation pass and reports them as errors, effectively enforcing our runtime strictness rule.
|
|
|
|
3. **Correct Context Propagation**: The most complex part of the fix was ensuring the set of unevaluated properties was correctly maintained across different validation contexts (especially `$ref` and nested property validations). Three critical changes were made:
|
|
- **Inheriting Context in `_validate_self`**: When validating keywords that apply to the same instance (like `$ref` or `allOf`), the sub-validator must know what properties the parent has already evaluated. We changed the creation of the `Validator` inside `_validate_self` to pass a clone of the parent's `uneval` state (`uneval: self.uneval.clone()`) instead of creating a new one from scratch. This allows the context to flow downwards.
|
|
- **Isolating Context in `validate_val`**: Conversely, when validating a property's value, that value is a *different* part of the JSON instance. The sub-validation should not affect the parent's list of unevaluated properties. We fixed this by commenting out the `self.uneval.merge(...)` call in the `validate_val` function.
|
|
- **Simplifying `Uneval::merge`**: The original logic for merging `uneval` state was different for `$ref` keywords. This was incorrect. We simplified the `merge` function to *always* perform an intersection (`retain`), which correctly combines the knowledge of evaluated properties from different schema parts that apply to the same instance.
|
|
|
|
4. **Removing Incompatible Assertions**: The changes to context propagation broke several `debug_assert!` macros in the `arr_validate` function, which were part of `boon`'s original design. Since our new validation flow is different but correct, these assertions were removed.
|