From 491fb3a3e36015ef1c39e884a14b76f414e482d4 Mon Sep 17 00:00:00 2001 From: Alex Groleau Date: Tue, 30 Sep 2025 20:01:49 -0400 Subject: [PATCH] docs updated --- GEMINI.md | 76 +++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 65 insertions(+), 11 deletions(-) diff --git a/GEMINI.md b/GEMINI.md index c4f5cd4..7ae4358 100644 --- a/GEMINI.md +++ b/GEMINI.md @@ -1,25 +1,79 @@ # Gemini Project Overview: `jspg` -This document outlines the purpose of the `jspg` project and the specific modifications made to the vendored `boon` JSON schema validator crate. +This document outlines the purpose of the `jspg` project, its architecture, and the specific modifications made to the vendored `boon` JSON schema validator crate. ## What is `jspg`? `jspg` is a PostgreSQL extension written in Rust using the `pgrx` framework. Its primary function is to provide fast, in-database JSON schema validation against the 2020-12 draft of the JSON Schema specification. -It works by: -1. Exposing a SQL function, `cache_json_schemas(...)`, which takes arrays of schema objects, compiles them, and caches them in memory. -2. Exposing a SQL validation function, `validate_json_schema(schema_id, instance)`, which validates a JSONB instance against one of the pre-cached schemas. -3. Using a locally modified (vendored) version of the `boon` crate to perform the validation, allowing for custom enhancements to its core logic. +### How It Works + +The extension is designed for high-performance scenarios where schemas are defined once and used many times for validation. It achieves this through an in-memory cache. + +1. **Caching:** A user first calls the `cache_json_schemas(enums, types, puncs)` SQL function. This function takes arrays of JSON objects representing different kinds of schemas within a larger application framework. It uses the vendored `boon` crate to compile all these schemas into an efficient internal format and stores them in a static, in-memory `SCHEMA_CACHE`. This cache is managed by a `RwLock` to allow concurrent reads during validation. + +2. **Validation:** The `validate_json_schema(schema_id, instance)` SQL function is then used to validate a JSONB `instance` against a specific, pre-cached schema identified by its `$id`. This function looks up the compiled schema in the cache and runs the validation, returning a success response or a detailed error report. + +3. **Custom Logic:** `jspg` uses a locally modified (vendored) version of the `boon` crate. This allows for powerful, application-specific validation logic that goes beyond the standard JSON Schema specification, such as runtime-based strictness. + +### Error Handling + +When validation fails, `jspg` provides a detailed error report in a consistent JSON format, which we refer to as a "DropError". This process involves two main helper functions in `src/lib.rs`: + +1. **`collect_errors`**: `boon` returns a nested tree of `ValidationError` objects. This function recursively traverses that tree to find the most specific, underlying causes of the failure. It filters out structural errors (like `allOf` or `anyOf`) to create a flat list of concrete validation failures. + +2. **`format_errors`**: This function takes the flat list of errors and transforms each one into the final DropError JSON format. It also de-duplicates errors that occur at the same JSON Pointer path, ensuring a cleaner output if a single value violates multiple constraints. + +#### DropError Format + +A DropError object provides a clear, structured explanation of a validation failure: + +```json +{ + "code": "ADDITIONAL_PROPERTIES_NOT_ALLOWED", + "message": "Property 'extra' is not allowed", + "details": { + "path": "/extra", + "context": "not allowed", + "cause": { + "got": [ + "extra" + ] + }, + "schema": "basic_strict_test.request" + } +} +``` + +- `code` (string): A machine-readable error code (e.g., `ADDITIONAL_PROPERTIES_NOT_ALLOWED`, `MIN_LENGTH_VIOLATED`). +- `message` (string): A human-readable summary of the error. +- `details` (object): + - `path` (string): The JSON Pointer path to the invalid data within the instance. + - `context` (any): The actual value that failed validation. + - `cause` (any): The low-level reason from the validator, often including the expected value (`want`) and the actual value (`got`). + - `schema` (string): The `$id` of the schema that was being validated. + +--- ## `boon` Crate Modifications -The version of `boon` located in the `validator/` directory has been modified to address specific requirements of the `jspg` project. The key deviations from the upstream `boon` crate are as follows: +The version of `boon` located in the `validator/` directory has been significantly modified to support runtime-based strict validation. The original `boon` crate only supports compile-time strictness and lacks the necessary mechanisms to propagate validation context correctly for our use case. ### 1. Recursive Runtime Strictness Control -- **Problem:** The `jspg` project requires that certain schemas enforce a strict "no extra properties" policy (specifically, schemas for public `puncs` and global `type`s). This strictness needs to cascade through the entire validation hierarchy, including all nested objects and `$ref` chains. A compile-time flag was unsuitable because it would incorrectly apply strictness to shared, reusable schemas. +- **Problem:** The `jspg` project requires that certain schemas (specifically those for public `puncs` and global `type`s) enforce a strict "no extra properties" policy. This strictness needs to be decided at runtime and must cascade through the entire validation hierarchy, including all nested objects and `$ref` chains. A compile-time flag was unsuitable because it would incorrectly apply strictness to shared, reusable schemas. -- **Solution:** A runtime validation option was implemented to enforce strictness recursively. - 1. A `ValidationOptions { be_strict: bool }` struct was added. The `jspg` code in `src/lib.rs` determines whether a validation run should be strict (based on the `punc`'s `public` flag or if validating a global `type`) and passes the appropriate option to the validator. - 2. The `be_strict` option is propagated through the entire recursive validation process. A bug was fixed in `_validate_self` (which handles `$ref`s) to ensure that the sub-validator is always initialized to track unevaluated properties when `be_strict` is enabled. Previously, tracking was only initiated if the parent was already tracking unevaluated properties, causing strictness to be dropped across certain `$ref` boundaries. - 3. At any time, if `unevaluatedProperties` or `additionalProperties` is found in the schema, it should override the strict (or non-strict) validation at that level. \ No newline at end of file +- **Solution:** A runtime validation option was implemented to enforce strictness recursively. This required several coordinated changes to the `boon` validator. + +#### Key Changes + +1. **`ValidationOptions` Struct**: A new `ValidationOptions { be_strict: bool }` struct was added to `validator/src/lib.rs`. The `jspg` code in `src/lib.rs` determines if a validation run should be strict and passes this struct to the validator. + +2. **Strictness Check in `uneval_validate`**: The original `boon` only checked for unevaluated properties if the `unevaluatedProperties` keyword was present in the schema. We added an `else if be_strict` block to `uneval_validate` in `validator/src/validator.rs`. This block triggers a check for any leftover unevaluated properties at the end of a validation pass and reports them as errors, effectively enforcing our runtime strictness rule. + +3. **Correct Context Propagation**: The most complex part of the fix was ensuring the set of unevaluated properties was correctly maintained across different validation contexts (especially `$ref` and nested property validations). Three critical changes were made: + - **Inheriting Context in `_validate_self`**: When validating keywords that apply to the same instance (like `$ref` or `allOf`), the sub-validator must know what properties the parent has already evaluated. We changed the creation of the `Validator` inside `_validate_self` to pass a clone of the parent's `uneval` state (`uneval: self.uneval.clone()`) instead of creating a new one from scratch. This allows the context to flow downwards. + - **Isolating Context in `validate_val`**: Conversely, when validating a property's value, that value is a *different* part of the JSON instance. The sub-validation should not affect the parent's list of unevaluated properties. We fixed this by commenting out the `self.uneval.merge(...)` call in the `validate_val` function. + - **Simplifying `Uneval::merge`**: The original logic for merging `uneval` state was different for `$ref` keywords. This was incorrect. We simplified the `merge` function to *always* perform an intersection (`retain`), which correctly combines the knowledge of evaluated properties from different schema parts that apply to the same instance. + +4. **Removing Incompatible Assertions**: The changes to context propagation broke several `debug_assert!` macros in the `arr_validate` function, which were part of `boon`'s original design. Since our new validation flow is different but correct, these assertions were removed.