jspg/GEMINI.md
2025-09-30 20:01:49 -04:00

Gemini Project Overview: jspg

This document outlines the purpose of the jspg project, its architecture, and the specific modifications made to the vendored boon JSON schema validator crate.

What is jspg?

jspg is a PostgreSQL extension written in Rust using the pgrx framework. Its primary function is to provide fast, in-database JSON schema validation against the 2020-12 draft of the JSON Schema specification.

How It Works

The extension is designed for high-performance scenarios where schemas are defined once and used many times for validation. It achieves this through an in-memory cache.

  1. Caching: A user first calls the cache_json_schemas(enums, types, puncs) SQL function. This function takes arrays of JSON objects representing different kinds of schemas within a larger application framework. It uses the vendored boon crate to compile all these schemas into an efficient internal format and stores them in a static, in-memory SCHEMA_CACHE. The cache is guarded by an RwLock, so any number of validations can read it concurrently while writes stay exclusive.

  2. Validation: The validate_json_schema(schema_id, instance) SQL function is then used to validate a JSONB instance against a specific, pre-cached schema identified by its $id. This function looks up the compiled schema in the cache and runs the validation, returning a success response or a detailed error report.

  3. Custom Logic: jspg uses a locally modified (vendored) version of the boon crate. This allows application-specific validation logic that goes beyond the standard JSON Schema specification, such as runtime-based strictness.
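
The cache-then-validate flow above can be sketched with the standard library alone. This is a minimal illustration, not the code in src/lib.rs: CompiledSchema, cache_schemas, and lookup are hypothetical names, and the real extension stores schemas compiled by boon rather than this placeholder struct. Replacing the whole cache on each caching call is also an assumption of the sketch.

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

// Hypothetical stand-in for a schema compiled by boon.
#[derive(Clone, Debug, PartialEq)]
struct CompiledSchema {
    id: String,
    required: Vec<String>,
}

// Process-wide cache, created lazily on first use.
static SCHEMA_CACHE: OnceLock<RwLock<HashMap<String, CompiledSchema>>> = OnceLock::new();

fn cache() -> &'static RwLock<HashMap<String, CompiledSchema>> {
    SCHEMA_CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

// Caching step: takes the write lock once and replaces the cache contents
// (an assumption of this sketch).
fn cache_schemas(schemas: Vec<CompiledSchema>) {
    let mut guard = cache().write().unwrap();
    guard.clear();
    for schema in schemas {
        guard.insert(schema.id.clone(), schema);
    }
}

// Validation step: takes only a read lock, so lookups can run concurrently.
fn lookup(schema_id: &str) -> Option<CompiledSchema> {
    cache().read().unwrap().get(schema_id).cloned()
}
```

The write lock is held only while the cache is rebuilt, which fits the "define once, validate many times" usage described above.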

Error Handling

When validation fails, jspg provides a detailed error report in a consistent JSON format, which we refer to as a "DropError". This process involves two main helper functions in src/lib.rs:

  1. collect_errors: boon returns a nested tree of ValidationError objects. This function recursively traverses that tree to find the most specific, underlying causes of the failure. It filters out structural errors (like allOf or anyOf) to create a flat list of concrete validation failures.

  2. format_errors: This function takes the flat list of errors and transforms each one into the final DropError JSON format. It also de-duplicates errors that occur at the same JSON Pointer path, ensuring a cleaner output if a single value violates multiple constraints.
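
The two helpers can be illustrated with a reduced model. The sketch below assumes a simplified ValidationError (the real boon type carries more information), a hypothetical Kind enum, and a FlatError record that keeps only the code and path fields of a DropError. It shows the shape of the flatten, filter, and de-duplicate steps rather than the actual implementation.

```rust
use std::collections::HashSet;

// Hypothetical, reduced set of error kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    AllOf,
    AnyOf,
    MinLength,
    AdditionalProperties,
}

impl Kind {
    // Structural kinds only group other errors and carry no concrete failure.
    fn is_structural(self) -> bool {
        matches!(self, Kind::AllOf | Kind::AnyOf)
    }

    fn code(self) -> &'static str {
        match self {
            Kind::MinLength => "MIN_LENGTH_VIOLATED",
            Kind::AdditionalProperties => "ADDITIONAL_PROPERTIES_NOT_ALLOWED",
            // Placeholder: structural errors are filtered out before formatting.
            Kind::AllOf | Kind::AnyOf => "STRUCTURAL",
        }
    }
}

// Simplified stand-in for boon's nested ValidationError tree.
#[derive(Debug, Clone)]
struct ValidationError {
    kind: Kind,
    instance_path: String, // JSON Pointer into the instance
    causes: Vec<ValidationError>,
}

// Reduced DropError: only the code and the path.
#[derive(Debug, Clone, PartialEq)]
struct FlatError {
    code: &'static str,
    path: String,
}

// Walk the tree and keep only the leaves that describe a concrete failure.
fn collect_errors(err: &ValidationError, out: &mut Vec<ValidationError>) {
    if err.causes.is_empty() {
        if !err.kind.is_structural() {
            out.push(err.clone());
        }
        return;
    }
    for cause in &err.causes {
        collect_errors(cause, out);
    }
}

// Convert to the flat format, keeping the first error seen for each path.
fn format_errors(errors: &[ValidationError]) -> Vec<FlatError> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for e in errors {
        if seen.insert(e.instance_path.clone()) {
            out.push(FlatError { code: e.kind.code(), path: e.instance_path.clone() });
        }
    }
    out
}
```

Keeping the first error per path is one possible de-duplication policy; the text above only states that errors at the same path are de-duplicated.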

DropError Format

A DropError object provides a clear, structured explanation of a validation failure:

{
  "code": "ADDITIONAL_PROPERTIES_NOT_ALLOWED",
  "message": "Property 'extra' is not allowed",
  "details": {
    "path": "/extra",
    "context": "not allowed",
    "cause": {
      "got": [
        "extra"
      ]
    },
    "schema": "basic_strict_test.request"
  }
}
  • code (string): A machine-readable error code (e.g., ADDITIONAL_PROPERTIES_NOT_ALLOWED, MIN_LENGTH_VIOLATED).
  • message (string): A human-readable summary of the error.
  • details (object):
    • path (string): The JSON Pointer path to the invalid data within the instance.
    • context (any): The actual value that failed validation.
    • cause (any): The low-level reason from the validator, often including the expected value (want) and the actual value (got).
    • schema (string): The $id of the schema that was being validated.

boon Crate Modifications

The version of boon located in the validator/ directory has been significantly modified to support runtime-based strict validation. The original boon crate only supports compile-time strictness and lacks the necessary mechanisms to propagate validation context correctly for our use case.

1. Recursive Runtime Strictness Control

  • Problem: The jspg project requires that certain schemas (specifically those for public puncs and global types) enforce a strict "no extra properties" policy. This strictness needs to be decided at runtime and must cascade through the entire validation hierarchy, including all nested objects and $ref chains. A compile-time flag was unsuitable because it would incorrectly apply strictness to shared, reusable schemas.

  • Solution: A runtime validation option was implemented to enforce strictness recursively. This required several coordinated changes to the boon validator.

Key Changes

  1. ValidationOptions Struct: A new ValidationOptions { be_strict: bool } struct was added to validator/src/lib.rs. The jspg code in src/lib.rs determines if a validation run should be strict and passes this struct to the validator.

  2. Strictness Check in uneval_validate: The original boon only checked for unevaluated properties if the unevaluatedProperties keyword was present in the schema. We added an else if be_strict block to uneval_validate in validator/src/validator.rs. This block triggers a check for any leftover unevaluated properties at the end of a validation pass and reports them as errors, effectively enforcing our runtime strictness rule.

  3. Correct Context Propagation: The most complex part of the fix was ensuring the set of unevaluated properties was correctly maintained across different validation contexts (especially $ref and nested property validations). Three critical changes were made:

    • Inheriting Context in _validate_self: When validating keywords that apply to the same instance (like $ref or allOf), the sub-validator must know what properties the parent has already evaluated. We changed the creation of the Validator inside _validate_self to pass a clone of the parent's uneval state (uneval: self.uneval.clone()) instead of creating a new one from scratch. This allows the context to flow downwards.
    • Isolating Context in validate_val: Conversely, when validating a property's value, that value is a different part of the JSON instance. The sub-validation should not affect the parent's list of unevaluated properties. We fixed this by commenting out the self.uneval.merge(...) call in the validate_val function.
    • Simplifying Uneval::merge: The original logic for merging uneval state was different for $ref keywords. This was incorrect. We simplified the merge function to always perform an intersection (retain). Because uneval tracks the properties that have not yet been evaluated, intersecting two states leaves a property unevaluated only if no schema part applied to the same instance evaluated it, which correctly combines the knowledge of evaluated properties from those parts.

  4. Removing Incompatible Assertions: The changes to context propagation broke several debug_assert! macros in the arr_validate function, which were part of boon's original design. Since our new validation flow is different but correct, these assertions were removed.
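
The interaction between context inheritance, the intersection-based merge, and the strict check can be sketched in isolation. This is an illustrative model only: ValidationOptions matches the struct named above, but the Uneval shown here tracks property names alone, and from_instance, mark_evaluated, and strict_check are hypothetical helpers rather than functions from validator/src/validator.rs.

```rust
use std::collections::HashSet;

// Mirrors the options struct described above.
#[derive(Debug, Clone, Copy)]
struct ValidationOptions {
    be_strict: bool,
}

// Reduced uneval state: names of properties not yet evaluated by any keyword.
#[derive(Debug, Clone)]
struct Uneval {
    props: HashSet<String>,
}

impl Uneval {
    // Start with every property of the instance marked as unevaluated.
    fn from_instance(keys: &[&str]) -> Self {
        Uneval { props: keys.iter().map(|k| k.to_string()).collect() }
    }

    fn mark_evaluated(&mut self, key: &str) {
        self.props.remove(key);
    }

    // Intersection: a property stays unevaluated only if both states
    // still list it as unevaluated.
    fn merge(&mut self, other: &Uneval) {
        self.props.retain(|p| other.props.contains(p));
    }
}

// Report leftover properties only when the schema has no
// unevaluatedProperties keyword and the run is strict.
fn strict_check(
    uneval: &Uneval,
    has_unevaluated_properties_keyword: bool,
    options: ValidationOptions,
) -> Vec<String> {
    if has_unevaluated_properties_keyword {
        // The keyword's own check applies; nothing extra to report here.
        Vec::new()
    } else if options.be_strict {
        let mut leftover: Vec<String> = uneval.props.iter().cloned().collect();
        leftover.sort();
        leftover
    } else {
        Vec::new()
    }
}
```

In this model, a sub-validator for a $ref starts from a clone of the parent's state, marks what it evaluates, and is merged back into the parent. Only properties that neither side evaluated reach the strict check.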