Compare commits

...

8 Commits

12 changed files with 189 additions and 47 deletions

View File

@ -1,25 +1,79 @@
# Gemini Project Overview: `jspg`
This document outlines the purpose of the `jspg` project and the specific modifications made to the vendored `boon` JSON schema validator crate.
This document outlines the purpose of the `jspg` project, its architecture, and the specific modifications made to the vendored `boon` JSON schema validator crate.
## What is `jspg`?
`jspg` is a PostgreSQL extension written in Rust using the `pgrx` framework. Its primary function is to provide fast, in-database JSON schema validation against the 2020-12 draft of the JSON Schema specification.
It works by:
1. Exposing a SQL function, `cache_json_schemas(...)`, which takes arrays of schema objects, compiles them, and caches them in memory.
2. Exposing a SQL validation function, `validate_json_schema(schema_id, instance)`, which validates a JSONB instance against one of the pre-cached schemas.
3. Using a locally modified (vendored) version of the `boon` crate to perform the validation, allowing for custom enhancements to its core logic.
### How It Works
The extension is designed for high-performance scenarios where schemas are defined once and used many times for validation. It achieves this through an in-memory cache.
1. **Caching:** A user first calls the `cache_json_schemas(enums, types, puncs)` SQL function. This function takes arrays of JSON objects representing different kinds of schemas within a larger application framework. It uses the vendored `boon` crate to compile all these schemas into an efficient internal format and stores them in a static, in-memory `SCHEMA_CACHE`. This cache is managed by a `RwLock` to allow concurrent reads during validation.
2. **Validation:** The `validate_json_schema(schema_id, instance)` SQL function is then used to validate a JSONB `instance` against a specific, pre-cached schema identified by its `$id`. This function looks up the compiled schema in the cache and runs the validation, returning a success response or a detailed error report.
3. **Custom Logic:** `jspg` uses a locally modified (vendored) version of the `boon` crate. This allows for powerful, application-specific validation logic that goes beyond the standard JSON Schema specification, such as runtime-based strictness.
### Error Handling
When validation fails, `jspg` provides a detailed error report in a consistent JSON format, which we refer to as a "DropError". This process involves two main helper functions in `src/lib.rs`:
1. **`collect_errors`**: `boon` returns a nested tree of `ValidationError` objects. This function recursively traverses that tree to find the most specific, underlying causes of the failure. It filters out structural errors (like `allOf` or `anyOf`) to create a flat list of concrete validation failures.
2. **`format_errors`**: This function takes the flat list of errors and transforms each one into the final DropError JSON format. It also de-duplicates errors that occur at the same JSON Pointer path, ensuring a cleaner output if a single value violates multiple constraints.
#### DropError Format
A DropError object provides a clear, structured explanation of a validation failure:
```json
{
"code": "ADDITIONAL_PROPERTIES_NOT_ALLOWED",
"message": "Property 'extra' is not allowed",
"details": {
"path": "/extra",
"context": "not allowed",
"cause": {
"got": [
"extra"
]
},
"schema": "basic_strict_test.request"
}
}
```
- `code` (string): A machine-readable error code (e.g., `ADDITIONAL_PROPERTIES_NOT_ALLOWED`, `MIN_LENGTH_VIOLATED`).
- `message` (string): A human-readable summary of the error.
- `details` (object):
- `path` (string): The JSON Pointer path to the invalid data within the instance.
- `context` (any): The actual value that failed validation.
- `cause` (any): The low-level reason from the validator, often including the expected value (`want`) and the actual value (`got`).
- `schema` (string): The `$id` of the schema that was being validated.
---
## `boon` Crate Modifications
The version of `boon` located in the `validator/` directory has been modified to address specific requirements of the `jspg` project. The key deviations from the upstream `boon` crate are as follows:
The version of `boon` located in the `validator/` directory has been significantly modified to support runtime-based strict validation. The original `boon` crate only supports compile-time strictness and lacks the necessary mechanisms to propagate validation context correctly for our use case.
### 1. Recursive Runtime Strictness Control
- **Problem:** The `jspg` project requires that certain schemas enforce a strict "no extra properties" policy (specifically, schemas for public `puncs` and global `type`s). This strictness needs to cascade through the entire validation hierarchy, including all nested objects and `$ref` chains. A compile-time flag was unsuitable because it would incorrectly apply strictness to shared, reusable schemas.
- **Problem:** The `jspg` project requires that certain schemas (specifically those for public `puncs` and global `type`s) enforce a strict "no extra properties" policy. This strictness needs to be decided at runtime and must cascade through the entire validation hierarchy, including all nested objects and `$ref` chains. A compile-time flag was unsuitable because it would incorrectly apply strictness to shared, reusable schemas.
- **Solution:** A runtime validation option was implemented to enforce strictness recursively.
1. A `ValidationOptions { be_strict: bool }` struct was added. The `jspg` code in `src/lib.rs` determines whether a validation run should be strict (based on the `punc`'s `public` flag or if validating a global `type`) and passes the appropriate option to the validator.
2. The `be_strict` option is propagated through the entire recursive validation process. A bug was fixed in `_validate_self` (which handles `$ref`s) to ensure that the sub-validator is always initialized to track unevaluated properties when `be_strict` is enabled. Previously, tracking was only initiated if the parent was already tracking unevaluated properties, causing strictness to be dropped across certain `$ref` boundaries.
3. At any time, if `unevaluatedProperties` or `additionalProperties` is found in the schema, it should override the strict (or non-strict) validation at that level.
- **Solution:** A runtime validation option was implemented to enforce strictness recursively. This required several coordinated changes to the `boon` validator.
#### Key Changes
1. **`ValidationOptions` Struct**: A new `ValidationOptions { be_strict: bool }` struct was added to `validator/src/lib.rs`. The `jspg` code in `src/lib.rs` determines if a validation run should be strict and passes this struct to the validator.
2. **Strictness Check in `uneval_validate`**: The original `boon` only checked for unevaluated properties if the `unevaluatedProperties` keyword was present in the schema. We added an `else if be_strict` block to `uneval_validate` in `validator/src/validator.rs`. This block triggers a check for any leftover unevaluated properties at the end of a validation pass and reports them as errors, effectively enforcing our runtime strictness rule.
3. **Correct Context Propagation**: The most complex part of the fix was ensuring the set of unevaluated properties was correctly maintained across different validation contexts (especially `$ref` and nested property validations). Three critical changes were made:
- **Inheriting Context in `_validate_self`**: When validating keywords that apply to the same instance (like `$ref` or `allOf`), the sub-validator must know what properties the parent has already evaluated. We changed the creation of the `Validator` inside `_validate_self` to pass a clone of the parent's `uneval` state (`uneval: self.uneval.clone()`) instead of creating a new one from scratch. This allows the context to flow downwards.
- **Isolating Context in `validate_val`**: Conversely, when validating a property's value, that value is a *different* part of the JSON instance. The sub-validation should not affect the parent's list of unevaluated properties. We fixed this by commenting out the `self.uneval.merge(...)` call in the `validate_val` function.
- **Simplifying `Uneval::merge`**: The original logic for merging `uneval` state was different for `$ref` keywords. This was incorrect. We simplified the `merge` function to *always* perform an intersection (`retain`), which correctly combines the knowledge of evaluated properties from different schema parts that apply to the same instance.
4. **Removing Incompatible Assertions**: The changes to context propagation broke several `debug_assert!` macros in the `arr_validate` function, which were part of `boon`'s original design. Since our new validation flow is different but correct, these assertions were removed.

13
flow
View File

@ -97,11 +97,16 @@ install() {
fi
}
test() {
test-jspg() {
info "Running jspg tests..."
cargo pgrx test "pg${POSTGRES_VERSION}" "$@" || return $?
}
test-validator() {
info "Running validator tests..."
cargo test -p boon --features "pgrx/pg${POSTGRES_VERSION}" "$@" || return $?
}
clean() {
info "Cleaning build artifacts..."
cargo clean || return $?
@ -111,7 +116,8 @@ jspg-usage() {
printf "prepare\tCheck OS, Cargo, and PGRX dependencies.\n"
printf "install\tBuild and install the extension locally (after prepare).\n"
printf "reinstall\tClean, build, and install the extension locally (after prepare).\n"
printf "test\t\tRun pgrx integration tests.\n"
printf "test-jspg\t\tRun pgrx integration tests.\n"
printf "test-validator\t\tRun validator integration tests.\n"
printf "clean\t\tRemove pgrx build artifacts.\n"
}
@ -121,7 +127,8 @@ jspg-flow() {
build) build; return $?;;
install) install; return $?;;
reinstall) clean && install; return $?;;
test) test "${@:2}"; return $?;;
test-jspg) test-jspg "${@:2}"; return $?;;
test-validator) test-validator "${@:2}"; return $?;;
clean) clean; return $?;;
*) return 1 ;;
esac

View File

@ -304,11 +304,11 @@ fn validate_json_schema(schema_id: &str, instance: JsonB) -> JsonB {
Some(schema) => {
let instance_value: Value = instance.0;
let options = match schema.t {
SchemaType::Type | SchemaType::PublicPunc => Some(ValidationOptions { be_strict: true }),
SchemaType::PublicPunc => Some(ValidationOptions { be_strict: true }),
_ => None,
};
match cache.schemas.validate(&instance_value, schema.index, options.as_ref()) {
match cache.schemas.validate(&instance_value, schema.index, options) {
Ok(_) => {
let mut custom_errors = Vec::new();
if schema.t == SchemaType::Type || schema.t == SchemaType::PublicPunc || schema.t == SchemaType::PrivatePunc {
@ -394,7 +394,7 @@ fn collect_errors(error: &ValidationError, errors_list: &mut Vec<Error>) {
ErrorKind::AnyOf => handle_any_of_error(&base_path),
ErrorKind::OneOf(matched) => handle_one_of_error(&base_path, matched),
};
errors_list.extend(errors_to_add);
}

View File

@ -355,7 +355,16 @@ pub fn additional_properties_schemas() -> JsonB {
pub fn unevaluated_properties_schemas() -> JsonB {
let enums = json!([]);
let types = json!([]);
let types = json!([{
"name": "nested_for_uneval",
"schemas": [{
"$id": "nested_for_uneval",
"type": "object",
"properties": {
"deep_prop": { "type": "string" }
}
}]
}]);
let puncs = json!([
{
"name": "simple_unevaluated_test",
@ -396,6 +405,29 @@ pub fn unevaluated_properties_schemas() -> JsonB {
},
"unevaluatedProperties": false
}]
},
{
"name": "nested_unevaluated_test",
"public": true, // To trigger strict mode
"schemas": [{
"$id": "nested_unevaluated_test.request",
"type": "object",
"properties": {
"non_strict_branch": {
"type": "object",
"unevaluatedProperties": true, // The magic switch
"properties": {
"some_prop": { "$ref": "nested_for_uneval" }
}
},
"strict_branch": {
"type": "object",
"properties": {
"another_prop": { "type": "string" }
}
}
}
}]
}
]);

View File

@ -452,6 +452,33 @@ fn test_validate_unevaluated_properties() {
let valid_result = validate_json_schema("simple_unevaluated_test.request", jsonb(valid_instance));
assert_success(&valid_result);
// Test 4: Test that unevaluatedProperties: true cascades down refs
let cascading_instance = json!({
"strict_branch": {
"another_prop": "is_ok"
},
"non_strict_branch": {
"extra_at_toplevel": "is_ok", // Extra property at this level
"some_prop": {
"deep_prop": "is_ok",
"extra_in_ref": "is_also_ok" // Extra property in the $ref'd schema
}
}
});
let cascading_result = validate_json_schema("nested_unevaluated_test.request", jsonb(cascading_instance));
assert_success(&cascading_result);
// Test 5: For good measure, test that the strict branch is still strict
let strict_fail_instance = json!({
"strict_branch": {
"another_prop": "is_ok",
"extra_in_strict": "is_not_ok"
}
});
let strict_fail_result = validate_json_schema("nested_unevaluated_test.request", jsonb(strict_fail_instance));
assert_error_count(&strict_fail_result, 1);
assert_has_error(&strict_fail_result, "ADDITIONAL_PROPERTIES_NOT_ALLOWED", "/strict_branch/extra_in_strict");
}
#[pg_test]

View File

@ -10,7 +10,7 @@ let mut schemas = Schemas::new(); // container for compiled schemas
let mut compiler = Compiler::new();
let sch_index = compiler.compile("schema.json", &mut schemas)?;
let instance: Value = serde_json::from_reader(File::open("instance.json")?)?;
let valid = schemas.validate(&instance, sch_index).is_ok();
let valid = schemas.validate(&instance, sch_index, None).is_ok();
# Ok(())
# }
```
@ -194,7 +194,7 @@ impl Schemas {
&'s self,
v: &'v Value,
sch_index: SchemaIndex,
options: Option<&'s ValidationOptions>,
options: Option<ValidationOptions>,
) -> Result<(), ValidationError<'s, 'v>> {
let Some(sch) = self.list.get(sch_index.0) else {
panic!("Schemas::validate: schema index out of bounds");

View File

@ -20,7 +20,7 @@ pub(crate) fn validate<'s, 'v>(
v: &'v Value,
schema: &'s Schema,
schemas: &'s Schemas,
options: Option<&'s ValidationOptions>,
options: Option<ValidationOptions>,
) -> Result<(), ValidationError<'s, 'v>> {
let scope = Scope {
sch: schema.idx,
@ -29,7 +29,7 @@ pub(crate) fn validate<'s, 'v>(
parent: None,
};
let mut vloc = Vec::with_capacity(8);
let be_strict = options.map_or(false, |o| o.be_strict);
let options = options.unwrap_or_default();
let (result, _) = Validator {
v,
vloc: &mut vloc,
@ -37,7 +37,7 @@ pub(crate) fn validate<'s, 'v>(
schemas,
scope,
options,
uneval: Uneval::from(v, schema, be_strict),
uneval: Uneval::from(v, schema, options.be_strict),
errors: vec![],
bool_result: false,
}
@ -89,7 +89,7 @@ struct Validator<'v, 's, 'd, 'e> {
schema: &'s Schema,
schemas: &'s Schemas,
scope: Scope<'d>,
options: Option<&'s ValidationOptions>,
options: ValidationOptions,
uneval: Uneval<'v>,
errors: Vec<ValidationError<'s, 'v>>,
bool_result: bool, // is interested to know valid or not (but not actuall error)
@ -296,7 +296,7 @@ impl<'v> Validator<'v, '_, '_, '_> {
if let Some(sch) = &s.property_names {
for pname in obj.keys() {
let v = Value::String(pname.to_owned());
if let Err(mut e) = self.schemas.validate(&v, *sch, self.options) {
if let Err(mut e) = self.schemas.validate(&v, *sch, Some(self.options)) {
e.schema_url = &s.loc;
e.kind = ErrorKind::PropertyName {
prop: pname.to_owned(),
@ -510,7 +510,7 @@ impl<'v> Validator<'v, '_, '_, '_> {
// contentSchema --
if let (Some(sch), Some(v)) = (s.content_schema, deserialized) {
if let Err(mut e) = self.schemas.validate(&v, sch, self.options) {
if let Err(mut e) = self.schemas.validate(&v, sch, Some(self.options)) {
e.schema_url = &s.loc;
e.kind = kind!(ContentSchema);
self.errors.push(e.clone_static());
@ -762,8 +762,6 @@ impl Validator<'_, '_, '_, '_> {
};
}
let be_strict = self.options.map_or(false, |o| o.be_strict);
// unevaluatedProperties --
if let Value::Object(obj) = v {
if let Some(sch_idx) = s.unevaluated_properties {
@ -786,7 +784,7 @@ impl Validator<'_, '_, '_, '_> {
}
self.uneval.props.clear();
}
} else if be_strict && !self.bool_result {
} else if self.options.be_strict && !self.bool_result {
// 2. Runtime strictness check
if !self.uneval.props.is_empty() {
let props: Vec<Cow<str>> = self.uneval.props.iter().map(|p| Cow::from((*p).as_str())).collect();
@ -824,15 +822,27 @@ impl<'v, 's> Validator<'v, 's, '_, '_> {
}
let scope = self.scope.child(sch, None, self.scope.vid + 1);
let schema = &self.schemas.get(sch);
let be_strict = self.options.map_or(false, |o| o.be_strict);
// Check if the new schema turns off strictness
let allows_unevaluated = schema.boolean == Some(true) ||
if let Some(idx) = schema.unevaluated_properties {
self.schemas.get(idx).boolean == Some(true)
} else {
false
};
let mut new_options = self.options;
if allows_unevaluated {
new_options.be_strict = false;
}
let (result, _reply) = Validator {
v,
vloc: self.vloc,
schema,
schemas: self.schemas,
scope,
options: self.options,
uneval: Uneval::from(v, schema, be_strict || !self.uneval.is_empty()),
options: new_options,
uneval: Uneval::from(v, schema, new_options.be_strict || !self.uneval.is_empty()),
errors: vec![],
bool_result: self.bool_result,
}
@ -849,14 +859,26 @@ impl<'v, 's> Validator<'v, 's, '_, '_> {
) -> Result<(), ValidationError<'s, 'v>> {
let scope = self.scope.child(sch, ref_kw, self.scope.vid);
let schema = &self.schemas.get(sch);
let be_strict = self.options.map_or(false, |o| o.be_strict);
// Check if the new schema turns off strictness
let allows_unevaluated = schema.boolean == Some(true) ||
if let Some(idx) = schema.unevaluated_properties {
self.schemas.get(idx).boolean == Some(true)
} else {
false
};
let mut new_options = self.options;
if allows_unevaluated {
new_options.be_strict = false;
}
let (result, reply) = Validator {
v: self.v,
vloc: self.vloc,
schema,
schemas: self.schemas,
scope,
options: self.options,
options: new_options,
uneval: self.uneval.clone(),
errors: vec![],
bool_result: self.bool_result || bool_result,

View File

@ -15,7 +15,7 @@ fn test_debug() -> Result<(), Box<dyn Error>> {
let url = "http://debug.com/schema.json";
compiler.add_resource(url, test["schema"].clone())?;
let sch = compiler.compile(url, &mut schemas)?;
let result = schemas.validate(&test["data"], sch);
let result = schemas.validate(&test["data"], sch, None);
if let Err(e) = &result {
for line in format!("{e}").lines() {
println!(" {line}");

View File

@ -13,7 +13,7 @@ fn example_from_files() -> Result<(), Box<dyn Error>> {
let mut schemas = Schemas::new();
let mut compiler = Compiler::new();
let sch_index = compiler.compile(schema_file, &mut schemas)?;
let result = schemas.validate(&instance, sch_index);
let result = schemas.validate(&instance, sch_index, None);
assert!(result.is_ok());
Ok(())
@ -51,7 +51,7 @@ fn example_from_strings() -> Result<(), Box<dyn Error>> {
compiler.add_resource("tests/examples/pet.json", pet_schema)?;
compiler.add_resource("tests/examples/cat.json", cat_schema)?;
let sch_index = compiler.compile("tests/examples/pet.json", &mut schemas)?;
let result = schemas.validate(&instance, sch_index);
let result = schemas.validate(&instance, sch_index, None);
assert!(result.is_ok());
Ok(())
@ -79,7 +79,7 @@ fn example_from_https() -> Result<(), Box<dyn Error>> {
loader.register("https", Box::new(HttpUrlLoader));
compiler.use_loader(Box::new(loader));
let sch_index = compiler.compile(schema_url, &mut schemas)?;
let result = schemas.validate(&instance, sch_index);
let result = schemas.validate(&instance, sch_index, None);
assert!(result.is_ok());
Ok(())
@ -114,7 +114,7 @@ fn example_from_yaml_files() -> Result<(), Box<dyn Error>> {
loader.register("file", Box::new(FileUrlLoader));
compiler.use_loader(Box::new(loader));
let sch_index = compiler.compile(schema_file, &mut schemas)?;
let result = schemas.validate(&instance, sch_index);
let result = schemas.validate(&instance, sch_index, None);
assert!(result.is_ok());
Ok(())
@ -148,7 +148,7 @@ fn example_custom_format() -> Result<(), Box<dyn Error>> {
});
compiler.add_resource(schema_url, schema)?;
let sch_index = compiler.compile(schema_url, &mut schemas)?;
let result = schemas.validate(&instance, sch_index);
let result = schemas.validate(&instance, sch_index, None);
assert!(result.is_ok());
Ok(())
@ -193,7 +193,7 @@ fn example_custom_content_encoding() -> Result<(), Box<dyn Error>> {
});
compiler.add_resource(schema_url, schema)?;
let sch_index = compiler.compile(schema_url, &mut schemas)?;
let result = schemas.validate(&instance, sch_index);
let result = schemas.validate(&instance, sch_index, None);
assert!(result.is_err());
Ok(())
@ -223,7 +223,7 @@ fn example_custom_content_media_type() -> Result<(), Box<dyn Error>> {
});
compiler.add_resource(schema_url, schema)?;
let sch_index = compiler.compile(schema_url, &mut schemas)?;
let result = schemas.validate(&instance, sch_index);
let result = schemas.validate(&instance, sch_index, None);
assert!(result.is_ok());
Ok(())

View File

@ -52,7 +52,7 @@ fn test_folder(suite: &str, folder: &str, draft: Draft) -> Result<(), Box<dyn Er
let sch = compiler.compile(schema_url, &mut schemas)?;
for test in group.tests {
println!(" {}", test.description);
match schemas.validate(&test.data, sch) {
match schemas.validate(&test.data, sch, None) {
Ok(_) => println!(" validation success"),
Err(e) => {
if let Some(sch) = test.output.basic {
@ -64,7 +64,7 @@ fn test_folder(suite: &str, folder: &str, draft: Draft) -> Result<(), Box<dyn Er
compiler.add_resource(schema_url, sch)?;
let sch = compiler.compile(schema_url, &mut schemas)?;
let basic: Value = serde_json::from_str(&e.basic_output().to_string())?;
let result = schemas.validate(&basic, sch);
let result = schemas.validate(&basic, sch, None);
if let Err(e) = result {
println!("{basic:#}\n");
for line in format!("{e}").lines() {
@ -83,7 +83,7 @@ fn test_folder(suite: &str, folder: &str, draft: Draft) -> Result<(), Box<dyn Er
let sch = compiler.compile(schema_url, &mut schemas)?;
let detailed: Value =
serde_json::from_str(&e.detailed_output().to_string())?;
let result = schemas.validate(&detailed, sch);
let result = schemas.validate(&detailed, sch, None);
if let Err(e) = result {
println!("{detailed:#}\n");
for line in format!("{e}").lines() {

View File

@ -90,7 +90,7 @@ fn test_file(suite: &str, path: &str, draft: Draft) -> Result<(), Box<dyn Error>
let sch_index = compiler.compile(url, &mut schemas)?;
for test in group.tests {
println!(" {}", test.description);
let result = schemas.validate(&test.data, sch_index);
let result = schemas.validate(&test.data, sch_index, None);
if let Err(e) = &result {
for line in format!("{e}").lines() {
println!(" {line}");

View File

@ -1 +1 @@
1.0.37
1.0.40