Compare commits

6 Commits

| SHA1 |
|---|
| 6b6647f2d6 |
| d301d5fab9 |
| 61511b595d |
| c7ae975275 |
| aa58082cd7 |
| 491fb3a3e3 |

GEMINI.md (76 changed lines)

```diff
@@ -1,25 +1,79 @@
 # Gemini Project Overview: `jspg`
 
-This document outlines the purpose of the `jspg` project and the specific modifications made to the vendored `boon` JSON schema validator crate.
+This document outlines the purpose of the `jspg` project, its architecture, and the specific modifications made to the vendored `boon` JSON schema validator crate.
 
 ## What is `jspg`?
 
 `jspg` is a PostgreSQL extension written in Rust using the `pgrx` framework. Its primary function is to provide fast, in-database JSON schema validation against the 2020-12 draft of the JSON Schema specification.
 
-It works by:
-1. Exposing a SQL function, `cache_json_schemas(...)`, which takes arrays of schema objects, compiles them, and caches them in memory.
-2. Exposing a SQL validation function, `validate_json_schema(schema_id, instance)`, which validates a JSONB instance against one of the pre-cached schemas.
-3. Using a locally modified (vendored) version of the `boon` crate to perform the validation, allowing for custom enhancements to its core logic.
+### How It Works
+
+The extension is designed for high-performance scenarios where schemas are defined once and used many times for validation. It achieves this through an in-memory cache.
+
+1. **Caching:** A user first calls the `cache_json_schemas(enums, types, puncs)` SQL function. This function takes arrays of JSON objects representing different kinds of schemas within a larger application framework. It uses the vendored `boon` crate to compile all these schemas into an efficient internal format and stores them in a static, in-memory `SCHEMA_CACHE`. This cache is managed by a `RwLock` to allow concurrent reads during validation.
 
+2. **Validation:** The `validate_json_schema(schema_id, instance)` SQL function is then used to validate a JSONB `instance` against a specific, pre-cached schema identified by its `$id`. This function looks up the compiled schema in the cache and runs the validation, returning a success response or a detailed error report.
+
+3. **Custom Logic:** `jspg` uses a locally modified (vendored) version of the `boon` crate. This allows for powerful, application-specific validation logic that goes beyond the standard JSON Schema specification, such as runtime-based strictness.
+
+### Error Handling
+
+When validation fails, `jspg` provides a detailed error report in a consistent JSON format, which we refer to as a "DropError". This process involves two main helper functions in `src/lib.rs`:
+
+1. **`collect_errors`**: `boon` returns a nested tree of `ValidationError` objects. This function recursively traverses that tree to find the most specific, underlying causes of the failure. It filters out structural errors (like `allOf` or `anyOf`) to create a flat list of concrete validation failures.
+
+2. **`format_errors`**: This function takes the flat list of errors and transforms each one into the final DropError JSON format. It also de-duplicates errors that occur at the same JSON Pointer path, ensuring a cleaner output if a single value violates multiple constraints.
+
+#### DropError Format
+
+A DropError object provides a clear, structured explanation of a validation failure:
+
+```json
+{
+  "code": "ADDITIONAL_PROPERTIES_NOT_ALLOWED",
+  "message": "Property 'extra' is not allowed",
+  "details": {
+    "path": "/extra",
+    "context": "not allowed",
+    "cause": {
+      "got": [
+        "extra"
+      ]
+    },
+    "schema": "basic_strict_test.request"
+  }
+}
+```
+
+- `code` (string): A machine-readable error code (e.g., `ADDITIONAL_PROPERTIES_NOT_ALLOWED`, `MIN_LENGTH_VIOLATED`).
+- `message` (string): A human-readable summary of the error.
+- `details` (object):
+    - `path` (string): The JSON Pointer path to the invalid data within the instance.
+    - `context` (any): The actual value that failed validation.
+    - `cause` (any): The low-level reason from the validator, often including the expected value (`want`) and the actual value (`got`).
+    - `schema` (string): The `$id` of the schema that was being validated.
+
+---
+
 ## `boon` Crate Modifications
 
-The version of `boon` located in the `validator/` directory has been modified to address specific requirements of the `jspg` project. The key deviations from the upstream `boon` crate are as follows:
+The version of `boon` located in the `validator/` directory has been significantly modified to support runtime-based strict validation. The original `boon` crate only supports compile-time strictness and lacks the necessary mechanisms to propagate validation context correctly for our use case.
 
 ### 1. Recursive Runtime Strictness Control
 
-- **Problem:** The `jspg` project requires that certain schemas enforce a strict "no extra properties" policy (specifically, schemas for public `puncs` and global `type`s). This strictness needs to cascade through the entire validation hierarchy, including all nested objects and `$ref` chains. A compile-time flag was unsuitable because it would incorrectly apply strictness to shared, reusable schemas.
+- **Problem:** The `jspg` project requires that certain schemas (specifically those for public `puncs` and global `type`s) enforce a strict "no extra properties" policy. This strictness needs to be decided at runtime and must cascade through the entire validation hierarchy, including all nested objects and `$ref` chains. A compile-time flag was unsuitable because it would incorrectly apply strictness to shared, reusable schemas.
 
-- **Solution:** A runtime validation option was implemented to enforce strictness recursively.
-    1. A `ValidationOptions { be_strict: bool }` struct was added. The `jspg` code in `src/lib.rs` determines whether a validation run should be strict (based on the `punc`'s `public` flag or if validating a global `type`) and passes the appropriate option to the validator.
-    2. The `be_strict` option is propagated through the entire recursive validation process. A bug was fixed in `_validate_self` (which handles `$ref`s) to ensure that the sub-validator is always initialized to track unevaluated properties when `be_strict` is enabled. Previously, tracking was only initiated if the parent was already tracking unevaluated properties, causing strictness to be dropped across certain `$ref` boundaries.
-    3. At any time, if `unevaluatedProperties` or `additionalProperties` is found in the schema, it should override the strict (or non-strict) validation at that level.
+- **Solution:** A runtime validation option was implemented to enforce strictness recursively. This required several coordinated changes to the `boon` validator.
+
+#### Key Changes
+
+1. **`ValidationOptions` Struct**: A new `ValidationOptions { be_strict: bool }` struct was added to `validator/src/lib.rs`. The `jspg` code in `src/lib.rs` determines if a validation run should be strict and passes this struct to the validator.
+
+2. **Strictness Check in `uneval_validate`**: The original `boon` only checked for unevaluated properties if the `unevaluatedProperties` keyword was present in the schema. We added an `else if be_strict` block to `uneval_validate` in `validator/src/validator.rs`. This block triggers a check for any leftover unevaluated properties at the end of a validation pass and reports them as errors, effectively enforcing our runtime strictness rule.
+
+3. **Correct Context Propagation**: The most complex part of the fix was ensuring the set of unevaluated properties was correctly maintained across different validation contexts (especially `$ref` and nested property validations). Three critical changes were made:
+    - **Inheriting Context in `_validate_self`**: When validating keywords that apply to the same instance (like `$ref` or `allOf`), the sub-validator must know what properties the parent has already evaluated. We changed the creation of the `Validator` inside `_validate_self` to pass a clone of the parent's `uneval` state (`uneval: self.uneval.clone()`) instead of creating a new one from scratch. This allows the context to flow downwards.
+    - **Isolating Context in `validate_val`**: Conversely, when validating a property's value, that value is a *different* part of the JSON instance. The sub-validation should not affect the parent's list of unevaluated properties. We fixed this by commenting out the `self.uneval.merge(...)` call in the `validate_val` function.
+    - **Simplifying `Uneval::merge`**: The original logic for merging `uneval` state was different for `$ref` keywords. This was incorrect. We simplified the `merge` function to *always* perform an intersection (`retain`), which correctly combines the knowledge of evaluated properties from different schema parts that apply to the same instance.
+
+4. **Removing Incompatible Assertions**: The changes to context propagation broke several `debug_assert!` macros in the `arr_validate` function, which were part of `boon`'s original design. Since our new validation flow is different but correct, these assertions were removed.
```

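The Error Handling section added to GEMINI.md describes a two-step pipeline: `collect_errors` flattens `boon`'s nested error tree into its leaf causes while skipping structural wrappers such as `allOf` and `anyOf`, and `format_errors` turns each leaf into a DropError, keeping one error per JSON Pointer path. The following standard-library-only Rust sketch illustrates that idea. It is not the `jspg` implementation. `ErrNode`, `Kind`, and the trimmed `DropError` struct are hypothetical stand-ins (boon's real `ValidationError` is shaped differently). The `MIN_LENGTH_VIOLATED` message text is also invented, and `details.context` and `details.cause` are left out for brevity.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified error-tree node (boon's real `ValidationError`
/// has a different shape).
#[derive(Clone, Debug)]
pub struct ErrNode {
    pub kind: Kind,
    pub path: String, // JSON Pointer into the instance, e.g. "/extra"
    pub causes: Vec<ErrNode>,
}

/// Hypothetical subset of error kinds.
#[derive(Clone, Debug, PartialEq)]
pub enum Kind {
    AllOf, // structural: only groups other errors
    AnyOf, // structural: only groups other errors
    AdditionalProperties { got: Vec<String> },
    MinLength { want: usize, got: usize },
}

/// Trimmed, hypothetical DropError: `details.context`/`details.cause` omitted.
#[derive(Debug, PartialEq)]
pub struct DropError {
    pub code: String,
    pub message: String,
    pub path: String,
    pub schema: String,
}

fn is_structural(kind: &Kind) -> bool {
    matches!(kind, Kind::AllOf | Kind::AnyOf)
}

/// Recursively collect the leaf (most specific) errors, skipping
/// structural wrapper nodes.
pub fn collect_errors(node: &ErrNode, out: &mut Vec<ErrNode>) {
    if node.causes.is_empty() {
        if !is_structural(&node.kind) {
            out.push(node.clone());
        }
        return;
    }
    for child in &node.causes {
        collect_errors(child, out);
    }
}

/// Convert leaf errors into DropErrors, keeping the first error per path.
pub fn format_errors(errors: &[ErrNode], schema_id: &str) -> Vec<DropError> {
    let mut seen_paths = HashSet::new();
    let mut out = Vec::new();
    for e in errors {
        let (code, message) = match &e.kind {
            Kind::AdditionalProperties { got } => (
                "ADDITIONAL_PROPERTIES_NOT_ALLOWED",
                format!("Property '{}' is not allowed", got.join("', '")),
            ),
            Kind::MinLength { want, got } => (
                "MIN_LENGTH_VIOLATED",
                format!("Length {got} is below the minimum of {want}"),
            ),
            // Structural kinds are never concrete failures.
            Kind::AllOf | Kind::AnyOf => continue,
        };
        // `insert` returns false when this path was already reported.
        if !seen_paths.insert(e.path.clone()) {
            continue;
        }
        out.push(DropError {
            code: code.to_string(),
            message,
            path: e.path.clone(),
            schema: schema_id.to_string(),
        });
    }
    out
}
```

Keeping the *first* error seen for each path is one plausible reading of "de-duplicates errors that occur at the same JSON Pointer path". The real `format_errors` may pick a different survivor.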
flow (13 changed lines)

```diff
@@ -97,11 +97,16 @@ install() {
   fi
 }
 
-test() {
+test-jspg() {
   info "Running jspg tests..."
   cargo pgrx test "pg${POSTGRES_VERSION}" "$@" || return $?
 }
 
+test-validator() {
+  info "Running validator tests..."
+  cargo test -p boon --features "pgrx/pg${POSTGRES_VERSION}" "$@" || return $?
+}
+
 clean() {
   info "Cleaning build artifacts..."
   cargo clean || return $?
@@ -111,7 +116,8 @@ jspg-usage() {
   printf "prepare\tCheck OS, Cargo, and PGRX dependencies.\n"
   printf "install\tBuild and install the extension locally (after prepare).\n"
   printf "reinstall\tClean, build, and install the extension locally (after prepare).\n"
-  printf "test\t\tRun pgrx integration tests.\n"
+  printf "test-jspg\t\tRun pgrx integration tests.\n"
+  printf "test-validator\t\tRun validator integration tests.\n"
   printf "clean\t\tRemove pgrx build artifacts.\n"
 }
 
@@ -121,7 +127,8 @@ jspg-flow() {
     build) build; return $?;;
     install) install; return $?;;
     reinstall) clean && install; return $?;;
-    test) test "${@:2}"; return $?;;
+    test-jspg) test-jspg "${@:2}"; return $?;;
+    test-validator) test-validator "${@:2}"; return $?;;
     clean) clean; return $?;;
     *) return 1 ;;
   esac
```

```diff
@@ -304,7 +304,7 @@ fn validate_json_schema(schema_id: &str, instance: JsonB) -> JsonB {
         Some(schema) => {
             let instance_value: Value = instance.0;
             let options = match schema.t {
-                SchemaType::Type | SchemaType::PublicPunc => Some(ValidationOptions { be_strict: true }),
+                SchemaType::PublicPunc => Some(ValidationOptions { be_strict: true }),
                 _ => None,
             };
 
```

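In the hunk above, `validate_json_schema` now builds strict options only for `SchemaType::PublicPunc`. `SchemaType::Type` falls through to `None`. GEMINI.md describes a static `SCHEMA_CACHE` behind a `RwLock`. Combined with that cache, the lookup-then-choose-options flow might look roughly like the standard-library sketch below. It is an assumption-laden illustration, not the extension's code. The `OnceLock` wrapper, `CachedSchema`, the helper functions, and the `Enum`/`PrivatePunc` variants are all hypothetical. The real cache also holds compiled `boon` schemas and is filled from SQL through `cache_json_schemas`.

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

/// Mirrors the `ValidationOptions { be_strict: bool }` struct named in the diff.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ValidationOptions {
    pub be_strict: bool,
}

/// `Type` and `PublicPunc` appear in the diff; `Enum` and `PrivatePunc`
/// are hypothetical placeholders for the other schema kinds.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SchemaType {
    Enum,
    Type,
    PublicPunc,
    PrivatePunc,
}

/// Hypothetical cache entry; a real one would also hold the compiled schema.
#[derive(Clone, Debug)]
pub struct CachedSchema {
    pub t: SchemaType,
}

/// Process-wide cache guarded by a RwLock (hypothetical layout).
fn schema_cache() -> &'static RwLock<HashMap<String, CachedSchema>> {
    static SCHEMA_CACHE: OnceLock<RwLock<HashMap<String, CachedSchema>>> = OnceLock::new();
    SCHEMA_CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Caching side: takes the exclusive write lock while inserting entries.
pub fn cache_schemas(entries: &[(&str, SchemaType)]) {
    let mut cache = schema_cache().write().expect("cache lock poisoned");
    for (id, t) in entries {
        cache.insert(id.to_string(), CachedSchema { t: *t });
    }
}

/// Same rule as the post-change `match schema.t`: only public puncs are strict.
pub fn options_for(t: SchemaType) -> Option<ValidationOptions> {
    match t {
        SchemaType::PublicPunc => Some(ValidationOptions { be_strict: true }),
        _ => None,
    }
}

/// Validation side: a shared read lock, so lookups can run concurrently.
pub fn lookup_options(schema_id: &str) -> Result<Option<ValidationOptions>, String> {
    let cache = schema_cache().read().expect("cache lock poisoned");
    match cache.get(schema_id) {
        Some(schema) => Ok(options_for(schema.t)),
        None => Err(format!("schema '{schema_id}' is not cached")),
    }
}
```

Taking only a read lock on the validation path lets concurrent validations proceed in parallel. Only re-caching needs the exclusive write lock.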
````diff
@@ -10,7 +10,7 @@ let mut schemas = Schemas::new(); // container for compiled schemas
 let mut compiler = Compiler::new();
 let sch_index = compiler.compile("schema.json", &mut schemas)?;
 let instance: Value = serde_json::from_reader(File::open("instance.json")?)?;
-let valid = schemas.validate(&instance, sch_index).is_ok();
+let valid = schemas.validate(&instance, sch_index, None).is_ok();
 # Ok(())
 # }
 ```
````

```diff
@@ -849,7 +849,6 @@ impl<'v, 's> Validator<'v, 's, '_, '_> {
     ) -> Result<(), ValidationError<'s, 'v>> {
         let scope = self.scope.child(sch, ref_kw, self.scope.vid);
         let schema = &self.schemas.get(sch);
-        let be_strict = self.options.map_or(false, |o| o.be_strict);
         let (result, reply) = Validator {
             v: self.v,
             vloc: self.vloc,
```

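The hunk above drops a `be_strict` local from the function that constructs a child `Validator`. According to GEMINI.md, strictness now depends on how the set of unevaluated properties (`uneval`) flows between validators, following three rules:

- Same-instance keywords such as `$ref` start from a clone of the parent's set.
- `Uneval::merge` always intersects (`retain`).
- `uneval_validate` gains an `else if be_strict` branch that reports any leftovers.

The toy model below shows those three rules on a plain set of property names. The `Uneval` struct here, `validate_same_instance`, and `leftover_errors` are hypothetical simplifications of boon's internals. In particular, `unevaluatedProperties` is modelled as a plain boolean, although in JSON Schema 2020-12 it can be any subschema.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for boon's `Uneval`: names of the current object's
/// properties that no keyword has evaluated yet.
#[derive(Clone, Debug, PartialEq)]
pub struct Uneval {
    pub props: HashSet<String>,
}

impl Uneval {
    /// Start with every property of the instance object unevaluated.
    pub fn new(keys: &[&str]) -> Self {
        Uneval { props: keys.iter().map(|k| k.to_string()).collect() }
    }

    /// Merge state from a subschema applied to the *same* instance.
    /// A property stays unevaluated only if it is unevaluated in both,
    /// so the merge is always an intersection (`retain`).
    pub fn merge(&mut self, other: &Uneval) {
        self.props.retain(|p| other.props.contains(p));
    }
}

/// Hypothetical same-instance subschema (e.g. a `$ref` target) whose
/// `properties` keyword evaluates `declared`. It inherits a clone of the
/// parent's state instead of starting from scratch.
pub fn validate_same_instance(parent: &Uneval, declared: &[&str]) -> Uneval {
    let mut child = parent.clone();
    for name in declared {
        child.props.remove(*name);
    }
    child
}

/// End-of-pass check. `uneval_kw` models an explicit boolean
/// `unevaluatedProperties` keyword, which takes precedence; without it,
/// `be_strict` decides whether leftovers are reported.
pub fn leftover_errors(uneval: &Uneval, uneval_kw: Option<bool>, be_strict: bool) -> Vec<String> {
    let report = match uneval_kw {
        Some(allowed) => !allowed,
        None => be_strict, // models the added `else if be_strict` branch
    };
    if !report {
        return Vec::new();
    }
    let mut names: Vec<String> = uneval.props.iter().cloned().collect();
    names.sort(); // deterministic error order
    names
}
```

Validating a property's *value* would use a separate `Uneval` for the nested object. Per the `validate_val` change, it would never be merged back into the parent.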
```diff
@@ -15,7 +15,7 @@ fn test_debug() -> Result<(), Box<dyn Error>> {
     let url = "http://debug.com/schema.json";
     compiler.add_resource(url, test["schema"].clone())?;
     let sch = compiler.compile(url, &mut schemas)?;
-    let result = schemas.validate(&test["data"], sch);
+    let result = schemas.validate(&test["data"], sch, None);
     if let Err(e) = &result {
         for line in format!("{e}").lines() {
             println!(" {line}");
```

```diff
@@ -13,7 +13,7 @@ fn example_from_files() -> Result<(), Box<dyn Error>> {
     let mut schemas = Schemas::new();
     let mut compiler = Compiler::new();
     let sch_index = compiler.compile(schema_file, &mut schemas)?;
-    let result = schemas.validate(&instance, sch_index);
+    let result = schemas.validate(&instance, sch_index, None);
     assert!(result.is_ok());
 
     Ok(())
@@ -51,7 +51,7 @@ fn example_from_strings() -> Result<(), Box<dyn Error>> {
     compiler.add_resource("tests/examples/pet.json", pet_schema)?;
     compiler.add_resource("tests/examples/cat.json", cat_schema)?;
     let sch_index = compiler.compile("tests/examples/pet.json", &mut schemas)?;
-    let result = schemas.validate(&instance, sch_index);
+    let result = schemas.validate(&instance, sch_index, None);
     assert!(result.is_ok());
 
     Ok(())
@@ -79,7 +79,7 @@ fn example_from_https() -> Result<(), Box<dyn Error>> {
     loader.register("https", Box::new(HttpUrlLoader));
     compiler.use_loader(Box::new(loader));
     let sch_index = compiler.compile(schema_url, &mut schemas)?;
-    let result = schemas.validate(&instance, sch_index);
+    let result = schemas.validate(&instance, sch_index, None);
     assert!(result.is_ok());
 
     Ok(())
@@ -114,7 +114,7 @@ fn example_from_yaml_files() -> Result<(), Box<dyn Error>> {
     loader.register("file", Box::new(FileUrlLoader));
     compiler.use_loader(Box::new(loader));
     let sch_index = compiler.compile(schema_file, &mut schemas)?;
-    let result = schemas.validate(&instance, sch_index);
+    let result = schemas.validate(&instance, sch_index, None);
     assert!(result.is_ok());
 
     Ok(())
@@ -148,7 +148,7 @@ fn example_custom_format() -> Result<(), Box<dyn Error>> {
     });
     compiler.add_resource(schema_url, schema)?;
     let sch_index = compiler.compile(schema_url, &mut schemas)?;
-    let result = schemas.validate(&instance, sch_index);
+    let result = schemas.validate(&instance, sch_index, None);
     assert!(result.is_ok());
 
     Ok(())
@@ -193,7 +193,7 @@ fn example_custom_content_encoding() -> Result<(), Box<dyn Error>> {
     });
     compiler.add_resource(schema_url, schema)?;
     let sch_index = compiler.compile(schema_url, &mut schemas)?;
-    let result = schemas.validate(&instance, sch_index);
+    let result = schemas.validate(&instance, sch_index, None);
     assert!(result.is_err());
 
     Ok(())
@@ -223,7 +223,7 @@ fn example_custom_content_media_type() -> Result<(), Box<dyn Error>> {
     });
     compiler.add_resource(schema_url, schema)?;
     let sch_index = compiler.compile(schema_url, &mut schemas)?;
-    let result = schemas.validate(&instance, sch_index);
+    let result = schemas.validate(&instance, sch_index, None);
     assert!(result.is_ok());
 
     Ok(())
```

```diff
@@ -52,7 +52,7 @@ fn test_folder(suite: &str, folder: &str, draft: Draft) -> Result<(), Box<dyn Er
         let sch = compiler.compile(schema_url, &mut schemas)?;
         for test in group.tests {
             println!(" {}", test.description);
-            match schemas.validate(&test.data, sch) {
+            match schemas.validate(&test.data, sch, None) {
                 Ok(_) => println!(" validation success"),
                 Err(e) => {
                     if let Some(sch) = test.output.basic {
@@ -64,7 +64,7 @@ fn test_folder(suite: &str, folder: &str, draft: Draft) -> Result<(), Box<dyn Er
                         compiler.add_resource(schema_url, sch)?;
                         let sch = compiler.compile(schema_url, &mut schemas)?;
                         let basic: Value = serde_json::from_str(&e.basic_output().to_string())?;
-                        let result = schemas.validate(&basic, sch);
+                        let result = schemas.validate(&basic, sch, None);
                         if let Err(e) = result {
                             println!("{basic:#}\n");
                             for line in format!("{e}").lines() {
@@ -83,7 +83,7 @@ fn test_folder(suite: &str, folder: &str, draft: Draft) -> Result<(), Box<dyn Er
                         let sch = compiler.compile(schema_url, &mut schemas)?;
                         let detailed: Value =
                             serde_json::from_str(&e.detailed_output().to_string())?;
-                        let result = schemas.validate(&detailed, sch);
+                        let result = schemas.validate(&detailed, sch, None);
                         if let Err(e) = result {
                             println!("{detailed:#}\n");
                             for line in format!("{e}").lines() {
```

```diff
@@ -90,7 +90,7 @@ fn test_file(suite: &str, path: &str, draft: Draft) -> Result<(), Box<dyn Error>
         let sch_index = compiler.compile(url, &mut schemas)?;
         for test in group.tests {
             println!(" {}", test.description);
-            let result = schemas.validate(&test.data, sch_index);
+            let result = schemas.validate(&test.data, sch_index, None);
             if let Err(e) = &result {
                 for line in format!("{e}").lines() {
                     println!(" {line}");
```