gjson pathing for stem paths

2026-03-13 23:35:37 -04:00
parent d6deaa0b0f
commit 290464adc1
6 changed files with 2576 additions and 139 deletions


@ -101,22 +101,24 @@ The Queryer transforms Postgres into a pre-compiled Semantic Query Engine via th
* **Array Inclusion**: `{"$in": [values]}`, `{"$nin": [values]}` use native `jsonb_array_elements_text()` bindings to enforce `IN` and `NOT IN` logic without runtime SQL injection risks.
* **Text Matching (ILIKE)**: Compiles `$eq` or `$ne` against string fields containing the `%` character into native Postgres `ILIKE` and `NOT ILIKE` partial substring matches.
* **Type Casting**: Safely resolves dynamic combinations by casting values instantly into the physical database types mapped in the schema (e.g. parsing `uuid` bindings to `::uuid`, formatting DateTimes to `::timestamptz`, and numbers to `::numeric`).
### 4. The Stem Engine
### The Stem Engine
Rather than over-fetching heavy Entity payloads and trimming them, Punc Framework Websockets depend on isolated subgraphs defined as **Stems**.
A `Stem` is **not a JSON Pointer** or a physical path string (like `/properties/contacts/items/phone_number`). It is simply a declaration of an **Entity Type boundary** that exists somewhere within the compiled JSON Schema graph.
A `Stem` is a declaration of an **Entity Type boundary** that exists somewhere within the compiled JSON Schema graph, expressed using **`gjson` multipath syntax** (e.g., `contacts.#.phone_numbers.#`).
Because `pg_notify` (Beats) fire rigidly from physical Postgres tables (e.g. `{"type": "phone_number"}`), the Go Framework only ever needs to know: "Does the schema `with_contacts.person` contain the `phone_number` Entity anywhere inside its tree?"
Because `pg_notify` (Beats) fire rigidly from physical Postgres tables (e.g. `{"type": "phone_number"}`), the Go Framework only ever needs to know: "Does the schema `with_contacts.person` contain the `phone_number` Entity anywhere inside its tree, and if so, what is the gjson path to iterate its payload?"
* **Initialization:** During startup (`jspg_stems()`), the database crawls all Schemas and maps out every physical Entity Type it references. It builds a flat dictionary of `Schema ID -> [Entity Types]` (e.g. `with_contacts.person -> ["person", "contact", "phone_number", "email_address"]`).
* **Identifier Prioritization**: When determining if a nested object boundary is an Entity, JSPG natively prioritizes defined `$id` tags over `$ref` inheritance pointers to prevent polymorphic boundaries from devolving into their generic base classes.
* **Cyclical Deduplication**: Because Punc relationships often reference back on themselves via deeply nested classes, the Stem Engine applies intelligent path deduplication. If the active `current_path` already ends with the target entity string, it traverses the inheritance properties without appending the entity to the stem path again, eliminating infinite powerset loops.
* **Relationship Path Squashing:** When calculating nested string paths structurally to discover these boundaries, JSPG intentionally **omits** properties natively named `target` or `source` if they belong to a native database `relationship` table override. This ensures paths like `phone_numbers/contact/target` correctly register their beat resolution pattern as `phone_numbers/contact/phone_number`.
* **Initialization:** During startup (`jspg_stems()`), the database crawls all Schemas and maps out every physical Entity Type it references. It builds a highly optimized `HashMap<String, HashMap<String, Arc<Stem>>>` providing strictly `O(1)` memory lookups mapping `Schema ID -> { Stem Path -> Entity Type }`.
* **GJSON Pathing:** Unlike standard JSON Pointers, stems use the `.#` array iterator syntax. The Go web server applies this path (e.g. `lines.#`) directly to the raw Postgres JSON byte payload, extracting all matching UUIDs in a single pass without unmarshaling the payload into Go values.
* **Polymorphic Condition Selectors:** When trailing paths would otherwise collide because of abstract polymorphic type definitions (e.g., a `target` property bounded by a `oneOf` taking either `phone_number` or `email_address`), JSPG natively appends evaluated `gjson` type conditions into the path (e.g. `contacts.#.target#(type=="phone_number")`). This guarantees `O(1)` key uniqueness in the HashMap while retaining extreme array extraction speeds natively without runtime AST evaluation.
* **Identifier Prioritization:** When determining if a nested object boundary is an Entity, JSPG natively prioritizes defined `$id` tags over `$ref` inheritance pointers to prevent polymorphic boundaries from devolving into their generic base classes.
* **Cyclical Deduplication:** Because Punc relationships often reference back on themselves via deeply nested classes, the Stem Engine applies intelligent path deduplication. If the active `current_path` already ends with the target entity string, it traverses the inheritance properties without appending the entity to the stem path again, eliminating infinite powerset loops.
* **Relationship Path Squashing:** When calculating string paths structurally, JSPG intentionally **omits** properties natively named `target` or `source` if they belong to a native database `relationship` table override.
* **The Go Router**: The Golang Punc framework uses this exact mapping to register WebSocket Beat frequencies exclusively on the Entity types discovered.
* **The Queryer Execution**: When the Go framework asks JSPG to hydrate a partial `phone_number` stem for the `with_contacts.person` schema, instead of jumping through string paths, the SQL Compiler simply reaches into the Schema's AST using the `phone_number` Type string, pulls out exactly that entity's mapping rules, and returns a fully correlated `SELECT` block! This natively handles nested array properties injected via `oneOf` or array references efficiently bypassing runtime powerset expansion.
* **Performance:** These Stem execution structures are fully statically compiled via SPI and map perfectly to `O(1)` real-time routing logic on the application tier.
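
The path rules described above (dot-separated properties, `.#` for array items, and a `#(type=="...")` selector for polymorphic branches) can be sketched as three small helpers. This is a minimal, standalone approximation of the logic in the `discover_stems` changes from this commit; the helper names `property_step`, `array_step`, and `polymorphic_step` are hypothetical and do not exist in the codebase.

```rust
/// Append a property name to a stem path, using `.` as the separator.
/// (Hypothetical helper; mirrors the "Standard Property Pathing" branch.)
fn property_step(current_path: &str, key: &str) -> String {
    if current_path.is_empty() {
        key.to_string()
    } else {
        format!("{}.{}", current_path, key)
    }
}

/// Append the gjson array iterator `#` for an `items` schema.
/// (Hypothetical helper; mirrors the "Array Item branch".)
fn array_step(current_path: &str) -> String {
    if current_path.is_empty() {
        String::from("#")
    } else {
        format!("{}.#", current_path)
    }
}

/// Append a gjson type selector when the entity was reached through a
/// polymorphic branch, so sibling entity types get distinct map keys.
/// (Hypothetical helper; mirrors the `final_path` computation.)
fn polymorphic_step(current_path: &str, entity_type: &str, is_polymorphic: bool) -> String {
    if is_polymorphic && !current_path.is_empty() && !current_path.ends_with(entity_type) {
        if current_path.ends_with(".#") {
            // The path already ends in an array iterator: attach the condition to it.
            format!("{}(type==\"{}\")", current_path, entity_type)
        } else {
            format!("{}#(type==\"{}\")", current_path, entity_type)
        }
    } else {
        current_path.to_string()
    }
}
```

Chaining `property_step("", "contacts")`, `array_step`, `property_step(.., "target")`, and `polymorphic_step(.., "email_address", true)` yields `contacts.#.target#(type=="email_address")`, which matches the stem key used in the updated fixtures.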
## 5. Testing & Execution Architecture
JSPG implements a strict separation of concerns so that unit and integration tests do not need to boot a full PostgreSQL cluster. Because `pgrx::spi::Spi` links directly against PostgreSQL C headers, building the library natively with `cargo test` on macOS normally results in fatal `dyld` crashes.


@ -1148,7 +1148,7 @@
"description": "Full person stem query on phone number contact",
"action": "query",
"schema_id": "full.person",
"stem": "phone_numbers/contact",
"stem": "phone_numbers.#",
"expect": {
"success": true,
"sql": [
@ -1172,7 +1172,7 @@
"description": "Full person stem query on phone number contact on phone number",
"action": "query",
"schema_id": "full.person",
"stem": "phone_numbers/contact/phone_number",
"stem": "phone_numbers.#.target",
"expect": {
"success": true,
"sql": [
@ -1195,7 +1195,7 @@
"description": "Full person stem query on contact email address",
"action": "query",
"schema_id": "full.person",
"stem": "contacts/contact/email_address",
"stem": "contacts.#.target#(type==\"email_address\")",
"expect": {
"success": true,
"sql": [


@ -10,9 +10,13 @@
"type": "relation",
"constraint": "fk_contact_entity",
"source_type": "contact",
"source_columns": ["entity_id"],
"source_columns": [
"entity_id"
],
"destination_type": "person",
"destination_columns": ["id"],
"destination_columns": [
"id"
],
"prefix": null
},
{
@ -20,88 +24,132 @@
"type": "relation",
"constraint": "fk_relationship_target",
"source_type": "relationship",
"source_columns": ["target_id", "target_type"],
"source_columns": [
"target_id",
"target_type"
],
"destination_type": "entity",
"destination_columns": ["id", "type"],
"destination_columns": [
"id",
"type"
],
"prefix": "target"
}
],
"types": [
{
"name": "entity",
"hierarchy": ["entity"],
"schemas": [{
"$id": "entity",
"type": "object",
"properties": {}
}]
"hierarchy": [
"entity"
],
"schemas": [
{
"$id": "entity",
"type": "object",
"properties": {}
}
]
},
{
"name": "person",
"hierarchy": ["person", "entity"],
"schemas": [{
"$id": "person",
"$ref": "entity",
"properties": {}
}]
"hierarchy": [
"person",
"entity"
],
"schemas": [
{
"$id": "person",
"$ref": "entity",
"properties": {}
}
]
},
{
"name": "email_address",
"hierarchy": ["email_address", "entity"],
"schemas": [{
"$id": "email_address",
"$ref": "entity",
"properties": {}
}]
"hierarchy": [
"email_address",
"entity"
],
"schemas": [
{
"$id": "email_address",
"$ref": "entity",
"properties": {}
}
]
},
{
"name": "phone_number",
"hierarchy": ["phone_number", "entity"],
"schemas": [{
"$id": "phone_number",
"$ref": "entity",
"properties": {}
}]
"hierarchy": [
"phone_number",
"entity"
],
"schemas": [
{
"$id": "phone_number",
"$ref": "entity",
"properties": {}
}
]
},
{
"name": "relationship",
"relationship": true,
"hierarchy": ["relationship", "entity"],
"schemas": [{
"$id": "relationship",
"$ref": "entity",
"properties": {}
}]
"hierarchy": [
"relationship",
"entity"
],
"schemas": [
{
"$id": "relationship",
"$ref": "entity",
"properties": {}
}
]
},
{
"name": "contact",
"relationship": true,
"hierarchy": ["contact", "relationship", "entity"],
"schemas": [{
"$id": "contact",
"$ref": "relationship",
"properties": {
"target": {
"oneOf": [
{ "$ref": "phone_number" },
{ "$ref": "email_address" }
]
"hierarchy": [
"contact",
"relationship",
"entity"
],
"schemas": [
{
"$id": "contact",
"$ref": "relationship",
"properties": {
"target": {
"oneOf": [
{
"$ref": "phone_number"
},
{
"$ref": "email_address"
}
]
}
}
}
}]
]
},
{
"name": "save_person",
"schemas": [{
"$id": "save_person.response",
"$ref": "person",
"properties": {
"contacts": {
"type": "array",
"items": { "$ref": "contact" }
"schemas": [
{
"$id": "save_person.response",
"$ref": "person",
"properties": {
"contacts": {
"type": "array",
"items": {
"$ref": "contact"
}
}
}
}
}]
]
}
]
},
@ -116,15 +164,15 @@
"": {
"type": "person"
},
"contacts/contact": {
"contacts.#": {
"type": "contact",
"relation": "contacts_id"
},
"contacts/contact/email_address": {
"contacts.#.target#(type==\"email_address\")": {
"type": "email_address",
"relation": "target_id"
},
"contacts/contact/phone_number": {
"contacts.#.target#(type==\"phone_number\")": {
"type": "phone_number",
"relation": "target_id"
}
@ -133,11 +181,11 @@
"": {
"type": "contact"
},
"email_address": {
"target#(type==\"email_address\")": {
"type": "email_address",
"relation": "target_id"
},
"phone_number": {
"target#(type==\"phone_number\")": {
"type": "phone_number",
"relation": "target_id"
}
@ -152,7 +200,7 @@
"type": "email_address"
}
},
"phone_number": {
"phone_number": {
"": {
"type": "phone_number"
}


@ -265,12 +265,12 @@ impl Database {
String::from(""),
None,
None,
true,
false,
&mut inner_map,
Vec::new(),
&mut errors,
);
if !inner_map.is_empty() {
println!("SCHEMA: {} STEMS: {:?}", schema_id, inner_map.keys());
db_stems.insert(schema_id, inner_map);
}
}
@ -288,11 +288,12 @@ impl Database {
db: &Database,
root_schema_id: &str,
schema: &Schema,
mut current_path: String,
current_path: String,
parent_type: Option<String>,
property_name: Option<String>,
is_root: bool,
is_polymorphic: bool,
inner_map: &mut HashMap<String, Arc<Stem>>,
seen_entities: Vec<String>,
errors: &mut Vec<crate::drop::Error>,
) {
let mut is_entity = false;
@ -323,6 +324,12 @@ impl Database {
}
}
if is_entity {
if seen_entities.contains(&entity_type) {
return; // Break cyclical schemas!
}
}
let mut relation_col = None;
if is_entity {
if let (Some(pt), Some(prop)) = (&parent_type, &property_name) {
@ -344,46 +351,21 @@ impl Database {
}
}
let mut final_path = current_path.clone();
if is_polymorphic && !final_path.is_empty() && !final_path.ends_with(&entity_type) {
if final_path.ends_with(".#") {
final_path = format!("{}(type==\"{}\")", final_path, entity_type);
} else {
final_path = format!("{}#(type==\"{}\")", final_path, entity_type);
}
}
let stem = Stem {
r#type: entity_type.clone(),
relation: relation_col,
schema: Arc::new(schema.clone()),
};
let mut branch_path = if is_root {
String::new()
} else if current_path.is_empty() {
entity_type.clone()
} else {
format!("{}/{}", current_path, entity_type)
};
// DEDUPLICATION: If we just recursed into the EXACT same entity type definition,
// do not append again and do not re-register the stem.
let already_registered =
if current_path == entity_type || current_path.ends_with(&format!("/{}", entity_type)) {
branch_path = current_path.clone();
true
} else {
false
};
if !already_registered {
if inner_map.contains_key(&branch_path) {
errors.push(crate::drop::Error {
code: "STEM_COLLISION".to_string(),
message: format!("The stem path `{}` resolves to multiple Entity boundaries. This usually occurs during un-wrapped $family or oneOf polymorphic schemas where multiple Entities are directly assigned to the same property. To fix this, encapsulate the polymorphic branch.", branch_path),
details: crate::drop::ErrorDetails {
path: root_schema_id.to_string(),
},
});
}
inner_map.insert(branch_path.clone(), Arc::new(stem));
}
// Update current_path for structural children
current_path = branch_path;
inner_map.insert(final_path, Arc::new(stem));
}
let next_parent = if is_entity {
@ -392,34 +374,22 @@ impl Database {
parent_type.clone()
};
let pass_seen = if is_entity {
let mut ns = seen_entities.clone();
ns.push(entity_type.clone());
ns
} else {
seen_entities.clone()
};
// Properties branch
if let Some(props) = &schema.obj.properties {
for (k, v) in props {
// Bypass target and source properties if we are in a relationship
if let Some(parent_str) = &next_parent {
if let Some(pt) = db.types.get(parent_str) {
if pt.relationship && (k == "target" || k == "source") {
Self::discover_stems(
db,
root_schema_id,
v,
current_path.clone(),
next_parent.clone(),
Some(k.clone()),
false,
inner_map,
errors,
);
continue;
}
}
}
// Standard Property Pathing
let next_path = if current_path.is_empty() {
k.clone()
} else {
format!("{}/{}", current_path, k)
format!("{}.{}", current_path, k)
};
Self::discover_stems(
@ -431,6 +401,7 @@ impl Database {
Some(k.clone()),
false,
inner_map,
pass_seen.clone(),
errors,
);
}
@ -438,15 +409,22 @@ impl Database {
// Array Item branch
if let Some(items) = &schema.obj.items {
let next_path = if current_path.is_empty() {
String::from("#")
} else {
format!("{}.#", current_path)
};
Self::discover_stems(
db,
root_schema_id,
items,
current_path.clone(),
next_path,
next_parent.clone(),
property_name.clone(),
false, // Arrays themselves aren't polymorphic branches, their items might be
false,
inner_map,
pass_seen.clone(),
errors,
);
}
@ -463,8 +441,9 @@ impl Database {
current_path.clone(),
next_parent.clone(),
property_name.clone(),
false,
is_polymorphic,
inner_map,
seen_entities.clone(),
errors,
);
}
@ -481,8 +460,9 @@ impl Database {
current_path.clone(),
next_parent.clone(),
property_name.clone(),
false,
true,
inner_map,
pass_seen.clone(),
errors,
);
}
@ -496,8 +476,9 @@ impl Database {
current_path.clone(),
next_parent.clone(),
property_name.clone(),
false,
is_polymorphic,
inner_map,
pass_seen.clone(),
errors,
);
}


@ -112,7 +112,7 @@ pub fn jspg_validate(schema_id: &str, instance: JsonB) -> JsonB {
#[cfg_attr(not(test), pg_extern)]
pub fn jspg_stems() -> JsonB {
use serde_json::{Map, Value};
use serde_json::Value;
let engine_opt = {
let lock = GLOBAL_JSPG.read().unwrap();
@ -121,9 +121,24 @@ pub fn jspg_stems() -> JsonB {
match engine_opt {
Some(engine) => {
JsonB(serde_json::to_value(&engine.database.stems).unwrap_or(Value::Object(Map::new())))
let mut result_arr = Vec::new();
for (schema_name, stems_map) in &engine.database.stems {
let mut stems_arr = Vec::new();
for (path_key, stem_arc) in stems_map {
stems_arr.push(serde_json::json!({
"path": path_key,
"type": stem_arc.r#type,
"relation": stem_arc.relation
}));
}
result_arr.push(serde_json::json!({
"schema": schema_name,
"stems": stems_arr
}));
}
JsonB(serde_json::to_value(result_arr).unwrap_or(Value::Array(Vec::new())))
}
None => JsonB(Value::Object(Map::new())),
None => JsonB(Value::Array(Vec::new())),
}
}
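
The reshaped `jspg_stems()` result (an array of `{schema, stems: [{path, type, relation}]}` records instead of a nested object) can be sketched without `serde_json`. The types `StemInfo`, `StemRow`, and `SchemaStems` and the function `flatten_stems` below are hypothetical stand-ins, not part of the codebase. Unlike the code in the hunk above, this sketch sorts its output, because `HashMap` iteration order is unspecified and an unsorted array may come back in a different order from run to run.

```rust
use std::collections::HashMap;

/// Stand-in (hypothetical) for the `Stem` fields that get serialized.
#[derive(Debug, Clone, PartialEq)]
struct StemInfo {
    r#type: String,
    relation: Option<String>,
}

/// One entry of the per-schema `stems` array.
#[derive(Debug, PartialEq)]
struct StemRow {
    path: String,
    r#type: String,
    relation: Option<String>,
}

/// One entry of the top-level result array.
#[derive(Debug, PartialEq)]
struct SchemaStems {
    schema: String,
    stems: Vec<StemRow>,
}

/// Flatten `Schema ID -> { Stem Path -> Stem }` into a list of records.
/// Sorting by schema and path is an addition made here for deterministic output.
fn flatten_stems(stems: &HashMap<String, HashMap<String, StemInfo>>) -> Vec<SchemaStems> {
    let mut result: Vec<SchemaStems> = stems
        .iter()
        .map(|(schema, inner)| {
            let mut rows: Vec<StemRow> = inner
                .iter()
                .map(|(path, stem)| StemRow {
                    path: path.clone(),
                    r#type: stem.r#type.clone(),
                    relation: stem.relation.clone(),
                })
                .collect();
            rows.sort_by(|a, b| a.path.cmp(&b.path));
            SchemaStems {
                schema: schema.clone(),
                stems: rows,
            }
        })
        .collect();
    result.sort_by(|a, b| a.schema.cmp(&b.schema));
    result
}
```

Consumers that previously indexed the result by schema ID and stem path would instead iterate the array and match on the `schema` and `path` fields.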

2391
test_output.txt Normal file

File diff suppressed because it is too large