Query Internals
Understand how ObjectQuel queries are parsed, validated, transformed, optimized, and finally compiled to SQL — a complete walkthrough of the internal pipeline from query string to executed result.
The Query Pipeline
Every ObjectQuel query passes through six stages, always in the same order, before results are returned. Parsing builds an Abstract Syntax Tree (AST), stages 2 through 4 validate and rewrite it, and stages 5 and 6 turn it into an execution plan and SQL:
- Parsing: The query string is lexed and parsed into an AstRetrieve AST.
- Semantic Analysis: The AST is validated for structural correctness and schema compliance.
- Transformation: Macros are expanded, namespaces are resolved, and via relationships are rewritten as direct property lookups.
- Optimization: Dead joins are eliminated, subqueries are converted to joins where possible, and aggregates are restructured.
- Decomposition: The single AST is split into per-source execution stages (database, JSON) and arranged into an ExecutionPlan.
- SQL Generation: Each database stage is compiled to a SQL string by QuelToSQL and executed.
// Entry point in QueryExecutor::executeQuery()
$ast = $this->getObjectQuel()->parse(trim($query)); // Stages 1-4: parse() also validates, transforms, and optimizes
$decomposer = new QueryDecomposer();
$executionPlan = $decomposer->buildExecutionPlan($ast, $params); // Stage 5
$result = $this->planExecutor->execute($executionPlan); // Stage 6
return new QuelResult($this->entityManager, $ast, $result);
Stage 1: Parsing
The parser converts the raw query string into an AstRetrieve object — the root node of the AST. The AST captures every part of the query as a tree of typed nodes:
| AST Node | Represents |
|---|---|
| AstRetrieve | The complete query (root) |
| AstRangeDatabase | A range declaration of the form range of x is EntityClass |
| AstRangeJsonSource | A range backed by a JSON data source |
| AstIdentifier | A property reference such as u.email |
| AstExpression | A comparison: u.age > 18 |
| AstBinaryOperator | Logical connectives: AND, OR |
| AstSearch / AstSearchScore | Full-text search() and search_score() calls |
| AstCount, AstSum, AstAvg, … | Aggregate functions |
| AstAlias | An aliased projection value (expr as name) |
Nested queries — ranges declared with an inner retrieve — are parsed recursively. The inner AstRetrieve is stored on the parent AstRangeDatabase node and each pipeline stage processes it depth-first before handling the outer query.
Stage 2: Semantic Analysis
SemanticAnalyzer validates the AST against both structural rules and the entity schema. Validation runs in a fixed order so that cheaper structural checks fail fast before the more expensive schema lookups are attempted:
Structural Checks
- No regular expressions in the value list (they are WHERE-only constructs).
- No duplicate range names within one query — each alias must be unique, matching the SQL requirement that every table reference in a query have a distinct alias.
- At least one range must exist without a via clause to serve as the SQL FROM table. A query where every range is a JOIN is invalid.
- Join properties may only reference other ranges declared in the same query; a join cannot reach an entity that is not part of the current query.
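Two of the structural rules above (unique range names, and at least one range without a via clause) can be sketched in a few lines. This is only an illustration under stated assumptions: the array shape and the function name checkRangeStructure() are hypothetical and are not part of SemanticAnalyzer.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of two structural checks. Each range is modelled
 * as a plain array; the real analyzer works on AST range nodes.
 *
 * @param array<int, array{name: string, via: ?string}> $ranges
 * @return string[] human-readable problems; empty when the ranges are valid
 */
function checkRangeStructure(array $ranges): array
{
    $errors = [];
    $seen = [];
    $hasFromCandidate = false;

    foreach ($ranges as $range) {
        // Rule: every range name (alias) must be unique within the query.
        if (isset($seen[$range['name']])) {
            $errors[] = "Duplicate range name '{$range['name']}'";
        }
        $seen[$range['name']] = true;

        // Rule: a range without a via clause can serve as the FROM table.
        if ($range['via'] === null) {
            $hasFromCandidate = true;
        }
    }

    if (!$hasFromCandidate) {
        $errors[] = 'At least one range must be declared without a via clause';
    }

    return $errors;
}

$problems = checkRangeStructure([
    ['name' => 'u', 'via' => null],
    ['name' => 'o', 'via' => 'u.orders'],
]);
```

Collecting problems in a list is a choice made for the sketch; the analyzer described here reports failures by throwing a QuelException.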
Schema Checks
- EntityReferenceValidator confirms every entity class named in a range declaration exists in the entity store.
- ViaClauseValidator walks each via relationship chain and verifies that every intermediate entity and property exists.
- EntityPropertyValidator confirms every field reference (e.g. u.email) corresponds to a mapped property on the declared entity.
- NoExpressionsAllowedOnEntitiesValidator ensures arithmetic and other expressions are not applied directly to whole-entity identifiers.
SQL Compliance Checks
Aggregate functions (COUNT, SUM, AVG, MIN, MAX, and their DISTINCT variants) are prohibited in WHERE clauses. The analyzer traverses the condition tree with NodeTypeValidator and throws a QuelException if any aggregate node is found there.
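The traversal described above amounts to a recursive search of the condition tree. The following is a minimal sketch, assuming a simplified array-based tree; the node type names, SKETCH_AGGREGATES, and both functions are hypothetical, the DISTINCT variants are left out, and a standard InvalidArgumentException stands in for QuelException.

```php
<?php
declare(strict_types=1);

// Hypothetical node type names for the aggregate functions.
const SKETCH_AGGREGATES = ['count', 'sum', 'avg', 'min', 'max'];

/**
 * Returns true when $node or any of its descendants is an aggregate.
 * Nodes are arrays of the form ['type' => string, 'children' => node[]].
 */
function containsAggregate(array $node): bool
{
    if (in_array($node['type'], SKETCH_AGGREGATES, true)) {
        return true;
    }

    foreach ($node['children'] ?? [] as $child) {
        if (containsAggregate($child)) {
            return true;
        }
    }

    return false;
}

/** Rejects a WHERE tree that contains an aggregate anywhere inside it. */
function assertNoAggregatesInWhere(array $conditions): void
{
    if (containsAggregate($conditions)) {
        throw new InvalidArgumentException('Aggregate functions are not allowed in WHERE');
    }
}
```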
Subqueries
When validating a subquery (a nested retrieve), an additional rule applies: the value list may not reference entire entities. A temporary table must have explicitly named columns — retrieve(u.id, u.name) is valid, but retrieve(u) is not.
Stage 3: Transformation
QueryTransformer applies six sequential visitor passes to rewrite the AST into a form that QuelToSQL can compile directly. Each pass uses the visitor pattern: the transformer creates a visitor object and calls $ast->accept($visitor), which causes the AST to traverse itself and invoke the appropriate visitor method on each node.
| Pass | Visitor | What it does |
|---|---|---|
| 1 | MacroSubstitutor | Finds macro references in the AST and inserts placeholder nodes for later expansion. |
| 2 | RangeDatabaseEntityNormalizer | Adds fully-qualified namespaces to range entity names using the entity store. |
| 3 | EntityProcessRange | Converts range declarations into table references with aliases and join conditions. |
| 4 | MacroExpander | Replaces the placeholder nodes from pass 1 with the full macro body. |
| 5 | EntityNameNormalizer | Resolves all remaining entity name references to their fully-qualified forms. |
| 6 | TransformRelationInViaToPropertyLookup | Converts via relationship chains into direct field-to-field mappings that SQL can understand. |
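The accept/visit mechanics that every pass relies on can be illustrated with a small stand-alone sketch. All names here (SketchVisitor, SketchNode, NamespaceQualifier) are hypothetical; the pass is only loosely modelled on the namespace normalization step and does not reproduce the real visitor interfaces.

```php
<?php
declare(strict_types=1);

interface SketchVisitor
{
    public function visitNode(SketchNode $node): void;
}

class SketchNode
{
    /** @param SketchNode[] $children */
    public function __construct(public string $label, public array $children = []) {}

    // The tree drives its own traversal: visit this node, then recurse.
    public function accept(SketchVisitor $visitor): void
    {
        $visitor->visitNode($this);
        foreach ($this->children as $child) {
            $child->accept($visitor);
        }
    }
}

/** Prefixes labels that carry no namespace yet. */
final class NamespaceQualifier implements SketchVisitor
{
    public function __construct(private string $namespace) {}

    public function visitNode(SketchNode $node): void
    {
        if (!str_contains($node->label, '\\')) {
            $node->label = $this->namespace . '\\' . $node->label;
        }
    }
}

$tree = new SketchNode('UserEntity', [new SketchNode('OrderEntity')]);
$tree->accept(new NamespaceQualifier('App\\Entity'));
```

Because each pass is a separate visitor object, passes can be added or reordered without touching the node classes, which is presumably why the transformer is organized this way.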
The via transformation (pass 6) is applied per-range rather than tree-wide. For each range that has a join property, a dedicated TransformRelationInViaToPropertyLookup converter rewrites the join property expression and then traverses the rest of the range for any remaining via references:
// Inside QueryTransformer::transformViaRelations()
foreach ($ast->getRanges() as $range) {
    $joinProperty = $range->getJoinProperty();
    if ($joinProperty === null) { continue; }
    $converter = new TransformRelationInViaToPropertyLookup($this->entityStore, $range);
    $range->setJoinProperty($converter->processNodeSide($joinProperty)); // rewrite the join itself
    $range->accept($converter); // rewrite anything else in the range
}
Stage 4: Optimization
QueryOptimizer is a facade that delegates to six specialized sub-optimizers. Optimization runs depth-first: all nested queries are fully optimized before the outer query is touched.
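Depth-first here means a post-order walk: inner queries are handled before the query that contains them. A minimal sketch, assuming a hypothetical SketchQuery class in place of AstRetrieve and recording the visiting order instead of rewriting anything:

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a query node; not the real AstRetrieve API.
final class SketchQuery
{
    /** @param SketchQuery[] $nested */
    public function __construct(public string $name, public array $nested = []) {}
}

/**
 * Processes nested queries before their parent and returns the
 * names in the order they were processed.
 *
 * @param string[] $order accumulator, filled by reference
 * @return string[]
 */
function optimizeDepthFirst(SketchQuery $query, array &$order = []): array
{
    foreach ($query->nested as $inner) {
        optimizeDepthFirst($inner, $order);
    }

    // A real optimizer would run its sub-optimizers on $query here.
    $order[] = $query->name;

    return $order;
}

$outer = new SketchQuery('outer', [
    new SketchQuery('a', [new SketchQuery('a1')]),
    new SketchQuery('b'),
]);
$order = optimizeDepthFirst($outer);
```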
Optimization Phases
The optimizer applies its sub-optimizers in a specific order. Later phases can create opportunities for earlier ones, which is why the join optimizer and unused-range removal run a second time at the end:
| Phase | Optimizer | What it does |
|---|---|---|
| 1 | RangeOptimizer | Applies early filtering to reduce dataset size before join processing. |
| 2 | RangeOptimizer (cleanup) | Removes left join ranges and temporary ranges whose results are never referenced in projections or conditions. |
| 3 | JoinOptimizer | Restructures join conditions and eliminates redundant joins. |
| 4a | ExistsOptimizer | Converts EXISTS subqueries into JOINs where the rewrite is semantically safe. |
| 4b | AnyOptimizer | Rewrites ANY subquery patterns into more efficient equivalents. |
| 4c | AggregateOptimizer | Restructures aggregate expressions, taking platform capabilities into account (e.g. whether the engine supports certain window function forms). |
| 5 | JoinOptimizer + RangeOptimizer + JoinConditionFieldInjector | Second pass to catch newly created opportunities; injects required fields into join conditions as a final cleanup step. |
The AggregateOptimizer receives a PlatformCapabilitiesInterface instance so it can make engine-aware decisions. When no platform is specified, a NullPlatformCapabilities no-op is used and platform-specific rewrites are skipped.
Stage 5: Query Decomposition
QueryDecomposer splits the single optimized AST into a set of ExecutionStage objects collected in an ExecutionPlan. This is necessary because a query can join database tables with JSON data sources, and each source type requires a different executor.
How Decomposition Works
The decomposer first identifies all ranges by type. Database ranges (tables and subqueries) are grouped into one ExecutionStage. Each JSON range becomes its own stage, because JSON sources are evaluated in memory and joined to database results after the fact:
// Inside QueryDecomposer::buildExecutionPlan()
$databaseStage = $this->createDatabaseExecutionStage($query, $staticParams);
if ($databaseStage) {
    $plan->addStage($databaseStage);
}
// Each JSON/other source gets its own stage
foreach ($query->getOtherRanges() as $otherRange) {
    $plan->addStage($this->createRangeExecutionStage($query, $otherRange, $staticParams));
}
Condition Routing
Each stage only receives the conditions that apply to it. The decomposer walks the entire WHERE condition tree and routes each expression to the correct stage by inspecting which ranges each side of the expression references:
- If both sides reference database ranges → kept in the database stage as a join or filter condition.
- If one side references a database range and the other is a literal → kept in the database stage as a filter.
- If either side references a JSON range → excluded from the database stage and placed in the JSON stage's filter conditions.
- If a side references no range at all (a pure literal) → safe to push to the database.
AND and OR nodes are handled recursively. When one branch of an AND cannot be executed by the database but the other can, only the valid branch is passed to the database stage — the structure of the boolean tree is preserved as much as possible, and only unsupported leaves are dropped.
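The routing rules can be sketched as a recursive pruning function. This is an illustration under stated assumptions, not the decomposer's code: the node shapes and sketchDatabasePart() are hypothetical, and the OR handling is a conservative choice made for the sketch, since the text above only spells out the AND case.

```php
<?php
declare(strict_types=1);

/**
 * Returns the sub-tree of $node that the database stage can evaluate,
 * or null when nothing in it can be pushed down.
 *
 * Hypothetical node shapes used by this sketch:
 *   ['type' => 'and'|'or', 'left' => node, 'right' => node]
 *   ['type' => 'compare', 'sql' => string, 'ranges' => string[]]
 *
 * @param string[] $jsonRanges names of ranges backed by JSON sources
 */
function sketchDatabasePart(array $node, array $jsonRanges): ?array
{
    if ($node['type'] === 'and' || $node['type'] === 'or') {
        $left  = sketchDatabasePart($node['left'], $jsonRanges);
        $right = sketchDatabasePart($node['right'], $jsonRanges);

        if ($left !== null && $right !== null) {
            return ['type' => $node['type'], 'left' => $left, 'right' => $right];
        }

        // AND: keep whichever branch survived; the dropped branch is
        // left for the JSON stage.
        // OR (assumption of this sketch): dropping one branch would
        // narrow the result set, so the whole OR is withheld instead.
        return $node['type'] === 'and' ? ($left ?? $right) : null;
    }

    // Leaf: pushable when it references no JSON range (pure literals included).
    return array_intersect($node['ranges'], $jsonRanges) === [] ? $node : null;
}

$ageFilter = ['type' => 'compare', 'sql' => 'u.age > 18', 'ranges' => ['u']];
$crossJoin = ['type' => 'compare', 'sql' => 'j.userId = u.id', 'ranges' => ['j', 'u']];
$pushed = sketchDatabasePart(
    ['type' => 'and', 'left' => $ageFilter, 'right' => $crossJoin],
    ['j']
);
```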
Condition Classification
For JSON stages, conditions are further subdivided into filter conditions and join conditions. A filter condition involves only the current range on one side and a literal on the other (x.value > 100). A join condition involves the current range on one side and a different range on the other (x.id = y.userId). This distinction allows the in-memory executor to apply filters first and then perform the cross-source join:
// Filter: x.value > 100 (only one side references a range)
$leftInvolvesRange = $this->doesConditionInvolveRangeCached($expr->getLeft(), $range);
$rightInvolvesRange = $this->doesConditionInvolveRangeCached($expr->getRight(), $range);
$isFilter = ($leftInvolvesRange && !$this->containsAnyRangeReference($expr->getRight())) ||
            ($rightInvolvesRange && !$this->containsAnyRangeReference($expr->getLeft()));
// Join: x.id = y.userId (both sides reference a range, but different ones)
$isJoin = ($leftInvolvesRange && $this->containsAnyRangeReference($expr->getRight()) && !$rightInvolvesRange) ||
          ($rightInvolvesRange && $this->containsAnyRangeReference($expr->getLeft()) && !$leftInvolvesRange);
Temporary Range Ordering
When a query contains temporary ranges (ranges declared with an inner retrieve), they may depend on each other. The decomposer performs a topological sort of all temporary ranges before building the execution plan, so that a range whose inner query references another temporary range's results is always executed after that dependency. A circular dependency between temporary ranges throws a QuelException.
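A dependency-first ordering with cycle detection can be sketched with a depth-first topological sort. The function name, the dependency map, and the use of RuntimeException (in place of QuelException) are assumptions of this sketch; how the decomposer actually derives the dependencies is not shown here.

```php
<?php
declare(strict_types=1);

/**
 * Orders temporary ranges so that dependencies come first.
 *
 * @param array<string, string[]> $dependsOn range name => names it depends on
 * @return string[] execution order
 */
function sortTemporaryRanges(array $dependsOn): array
{
    $state = []; // name => 'visiting' | 'done'
    $order = [];

    $visit = function (string $name) use (&$visit, &$state, &$order, $dependsOn): void {
        if (($state[$name] ?? null) === 'done') {
            return;
        }

        // Reaching a node that is still being visited means a cycle.
        if (($state[$name] ?? null) === 'visiting') {
            throw new RuntimeException("Circular dependency involving range '{$name}'");
        }

        $state[$name] = 'visiting';
        foreach ($dependsOn[$name] ?? [] as $dependency) {
            $visit($dependency);
        }
        $state[$name] = 'done';
        $order[] = $name;
    };

    foreach (array_keys($dependsOn) as $name) {
        $visit((string) $name);
    }

    return $order;
}

$order = sortTemporaryRanges([
    'report' => ['totals'],
    'totals' => [],
]);
```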
Caching in the Decomposer
The decomposer calls doesConditionInvolveRange() — which recursively walks the condition tree — many times for the same condition/range pair during a single decomposition. Results are cached by a key derived from the spl_object_hash() of both objects. The cache is cleared at the start of each buildExecutionPlan() call to prevent stale results from a previous query leaking into the next.
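The caching scheme can be sketched as a small memoization helper. InvolvementCache and its methods are hypothetical names; only the idea of keying on spl_object_hash() for both objects and clearing between queries comes from the description above. Clearing matters for a second reason as well: PHP may reuse an object hash once the original object has been destroyed.

```php
<?php
declare(strict_types=1);

final class InvolvementCache
{
    /** @var array<string, bool> */
    private array $cache = [];

    /** Number of times the expensive computation actually ran. */
    public int $misses = 0;

    public function clear(): void
    {
        $this->cache = [];
    }

    /** @param callable(object, object): bool $compute the expensive tree walk */
    public function get(object $condition, object $range, callable $compute): bool
    {
        // One key per condition/range pair.
        $key = spl_object_hash($condition) . ':' . spl_object_hash($range);

        if (!array_key_exists($key, $this->cache)) {
            $this->misses++;
            $this->cache[$key] = $compute($condition, $range);
        }

        return $this->cache[$key];
    }
}
```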
Stage 6: SQL Generation
QuelToSQL converts a single AstRetrieve into a SQL string. It builds each clause independently and joins non-empty parts with a single space, ensuring no spurious whitespace appears when optional clauses are absent. Clause order follows the SQL standard: SELECT … FROM … JOIN … WHERE … GROUP BY … ORDER BY.
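The "join non-empty parts" step can be sketched with array_filter() and implode(). The function name assembleSql() and the column alias format in the example are assumptions made for illustration, not taken from QuelToSQL.

```php
<?php
declare(strict_types=1);

/**
 * Joins clauses with a single space, skipping absent ones.
 *
 * @param string[] $clauses in SQL order; an empty string means "clause absent"
 */
function assembleSql(array $clauses): string
{
    $nonEmpty = array_filter($clauses, static fn (string $clause): bool => $clause !== '');

    return implode(' ', $nonEmpty);
}

$sql = assembleSql([
    'SELECT `u`.`id` as `u.id`',
    'FROM `users` as `u`',
    '',                       // no JOIN
    'WHERE `u`.`age` > 18',
    '',                       // no GROUP BY
    'ORDER BY `u`.`id` ASC',
]);
```

Filtering before joining is what keeps double spaces out of the output when optional clauses are missing.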
SELECT Clause
Each entry in the value list is visited by QuelToSQLConvertToString in VALUES mode. If the entry is an entity reference without a property access (e.g. retrieve(u)), the entity is expanded to all its mapped columns. Otherwise the expression is emitted as-is with an AS alias. A duplicate-detection step prevents the same column appearing twice — necessary because whole-entity expansion and individual property references can both produce the same column:
// Whole-entity reference: emitted as all columns, no alias
// Property reference: emitted with AS alias
if (!$this->identifierIsEntity($value->getExpression())) {
    $sqlResult .= " as `{$value->getName()}`";
}
// Guard against duplicates from overlapping expansions
if (!$this->isDuplicateField($result, $sqlResult)) {
    $result[] = $sqlResult;
}
FROM Clause
Ranges without a join property become FROM entries. Ranges that carry an inner query (temporary ranges / subqueries) are emitted as derived tables — the inner AstRetrieve is compiled recursively by a nested convertToSQL() call and wrapped in parentheses:
if ($range->getQuery() !== null) {
    $subSQL = $this->convertToSQL($range->getQuery()); // recursive
    $tableNames[] = "({$subSQL}) as `{$rangeName}`";
} else {
    $owningTable = $this->resolveOwningTable($range);
    $tableNames[] = "`{$owningTable}` as `{$rangeName}`";
}
JOIN Clause
Ranges that have a join property and are flagged for inclusion as a join produce a JOIN entry. The join type is determined by the isRequired() flag on the range: required ranges become INNER JOIN, optional ranges become LEFT JOIN. The join condition is compiled by visiting the join property AST node with QuelToSQLConvertToString in CONDITION mode. Subquery ranges in a join position are handled the same way as in the FROM clause — compiled recursively and wrapped in parentheses.
Both getFrom() and getJoins() resolve physical table names through the shared resolveOwningTable() helper, which checks for an explicit table name on the range first and falls back to the entity store. This prevents the two methods from drifting out of sync when a range carries a derived table name.
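The join-type decision and the table-name fallback can be sketched together. Everything here is hypothetical: ranges are plain arrays, a simple entity-to-table map stands in for the entity store, and the ON condition is passed in as a ready-made string rather than compiled from an AST node.

```php
<?php
declare(strict_types=1);

/**
 * Prefers an explicit table name on the range, else asks the
 * (stand-in) entity store.
 *
 * @param array{name: string, table: ?string, entity: string, required: bool} $range
 * @param array<string, string> $entityTables entity class => table name
 */
function sketchResolveOwningTable(array $range, array $entityTables): string
{
    return $range['table'] ?? $entityTables[$range['entity']];
}

/**
 * Builds one JOIN entry: required ranges use INNER JOIN, optional ones LEFT JOIN.
 *
 * @param array{name: string, table: ?string, entity: string, required: bool} $range
 * @param array<string, string> $entityTables
 */
function sketchJoin(array $range, array $entityTables, string $condition): string
{
    $type  = $range['required'] ? 'INNER JOIN' : 'LEFT JOIN';
    $table = sketchResolveOwningTable($range, $entityTables);

    return "{$type} `{$table}` as `{$range['name']}` ON {$condition}";
}

$join = sketchJoin(
    ['name' => 'o', 'table' => null, 'entity' => 'OrderEntity', 'required' => false],
    ['OrderEntity' => 'orders'],
    '`o`.`user_id` = `u`.`id`'
);
```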
WHERE Clause
The condition tree is visited by QuelToSQLConvertToString in CONDITION mode. The visitor emits standard SQL operators for comparisons, logical connectives, IS NULL, LIKE, IN(), and full-text MATCH … AGAINST expressions for AstSearch nodes.
ORDER BY Clause
The sort clause has two special-case paths in addition to the default:
- Default: Sort expressions are visited in SORT mode and emitted as ORDER BY expr ASC|DESC.
- @InValuesAreFinal directive: When the compiler directive InValuesAreFinal is set, the optimizer has rewritten a query to use an IN() list with a specific ordering that must be preserved. The sort clause is emitted as ORDER BY FIELD(column, val1, val2, …) to maintain that ordering in the database engine.
- Application-logic sort: When getSortInApplicationLogic() is true, no ORDER BY is emitted; sorting is handled after the result set is fetched.
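The three paths can be sketched as a single function. The signature, the precedence between the paths, and the restriction to integer IN() values are all assumptions of this sketch; the real generator reads these settings from the AST and the compiler directives.

```php
<?php
declare(strict_types=1);

/**
 * @param array<int, array{expr: string, direction: string}> $sort
 * @param int[] $inValues final IN() values, in the order to preserve
 */
function sketchOrderBy(
    array $sort,
    bool $sortInApplication = false,
    ?string $inColumn = null,
    array $inValues = []
): string {
    // Application-logic sort: emit nothing, sorting happens after the fetch.
    if ($sortInApplication) {
        return '';
    }

    // Directive path: preserve the order of the IN() list with FIELD().
    if ($inColumn !== null && $inValues !== []) {
        $values = implode(', ', array_map('intval', $inValues));

        return "ORDER BY FIELD({$inColumn}, {$values})";
    }

    if ($sort === []) {
        return '';
    }

    // Default path: expr ASC|DESC, comma separated.
    $parts = array_map(
        static fn (array $item): string =>
            $item['expr'] . ' ' . (strtoupper($item['direction']) === 'DESC' ? 'DESC' : 'ASC'),
        $sort
    );

    return 'ORDER BY ' . implode(', ', $parts);
}
```

FIELD() is a MySQL and MariaDB function, so this path only makes sense on engines that provide it.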
GROUP BY Clause
Group-by expressions are visited individually in CONDITION mode and joined with commas. The GROUP BY clause is omitted entirely when no grouping is defined.
Execution and Result Assembly
Once SQL is generated, DatabaseQueryExecutor binds parameters and executes the query against the database. JsonQueryExecutor handles JSON-sourced stages entirely in memory, applying the filter and join conditions extracted during decomposition.
PlanExecutor runs all stages in the ExecutionPlan in order, collecting rows from each stage. The final result is wrapped in a QuelResult, which uses the original AstRetrieve to hydrate raw database rows into entity objects, applying the projection aliases from the value list to map column names back to entity properties.
Executed SQL strings are accumulated by DatabaseQueryExecutor and can be retrieved after the fact via QueryExecutor::getLastExecutedSql(), which is useful for debugging and logging. The list is reset at the start of each executeQuery() call.
Platform Capabilities
Some optimizations and SQL generation decisions depend on which database engine is in use. ObjectQuel models this through a PlatformCapabilitiesInterface that is passed into both QueryOptimizer and QuelToSQL. When no platform is specified, a NullPlatformCapabilities no-op implementation is used, which causes all platform-specific code paths to fall through to safe, universally-compatible SQL.
The main consumer of platform capabilities is AggregateOptimizer, which uses the interface to decide whether to apply engine-specific aggregate rewrites. QuelToSQLConvertToString also consults it when generating full-text search expressions, since MATCH … AGAINST syntax differs slightly between MySQL and MariaDB.
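The no-op fallback is an instance of the null object pattern, sketched below. The methods of PlatformCapabilitiesInterface are not listed in this document, so supportsWindowFunctions() and every class name in the sketch are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical capability interface with a single illustrative method.
interface SketchCapabilities
{
    public function supportsWindowFunctions(): bool;
}

/** Null object: reports no special capabilities at all. */
final class SketchNullCapabilities implements SketchCapabilities
{
    public function supportsWindowFunctions(): bool
    {
        return false;
    }
}

final class SketchAggregateOptimizer
{
    private SketchCapabilities $platform;

    public function __construct(?SketchCapabilities $platform = null)
    {
        // Without a platform, fall back to the no-op implementation so
        // callers never need a null check.
        $this->platform = $platform ?? new SketchNullCapabilities();
    }

    public function strategy(): string
    {
        return $this->platform->supportsWindowFunctions() ? 'window-function' : 'portable';
    }
}

$default = (new SketchAggregateOptimizer())->strategy();
```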