Query Internals
Understand how ObjectQuel queries are parsed, validated, transformed, optimized, and finally compiled to SQL — a complete walkthrough of the internal pipeline from query string to executed result.
The Query Pipeline
Every ObjectQuel query passes through six distinct stages before results are returned. Each stage operates on an Abstract Syntax Tree (AST) and hands it to the next, with no stage skipped and no stage revisited:
1. Parsing: The query string is lexed and parsed into an AstRetrieve AST.
2. Semantic Analysis: The AST is validated for structural correctness and schema compliance.
3. Transformation: Macros are expanded, namespaces are resolved, and via relationships are rewritten as direct property lookups.
4. Optimization: Dead joins are eliminated, subqueries are converted to joins where possible, and aggregates are restructured.
5. Planning: ExecutionPlanBuilder decomposes the AST into an ExecutionPlan containing ExecutionStage and TempTableStage objects, one per data source, with dependency edges wired between them.
6. SQL Generation: Each database stage is compiled to a SQL string by QuelToSQL and executed.
// Entry point in QueryExecutor::executeQuery()
$ast = $this->getObjectQuel()->parse(trim($query)); // Stage 1 (+ stages 2-4 internally)
$builder = new ExecutionPlanBuilder();
$executionPlan = $builder->build($ast, $params); // Stage 5
$result = $this->planExecutor->execute($executionPlan); // Stage 6
return new QuelResult($this->entityManager, $ast, $result);
Stage 1: Parsing
The parser converts the raw query string into an AstRetrieve object — the root node of the AST. The AST captures every part of the query as a tree of typed nodes. The most commonly encountered node types are:
| AST Node | Represents |
|---|---|
| AstRetrieve | The complete query (root) |
| AstRangeDatabase | A range declaration (range of x is EntityClass) backed by a database table |
| AstRangeDatabaseSubquery | A range backed by an inner retrieve; the embedded AstRetrieve is stored on this node |
| AstRangeJsonSource | A range backed by a JSON data source |
| AstIdentifier | A property reference such as u.email |
| AstExpression | A comparison: u.age > 18 |
| AstBinaryOperator | Logical connectives: AND, OR |
| AstSearch / AstSearchScore | Full-text search() and search_score() calls |
| AstCount, AstSum, AstAvg, … | Aggregate functions |
| AstAlias | An aliased projection value (expr as name) |
Nested queries — ranges declared with an inner retrieve — are parsed recursively. The parser emits an AstRangeDatabaseSubquery node for each such range, with the inner AstRetrieve stored on it. Each pipeline stage processes nested queries depth-first before handling the outer query.
Stage 2: Semantic Analysis
SemanticAnalyzer validates the AST against both structural rules and the entity schema. Validation runs in a fixed order so that cheaper structural checks fail fast before the more expensive schema lookups are attempted:
Structural Checks
- No regular expressions in the value list (they are WHERE-only constructs).
- No duplicate range names within one query. Each alias must be unique, matching the SQL requirement that every table reference in a query have a distinct alias.
- At least one range must exist without a via clause to serve as the SQL FROM table. A query where every range is a JOIN is invalid.
- Join properties may only reference other ranges declared in the same query; a join cannot reach an entity that is not part of the current query.
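Two of the structural rules above (unique range names, and at least one range without a via clause) can be sketched as follows. This is a minimal illustration, not the SemanticAnalyzer implementation: RangeSketch and checkRanges() are hypothetical names, and a real range node carries much more than a name and an optional via expression.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, simplified range record (not a real ObjectQuel class).
 * $via is null for a range that can act as the FROM table.
 */
final class RangeSketch
{
    public function __construct(
        public string $name,
        public ?string $via = null,
    ) {}
}

/**
 * Returns a list of problems; an empty list means both checks passed.
 *
 * @param RangeSketch[] $ranges
 * @return string[]
 */
function checkRanges(array $ranges): array
{
    $problems = [];
    $seen = [];
    $hasFromCandidate = false;

    foreach ($ranges as $range) {
        // Rule: every range alias must be unique within one query.
        if (isset($seen[$range->name])) {
            $problems[] = "Duplicate range name '{$range->name}'";
        }
        $seen[$range->name] = true;

        // Rule: at least one range must have no via clause.
        if ($range->via === null) {
            $hasFromCandidate = true;
        }
    }

    if (!$hasFromCandidate) {
        $problems[] = 'Every range has a via clause, so nothing can serve as the FROM table';
    }

    return $problems;
}
```

Collecting problems into a list is a choice made for this sketch; the text only says that validation fails fast, so the real analyzer presumably stops at the first violation.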
Schema Checks
- EntityReferenceValidator confirms every entity class named in a range declaration exists in the entity store.
- ViaClauseValidator walks each via relationship chain and verifies that every intermediate entity and property exists.
- EntityPropertyValidator confirms every field reference (e.g. u.email) corresponds to a mapped property on the declared entity.
- NoExpressionsAllowedOnEntitiesValidator ensures arithmetic and other expressions are not applied directly to whole-entity identifiers.
SQL Compliance Checks
Aggregate functions (COUNT, SUM, AVG, MIN, MAX, and their DISTINCT variants) are prohibited in WHERE clauses. The analyzer traverses the condition tree with NodeTypeValidator and throws a QuelException if any aggregate node is found there.
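The aggregate check amounts to a recursive search of the condition tree. The sketch below shows that idea only; the node classes are hypothetical stand-ins for the real AST types, and InvalidArgumentException stands in for QuelException. How NodeTypeValidator actually traverses the tree is not shown in this document.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in node types; the real AST classes are richer.
abstract class WhereNode
{
    /** @return WhereNode[] */
    public function children(): array
    {
        return [];
    }
}

final class WhereLeaf extends WhereNode {}       // identifier or literal
final class WhereAggregate extends WhereNode {}  // stands in for AstCount, AstSum, ...

final class WhereBinary extends WhereNode        // comparison or AND/OR
{
    public function __construct(private WhereNode $left, private WhereNode $right) {}

    public function children(): array
    {
        return [$this->left, $this->right];
    }
}

// Depth-first search: true as soon as any aggregate node is found.
function containsAggregate(WhereNode $node): bool
{
    if ($node instanceof WhereAggregate) {
        return true;
    }
    foreach ($node->children() as $child) {
        if (containsAggregate($child)) {
            return true;
        }
    }
    return false;
}

function assertNoAggregatesInWhere(?WhereNode $conditions): void
{
    if ($conditions !== null && containsAggregate($conditions)) {
        // The real analyzer throws QuelException here.
        throw new InvalidArgumentException('Aggregate functions are not allowed in WHERE');
    }
}
```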
Subqueries
When validating a subquery (a nested retrieve), an additional rule applies: the outer query may not reference the subquery result as a whole entity. A subquery's value list may include whole-entity references such as retrieve(u) — these are expanded to individual columns during SQL generation — but the outer query can only access those columns as individual properties, not hydrate them back into entity objects.
Stage 3: Transformation
QueryTransformer applies six sequential visitor passes to rewrite the AST into a form that QuelToSQL can compile directly. Each pass uses the visitor pattern: the transformer creates a visitor object and calls $ast->accept($visitor), which causes the AST to traverse itself and invoke the appropriate visitor method on each node.
| Pass | Visitor | What it does |
|---|---|---|
| 1 | MacroSubstitutor | Finds macro references in the AST and inserts placeholder nodes for later expansion. |
| 2 | RangeDatabaseEntityNormalizer | Adds fully-qualified namespaces to range entity names using the entity store. |
| 3 | EntityProcessRange | Converts range declarations into table references with aliases and join conditions. |
| 4 | MacroExpander | Replaces the placeholder nodes from pass 1 with the full macro body. |
| 5 | EntityNameNormalizer | Resolves all remaining entity name references to their fully-qualified forms. |
| 6 | TransformRelationInViaToPropertyLookup | Converts via relationship chains into direct field-to-field mappings that SQL can understand. |
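The accept/visit mechanism that every pass relies on can be reduced to a few lines. The sketch below is an assumption-laden miniature: SketchNode, SketchVisitor and NamespaceQualifier are hypothetical names, every node is treated as an entity name, and the real visitors dispatch on many node types rather than one.

```php
<?php
declare(strict_types=1);

interface SketchVisitor
{
    public function visitNode(SketchNode $node): void;
}

// Hypothetical node: accept() visits the node itself, then recurses into its children.
final class SketchNode
{
    /** @param SketchNode[] $children */
    public function __construct(public string $label, public array $children = []) {}

    public function accept(SketchVisitor $visitor): void
    {
        $visitor->visitNode($this);
        foreach ($this->children as $child) {
            $child->accept($visitor);
        }
    }
}

// A rewriting pass in miniature: qualifies bare entity names in place.
final class NamespaceQualifier implements SketchVisitor
{
    public function __construct(private string $namespace) {}

    public function visitNode(SketchNode $node): void
    {
        // Leave names that already contain a namespace separator untouched.
        if (!str_contains($node->label, '\\')) {
            $node->label = $this->namespace . '\\' . $node->label;
        }
    }
}

$ast = new SketchNode('User', [new SketchNode('Order')]);
$ast->accept(new NamespaceQualifier('App\\Entity'));
```

Because the tree drives its own traversal, adding another pass means writing another visitor; the node classes stay unchanged.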
The via transformation (pass 6) is applied per-range rather than tree-wide. For each range that has a join property, a dedicated TransformRelationInViaToPropertyLookup converter rewrites the join property expression and then traverses the rest of the range for any remaining via references:
// Inside QueryTransformer::transformViaRelations()
foreach ($ast->getRanges() as $range) {
    $joinProperty = $range->getJoinProperty();
    if ($joinProperty === null) {
        continue; // nothing to rewrite for ranges without a join property
    }
    $converter = new TransformRelationInViaToPropertyLookup($this->entityStore, $range);
    $range->setJoinProperty($converter->processNodeSide($joinProperty)); // rewrite the join itself
    $range->accept($converter); // rewrite anything else in the range
}
Stage 4: Optimization
QueryOptimizer is a facade that delegates to six specialized sub-optimizers. Optimization runs depth-first: all nested queries are fully optimized before the outer query is touched.
Optimization Phases
The optimizer applies its sub-optimizers in a specific order. Later phases can create opportunities for earlier ones, which is why the join optimizer and unused-range removal run a second time at the end:
| Phase | Optimizer | What it does |
|---|---|---|
| 1 | RangeOptimizer | Applies early filtering to reduce dataset size before join processing. |
| 2 | RangeOptimizer (cleanup) | Removes left join ranges and temporary ranges whose results are never referenced in projections or conditions. |
| 3 | JoinOptimizer | Restructures join conditions and eliminates redundant joins. |
| 4a | ExistsOptimizer | Converts EXISTS subqueries into JOINs where the rewrite is semantically safe. |
| 4b | AnyOptimizer | Rewrites ANY subquery patterns into more efficient equivalents. |
| 4c | AggregateOptimizer | Restructures aggregate expressions, taking platform capabilities into account (e.g. whether the engine supports certain window function forms). |
| 5 | JoinOptimizer + RangeOptimizer + JoinConditionFieldInjector | Second pass to catch newly created opportunities; injects required fields into join conditions as a final cleanup step. |
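As an illustration of the cleanup in phase 2, the sketch below drops optional (left join) ranges that nothing refers to. It is a simplification under stated assumptions: removeUnusedOptionalRanges() is a hypothetical function, ranges are reduced to a name and a required flag, and the set of referenced names is assumed to have been collected already.

```php
<?php
declare(strict_types=1);

/**
 * @param array<string, bool> $ranges          range name => true if required (inner join)
 * @param string[]            $referencedNames range names used in projections or conditions
 * @return array<string, bool> the ranges that survive cleanup
 */
function removeUnusedOptionalRanges(array $ranges, array $referencedNames): array
{
    $referenced = array_flip($referencedNames);

    return array_filter(
        $ranges,
        // Keep a range if it is required (an inner join filters rows even when
        // unreferenced) or if something in the query actually refers to it.
        static fn (bool $required, string $name): bool => $required || isset($referenced[$name]),
        ARRAY_FILTER_USE_BOTH
    );
}
```

For example, with ranges u (required), p (optional) and a (optional) and references to u and a only, p is removed and the other two are kept in their original order.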
Stage 5: Planning
ExecutionPlanBuilder takes the optimized AstRetrieve and produces an ExecutionPlan — an ordered collection of ExecutionStageInterface objects that PlanExecutor runs sequentially. This stage is necessary because a single ObjectQuel query can join database tables with external sources (JSON, etc.), and each source type requires a different executor.
Subquery Range Specialization
Before building any stages, the planner specializes the AstRangeDatabaseSubquery nodes that the parser produced. Those whose inner query contains external (non-SQL) sources become AstRangeDatabaseTempTable — they cannot be compiled as inline SQL and must be materialized into a real temporary table first. Those whose inner query is pure SQL become AstRangeDatabaseMaterialized — they are emitted as a derived table directly in the FROM or JOIN clause by QuelToSQL and require no separate stage.
Stage Types
The planner produces two distinct kinds of stages:
- ExecutionStage: represents a SQL-executable query fragment. It holds a cloned, stripped-down AstRetrieve containing only the ranges and projections that the database engine can handle, along with the conditions that were routable to the database. There is at most one database ExecutionStage per plan, plus one additional ExecutionStage per JSON/external range.
- TempTableStage: represents a subquery range whose inner query contains external sources and cannot be compiled as an inline SQL derived table. The planner assigns a temporary table name to the AstRangeDatabaseTempTable node so QuelToSQL treats it as a plain table. TempTableExecutor then runs the inner plan at execution time and materializes the results into that table before the outer stage runs.
Build Order
The planner constructs stages in a fixed order so that dependency edges are always registered before the stages that need them:
1. Identify all AstRangeDatabaseTempTable ranges in the query: subquery ranges whose inner query contains external sources.
2. Topologically sort those ranges using Kahn's algorithm, so that a temp range whose inner query references another temp range is always processed after its dependency. A circular dependency throws a QuelException.
3. For each temp range in sorted order, recursively call build() on its inner query to produce a nested ExecutionPlan, add a TempTableStage to the outer plan, and register any inter-TempTableStage dependency edges.
4. Build the main database ExecutionStage via StageFactory::createDatabaseExecutionStage() and add it. Register a dependency edge from this stage to every TempTableStage, ensuring all temp tables are materialized before the outer SQL runs.
5. For each JSON/external range, build a separate ExecutionStage via StageFactory::createRangeExecutionStage() and add it. These stages have no dependency edges; they execute after the database stage and join in memory.
// Simplified view of ExecutionPlanBuilder::build()
$plan = new ExecutionPlan();
// Steps 1-2: identify and topologically sort temp-table ranges
$tempRanges = $this->stageFactory->extractTemporaryRanges($query);
if (!empty($tempRanges)) {
    $tempRanges = $this->sortByDependency($tempRanges);
}
// Step 3: build a TempTableStage for each; register inter-stage dependencies
$tempTableStageNames = $this->buildTempTableStages($plan, $tempRanges, $staticParams);
// Step 4: build the main database stage; wire dependencies on all TempTableStages
$this->buildDatabaseStage($plan, $query, $staticParams, $tempTableStageNames);
// Step 5: one ExecutionStage per JSON/external range
foreach ($query->getOtherRanges() as $otherRange) {
    $plan->addStage($this->stageFactory->createRangeExecutionStage($query, $otherRange, $staticParams));
}
return $plan;
Condition Routing
When building the database stage, StageFactory does not pass the original WHERE clause through unchanged. It uses ConditionFilter to extract only the conditions that the database engine can evaluate — those that reference only database ranges or pure literals. Conditions that touch an external range are excluded from the database stage and placed in the corresponding JSON stage instead.
AND and OR nodes are handled recursively: when one branch of an AND cannot be routed to the database but the other can, only the valid branch is forwarded. The boolean structure is preserved as far as possible; unsupported leaves are dropped from the database-stage condition tree.
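The recursive routing can be sketched as below. This is not ConditionFilter itself: CondLeaf, CondBool and routeToDatabase() are hypothetical, and leaves simply list the range names they touch. For AND the sketch forwards whichever branch is routable, as described above. For OR it keeps the node only when both branches are routable, which is a conservative choice made for this sketch because dropping one side of an OR would exclude rows that side could still match; the exact OR behaviour of the real filter is not spelled out here.

```php
<?php
declare(strict_types=1);

// Hypothetical condition tree: a leaf lists the range names it touches.
final class CondLeaf
{
    /** @param string[] $ranges */
    public function __construct(public string $text, public array $ranges) {}
}

final class CondBool
{
    public function __construct(
        public string $operator,            // 'AND' or 'OR'
        public CondLeaf|CondBool $left,
        public CondLeaf|CondBool $right,
    ) {}
}

/**
 * Returns the part of $node that touches only $databaseRanges, or null if nothing is routable.
 *
 * @param string[] $databaseRanges
 */
function routeToDatabase(CondLeaf|CondBool $node, array $databaseRanges): CondLeaf|CondBool|null
{
    if ($node instanceof CondLeaf) {
        // A leaf with no ranges at all (pure literals) has an empty diff and is routable.
        return array_diff($node->ranges, $databaseRanges) === [] ? $node : null;
    }

    $left = routeToDatabase($node->left, $databaseRanges);
    $right = routeToDatabase($node->right, $databaseRanges);

    if ($left !== null && $right !== null) {
        return new CondBool($node->operator, $left, $right);
    }
    if ($node->operator === 'AND') {
        return $left ?? $right;   // forward whichever branch survived, if any
    }
    return null;                  // OR: sketch assumption, keep only when both sides are routable
}
```

With database range u and JSON range j, the condition u.age > 18 AND j.id = u.id routes only u.age > 18 to the database stage.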
For JSON stages, ConditionFilter further subdivides conditions into filter conditions (one side references the current range, the other is a literal — e.g. x.value > 100) and join conditions (both sides reference ranges, but different ones — e.g. x.id = y.userId). JsonQueryExecutor applies filter conditions first to reduce the working set, then performs the cross-source join using the join conditions.
Dependency Ordering at Runtime
ExecutionPlan::getStagesInOrder() runs Kahn's algorithm over the dependency graph at execution time to produce the final stage sequence. TempTableStage objects always appear before the database ExecutionStage that references their temporary tables, and a TempTableStage that depends on another appears after it. A cycle in the dependency graph throws a QuelException, which indicates a planner bug rather than a user query error.
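For readers unfamiliar with Kahn's algorithm, here is a minimal standalone version over stage names. It is a sketch, not the body of getStagesInOrder(): stagesInOrder() and its array-based inputs are assumptions, and RuntimeException stands in for QuelException.

```php
<?php
declare(strict_types=1);

/**
 * Kahn's algorithm over named stages.
 *
 * @param string[]                $stages    all stage names
 * @param array<string, string[]> $dependsOn stage name => names that must run first
 * @return string[] stage names in a valid execution order
 */
function stagesInOrder(array $stages, array $dependsOn): array
{
    $inDegree = array_fill_keys($stages, 0);
    $dependents = array_fill_keys($stages, []);

    foreach ($dependsOn as $stage => $prerequisites) {
        foreach ($prerequisites as $prerequisite) {
            $inDegree[$stage]++;
            $dependents[$prerequisite][] = $stage;
        }
    }

    // Start with every stage that waits on nothing.
    $ready = array_keys(array_filter($inDegree, static fn (int $degree): bool => $degree === 0));
    $ordered = [];

    while ($ready !== []) {
        $current = array_shift($ready);
        $ordered[] = $current;
        // Finishing a stage releases the stages that were waiting on it.
        foreach ($dependents[$current] as $dependent) {
            if (--$inDegree[$dependent] === 0) {
                $ready[] = $dependent;
            }
        }
    }

    // Any stage left unprocessed is part of a cycle.
    if (count($ordered) !== count($stages)) {
        throw new RuntimeException('Circular dependency between stages');
    }

    return $ordered;
}
```

Given stages main, tmp_a and tmp_b, where main depends on both temp tables and tmp_b depends on tmp_a, the function returns tmp_a, tmp_b, main.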
Caching in ConditionAnalyzer
ConditionAnalyzer is the shared component that answers "does this condition reference this range?" — a question that ConditionFilter and StageFactory both ask repeatedly for the same condition/range pairs during a single plan build. To avoid redundant recursive tree walks, ConditionAnalyzer caches results keyed by a string derived from the spl_object_hash() of both the condition node and the range object. The cache is cleared at the top of each ExecutionPlanBuilder::build() call, preventing stale results from a previous query leaking into the next.
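A cache keyed this way might look like the sketch below. ReferenceCacheSketch is a hypothetical class, the key format is an assumption, and the expensive recursive check is injected as a closure so the example stays self-contained.

```php
<?php
declare(strict_types=1);

final class ReferenceCacheSketch
{
    /** @var array<string, bool> */
    private array $cache = [];

    /** Number of times the expensive check actually ran. */
    public int $misses = 0;

    /** @param Closure(object, object): bool $compute the expensive recursive check */
    public function __construct(private Closure $compute) {}

    public function references(object $condition, object $range): bool
    {
        // One key per condition/range pair (the separator is an assumption of this sketch).
        $key = spl_object_hash($condition) . ':' . spl_object_hash($range);

        if (!array_key_exists($key, $this->cache)) {
            $this->misses++;
            $this->cache[$key] = ($this->compute)($condition, $range);
        }

        return $this->cache[$key];
    }

    // Called at the start of each plan build so results never outlive their nodes.
    public function clear(): void
    {
        $this->cache = [];
    }
}
```

Clearing matters for correctness as well as memory: PHP can reuse an object hash once the original object has been destroyed, so a key left over from a previous query could otherwise match an unrelated node.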
Stage 6: SQL Generation
QuelToSQL converts a single AstRetrieve into a SQL string. It builds each clause independently and joins non-empty parts with a single space, ensuring no spurious whitespace appears when optional clauses are absent. Clause order follows the SQL standard: SELECT … FROM … JOIN … WHERE … GROUP BY … ORDER BY.
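The clause assembly described above comes down to filtering out empty parts before joining. A minimal sketch, assuming a hypothetical assembleSql() helper that receives already-compiled clause strings keyed by name:

```php
<?php
declare(strict_types=1);

/**
 * @param array<string, string> $clauses compiled clause text keyed by name; '' or missing when absent
 */
function assembleSql(array $clauses): string
{
    // Fixed clause order; the keys are names chosen for this sketch.
    $order = ['select', 'from', 'joins', 'where', 'groupBy', 'orderBy'];

    $parts = [];
    foreach ($order as $name) {
        $text = trim($clauses[$name] ?? '');
        if ($text !== '') {
            $parts[] = $text;   // skip absent clauses so no double spaces appear
        }
    }

    return implode(' ', $parts);
}
```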
SELECT Clause
Each entry in the value list is visited by QuelToSQLConvertToString in VALUES mode. If the entry is an entity reference without a property access (e.g. retrieve(u)), the entity is expanded to all its mapped columns. Otherwise the expression is emitted as-is with an AS alias. A duplicate-detection step prevents the same column appearing twice — necessary because whole-entity expansion and individual property references can both produce the same column:
// Whole-entity reference: emitted as all columns, no alias
// Property reference: emitted with AS alias
if (!$this->identifierIsEntity($value->getExpression())) {
    $sqlResult .= " as `{$value->getName()}`";
}
// Guard against duplicates from overlapping expansions
if (!$this->isDuplicateField($result, $sqlResult)) {
    $result[] = $sqlResult;
}
FROM Clause
Ranges without a join property become FROM entries. Ranges that carry an inner query (temporary ranges / subqueries) are emitted as derived tables — the inner AstRetrieve is compiled recursively by a nested convertToSQL() call and wrapped in parentheses:
if ($range->getQuery() !== null) {
    $subSQL = $this->convertToSQL($range->getQuery()); // recursive
    $tableNames[] = "({$subSQL}) as `{$rangeName}`";
} else {
    $owningTable = $this->resolveOwningTable($range);
    $tableNames[] = "`{$owningTable}` as `{$rangeName}`";
}
JOIN Clause
Ranges that have a join property and are flagged for inclusion as a join produce a JOIN entry. The join type is determined by the isRequired() flag on the range: required ranges become INNER JOIN, optional ranges become LEFT JOIN. The join condition is compiled by visiting the join property AST node with QuelToSQLConvertToString in CONDITION mode. Subquery ranges in a join position are handled the same way as in the FROM clause — compiled recursively and wrapped in parentheses.
Both getFrom() and getJoins() resolve physical table names through the shared resolveOwningTable() helper, which checks for an explicit table name on the range first and falls back to the entity store. This prevents the two methods from drifting out of sync when a range carries a derived table name.
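The join-type decision and the table-name fallback can be illustrated together. Both functions below are hypothetical simplifications: the entity store is reduced to a plain class-to-table array, and the join condition is passed in as an already-compiled string.

```php
<?php
declare(strict_types=1);

/**
 * @param array<string, string> $entityTables hypothetical entity class => table map,
 *                                            standing in for the entity store
 */
function resolveTableSketch(?string $explicitTable, string $entityClass, array $entityTables): string
{
    // An explicit table name on the range wins; otherwise fall back to the entity mapping.
    return $explicitTable ?? $entityTables[$entityClass];
}

function joinClauseSketch(string $table, string $alias, string $condition, bool $required): string
{
    // Required ranges become INNER JOIN, optional ranges become LEFT JOIN.
    $type = $required ? 'INNER JOIN' : 'LEFT JOIN';

    return "{$type} `{$table}` as `{$alias}` ON {$condition}";
}

$table = resolveTableSketch(null, 'App\\Entity\\Order', ['App\\Entity\\Order' => 'orders']);
$sql = joinClauseSketch($table, 'o', '`o`.`user_id` = `u`.`id`', false);
```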
WHERE Clause
The condition tree is visited by QuelToSQLConvertToString in CONDITION mode. The visitor emits standard SQL operators for comparisons, logical connectives, IS NULL, LIKE, IN(), and full-text MATCH … AGAINST expressions for AstSearch nodes.
ORDER BY Clause
The sort clause has two special-case paths in addition to the default:
- Default: Sort expressions are visited in SORT mode and emitted as ORDER BY expr ASC|DESC.
- @InValuesAreFinal directive: When the compiler directive InValuesAreFinal is set, the optimizer has rewritten a query to use an IN() list with a specific ordering that must be preserved. The sort clause is emitted as ORDER BY FIELD(column, val1, val2, …) to maintain that ordering in the database engine.
- Application-logic sort: When getSortInApplicationLogic() is true, no ORDER BY is emitted; sorting is handled after the result set is fetched.
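The FIELD()-based path can be sketched as follows. orderByField() is a hypothetical helper, not part of QuelToSQL, and it accepts integers only so the values can be inlined without quoting. FIELD() is a MySQL/MariaDB function that returns the position of its first argument among the remaining ones, which is what makes it usable as a sort key.

```php
<?php
declare(strict_types=1);

/**
 * @param string $column already-quoted column expression
 * @param int[]  $values IN() values in the order that must be preserved
 */
function orderByField(string $column, array $values): string
{
    if ($values === []) {
        return '';   // nothing to order by
    }

    // Integers only in this sketch, so the values can be inlined safely.
    $list = implode(', ', array_map('intval', $values));

    return "ORDER BY FIELD({$column}, {$list})";
}
```

For the values 3, 1, 2 on column u.id this yields ORDER BY FIELD(`u`.`id`, 3, 1, 2), so rows come back in the same order as the IN() list.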
GROUP BY Clause
Group-by expressions are visited individually in CONDITION mode and joined with commas. The GROUP BY clause is not user-specified — it is injected by the optimizer when aggregate restructuring requires it, and omitted otherwise.
Execution and Result Assembly
Once SQL is generated, DatabaseQueryExecutor binds parameters and executes the query against the database. JsonQueryExecutor handles JSON-sourced stages entirely in memory, applying the filter and join conditions extracted during decomposition.
PlanExecutor calls ExecutionPlan::getStagesInOrder() and iterates the result. TempTableStage objects run first: TempTableExecutor executes each inner plan and materializes the rows into the temporary table the planner named. Then the result-producing stages run — DatabaseQueryExecutor for SQL stages, JsonQueryExecutor for in-memory JSON stages. The combined rows are wrapped in a QuelResult, which hydrates them into entity objects using the projection aliases from the original AstRetrieve.
Executed SQL strings are accumulated by DatabaseQueryExecutor and can be retrieved after the fact via QueryExecutor::getLastExecutedSql(), which is useful for debugging and logging. The list is reset at the start of each executeQuery() call.
Platform Capabilities
Some optimizations and SQL generation decisions depend on which database engine is in use. ObjectQuel models this through a PlatformCapabilitiesInterface that is passed into both QueryOptimizer and QuelToSQL. When no platform is specified, a NullPlatformCapabilities no-op implementation is used, which causes all platform-specific code paths to fall through to safe, universally-compatible SQL.
The main consumer of platform capabilities is AggregateOptimizer, which uses the interface to decide whether to apply engine-specific aggregate rewrites. QuelToSQLConvertToString also consults it when generating full-text search expressions, since MATCH … AGAINST syntax differs slightly between MySQL and MariaDB.
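The no-op fallback is an instance of the null object pattern. The sketch below shows the shape of that arrangement only: CapabilitiesSketch, its single method and the strategy names are all hypothetical, and the real PlatformCapabilitiesInterface may expose entirely different methods.

```php
<?php
declare(strict_types=1);

// Hypothetical capability interface (not the real PlatformCapabilitiesInterface).
interface CapabilitiesSketch
{
    public function supportsWindowFunctions(): bool;
}

// Null object: claims no optional features, so callers take the portable path.
final class NullCapabilitiesSketch implements CapabilitiesSketch
{
    public function supportsWindowFunctions(): bool
    {
        return false;
    }
}

final class OptimizerSketch
{
    private CapabilitiesSketch $platform;

    public function __construct(?CapabilitiesSketch $platform = null)
    {
        // Falling back to the null object removes null checks from every call site.
        $this->platform = $platform ?? new NullCapabilitiesSketch();
    }

    public function aggregateStrategy(): string
    {
        return $this->platform->supportsWindowFunctions() ? 'window' : 'subquery';
    }
}
```

Constructing OptimizerSketch without arguments selects the portable strategy, mirroring how a missing platform leads to universally compatible SQL.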