How Krishiv executes a query
A visual walkthrough from session.sql(...) to result — for a single batch SQL query.
This page follows one query — session.sql("SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id") — from your code to the result, so the rest of the documentation has a place to anchor.
At a glance
Your code
session.sql("SELECT customer_id, SUM(amount) ... FROM orders ...")
│
▼
[1] SqlEngine.parse() ── DataFusion SQL → LogicalPlan
│
▼
[2] Krishiv plan + optimizer ── typed Expr AST, predicate pushdown,
│ policy/governance hooks
▼
[3] ExecutionRuntime.accept_plan()
│
├── Embedded → [4a] in-process task graph
├── SingleNode → [4a] in-process task graph + Flight endpoint
└── Distributed → [4b] remote coordinator gRPC
│
▼
[5] Coordinator.submit_job()
│ - job lifecycle, fencing, metadata store
▼
[6] Scheduler → Tasks → Executor(s)
│ - shuffle partitioning, source/sink wiring
▼
[7] DataFusion + Arrow operators
│ - scan, filter, aggregate, window, join, ...
▼
[8] Result: Vec<RecordBatch> (or streaming iterator)
Step by step
1. Parse
Session::sql(...) hands the string to a DataFusion SessionContext. The result is a logical plan — a tree of relational operators (Scan, Filter, Aggregate, Project, …). At this point nothing has run yet.
2. Optimize
Krishiv runs the DataFusion optimizer, then layers its own plan-level transformations: predicate pushdown into source providers, projection pruning, governance/policy hooks, and UDF resolution. The output is a physical plan — operators with specific implementations, partitioning, and shuffle requirements.
3. Route
ExecutionRuntime::accept_plan(plan) decides where the work runs:
| RuntimeMode | What happens |
|---|---|
Embedded | Build the task graph in-process and run it on the calling thread (or spawn_blocking for DataFusion work). |
SingleNode | Same as embedded, but with a Flight SQL endpoint so other clients can attach. |
Distributed | Serialize the plan and ship it to the coordinator gRPC endpoint. No silent fallback — if the endpoint is unreachable, the call fails. |
4. Submit
The coordinator receives the plan, fragments it into tasks, and assigns each task to an executor. Each executor runs the task on the Arrow operator runtime inside krishiv-dataflow — queues, barriers, windows, stateful joins, all working on RecordBatch values.
5. Shuffle, state, and checkpoints
Cross-partition data flows through krishiv-shuffle (in-memory, local disk, or object store, depending on the durability profile). Stateful operators read and write to krishiv-state (in-memory or RocksDB). At configured intervals, the coordinator triggers a checkpoint that snapshots state and source offsets atomically.
6. Collect
For a batch query, the runtime gathers the terminal batches and returns them as a Vec<RecordBatch> (Rust) or a list of pa.RecordBatch (Python). For a streaming query, the same pipeline returns a stream / iterator instead.
Arrow everywhere
The same RecordBatch type flows through every layer — sources, operators, shuffle, and sinks. There is no row-by-row marshalling and no JVM-style boxed objects. This is the main reason Krishiv is competitive with engines that have had years more development time: the columnar format does most of the heavy lifting, and Rust keeps the abstractions cheap.
Where durability fits
Three explicit profiles control what the runtime uses for metadata, state, shuffle, and checkpoints:
| Profile | State | Shuffle | Checkpoints |
|---|---|---|---|
dev-local | In-memory | In-memory | Ephemeral |
single-node-durable | RocksDB | Local disk | Local filesystem |
distributed-durable | RocksDB (restored) | Tiered (local + object store) | Object store + etcd |
See the Execution Model reference for the full list of env vars and the Checkpointing page for the protocol.
What changes for streaming
For streaming queries, the same pipeline runs as a long-lived job: a coordinator-owned process that ingests batches from sources (Kafka, memory streams, registered unbounded tables), processes them through the same operator runtime, and emits results. The plan is the same shape — what changes is that operators push their output to downstream operators instead of pulling all input first. See Python Stream / Rust Stream.
See also
- Execution Model —
RuntimeMode,ExecutionPlacement, durability profiles - Architecture — crate boundaries and design invariants
- Distributed Mode — what the coordinator and executors actually do