ProductDocumentationExamplesBlogRoadmapGitHubGet Started
Available

How Krishiv executes a query

A visual walkthrough from session.sql(...) to result — for a single batch SQL query.

This page follows one query — session.sql("SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id") — from your code to the result, so the rest of the documentation has a place to anchor.

At a glance

Your code
  session.sql("SELECT customer_id, SUM(amount) ... FROM orders ...")
        │
        ▼
[1] SqlEngine.parse()        ── DataFusion SQL → LogicalPlan
        │
        ▼
[2] Krishiv plan + optimizer ── typed Expr AST, predicate pushdown,
        │                          policy/governance hooks
        ▼
[3] ExecutionRuntime.accept_plan()
        │
        ├── Embedded      →  [4a] in-process task graph
        ├── SingleNode    →  [4a] in-process task graph + Flight endpoint
        └── Distributed   →  [4b] remote coordinator gRPC
                                  │
                                  ▼
[5] Coordinator.submit_job()
        │  - job lifecycle, fencing, metadata store
        ▼
[6] Scheduler → Tasks → Executor(s)
        │  - shuffle partitioning, source/sink wiring
        ▼
[7] DataFusion + Arrow operators
        │  - scan, filter, aggregate, window, join, ...
        ▼
[8] Result: Vec<RecordBatch> (or streaming iterator)

Step by step

1. Parse

Session::sql(...) hands the string to a DataFusion SessionContext. The result is a logical plan — a tree of relational operators (Scan, Filter, Aggregate, Project, …). At this point nothing has run yet.

2. Optimize

Krishiv runs the DataFusion optimizer, then layers its own plan-level transformations: predicate pushdown into source providers, projection pruning, governance/policy hooks, and UDF resolution. The output is a physical plan — operators with specific implementations, partitioning, and shuffle requirements.

3. Route

ExecutionRuntime::accept_plan(plan) decides where the work runs:

RuntimeModeWhat happens
EmbeddedBuild the task graph in-process and run it on the calling thread (or spawn_blocking for DataFusion work).
SingleNodeSame as embedded, but with a Flight SQL endpoint so other clients can attach.
DistributedSerialize the plan and ship it to the coordinator gRPC endpoint. No silent fallback — if the endpoint is unreachable, the call fails.

4. Submit

The coordinator receives the plan, fragments it into tasks, and assigns each task to an executor. Each executor runs the task on the Arrow operator runtime inside krishiv-dataflow — queues, barriers, windows, stateful joins, all working on RecordBatch values.

5. Shuffle, state, and checkpoints

Cross-partition data flows through krishiv-shuffle (in-memory, local disk, or object store, depending on the durability profile). Stateful operators read and write to krishiv-state (in-memory or RocksDB). At configured intervals, the coordinator triggers a checkpoint that snapshots state and source offsets atomically.

6. Collect

For a batch query, the runtime gathers the terminal batches and returns them as a Vec<RecordBatch> (Rust) or a list of pa.RecordBatch (Python). For a streaming query, the same pipeline returns a stream / iterator instead.

Arrow everywhere

The same RecordBatch type flows through every layer — sources, operators, shuffle, and sinks. There is no row-by-row marshalling and no JVM-style boxed objects. This is the main reason Krishiv is competitive with engines that have had years more development time: the columnar format does most of the heavy lifting, and Rust keeps the abstractions cheap.

Where durability fits

Three explicit profiles control what the runtime uses for metadata, state, shuffle, and checkpoints:

ProfileStateShuffleCheckpoints
dev-localIn-memoryIn-memoryEphemeral
single-node-durableRocksDBLocal diskLocal filesystem
distributed-durableRocksDB (restored)Tiered (local + object store)Object store + etcd

See the Execution Model reference for the full list of env vars and the Checkpointing page for the protocol.

What changes for streaming

For streaming queries, the same pipeline runs as a long-lived job: a coordinator-owned process that ingests batches from sources (Kafka, memory streams, registered unbounded tables), processes them through the same operator runtime, and emits results. The plan is the same shape — what changes is that operators push their output to downstream operators instead of pulling all input first. See Python Stream / Rust Stream.

See also