Distributed Mode
What the coordinator and executors actually do, and how to operate them.
Preview: Distributed mode has the building blocks in place — Flight coordinator, executor gRPC, bearer-token auth, etcd metadata, object-store shuffle, fenced checkpoints. End-to-end certification work continues; verify your specific workload with the maintainers before relying on it for production.
Topology
The ASCII tree below shows the topology in text form, which is useful for grepping logs:
Client (Python / Rust / SQL)
│ Arrow Flight SQL
▼
Coordinator (krishiv-scheduler)
│ Task-control gRPC (bearer token)
▼
Executor(s) (krishiv-executor)
│
▼
Arrow / DataFusion operators, shuffle, state, sources, sinks
Coordinator responsibilities
- Accept job submissions from clients.
- Fragment plans into tasks and assign them to executors.
- Issue epoch fence tokens so stale task completions are rejected (no double-commit).
- Drive checkpoint barriers and persist checkpoint metadata to etcd.
- Surface job status, health, and metrics.
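The fencing behavior in the list above can be illustrated with a small sketch. This is not the krishiv-scheduler implementation: `FenceRegistry`, its methods, and the task ID are hypothetical names invented here, and the sketch assumes a fence token is simply a per-task epoch counter that is bumped on every (re)assignment.

```python
from dataclasses import dataclass, field


@dataclass
class FenceRegistry:
    """Toy coordinator-side record of the current epoch per task (hypothetical)."""

    epochs: dict = field(default_factory=dict)   # task_id -> current epoch
    committed: set = field(default_factory=set)  # task_ids already committed

    def assign(self, task_id: str) -> int:
        """(Re)assign a task; bump its epoch and return it as the fence token."""
        epoch = self.epochs.get(task_id, 0) + 1
        self.epochs[task_id] = epoch
        return epoch

    def complete(self, task_id: str, token: int) -> bool:
        """Accept a completion only if it carries the current epoch and is not a repeat."""
        if self.epochs.get(task_id) != token:
            return False  # stale attempt: an older assignment finished late
        if task_id in self.committed:
            return False  # already committed once; reject the duplicate
        self.committed.add(task_id)
        return True


registry = FenceRegistry()
first = registry.assign("task-7")   # original attempt
second = registry.assign("task-7")  # reassigned, e.g. after missed heartbeats
stale_ok = registry.complete("task-7", first)    # late completion from the old attempt
fresh_ok = registry.complete("task-7", second)   # completion from the current attempt
repeat_ok = registry.complete("task-7", second)  # same completion delivered twice
```

In this sketch only the completion carrying the latest token is accepted, and it is accepted exactly once, which is the property that prevents a double-commit.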
Executor responsibilities
- Connect to the coordinator with a bearer token.
- Receive task assignments and execute them on the Arrow operator runtime.
- Report heartbeats and task completions with the current fence token.
- Read/write local RocksDB state and local-disk shuffle when in the distributed-durable profile.
- Use the object-store shuffle backend for cross-host data exchange.
An explicit endpoint is required
Distributed mode will not silently fall back to in-process execution. If KRISHIV_COORDINATOR is unset or unreachable, Session::connect(...) and Session::from_env() fail with a configuration error. This is intentional — you should not get distributed-looking results from an in-process engine.
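As a rough illustration of the fail-fast rule, the sketch below resolves the endpoint from the environment and raises rather than falling back. It is not the actual `Session` code: `ConfigurationError` and `resolve_coordinator` are hypothetical names, and the sketch covers only the "unset" case, not reachability checks.

```python
import os


class ConfigurationError(RuntimeError):
    """Hypothetical stand-in for the configuration error described above."""


def resolve_coordinator(env=None) -> str:
    """Return the coordinator endpoint, or raise instead of falling back to in-process."""
    env = os.environ if env is None else env
    endpoint = env.get("KRISHIV_COORDINATOR", "").strip()
    if not endpoint:
        # Deliberately no default: a missing endpoint is an error, not a cue to run locally.
        raise ConfigurationError(
            "KRISHIV_COORDINATOR is not set; refusing to fall back to in-process execution"
        )
    return endpoint
```

A deployment wrapper could call a check like this at startup so that a missing endpoint stops the process before any job is submitted.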
Auth (required for production)
| Variable | Purpose |
|---|---|
| KRISHIV_COORDINATOR | Coordinator gRPC endpoint. |
| KRISHIV_COORDINATOR_BEARER_TOKEN | Bearer token sent on every call. |
| KRISHIV_COORDINATOR_BEARER_TOKENS | Accepted server tokens (rotation). |
| KRISHIV_COORDINATOR_BEARER_TOKEN_FILE | File-based token, live-reloaded. |
| KRISHIV_EXECUTOR_TASK_BEARER_TOKEN | Token for executor → coordinator gRPC. |
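
To show why a server-side list of accepted tokens helps with rotation, here is a minimal sketch of the check a server could perform. It makes two assumptions that the table does not confirm: that KRISHIV_COORDINATOR_BEARER_TOKENS holds a comma-separated list, and that clients send a standard `Bearer <token>` authorization value. The function names are hypothetical.

```python
import hmac


def parse_accepted_tokens(raw: str) -> list:
    """Split a token list; the comma separator is an assumption, not documented behavior."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def is_authorized(authorization_value: str, accepted: list) -> bool:
    """Check a 'Bearer <token>' value against every accepted token."""
    scheme, _, presented = authorization_value.partition(" ")
    presented = presented.strip()
    if scheme.lower() != "bearer" or not presented:
        return False
    ok = False
    for token in accepted:
        # compare_digest runs in constant time; the loop does not stop at the first match.
        ok |= hmac.compare_digest(presented.encode(), token.encode())
    return ok


# During a rotation window both the old and the new token are accepted.
accepted = parse_accepted_tokens("old-token, new-token")
```

With both tokens accepted for a while, clients and executors can switch to the new token one at a time, after which the old one is removed from the list.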
See the Architecture page for the crate-level breakdown.
Deployment options
- Deployment — embedded, single-node daemon, and Kubernetes operator / CRD.
- Single-node durable recipe — quick local daemon setup.
- Scheduler — coordinator lifecycle and task state machine.
- Shuffle — backends and configuration.
- Checkpointing — protocol and recovery.