Why Krishiv
An honest comparison with Spark, Flink, DataFusion, and DuckDB — and the trade-offs.
This page is written for someone evaluating whether to adopt Krishiv. It is opinionated but tries to be honest about where Krishiv is not the right choice.
TL;DR
Krishiv is for teams that want batch, streaming, and incremental view maintenance (IVM) from the same engine and the same APIs, written in Rust, and are willing to accept that some capabilities (IVM on the distributed executor, end-to-end exactly-once, full Iceberg certification) are still maturing.
If you only need batch SQL, DuckDB is faster and more mature. If you need a battle-tested distributed batch + streaming stack today, Spark + Flink is the safer choice. If you want a Rust SQL engine you can embed but do not need streaming or incremental view maintenance, DataFusion standalone is leaner.
vs. Apache Spark + Apache Flink
| Dimension | Spark + Flink | Krishiv |
|---|---|---|
| Batch SQL | Spark SQL (mature, very large ecosystem) | DataFusion-backed (mature, smaller ecosystem) |
| Streaming | Flink (mature, certified for exactly-once at scale) | Same runtime as batch (Available for in-process; Preview for end-to-end pipelines) |
| Incremental views | Materialized views in Flink, separate tooling elsewhere | First-class via IncrementalFlow (Experimental) |
| APIs | Java/Scala/Python, multiple disjoint APIs | Rust, Python, SQL — one shared plan |
| Runtime | JVM (both Spark and Flink) | Rust + Tokio (single binary, no JVM) |
| Ecosystem maturity | 10+ years, broad connector coverage | Early — see Maturity |
| Lakehouse formats | Delta Lake, Iceberg, Hudi connectors | Iceberg-first (Preview), Delta/Hudi sources (Preview) |
| Exactly-once | Certified at scale in production deployments | Available for specific certified source/sink/checkpoint combinations only |
Pick Spark + Flink if you need a battle-tested, large-ecosystem system today. Pick Krishiv if you want one engine, one set of APIs, and a Rust-native binary — and you are willing to grow with it.
vs. Apache DataFusion (standalone)
DataFusion is the SQL engine inside Krishiv. Krishiv adds:
- A distributed scheduler and executor (DataFusion is a single-process library).
- A streaming runtime with event-time windows, watermarks, and barriers.
- Stateful operators with RocksDB-backed keyed state and checkpointing.
- Connectors for Kafka, Iceberg, S3/ADLS/GCS, vector stores.
- An incremental view-maintenance runtime (IncrementalFlow).
- A Python binding and a CLI.
Pick DataFusion if you want to embed a SQL engine in a Rust application and you do not need distributed execution, streaming, or IVM. Pick Krishiv if you want all of the above and are willing to accept a larger API surface and a less mature codebase.
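For readers new to event-time windows and watermarks, here is a minimal sketch of the general idea in plain Python. It is not Krishiv's API. The function name, the `allowed_lateness` parameter, and the rule that the watermark trails the maximum event time seen so far are all assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, allowed_lateness=0):
    """Count events per event-time tumbling window.

    Illustrative only; names and watermark policy are hypothetical.
    events: iterable of (event_time, payload) in arrival order.
    A window [start, start + window_size) is emitted once the watermark
    (max event time seen so far minus allowed_lateness) reaches its end.
    Events that arrive for an already-closed window are dropped as late.
    """
    open_windows = defaultdict(int)  # window start -> running count
    emitted = []                     # (window_start, count) in emit order
    dropped = []                     # late events that missed their window
    watermark = float("-inf")

    for event_time, payload in events:
        start = (event_time // window_size) * window_size
        if start + window_size <= watermark:
            # The window this event belongs to has already been emitted.
            dropped.append((event_time, payload))
        else:
            open_windows[start] += 1

        # Advance the watermark; it never moves backwards.
        watermark = max(watermark, event_time - allowed_lateness)

        # Emit every open window whose end the watermark has passed.
        ready = sorted(s for s in open_windows if s + window_size <= watermark)
        for s in ready:
            emitted.append((s, open_windows.pop(s)))

    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        emitted.append((s, open_windows.pop(s)))
    return emitted, dropped


# Out-of-order input: the event at time 3 arrives after the event at time 11.
sample = [(1, "a"), (4, "b"), (11, "a"), (3, "c"), (12, "b"), (25, "a")]
strict_emitted, strict_dropped = tumbling_window_counts(sample, window_size=10)
lenient_emitted, lenient_dropped = tumbling_window_counts(
    sample, window_size=10, allowed_lateness=10
)
```

With no allowed lateness, the out-of-order event is dropped because its window has already closed. With a lateness of 10, the same event is counted. A streaming runtime has to make this latency-versus-completeness trade-off; a batch-only engine does not.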
vs. DuckDB
DuckDB is an excellent single-node analytical SQL engine with mature Parquet, CSV, and JSON support. It is faster than Krishiv for many pure batch workloads on a single host.
Pick DuckDB for single-node analytical SQL when you do not need streaming, IVM, or distributed execution. Pick Krishiv if you need streaming, IVM, or a distributed runtime, or if you want the same APIs from Python, Rust, and SQL.
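To make the IVM distinction concrete, the sketch below contrasts recomputing a grouped sum from scratch with maintaining it from a stream of inserts and deletes. It is a conceptual illustration in plain Python and says nothing about how IncrementalFlow is implemented. The class name, method names, and the `(key, amount, weight)` change format are hypothetical.

```python
from collections import defaultdict


def recompute_view(rows):
    """Full recompute, equivalent to SELECT key, SUM(amount) GROUP BY key."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


class IncrementalSumView:
    """Maintains the same grouped sum by applying changes only (hypothetical)."""

    def __init__(self):
        self._sums = defaultdict(int)
        self._counts = defaultdict(int)  # live rows per key

    def apply(self, changes):
        """changes: iterable of (key, amount, weight).

        weight is +1 for an inserted row and -1 for a deleted row.
        """
        for key, amount, weight in changes:
            self._sums[key] += weight * amount
            self._counts[key] += weight
            if self._counts[key] == 0:
                # No rows left for this key, so it leaves the view.
                del self._sums[key]
                del self._counts[key]

    def snapshot(self):
        return dict(self._sums)


base_rows = [("a", 10), ("b", 5), ("a", 7)]
view = IncrementalSumView()
view.apply((key, amount, +1) for key, amount in base_rows)
initial_snapshot = view.snapshot()

# Later: delete ("b", 5) and ("a", 10), insert ("c", 3).
view.apply([("b", 5, -1), ("c", 3, +1), ("a", 10, -1)])
updated_snapshot = view.snapshot()
expected_after_changes = recompute_view([("a", 7), ("c", 3)])
```

The incremental path touches only the changed rows, while the recompute path rescans the whole input. That difference is what matters when the base data is large and changes arrive continuously.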
Honest trade-offs
| You give up | You gain |
|---|---|
| Maturity of Spark/Flink in production at scale | One engine, one API surface, one binary |
| DuckDB's single-node analytical performance | Streaming + IVM in the same runtime |
| DataFusion's library-only footprint | Distributed scheduler, connectors, Python bindings |
| JVM tooling and ecosystem | Rust-native: smaller binary, predictable performance, no JVM tuning |
Before you commit
Read the Feature Maturity page. Anything marked Experimental, Preview, or Planned is not production-ready today. The recipes show what works end-to-end with current status labels.