An honest comparison with Spark, Flink, DataFusion, and DuckDB — and the trade-offs.

This page is written for someone evaluating whether to adopt Krishiv. It is opinionated but tries to be honest about where Krishiv is not the right choice.

TL;DR

Krishiv is for teams that want batch, streaming, and incremental view maintenance (IVM) from one Rust engine with one set of APIs, and who are willing to accept that some capabilities (distributed executor IVM, end-to-end exactly-once, full Iceberg certification) are still maturing.

If you only need batch SQL, DuckDB is faster and more mature. If you need a battle-tested distributed batch + streaming stack today, Spark + Flink is the safer choice. If you want a Rust SQL engine you can embed but do not need streaming or incremental, DataFusion standalone is leaner.

vs. Apache Spark + Apache Flink

Dimension	Spark + Flink	Krishiv
Batch SQL	Spark SQL (mature, very large ecosystem)	DataFusion-backed (mature, smaller ecosystem)
Streaming	Flink (mature, certified for exactly-once at scale)	Same runtime as batch (Available for in-process; Preview for end-to-end pipelines)
Incremental views	Materialized views in Flink, separate tooling elsewhere	First-class via `IncrementalFlow` (Experimental)
APIs	Java/Scala/Python, multiple disjoint APIs	Rust, Python, SQL — one shared plan
Runtime	JVM (both Spark and Flink)	Rust + Tokio (single binary, no JVM)
Ecosystem maturity	10+ years, broad connector coverage	Early (see Feature Maturity)
Lakehouse formats	Delta Lake, Iceberg, Hudi connectors	Iceberg-first (Preview), Delta/Hudi sources (Preview)
Exactly-once	Certified at scale in production deployments	Available for specific certified source/sink/checkpoint combinations only

Pick Spark + Flink if you need a battle-tested, large-ecosystem system today. Pick Krishiv if you want one engine, one set of APIs, and a Rust-native binary — and you are willing to grow with it.
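
The "Incremental views" row refers to incremental view maintenance: updating a query's result from the changes to its inputs instead of recomputing it from scratch. The sketch below is a minimal, engine-agnostic illustration in Python, assuming a grouped SUM/COUNT view and a change stream of signed insert/delete rows. `recompute` and `IncrementalSumView` are hypothetical names chosen for illustration; this is not the `IncrementalFlow` API, which this page does not describe.

```python
def recompute(rows):
    """Full recomputation of SELECT key, SUM(value), COUNT(*) FROM rows GROUP BY key."""
    view = {}
    for key, value in rows:
        total, count = view.get(key, (0, 0))
        view[key] = (total + value, count + 1)
    return view


class IncrementalSumView:
    """Hypothetical helper: maintains the same grouped SUM/COUNT view from signed deltas."""

    def __init__(self):
        self.view = {}  # key -> (sum, count)

    def apply(self, deltas):
        # Each delta is (key, value, sign): sign=+1 for an insert, -1 for a delete.
        # Only the touched groups are updated; the rest of the view is left alone.
        for key, value, sign in deltas:
            total, count = self.view.get(key, (0, 0))
            total += sign * value
            count += sign
            if count == 0:
                # The group is now empty, so drop it, as recomputation would.
                self.view.pop(key, None)
            else:
                self.view[key] = (total, count)
        return self.view


# Usage: load an initial batch, then apply one delete and one insert.
ivm = IncrementalSumView()
ivm.apply([("a", 10, +1), ("b", 5, +1), ("a", 7, +1)])
ivm.apply([("a", 10, -1), ("c", 3, +1)])

# The incrementally maintained view matches a full recompute of the current rows.
assert ivm.view == recompute([("b", 5), ("a", 7), ("c", 3)])
```

COUNT is kept alongside SUM so that a group can be removed when its last row is deleted. A MIN or MAX view would need more state, because a delete cannot be undone from the current extreme value alone.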

vs. Apache DataFusion (standalone)

DataFusion is the SQL engine inside Krishiv. Krishiv adds:

A distributed scheduler and executor (DataFusion is a single-process library).
A streaming runtime with event-time windows, watermarks, and barriers.
Stateful operators with RocksDB-backed keyed state and checkpointing.
Connectors for Kafka, Iceberg, S3/ADLS/GCS, and vector stores.
An incremental view-maintenance runtime (IncrementalFlow).
A Python binding and a CLI.
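
Event-time windows and watermarks, listed above as part of the streaming runtime, can be illustrated without any engine. The sketch below assumes tumbling windows and a watermark defined as the largest event time seen so far minus a fixed delay, similar in spirit to Flink's bounded-out-of-orderness strategy. The function `tumbling_window_counts` and its parameters are hypothetical and are not Krishiv's API; barriers and checkpointing are omitted.

```python
def tumbling_window_counts(events, size, max_delay):
    """Hypothetical helper: count events per (event-time tumbling window, key).

    events: (event_time, key) pairs in arrival order, possibly out of order.
    A window [start, start + size) fires once the watermark reaches its end.
    The watermark is the largest event time seen so far minus max_delay, i.e.
    we assume no event arrives more than max_delay time units late.
    Returns (emitted, dropped): emitted holds (window_start, key, count) in
    firing order; dropped counts events whose window had already fired.
    """
    open_windows = {}          # (window_start, key) -> running count
    watermark = float("-inf")  # nothing seen yet, so nothing can be closed
    emitted, dropped = [], 0

    for event_time, key in events:
        start = event_time - event_time % size
        if start + size <= watermark:
            # The watermark has passed this window's end: the event is too late.
            dropped += 1
            continue
        open_windows[(start, key)] = open_windows.get((start, key), 0) + 1

        # Advance the watermark (it never moves backwards).
        watermark = max(watermark, event_time - max_delay)

        # Fire, in window order, every window the watermark has closed.
        for window in sorted(w for w in open_windows if w[0] + size <= watermark):
            emitted.append((window[0], window[1], open_windows.pop(window)))

    # End of input behaves like a final watermark of +infinity: flush everything.
    for window in sorted(open_windows):
        emitted.append((window[0], window[1], open_windows[window]))
    return emitted, dropped


# Usage: 10-unit windows, events allowed to be up to 5 units out of order.
events = [(1, "a"), (4, "a"), (12, "a"), (8, "a"), (16, "a"), (3, "a"), (25, "a")]
emitted, dropped = tumbling_window_counts(events, size=10, max_delay=5)
print(emitted, dropped)  # [(0, 'a', 3), (10, 'a', 2), (20, 'a', 1)] 1
```

The out-of-order event at time 8 is still counted because the watermark (7) has not yet passed the end of window [0, 10). The event at time 3 is dropped because the watermark had already advanced to 11. Production engines typically route such late events to a side output instead of discarding them silently.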

Pick DataFusion if you want to embed a SQL engine in a Rust application and you do not need distributed execution, streaming, or IVM. Pick Krishiv if you want all of the above and are willing to accept a larger API surface and a less mature codebase.

vs. DuckDB

DuckDB is an excellent single-node analytical SQL engine with mature Parquet, CSV, and JSON support. It is faster than Krishiv for many pure batch workloads on a single host.

Pick DuckDB for single-node analytical SQL when you do not need streaming, IVM, or distributed execution. Pick Krishiv if you need streaming, IVM, or a distributed runtime, or if you want the same APIs from Python, Rust, and SQL.

Honest trade-offs

You give up	You gain
Maturity of Spark/Flink in production at scale	One engine, one API surface, one binary
DuckDB's single-node analytical performance	Streaming + IVM in the same runtime
DataFusion's library-only footprint	Distributed scheduler, connectors, Python bindings
JVM tooling and ecosystem	Rust-native: smaller binary, more predictable latency (no garbage-collection pauses), no JVM tuning

Before you commit

Read the Feature Maturity page. Anything marked Experimental, Preview, or Planned is not production-ready today. The recipes show what works end-to-end, each labeled with its current status.
