Available

Why Krishiv

An honest comparison with Spark, Flink, DataFusion, and DuckDB, including the trade-offs.

This page is written for someone evaluating whether to adopt Krishiv. It is opinionated but tries to be honest about where Krishiv is not the right choice.

TL;DR

Krishiv is for teams that want batch, streaming, and incremental view maintenance (IVM) from the same engine and the same APIs, written in Rust, and that are willing to accept that some capabilities (IVM on the distributed executor, end-to-end exactly-once, full Iceberg certification) are still maturing.

If you only need single-node batch SQL, DuckDB is more mature and faster for many workloads. If you need a battle-tested distributed batch + streaming stack today, Spark + Flink is the safer choice. If you want a Rust SQL engine you can embed but do not need streaming or incremental view maintenance, DataFusion standalone is leaner.

vs. Spark + Flink

| Dimension | Spark + Flink | Krishiv |
|---|---|---|
| Batch SQL | Spark SQL (mature, very large ecosystem) | DataFusion-backed (mature, smaller ecosystem) |
| Streaming | Flink (mature, certified for exactly-once at scale) | Same runtime as batch (Available for in-process; Preview for end-to-end pipelines) |
| Incremental views | Materialized views in Flink, separate tooling elsewhere | First-class via IncrementalFlow (Experimental) |
| APIs | Java/Scala/Python, multiple disjoint APIs | Rust, Python, SQL: one shared plan |
| Runtime | JVM (Spark), JVM (Flink) | Rust + Tokio (single binary, no JVM) |
| Ecosystem maturity | 10+ years, broad connector coverage | Early (see Maturity) |
| Lakehouse formats | Delta Lake, Iceberg, Hudi connectors | Iceberg-first (Preview), Delta/Hudi sources (Preview) |
| Exactly-once | Certified at scale in production deployments | Available for specific certified source/sink/checkpoint combinations only |

Pick Spark + Flink if you need a battle-tested, large-ecosystem system today. Pick Krishiv if you want one engine, one set of APIs, and a Rust-native binary — and you are willing to grow with it.

vs. Apache DataFusion (standalone)

DataFusion is the SQL engine inside Krishiv. Krishiv adds:

  • A distributed scheduler and executor (DataFusion is a single-process library).
  • A streaming runtime with event-time windows, watermarks, and barriers.
  • Stateful operators with RocksDB-backed keyed state and checkpointing.
  • Connectors for Kafka, Iceberg, S3/ADLS/GCS, vector stores.
  • An incremental view-maintenance runtime (IncrementalFlow).
  • A Python binding and a CLI.
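
To make the streaming item above more concrete, here is a minimal sketch of what event-time tumbling windows with a watermark involve in general. It is plain Python and does not use any Krishiv API; `TumblingWindowCounter`, `WINDOW_MS`, and `MAX_OUT_OF_ORDER_MS` are hypothetical names chosen for illustration, and the watermark rule (largest event time seen minus a fixed bound) is an assumption, one common strategy among several. Barriers and checkpointing are not shown.

```python
from collections import defaultdict

WINDOW_MS = 10_000            # tumbling window size (hypothetical value)
MAX_OUT_OF_ORDER_MS = 2_000   # how far the watermark trails the max event time (hypothetical value)


class TumblingWindowCounter:
    """Counts events per key in event-time tumbling windows.

    Illustrative only: this is not Krishiv's streaming runtime.
    """

    def __init__(self, window_ms=WINDOW_MS, max_out_of_order_ms=MAX_OUT_OF_ORDER_MS):
        self.window_ms = window_ms
        self.max_out_of_order_ms = max_out_of_order_ms
        self.max_event_time = None
        self.open_windows = defaultdict(int)  # (window_start, key) -> count

    @property
    def watermark(self):
        """Event time before which no further events are expected."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.max_out_of_order_ms

    def on_event(self, event_time_ms, key):
        """Ingest one event; return (window_start, key, count) for each window the watermark has closed."""
        window_start = event_time_ms - event_time_ms % self.window_ms
        wm = self.watermark
        if wm is not None and window_start + self.window_ms <= wm:
            return []  # late event: its window was already emitted, so drop it
        self.open_windows[(window_start, key)] += 1
        if self.max_event_time is None or event_time_ms > self.max_event_time:
            self.max_event_time = event_time_ms
        return self._emit_closed()

    def _emit_closed(self):
        wm = self.watermark
        closed = sorted(k for k in self.open_windows if k[0] + self.window_ms <= wm)
        return [(start, key, self.open_windows.pop((start, key))) for start, key in closed]


counter = TumblingWindowCounter()
for ts, key in [(1_000, "a"), (4_000, "a"), (9_000, "b"), (12_000, "a")]:
    for start, k, n in counter.on_event(ts, key):
        print(f"window [{start}, {start + WINDOW_MS}) key={k} count={n}")
```

In this sketch the first window is emitted only when the event at 12,000 ms pushes the watermark to 10,000 ms. A batch engine has no equivalent notion, because it sees all rows before it computes anything.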

Pick DataFusion if you want to embed a SQL engine in a Rust application and you do not need distributed execution, streaming, or IVM. Pick Krishiv if you want all of the above and are willing to accept a larger API surface and a less mature codebase.
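
For readers new to incremental view maintenance, the following minimal sketch shows the general idea, assuming a grouped SUM/COUNT view and signed row deltas (+1 for an insert, -1 for a delete). It is plain Python and is not the IncrementalFlow API; `recompute_view` and `IncrementalView` are hypothetical names used only for illustration.

```python
def recompute_view(rows):
    """Full recompute: the equivalent of SELECT key, SUM(amount), COUNT(*) GROUP BY key over all rows."""
    view = {}
    for key, amount in rows:
        total, count = view.get(key, (0, 0))
        view[key] = (total + amount, count + 1)
    return view


class IncrementalView:
    """Maintains the same aggregate by applying only the changed rows.

    Illustrative only: real IVM runtimes handle many more operators than this.
    """

    def __init__(self):
        self.state = {}  # key -> (sum, count)

    def apply(self, delta):
        """delta: iterable of (key, amount, weight), weight +1 for insert and -1 for delete."""
        for key, amount, weight in delta:
            total, count = self.state.get(key, (0, 0))
            total += weight * amount
            count += weight
            if count == 0:
                self.state.pop(key, None)  # no rows left for this key
            else:
                self.state[key] = (total, count)
        return self.state


base = [("eu", 10), ("us", 5), ("eu", 7)]
view = IncrementalView()
view.apply((key, amount, +1) for key, amount in base)

# One insert and one delete arrive; only these two rows are processed.
view.apply([("us", 3, +1), ("eu", 10, -1)])

# The incrementally maintained state matches a full recompute over the new table.
assert view.state == recompute_view([("us", 5), ("eu", 7), ("us", 3)])
```

The cost of `apply` depends on the size of the delta rather than the size of the table, which is the property that makes a maintained view attractive when the input changes continuously.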

vs. DuckDB

DuckDB is an excellent single-node analytical SQL engine with mature Parquet, CSV, and JSON support. It is faster than Krishiv for many pure batch workloads on a single host.

Pick DuckDB for single-node analytical SQL when you do not need streaming, IVM, or distributed execution. Pick Krishiv if you need streaming, IVM, or a distributed runtime, or if you want the same APIs from Python, Rust, and SQL.

Honest trade-offs

| You give up | You gain |
|---|---|
| Maturity of Spark/Flink in production at scale | One engine, one API surface, one binary |
| DuckDB's single-node analytical performance | Streaming + IVM in the same runtime |
| DataFusion's library-only footprint | Distributed scheduler, connectors, Python bindings |
| JVM tooling and ecosystem | Rust-native: smaller binary, predictable performance, no JVM tuning |

Before you commit

Read the Feature Maturity page. Anything marked Preview, Experimental, or Planned is not production-ready today. The recipes show what works end to end, each with its current status label.