Exactly-once pipeline
Build a pipeline with exactly-once delivery using a certified source/sink/checkpoint combination.
Important: "Exactly-once" in Krishiv is a property of a specific source + sink + checkpoint combination, not a global guarantee. Use this recipe only with a certified combination; otherwise prefer at-least-once with an idempotent sink.
What "exactly-once" means here
The end-to-end delivery guarantee is the weakest guarantee supplied by the source, sink, checkpoint storage, and durability profile. Exactly-once requires all four:
| Component | Requirement |
|---|---|
| Source | Supports offset/position tracking (e.g. Kafka with group.id). |
| Sink | Transactional or two-phase. Output is committed atomically with the checkpoint. |
| Checkpoint storage | Atomic, fenced. Object store + etcd under distributed-durable. |
| Coordinator | Fenced with an epoch token so stale completions are rejected. |
Skeleton (Kafka → Iceberg, distributed-durable)
export KRISHIV_DURABILITY_PROFILE=distributed-durable
export KRISHIV_COORDINATOR=https://coord.internal:50051
export KRISHIV_COORDINATOR_BEARER_TOKEN=...
export KRISHIV_SHUFFLE_OBJECT_STORE_URI=s3://bucket/shuffle/
-- A transactional Iceberg sink
CREATE SINK orders_eo
TYPE ICEBERG
OPTIONS (
'catalog.uri' = 'http://catalog:8181',
'warehouse' = 's3://bucket/wh',
'commit' = 'transactional'
);
START PIPELINE orders_raw TO orders_eo
AS SELECT * FROM orders_raw;
Verify
- Force-kill an executor mid-pipeline; the coordinator fences it and restarts from the last committed checkpoint.
- Replay the source; the sink should not produce duplicate committed snapshots.
- Inspect commit metadata on the Iceberg table to confirm the snapshot lineage.
See Connectors for the certified source/sink matrix and Checkpointing for the protocol details.