DataFrame 101 — filter, group, sort
Use the typed DataFrame API instead of SQL.
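The DataFrame API is the typed, chainable alternative to SQL. Use whichever fits the call site; both go through the same planner. The example below reads an orders file, keeps only paid orders, totals the amount and counts rows per region and category, and returns the 20 groups with the largest totals, first in Python and then in Rust.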
Python
import krishiv as ks
from krishiv.functions import col, lit, sum, count

session = ks.Session.embedded()

top = (session.read_parquet("data/orders.parquet")
       .filter(col("status") == lit("paid"))
       .group_by(["region", "category"])
       .agg([sum(col("amount")).alias("total"),
             count(col("*")).alias("n")])
       .order_by(["total"], ascending=False)
       .limit(20))

top.show()
Rust
use krishiv_api::{col, count_all, lit, sum, Session};

#[tokio::main]
async fn main() -> krishiv_api::Result<()> {
    let session = Session::embedded().await?;
    let df = session.read_parquet("data/orders.parquet").await?
        .filter(col("status").eq(lit("paid")))?
        .group_by(vec![col("region"), col("category")])?
        .agg(vec![
            sum(col("amount")).alias("total"),
            count_all().alias("n"),
        ])?
        .sort(vec![col("total").desc()])?
        .limit(20);
    df.show().await?;
    Ok(())
}
SQL or DataFrame?
| Use SQL when… | Use DataFrame when… |
|---|---|
| The query is one-off, ad-hoc, or shared with analysts. | You are composing a pipeline programmatically. |
| You want to keep the query string portable. | You want compile-time checking of column names and types. |
| You need window functions or MATCH_RECOGNIZE. | You are building a library or framework on top of Krishiv. |
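
To make the comparison concrete, here is a rough sketch of the same filter, group, sort query written as SQL. It is an illustration under assumptions, not Krishiv code: it uses Python's built-in `sqlite3` module as a stand-in engine because Krishiv's SQL entry point is not shown on this page, and the sample rows, the `orders` table name, and the `top_groups` helper are hypothetical.

```python
import sqlite3

# Hypothetical sample rows standing in for data/orders.parquet:
# (region, category, status, amount)
ORDERS = [
    ("north", "books", "paid", 40.0),
    ("north", "books", "paid", 60.0),
    ("north", "toys", "refunded", 25.0),
    ("south", "toys", "paid", 30.0),
    ("south", "books", "pending", 15.0),
]

# Same steps as the DataFrame chain: filter, group, aggregate, sort, limit.
QUERY = """
    SELECT region, category, SUM(amount) AS total, COUNT(*) AS n
    FROM orders
    WHERE status = 'paid'
    GROUP BY region, category
    ORDER BY total DESC
    LIMIT 20
"""


def top_groups(rows):
    """Load rows into an in-memory SQLite table (a stand-in engine) and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders "
            "(region TEXT, category TEXT, status TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_groups(ORDERS):
        print(row)
```

Each clause maps onto one DataFrame call: `WHERE` to `filter`, `GROUP BY` to `group_by`, the `SUM`/`COUNT` expressions to `agg`, `ORDER BY ... DESC` to the descending sort, and `LIMIT` to `limit`. The query string is easy to hand to an analyst as-is, while the chained form is easier to assemble step by step in code.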