Available

DataFrame 101 — filter, group, sort

Use the typed DataFrame API instead of SQL.

The DataFrame API is the typed, chainable alternative to SQL. Use whichever fits the call site; both go through the same planner.

Python

import krishiv as ks
from krishiv.functions import col, lit, sum, count

session = ks.Session.embedded()

top = (session.read_parquet("data/orders.parquet")
    .filter(col("status") == lit("paid"))
    .group_by(["region", "category"])
    .agg([sum(col("amount")).alias("total"),
          count(col("*")).alias("n")])
    .order_by(["total"], ascending=False)
    .limit(20))
top.show()
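
To make the semantics of the chain concrete, the sketch below reproduces the same five steps (filter, group, aggregate, sort, limit) in plain Python over a few in-memory rows. It does not use Krishiv at all: the sample rows and the `top_groups` helper are made up for illustration, and only show what result the pipeline is expected to describe.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for data/orders.parquet.
orders = [
    {"status": "paid", "region": "EU", "category": "books", "amount": 30.0},
    {"status": "paid", "region": "EU", "category": "books", "amount": 20.0},
    {"status": "paid", "region": "US", "category": "toys", "amount": 75.0},
    {"status": "refunded", "region": "US", "category": "toys", "amount": 99.0},
    {"status": "paid", "region": "US", "category": "books", "amount": 10.0},
]


def top_groups(rows, limit=20):
    """Plain-Python stand-in for the filter/group_by/agg/order_by/limit chain."""
    totals = defaultdict(lambda: {"total": 0.0, "n": 0})
    for row in rows:
        if row["status"] != "paid":               # filter: status == "paid"
            continue
        key = (row["region"], row["category"])    # group_by: region, category
        totals[key]["total"] += row["amount"]     # agg: sum(amount) as total
        totals[key]["n"] += 1                     # agg: row count as n
    result = [
        {"region": region, "category": category, **agg}
        for (region, category), agg in totals.items()
    ]
    result.sort(key=lambda g: g["total"], reverse=True)  # order_by: total, descending
    return result[:limit]                                # limit


top = top_groups(orders)
for group in top:
    print(group)
```

The refunded row is dropped before grouping, so it contributes to neither `total` nor `n`.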

Rust

use krishiv_api::{col, count_all, lit, sum, Session};

#[tokio::main]
async fn main() -> krishiv_api::Result<()> {
    let session = Session::embedded().await?;
    let df = session.read_parquet("data/orders.parquet").await?
        .filter(col("status").eq(lit("paid")))?
        .group_by(vec![col("region"), col("category")])?
        .agg(vec![
            sum(col("amount")).alias("total"),
            count_all().alias("n"),
        ])?
        .sort(vec![col("total").desc()])?
        .limit(20);
    df.show().await?;
    Ok(())
}

SQL or DataFrame?

| Use SQL when… | Use DataFrame when… |
| --- | --- |
| The query is one-off, ad-hoc, or shared with analysts. | You are composing a pipeline programmatically. |
| You want to keep the query string portable. | You want compile-time checking of column names and types. |
| You need window functions or MATCH_RECOGNIZE. | You are building a library or framework on top of Krishiv. |
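
For comparison, here is roughly what the DataFrame chain above looks like as a SQL string. This page does not show Krishiv's SQL entry point, so the sketch does not call it; it runs the query against Python's built-in `sqlite3` module as a stand-in engine, over made-up rows in a table assumed to be named `orders`.

```python
import sqlite3

# SQL counterpart of the DataFrame chain: filter, group, aggregate, sort, limit.
QUERY = """
SELECT region, category, SUM(amount) AS total, COUNT(*) AS n
FROM orders
WHERE status = 'paid'
GROUP BY region, category
ORDER BY total DESC
LIMIT 20
"""

# sqlite3 is only a stand-in engine here, not Krishiv.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (status TEXT, region TEXT, category TEXT, amount REAL)"
)
# Hypothetical sample rows for illustration.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("paid", "EU", "books", 30.0),
        ("paid", "EU", "books", 20.0),
        ("paid", "US", "toys", 75.0),
        ("refunded", "US", "toys", 99.0),
        ("paid", "US", "books", 10.0),
    ],
)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```

The query is a single string that can be handed to an analyst or pasted into another SQL tool, which is the portability trade-off the table describes. The DataFrame form instead builds the same steps from composable method calls.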