Available

Health & Status

Liveness, readiness, scheduler status endpoints, and the ObservabilityReport types.

Every long-running Krishiv process exposes a small set of HTTP endpoints for liveness, readiness, and status. The CLI also has commands that return machine-readable JSON for use in scripts and CI.

HTTP endpoints

Path | Process | Purpose
GET /healthz | coordinator, clusterd, executor, UI | Liveness. Returns 200 OK if the process is alive. Anonymous in all profiles.
GET /readyz | coordinator, clusterd, executor | Readiness. Returns 200 OK if the process can serve traffic. Requires auth in production.
GET /metrics | coordinator, executor, UI | Prometheus text format.
GET /api/v1/openapi.json | coordinator | OpenAPI 3.1 spec for the management API.
GET /api/v1/jobs | coordinator | List jobs (paginated with ?limit=&offset=).
GET /api/v1/jobs/{id} | coordinator | Job detail with stages and tasks.
GET /api/v1/executors | coordinator | List executors and their health.
GET /api/v1/queues | coordinator | Namespace quota snapshot.
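
The probe endpoints above can be exercised with nothing but the Rust standard library. The following is a minimal sketch, not part of Krishiv: the helper names (build_get, status_code, probe) are hypothetical, and it assumes the process listens on plain HTTP without TLS. The token argument is optional because /healthz is anonymous while /readyz requires auth in production.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Build a minimal HTTP/1.1 GET request (hypothetical helper).
/// Pass `None` for anonymous endpoints such as /healthz.
pub fn build_get(host: &str, path: &str, token: Option<&str>) -> String {
    let mut req = format!("GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n");
    if let Some(t) = token {
        req.push_str(&format!("Authorization: Bearer {t}\r\n"));
    }
    req.push_str("\r\n"); // blank line terminates the header block
    req
}

/// Extract the numeric status code from the first line of a raw response,
/// e.g. "HTTP/1.1 200 OK" yields Some(200).
pub fn status_code(response: &str) -> Option<u16> {
    response.lines().next()?.split_whitespace().nth(1)?.parse().ok()
}

/// Connect to `addr` (host:port), send the GET, and return the status code.
/// Assumes plain HTTP; a TLS-terminated endpoint would need an HTTP client crate.
pub fn probe(addr: &str, path: &str, token: Option<&str>) -> std::io::Result<Option<u16>> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_secs(2)))?;
    stream.write_all(build_get(addr, path, token).as_bytes())?;
    let mut raw = Vec::new();
    // `Connection: close` lets us read until the server hangs up.
    stream.read_to_end(&mut raw)?;
    Ok(status_code(&String::from_utf8_lossy(&raw)))
}
```

A caller would treat Ok(Some(200)) from probe("127.0.0.1:8080", "/healthz", None) as "alive" and anything else as a failure; the address here is a placeholder, not a documented default.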

CLI status commands

# List running and recent jobs
krishiv jobs [--distributed]

# Inspect operator state for a job
krishiv state inspect --job my-pipeline --operator my-operator

# Trigger a savepoint
krishiv savepoint --job my-pipeline --label before-deploy

# Show local or cluster status
krishiv local status
krishiv cluster status

Typed status reports

Programmatic consumers should use the typed report structs defined in the krishiv_metrics::observability_report module:

use krishiv_metrics::ObservabilityReport;

// `build_report`, `coordinator`, and `executors` are assumed to be in scope here.
let report: ObservabilityReport = build_report(&coordinator, &executors);
for job in &report.jobs {
    println!("{} state={:?} rows={}", job.id, job.state, job.total_rows);
}
for ex in &report.executors {
    println!("{} slots={}/{} lost={}", ex.id, ex.slots_used, ex.slots_total, ex.lost_count);
}

Sub-types: ReportJob, ReportStage, ReportTask, ReportRuntimeStats, ReportExecutor, ReportCheckpoint, ReportShuffle, ReportStreamingState, ReportEvent, ReportConnectorMetrics.
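
The executor fields printed above lend themselves to simple scripted checks. The sketch below uses a local stand-in struct, ExecutorView, because the real ReportExecutor definition is not shown here; the field types and both helper functions are assumptions for illustration only.

```rust
/// Stand-in for the executor fields used in the example above.
/// The real type is ReportExecutor; these field types are assumed.
#[derive(Debug, Clone)]
pub struct ExecutorView {
    pub id: String,
    pub slots_used: u32,
    pub slots_total: u32,
    pub lost_count: u64,
}

/// Fraction of slots in use across all executors.
/// Returns 0.0 when no slots are registered, to avoid dividing by zero.
pub fn slot_utilization(executors: &[ExecutorView]) -> f64 {
    let used: u64 = executors.iter().map(|e| u64::from(e.slots_used)).sum();
    let total: u64 = executors.iter().map(|e| u64::from(e.slots_total)).sum();
    if total == 0 {
        0.0
    } else {
        used as f64 / total as f64
    }
}

/// IDs of executors that report a non-zero lost_count.
pub fn executors_with_losses(executors: &[ExecutorView]) -> Vec<&str> {
    executors
        .iter()
        .filter(|e| e.lost_count > 0)
        .map(|e| e.id.as_str())
        .collect()
}
```

With the real report, the same logic would iterate over report.executors directly instead of copying into a stand-in type.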

System metrics

For capacity planning, krishiv_metrics::system_metrics() -> &'static SystemMetrics exposes:

  • CPU cores (logical)
  • Total and available memory bytes
  • Hostname, OS, kernel version
  • Process ID, uptime seconds

Auth on management endpoints

All endpoints except /healthz require a bearer token in production profiles. Set KRISHIV_COORDINATOR_BEARER_TOKEN (or the file / multi-token variants) before starting the coordinator. The UI also accepts a separate KRISHIV_UI_TOKEN.
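
On the client side, the rule above can be encoded in a few lines. This is a minimal sketch with hypothetical helpers (needs_token, bearer_header, header_from_env); it assumes the standard "Authorization: Bearer" header form, and that non-production profiles do not require a token, which the text implies but does not state. Whether a client should read the same environment variable as the coordinator is a deployment choice, so the variable name is a parameter.

```rust
use std::env;

/// True when a request to `path` must carry a bearer token.
/// Only /healthz is exempt in production; any query string is ignored.
pub fn needs_token(path: &str, production: bool) -> bool {
    let bare = path.split('?').next().unwrap_or(path);
    production && bare != "/healthz"
}

/// Format the header line for a token, or None if the token is blank.
pub fn bearer_header(token: &str) -> Option<String> {
    let token = token.trim();
    if token.is_empty() {
        None
    } else {
        Some(format!("Authorization: Bearer {token}"))
    }
}

/// Read a token from the named environment variable and format the header.
/// Returns None if the variable is unset, not valid Unicode, or blank.
pub fn header_from_env(var: &str) -> Option<String> {
    env::var(var).ok().and_then(|t| bearer_header(&t))
}
```

For example, a script might call header_from_env("KRISHIV_COORDINATOR_BEARER_TOKEN") and refuse to query /api/v1/jobs in production when the result is None.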
