Health & Status
Liveness, readiness, scheduler status endpoints, and the ObservabilityReport types.
Every long-running Krishiv process exposes a small set of HTTP endpoints for liveness, readiness, and status. The CLI also has commands that return machine-readable JSON for use in scripts and CI.
HTTP endpoints
| Path | Process | Purpose |
|---|---|---|
| GET /healthz | coordinator, clusterd, executor, UI | Liveness. Returns 200 OK if the process is alive. Anonymous in all profiles. |
| GET /readyz | coordinator, clusterd, executor | Readiness. Returns 200 OK if the process can serve traffic. Requires auth in production. |
| GET /metrics | coordinator, executor, UI | Prometheus text format. |
| GET /api/v1/openapi.json | coordinator | OpenAPI 3.1 spec for the management API. |
| GET /api/v1/jobs | coordinator | List jobs (paginated with ?limit=&offset=). |
| GET /api/v1/jobs/{id} | coordinator | Job detail with stages and tasks. |
| GET /api/v1/executors | coordinator | List executors and their health. |
| GET /api/v1/queues | coordinator | Namespace quota snapshot. |
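As a rough illustration of how a script might poll the liveness and readiness paths listed above, here is a minimal sketch using only the Rust standard library. The address 127.0.0.1:8080 and the helper names build_probe_request, parse_status_code, and probe are assumptions made for this example and are not part of Krishiv. The sketch sends no credentials, so /readyz may be rejected under a production profile.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Build a minimal HTTP/1.1 GET request for a probe path such as "/healthz".
fn build_probe_request(host: &str, path: &str) -> String {
    format!("GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n")
}

/// Extract the numeric status code from the first line of an HTTP response.
fn parse_status_code(response: &str) -> Option<u16> {
    let status_line = response.lines().next()?;
    let mut parts = status_line.split_whitespace();
    let version = parts.next()?;
    if !version.starts_with("HTTP/") {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Send the probe and report whether the endpoint answered 200.
fn probe(addr: &str, path: &str) -> std::io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_secs(2)))?;
    stream.write_all(build_probe_request(addr, path).as_bytes())?;
    let mut raw = Vec::new();
    stream.read_to_end(&mut raw)?;
    let text = String::from_utf8_lossy(&raw);
    Ok(parse_status_code(&text) == Some(200))
}

fn main() {
    // Hypothetical address: substitute the one your process listens on.
    let addr = "127.0.0.1:8080";
    for path in ["/healthz", "/readyz"] {
        match probe(addr, path) {
            Ok(ok) => println!("{path}: {}", if ok { "ok" } else { "not ok" }),
            Err(err) => println!("{path}: unreachable ({err})"),
        }
    }
}
```

Keeping request construction and status parsing in separate pure functions lets them be checked without a running process; only probe touches the network.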
CLI status commands
# List running and recent jobs
krishiv jobs [--distributed]
# Inspect operator state for a job
krishiv state inspect --job my-pipeline --operator my-operator
# Trigger a savepoint
krishiv savepoint --job my-pipeline --label before-deploy
# Show the cluster status
krishiv local status
krishiv cluster status
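For scripts and CI, one way to call the status subcommands above from Rust is sketched below. It assumes the krishiv binary is on PATH and that a zero exit status indicates success; the source does not document exit codes or the output format, so both are assumptions. Scope, status_args, and run_status are hypothetical names for this example.

```rust
use std::process::Command;

/// The two status subcommands shown above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Scope {
    Local,
    Cluster,
}

/// Map a scope to the CLI arguments `local status` or `cluster status`.
fn status_args(scope: Scope) -> [&'static str; 2] {
    match scope {
        Scope::Local => ["local", "status"],
        Scope::Cluster => ["cluster", "status"],
    }
}

/// Run the status command and return (exit status was success, captured stdout).
fn run_status(binary: &str, scope: Scope) -> std::io::Result<(bool, String)> {
    let output = Command::new(binary).args(status_args(scope)).output()?;
    let stdout = String::from_utf8_lossy(&output.stdout).into_owned();
    Ok((output.status.success(), stdout))
}

fn main() {
    match run_status("krishiv", Scope::Cluster) {
        Ok((true, stdout)) => print!("{stdout}"),
        Ok((false, _)) => {
            // Assumption: a non-zero exit status means the check failed.
            eprintln!("krishiv cluster status exited with a failure status");
            std::process::exit(1);
        }
        Err(err) => {
            eprintln!("could not run krishiv: {err}");
            std::process::exit(1);
        }
    }
}
```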
Typed status reports
Programmatic consumers should use the typed report structs from the observability_report module of the krishiv-metrics crate:
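use krishiv_metrics::ObservabilityReport;

// build_report, coordinator, and executors are assumed to be in scope.
let report: ObservabilityReport = build_report(&coordinator, &executors);

// One line per job: identifier, state, and total row count.
for job in &report.jobs {
    println!("{} state={:?} rows={}", job.id, job.state, job.total_rows);
}

// One line per executor: slot usage and lost count.
for ex in &report.executors {
    println!("{} slots={}/{} lost={}", ex.id, ex.slots_used, ex.slots_total, ex.lost_count);
}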
Sub-types: ReportJob, ReportStage, ReportTask, ReportRuntimeStats, ReportExecutor, ReportCheckpoint, ReportShuffle, ReportStreamingState, ReportEvent, ReportConnectorMetrics.
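To show how the executor fields printed above could feed a simple summary, the sketch below uses a local stand-in struct rather than the real ReportExecutor type. It mirrors only the four fields that appear in the loop; the integer types, the ExecutorRow name, and the helper functions are assumptions for illustration.

```rust
/// Local stand-in for an executor entry. Field names follow the loop above;
/// the field types are assumptions, not the crate's actual definitions.
struct ExecutorRow {
    id: String,
    slots_used: u32,
    slots_total: u32,
    lost_count: u32,
}

/// Convenience constructor for the stand-in rows.
fn row(id: &str, slots_used: u32, slots_total: u32, lost_count: u32) -> ExecutorRow {
    ExecutorRow { id: id.to_string(), slots_used, slots_total, lost_count }
}

/// Fraction of slots in use across all executors, or None when there are no slots.
fn slot_utilization(executors: &[ExecutorRow]) -> Option<f64> {
    let total: u32 = executors.iter().map(|e| e.slots_total).sum();
    if total == 0 {
        return None;
    }
    let used: u32 = executors.iter().map(|e| e.slots_used).sum();
    Some(f64::from(used) / f64::from(total))
}

/// IDs of executors whose lost_count is non-zero.
fn executors_with_losses(executors: &[ExecutorRow]) -> Vec<&str> {
    executors
        .iter()
        .filter(|e| e.lost_count > 0)
        .map(|e| e.id.as_str())
        .collect()
}

fn main() {
    let executors = vec![row("ex-1", 3, 4, 0), row("ex-2", 1, 4, 2)];
    if let Some(u) = slot_utilization(&executors) {
        println!("slot utilization: {:.0}%", u * 100.0);
    }
    println!("executors with losses: {:?}", executors_with_losses(&executors));
}
```

With the real report, the same helpers would presumably take report.executors as input once the field types are adjusted to match the crate.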
System metrics
For capacity planning, krishiv_metrics::system_metrics() returns a &'static SystemMetrics that exposes:
- CPU cores (logical)
- Total and available memory bytes
- Hostname, OS, kernel version
- Process ID, uptime seconds
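As an illustration of how the core and memory figures might be used in capacity planning, the sketch below works on a local stand-in struct. The field names of the real SystemMetrics type are not given here, so HostSnapshot, its fields, and the bytes_per_task parameter are all hypothetical.

```rust
/// Hypothetical stand-in carrying the CPU and memory figures listed above.
struct HostSnapshot {
    logical_cores: u32,
    total_memory_bytes: u64,
    available_memory_bytes: u64,
}

/// Fraction of memory currently in use, or None if the total is reported as zero.
fn memory_used_fraction(host: &HostSnapshot) -> Option<f64> {
    if host.total_memory_bytes == 0 {
        return None;
    }
    let used = host.total_memory_bytes.saturating_sub(host.available_memory_bytes);
    Some(used as f64 / host.total_memory_bytes as f64)
}

/// Number of additional tasks the host could take, bounded by both
/// available memory and logical core count.
fn task_headroom(host: &HostSnapshot, bytes_per_task: u64) -> u64 {
    if bytes_per_task == 0 {
        return 0;
    }
    let by_memory = host.available_memory_bytes / bytes_per_task;
    by_memory.min(u64::from(host.logical_cores))
}

fn main() {
    const GIB: u64 = 1 << 30;
    let host = HostSnapshot {
        logical_cores: 8,
        total_memory_bytes: 16 * GIB,
        available_memory_bytes: 4 * GIB,
    };
    println!("memory used fraction: {:?}", memory_used_fraction(&host));
    println!("task headroom at 1 GiB per task: {}", task_headroom(&host, GIB));
}
```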
Auth on management endpoints
All endpoints except /healthz require a bearer token in production profiles. Set KRISHIV_COORDINATOR_BEARER_TOKEN (or the file / multi-token variants) before starting the coordinator. The UI also accepts a separate KRISHIV_UI_TOKEN.
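A client calling an authenticated endpoint would typically send the token in a standard Authorization: Bearer header. The sketch below builds such a request for the paginated jobs listing. Reading the token from KRISHIV_COORDINATOR_BEARER_TOKEN on the client side, the address 127.0.0.1:8080, and the helper names jobs_path and authorized_get are assumptions for this example.

```rust
use std::env;

/// Path for the paginated jobs listing, using the limit and offset query parameters.
fn jobs_path(limit: u32, offset: u32) -> String {
    format!("/api/v1/jobs?limit={limit}&offset={offset}")
}

/// Build a GET request, attaching a bearer token when one is supplied.
fn authorized_get(host: &str, path: &str, token: Option<&str>) -> String {
    let mut request = format!("GET {path} HTTP/1.1\r\nHost: {host}\r\n");
    if let Some(token) = token {
        request.push_str(&format!("Authorization: Bearer {token}\r\n"));
    }
    request.push_str("Connection: close\r\n\r\n");
    request
}

fn main() {
    // Assumption: the client reuses the variable that configures the coordinator.
    let token = env::var("KRISHIV_COORDINATOR_BEARER_TOKEN").ok();
    // Hypothetical address: substitute the coordinator's actual listen address.
    let request = authorized_get("127.0.0.1:8080", &jobs_path(50, 0), token.as_deref());
    // Avoid echoing the secret: report only whether a token was attached.
    println!("token attached: {}", token.is_some());
    println!("request size: {} bytes", request.len());
}
```

The resulting string can be written to a TCP stream or adapted to whatever HTTP client the calling script already uses.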