Maintenance & Incident Support
While operating an on-premise Steadybit platform, you may need to inform users about incidents and planned maintenance that affect them. To do so, you can tell the Steadybit platform about incidents and maintenance windows. The platform can then:
show banners in the user interface indicating the specific situation,
require explicit confirmation when executing experiments via the user interface and
optionally disable experiment runs entirely during an ongoing incident.
This feature is available in Steadybit platform versions later than 1.0.2.
You can use the following metric to learn whether maintenance on the Steadybit platform would affect any users:
platform.experiments.executing is exposed by the Steadybit platform as a Prometheus metric. It represents the number of currently executing experiments.
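A pre-maintenance check could read this metric from a Prometheus scrape. The following is a minimal sketch, not an official tool: it assumes the metric appears in the Prometheus text format as platform_experiments_executing (underscores in place of dots, as is common for Prometheus metric names) and that sample lines carry no trailing timestamp. The function name executing_experiments is hypothetical.

```shell
# Sum all samples of the executing-experiments metric from Prometheus text
# format read on stdin. Assumptions: the exported name uses underscores
# instead of dots, and each sample line ends with its value (no timestamp).
executing_experiments() {
  awk '
    $1 ~ /^platform_experiments_executing([{]|$)/ { sum += $NF; found = 1 }
    END { if (!found) exit 1; print sum }
  '
}
```

Pipe the scrape output of whichever endpoint your installation exposes metrics on into the function, for example `curl --silent "$METRICS_URL" | executing_experiments`, where METRICS_URL is a placeholder. A result of 0 suggests that no experiment is running at that moment; a non-zero exit status means the metric was not found in the input.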
The maintenance and incident API endpoints are only reachable through the admin port 9090 on the platform's Kubernetes deployment. This port is not exposed to users by default, and you should use it only for administrative purposes. API endpoints on port 9090 do not require authentication.
One option to reach this port from your local machine is a Kubernetes port forward.
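A port forward to the admin port could look like the sketch below. The namespace and deployment name (both steadybit-platform here) are assumptions, as is the helper name forward_admin_port; substitute the values of your own installation. The KUBECTL variable only exists so that the binary can be swapped out, for example for a dry run.

```shell
# Hypothetical defaults: replace with the namespace and deployment name
# of your own Steadybit platform installation.
NAMESPACE="${NAMESPACE:-steadybit-platform}"
DEPLOYMENT="${DEPLOYMENT:-steadybit-platform}"

# Forward local port 9090 to admin port 9090 of the platform deployment.
# Blocks until interrupted (Ctrl+C), so run it in a separate terminal.
forward_admin_port() {
  "${KUBECTL:-kubectl}" port-forward --namespace "$NAMESPACE" \
    "deployment/$DEPLOYMENT" 9090:9090
}
```

While the port forward is running, the admin endpoints are reachable at http://localhost:9090. Because these endpoints do not require authentication, stop the port forward once you are done.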
To configure a maintenance window, use the API endpoints under the path /actuator/systemstatusmaintenance. You can retrieve the maintenance configuration via HTTP GET, clear it via HTTP DELETE, or configure a maintenance window via HTTP POST.
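The three operations on the maintenance endpoint can be sketched with curl as follows. This assumes the admin port is reachable at http://localhost:9090 (for example through a port forward) and that the POST body is JSON read from a file. The helper names and the ADMIN_URL variable are hypothetical, and the fields of the request body are not covered here, so take them from the API reference of your platform version.

```shell
# Assumes the admin port is reachable locally, e.g. through a port forward.
ADMIN_URL="${ADMIN_URL:-http://localhost:9090}"
MAINTENANCE_URL="$ADMIN_URL/actuator/systemstatusmaintenance"

# Print the current maintenance configuration.
get_maintenance() {
  "${CURL:-curl}" --silent --show-error --request GET "$MAINTENANCE_URL"
}

# Configure a maintenance window from the JSON file given as $1.
# Assumption: the endpoint accepts a JSON request body.
set_maintenance() {
  "${CURL:-curl}" --silent --show-error --request POST \
    --header 'Content-Type: application/json' \
    --data "@$1" "$MAINTENANCE_URL"
}

# Remove the maintenance configuration.
clear_maintenance() {
  "${CURL:-curl}" --silent --show-error --request DELETE "$MAINTENANCE_URL"
}
```

A typical sequence would be `set_maintenance maintenance.json` before the work starts, `get_maintenance` to verify what users will see, and `clear_maintenance` once the work is finished. The file name maintenance.json is a placeholder.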
You can use incidents to inform users of service disruption. In contrast to scheduled maintenance, an incident is never scheduled and has varying severity. To configure an incident, use the API endpoints under the path /actuator/systemstatusincident. You can retrieve the incident configuration via HTTP GET, clear it via HTTP DELETE, or configure an incident via HTTP POST.
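For the incident endpoint, a single dispatching helper is one possible shape. As a sketch under the same assumptions as any local admin call (admin port reachable at http://localhost:9090, JSON request body for POST), it rejects methods the endpoint is not described as supporting. The name incident_api and the ADMIN_URL variable are hypothetical, and the body fields must come from the API reference of your platform version.

```shell
# Assumes the admin port is reachable locally, e.g. through a port forward.
ADMIN_URL="${ADMIN_URL:-http://localhost:9090}"

# Call the incident endpoint:
#   incident_api GET | incident_api DELETE | incident_api POST <json-file>
incident_api() {
  method="$1"
  body_file="${2:-}"
  url="$ADMIN_URL/actuator/systemstatusincident"
  case "$method" in
    GET|DELETE)
      "${CURL:-curl}" --silent --show-error --request "$method" "$url"
      ;;
    POST)
      if [ -z "$body_file" ]; then
        echo "usage: incident_api POST <json-file>" >&2
        return 2
      fi
      # Assumption: the endpoint accepts a JSON request body.
      "${CURL:-curl}" --silent --show-error --request POST \
        --header 'Content-Type: application/json' \
        --data "@$body_file" "$url"
      ;;
    *)
      echo "unsupported method: $method" >&2
      return 2
      ;;
  esac
}
```

For example, `incident_api POST incident.json` would announce an incident and `incident_api DELETE` would clear it again; incident.json is a placeholder file name.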
The following severities are supported by the API. The user interface will adapt its banner style according to the chosen incident severity.
UNDER_MAINTENANCE
DEGRADED_PERFORMANCE
PARTIAL_OUTAGE
MAJOR_OUTAGE
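When incidents are announced from a script, it can help to check the severity before calling the API. The helper below is a small sketch that accepts exactly the four severities listed above; the name is_valid_severity is hypothetical, and the check is case-sensitive on the assumption that the API expects the values as written.

```shell
# Succeed only for the severities listed above (case-sensitive).
is_valid_severity() {
  case "$1" in
    UNDER_MAINTENANCE|DEGRADED_PERFORMANCE|PARTIAL_OUTAGE|MAJOR_OUTAGE)
      return 0
      ;;
    *)
      return 1
      ;;
  esac
}
```

A script could guard its API call with `is_valid_severity "$SEVERITY" || exit 1`, where SEVERITY is a variable of your own.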