On-prem platform
This part of the documentation is intended only for use in a supported PoC (Proof of Concept) with the Steadybit team. Please book an appointment to scope your PoC before continuing to evaluate the on-prem solution.
If you just want to try out Steadybit, we recommend you sign up for our SaaS platform.
This page describes some common issues and how to solve them.
Platform and Postgres are in CrashLoopBackOff
Check the logs of the platform and Postgres containers
Verify that the Postgres password is correct and base64 encoded in the manifest file
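The base64 check can be sketched in a few lines of Python. This is a minimal sketch, assuming the password is stored base64-encoded in the manifest as the step above describes; the helper names and the sample password are hypothetical and not taken from any Steadybit manifest. A common pitfall is encoding the password together with a trailing newline, which produces a different encoded value and a password that Postgres rejects.

```python
import base64
import binascii


def encode_secret(plain: str) -> str:
    """Return the base64 form of a plain-text password."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret(encoded: str) -> str:
    """Decode a base64 value; raise ValueError if it is not valid base64."""
    try:
        return base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError) as exc:
        raise ValueError(f"not a valid base64-encoded string: {exc}") from exc


def has_trailing_newline(encoded: str) -> bool:
    """Detect a password that was encoded together with a trailing newline."""
    return decode_secret(encoded).endswith("\n")
```

Decoding the value from the manifest with `decode_secret` and comparing it to the password Postgres was initialised with shows whether the two actually agree.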
Create a heap dump
Prerequisites:
If the platform runs under Docker, you need a dedicated volume, or an existing one, for the heap dump mount path.
Network access to the host machine to retrieve the file.
The platform can run out of memory at the JVM level. If that happens, a heap dump might be needed for further diagnosis. To produce one, add this environment variable:
Then retrieve the heap dump, usually by copying the file from the destination to your machine:
Kubernetes namespaces and deployments show up multiple times in the landscape table
Check the logs of the agents
If you see one of these errors:
Missing permissions to create leases for the leader elections
or
Cannot perform leader election. All agents will behave as leader.
Check if the agent has the correct permissions to create leases.
Agents are not able to connect to the platform during an experiment
Check if you can reach the platform from the agent:
Check if the agent can reach the Websocket port of the platform.
This is usually port 7878 and can be configured in the platform manifest via the environment variable STEADYBIT_WEB_PUBLIC_EXPERIMENT_PORT (helm chart: platform.publicWebsocketPort).
If setting the port is not enough, you can set the URL via the environment variable STEADYBIT_WEB_PUBLIC_EXPERIMENT_URL (helm chart: platform.ingressOrigin).
Please also check your ingress configuration.
You can try to connect to the websocket port via curl:
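Where curl is not available on the agent host, basic reachability of the port can also be checked with a short Python script. This is a minimal sketch: `can_reach` is a hypothetical helper, the default port 7878 is the one mentioned above, and the check only confirms that a TCP connection can be opened. It does not perform a TLS handshake or a WebSocket upgrade, so a successful result does not rule out ingress or proxy problems.

```python
import socket


def can_reach(host: str, port: int = 7878, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Example call (the host name is a placeholder for your platform's address):
#   can_reach("platform.example.com", 7878)
```

If this returns False from the agent host but True from elsewhere, the problem is most likely a firewall, network policy, or ingress rule between the agent and the platform.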
Platform is behind Nginx and the agents are not able to connect to the platform
Error message in the platform logs:
Solution:
Set the nginx backend protocol to HTTPS instead of HTTP.
Platform is configured with an OIDC provider and the redirect to the platform is sent as http instead of https
Example error message in the browser:
Solution:
Set server.tomcat.remoteip.trusted-proxies to a regex that matches the CIDRs of the load balancer or reverse proxy. Add the following environment variable to the platform manifest (example regex for the Google Cloud Load Balancer CIDRs):
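Before putting a regex into the manifest, it can help to check that it really covers the intended ranges. The sketch below does that in Python. Everything in it is an assumption for illustration: the two CIDRs are the ranges Google documents for its load balancer proxies, `CANDIDATE` is a hypothetical regex and not necessarily the one from the Steadybit documentation, and `regex_covers_cidrs` is a made-up helper. The pattern uses only constructs that behave the same in Python and Java regular expressions, and the check uses a full match on the address string.

```python
import ipaddress
import re
from typing import Iterable

# Assumption: proxy ranges for Google Cloud Load Balancing.
CIDRS = ["35.191.0.0/16", "130.211.0.0/22"]

# Hypothetical candidate regex for the trusted-proxies setting.
CANDIDATE = r"35\.191\.\d{1,3}\.\d{1,3}|130\.211\.[0-3]\.\d{1,3}"


def regex_covers_cidrs(pattern: str, cidrs: Iterable[str]) -> bool:
    """Return True if the pattern fully matches every address in every CIDR."""
    compiled = re.compile(pattern)
    for cidr in cidrs:
        for ip in ipaddress.ip_network(cidr):
            if compiled.fullmatch(str(ip)) is None:
                return False
    return True


def regex_matches(pattern: str, address: str) -> bool:
    """Return True if the pattern fully matches a single address."""
    return re.fullmatch(pattern, address) is not None
```

It is worth checking a few addresses just outside the ranges with `regex_matches` as well, since a regex that is too broad would make the platform trust forwarded headers from unintended hosts.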