
On-prem platform


This part of the documentation is intended only for a supported PoC (Proof of Concept) run together with the Steadybit team. Please book an appointment to scope your PoC before you continue evaluating the on-prem solution.

If you just want to try out Steadybit, we recommend you sign up for our SaaS platform.

This page describes some common issues and how to solve them.

Platform and Postgres are in CrashLoopBackOff

  • Check the logs of the platform and Postgres containers

    kubectl logs -f -n steadybit-platform steadybit-platform-postgresql-0 --previous
    kubectl logs -f -n steadybit-platform steadybit-platform-0 --previous
  • Verify that the Postgres password is correct and base64 encoded in the manifest file
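The base64 check can be done on the command line. The following is a minimal sketch, assuming a shell with the standard `base64` tool; the password value is a hypothetical placeholder, not a value from any real manifest. It encodes with `printf` rather than `echo`, because a trailing newline would change the encoded result, and then decodes again to confirm the round trip.

```shell
# Hypothetical placeholder; substitute your real Postgres password.
password='example-password'

# Encode without a trailing newline (echo would append one and alter the result).
encoded=$(printf '%s' "$password" | base64)
echo "$encoded"

# Decode again to confirm that the encoded value maps back to the password.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"
```

Compare the encoded value with the one in your manifest file. If they differ, the manifest value was probably encoded with a trailing newline or from a different password.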

Create Heap dump

Prerequisites:

  • If the platform runs under Docker, you need a dedicated volume, or an existing one, for this mount path.

  • Network access to the host machine to retrieve the file.

The platform can run out of memory at the JVM level. If that happens, a heap dump may be needed for further diagnosis. To have the JVM write one, add this environment variable:

- name: JAVA_TOOL_OPTIONS
  value: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/PATH_TO_BE_MOUNTED/heapdump-%p.hprof

Then retrieve the heap dump, usually by copying the file from its destination to your machine:

scp [email protected]:/PATH_TO_BE_MOUNTED/heapdump-*.hprof /tmp/

Kubernetes namespaces and deployments show up multiple times in the landscape table

  • Check the logs of the agents

  • If you see one of these errors: "Missing permissions to create leases for the leader elections" or "Cannot perform leader election. All agents will behave as leader."

    • Check if the agent has the correct permissions to create leases.
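One way to check the lease permission is `kubectl auth can-i` with impersonation of the agent's service account. This is a sketch under assumptions: the namespace and service account names below are hypothetical and must be replaced with the ones from your agent installation, and your own user needs the right to impersonate service accounts.

```shell
# Hypothetical names; adjust both to match your agent installation.
namespace='steadybit-agent'
service_account='steadybit-agent'

# Asks the API server whether the given service account may create leases
# in the given namespace. kubectl prints "yes" or "no".
check_lease_permission() {
  kubectl auth can-i create leases.coordination.k8s.io \
    --namespace "$1" \
    --as "system:serviceaccount:$1:$2"
}
```

Call it as `check_lease_permission "$namespace" "$service_account"`. An answer of `no` suggests that the role bound to the agent's service account lacks the `create` verb on leases.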

Agents are not able to connect to the platform during an experiment

  • Check if you can reach the platform from the agent:

  • Check if the agent can reach the Websocket port of the platform.

    • This is usually port 7878 and can be configured in the platform manifest via the environment variable STEADYBIT_WEB_PUBLIC_EXPERIMENT_PORT (helm chart: platform.publicWebsocketPort)

    • If setting the port is not enough, you can set the URL via the environment variable STEADYBIT_WEB_PUBLIC_EXPERIMENT_URL (helm chart: platform.ingressOrigin)

    • Please also check your ingress configuration.

    • You can try to connect to the websocket port via curl:

Platform is behind Nginx and the agents are not able to connect to the platform

Error message in the platform logs:

Solution:

  • Set the nginx backend protocol to HTTPS instead of HTTP

Platform is configured with an OIDC provider and the redirect to the platform is sent as http instead of https

Example error message in the browser:

Solution:

Set the environment variable server.tomcat.remoteip.trusted-proxies to a regex that matches the CIDRs of the load balancer or reverse proxy. Add the following environment variable to the platform manifest: (Example regex for the Google Cloud Load Balancer CIDRs)
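Before putting a regex into the manifest, it can help to test it against a few sample addresses. The sketch below is an illustration, not the value from the Steadybit documentation: it assumes the proxy ranges 130.211.0.0/22 and 35.191.0.0/16, which you should verify against your provider's documentation, and the helper name `matches_trusted_proxy` is hypothetical. It uses `grep -E`, so the pattern is wrapped in `^(...)$` and uses `[0-9]` instead of `\d`. Tomcat matches the whole address with a Java regex, so the anchors are only needed for this local check.

```shell
# Candidate regex for the assumed ranges 130.211.0.0/22 and 35.191.0.0/16.
trusted_proxies='^(130\.211\.[0-3]\.[0-9]{1,3}|35\.191\.[0-9]{1,3}\.[0-9]{1,3})$'

# Returns success if the given IP address matches the candidate regex.
matches_trusted_proxy() {
  printf '%s\n' "$1" | grep -Eq "$trusted_proxies"
}

# Two addresses inside the assumed ranges, two outside.
for ip in 130.211.2.17 35.191.200.4 130.211.4.1 10.0.0.5; do
  if matches_trusted_proxy "$ip"; then
    echo "$ip trusted"
  else
    echo "$ip not trusted"
  fi
done
```

If an address that belongs to your load balancer is reported as not trusted, widen the regex before applying it to the platform.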
