OpenTelemetry Integration
For every experiment run, Steadybit uses OpenTelemetry to collect distributed tracing spans across the Steadybit platform and agents. Access to this data benefits users, extension authors and Steadybit maintainers alike. You might want to access this data in scenarios such as the following:
Your organization is evaluating Steadybit and is still building trust in the solution. You want to understand exactly what happens during experiments, including the nitty-gritty details.
You are developing an extension, and something went wrong. You want to know precisely how your extension was called, the parameters, and how it responded.
You want to correlate experiment runs with other monitoring and observability data, e.g., in your Jaeger or Zipkin installations.
Something went wrong, and you need help from Steadybit's support staff to resolve the situation. Attach the distributed tracing data to give them context.
As the following sections show, Steadybit collects this data automatically, which covers simple use cases. You can also instruct the Steadybit agents to report this data to your own observability pipeline. This document explains both approaches.
Steadybit collects and persists distributed tracing data across its platform and agents without further configuration for every experiment run. This is the simplest way to get started – and the option relevant to most customers.
You can download the distributed tracing data as multiple OTLP JSON files. The UI explains how to import and inspect this data with the open-source tool Jaeger.
Distributed tracing data for experiments is retained for 28 days within the Steadybit platform.
Note: OpenTelemetry data export is currently an experimental capability.
It can be helpful to have Steadybit observability data within your systems. Steadybit agents can be instructed to export distributed tracing data to OpenTelemetry-compatible systems.
This section explains how to configure the Steadybit agents to achieve this. To help you validate the configuration, the section also contains optional guidance on how to set up a local Jaeger instance, a Zipkin instance and an OpenTelemetry collector.
The Steadybit agent internally leverages the OpenTelemetry SDK auto-configuration module. Consequently, all of the module's configuration parameters are supported! This section only shows the most basic configuration to achieve data export.
The configuration parameters are set through environment variables, as the following shell snippet shows. You may also pass these environment variables when deploying the agent through any other mechanism, e.g., Helm charts.
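As a minimal sketch, the environment variables below are standard parameters of the OpenTelemetry SDK auto-configuration module. The endpoint value is an assumption: it points at an OTLP gRPC receiver listening locally on port 4317, and you would replace it with the address of your own collector.

```shell
# Export traces through the OTLP exporter.
export OTEL_TRACES_EXPORTER=otlp
# Use gRPC as the OTLP transport.
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
# Assumed address of a local OTLP gRPC receiver; adjust to your environment.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4317
```

These variables need to be present in the environment of the agent process before it starts.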
The following sections explain how to spin up a local Jaeger instance, a Zipkin instance and an OpenTelemetry collector. These steps are not required to configure the export mechanism. We list them here for your convenience in case you want to check the setup locally.
We start with a configuration for an OpenTelemetry collector. The collector will accept the telemetry data from Steadybit agents, batch it and then forward it to both Jaeger and Zipkin.
Store this in a file called otel-config.yml within your current working directory.
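A collector configuration along these lines could look like the following sketch, written as a shell heredoc that creates otel-config.yml in the current directory. The host names jaeger and zipkin are assumptions that must match the service names in your Docker Compose file. The sketch also assumes that Jaeger ingests OTLP directly on port 4317 and that the collector image bundles the Zipkin exporter.

```shell
# Write a minimal collector configuration to otel-config.yml.
cat > otel-config.yml <<'EOF'
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # accept OTLP gRPC from the Steadybit agents

processors:
  batch:                         # batch spans before forwarding

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317        # assumed service name; Jaeger's OTLP gRPC port
    tls:
      insecure: true             # local testing only
  zipkin:
    endpoint: http://zipkin:9411/api/v2/spans   # assumed service name

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger, zipkin]
EOF
```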
Next, we start all the systems locally using Docker Compose. Note the comments about UI endpoints within the snippet.
Store this in a file called docker-compose.yml within your current working directory. Then run docker-compose up to start everything.
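As a sketch of such a Compose file, the heredoc below creates docker-compose.yml in the current directory. The image names and tags, the service names and the mount path are assumptions chosen for a local test setup. The service names must match the host names used in otel-config.yml.

```shell
# Write a minimal Docker Compose file to docker-compose.yml.
cat > docker-compose.yml <<'EOF'
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true   # let Jaeger ingest OTLP directly
    ports:
      - "16686:16686"                 # Jaeger UI: http://localhost:16686/

  zipkin:
    image: openzipkin/zipkin:latest
    ports:
      - "9411:9411"                   # Zipkin UI: http://127.0.0.1:9411/

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-config.yml"]
    volumes:
      - ./otel-config.yml:/etc/otel-config.yml
    ports:
      - "4317:4317"                   # OTLP gRPC receiver for the agents
    depends_on:
      - jaeger
      - zipkin
EOF
```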
Once the startup completes, you can use the following URLs to interact with the systems:
Jaeger UI: http://localhost:16686/
Zipkin UI: http://127.0.0.1:9411/
OpenTelemetry collector OTLP gRPC endpoint: http://127.0.0.1:4317
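To confirm that the two UIs listed above are up, a quick reachability check along the following lines may help. This is only a sketch: check_endpoint is a hypothetical helper name, and the check assumes that curl is installed. It tests for an HTTP response only and does not verify that any traces have arrived.

```shell
# Succeed if the URL answers with an HTTP success status within 5 seconds.
check_endpoint() {
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# Probe the Jaeger and Zipkin UIs listed above.
for url in http://localhost:16686/ http://127.0.0.1:9411/; do
  if check_endpoint "$url"; then
    echo "reachable: $url"
  else
    echo "not reachable: $url"
  fi
done
```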