For every experiment execution, Steadybit collects distributed tracing spans using OpenTelemetry across the Steadybit platform and agents. Access to this data benefits users, extension authors and Steadybit maintainers alike. Here are some scenarios in which you might access this data:
- Steadybit interests your organization, and you are in the process of building trust in the solution. As part of this, you want to understand what is happening as part of experiments – including the nitty-gritty details.
- You are developing an extension, and something went wrong. You want to know precisely how your extension was called, the parameters, and how it responded.
- You want to correlate experiment executions with other monitoring and observability data, e.g., in your Jaeger or Zipkin installations.
- Something went wrong, and you need help from Steadybit's support staff to resolve the situation. Attach the distributed tracing data to give them context.
As the following sections show, Steadybit collects this data automatically for simple use cases. You can also instruct the Steadybit agents to report this data to your own observability pipeline. This document explains both approaches.
This screenshot shows a trace encompassing the Steadybit platform and three Steadybit agents in Jaeger.
Steadybit collects and persists distributed tracing data across its platform and agents without further configuration for every experiment execution. This is the simplest way to get started – and the option relevant to most customers.
You can download the distributed tracing data as multiple OTLP JSON files. The UI explains how to import and inspect this data with the open-source tool Jaeger.
Distributed tracing data for experiments is retained for 28 days within the Steadybit platform.
You can download a zip file containing distributed tracing data through the experiment execution view.
Note: OpenTelemetry data export is currently an experimental capability.
It can be helpful to have Steadybit observability data within your systems. Steadybit agents can be instructed to export distributed tracing data to OpenTelemetry-compatible systems.
This section explains how to configure the Steadybit agents to achieve this. To validate the configuration, the section contains optional guidance on how to set up a local Jaeger instance, Zipkin instance and an OpenTelemetry collector.
The Steadybit agent internally leverages the OpenTelemetry SDK auto-configuration module. Consequently, all of the module's configuration parameters are supported! This section only shows the most basic configuration to achieve data export.
The configuration parameters are set through environment variables, as the following shell snippet shows. You may also pass these environment variables when deploying the agent through any other mechanism, e.g., Helm charts.
# enable the auto-configuration mechanism
# Name the service. You most likely want to keep it as 'steadybit-agent'
# Define where to export the data to.
# The Steadybit agent does not currently expose any metrics through OpenTelemetry.
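As a minimal sketch, the standard OpenTelemetry SDK auto-configuration variables that correspond to the last three comments above could be set as follows. The endpoint value is an assumption that fits a collector listening on the same host. The Steadybit-specific switch that enables the mechanism is intentionally left out, because its name is not stated in this text.

```shell
# Name the service. Keeping 'steadybit-agent' makes the spans easy to find.
export OTEL_SERVICE_NAME="steadybit-agent"

# Define where to export the data to: OTLP over gRPC.
# Assumption: a collector is reachable on the same host at port 4317.
export OTEL_TRACES_EXPORTER="otlp"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4317"

# The agent does not expose metrics through OpenTelemetry,
# so the metrics exporter is switched off.
export OTEL_METRICS_EXPORTER="none"
```

When the agent runs in a container or is deployed via Helm, the same names and values would be passed through that mechanism's environment settings instead of `export`.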
The following sections explain how to spin up a local Jaeger instance, a Zipkin instance and an OpenTelemetry collector. These steps are optional for a successful configuration of the export mechanism. We list these here for your convenience if you want to check the setup locally.
We start with a configuration for an OpenTelemetry collector. The collector will accept the telemetry data from Steadybit agents, batch it and then forward it to both Jaeger and Zipkin.
Store this in a file called otel-config.yml within your current working directory.
receivers: [ otlp ]
processors: [ batch ]
exporters: [ otlp, zipkin ]
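A complete collector configuration around the pipeline lines above might look like the following sketch, written to otel-config.yml from the shell. The host names `jaeger` and `zipkin` are assumptions (they would be Docker Compose service names), as is the use of plain-text gRPC towards Jaeger.

```shell
# Write a minimal collector configuration into the current working directory.
cat > otel-config.yml <<'EOF'
receivers:
  otlp:
    protocols:
      grpc:
        # Listen on all interfaces so the port can be published from a container.
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  # Forward to Jaeger's OTLP gRPC endpoint ('jaeger' is an assumed host name).
  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true
  # Forward to Zipkin's ingestion endpoint ('zipkin' is an assumed host name).
  zipkin:
    endpoint: http://zipkin:9411/api/v2/spans

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ otlp, zipkin ]
EOF
```

The `traces` pipeline is the only one defined here, since the agent exports spans but no metrics.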
Next, we start all the systems locally using Docker Compose. Note the comments about UI endpoints within the snippet.
Store this in a file called docker-compose.yml within your current working directory. Then run docker-compose up to start everything.
Once the startup completes, you can use the following URLs to interact with the systems:
- Jaeger UI: http://localhost:16686/
- Zipkin UI: http://127.0.0.1:9411/
- OpenTelemetry collector OTLP gRPC endpoint: http://127.0.0.1:4317
# UI and ingestion endpoint
# UI endpoint
# OTLP gRPC endpoint
# OTLP HTTP endpoint
command: [ "--config=/etc/otel-collector-config.yml" ]
# OTLP gRPC endpoint the agent will be interacting with
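A Compose file consistent with the comments above could look like this sketch, written to docker-compose.yml from the shell. The service names, the unpinned image tags and the choice of the contrib collector image (which bundles the Zipkin exporter) are assumptions. Jaeger's OTLP ports are only exposed inside the Compose network here, so that they do not clash with the collector's published port 4317.

```shell
# Write a minimal Compose file into the current working directory.
cat > docker-compose.yml <<'EOF'
services:
  zipkin:
    image: openzipkin/zipkin
    ports:
      # UI and ingestion endpoint
      - "9411:9411"

  jaeger:
    image: jaegertracing/all-in-one
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      # UI endpoint
      - "16686:16686"
    expose:
      # OTLP gRPC endpoint
      - "4317"
      # OTLP HTTP endpoint
      - "4318"

  otel-collector:
    image: otel/opentelemetry-collector-contrib
    command: [ "--config=/etc/otel-collector-config.yml" ]
    volumes:
      - ./otel-config.yml:/etc/otel-collector-config.yml
    ports:
      # OTLP gRPC endpoint the agent will be interacting with
      - "4317:4317"
    depends_on:
      - jaeger
      - zipkin
EOF
```

With both files in place, `docker-compose up` starts the three containers, and the agent's OTLP endpoint would point at port 4317 on the Docker host.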