LogoLogo
Reliability HubAPI DocsPlatform
  • Welcome to Steadybit
  • Quick Start
    • First Steps
    • Compatibility
    • Install Agent and Extensions
    • Run an Experiment
    • Deploy Example Application
  • Concepts
    • Actions
    • Discovery
    • Query Language
  • Install and Configure
    • Install Agent
      • Architecture
      • Install on Kubernetes
      • Install on Linux Hosts
      • Install using Docker Compose
      • Install on Amazon ECS
      • Extension Registration
      • Using Mutual TLS for Extensions
      • Configuration Options
      • Agent State
      • Agent API
    • Install On-Prem Platform
      • Install on Minikube
      • Advanced Agent Authentication
      • Configuration Options
      • Maintenance & Incident Support
      • Syncing Teams via OIDC Attribute
    • Manage Environments
    • Manage Teams and Users
      • Users
      • Teams
      • Permissions
    • Manage Experiment Templates
  • Use Steadybit
    • Experiments
      • Design
      • Run
      • Run History
      • Schedule
      • Variables
      • Emergency Stop
      • Share
        • Templates
        • Duplicate
        • File
      • OpenTelemetry Integration
    • Explorer
      • Landscape
      • Targets
      • Advice
    • Reporting
  • Integrate with Steadybit
    • Extensions
      • Anatomy of an Extension
      • Extension Installation
      • Extension Kits
      • Available Extensions
    • API
      • Interactive API Documentation
    • CLI
    • Badges
    • Webhooks
      • Custom Webhooks
      • Preflight Webhooks
    • Preflight Actions
    • Slack Notifications
    • Audit Log
    • Hubs
  • Troubleshooting
    • How to troubleshoot
    • Common fixes
      • Extensions
      • Agents
      • On-prem platform
Powered by GitBook

Extension Docs

  • ActionKit
  • DiscoveryKit
  • EventKit

More Resources

  • Reliability Hub
  • API Docs
On this page
  • Basic Elements
  • Create New Experiment
  • From Scratch
  • From Template
  • Via File Upload
  • Supported Actions

Was this helpful?

Edit on GitHub
  1. Use Steadybit
  2. Experiments

Design

Last updated 1 month ago

Was this helpful?

To start and eventually run a Chaos Engineering experiment, you first need to design it. That includes defining, e.g., where to run it (environment), what actions to perform (attacks, checks, load tests), and what should be attacked (targets).

Basic Elements

An experiment generally consists of the following elements:

  • Name: Giving your experiment a meaningful name makes it easier to find it again. You can change it at any time.

  • Team: Each experiment is associated with a team. If you are a member of multiple teams, you can see and change your current team in the left-hand navigation sidebar. Once created, you can't change the experiment's team.

  • Environment: An experiment always runs in one specific environment of your system landscape. This environment spans a set of targets you want to address in an experiment, such as containers and JVM applications in a Development stage. How the is up to your admin.

  • Variables: A variable allows you to reference a value in your experiment's configuration to have a single source of truth. Steadybit supports different scopes (e.g., experiment variables or environment variables). Learn more in the .

  • Hypothesis: The hypothesis should answer the question of the expected outcome. In addition, you can describe the steady state, the turbulent condition, and the expected behavior. See this example of a hypothesis: 'When requests to the recommendation service exceed 1000ms, the catalog responds using an empty recommendation list.'

  • Tags: Besides the hypothesis, tags can be applied to an experiment as useful metadata. Tags are not only listed in the experiments overview, but also you can use them to search for specific experiments. In the experiment list, the top 10 used tags are listed right below the searchbar (where you find the full list of used tags), so you have a quick access.

  • Actions: Performed when running the experiment, sequentially or in parallel. An action can be either

    • an attack to inject a fault into your system

    • a check to validate your expectation in your system or observability tooling

    • a load test to inject load into your system and validate functionality.

The team influences the experiment in the following aspects:

  1. Short Handle: Each experiment gets a unique key with a team prefix (e.g., SRE-23), which is used, e.g., to trigger an experiment via the .

  2. Run/Edit Permissions: Only team members are allowed to edit or run an experiment.

  3. Environment and Action Permissions: A team's permission determines which environments (a set of targets or infrastructure components you can attack) and actions are eligible for an experiment.

Create New Experiment

In general, you have three possibilities to get started:

  • From scratch: in case you have already an experiment in mind

  • From template: benefit from existing templates and apply them easily in a wizard-style to your context

  • Upload a YAML- or JSON-based file containing an exported experiment

From Scratch

When creating an experiment from scratch, you land directly into our timeline-based experiment editor. If your team can access only one environment, it is selected automatically. Otherwise, you have to define which environment you want to experiment in. In addition, you can specify the experiment's name and hypothesis at any point you like.

After that, you can add all actions (attacks, checks, and load tests) to your experiment by dragging and dropping. When you place actions in the same horizontal lane, they will be performed sequentially, putting them below each other results in a parallel run. Choose actions with the desired effect to perform your overall experiment.

Once you've dropped the action or selected it later, you can define its configuration on the right-hand side. For example, you can specify an attack's target selection and blast radius.

Target Selection

Limiting Targets via Blast Radius

Furthermore, you can limit your targets randomly to only attack a subset of your target selection and avoid attacking all at once. You can specify the target limitation as a percentage or a fixed number.

Finally, you can also rename each step to reveal the intention by clicking on the step's label in the sidebar.

From Template

A second option for creating a new experiment is using an existing template. This approach is best when you want to learn from others' experiences and apply them to your context via a step-by-step wizard. You can simply browse all available templates or use our search feature to find one using the template's tags, targets or actions used in the experiment, or free text search.

Once you've decided to use an experiment template, you are guided step-by-step through applying it to your context. The first step asks you to select the environment where you eventually want to run your experiment. If the experiment template references any environment variable, they are also listed. You only need to specify a value for new environment variables.

The subsequent steps depend on the selected experiment template and will guide you step-by-step to a ready-to-be-executed experiment. Each step consists of a specific question for a value (i.e., a Kubernetes deployment) to adapt the template's experiment to your context.

Via File Upload

The third option for creating a new experiment is to upload a YAML- or JSON-based experiment file. Once you've uploaded the file, the experiment is created in the defined team. Please note that when the experiment file references an explicit team and environment (e.g., team: "ADM" or environment: "Online shop"), they have to exist. If you want to be flexible, you can use the variable {{teamKey}} to apply it to the current team and {{environmentName}} to apply it to the team's first environment.

Supported Actions

You can now select your desired targets of your environment via a target query referencing . Since these attribute values are always discovered live, some can change from one moment to the next. So, it is wise to choose stable attributes. Good examples are labels, namespaces, or symbolic names - whereas a unique identifier of targets (like the container id) is usually a bad idea. When you run the experiment, these attributes are resolved into a concrete set of targets under attack. You can preview matching targets using the 'show targets' button next to the query.

Continue with these steps until you've designed your experiment. A reasonable experiment could easily look like the one below. Once you have saved it, you are ready to to learn how your system behaves.

Eventually, you end up in the experiment editor, where you can adjust the experiment or to learn how your system behaves.

Check out our to learn about actions you can leverage with Steadybit or learn more about the .

environments are configured and assigned to your team
variables section
API
discovered attributes
Reliability Hub
concept of an action
run it
run it
Approaches to create an experiment
Create Experiment - Drag'n Drop Editor
Create Experiment - Target Selection in the Editor
Create Experiment - Limiting Targets in the Editor
Create Experiment - Rename Steps in the Editor
Create Experiment - Example in the Editor
Create Experiment - Browse and Search Templates
Create Experiment - Use Template: Environment
Create Experiment - Use Template: Further Steps
Create Experiment - Use Template: Further Steps