Run an Experiment

Last updated 11 months ago

We will now use Steadybit to design and run our first experiment in just four simple steps. This guide focuses on our example shopping demo application, which you can deploy by following the Deploy Example Application guide. However, you can also adapt the steps to your own application.

Prerequisite

  • You have already signed up for an account on our website

  • You can log in to the Steadybit SaaS platform

  • You have already installed an agent and extensions

Step 1 - Define your Scenario

The first step is to think about the scenario you want to test. Good sources of inspiration include past incidents, dependencies in your architecture, and common pitfalls.

In our case, we want to identify how our shopping demo behaves when one of the product backend services is unavailable. We can simply use one of the existing templates to create this experiment step-by-step.

In the Steadybit platform, go to 'Experiments' -> 'New Experiment' -> 'From Template' and search for a template with the tag Shopping Demo Quick Start.

You can see the overall experiment structure in the template details or continue this tutorial by choosing 'Use This Template'.

Step 2 - Define experiment

Now define the template-specific experiment details step by step to validate the scenario.

(1) Environment

(2) Cluster Name, (3) Namespace, and (4) Deployment

Based on the selected template, the experiment needs to know the name of your Kubernetes cluster (minikube), the Kubernetes namespace (steadybit-demo), and the deployment (hot-deals). Just choose the values from the drop-down.

(5) HTTP Upstream Endpoint, and (6) Success Rate

The next step is to define the upstream HTTP endpoint that depends on our downstream deployment hot-deals. This is the /products endpoint of gateway, which collects all products from hot-deals, toys-bestseller, and fashion-bestseller. The resulting products are shown on our shop's landing page. In Steadybit, we can also use Kubernetes-internal URLs, like http://gateway.steadybit-demo.svc.cluster.local/products.
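The Kubernetes-internal URL above follows the usual service DNS pattern of service name, namespace, and cluster domain. As a minimal sketch, the helper below assembles such a URL; the function name is hypothetical, and it assumes the default cluster domain cluster.local and plain HTTP.

```python
from typing import Optional


def cluster_internal_url(service: str, namespace: str, path: str = "/",
                         port: Optional[int] = None) -> str:
    """Build http://<service>.<namespace>.svc.cluster.local[:port]<path>.

    Assumes the default cluster domain "cluster.local"; clusters that use
    a custom domain need a different suffix.
    """
    if not path.startswith("/"):
        path = "/" + path
    host = f"{service}.{namespace}.svc.cluster.local"
    if port is not None:
        host = f"{host}:{port}"
    return f"http://{host}{path}"


if __name__ == "__main__":
    # Service, namespace, and path taken from the shopping demo in this guide.
    print(cluster_internal_url("gateway", "steadybit-demo", "/products"))
```

Such a URL only resolves from inside the cluster, which is why it works for checks that are executed by components running in Kubernetes.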

In the next step, we specify the expected HTTP success rate. Since the shop's landing page uses this endpoint, we aim for 100% successful responses.
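To make the success-rate expectation concrete, here is a rough sketch of how such a rate could be computed from a series of responses. It is not Steadybit's implementation: the function names are hypothetical, and counting only 2xx status codes as successful is an assumption made for this illustration.

```python
import urllib.error
import urllib.request
from typing import Iterable


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500 or 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar problems.
        return 0


def success_rate(status_codes: Iterable[int]) -> float:
    """Percentage of responses with a 2xx status code (assumed success criterion)."""
    codes = list(status_codes)
    if not codes:
        raise ValueError("no responses recorded")
    successful = sum(1 for code in codes if 200 <= code < 300)
    return 100.0 * successful / len(codes)


def meets_target(status_codes: Iterable[int], target_percent: float = 100.0) -> bool:
    """True if the observed success rate reaches the expected rate."""
    return success_rate(status_codes) >= target_percent


if __name__ == "__main__":
    # Simulated responses: nine successful calls and one 503 during the outage.
    observed = [200] * 9 + [503]
    print(success_rate(observed))
    print(meets_target(observed))
```

With a target of 100%, a single failed response is enough to miss the expectation, which is why this target suits an endpoint that backs the landing page.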

(7) Recovery Time

The last step is to define the recovery time. How long do we expect it to take until all pods of hot-deals start up again and are ready to serve traffic? In our example, we expect 60 seconds to be sufficient.
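The recovery-time expectation can be thought of as a polling loop with a deadline. The sketch below illustrates that idea only; the function name and parameters are hypothetical, and the pod count is supplied by a caller-provided function so that the example stays self-contained. In a real cluster, that function might read the deployment's readyReplicas status field.

```python
import time
from typing import Callable


def recovers_within(ready_pods: Callable[[], int], desired: int,
                    timeout_s: float = 60.0, interval_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll ready_pods() until it reports at least `desired` pods or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        if ready_pods() >= desired:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated readings: pods come back one by one after the attack.
    readings = iter([0, 1, 2])
    recovered = recovers_within(lambda: next(readings), desired=2,
                                timeout_s=60.0, interval_s=0.0)
    print(recovered)
```

The 60-second default mirrors the expectation chosen in this guide; a slower-starting service would need a larger value.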

Finalize the template's wizard by clicking 'Create Experiment'.

Step 3 - Experiment Design

Woohoo! There is our first experiment design! 🎉 You're now in the timeline-based experiment editor that you can always use to design an experiment from scratch via drag-and-drop.

Our experiment was already designed by using the template, so we can save and run it immediately to learn whether the shop survives an outage of the downstream deployment hot-deals.

Step 4 - Run experiment

When you click the 'Run Experiment' button, you see the Steadybit run view. As soon as the agent is connected, the experiment starts to validate the HTTP endpoint /products and the number of ready pods for the deployment hot-deals. When the deployment's containers are isolated, we start noticing faults in the HTTP responses, as hot-deals' products can't be requested anymore. This is undesirable because the products of fashion-bestseller and toys-bestseller could still have been shown on the shop's landing page. You can improve this behavior by adding appropriate fallbacks or scaling the services.

As the experiment continues, we see that Kubernetes eventually restarts the deployment's pods, so some pods are temporarily missing. Even so, after the attack, all pods become ready again within the expected 60 seconds. However, the overall experiment run failed, as the HTTP success rate of 100% was not achieved.
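The outcome described above follows from a simple rule: the run only succeeds if every check succeeds. The following sketch illustrates that rule with hypothetical names; it is an illustration of the reasoning, not a description of how Steadybit evaluates runs internally.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CheckResult:
    """Outcome of one expectation in the experiment (hypothetical structure)."""
    name: str
    passed: bool


def experiment_passed(checks: Iterable[CheckResult]) -> bool:
    """The run counts as successful only if all checks passed."""
    return all(check.passed for check in checks)


if __name__ == "__main__":
    # Results matching the run described in this guide.
    results = [
        CheckResult("all hot-deals pods ready within 60 seconds", True),
        CheckResult("HTTP success rate of /products is 100%", False),
    ]
    for result in results:
        print(result.name, "passed" if result.passed else "failed")
    print(experiment_passed(results))
```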

Conclusion

You have now successfully run an experiment with Steadybit in a Kubernetes environment. You have discovered the impact of an unavailable downstream service on the upstream service.

What are the next steps?

Check how the shop's behavior differs when using a different implementation of the products endpoint. Alternatively, explore your next experiment using the explorer landscape and get advice to learn how to improve your system's reliability.

Finally, before rolling Steadybit out to more users, make sure to set up proper environments and create teams to benefit from Steadybit's safety in Chaos Engineering rollouts.

Note for Step 1: If the experiment template is missing in your Steadybit platform, you can download the resulting experiment here and import it via 'Experiments' -> 'New Experiment' -> 'Upload', then continue with Step 3 - Experiment Design. In the future, the template will also be available in your tenant.

Note for (1) Environment: The first option is always to select the environment where you want to run the experiment. The environment limits the set of attackable targets and thus prevents you from affecting the wrong stage or interfering with other teams. Check out Manage Environments later. If you haven't set up environments yet, you can continue with the Global environment, which contains everything Steadybit discovered.
