Run an Experiment
Last updated
Last updated
We will now use Steadybit to design and run our first experiment in just 4 simple steps. This guide focuses on our example shopping demo application, which you can deploy by follow this guide. However, you can also adapt the steps to your application.
You have already signed up for an account on our website
You can log in to the Steadybit SaaS platform
You have already installed an agent and extensions
The first step is to think about the scenario you want to test. Good inspiration for scenarios is, e.g., a past incident, dependencies of your architecture, or common pitfalls.
In our case, we want to identify how our shopping demo behaves when one of the product backend services is unavailable. We can simply use one of the existing templates to create this experiment step-by-step.
In the Steadybit platform, go to 'Experiments' -> 'New Experiment' -> 'From Template' and search for a template with the tag Shopping Demo Quick Start
.
You can see the overall experiment structure in the template details or continue this tutorial by choosing 'Use This Template'.
If you miss this experiment template in your Steadybit platform, you can download the resulting experiment here and import it via 'Experiments' -> 'New Experiment' -> 'Upload'. Continue to Step 3 - Experiment Design. In the future, the template will be available also in your tenant.
Now, you need to define step-by-step template-specific experiment details to validate the scenario.
The first option is always to select the environment where you want to run the experiment. The environment limits the set of attackable targets and thus prevents you from affecting the wrong stage or interfering with other teams. Check out Manage Environments later. If you haven't set up environments yet, you can continue with the Global
environment, which contains everything Steadybit discovered.
Based on the selected template, the experiment needs to know the name of your Kubernetes cluster (minikube
), the Kubernetes namespace (steadybit-demo
), and the deployment (hot-deals
). Just choose the values from the drop-down.
The next step is to define the upstream HTTP endpoint dependent on our downstream deployment hot deals
. This is the /products
endpoint of gateway
, which crawls all products from hot-deals
, toys-bestseller
, and fashion-bestseller
. The resulting products are shown on our shop's landing page. In Steadybit, we can also use Kubernetes-internal URLs, like http://gateway.steadybit-demo.svc.cluster.local/products
.
In the next step, we specify the expected HTTP success rate. Since the shop's landing page uses this endpoint, we aim for 100%
successful responses.
The last step is to define the recovery time. How long do we expect it to take until all pods of hot-deals
start up again and are ready to serve traffic? In our example, we expect 60 seconds to be sufficient.
Finalize the template's wizard by clicking 'Create Experiment'.
Woohoo! There is our first experiment design! 🎉 You're now in the timeline-based experiment editor that you can always use to design an experiment from scratch via drag-and-drop.
Our experiment was already designed by using the template, so we can save and run it immediately to learn whether the shop survives an outage of the downstream deployment hot-deals
.
When hitting the 'Run Experiment'-button you see the Steadybit run view. As soon as the agent connected, the experiment starts to validate the HTTP endpoint /products
, and the amount of ready pods for the deployment hot-deals
. When isolating the deployment's containers, we start noticing faults in the HTTP responses as hot-deals
' products can't be requested anymore. This is undesirable as there are other products of fashion-bestseller
and toys-bestseller
which could have been shown at the shop's landing page. You can improve this behavior by adding appropriate fallbacks or scaling the services.
Continuing to run the experiment, we see that eventually, Kubernetes restarts the deployment's pods, resulting in missing pods. Even so, after the attack, all pods become ready within the expected 60 seconds. However, the overall experiment run failed, as the HTTP success rate of 100%
was not achieved.
You have now successfully run an experiment with Steadybit in a Kubernetes environment. You have discovered the impact of an unavailable downstream service on the upstream service.
Check how the shop's behavior differs when using different implementation of the products endpoint. Alternatively, explore your next experiment using the explorer landscape and get advice to learn how to improve your system's reliability.
Eventually, before rolling it out to more users, make sure to set up proper environments and create teams to benefit from Steadybit's safety in Chaos Engineering rollouts.