Architecture

Last updated 1 year ago

Steadybit uses an agent-based architecture that consists of

  • a central platform, which is your center of control

  • one agent per network boundary, deployed somewhere in your system, which works as a channel between the platform and multiple extensions

  • extensions, deployed wherever you need them, which discover running infrastructure components (hosts, containers, Kubernetes resources, etc.) and run actions (e.g., chaos engineering attacks or checks).

The following diagram shows an example with three hosts. One host has the agent installed, which communicates with the installed extensions and with the platform.

  • The agent is deployed only once, as everything runs within one network.

  • The AWS and Kubernetes extensions also run only once, as they work with the respective API.

  • The extensions for host- and container-based chaos engineering need to run on every host to reach the underlying resources (e.g., network, CPU, memory).
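
The placement rules above can be sketched as a small planning function. This is only an illustration: the component names (`agent`, `extension-aws`, `extension-host`, and so on) and the choice to put the run-once components on the first host are assumptions made for this example, not Steadybit identifiers or requirements.

```python
# Hypothetical component names, used for illustration only.
# Components that need local access to host resources (network, CPU, memory).
PER_HOST = {"extension-host", "extension-container"}
# Components needed once per network: the agent and the API-based extensions.
ONCE_PER_NETWORK = {"agent", "extension-aws", "extension-kubernetes"}


def plan_placement(hosts: list[str]) -> dict[str, list[str]]:
    """Return a mapping of host name to the components that run on it."""
    if not hosts:
        return {}
    # Every host gets the per-host extensions.
    plan = {host: sorted(PER_HOST) for host in hosts}
    # The run-once components go to the first host (an arbitrary choice here).
    plan[hosts[0]] = sorted(PER_HOST | ONCE_PER_NETWORK)
    return plan


if __name__ == "__main__":
    for host, components in plan_placement(["host-1", "host-2", "host-3"]).items():
        print(host, components)
```

In a real setup the run-once components can live anywhere inside the network boundary; the only constraint taken from the text is that they exist once, while the host and container extensions exist on each host.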

The Steadybit agent periodically polls its registered extensions for discovery data using HTTP. For each call, only the delta of the discovery data is sent to the platform. When an experiment is to be executed through an agent, the agent connects to the platform via a WebSocket and receives the actions to be executed. The agent then controls the action execution and calls the respective HTTP endpoints of the extension. If the connection between the platform and the agent, or between the agent and an extension, is interrupted, the agent immediately stops and rolls back any active action.
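
To make the idea of sending only the delta more concrete, here is a minimal sketch that compares two discovery snapshots. The snapshot shape (a dictionary of target id to attributes) and the `added` / `changed` / `removed` keys are assumptions for this example; the actual format the agent sends to the platform is not described on this page.

```python
def discovery_delta(previous: dict[str, dict], current: dict[str, dict]) -> dict:
    """Compare two discovery snapshots keyed by a (hypothetical) target id."""
    # Targets seen now that were not in the previous snapshot.
    added = {tid: attrs for tid, attrs in current.items() if tid not in previous}
    # Targets present in both snapshots whose attributes differ.
    changed = {
        tid: attrs
        for tid, attrs in current.items()
        if tid in previous and previous[tid] != attrs
    }
    # Targets that disappeared since the previous snapshot.
    removed = sorted(tid for tid in previous if tid not in current)
    return {"added": added, "changed": changed, "removed": removed}


if __name__ == "__main__":
    before = {"host-1": {"cpu": 4}, "host-2": {"cpu": 8}}
    after = {"host-1": {"cpu": 4}, "host-2": {"cpu": 16}, "host-3": {"cpu": 2}}
    print(discovery_delta(before, after))
```

When nothing changed between two polls, all three parts of the delta are empty, so there is nothing new to transfer.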

The Steadybit platform never connects to the agent or any extensions: all connections are initiated by the agent, regardless of the deployment model.
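
The fail-safe behavior described above (stop and roll back active actions as soon as a connection is lost) can be sketched as follows. The `ActionRun` class and the `supervise` function are hypothetical names invented for this illustration and do not reflect the agent's actual implementation.

```python
class ActionRun:
    """A hypothetical record of one action the agent is currently controlling."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "running"

    def stop(self) -> None:
        self.state = "stopped"

    def rollback(self) -> None:
        self.state = "rolled_back"


def supervise(
    active: list[ActionRun],
    platform_connected: bool,
    extension_connected: bool,
) -> list[str]:
    """Stop and roll back running actions if either connection is down.

    Returns the names of the actions that were rolled back.
    """
    if platform_connected and extension_connected:
        return []
    rolled_back = []
    for run in active:
        if run.state == "running":
            run.stop()
            run.rollback()
            rolled_back.append(run.name)
    return rolled_back


if __name__ == "__main__":
    runs = [ActionRun("stress-cpu"), ActionRun("network-delay")]
    print(supervise(runs, platform_connected=False, extension_connected=True))
```

The point of the sketch is that the decision is made locally by the agent: it does not wait for an instruction from the platform before undoing an action.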

Check our Reliability Hub for an up-to-date list of extensions. Our extensions are open source and available on GitHub (github.com). They are written in Go to ensure the best possible resource usage.

Steadybit agent and extension architecture