The open source toolbox
for resilient operations

Rebound is a set of tools, ranging from a network fault injection proxy to a reliability automation platform, so people can rely on you being there when they need you.

Get Started

curl -sSL https://rebound.how/install/full.sh | bash

What is Rebound?

Deliver products people can rely on

System Agnostic

Rebound's toolbox is not bound to a particular system. Whether it's Cloud, On-Premise or "It's complicated", Rebound follows.

Metrics Driven

With knowledge comes wisdom. Rebound has a mission to turn resilience from a perceived cost into a must-have strategic asset.

Scales With You

Grow at your own pace. Rebound adapts to your journey, from early development to enterprise operations.

No Lock-In

Naturally Open-Source. Invest safely in the future by owning the stack.

Engineers First

Useful, not painful. Rebound encourages resilience thinking at all stages without being obnoxious.

Products

Lightning-Fast

A Rust-powered CLI that simulates network disruptions right on your local machine.

Real-World Scenarios

Easily configure tests via YAML to mimic latencies, bandwidth constraints, packet loss, and HTTP errors.

CI Integration

Automatically run these tests within your pipeline and receive detailed, actionable reports.

$ fault scenario run --scenario scenario.yaml

================ Running Scenarios ================

⡏  4/4  Latency Increase By 30ms Steps From Downstream  ▮▮▮▮
⡏  1/1  Within Allowed Latency While Bandwidth At 5 bytes/second  ▮
⡏  1/1  Circuit Breaker Takes Care of 404  ▮
⡏  1/1  Packet loss has no impact on service performance  ▮

===================== Summary =====================

Tests run: 7, Tests failed: 2
Total time: 2.4s

Report saved as report.json
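
One way to act on this output in a pipeline is to read the summary line and fail the build when any test failed. The sketch below is illustrative only: it assumes the summary keeps the exact "Tests run: N, Tests failed: M" wording shown in the sample output, and the helper names parse_summary and exit_code are hypothetical, not part of the CLI. It does not read report.json, whose layout is not described here.

```python
import re
from typing import Tuple

# Assumption: the summary line has the same wording as the sample output.
SUMMARY_RE = re.compile(r"Tests run:\s*(\d+),\s*Tests failed:\s*(\d+)")


def parse_summary(output: str) -> Tuple[int, int]:
    """Return (tests_run, tests_failed) found in the console output."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no summary line found in output")
    return int(match.group(1)), int(match.group(2))


def exit_code(output: str) -> int:
    """Return 0 when no test failed and 1 otherwise."""
    _, failed = parse_summary(output)
    return 0 if failed == 0 else 1


# Hypothetical captured output, reduced to the summary section.
sample = "Tests run: 7, Tests failed: 2\nTotal time: 2.4s\n"
print(exit_code(sample))  # prints 1
```

A CI step could capture the command's console output, pass it to exit_code, and use the result as the step's exit status. Whether the CLI already signals failures through its own exit status is not stated here, so check the docs before relying on a wrapper like this.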

Read the docs

Executable Policies

Convert your resilience objectives into clear, measurable tests.

Clear Feedback

Gain actionable insights that let you address vulnerabilities before they escalate.

Fully Customizable

Cover all your systems. Whether they are legacy, Cloud, On-Premise or "It's complicated", Rebound follows you.

$ chaos run policy.json
[2025-03-04 15:23:27 INFO] Validating the experiment's syntax
[2025-03-04 15:23:28 INFO] Experiment looks valid
[2025-03-04 15:23:28 INFO] Running experiment: Loss of pod capacity does not impact latency
[2025-03-04 15:23:28 INFO] Steady-state strategy: after-method-only
[2025-03-04 15:23:28 INFO] Rollbacks strategy: default 
[2025-03-04 15:23:28 INFO] Playing your experiment's method now...
[2025-03-04 15:23:28 INFO] Action: Run 5qps to home page for 60s [in background]
[2025-03-04 15:23:28 INFO] Action: Delete application pod
[2025-03-04 15:23:28 INFO] Action: Let it self-heal
[2025-03-04 15:23:28 INFO] Pausing activity for 45s or until the execution is resumed
[2025-03-04 15:24:13 INFO] Resuming execution...
[2025-03-04 15:24:28 INFO] Steady state hypothesis: Steady-State Hypothesis
[2025-03-04 15:24:28 INFO] Probe: Capacity is back to normal
[2025-03-04 15:24:28 INFO] Probe: Latency was not impacted
[2025-03-04 15:24:28 CRITICAL] Steady state probe 'Latency was not impacted' is not in the given tolerance
[2025-03-04 15:24:28 INFO] Let's rollback...
[2025-03-04 15:24:28 INFO] No declared rollbacks, let's move on.
[2025-03-04 15:24:28 INFO] Experiment ended with status: deviated
[2025-03-04 15:24:28 INFO] The steady-state has deviated, a weakness may have been discovered
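
The final status and any out-of-tolerance probes can be pulled out of a log like this one with a few lines of code, for example to decide whether a pipeline step should fail. This is a minimal sketch that assumes the log lines keep the "[timestamp LEVEL] message" shape and the "Experiment ended with status:" wording shown in the sample run. The function names experiment_status and critical_messages are hypothetical.

```python
import re
from typing import List, Optional

# Assumption: wording and layout match the sample log lines.
STATUS_RE = re.compile(r"Experiment ended with status: (\w+)")
CRITICAL_RE = re.compile(r"\[[^\]]*\bCRITICAL\]\s*(.*)")


def experiment_status(log_text: str) -> Optional[str]:
    """Return the reported end status, or None if the line is absent."""
    match = STATUS_RE.search(log_text)
    return match.group(1) if match else None


def critical_messages(log_text: str) -> List[str]:
    """Return the message part of every CRITICAL log line."""
    return [m.group(1).strip() for m in CRITICAL_RE.finditer(log_text)]


# Three lines copied from the sample run.
sample_log = """\
[2025-03-04 15:24:28 INFO] Probe: Latency was not impacted
[2025-03-04 15:24:28 CRITICAL] Steady state probe 'Latency was not impacted' is not in the given tolerance
[2025-03-04 15:24:28 INFO] Experiment ended with status: deviated
"""

print(experiment_status(sample_log))  # prints deviated
for message in critical_messages(sample_log):
    print(message)
```

Matching on log text is brittle if the wording changes between versions. If the tool offers a structured record of the run, that is likely a better source than the console log.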

Read the docs

Centralized Management

Coordinate and monitor resilience tests across your entire infrastructure from a single platform.

Unified Visibility

Automatically map system health, track key resilience metrics, and identify improvement areas.

AI-Powered Policies

Leverage GenAI to generate targeted tests and policies that evolve as your system does.

A screenshot from the Reliably application. It shows an experiment with a summary of the last execution results and a breakdown of the actions the experiment is made of.

Read the docs

Why Rebound?

Features

Network Fault Injector

Gain understanding of how your services react to poor network conditions.

Rapid Testing Scenarios

Turn reliability into non-regression scenarios that run from your CI build.

Reliability As Code

Declare reliability tests as code, stored like any other asset for review and visibility.

Comprehensive Systems Coverage

Ensure all parts of your systems can be verified and kept in check, including your legacy systems.

Extensible By Design

If something is missing, easily extend Rebound to meet your own requirements.

Safety Included

Remain in control and rapidly terminate tests when your system is under fire.

Teams Communication

Let teams know when a reliability test is underway, via Slack, email or a webhook.

Reporting

Build reports that can be easily shared throughout your organization.

Orchestration

Orchestrate the fleet of reliability tests and policies across your system from a unified platform.

Periodic Scheduling

Run tests once or automatically on a recurring schedule.

OpenTelemetry Support

Make your tests a full part of your operations via OpenTelemetry traces and metrics.

GenAI Assistant

Use the gentle power of GenAI to help you create reliability tests.

Effort Scoring

Keep an eye on engineering effort with a simple scoring system.

Template

Create test and policy templates that teams can parameterize for their own context.

Runs anywhere

Rebound products can run on any system and are simple to deploy locally, as containers, in a Kubernetes cluster or from a serverless ecosystem.

Get started
