Artificial Intelligence

RL-Calibrated Chaos Engineering: A Constrained MDP Approach to Network Resilience Testing

Authors: Sayali Patil

Chaos engineering tests production network resilience by injecting controlled failures; the central open problem is calibration: how much failure injection is sufficient to expose latent resilience defects without degrading the quality of service (QoS) experienced by end users? In practice, the inability to systematically calibrate failure injection has limited chaos engineering adoption in production environments, particularly in systems where reliability, cost, and user experience are tightly coupled. As AI-driven infrastructure and autonomous systems proliferate, this problem becomes critical: improper experimentation either misses failure modes or introduces unacceptable operational risk. The chaos-level engine of U.S. Patent No. 12,242,370 B2 (Cisco Technology, Inc., 2025) automates chaos-level derivation from network telemetry and refines it through a linear parameter-adjustment loop, but provides no formal optimality guarantee, no mathematically rigorous safety constraint, and no sample-complexity characterization. This paper introduces a principled framework that resolves these limitations by casting chaos-level calibration as a Constrained Markov Decision Process (CMDP) and training a reinforcement-learning (RL) agent to select chaos levels that maximize cumulative resilience-discovery yield per unit of QoS risk, subject to a hard probabilistic constraint on production-disabling events. Three theorems establish the theoretical foundation: Theorem 1 (Safe Action Set Existence) proves that a non-empty set of QoS-safe chaos actions always exists, guaranteeing CMDP feasibility; Theorem 2 (Bellman Optimality) establishes that the resilience-per-risk reward satisfies the Bellman contraction, guaranteeing that a globally optimal deterministic policy exists; Theorem 3 (PAC-Convergence) gives an explicit sample-complexity bound O(|S|²|A|ε⁻² log(|S||A|/δ)) for reaching an ε-optimal safe policy with probability 1−δ.
A Lagrangian primal-dual policy-gradient algorithm enforces the safety constraint with exact probabilistic semantics, without penalty approximation. Empirical evaluation in a 150-node SD-WAN simulation (instantiating the patent's reference architecture) demonstrates that the RL agent discovers 41.3 ± 3.8% more latent resilience defects than the patent's heuristic baseline, reduces unnecessary production disruptions by 58.7%, and achieves zero hard-constraint violations across 500 evaluation episodes, converging in 34 training episodes, whereas the heuristic baseline fails to converge within 200 episodes.
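The Lagrangian primal-dual idea in the abstract can be sketched on a toy problem. The following is a minimal illustration only, assuming a hypothetical three-arm constrained bandit with made-up reward (resilience yield) and cost (QoS risk) numbers; it is not the paper's SD-WAN environment, its reward shaping, or its exact algorithm.

```python
import numpy as np

# Toy constrained bandit: each "arm" is a chaos level with an assumed
# resilience-discovery yield (reward) and QoS-risk cost. The primal-dual
# loop maximizes expected reward subject to E[cost] <= d.
r = np.array([1.0, 0.6, 0.2])   # assumed reward per arm (illustrative)
c = np.array([0.9, 0.4, 0.1])   # assumed QoS-risk cost per arm (illustrative)
d = 0.5                          # cost budget: constraint is E[cost] <= d

theta = np.zeros(3)   # softmax policy logits (primal variable)
lam = 0.0             # Lagrange multiplier (dual variable)
avg_pi = np.zeros(3)  # averaged iterate, standard for saddle-point convergence

for t in range(1, 20001):
    z = np.exp(theta - theta.max())
    pi = z / z.sum()
    # Lagrangian per-arm payoff: reward minus lam-weighted cost
    L = r - lam * c
    # exact softmax policy gradient of the expected Lagrangian pi @ L
    theta += 0.05 * pi * (L - pi @ L)
    # dual ascent: raise lam when expected cost exceeds the budget
    lam = max(0.0, lam + 0.01 * (pi @ c - d))
    # running mean of the policy iterates
    avg_pi += (pi - avg_pi) / t

print("expected cost:", round(float(avg_pi @ c), 3),
      "expected reward:", round(float(avg_pi @ r), 3))
```

The averaged policy ends up near the constraint boundary (expected cost ≈ d) while mixing the high-yield arms, which is the qualitative behavior the paper's algorithm is claimed to achieve at scale with hard probabilistic safety semantics.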

Comments: 12 pages, 9 tables, 3 theorems. IEEE two-column format. Working paper, April 2025.


Submission history

[v1] 2026-04-04 18:20:54


ai.Vixra.org is an AI-assisted e-print repository rather than a journal. Hosted articles may not yet have been verified by peer review and should be treated as preliminary. In particular, anything that appears to include financial or legal advice, or proposed medical treatments, should be treated with due caution. ai.Vixra.org will not be responsible for any consequences of actions that result from any use of the documents on this website.
