Gremlin Announces ‘Status Checks’ To Make Chaos Engineering Safe in Production

Gremlin’s new ‘Status Checks’ capability automatically verifies that your systems are healthy and ready for Chaos Engineering experiments, improving testing safety and effectiveness.


San Jose, CA, June 23, 2020 (GLOBE NEWSWIRE) -- Gremlin, the Chaos Engineering company founded by former Netflix and Amazon engineers, today announced the general availability of Status Checks. The new capability evaluates the health and stability of a system before attacking it, making it safer to run experiments in production. While Chaos Engineering has been widely adopted and evangelized among the world’s top-performing IT organizations, safety is still a big concern for companies that have yet to fully embrace the practice.

“Safety is a core product principle at Gremlin and has been built into the product since the beginning,” said Matthew Fornaciari, CTO and Co-Founder of Gremlin. “Since launch in 2017, we’ve had a big red HALT button that makes it simple for Gremlin users to reactively roll back experiments, should an attack negatively impact the customer experience. Today, companies that have matured are automating more of their experiments with CI/CD, and they need a way to programmatically check the health of their systems and proactively stop an experiment. That’s Status Checks.”

Traditionally, companies have addressed their concerns around safety by only running experiments in staging environments, then applying key learnings to problems in production. This approach has limited value, as staging environments do not accurately mirror the production environments where customers live. 

Status Checks work by integrating with third-party tools such as PagerDuty, Datadog, and New Relic, or with any other monitoring tool or custom URL. For example, if PagerDuty reports an active incident, the Status Check will prevent the chaos attack from running on a system that is already under stress. Integrating with monitoring tools is also a quick way to gauge the health of the system and to determine whether a planned attack should be executed or halted. Running experiments only on infrastructure that is ready makes the learnings more accurate and preserves the health of the system.
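
The gating idea described above can be illustrated with a minimal Python sketch. This is not Gremlin's implementation or API: the function names, the health URL, and the `active_incidents` field in the JSON response are all hypothetical, chosen only to show a pre-attack check against a custom URL that fails closed when the system's health cannot be confirmed.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Tuple


def fetch_status(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Fetch a health URL and return (HTTP status code, response body).

    Returns status 0 when the endpoint cannot be reached at all.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code, ""
    except OSError:
        # Covers URLError and timeouts: the endpoint is unreachable.
        return 0, ""


def should_run_attack(
    health_url: str,
    fetch: Callable[[str], Tuple[int, str]] = fetch_status,
) -> bool:
    """Return True only if the health endpoint confirms the system is ready.

    Assumes a hypothetical response shape: {"active_incidents": <int>}.
    Any doubt (bad status, unparseable body, missing field) blocks the attack.
    """
    status, body = fetch(health_url)
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    # A missing field is treated as "unknown", which blocks the attack.
    return payload.get("active_incidents", 1) == 0
```

A CI/CD job could call `should_run_attack("https://example.com/health")` (a placeholder URL) before launching an experiment and skip the attack whenever it returns `False`. Passing `fetch` as a parameter keeps the decision logic testable without network access.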

"Combining Observability and Chaos Engineering can drastically shorten the time it takes to identify and resolve failure scenarios," said Nik Jain, Solution Architect at New Relic. "Status Checks will provide New Relic users with an extra sense of safety when running experiments in production."

“Many organizations approach the concept of Chaos Engineering with the attitude that the practice is far too risky to execute into production,” said Jim Scheibmeir, Senior Principal Analyst, Gartner. “The reality is that avoiding Chaos Engineering is equivalent to embracing crisis engineering.”

About Gremlin
Gremlin is the world’s first hosted Chaos Engineering service with a mission to help build a more reliable internet. It turns failure into resilience by offering engineers a fully hosted solution to safely experiment on complex systems, in order to identify weaknesses before they impact customers and cause revenue loss. Founded by CEO Kolton Andrus and CTO Matthew Fornaciari in 2016, the company has since raised $26.8 million in funding from Amplify Partners, Index Ventures, and Redpoint Ventures. Existing customers include Charter Communications, Expedia, Mailchimp, Qualtrics, Target, Twilio, Under Armour, and Walmart.

 
