Gremlin Soundproofs Kubernetes by Helping DevOps Teams Isolate Noisy Neighbors Within a Cluster

55% of Gremlin customers run Chaos Engineering experiments on Kubernetes to mitigate failure and optimize performance


San Jose, CA, Nov. 17, 2020 (GLOBE NEWSWIRE) -- Gremlin, a platform for safely and securely running Chaos Engineering experiments, today announced new features to “soundproof” Kubernetes and help engineers prevent noisy neighbors in a cluster. Sharing machine resources among workloads is neither new nor unique to Kubernetes. However, containers orchestrated by Kubernetes are highly dynamic and ephemeral, and a single cluster can host dozens of apps and hundreds of services. As a result, sharing resources and security permissions is an even larger concern.

According to recent Kubernetes Adoption Research, 59% of large organizations use Kubernetes in production, which mirrors the distribution of companies running chaos attacks on the Gremlin platform. Gartner predicts in its CTO’s Guide to Containers and Kubernetes that “by 2025, more than 85% of global organizations will be running containerized applications in production, which is a significant increase from fewer than 35% in 2019.”

Kubernetes is highly flexible and scalable, but its adoption is also driven by resource efficiency. Containers have a smaller resource footprint, which allows much higher tenant density on a host and therefore higher infrastructure utilization. That density, however, worsens the “noisy neighbor” problem, in which one service that is scaling up or misbehaving can degrade another service on the same node within a cluster. Without proactive testing, teams often learn how their system handles a noisy neighbor only when demand on a single service spikes in production. By then it’s too late, and customers already feel the impact.

“Kubernetes is becoming the default way to build and operate applications at many enterprises, but along with the advantage of abstraction comes uncertainty,” said Lorne Kligerman, Sr. Director of Product at Gremlin. “We’re providing DevOps teams with better tooling to understand how their Kubernetes applications will behave under various stresses, such as when a neighboring container is spiking with traffic.”

The noisy neighbor problem also introduces security concerns. Performing chaos experiments in multi-tenant environments requires fine-grained controls. Ideally, individuals and teams should be limited to the namespaces where they are meant to run attacks. Namespace access control ensures that only team members with the correct permissions can access specific Kubernetes objects, rather than every object in the cluster. This is crucial to ensuring that an engineer’s Chaos Engineering experiments don’t harm neighboring services.

Highlights

  • Test individual pod scaling and Kubernetes resource limits to prevent “noisy neighbors” from taking down your application
  • Easily target specific Kubernetes objects to test how they handle spikes in usage without impacting the entire application
  • Securely test Kubernetes in shared cluster environments

Running targeted experiments on Kubernetes infrastructure through Gremlin’s intuitive user interface helps SRE and DevOps teams simulate real-world failures that are unpredictable, difficult to replicate, and can cause downtime if they happen in production. Engineers can specify exactly which Kubernetes objects they want to test and simulate CPU spikes or server shutdowns without affecting the entire cluster. This ultimately gives them more confidence in the resilience of their environments.

“Gremlin makes Chaos Engineering easy and seamless,” said Chaitanya Krant, Engineering Manager at National Australia Bank. “For us, it’s cut down the amount of time involved in designing and executing the chaos experiments, particularly for our Microservices and Kubernetes.”

About Gremlin
Gremlin is the world’s first hosted Chaos Engineering service, with a mission to help build a more reliable internet. It turns failure into resilience by offering engineers tooling to safely experiment on complex systems and identify weaknesses before they impact customers and cause revenue loss. Investors include Amplify Partners, Index Ventures, and Redpoint VC. Key customers include GrubHub, HEB, JPMorgan, Mailchimp, Target, Twilio, Under Armour, and Walmart.