This talk examines the October 2024 Zalando outage, which had a severe revenue impact. The incident was triggered by an automated security scan that discovered an unprotected GraphQL endpoint. The scan caused an unexpected amplification of requests, which sent computationally expensive queries to the service in front of the search indices and ultimately overloaded the Elasticsearch clusters. The prolonged impact highlighted how difficult it is to pinpoint the root cause of high load in Elasticsearch, and showed that a perfect storm, although rare by definition, should never be ruled out. Sometimes, when you hear hoofbeats, they are zebras after all.
talk-data.com
Speaker
Maryna Kryvko
Senior Software Engineer
Zalando
Senior Software Engineer at Zalando.
Bio from: Elastic Filebeat and Agent & Zalando and their Elasticsearch Battle History