talk-data.com
Meetup
talk
2024-03-06 at 18:30
Scaling Thanos at Reddit for Enhanced Observability
Description
Learn how Reddit uses a custom monitoring operator to manage Thanos and Prometheus, scaling their metrics deployment beyond 45 million samples per second and 600 million active series. To achieve this, they run thousands of Prometheus instances of varying sizes, managed by their internally developed Kubernetes controller. They use Thanos for long-term storage and for global, single-pane-of-glass querying across this massive deployment. Learn about the operator, other tools they've developed, and the challenges they've faced along the way.