Learn how Reddit uses a custom monitoring operator to manage Thanos and Prometheus and scale its metrics deployment beyond 45 million samples per second and 600 million active series. To achieve this, Reddit runs thousands of Prometheus instances of varying sizes, managed by an internally developed Kubernetes controller. Thanos provides long-term storage and global, single-pane-of-glass querying across this massive deployment. Learn about the operator, other tools the team has developed, and the challenges faced along the way.
Speaker
Ben Kochie
Site Reliability Engineer
Reddit
Long-time Prometheus contributor and Site Reliability Engineer at Reddit.
Bio from: Observability Insights with Grafana, HelloFresh and Reddit