talk-data.com

Speaker

Ben Kochie

Site Reliability Engineer, Reddit

Long-time Prometheus contributor and Site Reliability Engineer at Reddit.

Bio from: Observability Insights with Grafana, HelloFresh and Reddit

Talks & appearances

Learn how Reddit uses a custom monitoring operator to manage Thanos and Prometheus, scaling its metrics deployment beyond 45 million samples per second and 600 million active series. To achieve this, Reddit runs thousands of Prometheus instances of varying sizes, all managed by an internally developed Kubernetes controller. Thanos provides long-term storage and global, single-pane-of-glass querying across this massive deployment. The talk covers the operator, other tools the team has developed, and the challenges they have faced along the way.