talk-data.com

Showing 6 results

Activities & events

Daniel Paulus – VP of Engineering @ Checkly

This talk is about using synthetic monitoring to significantly reduce MTTD and MTTR and to achieve high DevOps maturity. Daniel is a big believer in synthetic monitoring as a concept for building reliable production services: if engineers are supposed to run what they build, they need monitoring tools that work for them. In the past he has built his own custom solutions using Jenkins or GitHub Actions, and later used SaaS tools for this. He would like to share his experience getting frontend engineers to build monitoring and getting everyone on an engineering team to care about production system reliability. Daniel Paulus has taken a unique journey from military officer to tech leader, and he’s now the VP of Engineering at Checkly. Along the way, he’s worn many hats, from engineering lead to director, learning how to build strong teams and solve tough challenges. Outside of work, Daniel lives near Berlin with his family, including four kids, while also finding time to maintain an open-source project. Whether it’s scaling teams or debugging code, he’s passionate about technology and enjoys sharing his knowledge with others.

DevOps Jenkins SaaS
9 SLIs ... OH MY! 2024-12-12 · 23:00
Sal Furino – Customer Reliability Engineer (CRE) @ Bloomberg

After years of working with and coaching teams to implement SLOs, it’s become incredibly clear to me that the greatest challenge engineering and product teams face is finding the right SLIs. SLOs are hard to get right, and it generally takes time and multiple iterations to tweak, tune, and adjust them so that they provide value and tell us when we need to take action to defend the reliability of our systems. However, there is an underlying assumption that the SLI itself is, and has been, providing value. As hard as SLOs are to get right, thinking of a good SLI is also difficult. This especially complicates things for engineering teams that don’t have a product person; as a result, they often struggle to identify the key user/customer journeys. This talk will attempt to give attendees additional guidance to help them think more clearly about SLIs and create better ones. We’ll break SLIs into three categories: Customer/User Experience, Supporting Services, and Management/Reputation. For each category, I’ll discuss three relevant SLIs (e.g., application metrics, network metrics, public sentiment), some best practices, common pitfalls, and how the signal for each of the nine metrics can be developed further to become more mature over time. Sal Furino is a Customer Reliability Engineer at Bloomberg. During his career he’s worked as a TPM, SRE, developer, sysadmin, and IT support. When not working, he enjoys cooking, gaming, and traveling. Sal lives in Queens and has a BS in Applied Mathematics from Marist College.

A Safer Future with STPA 2024-12-12 · 23:00
Theo Klein – Senior Site Reliability Engineer @ Google Maps

Want to prevent outages before they happen? Traditional SRE methods focus on component failures, but a whole class of outages stems from unexpected system interactions. We found a solution. In our team, we use Systems Theoretic Process Analysis (STPA) to identify and fix system-level vulnerabilities before they cause outages. By applying STPA during the design phase, we've prevented major incidents and saved countless engineering hours. This talk will show you how STPA can transform your approach to reliability. We'll share a real-world example where STPA caught critical design flaws that traditional methods missed, saving us months of costly rework. Don't wait for outages to happen. Learn how STPA can help you build more resilient systems and become a 1000x engineer. Theo is a Senior Site Reliability Engineer for Google Maps, where he leads a program to improve road closure data safety. Previously, he led a program identifying risky dependencies within Google Maps. In his spare time, he hosts supper clubs.

Event: Google SRE NY Tech Talk · 2024-05-22
The Hammer Changes the Hand 2024-05-22 · 22:00
Sal Furino – Customer Reliability Engineer (CRE) @ Bloomberg

Imagine you’re observing a worker swinging a hammer. As they swing the hammer, they make small adjustments to better hit and drive the nail or rivet into the surface. These adjustments are made unconsciously. The hammer has become an extension of their arm. It’s important to consider that the arm doesn’t just change the hammer; it gives it meaning beyond that of simply some wood and steel. But the hammer also changes the arm! Weeks, months, and years of swinging that hammer change the worker themselves. The tools we use change us and enable us to think and interact with the world differently. This talk will briefly explore how to view internal tooling through the lens of product management: not just developing and shipping features, but also how those features empower teams to change how they understand their sociotechnical systems.

Thiara Ortiz – Staff CDN Reliability Engineer @ Netflix

Any time a Netflix member sits down, reclines in their chair, and turns on their TV to Netflix, there's a moment of truth. It's an opportunity to deliver a spectacular service with an amazing quality of experience. Misses, errors, or high latency that prevent members from streaming, whether caused by ISP configuration changes, code deployments, or catastrophic fallback, affect how our service is perceived. This talk will go over how we measure the quality of experience for our members and how we develop new metrics for additional offerings such as live streaming and cloud gaming.

Cloud Computing Data Streaming
Mike Scherbakov – Staff Site Reliability Engineer @ Google

LLMs open up an opportunity to automate and scale many operational processes that could not otherwise be solved by conventional methods. Examples include summarizing issues and incidents, assisting production on-callers, managing incidents, clustering issues (creating a taxonomy), and scaling SRE through assisted review of development design documents. LLMs therefore provide a new and unique opportunity to transform the work we do as SREs.

LLM