At Yahoo, we built a secure, scalable, and cost-efficient batch processing platform using Amazon MWAA to orchestrate Apache Flink jobs on EKS, managed by the Flink Kubernetes Operator. This setup enables dynamic job orchestration while meeting strict enterprise compliance standards. In this session, we’ll share how Airflow DAGs:
• Dynamically launch, monitor, and clean up isolated Flink clusters per batch job, improving resource efficiency.
• Securely fetch the EKS kubeconfig, submit FlinkDeployment CRDs using FlinkKubernetesOperator, and poll job status using Airflow sensors (a minimal sketch follows this abstract).
• Integrate IAM for access control and meet Yahoo’s security requirements, including mutual TLS (mTLS) with Athenz.
• Optimize for cost and resilience through automated cleanup of jobs and the operator, and handle job failures and retries.
Join us for practical strategies and lessons from Yahoo’s production-scale Flink workflows in a Kubernetes environment.
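Below is a minimal, hedged sketch (not Yahoo's actual DAG) of the submit-and-poll pattern described above, assuming the apache-airflow-providers-apache-flink package and a pre-configured Kubernetes connection whose kubeconfig has already been fetched for the EKS cluster; the manifest path, job name, namespace, and connection ID are placeholders.

```python
# Sketch: one Airflow DAG that submits a FlinkDeployment CRD to EKS and waits
# for the job to finish. The Flink Kubernetes Operator running on the cluster
# turns the CRD into an isolated per-job Flink cluster.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.flink.operators.flink_kubernetes import FlinkKubernetesOperator
from airflow.providers.apache.flink.sensors.flink_kubernetes import FlinkKubernetesSensor

with DAG(
    dag_id="flink_batch_job",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Submit the FlinkDeployment custom resource.
    submit = FlinkKubernetesOperator(
        task_id="submit_flink_job",
        application_file="flink_deployment.yaml",  # hypothetical manifest path
        namespace="flink-jobs",
        kubernetes_conn_id="eks_default",
    )

    # Poll the custom resource until the job reaches a terminal state;
    # a failed job fails the Airflow task and triggers retries/cleanup.
    wait = FlinkKubernetesSensor(
        task_id="wait_for_flink_job",
        application_name="my-batch-job",  # must match metadata.name in the CRD
        namespace="flink-jobs",
        kubernetes_conn_id="eks_default",
    )

    submit >> wait
```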
talk-data.com · Topic: Cyber Security (2078 tagged)
This talk will explore the key changes introduced by AIP-81, focusing on security enhancements and user experience improvements across the entire software development lifecycle. We will break down the technical advancements from both a security and usability perspective, addressing key questions for Apache Airflow users of all levels. Topics include, but are not limited to: isolating CLI communication to enhance security by leveraging Role-Based Access Control (RBAC) within the API for secure database interactions, clearly defining local vs. remote command execution, and future improvements.
Airflow v2 architecture has strong coupling between the Airflow core and the user code running in an Airflow task. This poses barriers in security, maintenance, and adoption. One such threat is that user code can access Airflow’s source of truth - the metadata DB - and run any query against it! From a scalability angle, ‘n’ tasks create ‘n’ DB connections, limiting Airflow’s ability to scale effectively. To address this we proposed AIP-72 – a client-server model for task execution. The new architecture addresses several long-standing issues, including DB isolation from workers, dependency conflicts between Airflow core and workers, and ‘n’ number of DB connections. The new architecture has two parts:
• Execution API Server: tasks no longer have direct DB access; they use this new slim, secure API.
• Task SDK: a lightweight toolkit that lets you write tasks without drowning in Airflow’s codebase (a short sketch follows this abstract).
Beyond isolation and security, the redesign unlocks native multi-language task authoring support and secure Remote Execution. Join us to explore how AIP-72 transforms Airflow task execution, paving the way for more secure, flexible, and future-proof task orchestration!
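As a rough illustration of the authoring model, here is a minimal sketch assuming Airflow 3's Task SDK (the airflow.sdk package); the DAG and task names are made up. The point is simply that task code imports only the lightweight SDK and never touches the metadata DB directly: runtime details flow through the Execution API instead.

```python
# Sketch: authoring a DAG with the Task SDK rather than Airflow internals.
from airflow.sdk import dag, task


@dag(schedule=None)
def isolated_etl():
    @task
    def extract() -> list[int]:
        # Plain Python; no DB session or Airflow core imports in task code.
        return [1, 2, 3]

    @task
    def load(rows: list[int]) -> None:
        print(f"loaded {len(rows)} rows")

    load(extract())


isolated_etl()
```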
Jonah will walk through his experience helping migrate a legacy enterprise system from stored procedures to a microservices architecture. Early in the project, they introduced SonarQube to improve code quality and surface security concerns during the POC phase, leading them to reevaluate some of their initial architectural choices. Jonah will share how they integrated SonarQube into their CI, how it shaped their dev process, and what he learned from digging through the codebase to clean up smells and reduce tech debt! Key Takeaways:
1. Practical lessons from cleaning up real-world code smells and vulnerabilities manually as part of a growing team.
2. How to integrate SonarQube into a CI/CD pipeline for a microservices project, and what impact it actually has on code quality.
3. How SonarQube benefits POC work as a design feedback tool, not just a cleanup tool.
Unlock the world of data science—no coding required. Curious about data science but not sure where to start? This book is a beginner-friendly guide to what data science is and how people use it. It walks you through the essential topics—what data analysis involves, which skills are useful, and how terms like “data analytics” and “machine learning” connect—without getting too technical too fast.
Data science isn’t just about crunching numbers, pulling data from a database, or running fancy algorithms. It’s about asking the right questions, understanding the process from start to finish, and knowing what’s possible (and what’s not). This book teaches you all of that, while also introducing important topics like ethics, privacy, and security—because working with data means thinking about people, too. Whether you're a student exploring new skills, a professional navigating data-driven decisions, or someone considering a career change, this book is your friendly gateway into the world of data science, one of today’s most exciting fields. No coding or programming experience? No problem. You'll build a solid foundation and gain the confidence to engage with data science concepts—just as AI and data become increasingly central to everyday life.
What You Will Learn
• Grasp foundational statistics and how it matters in data analysis and data science
• Understand the data science project life cycle and how to manage a data science project
• Examine the ethics of working with data and its use in data analysis and data science
• Understand the foundations of data security and privacy
• Collect, store, prepare, visualize, and present data
• Identify the many types of machine learning and know how to gauge performance
• Prepare for and find a career in data science
Who This Book Is For
A wide range of readers who are curious about data science and eager to build a strong foundation. Perfect for undergraduates in the early semesters of their data science degrees, as it assumes no prior programming or industry experience. Professionals will find particular value in the real-world insights shared through practitioner interviews. Business leaders can use it to better understand what data science can do for them and how their teams are applying it. And for career changers, this book offers a welcoming entry point into the field—helping them explore the landscape before committing to more intensive learning paths like degrees or boot camps.
Remote presentation by a Datadog engineer based in the USA.
Recognize and avoid these common PostgreSQL mistakes! The best mistakes to learn from are ones made by other people! In PostgreSQL Mistakes and How To Avoid Them you’ll explore dozens of common PostgreSQL errors so you can easily avoid them in your own projects, learning proactively why certain approaches fail and others succeed. In PostgreSQL Mistakes and How To Avoid Them you’ll learn how to:
• Avoid configuration and operation issues
• Maximize PostgreSQL utility and performance
• Fix bad SQL practices
• Solve common security and administration issues
• Ensure smooth migration and upgrades
• Diagnose and fix a bad database
As PostgreSQL continues its rise as a leading open source database, mastering its intricacies is crucial. PostgreSQL Mistakes and How To Avoid Them is full of tested best practices to ensure top performance, and future-proof your database systems for seamless change and growth. Each of the mistakes is carefully described and accompanied by a demo, along with an explanation that expands your knowledge of PostgreSQL internals and helps you to build a stronger mental model of how the database engine works.
About the Technology
Fixing mistakes in PostgreSQL databases can be time-consuming and risky—especially when you’re making live changes to an in-use system. Fortunately, you can learn from the mistakes other Postgres pros have already made! This incredibly practical book lays out how to find and avoid the most common, dangerous, and sneaky errors you’ll encounter using PostgreSQL.
About the Book
PostgreSQL Mistakes and How To Avoid Them identifies Postgres problems in key areas like data types, features, security, and high availability. For each mistake you’ll find a real-world narrative that illustrates the pattern and provides concrete recommendations for improvement. You’ll especially appreciate the illustrative code snippets, schema samples, mind maps, and tables that show the pros and cons of different approaches.
What's Inside
• Diagnose configuration and operation issues
• Fix bad SQL code
• Address security and administration issues
• Ensure smooth migration and upgrades
About the Reader
For PostgreSQL database administrators and application developers.
About the Author
Jimmy Angelakos is a systems and database architect and PostgreSQL Contributor. He works as a Senior Principal Engineer at Deriv.
Quotes
“I’ve run into many of these mistakes. Read up to get prepared!” - Milorad Imbra, FEVO
“Navigates PostgreSQL pitfalls with clarity. I highly recommend it.” - Manohar Sai Jasti, Workday
“A straightforward style and real-world examples make it an essential read.” - Potito Coluccelli, Econocom Italia
“Provides valuable tips to avoid common PostgreSQL pitfalls.” - Fernando Bugni, Grupo QuintoAndar
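As one illustrative example (not taken from the book) of the kind of bad SQL practice and security issue in the categories it covers, the snippet below contrasts string-built queries with parameterized queries using psycopg2; the DSN, table, and column names are placeholders.

```python
# Sketch: avoid building SQL with string interpolation; bind parameters instead.
import psycopg2


def find_user(conn, email: str):
    with conn.cursor() as cur:
        # Mistake: interpolating user input into SQL invites injection.
        # cur.execute(f"SELECT id, name FROM users WHERE email = '{email}'")

        # Fix: let the driver pass the value as a bound parameter.
        cur.execute("SELECT id, name FROM users WHERE email = %s", (email,))
        return cur.fetchone()


if __name__ == "__main__":
    conn = psycopg2.connect("dbname=app user=app")  # placeholder DSN
    print(find_user(conn, "alice@example.com"))
```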
Tired of one-size-fits-all security? In this advanced session, we'll venture past traditional role-based access controls and discover powerful techniques to protect your sensitive data with surgical precision. We will explore how you can implement access policies for documents and fields (DLS/FLS), anonymize fields on the fly (goodbye exposed PII!), and blend role- and attribute-based approaches for dramatically simpler role definitions.
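As a hedged illustration (not necessarily what the session will show), the sketch below defines a role with document-level security, field-level security, and field masking through the OpenSearch Security plugin's REST API; the index pattern, field names, endpoint URL, and credentials are placeholders.

```python
# Sketch: a role that sees only its department's documents, only whitelisted
# fields, and a masked (anonymized) email field.
import json

import requests

role = {
    "cluster_permissions": ["cluster_composite_ops_ro"],
    "index_permissions": [
        {
            "index_patterns": ["orders-*"],
            "allowed_actions": ["read"],
            # DLS: restrict visible documents with a query (passed as a string).
            "dls": json.dumps({"term": {"department": "emea"}}),
            # FLS: expose only these fields to the role.
            "fls": ["order_id", "amount", "department", "customer_email"],
            # Mask PII on the fly.
            "masked_fields": ["customer_email"],
        }
    ],
}

resp = requests.put(
    "https://localhost:9200/_plugins/_security/api/roles/emea_analyst",
    json=role,
    auth=("admin", "admin"),  # placeholder credentials
    verify=False,             # demo only; verify TLS in real deployments
)
resp.raise_for_status()
```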
One of the really cool aspects of OpenSearch is its ability to serve as a monitoring tool for itself. Let's have a look at how to piece together a monitoring solution for OpenSearch using components from the OpenSearch ecosystem such as Data Prepper and OpenSearch Alerting. We will focus on security aspects like authentication and authorization.
Any software dealing with confidential data needs a security solution providing authentication and authorization. And any such security solution should have proper monitoring. OpenSearch is no exception in this regard.
In this presentation, we will look at how to use components like Audit Logs, Data Prepper, and OpenSearch Alerting to create a small but effective security monitoring solution.
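As a hedged illustration of the kind of signal such a solution watches, the sketch below counts recent failed logins in the security plugin's audit log index; the index pattern and field names assume the plugin's default audit log settings, and the URL and credentials are placeholders. In practice you would wire an equivalent query into an OpenSearch Alerting monitor to get notified automatically.

```python
# Sketch: count FAILED_LOGIN audit events from the last hour.
import requests

query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"audit_category": "FAILED_LOGIN"}},
                {"range": {"@timestamp": {"gte": "now-1h"}}},
            ]
        }
    },
}

resp = requests.post(
    "https://localhost:9200/security-auditlog-*/_search",
    json=query,
    auth=("admin", "admin"),  # placeholder credentials
    verify=False,             # demo only
)
resp.raise_for_status()
failed_logins = resp.json()["hits"]["total"]["value"]
print(f"failed logins in the last hour: {failed_logins}")
```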
Supported by Our Partners
• Statsig — The unified platform for flags, analytics, experiments, and more.
• Graphite — The AI developer productivity platform.
• Augment Code — AI coding assistant that pro engineering teams love
GitHub recently turned 17 years old—but how did it start, how has it evolved, and what does the future look like as AI reshapes developer workflows? In this episode of The Pragmatic Engineer, I’m joined by Thomas Dohmke, CEO of GitHub. Thomas has been a GitHub user for 16 years and an employee for 7. We talk about GitHub’s early architecture, its remote-first operating model, and how the company is navigating AI—from Copilot to agents. We also discuss why GitHub hires junior engineers, how the company handled product-market fit early on, and why being a beloved tool can make shipping harder at times. Other topics we discuss include:
• How GitHub’s architecture evolved beyond its original Rails monolith
• How GitHub runs as a remote-first company—and why they rarely use email
• GitHub’s rigorous approach to security
• Why GitHub hires junior engineers
• GitHub’s acquisition by Microsoft
• The launch of Copilot and how it’s reshaping software development
• Why GitHub sees AI agents as tools, not a replacement for engineers
• And much more!
Timestamps
(00:00) Intro
(02:25) GitHub’s modern tech stack
(08:11) From cloud-first to hybrid: How GitHub handles infrastructure
(13:08) How GitHub’s remote-first culture shapes its operations
(18:00) Former and current internal tools including Haystack
(21:12) GitHub’s approach to security
(24:30) The current size of GitHub, including security and engineering teams
(25:03) GitHub’s intern program, and why they are hiring junior engineers
(28:27) Why AI isn’t a replacement for junior engineers
(34:40) A mini-history of GitHub
(39:10) Why GitHub hit product market fit so quickly
(43:44) The invention of pull requests
(44:50) How GitHub enables offline work
(46:21) How monetization has changed at GitHub since the acquisition
(48:00) 2014 desktop application releases
(52:10) The Microsoft acquisition
(1:01:57) Behind the scenes of GitHub’s quiet period
(1:06:42) The release of Copilot and its impact
(1:14:14) Why GitHub decided to open-source Copilot extensions
(1:20:01) AI agents and the myth of disappearing engineering jobs
(1:26:36) Closing
The Pragmatic Engineer deepdives relevant for this episode:
• AI Engineering in the real world
• The AI Engineering stack
• How Linux is built with Greg Kroah-Hartman
• Stacked Diffs (and why you should know about them)
• 50 Years of Microsoft and developer tools
See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].
Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe
Data architects are increasingly tasked with provisioning quality unstructured data to support AI models. However, little has been done to manage unstructured data beyond data security and privacy requirements. This session will look at what it takes to improve the quality of unstructured data and the emerging best practices in this space.
Meridian Energy, New Zealand’s leader in 100% renewable generation, adopted Denodo as a unified semantic data layer to accelerate the delivery of diverse use cases across its lakehouse environment. From security risk modelling to incident management, ESG compliance and more, Denodo enables governed, real-time access to data without replication – reducing ETL overhead, empowering self-service, and ensuring consistent metrics. Business teams are continuing to explore and advance data-driven solutions, supporting Meridian’s shift to a governed lakehouse architecture.
This comprehensive guide equips you with the knowledge and confidence needed to prep for the exam and thrive as a Power Platform Solution Architect. The book starts with a foundation for successful solution architecture, emphasizing essential skills such as requirements gathering, governance, and security. You will learn to navigate customer discovery, translate business needs into technical requirements, and design solutions that address both functional and non-functional needs.
The second part of the book delves into the Microsoft Power Platform ecosystem, offering an in-depth look at its core components—Power Apps, Power Automate, Power BI, Microsoft Copilot, and Robotic Process Automation (RPA). Detailed insights into data modeling, security strategies, and AI integration will guide you in building scalable, secure solutions. Application life cycle management, which empowers solution architects to design, implement, and deploy Power Platform solutions effectively, is discussed next. You will then go through real-world scenarios, giving you a practical understanding of the challenges and considerations in managing Power Platform projects within a business context. The book concludes with strategies for continuous learning and resources for professional development, including practice questions to assess knowledge and readiness for the PL-600 exam. After reading the book, you will be ready to take the exam and become a successful Power Platform Solution Architect.
What You Will Learn
• Understand the Solution Architect's role, responsibilities, and strategic approaches to successfully navigate projects
• Master the basics of Power Platform Solution Architecture
• Understand governance, security, and integration concepts in real-world scenarios
• Design and deploy effective business solutions using Power Platform components
• Gain the skills necessary to prep for the PL-600 certification exam
Who This Book Is For
Professionals pursuing Microsoft PL-600 Solution Architect certification and IT consultants and developers transitioning to solution architect roles
Join award-winning broadcaster and national comedy champion Adam Spencer to learn how technologies like AI, cyber security, and ChatGPT are disrupting business and how you can win in this world. In this thought-provoking and funny presentation, Adam will share:
• How Artificial Intelligence will impact every industry and how to harness its potential
• The crucial role every worker plays in your cyber security
• The business potential of holding a supercomputer in your hands
• How to keep up if the pace of digital disruption feels overwhelming
Join this session to learn how Coinbase builds end-to-end ML workflows on top of Snowflake’s platform for optimal data security, governance and price performance. Using features such as Snowflake Feature Store and Snowflake Model Registry, Coinbase now automates batch and online inference on predictive ML models to quickly and accurately unban users who were initially incorrectly flagged as suspected fraud or bots, resulting in an improved user experience and increased revenue.
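As a rough sketch of the workflow described (not Coinbase's code), the example below logs a model to the Snowflake Model Registry and runs batch inference with Snowpark, assuming the snowflake-ml-python package; the connection parameters, table names, and toy model are placeholders.

```python
# Sketch: log a model to the Snowflake Model Registry, then score a governed
# table of flagged accounts in batch without data leaving Snowflake.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from snowflake.ml.registry import Registry
from snowflake.snowpark import Session

connection_parameters = {
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "ML",
    "schema": "REGISTRY",
}
session = Session.builder.configs(connection_parameters).create()

# Placeholder "unban" classifier trained on two toy features.
train = pd.DataFrame({"TX_VELOCITY": [0.1, 0.9], "ACCOUNT_AGE_DAYS": [400.0, 3.0]})
model = LogisticRegression().fit(train, [0, 1])

# Log the model; access control and lineage stay inside Snowflake's governance.
reg = Registry(session=session, database_name="ML", schema_name="REGISTRY")
mv = reg.log_model(
    model,
    model_name="unban_classifier",
    version_name="v1",
    sample_input_data=train,  # lets the registry infer the model signature
)

# Batch inference directly over a Snowflake table.
features = session.table("FLAGGED_USERS_FEATURES")
predictions = mv.run(features, function_name="predict")
predictions.write.save_as_table("UNBAN_PREDICTIONS", mode="overwrite")
```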
Learn how to streamline governance and security for data and AI with Snowflake's latest updates to Horizon Catalog. Join us for new product overviews and live demos covering Trust Center, sensitive data, data quality, lineage and more.
Embark on a transformative AI journey with our session focused on deploying AI agents that deliver immediate ROI while ensuring robust data security. We’ll delve into advanced AI orchestration techniques that not only enhance system efficiency but also improve employee productivity. By incorporating TRiSM principles, you’ll learn how to develop AI applications that are both trustworthy and risk-managed. Whether you are just beginning your AI journey or seeking to expand your existing framework, this session offers practical insights to transform AI potential into meaningful business outcomes.
Struggling to balance AI-driven innovation with security and compliance? Organisations are racing to leverage AI, advanced analytics, and the cloud for a competitive edge—but growing regulations and data protection requirements create significant challenges.
This session will provide practical strategies to maximise AI potential while safeguarding sensitive data, ensuring compliance, and mitigating risk.
Learn how to harness innovation without compromise and position your organisation for success in a rapidly evolving digital landscape.
Generative AI is revolutionizing businesses, but the secret behind its success is your data – sensitive, unstructured, and exposed. Companies are racing to deploy AI yet often overlook the risks of data compromise, exfiltration, and compliance violations. As organizations adopt new data tools, they must rethink their approach to data security or risk disastrous consequences.
If you’re struggling with balancing innovation and security, this session will give you the blueprint to securely scale data & AI applications.