Apache Airflow’s executor landscape has traditionally presented users with a clear trade-off: choose either the speed of local execution or the scalability, isolation, and configurability of remote execution. The AWS Lambda Executor introduces a new paradigm that bridges this gap, offering near-local execution speeds with the benefits of remote containerization. This talk will begin with a brief overview of Airflow’s executors, how they work, and what they are responsible for, highlighting the compromises between different executors. We will explore the emerging niche for fast yet remote execution and demonstrate how the AWS Lambda Executor fills this space. We will also address practical considerations for using such an executor, including working within Lambda’s 15-minute execution limit and how to mitigate it with a multi-executor configuration. Whether you’re new to Airflow or an experienced user, this session will provide valuable insights into task execution and how you can combine the best of both local and remote execution paradigms.
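For context, here is a hedged sketch of the multi-executor mitigation the abstract mentions: since Airflow 2.10, multiple executors can be registered side by side and selected per task. The Lambda executor module path, the alias, and the task names below are assumptions for illustration, not the speakers' setup, and may differ by provider version.

```python
# Hypothetical airflow.cfg registering two executors side by side
# (the alias syntax and Lambda executor module path are assumptions):
#   [core]
#   executor = LocalExecutor,airflow.providers.amazon.aws.executors.aws_lambda.AwsLambdaExecutor:lambda_exec
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule=None, start_date=datetime(2025, 1, 1))
def hybrid_example():
    @task(executor="lambda_exec")  # short task: fits under Lambda's 15-minute cap
    def quick_transform():
        return "done in seconds"

    @task  # falls back to the default LocalExecutor for longer-running work
    def slow_backfill():
        ...

    quick_transform() >> slow_backfill()

hybrid_example()
```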
Topic: Docker (109 items tagged on talk-data.com)
Ready to contribute to Apache Airflow? In this hands-on workshop, you’ll be expected to come prepared with your development environment already configured (having Breeze installed is strongly recommended, but Codespaces works if you can’t install Docker). We’ll dive straight into finding issues that match your skills and walk you through the entire contribution process, from creating your first pull request to receiving community feedback. Whether you’re writing code, enhancing documentation, or offering feedback, there’s a place for you. Let’s get started and see your name among Airflow contributors!
In this season of the Analytics Engineering podcast, Tristan is digging deep into the world of developer tools and databases. There are few more widely used developer tools than Docker. From its launch back in 2013, Docker has completely changed how developers ship applications. In this episode, Tristan talks to Solomon Hykes, the founder and creator of Docker. They trace Docker's rise from startup obscurity to becoming foundational infrastructure in modern software development. Solomon explains the technical underpinnings of containerization, the pivotal shift from platform-as-a-service to open-source engine, and why Docker's developer experience was so revolutionary. The conversation also dives into his next venture Dagger, and how it aims to solve the messy, overlooked workflows of software delivery. Bonus: Solomon shares how AI agents are reshaping how CI/CD gets done and why the next revolution in DevOps might already be here. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
Move from experimentation to a reproducible, shareable format, and explore tools like Docker, FastAPI, or simple cloud deployment.
Autonomous AI agents are transforming industries by enabling systems to perform tasks, make decisions, and adapt in real time without human intervention. In this talk, I will delve into the architecture and design principles required to build these agents within scalable AI infrastructure. Key topics will include constructing modular, reusable frameworks, optimizing resource allocation, and enabling interoperability between agents and data pipelines. Through practical use cases, attendees will learn how to leverage containerization and orchestration techniques to enhance the flexibility and performance of these agents while ensuring low-latency decision-making. This session will also highlight challenges such as ensuring robustness, addressing ethical considerations, and designing real-time feedback loops. Participants will gain actionable insights into building autonomous AI agents that drive efficiency, scalability, and innovation in modern AI ecosystems.
This session covers packaging your model for reproducibility, setting up basic infrastructure (Docker, FastAPI, or simple cloud deployment), and thinking in pipelines (how to automate data inputs, retraining, and monitoring).
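As a hedged illustration of the packaging step described above, here is a minimal sketch of serving a pickled model behind FastAPI; the model file name and feature shape are assumptions, not the session's own material.

```python
# Minimal model-serving sketch; assumes a scikit-learn-style model was
# trained elsewhere and pickled to model.pkl (hypothetical path).
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class Features(BaseModel):
    values: list[float]

@app.post("/predict")
def predict(features: Features):
    # scikit-learn estimators expect a 2-D array: one row per sample
    return {"prediction": model.predict([features.values]).tolist()}
```

From here, running `uvicorn main:app` exposes the endpoint, and a Dockerfile pinning the app and its dependencies gives the reproducibility the session is about.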
Balancing developer agility with security compliance is a key challenge in AI-driven, cloud-native development. Learn how Docker and Google Cloud integrate security into every phase of the software lifecycle—enabling teams to build, test, and deploy AI features and applications with confidence. Embed security in developer workflows, enhance supply chain integrity with SLSA provenance and SBOM attestations, and leverage trusted content in seamless workflows with Google Cloud.
This session is hosted by a Google Cloud Next sponsor.
Visit your registration profile at g.co/cloudnext to opt out of sharing your contact information with the sponsor hosting this session.
How to successfully migrate petabyte-scale Cassandra clusters to Spanner without requiring code changes. The use case addresses various lifecycle aspects, including IaC, containerization, gradual migration, performance testing, security, centralized observability, and multi-region operations.
In this hands-on lab, you'll explore the power of Kubernetes and learn how to orchestrate cloud applications with ease. Using Google Kubernetes Engine, you’ll provision a fully managed Kubernetes cluster and deploy Docker containers using kubectl. Break down a monolithic application into microservices using Kubernetes Deployments and Services, and gain insights into the latest innovations in resource efficiency, developer productivity, and automated operations. By the end, you'll be ready to streamline application management in any environment.
If you register for a Learning Center lab, please ensure that you sign up for a Google Cloud Skills Boost account for both your work domain and personal email address. You will need to authenticate your account as well (be sure to check your spam folder!). This will ensure you can arrive and access your labs quickly onsite. You can follow this link to sign up!
In this podcast episode, we talked with Eddy Zulkifly about "From Supply Chain Management to Digital Warehousing and FinOps".
About the Speaker: Eddy Zulkifly is a Staff Data Engineer at Kinaxis, building robust data platforms across Google Cloud, Azure, and AWS. With a decade of experience in data, he actively shares his expertise as a Mentor on ADPList and Teaching Assistant at Uplimit. Previously, he was a Senior Data Engineer at Home Depot, specializing in e-commerce and supply chain analytics. Currently pursuing a Master’s in Analytics at the Georgia Institute of Technology, Eddy is also passionate about open-source data projects and enjoys watching/exploring the analytics behind the Fantasy Premier League.
In this episode, we dive into the world of data engineering and FinOps with Eddy Zulkifly, Staff Data Engineer at Kinaxis. Eddy shares his unconventional career journey—from optimizing physical warehouses with Excel to building digital data platforms in the cloud.
🕒 TIMECODES
0:00 Eddy’s career journey: From supply chain to data engineering
8:18 Tools & learning: Excel, Docker, and transitioning to data engineering
21:57 Physical vs. digital warehousing: Analogies and key differences
31:40 Introduction to FinOps: Cloud cost optimization and vendor negotiations
40:18 Resources for FinOps: Certifications and the FinOps Foundation
45:12 Standardizing cloud cost reporting across AWS/GCP/Azure
50:04 Eddy’s master’s degree and closing thoughts
🔗 CONNECT WITH EDDY
Twitter - https://x.com/eddarief
LinkedIn - https://www.linkedin.com/in/eddyzulkifly/
GitHub - https://github.com/eyzyly/eyzyly
ADPList - https://adplist.org/mentors/eddy-zulkifly

🔗 CONNECT WITH DataTalksClub
Join the community - https://datatalks.club/slack.html
Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
Check other upcoming events - https://lu.ma/dtc-events
LinkedIn - https://www.linkedin.com/company/datatalks-club/
Twitter - https://twitter.com/DataTalksClub
Website - https://datatalks.club/
It's finally possible to bring the awesome power of Large Language Models (LLMs) to your laptop. This talk will explore how to run and leverage small, openly available LLMs to power common tasks involving data, including selecting the right models, practical use cases for running small models, and best practices for deploying small models effectively alongside databases.
Bio: Jeffrey Morgan is the founder of Ollama, an open-source tool for getting up and running with large language models. Prior to founding Ollama, Jeffrey founded Kitematic, which was acquired by Docker and evolved into Docker Desktop. He has previously worked at companies including Docker, Twitter, and Google.
➡️ Follow Us
LinkedIn: https://www.linkedin.com/company/small-data-sf/
X/Twitter: https://twitter.com/smalldatasf
Website: https://www.smalldatasf.com/
Discover how to run large language models (LLMs) locally using Ollama, the easiest way to get started with small AI models on your Mac, Windows, or Linux machine. Unlike massive cloud-based systems, small open source models are only a few gigabytes, allowing them to run incredibly fast on consumer hardware without network latency. This video explains why these local LLMs are not just scaled-down versions of larger models but powerful tools for developers, offering significant advantages in speed, data privacy, and cost-effectiveness by eliminating hidden cloud provider fees and risks.
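As a rough illustration (not from the video itself), the local setup it describes boils down to a couple of lines with the ollama Python client, assuming the model has already been pulled with `ollama pull gemma2:2b`:

```python
# Smallest possible local-LLM call via the ollama Python client.
import ollama

reply = ollama.generate(model="gemma2:2b", prompt="Why are local LLMs fast?")
print(reply["response"])
```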
Learn the most common use case for small models: combining them with your existing factual data to prevent hallucinations. We dive into retrieval augmented generation (RAG), a powerful technique where you augment a model's prompt with information from a local data source. See a practical demo of how to build a vector store from simple text files and connect it to a model like Gemma 2B, enabling you to query your own data using natural language for fast, accurate, and context-aware responses.
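Here is a hedged sketch of that RAG flow, using the ollama Python client and a brute-force in-memory vector store; the file names, embedding model, and question are illustrative assumptions, not the demo's exact code.

```python
# RAG sketch: embed local text files, retrieve the best match, and feed
# it to a small model as context. File names are hypothetical.
import ollama

documents = [open(path).read() for path in ["notes1.txt", "notes2.txt"]]

# Embed each document once, up front.
index = [
    (doc, ollama.embeddings(model="nomic-embed-text", prompt=doc)["embedding"])
    for doc in documents
]

def retrieve(question: str) -> str:
    q = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    # Rank by dot product; good enough for a toy store of a few documents.
    return max(index, key=lambda item: sum(a * b for a, b in zip(q, item[1])))[0]

question = "What do my notes say about Docker?"
context = retrieve(question)
answer = ollama.generate(
    model="gemma2:2b",
    prompt=f"Using only this context:\n{context}\n\nAnswer the question: {question}",
)
print(answer["response"])
```

A real setup would swap the list comprehension for a proper vector database, but the augment-the-prompt-with-retrieved-context shape stays the same.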
Explore the next frontier of local AI with small agents and tool calling, a new feature that empowers models to interact with external tools. This guide demonstrates how an LLM can autonomously decide to query a DuckDB database, write the correct SQL, and use the retrieved data to answer your questions. This advanced tutorial shows you how to connect small models directly to your data engineering workflows, moving beyond simple chat to create intelligent, data-driven applications.
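A hedged sketch of that loop follows, assuming a recent ollama Python client (which can derive a tool schema from a plain function) and a tool-capable model such as llama3.1; the table name and question are hypothetical.

```python
# Tool-calling sketch: let the model decide to run SQL against DuckDB.
import duckdb
import ollama

def run_sql(query: str) -> str:
    """Run a SQL query against the local DuckDB database and return the rows."""
    return str(duckdb.sql(query).fetchall())

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "How many rows are in the events table?"}],
    tools=[run_sql],  # recent clients build the JSON tool schema from the function
)

# If the model chose to call the tool, execute it and print the result.
for call in response.message.tool_calls or []:
    if call.function.name == "run_sql":
        print(run_sql(**call.function.arguments))
```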
Get started with practical applications for small models today, from building internal help desks to streamlining engineering tasks like code review. This video highlights how small and large models can work together effectively and shows that open source models are rapidly catching up to their cloud-scale counterparts. It's never been a better time for developers and data analysts to harness the power of local AI.
This book provides a comprehensive approach to managing PostgreSQL cluster databases on Amazon Web Services and Azure in the cloud, as well as in Docker and container environments on a Red Hat operating system. Furthermore, detailed references for managing PostgreSQL on both Windows and Mac are provided. This book condenses all the fundamental and essential concepts you need to manage a PostgreSQL cluster into a one-stop guide that is perfect for newcomers to Postgres database administration. Each chapter provides historical context and documents version changes of the PostgreSQL cluster, elucidates practical "how-to" methods, and includes illustrations, key word definitions, practices for application, a summary of key learnings, and questions to reinforce understanding. The book also outlines a clear study objective with a weekly learning schedule and hundreds of practice exercises, along with questions and answers. With its comprehensive and practical approach, this book will help you gain the confidence to manage all aspects of a PostgreSQL cluster in critical production environments so you can better support your organization's database infrastructure on the cloud and in containers.

What You Will Learn
• Install and configure Postgres clusters on the cloud and in containers; monitor database logs; start and stop databases; troubleshoot; tune performance; back up and recover; and integrate with Amazon S3 and Azure Blob Storage
• Manage Postgres databases on Amazon Web Services and Azure, as well as in Docker and container environments on a Red Hat operating system
• Access sample references to scripting solutions and database management tools for working with Postgres, Redshift (based on Postgres 8.2), and Docker
• Create Amazon Machine Images (AMIs) and Azure Images for managing a fleet of Postgres clusters on the cloud
• Reinforce knowledge with a weekly learning schedule and hundreds of practice exercises, along with questions and answers
• Progress from simple concepts, such as how to choose the correct instance type, to creating complex machine images
• Gain access to an Amazon AMI with a DBA admin tool, allowing you to learn Postgres, Redshift, and Docker in a cloud environment
• Refer to a comprehensive summary of the documentation for Postgres, Amazon Web Services, Azure, and Red Hat Linux covering all aspects of Postgres cluster management on the cloud

Who This Book Is For
Newcomers to PostgreSQL database administration and cross-platform support, and DBAs looking to master PostgreSQL on the cloud.
Brought to you by:
• WorkOS — The modern identity platform for B2B SaaS.
• Sevalla — Deploy anything from preview environments to Docker images.
• Chronosphere — The observability platform built for control.

Welcome to The Pragmatic Engineer! Today, I’m thrilled to be joined by Grady Booch, a true legend in software development. Grady is the Chief Scientist for Software Engineering at IBM, where he leads groundbreaking research in embodied cognition. He’s the mind behind several object-oriented design concepts, a co-author of the Unified Modeling Language, and a founding member of the Agile Alliance and the Hillside Group. Grady has authored six books, hundreds of articles, and holds prestigious titles as an IBM, ACM, and IEEE Fellow, as well as a recipient of the Lovelace Medal (an award for those with outstanding contributions to the advancement of computing).

In this episode, we discuss:
• What it means to be an IBM Fellow
• The evolution of the field of software development
• How UML was created, what its goals were, and why Grady disagrees with the direction of later versions of UML
• Pivotal moments in software development history
• How the software architect role changed over the last 50 years
• Why Grady declined to be the Chief Architect of Microsoft, saying no to Bill Gates!
• Grady’s take on large language models (LLMs)
• Advice to less experienced software engineers
• … and much more!

Timestamps
(00:00) Intro
(01:56) What it means to be a Fellow at IBM
(03:27) Grady’s work with legacy systems
(09:25) Some examples of domains Grady has contributed to
(11:27) The evolution of the field of software development
(16:23) An overview of the Booch method
(20:00) Software development prior to the Booch method
(22:40) Forming Rational Machines with Paul and Mike
(25:35) Grady’s work with Bjarne Stroustrup
(26:41) ROSE and working with the commercial sector
(30:19) How Grady built UML with Ivar Jacobson and James Rumbaugh
(36:08) An explanation of UML and why it was a mistake to turn it into a programming language
(40:25) The IBM acquisition and why Grady declined Bill Gates’s job offer
(43:38) Why UML is no longer used in industry
(52:04) Grady’s thoughts on formal methods
(53:33) How the software architect role changed over time
(1:01:46) Disruptive changes and major leaps in software development
(1:07:26) Grady’s early work in AI
(1:12:47) Grady’s work with Johnson Space Center
(1:16:41) Grady’s thoughts on LLMs
(1:19:47) Why Grady thinks we are a long way off from sentient AI
(1:25:18) Grady’s advice to less experienced software engineers
(1:27:20) What’s next for Grady
(1:29:39) Rapid fire round

The Pragmatic Engineer deepdives relevant for this episode:
• The Past and Future of Modern Backend Practices https://newsletter.pragmaticengineer.com/p/the-past-and-future-of-backend-practices
• What Changed in 50 Years of Computing https://newsletter.pragmaticengineer.com/p/what-changed-in-50-years-of-computing
• AI Tooling for Software Engineers: Reality Check https://newsletter.pragmaticengineer.com/p/ai-tooling-2024

Where to find Grady Booch:
• X: https://x.com/grady_booch
• LinkedIn: https://www.linkedin.com/in/gradybooch
• Website: https://computingthehumanexperience.com

Where to find Gergely:
• Newsletter: https://www.pragmaticengineer.com/
• YouTube: https://www.youtube.com/c/mrgergelyorosz
• LinkedIn: https://www.linkedin.com/in/gergelyorosz/
• X: https://x.com/GergelyOrosz

References and Transcripts: See the transcript and other references from the episode at
https://newsletter.pragmaticengineer.com/podcast — Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].
Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe
This comprehensive guide, featuring hand-picked examples of daily use cases, will walk you through the end-to-end predictive model-building cycle using the latest techniques and industry tricks. In Chapters 1, 2, and 3, we will begin by setting up the environment and covering the basics of PySpark, focusing on data manipulation. Chapter 4 delves into the art of variable selection, demonstrating various techniques available in PySpark. In Chapters 5, 6, and 7, we explore machine learning algorithms, their implementations, and fine-tuning techniques. Chapters 8 and 9 will guide you through machine learning pipelines and various methods to operationalize and serve models using Docker/API. Chapter 10 will demonstrate how to unlock the power of predictive models to create a meaningful impact on your business. Chapter 11 introduces some of the most widely used and powerful modeling frameworks to unlock real value from data. In this new edition, you will learn predictive modeling frameworks that can quantify customer lifetime values and estimate the return on your predictive modeling investments. This edition also includes methods to measure engagement and identify actionable populations for effective churn treatments. Additionally, a dedicated chapter on experimentation design has been added, covering steps to efficiently design, conduct, test, and measure the results of your models. All code examples have been updated to reflect the latest stable version of Spark.

You will:
• Gain an overview of end-to-end predictive model building
• Understand multiple variable selection techniques and their implementations
• Learn how to operationalize models
• Perform data science experiments and learn useful tips
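To make the pipeline chapters concrete, here is a minimal sketch of the kind of PySpark ML pipeline the book builds toward; the toy data and column names are illustrative assumptions, not the book's own examples.

```python
# Minimal PySpark ML pipeline: assemble features, fit, and score.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

df = spark.createDataFrame(
    [(1.0, 3.5, 0), (2.0, 1.0, 1), (0.5, 4.2, 0)],
    ["feature_a", "feature_b", "label"],
)

assembler = VectorAssembler(inputCols=["feature_a", "feature_b"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("label", "prediction").show()
```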
Brought to you by:
• LaunchDarkly — a platform for high-velocity engineering teams to release, monitor, and optimize great software.
• Sevalla — Deploy anything from preview environments to Docker images.
• WorkOS — The modern identity platform for B2B SaaS.

On today’s episode of The Pragmatic Engineer, I’m joined by fellow Uber alum Sabin Roman, now the first Engineering Manager at Linear. Linear, known for its powerful project and issue-tracking system, streamlines workflows throughout the product development process. In our conversation today, Sabin and I compare building projects at Linear versus our experiences at Uber. He shares insights into Linear’s unique approaches, including:
• How Linear handles internal communications
• The “goalie” program to address customer concerns and Linear’s zero-bug policy
• How Linear keeps teams connected despite working entirely remotely
• An in-depth, step-by-step walkthrough of a project at Linear
• Linear’s focus on quality and creativity over fast shipping
• Titles at Linear, Sabin’s learnings from Uber, and much more!

Timestamps
(00:00) Intro
(01:41) Sabin’s background
(02:45) Why Linear rarely uses e-mail internally
(07:32) An overview of Linear's company profile
(08:03) Linear’s tech stack
(08:20) How Linear operated without product people
(09:40) How Linear stays close to customers
(11:27) The shortcomings of Support Engineers at Uber and why Linear’s “goalies” work better
(16:35) Focusing on bugs vs. new features
(18:55) Linear’s hiring process
(21:57) An overview of a typical call with a hiring manager at Linear
(24:13) The pros and cons of Linear’s remote work culture
(29:30) The challenge of managing teams remotely
(31:44) A step-by-step walkthrough of how Sabin built a project at Linear
(45:47) Why Linear’s unique working process works
(49:57) The Helix project at Uber and differences in operations working at a large company
(57:47) How senior engineers operate at Linear vs. at a large company
(1:01:30) Why Linear has no levels for engineers
(1:07:13) Less experienced engineers at Linear
(1:08:56) Sabin’s big learnings from Uber
(1:09:47) Rapid fire round

The Pragmatic Engineer deepdives relevant for this episode:
• The story of Linear, as told by its CTO
• An update on Linear, after their $35M fundraise
• Software engineers leading projects
• Netflix’s historic introduction of levels for software engineers

Where to find Sabin Roman:
• X: https://x.com/sabin_roman
• LinkedIn: https://www.linkedin.com/in/sabinroman/

Where to find Gergely:
• Newsletter: https://www.pragmaticengineer.com/
• YouTube: https://www.youtube.com/c/mrgergelyorosz
• LinkedIn: https://www.linkedin.com/in/gergelyorosz/
• X: https://x.com/GergelyOrosz

References and Transcripts: See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].
Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe
We talked about:
00:00 DataTalks.Club intro
02:34 Career journey and transition into MLOps
08:41 Dutch agriculture and its challenges
10:36 The concept of "technical debt" in MLOps
13:37 Trade-offs in MLOps: moving fast vs. doing things right
14:05 Building teams and the role of coordination in MLOps
16:58 Key roles in an MLOps team: evangelists and tech translators
23:01 Role of the MLOps team in an organization
25:19 How MLOps teams assist product teams
27:56 Standardizing practices in MLOps
32:46 Getting feedback and creating buy-in from data scientists
36:55 The importance of addressing pain points in MLOps
39:06 Best practices and tools for standardizing MLOps processes
42:31 Value of data versioning and reproducibility
44:22 When to start thinking about data versioning
45:10 Importance of data science experience for MLOps
46:06 Skill mix needed in MLOps teams
47:33 Building a diverse MLOps team
48:18 Best practices for implementing MLOps in new teams
49:52 Starting with CI/CD in MLOps
51:21 Key components for a complete MLOps setup
53:08 Role of package registries in MLOps
54:12 Using Docker vs. packages in MLOps
57:56 Examples of MLOps success and failure stories
1:00:54 What MLOps is in simple terms
1:01:58 The complexity of achieving easy deployment, monitoring, and maintenance
Join our Slack: https://datatalks.club/slack.html
At TIER Mobility, we successfully reduced our cloud expenses by over 60% in less than two years. While this was a significant achievement, the journey wasn’t without its challenges. In this presentation, I’ll share insights into the potential pitfalls of cost reduction strategies that might end up being more expensive in the long run.