Du chaos à la clarté : comment les data contracts transforment la gouvernance en croissance

2025-10-01 · Big Data & AI Paris 2025

Face To Face

by Jean-Georges Perrin (Actian)

AI/ML Data Contracts

Bien menée, la gouvernance est un moteur de croissance. Durant cette session, Jean-Georges Perrin montrera comment les data contracts apportent précision, confiance et responsabilité à vos pipelines données et IA, sans créer de goulots d'étranglement. En utilisant l'Open Data Contract Standard (ODCS) du projet Bitol de la Fondation Linux, vous découvrirez comment les organisations peuvent réduire les défauts en aval, accélérer l'intégration des modèles IA, réduire les risques de conformité et simplifier la gestion des incidents, souvent en quelques jours seulement.

From Chaos to Clarity: How Data Contracts Turn Governance into a Growth Engine

2025-09-25 · Big Data LDN 2025

Face To Face

by Jean-Georges Perrin (Actian)

AI/ML Data Contracts

When done right, governance is a growth engine. In this talk, Jean-Georges “jgp” Perrin will show how data contracts bring precision, trust, and accountability into your data and AI pipelines—without creating bottlenecks. Using the Open Data Contract Standard (ODCS) from the Linux Foundation’s Bitol project, you’ll see how organizations can cut downstream defects, accelerate AI model onboarding, lower compliance risk, and reduce firefighting—often in just days.

Workshop 2: Data Products

2025-09-23 · Data Driven LDN Conference

Face To Face

by Mandy Chessell (Pragmatic Data Research)

Airflow Dashboard Data Governance Kafka Superset

Following on from the Building consumable data products keynote, we will dive deeper into the interactions around the data product catalog, to show how the network effect of explicit data sharing relationships starts to pay dividends to the participants. Such as:

For the product consumer:

• Searching for products, understanding content, costs, terms and conditions, licenses, quality certifications etc

• Inspecting sample data, choosing preferred data format, setting up a secure subscription, and seeing data provisioned into a database from the product catalog.

• Providing feedback and requesting help

• Reviewing own active subscriptions

• Understanding the lineage behind each product along with outstanding exceptions and future plans

For the product manager/owner:

• Setting up a new product, creating a new release of an existing product and issuing a data correction/restatement

• Reviewing a product’s active subscriptions and feedback/requests from consumers

• Interacting with the technical teams on pipeline implementations along with issues and proposed enhancements

• For the data governance team

• Viewing the network of dependencies between data products (the data mesh) to understand the data value chains and risk concentrations

• Reviewing a dashboard of metrics around the data products including popularity, errors/exceptions, subscriptions, interaction

• Show traceability from a governance policy relating to, say data sovereignty or data privacy to the product implementations.

• Building trust profiles for producers and consumers

The aim of the demonstrations and discussions is to explore the principles and patterns relating to data products, rather than push a particular implementation approach.

Having said that, all of the software used in the demonstrations is open source. Principally this is Egeria, Open Lineage and Unity Catalog from the Linux Foundation, plus Apache Airflow, Apache Kafka and Apache SuperSet from the Apache Software Foundation.

Videos of the demonstrations will be available on YouTube after the conference and the complete demo software can be downloaded and run on a laptop so you can share your experiences with your teams after the event.

Docling: Get your documents ready for gen AI

2025-09-03 · PyData Berlin 2025 Watch

talk

by Michele Dolfi , Christoph Auer

AI/ML GenAI GitHub Python

Docling, an open source package, is rapidly becoming the de facto standard for document parsing and export in the Python community. Earning close to 30,000 GitHub in less than one year and now part of the Linux AI & Data Foundation. Docling is redefining document AI with its ease and speed of use. In this session, we’ll introduce Docling and its features, including usages with various generative AI frameworks and protocols (e.g. MCP).

The Infrastructure Revolution for AI Factories w/ David Flynn

2025-08-29 · Data Unchained

podcast_episode

by David Flynn (Hammerspace)

AI/ML

In this episode of Data Unchained,we sit down with David Flynn, Founder and CEO of Hammerspace, to explore how next-generation infrastructure is transforming the future of AI factories, hyperscaler data centers, and enterprise-scale AI deployments. From exabyte-in-a-rack architectures to parallel file systems native in Linux, this conversation reveals how organizations can drastically lower CapEx, OpEx, and power consumption while unlocking unprecedented performance density. Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US

AIInfrastructure #Hyperscalers #DataEngineering #EnterpriseAI #SoftwareArchitecture #ExabyteStorage #ParallelFileSystems #LinuxNative #DataCenters #AIatScale #OpenPlatformInitiative #globaldata

Hosted on Acast. See acast.com/privacy for more information.

Data Engineering for Cybersecurity

2025-08-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by James Bonifield

Ansible Cloud Computing Data Engineering ELK Git Kafka Logstash PowerShell Redis Cyber Security Data Streaming data +1 more

Security teams rely on telemetry—the continuous stream of logs, events, metrics, and signals that reveal what’s happening across systems, endpoints, and cloud services. But that data doesn’t organize itself. It has to be collected, normalized, enriched, and secured before it becomes useful. That’s where data engineering comes in. In this hands-on guide, cybersecurity engineer James Bonifield teaches you how to design and build scalable, secure data pipelines using free, open source tools such as Filebeat, Logstash, Redis, Kafka, and Elasticsearch and more. You’ll learn how to collect telemetry from Windows including Sysmon and PowerShell events, Linux files and syslog, and streaming data from network and security appliances. You’ll then transform it into structured formats, secure it in transit, and automate your deployments using Ansible. You’ll also learn how to: Encrypt and secure data in transit using TLS and SSH Centrally manage code and configuration files using Git Transform messy logs into structured events Enrich data with threat intelligence using Redis and Memcached Stream and centralize data at scale with Kafka Automate with Ansible for repeatable deployments Whether you’re building a pipeline on a tight budget or deploying an enterprise-scale system, this book shows you how to centralize your security data, support real-time detection, and lay the groundwork for incident response and long-term forensics.

Running SQL Server on Linux

2025-07-16 · SQL Server on Linux and Data QA Techniques

talk

SQL Server containerization

SQL Server has had the ability to run on Linux for a number of years now but what are the differences to running on Windows? What are the benefits of running SQL on Linux? Are there any? This session will show how to get started with SQL Server on Linux, what the main differences are, the advantages it brings, and the disadvantages that need to be contended with.

Topics covered will be: - How SQL Server on Linux works Installing SQL Server on a Linux distribution High availability options for SQL Server on Linux How to run SQL Server in a Linux container

Dive into Flytekit's Internals: A Python SDK to Quickly Bring your Code Into Production

2025-07-11 · SciPy 2025

talk

by Thomas J. Fan

AI/ML Python

Flyte is a Linux Foundation OSS orchestrator built for Data and Machine Learning workflows focused on scalability, reliability, and developer productivity. Flyte’s Python SDK, Flytekit, empowers developers by shipping their code from their local environments onto a cluster with one simple CLI command. In this talk, you will learn about the design and implementation details that powers Flytekit’s core features, such as “fast registration” and “type transformers”, and a plugin system that enables Dask, Ray, or distributed GPU workflows.

Reproducible Machine Learning Workflows for Scientists with pixi

2025-07-07 · SciPy 2025

talk

by John Kirkham , Ruben Arts , Matthew Feickert

AI/ML Python PyTorch

Scientific researchers need reproducible software environments for complex applications that can run across heterogeneous computing platforms. Modern open source tools, like pixi, provide automatic reproducibility solutions for all dependencies while providing a high level interface well suited for researchers.

This tutorial will provide a practical introduction to using pixi to easily create scientific and AI/ML environments that benefit from hardware acceleration, across multiple machines and platforms. The focus will be on applications using the PyTorch and JAX Python machine learning libraries with CUDA enabled, as well as deploying these environments to production settings in Linux container images.

Luca Borella — AI Program Manager, The Linux Foundation / FINOS

2025-07-02 · GenerationAI 2025 - Munich (Agentic AI, MCP and more)

talk

by Luca Borella (The Linux Foundation / FINOS)

AI/ML

Featured speaker at GenerationAI 2025

Open session to help folks convert their machines with support

2025-06-28 · [workshop] End of 10 - convert your windows 10 laptop to Linux!

workshop

Participatory session to assist attendees in converting Windows 10 machines to Linux with live support.

Opt Green - End of Windows 10

2025-06-26 · June Talk evening: KDE Opt Green - End of Windows 10

talk

by Nicole (Opt Green)

open source windows 10

Nicole discusses environmental impact of proprietary software, why a Linux operating system could be a solution and where you can get help if you decide to switch.

The present, past and future of GitHub

2025-06-18 · The Pragmatic Engineer Listen

podcast_episode

by Thomas Dohmke (GitHub) , Gergely Orosz

AI/ML Analytics Cloud Computing GitHub Marketing Microsoft Cyber Security

Supported by Our Partners •⁠ Statsig ⁠ — ⁠ The unified platform for flags, analytics, experiments, and more. • Graphite — The AI developer productivity platform. • Augment Code — AI coding assistant that pro engineering teams love — GitHub recently turned 17 years old—but how did it start, how has it evolved, and what does the future look like as AI reshapes developer workflows? In this episode of The Pragmatic Engineer, I’m joined by Thomas Dohmke, CEO of GitHub. Thomas has been a GitHub user for 16 years and an employee for 7. We talk about GitHub’s early architecture, its remote-first operating model, and how the company is navigating AI—from Copilot to agents. We also discuss why GitHub hires junior engineers, how the company handled product-market fit early on, and why being a beloved tool can make shipping harder at times. Other topics we discuss include: • How GitHub’s architecture evolved beyond its original Rails monolith • How GitHub runs as a remote-first company—and why they rarely use email • GitHub’s rigorous approach to security • Why GitHub hires junior engineers • GitHub’s acquisition by Microsoft • The launch of Copilot and how it’s reshaping software development • Why GitHub sees AI agents as tools, not a replacement for engineers • And much more! — Timestamps (00:00) Intro (02:25) GitHub’s modern tech stack (08:11) From cloud-first to hybrid: How GitHub handles infrastructure (13:08) How GitHub’s remote-first culture shapes its operations (18:00) Former and current internal tools including Haystack (21:12) GitHub’s approach to security (24:30) The current size of GitHub, including security and engineering teams (25:03) GitHub’s intern program, and why they are hiring junior engineers (28:27) Why AI isn’t a replacement for junior engineers (34:40) A mini-history of GitHub (39:10) Why GitHub hit product market fit so quickly (43:44) The invention of pull requests (44:50) How GitHub enables offline work (46:21) How monetization has changed at GitHub since the acquisition (48:00) 2014 desktop application releases (52:10) The Microsoft acquisition (1:01:57) Behind the scenes of GitHub’s quiet period (1:06:42) The release of Copilot and its impact (1:14:14) Why GitHub decided to open-source Copilot extensions (1:20:01) AI agents and the myth of disappearing engineering jobs (1:26:36) Closing — The Pragmatic Engineer deepdives relevant for this episode: • AI Engineering in the real world • The AI Engineering stack • How Linux is built with Greg Kroah-Hartman • Stacked Diffs (and why you should know about them) • 50 Years of Microsoft and developer tools — See the transcript and other references from the episode at ⁠⁠https://newsletter.pragmaticengineer.com/podcast⁠⁠ — Production and marketing by ⁠⁠⁠⁠⁠⁠⁠⁠https://penname.co/⁠⁠⁠⁠⁠⁠⁠⁠. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

50 Years of Microsoft and Developer Tools with Scott Guthrie

2025-06-04 · The Pragmatic Engineer Listen

podcast_episode

by Scott Guthrie (Microsoft) , Gergely Orosz

AI/ML Analytics Azure C#/.NET Cloud Computing GitHub Marketing Microsoft

Supported by Our Partners •⁠ Statsig ⁠ — ⁠ The unified platform for flags, analytics, experiments, and more. •⁠ Sinch⁠ — Connect with customers at every step of their journey. •⁠ Modal⁠ — The cloud platform for building AI applications. — How has Microsoft changed since its founding in 1975, especially in how it builds tools for developers? In this episode of The Pragmatic Engineer, I sit down with Scott Guthrie, Executive Vice President of Cloud and AI at Microsoft. Scott has been with the company for 28 years. He built the first prototype of ASP.NET, led the Windows Phone team, led up Azure, and helped shape many of Microsoft’s most important developer platforms. We talk about Microsoft’s journey from building early dev tools to becoming a top cloud provider—and how it actively worked to win back and grow its developer base. In this episode, we cover: • Microsoft’s early years building developer tools • Why Visual Basic faced resistance from devs back in the day: even though it simplified development at the time • How .NET helped bring a new generation of server-side developers into Microsoft’s ecosystem • Why Windows Phone didn’t succeed • The 90s Microsoft dev stack: docs, debuggers, and more • How Microsoft Azure went from being the #7 cloud provider to the #2 spot today • Why Microsoft created VS Code • How VS Code and open source led to the acquisition of GitHub • What Scott’s excited about in the future of developer tools and AI • And much more! — Timestamps (00:00) Intro (02:25) Microsoft’s early years building developer tools (06:15) How Microsoft’s developer tools helped Windows succeed (08:00) Microsoft’s first tools were built to allow less technically savvy people to build things (11:00) A case for embracing the technology that’s coming (14:11) Why Microsoft built Visual Studio and .NET (19:54) Steve Ballmer’s speech about .NET (22:04) The origins of C# and Anders Hejlsberg’s impact on Microsoft (25:29) The 90’s Microsoft stack, including documentation, debuggers, and more (30:17) How productivity has changed over the past 10 years (32:50) Why Gergely was a fan of Windows Phone—and Scott’s thoughts on why it didn’t last (36:43) Lessons from working on (and fixing) Azure under Satya Nadella (42:50) Codeplex and the acquisition of GitHub (48:52) 2014: Three bold projects to win the hearts of developers (55:40) What Scott’s excited about in new developer tools and cloud computing (59:50) Why Scott thinks AI will enhance productivity but create more engineering jobs — The Pragmatic Engineer deepdives relevant for this episode: • Microsoft is dogfooding AI dev tools’ future • Microsoft’s developer tools roots • Why are Cloud Development Environments spiking in popularity, now? • Engineering career paths at Big Tech and scaleups • How Linux is built with Greg Kroah-Hartman — See the transcript and other references from the episode at ⁠⁠https://newsletter.pragmaticengineer.com/podcast⁠⁠ — Production and marketing by ⁠⁠⁠⁠⁠⁠⁠⁠https://penname.co/⁠⁠⁠⁠⁠⁠⁠⁠. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

How Kubernetes is Built with Kat Cosgrove

2025-05-14 · The Pragmatic Engineer Listen

podcast_episode

by Gergely Orosz , Kat Cosgrove

AI/ML Cloud Computing GenAI Kubernetes Marketing SaaS

Supported by Our Partners •⁠ WorkOS — The modern identity platform for B2B SaaS. •⁠ Modal⁠ — The cloud platform for building AI applications. •⁠ Cortex⁠ — Your Portal to Engineering Excellence. — Kubernetes is the second-largest open-source project in the world. What does it actually do—and why is it so widely adopted? In this episode of The Pragmatic Engineer, I’m joined by Kat Cosgrove, who has led several Kubernetes releases. Kat has been contributing to Kubernetes for several years, and originally got involved with the project through K3s (the lightweight Kubernetes distribution). In our conversation, we discuss how Kubernetes is structured, how it scales, and how the project is managed to avoid contributor burnout. We also go deep into: • An overview of what Kubernetes is used for • A breakdown of Kubernetes architecture: components, pods, and kubelets • Why Google built Borg, and how it evolved into Kubernetes • The benefits of large-scale open source projects—for companies, contributors, and the broader ecosystem • The size and complexity of Kubernetes—and how it’s managed • How the project protects contributors with anti-burnout policies • The size and structure of the release team • What KEPs are and how they shape Kubernetes features • Kat’s views on GenAI, and why Kubernetes blocks using AI, at least for documentation • Where Kat would like to see AI tools improve developer workflows • Getting started as a contributor to Kubernetes—and the career and networking benefits that come with it • And much more! — Timestamps (00:00) Intro (02:02) An overview of Kubernetes and who it’s for (04:27) A quick glimpse at the architecture: Kubernetes components, pods, and cubelets (07:00) Containers vs. virtual machines (10:02) The origins of Kubernetes (12:30) Why Google built Borg, and why they made it an open source project (15:51) The benefits of open source projects (17:25) The size of Kubernetes (20:55) Cluster management solutions, including different Kubernetes services (21:48) Why people contribute to Kubernetes (25:47) The anti-burnout policies Kubernetes has in place (29:07) Why Kubernetes is so popular (33:34) Why documentation is a good place to get started contributing to an open-source project (35:15) The structure of the Kubernetes release team (40:55) How responsibilities shift as engineers grow into senior positions (44:37) Using a KEP to propose a new feature—and what’s next (48:20) Feature flags in Kubernetes (52:04) Why Kat thinks most GenAI tools are scams—and why Kubernetes blocks their use (55:04) The use cases Kat would like to have AI tools for (58:20) When to use Kubernetes (1:01:25) Getting started with Kubernetes (1:04:24) How contributing to an open source project is a good way to build your network (1:05:51) Rapid fire round — The Pragmatic Engineer deepdives relevant for this episode: •⁠ Backstage: an open source developer portal •⁠ How Linux is built with Greg Kroah-Hartman •⁠ Software engineers leading projects •⁠ What TPMs do and what software engineers can learn from them •⁠ Engineering career paths at Big Tech and scaleups — See the transcript and other references from the episode at ⁠⁠https://newsletter.pragmaticengineer.com/podcast⁠⁠ — Production and marketing by ⁠⁠⁠⁠⁠⁠⁠⁠https://penname.co/⁠⁠⁠⁠⁠⁠⁠⁠. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

The Philosophy of Software Design – with John Ousterhout

2025-04-09 · The Pragmatic Engineer Listen

podcast_episode

by Gergely Orosz , John Ousterhout (Stanford University)

AI/ML Cloud Computing LLM Marketing

Supported by Our Partners •⁠ CodeRabbit⁠⁠ — Cut code review time and bugs in half. Use the code PRAGMATIC to get one month free. •⁠ Modal⁠ — The cloud platform for building AI applications. — How will AI tools change software engineering? Tools like Cursor, Windsurf and Copilot are getting better at autocomplete, generating tests and documentation. But what is changing, when it comes to software design? Stanford professor John Ousterhout thinks not much. In fact, he believes that great software design is becoming even more important as AI tools become more capable in generating code. In this episode of The Pragmatic Engineer, John joins me to talk about why design still matters and how most teams struggle to get it right. We dive into his book A Philosophy of Software Design, unpack the difference between top-down and bottom-up approaches, and explore why some popular advice, like writing short methods or relying heavily on TDD, does not hold up, according to John. We also explore: • The differences between working in industry vs. academia • Why John believes software design will become more important as AI capabilities expand • The top-down and bottoms-up design approaches – and why you should use both • John’s “design it twice” principle • Why deep modules are essential for good software design • Best practices for special cases and exceptions • The undervalued trait of empathy in design thinking • Why John advocates for doing some design upfront • John’s criticisms of the single-responsibility principle, TDD, and why he’s a fan of well-written comments • And much more! As a fun fact: when we recorded this podcast, John was busy contributing to the Linux kernel: adding support to the Homa Transport Protocol – a protocol invented by one of his PhD students. John wanted to make this protocol available more widely, and is putting in the work to do so. What a legend! (We previously covered how Linux is built and how to contribute to the Linux kernel) — Timestamps (00:00) Intro (02:00) Why John transitioned back to academia (03:47) Working in academia vs. industry (07:20) Tactical tornadoes vs. 10x engineers (11:59) Long-term impact of AI-assisted coding (14:24) An overview of software design (15:28) Why TDD and Design Patterns are less popular now (17:04) Two general approaches to designing software (18:56) Two ways to deal with complexity (19:56) A case for not going with your first idea (23:24) How Uber used design docs (26:44) Deep modules vs. shallow modules (28:25) Best practices for error handling (33:31) The role of empathy in the design process (36:15) How John uses design reviews (38:10) The value of in-person planning and using old-school whiteboards (39:50) Leading a planning argument session and the places it works best (42:20) The value of doing some design upfront (46:12) Why John wrote A Philosophy of Software of Design (48:40) An overview of John’s class at Stanford (52:20) A tough learning from early in Gergely’s career (55:48) Why John disagrees with Robert Martin on short methods (1:10:40) John’s current coding project in the Linux Kernel (1:14:13) Updates to A Philosophy of Software Design in the second edition (1:19:12) Rapid fire round (1:01:08) John’s criticisms of TDD and what he favors instead (1:05:30) Why John supports the use of comments and how to use them correctly (1:09:20) How John uses ChatGPT to help explain code in the Linux Kernel — The Pragmatic Engineer deepdives relevant for this episode: • Engineering Planning with RFCs, Design Documents and ADRs • Paying down tech debt • Software architect archetypes • Building Bluesky: a distributed social network — See the transcript and other references from the episode at ⁠⁠https://newsletter.pragmaticengineer.com/podcast⁠⁠ — Production and marketing by ⁠⁠⁠⁠⁠⁠⁠⁠https://penname.co/⁠⁠⁠⁠⁠⁠⁠⁠. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Build Bigger With Small Ai: Running Small Models Locally

2025-03-22 · Small Data SF 2024 Watch

video

by Jeffrey Morgan (Ollama)

AI/ML Cloud Computing Data Engineering Docker DuckDB LLM RAG SQL

It's finally possible to bring the awesome power of Large Language Models (LLMs) to your laptop. This talk will explore how to run and leverage small, openly available LLMs to power common tasks involving data, including selecting the right models, practical use cases for running small models, and best practices for deploying small models effectively alongside databases.

Bio: Jeffrey Morgan is the founder of Ollama, an open-source tool to get up and run large language models. Prior to founding Ollama, Jeffrey founded Kitematic, which was acquired by Docker and evolved into Docker Desktop. He has previously worked at companies including Docker, Twitter, and Google.

➡️ Follow Us LinkedIn: https://www.linkedin.com/company/small-data-sf/ X/Twitter : https://twitter.com/smalldatasf Website: https://www.smalldatasf.com/

Discover how to run large language models (LLMs) locally using Ollama, the easiest way to get started with small AI models on your Mac, Windows, or Linux machine. Unlike massive cloud-based systems, small open source models are only a few gigabytes, allowing them to run incredibly fast on consumer hardware without network latency. This video explains why these local LLMs are not just scaled-down versions of larger models but powerful tools for developers, offering significant advantages in speed, data privacy, and cost-effectiveness by eliminating hidden cloud provider fees and risks.

Learn the most common use case for small models: combining them with your existing factual data to prevent hallucinations. We dive into retrieval augmented generation (RAG), a powerful technique where you augment a model's prompt with information from a local data source. See a practical demo of how to build a vector store from simple text files and connect it to a model like Gemma 2B, enabling you to query your own data using natural language for fast, accurate, and context-aware responses.

Explore the next frontier of local AI with small agents and tool calling, a new feature that empowers models to interact with external tools. This guide demonstrates how an LLM can autonomously decide to query a DuckDB database, write the correct SQL, and use the retrieved data to answer your questions. This advanced tutorial shows you how to connect small models directly to your data engineering workflows, moving beyond simple chat to create intelligent, data-driven applications.

Get started with practical applications for small models today, from building internal help desks to streamlining engineering tasks like code review. This video highlights how small and large models can work together effectively and shows that open source models are rapidly catching up to their cloud-scale counterparts. It's never been a better time for developers and data analysts to harness the power of local AI.

How Linux is built with Greg Kroah-Hartman

2025-03-19 · The Pragmatic Engineer Listen

podcast_episode

by Gergely Orosz , Greg Kroah-Hartman (Linux Foundation)

CI/CD IBM LLM Marketing Rust SaaS Cyber Security

Supported by Our Partners • WorkOS — The modern identity platform for B2B SaaS. • Vanta — Automate compliance and simplify security with Vanta. — Linux is the most widespread operating system, globally – but how is it built? Few people are better to answer this than Greg Kroah-Hartman: a Linux kernel maintainer for 25 years, and one of the 3 Linux Kernel Foundation Fellows (the other two are Linus Torvalds and Shuah Khan). Greg manages the Linux kernel’s stable releases, and is a maintainer of multiple kernel subsystems. We cover the inner workings of Linux kernel development, exploring everything from how changes get implemented to why its community-driven approach produces such reliable software. Greg shares insights about the kernel's unique trust model and makes a case for why engineers should contribute to open-source projects. We go into: • How widespread is Linux? • What is the Linux kernel responsible for – and why is it a monolith? • How does a kernel change get merged? A walkthrough • The 9-week development cycle for the Linux kernel • Testing the Linux kernel • Why is Linux so widespread? • The career benefits of open-source contribution • And much more! — Timestamps (00:00) Intro (02:23) How widespread is Linux? (06:00) The difference in complexity in different devices powered by Linux (09:20) What is the Linux kernel? (14:00) Why trust is so important with the Linux kernel development (16:02) A walk-through of a kernel change (23:20) How Linux kernel development cycles work (29:55) The testing process at Kernel and Kernel CI (31:55) A case for the open source development process (35:44) Linux kernel branches: Stable vs. development (38:32) Challenges of maintaining older Linux code (40:30) How Linux handles bug fixes (44:40) The range of work Linux kernel engineers do (48:33) Greg’s review process and its parallels with Uber’s RFC process (51:48) Linux kernel within companies like IBM (53:52) Why Linux is so widespread (56:50) How Linux Kernel Institute runs without product managers (1:02:01) The pros and cons of using Rust in Linux kernel (1:09:55) How LLMs are utilized in bug fixes and coding in Linux (1:12:13) The value of contributing to the Linux kernel or any open-source project (1:16:40) Rapid fire round — The Pragmatic Engineer deepdives relevant for this episode: What TPMs do and what software engineers can learn from them The past and future of modern backend practices Backstage: an open-source developer portal — See the transcript and other references from the episode at ⁠⁠https://newsletter.pragmaticengineer.com/podcast⁠⁠ — Production and marketing by ⁠⁠⁠⁠⁠⁠⁠⁠https://penname.co/⁠⁠⁠⁠⁠⁠⁠⁠. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

PostgreSQL Skills Development on Cloud: A Practical Guide to Database Management with AWS and Azure

2024-12-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Venkateswara Vadlamani

AWS Azure Cloud Computing Docker Redshift S3 data data-engineering postgresql relational-databases

This book provides a comprehensive approach to manage PostgreSQL cluster databases on Amazon Web Services and Azure Web Services on the cloud, as well as in Docker and container environments on a Red Hat operating system. Furthermore, detailed references for managing PostgreSQL on both Windows and Mac are provided. This book condenses all the fundamental and essential concepts you need to manage a PostgreSQL cluster into a one-stop guide that is perfect for newcomers to Postgres database administration. Each chapter of the book provides historical context and documents version changes of the PostgreSQL cluster, elucidates practical "how-to" methods, and includes illustrations and key word definitions, practices for application, a summary of key learnings, and questions to reinforce understanding. The book also outlines a clear study objective with a weekly learning schedule and hundreds of practice exercises, along with questions and answers. With its comprehensive and practical approach, this book will help you gain the confidence to manage all aspects of a PostgreSQL cluster in critical production environments so you can better support your organization's database infrastructure on the cloud and in containers. What You Will Learn Install and configure Postgres clusters on the cloud and in containers, monitor database logs, start and stop databases, troubleshoot, tune performance, backup and recover, and integrate with Amazon S3 and Azure Data Blob Manage Postgres databases on Amazon Web Services and Azure Web Services on the cloud, as well as in Docker and container environments on a Red Hat operating system Access sample references to scripting solutions and database management tools for working with Postgres, Redshift (based on Postgres 8.2), and Docker Create Amazon Machine Images (AMI) and Azure Images for managing a fleet of Postgres clusters on the cloud Reinforce knowledge with a weekly learning schedule and hundreds of practice exercises, along with questions and answers Progress from simple concepts, such as how to choose the correct instance type, to creating complex machine images Gain access to an Amazon AMI with a DBA admin tool, allowing you to learn Postgres, Redshift, and Docker in a cloud environment Refer to a comprehensive summary of documentations of Postgres, Amazon Web services, Azure Web services, and Red Hat Linux for managing all aspects of Postgres cluster management on the cloud Who This Book Is For Newcomers to PostgreSQL database administration and cross-platform support DBAs looking to master PostgreSQL on the cloud.

A Pragmatic Approach to Building Collaborative, Open Governance for Your Data and AI Ecosystem

2024-09-19 · Big Data LDN 2024

Face To Face

by Dan Wolfson, ACM DE , Mandy Chessell CBE FREng

AI/ML

Your world is filled with an ever-changing landscape of tools that create and use metadata. Each tool is useful, but independent, unable to share and link what it knows to information from other tools. The result is a disconnected story throughout your data and AI operations, making it hard to know where data came from, how it can and should be processed; leading to uncertainty in the trustworthiness of your AI results.

Using open source software from the Linux Foundation, (including Egeria, Open Lineage, Unity Catalog) we will share a simple approach to incrementally link and govern these tools to create end-to-end lineage, provenance and information sharing along your tool chains.

talk-data.com

Linux

Activity Trend

Top Events

Top Speakers

Du chaos à la clarté : comment les data contracts transforment la gouvernance en croissance

From Chaos to Clarity: How Data Contracts Turn Governance into a Growth Engine

Workshop 2: Data Products

Docling: Get your documents ready for gen AI

The Infrastructure Revolution for AI Factories w/ David Flynn

AIInfrastructure #Hyperscalers #DataEngineering #EnterpriseAI #SoftwareArchitecture #ExabyteStorage #ParallelFileSystems #LinuxNative #DataCenters #AIatScale #OpenPlatformInitiative #globaldata

Data Engineering for Cybersecurity

Running SQL Server on Linux

Dive into Flytekit's Internals: A Python SDK to Quickly Bring your Code Into Production

Reproducible Machine Learning Workflows for Scientists with pixi

Luca Borella — AI Program Manager, The Linux Foundation / FINOS

Open session to help folks convert their machines with support

Opt Green - End of Windows 10

The present, past and future of GitHub

50 Years of Microsoft and Developer Tools with Scott Guthrie

How Kubernetes is Built with Kat Cosgrove

The Philosophy of Software Design – with John Ousterhout

Build Bigger With Small Ai: Running Small Models Locally

How Linux is built with Greg Kroah-Hartman

PostgreSQL Skills Development on Cloud: A Practical Guide to Database Management with AWS and Azure

A Pragmatic Approach to Building Collaborative, Open Governance for Your Data and AI Ecosystem