Python

From Swift to Mojo and high-performance AI Engineering with Chris Lattner

2025-11-05 · The Pragmatic Engineer Listen

podcast_episode

by Chris Lattner , Gergely Orosz

AI/ML Analytics LLM Marketing Microsoft Rust TypeScript

Brought to You By: •⁠ Statsig ⁠ — ⁠ The unified platform for flags, analytics, experiments, and more. Companies like Graphite, Notion, and Brex rely on Statsig to measure the impact of the pace they ship. Get a 30-day enterprise trial here. •⁠ Linear – The system for modern product development. Linear is a heavy user of Swift: they just redesigned their native iOS app using their own take on Apple’s Liquid Glass design language. The new app is about speed and performance – just like Linear is. Check it out. — Chris Lattner is one of the most influential engineers of the past two decades. He created the LLVM compiler infrastructure and the Swift programming language – and Swift opened iOS development to a broader group of engineers. With Mojo, he’s now aiming to do the same for AI, by lowering the barrier to programming AI applications. I sat down with Chris in San Francisco, to talk language design, lessons on designing Swift and Mojo, and – of course! – compilers. It’s hard to find someone who is as enthusiastic and knowledgeable about compilers as Chris is! We also discussed why experts often resist change even when current tools slow them down, what he learned about AI and hardware from his time across both large and small engineering teams, and why compiler engineering remains one of the best ways to understand how software really works. — Timestamps (00:00) Intro (02:35) Compilers in the early 2000s (04:48) Why Chris built LLVM (08:24) GCC vs. LLVM (09:47) LLVM at Apple (19:25) How Chris got support to go open source at Apple (20:28) The story of Swift (24:32) The process for designing a language (31:00) Learnings from launching Swift (35:48) Swift Playgrounds: making coding accessible (40:23) What Swift solved and the technical debt it created (47:28) AI learnings from Google and Tesla (51:23) SiFive: learning about hardware engineering (52:24) Mojo’s origin story (57:15) Modular’s bet on a two-level stack (1:01:49) Compiler shortcomings (1:09:11) Getting started with Mojo (1:15:44) How big is Modular, as a company? (1:19:00) AI coding tools the Modular team uses (1:22:59) What kind of software engineers Modular hires (1:25:22) A programming language for LLMs? No thanks (1:29:06) Why you should study and understand compilers — The Pragmatic Engineer deepdives relevant for this episode: •⁠ AI Engineering in the real world • The AI Engineering stack • Uber's crazy YOLO app rewrite, from the front seat • Python, Go, Rust, TypeScript and AI with Armin Ronacher • Microsoft’s developer tools roots — Production and marketing by ⁠⁠⁠⁠⁠⁠⁠⁠https://penname.co/⁠⁠⁠⁠⁠⁠⁠⁠. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Keep it Simple and "Scalable": pythonic Extract, Load, Transform (ELT) using dltHub

2025-11-04 · Small Data SF 2025

workshop

by Elvis Kahoro (Chalk) , Brian Douglas (Continue) , Thierry Jean (dltHub)

AI/ML Data Quality ETL/ELT LLM

Get ready to ingest data and transform it into ready-to-use datasets using Python. We'll share a no-nonsense approach for developing and testing data connectors and transformations locally. Moving to production will be a matter of tweaking your configuration. In the end, you get a simple dataset interface to build dashboards & applications, train predictive models, or create agentic workflows. This workshop includes two guest speakers. Brian teach how to leverage AI IDEs, MCP servers and LLM scaffoldings to create ingestion pipelines. Elvis will show how to interactively define transformations and data quality checks.

Open Data Science Agent

2025-11-04 · Small Data SF 2025

workshop

by Zain Hasan (Together.AI)

AI/ML Data Science React

Learn to build an autonomous data science agent from scratch using open-source models and modern AI tools. This hands-on workshop will guide you through implementing a ReAct-based agent that can perform end-to-end data analysis tasks, from data cleaning to model training, using natural language reasoning and Python code generation. We'll explore the CodeAct framework, where the agent "thinks" through problems and then generates executable Python code as actions. You'll discover how to safely execute AI-generated code using Together Code Interpreter, creating a modular and maintainable system that can handle complex analytical workflows. Perfect for data scientists, ML engineers, and developers interested in agentic AI, this workshop combines practical implementation with best practices for building reasoning-driven AI assistants. By the end, you'll have a working data science agent and understand the fundamentals of agent architecture design. What you'll learn: ReAct framework implementation Safe code execution in AI systems Agent evaluation and optimization techniques Building transparent, "hackable" AI agents No advanced AI background required, just familiarity with Python and data science concepts.

Bridging the AI–Data Gap: Collect, Curate, Serve

2025-11-02 · Data Engineering Podcast Listen

podcast_episode

by Ido Bronstein (Upriver) , Omri Lifshitz (Upriver) , Tobias Macey

AI/ML Cloud Computing Data Engineering Data Management Data Quality Datafold dbt ETL/ELT LLM Prefect RAG SQL +1 more

Summary In this episode of the Data Engineering Podcast Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand sides of data, highlighting that the bottleneck lies in the "middle layer" of curation, semantics, and serving. Omri and Ido outline a three-part framework for making data usable by LLMs and agents: collect, curate, serve, and share challenges of scaling from POCs to production, including compounding error rates and reliability concerns. They also explore organizational shifts, patterns for managing context windows, pragmatic views on schema choices, and Upriver's approach to building autonomous data workflows using determinism and LLMs at the right boundaries. The conversation concludes with a look ahead to AI-first data platforms where engineers supervise business semantics while automation stitches technical details end-to-end.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Omri Lifshitz and Ido Bronstein about the challenges of keeping up with the demand for data when supporting AI systemsInterview IntroductionHow did you get involved in the area of data management?We're here to talk about "The Growing Gap Between Data & AI". From your perspective, what is this gap, and why do you think it's widening so rapidly right now?How does this gap relate to the founding story of Upriver? What problems were you and your co-founders experiencing that led you to build this?The core premise of new AI tools, from RAG pipelines to LLM agents, is that they are only as good as the data they're given. How does this "garbage in, garbage out" problem change when the "in" is not a static file but a complex, high-velocity, and constantly changing data pipeline?Upriver is described as an "intelligent agent system" and an "autonomous data engineer." This is a fascinating "AI to solve for AI" approach. Can you describe this agent-based architecture and how it specifically works to bridge that data-AI gap?Your website mentions a "Data Context Layer" that turns "tribal knowledge" into a "machine-usable mode." This sounds critical for AI. How do you capture that context, and how does it make data "AI-ready" in a way that a traditional data catalog or quality tool doesn't?What are the most innovative or unexpected ways you've seen companies trying to make their data "AI-ready"? And where are the biggest points of failure you observe?What has been the most challenging or unexpected lesson you've learned while building an AI system (Upriver) that is designed to fix the data foundation for other AI systems?When is an autonomous, agent-based approach not the right solution for a team's data quality problems? What organizational or technical maturity is required to even start closing this data-AI gap?What do you have planned for the future of Upriver? And looking more broadly, how do you see this gap between data and AI evolving over the next few years?Contact Info Ido - LinkedInOmri - LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links UpriverRAG == Retrieval Augmented GenerationAI Engineering Podcast EpisodeAI AgentContext WindowModel Finetuning)The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Emoji Master Challenge in Python

2025-11-01 · Python Emoji Master Challenge! [Ages 12-16] [EN/DE]

workshop

Hands-on Python workshop where students tackle emoji-based puzzles across levels, including Level 3 (display a rose emoji 10 times and practice keyboard shortcuts) and Level 5 (conceal a superhero with emojis and reveal at the end). Aimed at ages 12-18; provides an interactive Python introduction through emoji-based challenges.

Modeling cities with build123d library

2025-10-29 · PyData Trójmiasto #37

talk

by Marta Sienkiewicz (Hapag-Lloyd)

build123d openstreetmap

In her presentation she will show you a script in Python that combines some of her interests: urban planning and 3D modeling. You will learn how you can generate a 3D model of a city using Python, build123d library and OpenStreetMap data. If this is your first encounter with 3D modeling in Python, you may be surprised how simple the script actually is.

Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access

2025-10-27 · Data Engineering Podcast Listen

podcast_episode

by Matt Topper (UberEther) , Tobias Macey

AI/ML Cloud Computing Data Engineering Data Governance Data Management Data Quality Datafold dbt ETL/ELT JSON Prefect Cyber Security +2 more

Summary In this episode of the Data Engineering Podcast Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials, and access control in modern data platforms. With the shift to composable ecosystems, integration burdens have exploded, fracturing governance and auditability across warehouses, lakes, files, vector stores, and streaming systems. Matt shares practical solutions, including propagating user identity via JWTs, externalizing policy with engines like OPA/Rego and Cedar, and using database proxies for native row/column security. He also explores catalog-driven governance, lineage-based label propagation, and OpenTDF for binding policies to data objects. The conversation covers machine-to-machine access, short-lived credentials, workload identity, and constraining access by interface choke points, as well as lessons from Zanzibar-style policy models and the human side of enforcement. Matt emphasizes the need for trust composition - unifying provenance, policy, and identity context - to answer questions about data access, usage, and intent across the entire data path.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Matt Topper about the challenges of managing identity and access controls in the context of data systemsInterview IntroductionHow did you get involved in the area of data management?The data ecosystem is a uniquely challenging space for creating and enforcing technical controls for identity and access control. What are the key considerations for designing a strategy for addressing those challenges?For data acess the off-the-shelf options are typically on either extreme of too coarse or too granular in their capabilities. What do you see as the major factors that contribute to that situation?Data governance policies are often used as the primary means of identifying what data can be accesssed by whom, but translating that into enforceable constraints is often left as a secondary exercise. How can we as an industry make that a more manageable and sustainable practice?How can the audit trails that are generated by data systems be used to inform the technical controls for identity and access?How can the foundational technologies of our data platforms be improved to make identity and authz a more composable primitive?How does the introduction of streaming/real-time data ingest and delivery complicate the challenges of security controls?What are the most interesting, innovative, or unexpected ways that you have seen data teams address ICAM?What are the most interesting, unexpected, or challenging lessons that you have learned while working on ICAM?What are the aspects of ICAM in data systems that you are paying close attention to?What are your predictions for the industry adoption or enforcement of those controls?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links UberEtherJWT == JSON Web TokenOPA == Open Policy AgentRegoPingIdentityOktaMicrosoft EntraSAML == Security Assertion Markup LanguageOAuthOIDC == OpenID ConnectIDP == Identity ProviderKubernetesIstioAmazon CEDAR policy languageAWS IAMPII == Personally Identifiable InformationCISO == Chief Information Security OfficerOpenTDFOpenFGAGoogle ZanzibarRisk Management FrameworkModel Context ProtocolGoogle Data ProjectTPM == Trusted Platform ModulePKI == Public Key InfrastructurePassskeysDuckLakePodcast EpisodeAccumuloJDBCOpenBaoHashicorp VaultLDAPThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Python Masterclass: Emoji Master Challenge

2025-10-25 · Python Emoji Master Challenge! [Ages 12-16] [EN/DE]

workshop

emoji

Hands-on Python workshop for ages 12-18 featuring the Emoji Master Challenge. Includes a Python introduction, Level 3 emoji tasks (display a rose emoji 10 times), Level 5 emoji-based superhero concealment, and a final reveal of students' emoji-powered superhero projects.

From Biotechnology to Bioinformatics Software - Sebastian Ayala Ruano

2025-10-24 · DataTalks.Club Listen

podcast_episode

by Data Talks Club (DataTalks.Club) , Sebastian Ayala Ruano (Multiomics Network Analytics Group, DTU Biosustain)

AI/ML Analytics Data Science GitHub LLM

In this talk, Sebastian, a bioinformatics researcher and software engineer, shares his inspiring journey from wet lab biotechnology to computational bioinformatics. Hosted by Data Talks Club, this session explores how data science, AI, and open-source tools are transforming modern biological research — from DNA sequencing to metagenomics and protein structure prediction.

You’ll learn about: - The difference between wet lab and dry lab workflows in biotechnology - How bioinformatics enables faster insights through data-driven modeling - The MCW2 Graph Project and its role in studying wastewater microbiomes - Using co-abundance networks and the CC Lasso algorithm to map microbial interactions - How AlphaFold revolutionized protein structure prediction - Building scientific knowledge graphs to integrate biological metadata - Open-source tools like VueGen and VueCore for automating reports and visualizations - The growing impact of AI and large language models (LLMs) in research and documentation - Key differences between R (BioConductor) and Python ecosystems for bioinformatics

This talk is ideal for data scientists, bioinformaticians, biotech researchers, and AI enthusiasts who want to understand how data science, AI, and biology intersect. Whether you work in genomics, computational biology, or scientific software, you’ll gain insights into real-world tools and workflows shaping the future of bioinformatics.

Links: - MicW2Graph: https://zenodo.org/records/12507444 - VueGen: https://github.com/Multiomics-Analytics-Group/vuegen - Awesome-Bioinformatics: https://github.com/danielecook/Awesome-Bioinformatics

TIMECODES00:00 Sebastian’s Journey into Bioinformatics06:02 From Wet Lab to Computational Biology08:23 Wet Lab vs Dry Lab Explained12:35 Bioinformatics as Data Science for Biology15:30 How DNA Sequencing Works19:29 MCW2 Graph and Wastewater Microbiomes23:10 Building Microbial Networks with CC Lasso26:54 Protein–Ligand Simulation Basics29:58 Predicting Protein Folding in 3D33:30 AlphaFold Revolution in Protein Prediction36:45 Inside the MCW2 Knowledge Graph39:54 VueGen: Automating Scientific Reports43:56 VueCore: Visualizing OMIX Data47:50 Using AI and LLMs in Bioinformatics50:25 R vs Python in Bioinformatics Tools53:17 Closing Thoughts from Ecuador Connect with Sebastian Twitter - https://twitter.com/sayalaruanoLinkedin - https://linkedin.com/in/sayalaruano Github - https://github.com/sayalaruanoWebsite - https://sayalaruano.github.io/ Connect with DataTalks.Club: Join the community - https://datatalks.club/slack.htmlSubscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQCheck other upcoming events - https://lu.ma/dtc-eventsGitHub: https://github.com/DataTalksClubLinkedIn - https://www.linkedin.com/company/datatalks-club/Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/

Outperform Spark with Python Notebooks in Fabric

2025-10-22 · Outperform Spark with Python Notebooks in Fabric

talk

by Christian Henrik Reich (twoday Data & AI)

Data Lakehouse Microsoft Fabric Spark

Session: When Microsoft Fabric was released, it came with Apache Spark out of the box. Spark’s ability to work with more programming languages opened up possibilities for creating data-driven and automated lakehouses. With Python Notebooks, we have a better tool for handling metadata, automation, and processing of more trivial workloads, while still having the option to use Spark Notebooks for handling more demanding processing. We will cover: The difference between Python Notebooks and a Single Node Spark cluster, and why Spark Notebooks are more costly and less performant with certain types of workloads. When to use Python Notebooks and when to use Spark Notebooks. Where to use Python Notebooks in a meta-driven Lakehouse. A brief introduction to tooling and move workload between Python Notebooks and Spark Notebooks. How to avoid overload the Lakehouse tech stack with python technologies. Costs

Building a "Brain" for a Digital City with Python and LLMs

2025-10-21 · PyLadies Dublin x PyData Ireland @ Workday

talk

by Sukanya Mandal (DCU)

Pandas llms rdflib

This talk presents a formal methodology for constructing a Multi-Modal Knowledge Graph for a smart city, addressing data privacy and heterogeneity by using entirely synthetic data. We demonstrate a Python pipeline that leverages Large Language Models for text generation and knowledge extraction, Pandas for sensor data simulation, and rdflib for graph construction. The result is a robust, privacy-preserving foundation for a Cognitive Digital Twin, enabling advanced urban analytics.

Smarter spreadsheet workflows with Celbridge & Python

2025-10-21 · PyLadies Dublin x PyData Ireland @ Workday

talk

by Chris Gregan

celbridge excel

This talk will introduce Celbridge, an open-source Python editing tool with a built-in Excel spreadsheet editor, plus lots of great productivity features. Celbridge makes spreadsheet automation easy; move data between sheets, run complex calculations, and validate your data using simple Python scripts in a friendly editor. The talk will cover how to download, install and get started with Celbridge so you can try it out yourself.

182: This Data Analyst Has Analyzed 1M+ Songs (here’s everything he knows)

2025-10-21 · Data Career Podcast: Helping You Land a Data Analyst Job FAST Listen

podcast_episode

by Avery Smith , Chris Reba

AI/ML Analytics Data Analytics SQL

Help us become the #1 Data Podcast by leaving a rating & review! We are 67 reviews away! Data meets music 🎶 — Avery sits down with Chris Reba, a data analyst who’s studied over 1 million songs, to reveal what the numbers say about how hits are made. From uncovering Billboard chart fraud to exploring how TikTok reshaped music, this episode breaks down the art and science behind every beat. 💌 Join 30k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter 🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training 👩‍💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa 👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com//interviewsimulator ⌚ TIMESTAMPS 00:00 - Intro: How Chris analyzed 1M+ songs using data 01:10 - What data reveals about hit songs and music trends 03:30 - Combining qualitative and quantitative analysis 07:00 - The 1970s Billboard chart fraud explained 10:45 - Why key changes disappeared from modern pop 13:30 - How hip-hop changed song structure and sound 14:10 - TikTok’s influence on the music industry 16:10 - Inside Chris’s open-source music dataset 22:10 - Best tools for music data analysis (SQL, Python, Datawrapper) 27:45 - Advice for aspiring music data analysts 🔗 CONNECT WITH CHRIS 📕 Order Chris's Book: https://www.bloomsbury.com/us/uncharted-territory-9798765149911 📊 Check out Chris's Music Dataset: https://docs.google.com/spreadsheets/d/1j1AUgtMnjpFTz54UdXgCKZ1i4bNxFjf01ImJ-BqBEt0/edit?gid=1974823090#gid=1974823090 💌 Subscribe to Chris's' Newsletter: https://www.cantgetmuchhigher.com 📲 Follow Chris on TikTok: https://www.tiktok.com/@cdallarivamusic 🔗 CONNECT WITH AVERY 🎥 YouTube Channel 🤝 LinkedIn 📸 Instagram 🎵 TikTok 💻 Website Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://DataCareerJumpstart.com/daa https://www.datacareerjumpstart.com/daa

Python Software Design Workshop

2025-10-20 · PyData Norwich - October Meetup (Workshop)

workshop

dependency injection refactoring software design patterns testing

Date: 2025-10-20. As your Python programs grow and become more complex, they can quickly become hard to maintain, extend and reason about. Software design patterns help keep this complexity under control, making your code more understable and facilitating the scaling to larger applications. In this workshop, we will take some bad Python code and refactor it step by step. You will learn to spot primitive obsession, replace conditionals with the Strategy pattern, and use Dependency Injection to write testable code.

The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies

2025-10-18 · Data Engineering Podcast Listen

podcast_episode

by Kate Shaw (SnapLogic) , Tobias Macey

AI/ML Analytics Cloud Computing Data Engineering Data Management Data Quality Datafold ETL/ELT Prefect Cyber Security Data Streaming

Summary In this episode Kate Shaw, Senior Product Manager for Data and SLIM at SnapLogic, talks about the hidden and compounding costs of maintaining legacy systems—and practical strategies for modernization. She unpacks how “legacy” is less about age and more about when a system becomes a risk: blocking innovation, consuming excess IT time, and creating opportunity costs. Kate explores technical debt, vendor lock-in, lost context from employee turnover, and the slippery notion of “if it ain’t broke,” especially when data correctness and lineage are unclear. Shee digs into governance, observability, and data quality as foundations for trustworthy analytics and AI, and why exit strategies for system retirement should be planned from day one. The discussion covers composable architectures to avoid monoliths and big-bang migrations, how to bridge valuable systems into AI initiatives without lock-in, and why clear success criteria matter for AI projects. Kate shares lessons from the field on discovery, documentation gaps, parallel run strategies, and using integration as the connective tissue to unlock data for modern, cloud-native and AI-enabled use cases. She closes with guidance on planning migrations, defining measurable outcomes, ensuring lineage and compliance, and building for swap-ability so teams can evolve systems incrementally instead of living with a “bowl of spaghetti.”

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Your host is Tobias Macey and today I'm interviewing Kate Shaw about the true costs of maintaining legacy systemsInterview IntroductionHow did you get involved in the area of data management?What are your crtieria for when a given system or service transitions to being "legacy"?In order for any service to survive long enough to become "legacy" it must be serving its purpose and providing value. What are the common factors that prompt teams to deprecate or migrate systems?What are the sources of monetary cost related to maintaining legacy systems while they remain operational?Beyond monetary cost, economics also have a concept of "opportunity cost". What are some of the ways that manifests in data teams who are maintaining or migrating from legacy systems?How does that loss of productivity impact the broader organization?How does the process of migration contribute to issues around data accuracy, reliability, etc. as well as contributing to potential compromises of security and compliance?Once a system has been replaced, it needs to be retired. What are some of the costs associated with removing a system from service?What are the most interesting, innovative, or unexpected ways that you have seen teams address the costs of legacy systems and their retirement?What are the most interesting, unexpected, or challenging lessons that you have learned while working on legacy systems migration?When is deprecation/migration the wrong choice?How have evolutionary architecture patterns helped to mitigate the costs of system retirement?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links SnapLogicSLIM == SnapLogic Intelligent ModernizerOpportunity CostSunk Cost FallacyData GovernanceEvolutionary ArchitectureThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Emoji Master Challenge in Python Masterclass

2025-10-18 · Python Emoji Master Challenge! [Ages 12-16] [EN/DE]

workshop

emojis

Hands-on Python masterclass featuring the Emoji Master Challenge for ages 12-18. The session covers an emoji-based coding activity with levels (displaying emoji, concealing a superhero), introduction to Python, and a final reveal of the superheroes created by students. Date: 2025-10-18.

Forest Modeling and Ecosystem Services Evaluation Through a Data-Driven Python Workflow

2025-10-16 · [Online] Analyzing Geospatial Forest Data with Geopandas and Rasterio

webinar

by Laya Zeinali (Tarbiat Modares University)

Pandas gedi geopandas random forest rasterio sentinel-1 sentinel-2 shap srtm

Exploring how to model forest structure and quantify ecosystem services using Python and Earth observation data (GEDI, Sentinel-1, Sentinel-2, SRTM) with data preprocessing, machine learning (Random Forest), and SHAP interpretation to understand variable importance. Estimating canopy height and aboveground biomass and mapping forest ecosystem services for monitoring and climate research. Case study combining GEDI, Sentinel, and SRTM data.

Starship Enterprise: Functional Programming at Huge Companies

2025-10-15 · Scala Talks: Tour of error handling & Functional Programming at Huge Companies

talk

Java Scala

Practicing functional programming inside a Fortune 100 enterprise can feel like flying the Starship Enterprise through asteroid fields of legacy code and bureaucracy. This talk shares hard-earned lessons from the Information Engineering team at JPMorganChase, which runs a production Scala codebase powering a novel metadata platform. We'll explore the political, cultural, and technical friction of pushing functional programming in a Java and Python dominated environment. We'll introduce the domain we work in, the techniques that have worked (and those that haven't), the compromises we've made, and why - despite it all - we still think it's worth it. If you've never tried to run cats-effect in a place where Spring Boot is king, add this talk to your battle log.

Kill GIL: How Python 3.14 Changes Concurrent Programming

2025-10-15 · PyData Prague #31 - PyData Prague meets TechMeetup Ostrava

talk

cpython gil

Python 3.14 introduces the long-awaited ability to disable the Global Interpreter Lock (GIL). Although this feature is still experimental, it has the potential to fundamentally reshape concurrent programming in CPython. This talk will explore the implications of GIL removal, focusing on how it enhances parallelism, the performance improvements for multi-threaded applications, and the challenges developers may encounter as they adapt to this new paradigm.

Unleash the power of dbt on Google Cloud: BigQuery, Iceberg, DataFrames and beyond

2025-10-15 · dbt Coalesce 2025 Watch

talk

by Jobin George (Google Cloud) , Sandeep Karmarkar (Google Cloud)

AI/ML BigQuery Cloud Computing Data Science dbt GCP Iceberg

The data world has long been divided, with data engineers and data scientists working in silos. This fragmentation creates a long, difficult journey from raw data to machine learning models. We've unified these worlds through the Google Cloud and dbt partnership. In this session, we'll show you an end-to-end workflow that simplifies data to AI journey. The availability of dbt Cloud on Google Cloud Marketplace streamlines getting started, and its integration with BigQuery's new Apache Iceberg tables creates an open foundation. We'll also highlight how BigQuery DataFrames' integration with dbt Python models lets you perform complex data science at scale, all within a single, streamlined process. Join us to learn how to build a unified data and AI platform with dbt on Google Cloud.

talk-data.com

Activity Trend

Top Events

Top Speakers

From Swift to Mojo and high-performance AI Engineering with Chris Lattner

Keep it Simple and "Scalable": pythonic Extract, Load, Transform (ELT) using dltHub

Open Data Science Agent

Bridging the AI–Data Gap: Collect, Curate, Serve

Emoji Master Challenge in Python

Modeling cities with build123d library

Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access

Python Masterclass: Emoji Master Challenge

From Biotechnology to Bioinformatics Software - Sebastian Ayala Ruano

Outperform Spark with Python Notebooks in Fabric

Building a "Brain" for a Digital City with Python and LLMs

Smarter spreadsheet workflows with Celbridge & Python

182: This Data Analyst Has Analyzed 1M+ Songs (here’s everything he knows)

Python Software Design Workshop

The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies

Emoji Master Challenge in Python Masterclass

Forest Modeling and Ecosystem Services Evaluation Through a Data-Driven Python Workflow

Starship Enterprise: Functional Programming at Huge Companies

Kill GIL: How Python 3.14 Changes Concurrent Programming

Unleash the power of dbt on Google Cloud: BigQuery, Iceberg, DataFrames and beyond