Title & Speakers	Event
Data Engineering with Azure Databricks 2026-04-10 Xenia Ireton – author , Tonya Chernyshova – author , Dmitry Foshin – author , Dmitry Anoshin – author Master end-to-end data engineering on Azure Databricks. From data ingestion and Delta Lake to CI/CD and real-time streaming, build secure, scalable, and performant data solutions with Spark, Unity Catalog, and ML tools. Key Features Build scalable data pipelines using Apache Spark and Delta Lake Automate workflows and manage data governance with Unity Catalog Learn real-time processing and structured streaming with practical use cases Implement CI/CD, DevOps, and security for production-ready data solutions Explore Databricks-native ML, AutoML, and Generative AI integration Book Description "Data Engineering with Azure Databricks" is your essential guide to building scalable, secure, and high-performing data pipelines using the powerful Databricks platform on Azure. Designed for data engineers, architects, and developers, this book demystifies the complexities of Spark-based workloads, Delta Lake, Unity Catalog, and real-time data processing. Beginning with the foundational role of Azure Databricks in modern data engineering, you’ll explore how to set up robust environments, manage data ingestion with Auto Loader, optimize Spark performance, and orchestrate complex workflows using tools like Azure Data Factory and Airflow. The book offers deep dives into structured streaming, Delta Live Tables, and Delta Lake’s ACID features for data reliability and schema evolution. You’ll also learn how to manage security, compliance, and access controls using Unity Catalog, and gain insights into managing CI/CD pipelines with Azure DevOps and Terraform. With a special focus on machine learning and generative AI, the final chapters guide you in automating model workflows, leveraging MLflow, and fine-tuning large language models on Databricks. Whether you're building a modern data lakehouse or operationalizing analytics at scale, this book provides the tools and insights you need. What you will learn Set up a full-featured Azure Databricks environment Implement batch and streaming ingestion using Auto Loader Optimize Spark jobs with partitioning and caching Build real-time pipelines with structured streaming and DLT Manage data governance using Unity Catalog Orchestrate production workflows with jobs and ADF Apply CI/CD best practices with Azure DevOps and Git Secure data with RBAC, encryption, and compliance standards Use MLflow and Feature Store for ML pipelines Build generative AI applications in Databricks Who this book is for This book is for data engineers, solution architects, cloud professionals, and software engineers seeking to build robust and scalable data pipelines using Azure Databricks. Whether you're migrating legacy systems, implementing a modern lakehouse architecture, or optimizing data workflows for performance, this guide will help you leverage the full power of Databricks on Azure. A basic understanding of Python, Spark, and cloud infrastructure is recommended. data data-engineering apache-spark AI/ML Airflow Analytics Azure ADF Azure DevOps CI/CD Cloud Computing Data Engineering Data Governance Data Lakehouse Databricks Delta DevOps GenAI Git Python Cyber Security Spark Data Streaming Terraform	O'Reilly Data Engineering Books
Building LLM applications with Python 2026-01-05 · 18:00 Overview Students, developers, and anyone interested in getting started with theory and practice on building LLM-based applications with Python. Who is this for? Undeniably, large language models (LLMs) are at the centre of a modern gold-rush in technology. Students, developers, and anyone interested in getting started with theory and practice on building LLM-based applications with Python. Who is leading the session? The session is led by Dr. Stelios Sotiriadis, CEO of Warestack, Associate Professor and MSc Programme Director at Birkbeck, University of London. His expertise includes cloud computing, distributed systems, and AI engineering. Stelios holds a PhD from the University of Derby, completed a postdoctoral fellowship at the University of Toronto, and has worked with Huawei, IBM, Autodesk, and several startups. Since 2018 he has taught at Birkbeck and, in 2021, founded Warestack, building software for startups globally. What we’ll cover A practical introduction on the basics of local models and cloud APIs to build real software systems. You will learn: Introduction to natural language processing LLMs theory and intuition Agents are and how to build them Running local models with Ollama (free and offline) Calling local models using Python Building a ChatGPT-like chatbot with Python libraries Requirements A laptop with Python (Windows, macOS, or Linux) Visual Studio Code installed Python pip installed At least 10 GB free disk space At least 8 GB RAM This space is needed for running local models. You may also use the lab computers if your device doesn’t meet the requirements. Format A 1.5-hours live session including: Interactive theory Hands-on coding Step-by-step exercises The session will run in person, with streaming available for remote attendees. Prerequisites You should be comfortable writing Python scripts (basic to intermediate level).	Building LLM applications with Python
Hands-On LLM Engineering with Python (Part 1) 2025-12-18 · 18:00 REGISTER BELOW FOR MORE AVAILABLE DATES! ↓↓↓↓↓ https://luma.com/stelios ----------------------------------------------------------------------------------- Who is this for? Students, developers, and anyone interested in using Large Language Models (LLMs) to build real software solutions with Python. Tired of vibe coding with AI tools? Want to actually understand and own your code, instead of relying on black-box magic? This session shows you how to build LLM systems properly, with full control and clear engineering principles. Who is leading the session? The session is led by Dr. Stelios Sotiriadis, CEO of Warestack, Associate Professor and MSc Programme Director at Birkbeck, University of London, specialising in cloud computing, distributed systems, and AI engineering. Stelios holds a PhD from the University of Derby, completed a postdoctoral fellowship at the University of Toronto, and has worked on industry and research projects with Huawei, IBM, Autodesk, and multiple startups. Since moving to London in 2018, he has been teaching at Birkbeck. In 2021, he founded Warestack, building software for startups around the world. What we’ll cover? A hands-on introduction to building software with LLMs using Python, Ollama, and LiteLLM, including: How LLMs, embeddings, and agents work. Calling local models with Ollama or cloud models (OpenAI, Gemini and more). Using LiteLLM for custom prompts and tool-calling. Building simple agents from scratch. Introduction to RAG (Retrieval-Augmented Generation). Working with vector databases (ChromaDB) and vector similarity search library (FAISS). Storing, searching, and retrieving embeddings. Introduction to Streamlit for interactive data apps. End-to-end examples you can run on your own machine. This session focuses on theory, fundamentals and real code you can re-use. Why LiteLLM? LiteLLM gives you low-level control to build custom LLM solutions your own way, without a heavy framework like LangChain, so you understand how everything works and design your own architecture. A dedicated LangChain session will follow for those who want to go further. What are the requirements? Bring a laptop with Python installed (Windows, macOS, or Linux), along with Visual Studio Code or a similar IDE, with at least 10GB of free disk space and 8GB of RAM*. This space is needed for running local models during the workshop.* If you don’t have a suitable laptop, please contact Stelios ([email protected]) before registering. What is the format? A 3-hour live session with: Interactive theory blocks Hands-on coding Step-by-step exercises Small group support Three 10-minute breaks Q&A and class quizzes This is a highly practical, hands-on class focused on code and building working LLM systems. What are the prerequisites? A good understanding of programming with Python is required (basic to intermediate level). I assume you are already comfortable writing Python scripts. What comes after? Participants will receive an optional mini capstone project with one-to-one personalised feedback. Is it just one session? This is the first session in a new sequence on applied AI, covering agents, RAG systems, vector databases, and production-ready LLM workflows. Later sessions will dive deeper into topics such as embeddings with deep neural networks, LangChain, advanced retrieval, and multi-agent architectures. You can decide afterwards whether you’d like to join future sessions. How many participants? To keep this interactive, only 15 spots are available. Please register as soon as possible.	Hands-On LLM Engineering with Python (Part 1)
A Practical Starter's Guide to building LLM based projects \| Marcin S. \| DSC DACH 25 2025-12-10 · 15:28 In his tech tutorial, Marcin showed how to go beyond creating prompts for ChatGPT and build full applications leveraging generative AI. He covered the fundamentals of large language models (LLMs), introduced LangChain, and demonstrated techniques like question answering over documents and creating reasoning agents. The session also addressed advanced methods and practical challenges of deploying LLMs in production. By the end, participants with Python experience gained hands-on knowledge to develop GPT-driven applications while understanding potential pitfalls and limitations. This tutorial by Marcin Szymaniuk was held on October 14th at DSC DACH 25 in Vienna. Follow us on social media : LinkedIn: https://www.linkedin.com/company/11184830/admin/ Instagram: https://www.instagram.com/datasciconf/ Facebook page: https://www.facebook.com/DataSciConference Website: https://datasciconference.com/	DSC DACH 25 YouTube
Uncertainty-Guided AI Red Teaming: Efficient Vulnerability Discovery in LLMs 2025-12-10 · 14:45 Zvi Topol AI red teaming is crucial for identifying security and safety vulnerabilities (e.g., jailbreaks, prompt injection, harmful content generation) of Large Language Models. However, manual and brute-force adversarial testing is resource-intensive and often inefficiently consumes time and compute resources exploring low-risk regions of the input space. This talk introduces a practical, Python-based methodology for accelerating red teaming using model uncertainty quantification (UQ). AI/ML LLM Python Cyber Security	PyData Boston 2025
State, Scale, and Signals: Rethinking Orchestration with Durable Execution 2025-11-16 · 23:19 Preeti Somal – EVP of Engineering @ Temporal , Tobias Macey – host Summary In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model—workflows, activities, task queues, and replay—and how it eliminates hand‑rolled retry, checkpoint, and error‑handling scaffolding while letting data remain where it lives. Preeti shares real-world patterns for replacing DAG-first orchestration, integrating application and data teams through signals and Nexus for cross-boundary calls, and using Temporal to coordinate long-running, human-in-the-loop, and agentic AI workflows with full observability and auditability. Shee also discusses heuristics for choosing Temporal alongside (or instead of) traditional orchestrators, managing scale without moving large datasets, and lessons from running durable execution as a cloud service. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Preeti Somal about how to incorporate durable execution and state management into AI application architectures Interview IntroductionHow did you get involved in the area of data management?Can you describe what durable execution is and how it impacts system architecture?With the strong focus on state maintenance and high reliability, what are some of the most impactful ways that data teams are incorporating tools like Temporal into their work?One of the core primitives in Temporal is a "workflow". How does that compare to similar primitives in common data orchestration systems such as Airflow, Dagster, Prefect, etc.? What are the heuristics that you recommend when deciding which tool to use for a given task, particularly in data/pipeline oriented projects? Even if a team is using a more data-focused orchestration engine, what are some of the ways that Temporal can be applied to handle the processing logic of the actual data?AI applications are also very dependent on reliable data to be effective in production contexts. What are some of the design patterns where durable execution can be integrated into RAG/agent applications?What are some of the conceptual hurdles that teams experience when they are starting to adopt Temporal or other durable execution frameworks?What are the most interesting, innovative, or unexpected ways that you have seen Temporal/durable execution used for data/AI services?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Temporal?When is Temporal/durable execution the wrong choice?What do you have planned for the future of Temporal for data and AI systems? Contact Info LinkedIn Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. Links TemporalDurable ExecutionFlinkMachine Learning EpochSpark StreamingAirflowDirected Acyclic Graph (DAG)Temporal NexusTensorZeroAI Engineering Podcast Episode The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA AI/ML Airflow Cloud Computing Dagster Data Engineering Data Management Data Quality Datafold dbt ETL/ELT Prefect Python RAG SQL Data Streaming	Data Engineering Podcast Listen
From Biotechnology to Bioinformatics Software - Sebastian Ayala Ruano 2025-10-24 · 17:00 Data Talks Club – host @ DataTalks.Club , Sebastian Ayala Ruano – bioinformatics researcher and software engineer In this talk, Sebastian, a bioinformatics researcher and software engineer, shares his inspiring journey from wet lab biotechnology to computational bioinformatics. Hosted by Data Talks Club, this session explores how data science, AI, and open-source tools are transforming modern biological research — from DNA sequencing to metagenomics and protein structure prediction. You’ll learn about: - The difference between wet lab and dry lab workflows in biotechnology - How bioinformatics enables faster insights through data-driven modeling - The MCW2 Graph Project and its role in studying wastewater microbiomes - Using co-abundance networks and the CC Lasso algorithm to map microbial interactions - How AlphaFold revolutionized protein structure prediction - Building scientific knowledge graphs to integrate biological metadata - Open-source tools like VueGen and VueCore for automating reports and visualizations - The growing impact of AI and large language models (LLMs) in research and documentation - Key differences between R (BioConductor) and Python ecosystems for bioinformatics This talk is ideal for data scientists, bioinformaticians, biotech researchers, and AI enthusiasts who want to understand how data science, AI, and biology intersect. Whether you work in genomics, computational biology, or scientific software, you’ll gain insights into real-world tools and workflows shaping the future of bioinformatics. Links: - MicW2Graph: https://zenodo.org/records/12507444 - VueGen: https://github.com/Multiomics-Analytics-Group/vuegen - Awesome-Bioinformatics: https://github.com/danielecook/Awesome-Bioinformatics TIMECODES00:00 Sebastian’s Journey into Bioinformatics06:02 From Wet Lab to Computational Biology08:23 Wet Lab vs Dry Lab Explained12:35 Bioinformatics as Data Science for Biology15:30 How DNA Sequencing Works19:29 MCW2 Graph and Wastewater Microbiomes23:10 Building Microbial Networks with CC Lasso26:54 Protein–Ligand Simulation Basics29:58 Predicting Protein Folding in 3D33:30 AlphaFold Revolution in Protein Prediction36:45 Inside the MCW2 Knowledge Graph39:54 VueGen: Automating Scientific Reports43:56 VueCore: Visualizing OMIX Data47:50 Using AI and LLMs in Bioinformatics50:25 R vs Python in Bioinformatics Tools53:17 Closing Thoughts from Ecuador Connect with Sebastian Twitter - https://twitter.com/sayalaruanoLinkedin - https://linkedin.com/in/sayalaruano Github - https://github.com/sayalaruanoWebsite - https://sayalaruano.github.io/ Connect with DataTalks.Club: Join the community - https://datatalks.club/slack.htmlSubscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQCheck other upcoming events - https://lu.ma/dtc-eventsGitHub: https://github.com/DataTalksClubLinkedIn - https://www.linkedin.com/company/datatalks-club/Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/ AI/ML Analytics Data Science GitHub LLM Python	DataTalks.Club Listen
Python + AI: Large Language Models 2025-10-07 · 17:00 Join us for the first session in our Python + AI series! In this session, we'll talk about Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We'll use Python to interact with LLMs using popular packages like the OpenAI SDK and Langchain. We'll experiment with prompt engineering and few-shot examples to improve our outputs. We'll also show how to build a full stack app powered by LLMs, and explain the importance of concurrency and streaming for user-facing AI apps. This session is a part of a series! To learn more, click here Pre-requisites: If you'd like to follow along with the live examples, make sure you've got a GitHub account. Habla español? Tendremos una serie para hispanohablantes!	Python + AI: Large Language Models
Python + AI: Large Language Models 2025-10-07 · 17:00 Join us for the first session in our Python + AI series! In this session, we'll talk about Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We'll use Python to interact with LLMs using popular packages like the OpenAI SDK and Langchain. We'll experiment with prompt engineering and few-shot examples to improve our outputs. We'll also show how to build a full stack app powered by LLMs, and explain the importance of concurrency and streaming for user-facing AI apps. This session is a part of a series! To learn more, click here Pre-requisites: If you'd like to follow along with the live examples, make sure you've got a GitHub account. Habla español? Tendremos una serie para hispanohablantes!	Python + AI: Large Language Models
Python + AI: Large Language Models 2025-10-07 · 17:00 Join us for the first session in our Python + AI series! In this session, we'll talk about Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We'll use Python to interact with LLMs using popular packages like the OpenAI SDK and Langchain. We'll experiment with prompt engineering and few-shot examples to improve our outputs. We'll also show how to build a full stack app powered by LLMs, and explain the importance of concurrency and streaming for user-facing AI apps. This session is a part of a series! To learn more, click here Pre-requisites: If you'd like to follow along with the live examples, make sure you've got a GitHub account. Habla español? Tendremos una serie para hispanohablantes!	Python + AI: Large Language Models
Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK" 2025-10-01 · 16:00 Pre-registration is REQUIRED. Add to your calendar - https://hubs.li/Q03HQmw30 Speaker: Amit Maraj PhD, Senior AI Developer Relations Engineer at Google * In the rapidly evolving world of generative AI, standalone large language models (LLMs) are powerful, but their true potential is unlocked when they can interact with the outside world. This is where agents and tools come in. An agent acts as an intelligent orchestrator, leveraging tools to perform goal-oriented operations that go beyond simple text generation—like looking up real-time data, interacting with APIs, or managing files. Join us for this 1-hour webinar where you'll learn how to build and deploy your very first AI agent with tools using the Agent Development Kit (ADK), an open-source, code-first Python toolkit from Google. We will demystify the core concepts of agents and tools, and guide you through a practical, step-by-step process to create a functional agent that can access and use external data. Who Should Attend: This webinar is for developers, data scientists, and anyone interested in moving from simple AI prototypes to building intelligent, autonomous applications. A basic understanding of Python is recommended. ODSC Links:** • Get free access to more talks/trainings like this at Ai+ Training platform: https://hubs.li/H0Zycsf0 • ODSC blog: https://opendatascience.com/ • Facebook: https://www.facebook.com/OPENDATASCI • Twitter: https://twitter.com/_ODSC & @odsc • LinkedIn: https://www.linkedin.com/company/open-data-science • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.com/code-of-conduct/	Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK"
Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK" 2025-10-01 · 16:00 Pre-registration is REQUIRED. Add to your calendar - https://hubs.li/Q03HQmw30 Speaker: Amit Maraj PhD, Senior AI Developer Relations Engineer at Google * In the rapidly evolving world of generative AI, standalone large language models (LLMs) are powerful, but their true potential is unlocked when they can interact with the outside world. This is where agents and tools come in. An agent acts as an intelligent orchestrator, leveraging tools to perform goal-oriented operations that go beyond simple text generation—like looking up real-time data, interacting with APIs, or managing files. Join us for this 1-hour webinar where you'll learn how to build and deploy your very first AI agent with tools using the Agent Development Kit (ADK), an open-source, code-first Python toolkit from Google. We will demystify the core concepts of agents and tools, and guide you through a practical, step-by-step process to create a functional agent that can access and use external data. Who Should Attend: This webinar is for developers, data scientists, and anyone interested in moving from simple AI prototypes to building intelligent, autonomous applications. A basic understanding of Python is recommended. ODSC Links:** • Get free access to more talks/trainings like this at Ai+ Training platform: https://hubs.li/H0Zycsf0 • ODSC blog: https://opendatascience.com/ • Facebook: https://www.facebook.com/OPENDATASCI • Twitter: https://twitter.com/_ODSC & @odsc • LinkedIn: https://www.linkedin.com/company/open-data-science • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.com/code-of-conduct/	Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK"
Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK" 2025-10-01 · 16:00 Pre-registration is REQUIRED. Add to your calendar - https://hubs.li/Q03HQmw30 Speaker: Amit Maraj PhD, Senior AI Developer Relations Engineer at Google * In the rapidly evolving world of generative AI, standalone large language models (LLMs) are powerful, but their true potential is unlocked when they can interact with the outside world. This is where agents and tools come in. An agent acts as an intelligent orchestrator, leveraging tools to perform goal-oriented operations that go beyond simple text generation—like looking up real-time data, interacting with APIs, or managing files. Join us for this 1-hour webinar where you'll learn how to build and deploy your very first AI agent with tools using the Agent Development Kit (ADK), an open-source, code-first Python toolkit from Google. We will demystify the core concepts of agents and tools, and guide you through a practical, step-by-step process to create a functional agent that can access and use external data. Who Should Attend: This webinar is for developers, data scientists, and anyone interested in moving from simple AI prototypes to building intelligent, autonomous applications. A basic understanding of Python is recommended. ODSC Links:** • Get free access to more talks/trainings like this at Ai+ Training platform: https://hubs.li/H0Zycsf0 • ODSC blog: https://opendatascience.com/ • Facebook: https://www.facebook.com/OPENDATASCI • Twitter: https://twitter.com/_ODSC & @odsc • LinkedIn: https://www.linkedin.com/company/open-data-science • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.com/code-of-conduct/	Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK"
Webinar "Building and Deploying your First Agent with Tools on ADK" 2025-10-01 · 16:00 Pre-registration is REQUIRED. Add to your calendar - https://hubs.li/Q03HQmw30 Speaker: Amit Maraj PhD, Senior AI Developer Relations Engineer at Google * In the rapidly evolving world of generative AI, standalone large language models (LLMs) are powerful, but their true potential is unlocked when they can interact with the outside world. This is where agents and tools come in. An agent acts as an intelligent orchestrator, leveraging tools to perform goal-oriented operations that go beyond simple text generation—like looking up real-time data, interacting with APIs, or managing files. Join us for this 1-hour webinar where you'll learn how to build and deploy your very first AI agent with tools using the Agent Development Kit (ADK), an open-source, code-first Python toolkit from Google. We will demystify the core concepts of agents and tools, and guide you through a practical, step-by-step process to create a functional agent that can access and use external data. Who Should Attend: This webinar is for developers, data scientists, and anyone interested in moving from simple AI prototypes to building intelligent, autonomous applications. A basic understanding of Python is recommended. ODSC Links:** • Get free access to more talks/trainings like this at Ai+ Training platform: https://hubs.li/H0Zycsf0 • ODSC blog: https://opendatascience.com/ • Facebook: https://www.facebook.com/OPENDATASCI • Twitter: https://twitter.com/_ODSC & @odsc • LinkedIn: https://www.linkedin.com/company/open-data-science • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.com/code-of-conduct/	Webinar "Building and Deploying your First Agent with Tools on ADK"
ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences 2025-09-30 · 13:15 Emilien SCHULTZ , Paul Girard , Julien Boelaert The exponential growth of textual data—ranging from social media posts and digital news archives to speech-to-text transcripts—has opened new frontiers for research in the social sciences. Tasks such as stance detection, topic classification, and information extraction have become increasingly common. At the same time, the rapid evolution of Natural Language Processing, especially pretrained language models and generative AI, has largely been led by the computer science community, often leaving a gap in accessibility for social scientists. To address this, we initiated since 2023 the development of ActiveTigger, a lightweight, open-source Python application (with a web frontend in React) designed to accelerate annotation process and manage large-scale datasets through the integration of fine-tuned models. It aims to support computational social science for a large public both within and outside social sciences. Already used by a dynamic community in social sciences, the stable version is planned for early June 2025. From a more technical prospect, the API is designed to manage the complete workflow from project creation, embeddings computation, exploration of the text corpus, human annotation with active learning, fine-tuning of pre-trained models (BERT-like), prediction on a larger corpus, and export. It also integrates LLM-as-a-service capabilities for prompt-based annotation and information extraction, offering a flexible approach for hybrid manual/automatic labeling. Accessible both with a web frontend and a Python client, ActiveTigger encourages customization and adaptation to specific research contexts and practices. In this talk, we will delve into the motivations behind the creation of ActiveTigger, outline its technical architecture, and walk through its core functionalities. Drawing on several ongoing research projects within the Computational Social Science (CSS) group at CREST, we will illustrate concrete use cases where ActiveTigger has accelerated data annotation, enabled scalable workflows, and fostered collaborations. Beyond the technical demonstration, the talk will also open a broader reflection on the challenges and opportunities brought by generative AI in academic research—especially in terms of reliability, transparency, and methodological adaptation for qualitative and quantitative inquiries. The repository of the project : https://github.com/emilienschultz/activetigger/ The development of this software is funded by the DRARI Ile-de-France and supported by Progédo. AI/ML API Computer Science GenAI GitHub LLM NLP Python React	PyData Paris 2025 Video
Deep Learning with Python, Third Edition 2025-09-24 Matthew Watson – author , Francois Chollet – author The bestselling book on Python deep learning, now covering generative AI, Keras 3, PyTorch, and JAX! Deep Learning with Python, Third Edition puts the power of deep learning in your hands. This new edition includes the latest Keras and TensorFlow features, generative AI models, and added coverage of PyTorch and JAX. Learn directly from the creator of Keras and step confidently into the world of deep learning with Python. In Deep Learning with Python, Third Edition you’ll discover: Deep learning from first principles The latest features of Keras 3 A primer on JAX, PyTorch, and TensorFlow Image classification and image segmentation Time series forecasting Large Language models Text classification and machine translation Text and image generation—build your own GPT and diffusion models! Scaling and tuning models With over 100,000 copies sold, Deep Learning with Python makes it possible for developers, data scientists, and machine learning enthusiasts to put deep learning into action. In this expanded and updated third edition, Keras creator François Chollet offers insights for both novice and experienced machine learning practitioners. You'll master state-of-the-art deep learning tools and techniques, from the latest features of Keras 3 to building AI models that can generate text and images. About the Technology In less than a decade, deep learning has changed the world—twice. First, Python-based libraries like Keras, TensorFlow, and PyTorch elevated neural networks from lab experiments to high-performance production systems deployed at scale. And now, through Large Language Models and other generative AI tools, deep learning is again transforming business and society. In this new edition, Keras creator François Chollet invites you into this amazing subject in the fluid, mentoring style of a true insider. About the Book Deep Learning with Python, Third Edition makes the concepts behind deep learning and generative AI understandable and approachable. This complete rewrite of the bestselling original includes fresh chapters on transformers, building your own GPT-like LLM, and generating images with diffusion models. Each chapter introduces practical projects and code examples that build your understanding of deep learning, layer by layer. What's Inside Hands-on, code-first learning Comprehensive, from basics to generative AI Intuitive and easy math explanations Examples in Keras, PyTorch, JAX, and TensorFlow About the Reader For readers with intermediate Python skills. No previous experience with machine learning or linear algebra required. About the Authors François Chollet is the co-founder of Ndea and the creator of Keras. Matthew Watson is a software engineer at Google working on Gemini and a core maintainer of Keras. Quotes Perfect for anyone interested in learning by doing from one of the industry greats. - Anthony Goldbloom, Founder of Kaggle A sharp, deeply practical guide that teaches you how to think from first principles to build models that actually work. - Santiago Valdarrama, Founder of ml.school The most up-to-date and complete guide to deep learning you’ll find today! - Aran Komatsuzaki, EleutherAI Masterfully conveys the true essence of neural networks. A rare case in recent years of outstanding technical writing. - Salvatore Sanfilippo, Creator of Redis data ai-ml machine-learning deep-learning AI/ML GenAI Keras LLM Python PyTorch Redis TensorFlow	O'Reilly AI & ML Books
Taiob Ali - Leveraging Azure AI and Python for Data-Driven Decision Making 2025-09-10 · 22:00 In this technical talk, we will explore how to harness the power of Azure AI, Azure AI Studio, Azure Search Services, and large language models to extract valuable decision-making data from the Azure SQL Database. We will begin by discussing Azure AI and its capabilities. Starting with a clean slate, build a solution using Azure AI Studio and its user-friendly interface that can chat with an SQL database, helping make data-driven decisions without writing code. This solution will delve into Azure Search Services, highlighting how it can be used to efficiently index and query data. The second part of the presentation will focus on utilizing large language models and Python notebooks to extract and analyze data from the Azure SQL Database. Attendees will learn how to set up their environment, connect to the database, and implement AI-driven solutions (talk to the database). By the end of the session, participants will have a solid foundation in using Azure AI and Python for data-driven decision-making, empowering them to leverage these tools in their projects.	Taiob Ali - Leveraging Azure AI and Python for Data-Driven Decision Making
Duck Lake: Simplifying the Lakehouse Ecosystem 2025-09-10 · 01:03 Hannes Mühleisen – co-creator and CEO @ DuckDB Labs , Mark Raasveldt – Co-creator @ DuckDB , Tobias Macey – host Summary In this episode of the Data Engineering Podcast Hannes Mühleisen and Mark Raasveldt, the creators of DuckDB, share their work on Duck Lake, a new entrant in the open lakehouse ecosystem. They discuss how Duck Lake, is focused on simplicity, flexibility, and offers a unified catalog and table format compared to other lakehouse formats like Iceberg and Delta. Hannes and Mark share insights into how Duck Lake revolutionizes data architecture by enabling local-first data processing, simplifying deployment of lakehouse solutions, and offering benefits such as encryption features, data inlining, and integration with existing ecosystems. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey and today I'm interviewing Hannes Mühleisen and Mark Raasveldt about DuckLake, the latest entrant into the open lakehouse ecosystemInterview IntroductionHow did you get involved in the area of data management?Can you describe what DuckLake is and the story behind it?What are the particular problems that DuckLake is solving for?How does this compare to the capabilities of MotherDuck?Iceberg and Delta already have a well established ecosystem, but so does DuckDB. Who are the primary personas that you are trying to focus on in these early days of DuckLake?One of the major factors driving the adoption of formats like Iceberg is cost efficiency for large volumes of data. That brings with it challenges of large batch processing of data. How does DuckLake account for these axes of scale?There is also a substantial investment in the ecosystem of technologies that support Iceberg. The most notable ecosystem challenge for DuckDB and DuckLake is in the query layer. How are you thinking about the evolution and growth of that capability beyond DuckDB (e.g. support in Trino/Spark/Flink)?What are your opinions on the viability of a future where DuckLake and Iceberg become a unified standard and implementation? (why can't Iceberg REST catalog implementations just use DuckLake under the hood?)Digging into the specifics of the specification and implementation, what are some of the capabilities that it offers above and beyond Iceberg?Is it now possible to enforce PK/FK constraints, indexing on underlying data?Given that DuckDB has a vector type, how do you think about the support for vector storage/indexing?How do the capabilities of DuckLake and the integration with DuckDB change the ways that data teams design their data architecture and access patterns?What are your thoughts on the impact of "data gravity" in today's data ecosystem, with engines like DuckDB, KuzuDB, LanceDB, etc. available for embedded and edge use cases?What are the most interesting, innovative, or unexpected ways that you have seen DuckLake used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on DuckLake?When is DuckLake the wrong choice?What do you have planned for the future of DuckLake?Contact Info HannesWebsiteMarkWebsiteParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links DuckDBPodcast EpisodeDuckLakeDuckDB LabsMySQLCWIMonetDBIcebergIceberg REST CatalogDeltaHudiLanceDuckDB Iceberg ConnectorACID == Atomicity, Consistency, Isolation, DurabilityMotherDuckMotherDuck Managed DuckLakeTrinoSparkPrestoSpark DuckLake DemoDelta KernelArrowdltS3 TablesAttribute Based Access Control (ABAC)ParquetArrow FlightHadoopHDFSDuckLake RoadmapThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA AI/ML Flink Data Engineering Data Lakehouse Data Management Datafold Delta DuckDB ETL/ELT Iceberg Lance Motherduck Prefect Python Spark Data Streaming Trino	Data Engineering Podcast Listen
Virtual 6-Week AI Bootcamp 2025 2025-09-09 · 16:00 This is PAID event. REGISTER HERE - https://lu.ma/psj48l0t As a gesture of our appreciation for being part of ODSC Community, we are offering a 20% discount on the 6-week Bootcamp pass. Apply code - `COMMUNITY-20`- to save more. Level Up Your AI Skills This Fall! 🚀 Join us for an intensive 6-week virtual AI Bootcamp, a fantastic prelude to the renowned ODSC AI West Bootcamp in October! This isn't just any bootcamp; it's your chance to build a strong foundation in AI from the comfort of your home, all before experiencing the full, immersive 4-day event in person. 🚀And here's a pro-tip: If you secure a pass for the ODSC AI West Bootcamp, you'll gain free access to this 6-week virtual training. It's the perfect way to maximize your learning experience and hit the ground running at ODSC West. 🚀 Curious to learn more? Head over to https://odsc.ai/west/bootcamp/ for all the details. Don't forget to use code COMMUNITYWest2025 at checkout for an extra discount! If you choose to enroll in the 6-Week Virtual AI Bootcamp as a standalone event: Over 6 weeks, gain a comprehensive understanding of AI, from foundational concepts in coding and machine learning to LLMs, AI Agents & RAG Here are the list of sessions you will be attending: AI & Machine Learning Modeling - September 9th, 2025 Vibe Coding with AI (NEW) - September 11th, 2025 Machine Learning Data Prep with Python - September 16th, 2025 Introduction to Machine Learning - September 18th, 2025 Large Language Models & Fine-Tuning - September 25th, 2025 Introduction to RAG - October 2nd, 2025 Introduction AI Agents - October 9th, 2025 Build and Launch Your AI Project - October 16th, 2025 Regardless of your current skill level, this AI bootcamp will help transform you from AI novice to confident practitioner 🚀 If you’ve missed any, don’t worry—you still have access to the 13 sessions that passed including our On-Demand courses: * Data & Generative AI Literacy .Data Wrangling With SQL Linear Algebra * Statistics and Hypothesis Testing * Introduction to Math for Data Science How it works 🌍 : Each course is 2.5 hours long and includes extra materials The primer series is taught live and then available on demand. If you miss the live course, each session is available on-demand as soon as you register. Each course includes exercises to improve learning outcomes. Coding expercises allow you to learn hands-on skills. Learn at your own pace. Courses can be taken alongside additional Ai+ courses - aiplus.training Your Instructor🚀: Sheamus McGovern, Founder and Engineer \\| ODSC AI Sheamus McGovern is the founder of ODSC AI (The Open Data Science Conference). He is also a software architect, data engineer, and AI expert. He started his career in finance by building stock and bond trading systems and risk assessment platforms and has worked for numerous financial institutions and quant hedge funds. Over the last decade, Sheamus has consulted with dozens of companies and startups to build leading-edge data-driven applications in finance, healthcare, eCommerce, and venture capital. He holds degrees from Northeastern University, Boston University, Harvard University, and a CQF in Quantitative Finance. Some useful links: • Get free access to more talks/trainings like this at Ai+ Training platform: https://aiplus.training/ • ODSC blog: https://opendatascience.com/ • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.ai/code-of-conduct/	Virtual 6-Week AI Bootcamp 2025
Virtual 6-Week AI Bootcamp 2025 2025-09-09 · 16:00 This is PAID event. REGISTER HERE - https://lu.ma/psj48l0t As a gesture of our appreciation for being part of ODSC Community, we are offering a 20% discount on the 6-week Bootcamp pass. Apply code - `COMMUNITY-20`- to save more. Level Up Your AI Skills This Fall! 🚀 Join us for an intensive 6-week virtual AI Bootcamp, a fantastic prelude to the renowned ODSC AI West Bootcamp in October! This isn't just any bootcamp; it's your chance to build a strong foundation in AI from the comfort of your home, all before experiencing the full, immersive 4-day event in person. 🚀And here's a pro-tip: If you secure a pass for the ODSC AI West Bootcamp, you'll gain free access to this 6-week virtual training. It's the perfect way to maximize your learning experience and hit the ground running at ODSC West. 🚀 Curious to learn more? Head over to https://odsc.ai/west/bootcamp/ for all the details. Don't forget to use code COMMUNITYWest2025 at checkout for an extra discount! If you choose to enroll in the 6-Week Virtual AI Bootcamp as a standalone event: Over 6 weeks, gain a comprehensive understanding of AI, from foundational concepts in coding and machine learning to LLMs, AI Agents & RAG Here are the list of sessions you will be attending: AI & Machine Learning Modeling - September 9th, 2025 Vibe Coding with AI (NEW) - September 11th, 2025 Machine Learning Data Prep with Python - September 16th, 2025 Introduction to Machine Learning - September 18th, 2025 Large Language Models & Fine-Tuning - September 25th, 2025 Introduction to RAG - October 2nd, 2025 Introduction AI Agents - October 9th, 2025 Build and Launch Your AI Project - October 16th, 2025 Regardless of your current skill level, this AI bootcamp will help transform you from AI novice to confident practitioner 🚀 If you’ve missed any, don’t worry—you still have access to the 13 sessions that passed including our On-Demand courses: * Data & Generative AI Literacy .Data Wrangling With SQL Linear Algebra * Statistics and Hypothesis Testing * Introduction to Math for Data Science How it works 🌍 : Each course is 2.5 hours long and includes extra materials The primer series is taught live and then available on demand. If you miss the live course, each session is available on-demand as soon as you register. Each course includes exercises to improve learning outcomes. Coding expercises allow you to learn hands-on skills. Learn at your own pace. Courses can be taken alongside additional Ai+ courses - aiplus.training Your Instructor🚀: Sheamus McGovern, Founder and Engineer \\| ODSC AI Sheamus McGovern is the founder of ODSC AI (The Open Data Science Conference). He is also a software architect, data engineer, and AI expert. He started his career in finance by building stock and bond trading systems and risk assessment platforms and has worked for numerous financial institutions and quant hedge funds. Over the last decade, Sheamus has consulted with dozens of companies and startups to build leading-edge data-driven applications in finance, healthcare, eCommerce, and venture capital. He holds degrees from Northeastern University, Boston University, Harvard University, and a CQF in Quantitative Finance. Some useful links: • Get free access to more talks/trainings like this at Ai+ Training platform: https://aiplus.training/ • ODSC blog: https://opendatascience.com/ • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.ai/code-of-conduct/	Virtual 6-Week AI Bootcamp 2025

Data Engineering with Azure Databricks 2026-04-10

Xenia Ireton – author , Tonya Chernyshova – author , Dmitry Foshin – author , Dmitry Anoshin – author

Master end-to-end data engineering on Azure Databricks. From data ingestion and Delta Lake to CI/CD and real-time streaming, build secure, scalable, and performant data solutions with Spark, Unity Catalog, and ML tools. Key Features Build scalable data pipelines using Apache Spark and Delta Lake Automate workflows and manage data governance with Unity Catalog Learn real-time processing and structured streaming with practical use cases Implement CI/CD, DevOps, and security for production-ready data solutions Explore Databricks-native ML, AutoML, and Generative AI integration Book Description "Data Engineering with Azure Databricks" is your essential guide to building scalable, secure, and high-performing data pipelines using the powerful Databricks platform on Azure. Designed for data engineers, architects, and developers, this book demystifies the complexities of Spark-based workloads, Delta Lake, Unity Catalog, and real-time data processing. Beginning with the foundational role of Azure Databricks in modern data engineering, you’ll explore how to set up robust environments, manage data ingestion with Auto Loader, optimize Spark performance, and orchestrate complex workflows using tools like Azure Data Factory and Airflow. The book offers deep dives into structured streaming, Delta Live Tables, and Delta Lake’s ACID features for data reliability and schema evolution. You’ll also learn how to manage security, compliance, and access controls using Unity Catalog, and gain insights into managing CI/CD pipelines with Azure DevOps and Terraform. With a special focus on machine learning and generative AI, the final chapters guide you in automating model workflows, leveraging MLflow, and fine-tuning large language models on Databricks. Whether you're building a modern data lakehouse or operationalizing analytics at scale, this book provides the tools and insights you need. What you will learn Set up a full-featured Azure Databricks environment Implement batch and streaming ingestion using Auto Loader Optimize Spark jobs with partitioning and caching Build real-time pipelines with structured streaming and DLT Manage data governance using Unity Catalog Orchestrate production workflows with jobs and ADF Apply CI/CD best practices with Azure DevOps and Git Secure data with RBAC, encryption, and compliance standards Use MLflow and Feature Store for ML pipelines Build generative AI applications in Databricks Who this book is for This book is for data engineers, solution architects, cloud professionals, and software engineers seeking to build robust and scalable data pipelines using Azure Databricks. Whether you're migrating legacy systems, implementing a modern lakehouse architecture, or optimizing data workflows for performance, this guide will help you leverage the full power of Databricks on Azure. A basic understanding of Python, Spark, and cloud infrastructure is recommended.

data data-engineering apache-spark AI/ML Airflow Analytics Azure ADF Azure DevOps CI/CD Cloud Computing Data Engineering Data Governance Data Lakehouse Databricks Delta DevOps GenAI Git Python Cyber Security Spark Data Streaming Terraform

O'Reilly Data Engineering Books

Building LLM applications with Python 2026-01-05 · 18:00

Overview

Students, developers, and anyone interested in getting started with theory and practice on building LLM-based applications with Python.

Who is this for?

Undeniably, large language models (LLMs) are at the centre of a modern gold-rush in technology.

Students, developers, and anyone interested in getting started with theory and practice on building LLM-based applications with Python.

Who is leading the session?

The session is led by Dr. Stelios Sotiriadis, CEO of Warestack, Associate Professor and MSc Programme Director at Birkbeck, University of London. His expertise includes cloud computing, distributed systems, and AI engineering.

Stelios holds a PhD from the University of Derby, completed a postdoctoral fellowship at the University of Toronto, and has worked with Huawei, IBM, Autodesk, and several startups. Since 2018 he has taught at Birkbeck and, in 2021, founded Warestack, building software for startups globally.

What we’ll cover

A practical introduction on the basics of local models and cloud APIs to build real software systems. You will learn:

Introduction to natural language processing
LLMs theory and intuition
Agents are and how to build them
Running local models with Ollama (free and offline)
Calling local models using Python
Building a ChatGPT-like chatbot with Python libraries

Requirements

A laptop with Python (Windows, macOS, or Linux)
Visual Studio Code installed
Python pip installed
At least 10 GB free disk space
At least 8 GB RAM

This space is needed for running local models.

You may also use the lab computers if your device doesn’t meet the requirements.

Format

A 1.5-hours live session including:

Interactive theory
Hands-on coding
Step-by-step exercises

The session will run in person, with streaming available for remote attendees.

Prerequisites You should be comfortable writing Python scripts (basic to intermediate level).

Building LLM applications with Python

Hands-On LLM Engineering with Python (Part 1) 2025-12-18 · 18:00

REGISTER BELOW FOR MORE AVAILABLE DATES! ↓↓↓↓↓ https://luma.com/stelios

-----------------------------------------------------------------------------------

Who is this for?

Students, developers, and anyone interested in using Large Language Models (LLMs) to build real software solutions with ** Python.

Tired of vibe coding with AI tools? Want to actually understand and own your code, instead of relying on black-box magic? This session shows you how to build LLM systems properly, with full control and clear engineering principles. Who is leading the session?

The session is led by Dr. Stelios Sotiriadis, CEO of Warestack, Associate Professor and MSc Programme Director at Birkbeck, University of London, specialising in cloud computing, distributed systems, and AI engineering.

Stelios holds a PhD from the University of Derby, completed a postdoctoral fellowship at the University of Toronto, and has worked on industry and research projects with Huawei, IBM, Autodesk, and multiple startups. Since moving to London in 2018, he has been teaching at Birkbeck. In 2021, he founded Warestack, building software for startups around the world. What we’ll cover?

A hands-on introduction to building software with LLMs using Python, Ollama, and LiteLLM, including:

How LLMs, embeddings, and agents work.
Calling local models with Ollama or cloud models (OpenAI, Gemini and more).
Using LiteLLM for custom prompts and tool-calling.
Building simple agents from scratch.
Introduction to RAG (Retrieval-Augmented Generation).
Working with vector databases (ChromaDB) and vector similarity search library (FAISS).
Storing, searching, and retrieving embeddings.
Introduction to Streamlit for interactive data apps.
End-to-end examples you can run on your own machine.

This session focuses on theory, fundamentals and real code you can re-use.

Why LiteLLM?

LiteLLM gives you low-level control to build custom LLM solutions your own way, without a heavy framework like LangChain, so you understand how everything works and design your own architecture. A dedicated LangChain session will follow for those who want to go further.

What are the requirements?

Bring a laptop with Python installed (Windows, macOS, or Linux), along with Visual Studio Code or a similar IDE, with at least 10GB of free disk space and 8GB of RAM.

This space is needed for running local models during the workshop. If you don’t have a suitable laptop, please contact Stelios ([email protected]) before registering.

What is the format?

A 3-hour live session with:

Interactive theory blocks
Hands-on coding
Step-by-step exercises
Small group support
Three 10-minute breaks
Q&A and class quizzes

This is a highly practical, hands-on class focused on code and building working LLM systems.

What are the prerequisites?

A good understanding of programming with Python is required (basic to intermediate level). I assume you are already comfortable writing Python scripts.

What comes after?

Participants will receive an optional mini capstone project with one-to-one personalised feedback.

Is it just one session?

This is the first session in a new sequence on applied AI, covering agents, RAG systems, vector databases, and production-ready LLM workflows. Later sessions will dive deeper into topics such as embeddings with deep neural networks, LangChain, advanced retrieval, and multi-agent architectures.

You can decide afterwards whether you’d like to join future sessions.

How many participants?

To keep this interactive, only 15 spots are available. Please register as soon as possible.

Hands-On LLM Engineering with Python (Part 1)

A Practical Starter's Guide to building LLM based projects | Marcin S. | DSC DACH 25 2025-12-10 · 15:28

In his tech tutorial, Marcin showed how to go beyond creating prompts for ChatGPT and build full applications leveraging generative AI. He covered the fundamentals of large language models (LLMs), introduced LangChain, and demonstrated techniques like question answering over documents and creating reasoning agents. The session also addressed advanced methods and practical challenges of deploying LLMs in production. By the end, participants with Python experience gained hands-on knowledge to develop GPT-driven applications while understanding potential pitfalls and limitations. This tutorial by Marcin Szymaniuk was held on October 14th at DSC DACH 25 in Vienna.

Follow us on social media : LinkedIn: https://www.linkedin.com/company/11184830/admin/ Instagram: https://www.instagram.com/datasciconf/ Facebook page: https://www.facebook.com/DataSciConference Website: https://datasciconference.com/

DSC DACH 25

YouTube

Uncertainty-Guided AI Red Teaming: Efficient Vulnerability Discovery in LLMs 2025-12-10 · 14:45

Zvi Topol

AI red teaming is crucial for identifying security and safety vulnerabilities (e.g., jailbreaks, prompt injection, harmful content generation) of Large Language Models. However, manual and brute-force adversarial testing is resource-intensive and often inefficiently consumes time and compute resources exploring low-risk regions of the input space. This talk introduces a practical, Python-based methodology for accelerating red teaming using model uncertainty quantification (UQ).

AI/ML LLM Python Cyber Security

PyData Boston 2025

State, Scale, and Signals: Rethinking Orchestration with Durable Execution 2025-11-16 · 23:19

Preeti Somal – EVP of Engineering @ Temporal , Tobias Macey – host

Summary In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model—workflows, activities, task queues, and replay—and how it eliminates hand‑rolled retry, checkpoint, and error‑handling scaffolding while letting data remain where it lives. Preeti shares real-world patterns for replacing DAG-first orchestration, integrating application and data teams through signals and Nexus for cross-boundary calls, and using Temporal to coordinate long-running, human-in-the-loop, and agentic AI workflows with full observability and auditability. Shee also discusses heuristics for choosing Temporal alongside (or instead of) traditional orchestrators, managing scale without moving large datasets, and lessons from running durable execution as a cloud service.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Preeti Somal about how to incorporate durable execution and state management into AI application architectures Interview IntroductionHow did you get involved in the area of data management?Can you describe what durable execution is and how it impacts system architecture?With the strong focus on state maintenance and high reliability, what are some of the most impactful ways that data teams are incorporating tools like Temporal into their work?One of the core primitives in Temporal is a "workflow". How does that compare to similar primitives in common data orchestration systems such as Airflow, Dagster, Prefect, etc.? What are the heuristics that you recommend when deciding which tool to use for a given task, particularly in data/pipeline oriented projects? Even if a team is using a more data-focused orchestration engine, what are some of the ways that Temporal can be applied to handle the processing logic of the actual data?AI applications are also very dependent on reliable data to be effective in production contexts. What are some of the design patterns where durable execution can be integrated into RAG/agent applications?What are some of the conceptual hurdles that teams experience when they are starting to adopt Temporal or other durable execution frameworks?What are the most interesting, innovative, or unexpected ways that you have seen Temporal/durable execution used for data/AI services?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Temporal?When is Temporal/durable execution the wrong choice?What do you have planned for the future of Temporal for data and AI systems? Contact Info LinkedIn Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. Links TemporalDurable ExecutionFlinkMachine Learning EpochSpark StreamingAirflowDirected Acyclic Graph (DAG)Temporal NexusTensorZeroAI Engineering Podcast Episode The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Airflow Cloud Computing Dagster Data Engineering Data Management Data Quality Datafold dbt ETL/ELT Prefect Python RAG SQL Data Streaming

Data Engineering Podcast

Listen

From Biotechnology to Bioinformatics Software - Sebastian Ayala Ruano 2025-10-24 · 17:00

Data Talks Club – host @ DataTalks.Club , Sebastian Ayala Ruano – bioinformatics researcher and software engineer

In this talk, Sebastian, a bioinformatics researcher and software engineer, shares his inspiring journey from wet lab biotechnology to computational bioinformatics. Hosted by Data Talks Club, this session explores how data science, AI, and open-source tools are transforming modern biological research — from DNA sequencing to metagenomics and protein structure prediction.

You’ll learn about: - The difference between wet lab and dry lab workflows in biotechnology - How bioinformatics enables faster insights through data-driven modeling - The MCW2 Graph Project and its role in studying wastewater microbiomes - Using co-abundance networks and the CC Lasso algorithm to map microbial interactions - How AlphaFold revolutionized protein structure prediction - Building scientific knowledge graphs to integrate biological metadata - Open-source tools like VueGen and VueCore for automating reports and visualizations - The growing impact of AI and large language models (LLMs) in research and documentation - Key differences between R (BioConductor) and Python ecosystems for bioinformatics

This talk is ideal for data scientists, bioinformaticians, biotech researchers, and AI enthusiasts who want to understand how data science, AI, and biology intersect. Whether you work in genomics, computational biology, or scientific software, you’ll gain insights into real-world tools and workflows shaping the future of bioinformatics.

Links: - MicW2Graph: https://zenodo.org/records/12507444 - VueGen: https://github.com/Multiomics-Analytics-Group/vuegen - Awesome-Bioinformatics: https://github.com/danielecook/Awesome-Bioinformatics

TIMECODES00:00 Sebastian’s Journey into Bioinformatics06:02 From Wet Lab to Computational Biology08:23 Wet Lab vs Dry Lab Explained12:35 Bioinformatics as Data Science for Biology15:30 How DNA Sequencing Works19:29 MCW2 Graph and Wastewater Microbiomes23:10 Building Microbial Networks with CC Lasso26:54 Protein–Ligand Simulation Basics29:58 Predicting Protein Folding in 3D33:30 AlphaFold Revolution in Protein Prediction36:45 Inside the MCW2 Knowledge Graph39:54 VueGen: Automating Scientific Reports43:56 VueCore: Visualizing OMIX Data47:50 Using AI and LLMs in Bioinformatics50:25 R vs Python in Bioinformatics Tools53:17 Closing Thoughts from Ecuador Connect with Sebastian Twitter - https://twitter.com/sayalaruanoLinkedin - https://linkedin.com/in/sayalaruano Github - https://github.com/sayalaruanoWebsite - https://sayalaruano.github.io/ Connect with DataTalks.Club: Join the community - https://datatalks.club/slack.htmlSubscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQCheck other upcoming events - https://lu.ma/dtc-eventsGitHub: https://github.com/DataTalksClubLinkedIn - https://www.linkedin.com/company/datatalks-club/Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/

AI/ML Analytics Data Science GitHub LLM Python

DataTalks.Club

Listen

Python + AI: Large Language Models 2025-10-07 · 17:00

Join us for the first session in our Python + AI series!

In this session, we'll talk about Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We'll use Python to interact with LLMs using popular packages like the OpenAI SDK and Langchain. We'll experiment with prompt engineering and few-shot examples to improve our outputs.

We'll also show how to build a full stack app powered by LLMs, and explain the importance of concurrency and streaming for user-facing AI apps.

This session is a part of a series! To learn more, click here

Pre-requisites: If you'd like to follow along with the live examples, make sure you've got a GitHub account.

Habla español? Tendremos una serie para hispanohablantes!

Python + AI: Large Language Models

Python + AI: Large Language Models 2025-10-07 · 17:00

Join us for the first session in our Python + AI series!

In this session, we'll talk about Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We'll use Python to interact with LLMs using popular packages like the OpenAI SDK and Langchain. We'll experiment with prompt engineering and few-shot examples to improve our outputs.

We'll also show how to build a full stack app powered by LLMs, and explain the importance of concurrency and streaming for user-facing AI apps.

This session is a part of a series! To learn more, click here

Pre-requisites: If you'd like to follow along with the live examples, make sure you've got a GitHub account.

Habla español? Tendremos una serie para hispanohablantes!

Python + AI: Large Language Models

Python + AI: Large Language Models 2025-10-07 · 17:00

Join us for the first session in our Python + AI series!

In this session, we'll talk about Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We'll use Python to interact with LLMs using popular packages like the OpenAI SDK and Langchain. We'll experiment with prompt engineering and few-shot examples to improve our outputs.

We'll also show how to build a full stack app powered by LLMs, and explain the importance of concurrency and streaming for user-facing AI apps.

This session is a part of a series! To learn more, click here

Pre-requisites: If you'd like to follow along with the live examples, make sure you've got a GitHub account.

Habla español? Tendremos una serie para hispanohablantes!

Python + AI: Large Language Models

Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK" 2025-10-01 · 16:00

Pre-registration is REQUIRED. Add to your calendar - https://hubs.li/Q03HQmw30

Speaker: Amit Maraj PhD, Senior AI Developer Relations Engineer at Google

*** In the rapidly evolving world of generative AI, standalone large language models (LLMs) are powerful, but their true potential is unlocked when they can interact with the outside world. This is where agents and tools come in. An agent acts as an intelligent orchestrator, leveraging tools to perform goal-oriented operations that go beyond simple text generation—like looking up real-time data, interacting with APIs, or managing files.

Join us for this 1-hour webinar where you'll learn how to build and deploy your very first AI agent with tools using the Agent Development Kit (ADK), an open-source, code-first Python toolkit from Google. We will demystify the core concepts of agents and tools, and guide you through a practical, step-by-step process to create a functional agent that can access and use external data.

Who Should Attend: This webinar is for developers, data scientists, and anyone interested in moving from simple AI prototypes to building intelligent, autonomous applications. A basic understanding of Python is recommended.

ODSC Links: • Get free access to more talks/trainings like this at Ai+ Training platform: https://hubs.li/H0Zycsf0 • ODSC blog: https://opendatascience.com/ • Facebook: https://www.facebook.com/OPENDATASCI • Twitter: https://twitter.com/_ODSC & @odsc • LinkedIn: https://www.linkedin.com/company/open-data-science • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.com/code-of-conduct/

Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK"

Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK" 2025-10-01 · 16:00

Pre-registration is REQUIRED. Add to your calendar - https://hubs.li/Q03HQmw30

Speaker: Amit Maraj PhD, Senior AI Developer Relations Engineer at Google

*** In the rapidly evolving world of generative AI, standalone large language models (LLMs) are powerful, but their true potential is unlocked when they can interact with the outside world. This is where agents and tools come in. An agent acts as an intelligent orchestrator, leveraging tools to perform goal-oriented operations that go beyond simple text generation—like looking up real-time data, interacting with APIs, or managing files.

Join us for this 1-hour webinar where you'll learn how to build and deploy your very first AI agent with tools using the Agent Development Kit (ADK), an open-source, code-first Python toolkit from Google. We will demystify the core concepts of agents and tools, and guide you through a practical, step-by-step process to create a functional agent that can access and use external data.

Who Should Attend: This webinar is for developers, data scientists, and anyone interested in moving from simple AI prototypes to building intelligent, autonomous applications. A basic understanding of Python is recommended.

ODSC Links: • Get free access to more talks/trainings like this at Ai+ Training platform: https://hubs.li/H0Zycsf0 • ODSC blog: https://opendatascience.com/ • Facebook: https://www.facebook.com/OPENDATASCI • Twitter: https://twitter.com/_ODSC & @odsc • LinkedIn: https://www.linkedin.com/company/open-data-science • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.com/code-of-conduct/

Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK"

Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK" 2025-10-01 · 16:00

Pre-registration is REQUIRED. Add to your calendar - https://hubs.li/Q03HQmw30

Speaker: Amit Maraj PhD, Senior AI Developer Relations Engineer at Google

*** In the rapidly evolving world of generative AI, standalone large language models (LLMs) are powerful, but their true potential is unlocked when they can interact with the outside world. This is where agents and tools come in. An agent acts as an intelligent orchestrator, leveraging tools to perform goal-oriented operations that go beyond simple text generation—like looking up real-time data, interacting with APIs, or managing files.

Join us for this 1-hour webinar where you'll learn how to build and deploy your very first AI agent with tools using the Agent Development Kit (ADK), an open-source, code-first Python toolkit from Google. We will demystify the core concepts of agents and tools, and guide you through a practical, step-by-step process to create a functional agent that can access and use external data.

Who Should Attend: This webinar is for developers, data scientists, and anyone interested in moving from simple AI prototypes to building intelligent, autonomous applications. A basic understanding of Python is recommended.

ODSC Links: • Get free access to more talks/trainings like this at Ai+ Training platform: https://hubs.li/H0Zycsf0 • ODSC blog: https://opendatascience.com/ • Facebook: https://www.facebook.com/OPENDATASCI • Twitter: https://twitter.com/_ODSC & @odsc • LinkedIn: https://www.linkedin.com/company/open-data-science • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.com/code-of-conduct/

Hands-On Webinar "Building and Deploying your First Agent with Tools on ADK"

Webinar "Building and Deploying your First Agent with Tools on ADK" 2025-10-01 · 16:00

Pre-registration is REQUIRED. Add to your calendar - https://hubs.li/Q03HQmw30

Speaker: Amit Maraj PhD, Senior AI Developer Relations Engineer at Google

*** In the rapidly evolving world of generative AI, standalone large language models (LLMs) are powerful, but their true potential is unlocked when they can interact with the outside world. This is where agents and tools come in. An agent acts as an intelligent orchestrator, leveraging tools to perform goal-oriented operations that go beyond simple text generation—like looking up real-time data, interacting with APIs, or managing files.

Join us for this 1-hour webinar where you'll learn how to build and deploy your very first AI agent with tools using the Agent Development Kit (ADK), an open-source, code-first Python toolkit from Google. We will demystify the core concepts of agents and tools, and guide you through a practical, step-by-step process to create a functional agent that can access and use external data.

Who Should Attend: This webinar is for developers, data scientists, and anyone interested in moving from simple AI prototypes to building intelligent, autonomous applications. A basic understanding of Python is recommended.

ODSC Links: • Get free access to more talks/trainings like this at Ai+ Training platform: https://hubs.li/H0Zycsf0 • ODSC blog: https://opendatascience.com/ • Facebook: https://www.facebook.com/OPENDATASCI • Twitter: https://twitter.com/_ODSC & @odsc • LinkedIn: https://www.linkedin.com/company/open-data-science • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.com/code-of-conduct/

Webinar "Building and Deploying your First Agent with Tools on ADK"

ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences 2025-09-30 · 13:15

Emilien SCHULTZ , Paul Girard , Julien Boelaert

The exponential growth of textual data—ranging from social media posts and digital news archives to speech-to-text transcripts—has opened new frontiers for research in the social sciences. Tasks such as stance detection, topic classification, and information extraction have become increasingly common. At the same time, the rapid evolution of Natural Language Processing, especially pretrained language models and generative AI, has largely been led by the computer science community, often leaving a gap in accessibility for social scientists.

To address this, we initiated since 2023 the development of ActiveTigger, a lightweight, open-source Python application (with a web frontend in React) designed to accelerate annotation process and manage large-scale datasets through the integration of fine-tuned models. It aims to support computational social science for a large public both within and outside social sciences. Already used by a dynamic community in social sciences, the stable version is planned for early June 2025.

From a more technical prospect, the API is designed to manage the complete workflow from project creation, embeddings computation, exploration of the text corpus, human annotation with active learning, fine-tuning of pre-trained models (BERT-like), prediction on a larger corpus, and export. It also integrates LLM-as-a-service capabilities for prompt-based annotation and information extraction, offering a flexible approach for hybrid manual/automatic labeling. Accessible both with a web frontend and a Python client, ActiveTigger encourages customization and adaptation to specific research contexts and practices.

In this talk, we will delve into the motivations behind the creation of ActiveTigger, outline its technical architecture, and walk through its core functionalities. Drawing on several ongoing research projects within the Computational Social Science (CSS) group at CREST, we will illustrate concrete use cases where ActiveTigger has accelerated data annotation, enabled scalable workflows, and fostered collaborations. Beyond the technical demonstration, the talk will also open a broader reflection on the challenges and opportunities brought by generative AI in academic research—especially in terms of reliability, transparency, and methodological adaptation for qualitative and quantitative inquiries.

The repository of the project : https://github.com/emilienschultz/activetigger/

The development of this software is funded by the DRARI Ile-de-France and supported by Progédo.

AI/ML API Computer Science GenAI GitHub LLM NLP Python React

PyData Paris 2025

Video

Deep Learning with Python, Third Edition 2025-09-24

Matthew Watson – author , Francois Chollet – author

The bestselling book on Python deep learning, now covering generative AI, Keras 3, PyTorch, and JAX! Deep Learning with Python, Third Edition puts the power of deep learning in your hands. This new edition includes the latest Keras and TensorFlow features, generative AI models, and added coverage of PyTorch and JAX. Learn directly from the creator of Keras and step confidently into the world of deep learning with Python. In Deep Learning with Python, Third Edition you’ll discover: Deep learning from first principles The latest features of Keras 3 A primer on JAX, PyTorch, and TensorFlow Image classification and image segmentation Time series forecasting Large Language models Text classification and machine translation Text and image generation—build your own GPT and diffusion models! Scaling and tuning models With over 100,000 copies sold, Deep Learning with Python makes it possible for developers, data scientists, and machine learning enthusiasts to put deep learning into action. In this expanded and updated third edition, Keras creator François Chollet offers insights for both novice and experienced machine learning practitioners. You'll master state-of-the-art deep learning tools and techniques, from the latest features of Keras 3 to building AI models that can generate text and images. About the Technology In less than a decade, deep learning has changed the world—twice. First, Python-based libraries like Keras, TensorFlow, and PyTorch elevated neural networks from lab experiments to high-performance production systems deployed at scale. And now, through Large Language Models and other generative AI tools, deep learning is again transforming business and society. In this new edition, Keras creator François Chollet invites you into this amazing subject in the fluid, mentoring style of a true insider. About the Book Deep Learning with Python, Third Edition makes the concepts behind deep learning and generative AI understandable and approachable. This complete rewrite of the bestselling original includes fresh chapters on transformers, building your own GPT-like LLM, and generating images with diffusion models. Each chapter introduces practical projects and code examples that build your understanding of deep learning, layer by layer. What's Inside Hands-on, code-first learning Comprehensive, from basics to generative AI Intuitive and easy math explanations Examples in Keras, PyTorch, JAX, and TensorFlow About the Reader For readers with intermediate Python skills. No previous experience with machine learning or linear algebra required. About the Authors François Chollet is the co-founder of Ndea and the creator of Keras. Matthew Watson is a software engineer at Google working on Gemini and a core maintainer of Keras. Quotes Perfect for anyone interested in learning by doing from one of the industry greats. - Anthony Goldbloom, Founder of Kaggle A sharp, deeply practical guide that teaches you how to think from first principles to build models that actually work. - Santiago Valdarrama, Founder of ml.school The most up-to-date and complete guide to deep learning you’ll find today! - Aran Komatsuzaki, EleutherAI Masterfully conveys the true essence of neural networks. A rare case in recent years of outstanding technical writing. - Salvatore Sanfilippo, Creator of Redis

data ai-ml machine-learning deep-learning AI/ML GenAI Keras LLM Python PyTorch Redis TensorFlow

O'Reilly AI & ML Books

Taiob Ali - Leveraging Azure AI and Python for Data-Driven Decision Making 2025-09-10 · 22:00

In this technical talk, we will explore how to harness the power of Azure AI, Azure AI Studio, Azure Search Services, and large language models to extract valuable decision-making data from the Azure SQL Database.

We will begin by discussing Azure AI and its capabilities. Starting with a clean slate, build a solution using Azure AI Studio and its user-friendly interface that can chat with an SQL database, helping make data-driven decisions without writing code. This solution will delve into Azure Search Services, highlighting how it can be used to efficiently index and query data.

The second part of the presentation will focus on utilizing large language models and Python notebooks to extract and analyze data from the Azure SQL Database. Attendees will learn how to set up their environment, connect to the database, and implement AI-driven solutions (talk to the database).

By the end of the session, participants will have a solid foundation in using Azure AI and Python for data-driven decision-making, empowering them to leverage these tools in their projects.

Taiob Ali - Leveraging Azure AI and Python for Data-Driven Decision Making

Duck Lake: Simplifying the Lakehouse Ecosystem 2025-09-10 · 01:03

Hannes Mühleisen – co-creator and CEO @ DuckDB Labs , Mark Raasveldt – Co-creator @ DuckDB , Tobias Macey – host

Summary In this episode of the Data Engineering Podcast Hannes Mühleisen and Mark Raasveldt, the creators of DuckDB, share their work on Duck Lake, a new entrant in the open lakehouse ecosystem. They discuss how Duck Lake, is focused on simplicity, flexibility, and offers a unified catalog and table format compared to other lakehouse formats like Iceberg and Delta. Hannes and Mark share insights into how Duck Lake revolutionizes data architecture by enabling local-first data processing, simplifying deployment of lakehouse solutions, and offering benefits such as encryption features, data inlining, and integration with existing ecosystems.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey and today I'm interviewing Hannes Mühleisen and Mark Raasveldt about DuckLake, the latest entrant into the open lakehouse ecosystemInterview IntroductionHow did you get involved in the area of data management?Can you describe what DuckLake is and the story behind it?What are the particular problems that DuckLake is solving for?How does this compare to the capabilities of MotherDuck?Iceberg and Delta already have a well established ecosystem, but so does DuckDB. Who are the primary personas that you are trying to focus on in these early days of DuckLake?One of the major factors driving the adoption of formats like Iceberg is cost efficiency for large volumes of data. That brings with it challenges of large batch processing of data. How does DuckLake account for these axes of scale?There is also a substantial investment in the ecosystem of technologies that support Iceberg. The most notable ecosystem challenge for DuckDB and DuckLake is in the query layer. How are you thinking about the evolution and growth of that capability beyond DuckDB (e.g. support in Trino/Spark/Flink)?What are your opinions on the viability of a future where DuckLake and Iceberg become a unified standard and implementation? (why can't Iceberg REST catalog implementations just use DuckLake under the hood?)Digging into the specifics of the specification and implementation, what are some of the capabilities that it offers above and beyond Iceberg?Is it now possible to enforce PK/FK constraints, indexing on underlying data?Given that DuckDB has a vector type, how do you think about the support for vector storage/indexing?How do the capabilities of DuckLake and the integration with DuckDB change the ways that data teams design their data architecture and access patterns?What are your thoughts on the impact of "data gravity" in today's data ecosystem, with engines like DuckDB, KuzuDB, LanceDB, etc. available for embedded and edge use cases?What are the most interesting, innovative, or unexpected ways that you have seen DuckLake used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on DuckLake?When is DuckLake the wrong choice?What do you have planned for the future of DuckLake?Contact Info HannesWebsiteMarkWebsiteParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links DuckDBPodcast EpisodeDuckLakeDuckDB LabsMySQLCWIMonetDBIcebergIceberg REST CatalogDeltaHudiLanceDuckDB Iceberg ConnectorACID == Atomicity, Consistency, Isolation, DurabilityMotherDuckMotherDuck Managed DuckLakeTrinoSparkPrestoSpark DuckLake DemoDelta KernelArrowdltS3 TablesAttribute Based Access Control (ABAC)ParquetArrow FlightHadoopHDFSDuckLake RoadmapThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Flink Data Engineering Data Lakehouse Data Management Datafold Delta DuckDB ETL/ELT Iceberg Lance Motherduck Prefect Python Spark Data Streaming Trino

Data Engineering Podcast

Listen

Virtual 6-Week AI Bootcamp 2025 2025-09-09 · 16:00

This is PAID event. REGISTER HERE - https://lu.ma/psj48l0t

As a gesture of our appreciation for being part of ODSC Community, we are offering a 20% discount on the 6-week Bootcamp pass.

Apply code - COMMUNITY-20- to save more.

Level Up Your AI Skills This Fall! 🚀

Join us for an intensive 6-week virtual AI Bootcamp, a fantastic prelude to the renowned ODSC AI West Bootcamp in October! This isn't just any bootcamp; it's your chance to build a strong foundation in AI from the comfort of your home, all before experiencing the full, immersive 4-day event in person.

🚀And here's a pro-tip:

If you secure a pass for the ODSC AI West Bootcamp, you'll gain free access to this 6-week virtual training. It's the perfect way to maximize your learning experience and hit the ground running at ODSC West.

🚀 Curious to learn more? Head over to https://odsc.ai/west/bootcamp/ for all the details. Don't forget to use code COMMUNITYWest2025 at checkout for an extra discount!

If you choose to enroll in the 6-Week Virtual AI Bootcamp as a standalone event:

Over 6 weeks, gain a comprehensive understanding of AI, from foundational concepts in coding and machine learning to LLMs, AI Agents & RAG Here are the list of sessions you will be attending:

AI & Machine Learning Modeling - September 9th, 2025
Vibe Coding with AI (NEW) - September 11th, 2025
Machine Learning Data Prep with Python - September 16th, 2025
Introduction to Machine Learning - September 18th, 2025
Large Language Models & Fine-Tuning - September 25th, 2025
Introduction to RAG - October 2nd, 2025
Introduction AI Agents - October 9th, 2025
Build and Launch Your AI Project - October 16th, 2025

Regardless of your current skill level, this AI bootcamp will help transform you from AI novice to confident practitioner 🚀

If you’ve missed any, don’t worry—you still have access to the 13 sessions that passed including our On-Demand courses: * Data & Generative AI Literacy *.Data Wrangling With SQL * Linear Algebra * Statistics and Hypothesis Testing * Introduction to Math for Data Science

How it works 🌍 :

Each course is 2.5 hours long and includes extra materials
The primer series is taught live and then available on demand.
If you miss the live course, each session is available on-demand as soon as you register.
Each course includes exercises to improve learning outcomes.
Coding expercises allow you to learn hands-on skills.
Learn at your own pace. Courses can be taken alongside additional Ai+ courses - aiplus.training

Your Instructor🚀:

Sheamus McGovern, Founder and Engineer \| ODSC AI

Sheamus McGovern is the founder of ODSC AI (The Open Data Science Conference). He is also a software architect, data engineer, and AI expert. He started his career in finance by building stock and bond trading systems and risk assessment platforms and has worked for numerous financial institutions and quant hedge funds. Over the last decade, Sheamus has consulted with dozens of companies and startups to build leading-edge data-driven applications in finance, healthcare, eCommerce, and venture capital. He holds degrees from Northeastern University, Boston University, Harvard University, and a CQF in Quantitative Finance.

Some useful links: • Get free access to more talks/trainings like this at Ai+ Training platform: https://aiplus.training/ • ODSC blog: https://opendatascience.com/ • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.ai/code-of-conduct/

Virtual 6-Week AI Bootcamp 2025

Virtual 6-Week AI Bootcamp 2025 2025-09-09 · 16:00

This is PAID event. REGISTER HERE - https://lu.ma/psj48l0t

As a gesture of our appreciation for being part of ODSC Community, we are offering a 20% discount on the 6-week Bootcamp pass.

Apply code - COMMUNITY-20- to save more.

Level Up Your AI Skills This Fall! 🚀

Join us for an intensive 6-week virtual AI Bootcamp, a fantastic prelude to the renowned ODSC AI West Bootcamp in October! This isn't just any bootcamp; it's your chance to build a strong foundation in AI from the comfort of your home, all before experiencing the full, immersive 4-day event in person.

🚀And here's a pro-tip:

If you secure a pass for the ODSC AI West Bootcamp, you'll gain free access to this 6-week virtual training. It's the perfect way to maximize your learning experience and hit the ground running at ODSC West.

🚀 Curious to learn more? Head over to https://odsc.ai/west/bootcamp/ for all the details. Don't forget to use code COMMUNITYWest2025 at checkout for an extra discount!

If you choose to enroll in the 6-Week Virtual AI Bootcamp as a standalone event:

Over 6 weeks, gain a comprehensive understanding of AI, from foundational concepts in coding and machine learning to LLMs, AI Agents & RAG Here are the list of sessions you will be attending:

AI & Machine Learning Modeling - September 9th, 2025
Vibe Coding with AI (NEW) - September 11th, 2025
Machine Learning Data Prep with Python - September 16th, 2025
Introduction to Machine Learning - September 18th, 2025
Large Language Models & Fine-Tuning - September 25th, 2025
Introduction to RAG - October 2nd, 2025
Introduction AI Agents - October 9th, 2025
Build and Launch Your AI Project - October 16th, 2025

Regardless of your current skill level, this AI bootcamp will help transform you from AI novice to confident practitioner 🚀

If you’ve missed any, don’t worry—you still have access to the 13 sessions that passed including our On-Demand courses: * Data & Generative AI Literacy *.Data Wrangling With SQL * Linear Algebra * Statistics and Hypothesis Testing * Introduction to Math for Data Science

How it works 🌍 :

Each course is 2.5 hours long and includes extra materials
The primer series is taught live and then available on demand.
If you miss the live course, each session is available on-demand as soon as you register.
Each course includes exercises to improve learning outcomes.
Coding expercises allow you to learn hands-on skills.
Learn at your own pace. Courses can be taken alongside additional Ai+ courses - aiplus.training

Your Instructor🚀:

Sheamus McGovern, Founder and Engineer \| ODSC AI

Sheamus McGovern is the founder of ODSC AI (The Open Data Science Conference). He is also a software architect, data engineer, and AI expert. He started his career in finance by building stock and bond trading systems and risk assessment platforms and has worked for numerous financial institutions and quant hedge funds. Over the last decade, Sheamus has consulted with dozens of companies and startups to build leading-edge data-driven applications in finance, healthcare, eCommerce, and venture capital. He holds degrees from Northeastern University, Boston University, Harvard University, and a CQF in Quantitative Finance.

Some useful links: • Get free access to more talks/trainings like this at Ai+ Training platform: https://aiplus.training/ • ODSC blog: https://opendatascience.com/ • Slack Channel: https://hubs.li/Q038cQBy0 • Code of conduct: https://odsc.ai/code-of-conduct/

Virtual 6-Week AI Bootcamp 2025

talk-data.com

People (6 results)

Activities & events

Level Up Your AI Skills This Fall! 🚀

🚀And here's a pro-tip:

If you choose to enroll in the 6-Week Virtual AI Bootcamp as a standalone event:

Regardless of your current skill level, this AI bootcamp will help transform you from AI novice to confident practitioner 🚀

How it works 🌍 :

Your Instructor🚀:

Level Up Your AI Skills This Fall! 🚀

🚀And here's a pro-tip:

If you choose to enroll in the 6-Week Virtual AI Bootcamp as a standalone event:

Regardless of your current skill level, this AI bootcamp will help transform you from AI novice to confident practitioner 🚀

How it works 🌍 :

Your Instructor🚀: