talk-data.com

Activities & events


Join us for the official kickoff of the AKS Automatic Hackathon, a four-week virtual challenge designed to empower developers to build intelligent, scalable, and cloud-native solutions using AKS Automatic and other Azure services.

Whether you're new to Kubernetes or a seasoned cloud-native builder, this session will walk you through:

  • 🔍 What AKS Automatic is and why it matters
  • 🛠️ Hackathon structure, timeline, and deliverables
  • 🎯 Judging criteria and prize details
  • 📚 Resources to help you get started, from GitHub repos to technical docs
  • 💬 Live Q&A with the product and engineering teams

This is your chance to go hands-on with AKS Automatic, solve real-world problems, and showcase your work to the Azure community. Let's build the future, together.

📌 Register for the hack here

AKS Automatic Hackathon Kickoff Call AMER/EMEA


Start 2026 with the ClickHouse India community in Gurgaon!

Connect with fellow data practitioners and hear from industry experts through engaging talks focused on lessons learned, best practices, and modern data challenges.

Agenda:

  • 10:30 AM: Registration, light snacks & networking
  • 11:00 AM: Welcome & Introductions
  • 11:10 AM: Inside ClickStack: Engineering Observability for Scale by Rakesh Puttaswamy, Lead Solutions Architect @ ClickHouse
  • 11:35 AM: Supercharging Personalised Notifications At Jobhai With ClickHouse by Sumit Kumar and Arvind Saini, Tech Leads @ Info Edge
  • 12:00 PM: Simplifying CDC: Migrating from Debezium to ClickPipes by Abhash Solanki, DevOps Engineer @ Spyne AI
  • 12:25 PM: Solving Analytics at Scale: From CDC to Actionable Insights by Kunal Sharma, Software Developer @ Samarth eGov
  • 12:50 PM: Q&A
  • 1:30 PM: Lunch & Networking

👉🏼 RSVP to secure your spot!

Interested in speaking at this meetup or future ClickHouse events? 🎤 Shoot an email to [email protected] and they'll be in touch.

🎤 Session Details: Inside ClickStack: Engineering Observability for Scale

Dive deep into ClickStack, ClickHouse’s fresh approach to observability built for engineers who care about speed, scale, and simplicity. We’ll unpack the technical architecture behind how ClickStack handles metrics, logs, and traces using ClickHouse as the backbone for real-time, high-cardinality analytics. Expect a hands-on look at ingestion pipelines, schema design patterns, query optimization, and the integrations that make ClickStack tick.

Speaker: Rakesh Puttaswamy, Lead Solutions Architect @ ClickHouse

🎤 Session Details: Supercharging Personalised Notifications At Jobhai With ClickHouse

Calculating personalized alerts for 2 million users is a data-heavy challenge that requires more than just standard indexing. This talk explores how Jobhai uses ClickHouse to power its morning notification pipeline, focusing on the architectural shifts and query optimizations that made our massive scale manageable and fast.

Speakers: Sumit Kumar and Arvind Saini, Tech Leads @ Info Edge

Sumit is a seasoned software engineer with deep expertise in databases, backend systems, and machine learning. For over six years, he has led the Jobhai engineering team, driving continuous improvements across their database infrastructure and user-facing systems while streamlining workflows through ongoing innovation. Connect with Sumit Kumar on LinkedIn.

Arvind is a Tech Lead at Info Edge India Ltd with experience building and scaling backend systems for large consumer and enterprise platforms. Over the years, they have worked across system design, backend optimization, and data-driven services, contributing to initiatives such as notification platforms, workflow automation, and product revamps. Their work focuses on improving reliability, performance, and scalability of distributed systems, and they enjoy solving complex engineering problems while mentoring teams and driving technical excellence.

🎤 Session Details: Simplifying CDC: Migrating from Debezium to ClickPipes

In this talk, Abhash will share his engineering team's journey migrating their core MySQL and MongoDB CDC flows to ClickPipes. He will contrast their previous architecture, where every schema change required manual intervention or complex Debezium configurations, with the new reality of ClickPipes' automated schema evolution, which seamlessly handles upstream schema changes and ingests flexible data without breaking pipelines.
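
To make the idea concrete, here is a minimal, purely illustrative sketch of what "automated schema evolution" means in a CDC pipeline: when an upstream row arrives with a column the destination has not seen, the pipeline widens the schema and backfills instead of failing. This is a conceptual toy in plain Python, not the ClickPipes or Debezium API; all names are made up.

```python
# Conceptual sketch of automated schema evolution in a CDC pipeline:
# when an upstream row has columns the destination hasn't seen,
# widen the schema and backfill instead of breaking the pipeline.
# (Illustrative only -- not the ClickPipes or Debezium API.)

def ingest(rows, schema, table):
    """Append rows to `table`, evolving `schema` as new columns appear."""
    for row in rows:
        new_cols = set(row) - set(schema)
        for col in sorted(new_cols):
            schema[col] = type(row[col]).__name__  # stand-in for ALTER TABLE ... ADD COLUMN
            for existing in table:                 # backfill already-loaded rows with NULLs
                existing[col] = None
        table.append({col: row.get(col) for col in schema})

schema = {"id": "int", "name": "str"}
table = []
ingest([{"id": 1, "name": "a"},
        {"id": 2, "name": "b", "city": "Delhi"}], schema, table)
print(schema)    # {'id': 'int', 'name': 'str', 'city': 'str'}
print(table[0])  # {'id': 1, 'name': 'a', 'city': None}
```

The key property is that the second row's unexpected `city` column evolves the destination rather than raising an error, which is the behavior the talk contrasts with manually maintained Debezium configurations.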

Speaker: Abhash Solanki, DevOps Engineer @ Spyne AI

Abhash serves as a DevOps Engineer at Spyne, orchestrating the AWS infrastructure behind the company's data warehouse and CDC pipelines. Having managed complex self-hosted Debezium and Kafka clusters, he understands the operational overhead of running stateful data stacks in the cloud. He recently led the architectural shift to ClickHouse Cloud, focusing on eliminating engineering toil and automating schema evolution handling.

🎤 Session Details: Solving Analytics at Scale: From CDC to Actionable Insights

As SAMARTH’s data volumes grew rapidly, our analytics systems faced challenges with frequent data changes and near real-time reporting. These challenges were compounded by the platform’s inherently high cardinality in multidimensional data models, spanning institutions, programmes, states, categories, workflow stages, and time, resulting in highly complex and dynamic query patterns.

This talk describes how we evolved from basic CDC pipelines to a fast, reliable, and scalable near real-time analytics platform using ClickHouse. We share key design and operational learnings that enabled us to process continuous high-volume transactional data and deliver low-latency analytics for operational monitoring and policy-level decision-making.

Speaker: Kunal Sharma, Software Developer @ Samarth eGov

Kunal Sharma is a data-focused professional with experience in building scalable data pipelines. His work includes designing and implementing robust ETL/ELT workflows, data-driven decision engines, and large-scale analytics platforms. At SAMARTH, he has contributed to building near real-time analytics systems, including the implementation of ClickHouse for large-scale, low-latency analytics.

ClickHouse Gurgaon/Delhi Meetup
Brian Allbee – author

Grow your software engineering discipline, incorporating and mastering design, development, testing, and deployment best practices through examples in a realistic Python project structure.

Key Features

  • Understand what makes software engineering a discipline, distinct from basic programming
  • Gain practical insight into updating, refactoring, and scaling an existing Python system
  • Implement robust testing, CI/CD pipelines, and cloud-ready architecture decisions

Book Description

Software engineering is more than coding; it’s the strategic design and continuous improvement of systems that serve real-world needs. This newly updated second edition of Hands-On Software Engineering with Python expands on its foundational approach to help you grow into a senior or staff-level engineering role. Fully revised for today’s Python ecosystem, this edition includes updated tooling, practices, and architectural patterns. You’ll explore key changes across five minor Python versions, examine new features like dataclasses and type hinting, and evaluate modern tools such as Poetry, pytest, and GitHub Actions. A new chapter introduces high-performance computing in Python, and the entire development process is enhanced with cloud-readiness in mind.

You’ll follow a complete redesign and refactor of a multi-tier system from the first edition, gaining insight into how software evolves, and what it takes to do that responsibly. From system modeling and SDLC phases to data persistence, testing, and CI/CD automation, each chapter builds your engineering mindset while updating your hands-on skills. By the end of this book, you'll have mastered modern Python software engineering practices and be equipped to revise and future-proof complex systems with confidence.

What you will learn

  • Distinguish software engineering from general programming
  • Break down and apply each phase of the SDLC to Python systems
  • Create system models to plan architecture before writing code
  • Apply Agile, Scrum, and other modern development methodologies
  • Use dataclasses, pydantic, and schemas for robust data modeling
  • Set up CI/CD pipelines with GitHub Actions and cloud build tools
  • Write and structure unit, integration, and end-to-end tests
  • Evaluate and integrate tools like Poetry, pytest, and Docker

Who this book is for

This book is for Python developers with a basic grasp of software development who want to grow into senior or staff-level engineering roles. It’s ideal for professionals looking to deepen their understanding of software architecture, system modeling, testing strategies, and cloud-aware development. Familiarity with core Python programming is required, as the book focuses on applying engineering principles to maintain, extend, and modernize real-world systems.
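
The blurb highlights dataclasses and type hinting among the modern Python features covered. As a small taste of that style, here is a minimal sketch (the `Order` model and its fields are my own illustration, not taken from the book):

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    """A small domain model using dataclasses and type hints."""
    order_id: int
    customer: str
    items: list[str] = field(default_factory=list)

    def add_item(self, sku: str) -> None:
        self.items.append(sku)

order = Order(order_id=42, customer="ada")
order.add_item("SKU-001")
print(order)  # Order(order_id=42, customer='ada', items=['SKU-001'])
```

The `@dataclass` decorator generates `__init__`, `__repr__`, and `__eq__` from the annotated fields, which is the kind of boilerplate reduction the description alludes to.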

software-development programming-languages Python Agile/Scrum CI/CD Cloud Computing Data Modelling Docker GitHub Pydantic
O'Reilly Data Engineering Books

REGISTER BELOW FOR MORE AVAILABLE DATES! ↓↓↓↓↓ https://luma.com/stelios

-----------------------------------------------------------------------------------

Who is this for?

Students, developers, and anyone interested in using Large Language Models (LLMs) to build real software solutions with Python.

Tired of vibe coding with AI tools? Want to actually understand and own your code, instead of relying on black-box magic? This session shows you how to build LLM systems properly, with full control and clear engineering principles.

Who is leading the session?

The session is led by Dr. Stelios Sotiriadis, CEO of Warestack, Associate Professor and MSc Programme Director at Birkbeck, University of London, specialising in cloud computing, distributed systems, and AI engineering.

Stelios holds a PhD from the University of Derby, completed a postdoctoral fellowship at the University of Toronto, and has worked on industry and research projects with Huawei, IBM, Autodesk, and multiple startups. Since moving to London in 2018, he has been teaching at Birkbeck. In 2021, he founded Warestack, building software for startups around the world.

What will we cover?

A hands-on introduction to building software with LLMs using Python, Ollama, and LiteLLM, including:

  • How LLMs, embeddings, and agents work.
  • Calling local models with Ollama or cloud models (OpenAI, Gemini, and more).
  • Using LiteLLM for custom prompts and tool-calling.
  • Building simple agents from scratch.
  • Introduction to RAG (Retrieval-Augmented Generation).
  • Working with vector databases (ChromaDB) and a vector similarity search library (FAISS).
  • Storing, searching, and retrieving embeddings.
  • Introduction to Streamlit for interactive data apps.
  • End-to-end examples you can run on your own machine.

This session focuses on theory, fundamentals, and real code you can reuse.
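
As a taste of the vector-search topics listed above, here is a minimal similarity search in pure Python. It uses hand-made 3-dimensional "embeddings" and cosine similarity; a real setup would use an embedding model plus FAISS or ChromaDB, so treat the document texts and vectors as placeholder assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy 3-d "embeddings" (real ones come from an embedding model).
store = {
    "kubernetes scaling": [0.9, 0.1, 0.0],
    "pizza recipes":      [0.0, 0.2, 0.9],
    "cluster autoscaler": [0.8, 0.3, 0.1],
}

def search(query_vec, k=2):
    """Return the k stored documents most similar to query_vec."""
    ranked = sorted(store, key=lambda doc: cosine(query_vec, store[doc]), reverse=True)
    return ranked[:k]

print(search([1.0, 0.2, 0.0]))  # ['kubernetes scaling', 'cluster autoscaler']
```

This is exactly the retrieval step of RAG: embed the query, rank stored embeddings by similarity, and hand the top-k documents to the LLM as context. Libraries like FAISS do the same ranking with indexes that scale to millions of vectors.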

Why LiteLLM?

LiteLLM gives you low-level control to build custom LLM solutions your own way, without a heavy framework like LangChain, so you understand how everything works and design your own architecture. A dedicated LangChain session will follow for those who want to go further.

What are the requirements?

Bring a laptop with Python installed (Windows, macOS, or Linux), along with Visual Studio Code or a similar IDE, with at least 10GB of free disk space and 8GB of RAM.

This space is needed for running local models during the workshop. If you don’t have a suitable laptop, please contact Stelios ([email protected]) before registering.

What is the format?

A 3-hour live session with:

  • Interactive theory blocks
  • Hands-on coding
  • Step-by-step exercises
  • Small group support
  • Three 10-minute breaks
  • Q&A and class quizzes

This is a highly practical, hands-on class focused on code and building working LLM systems.

What are the prerequisites?

A good understanding of programming with Python is required (basic to intermediate level). I assume you are already comfortable writing Python scripts.

What comes after?

Participants will receive an optional mini capstone project with one-to-one personalised feedback.

Is it just one session?

This is the first session in a new sequence on applied AI, covering agents, RAG systems, vector databases, and production-ready LLM workflows. Later sessions will dive deeper into topics such as embeddings with deep neural networks, LangChain, advanced retrieval, and multi-agent architectures.

You can decide afterwards whether you’d like to join future sessions.

How many participants?

To keep this interactive, only 15 spots are available. Please register as soon as possible.

Hands-On LLM Engineering with Python (Part 1)

Designing the Multimodal & Agentic Future of AI

Seniz Gayde Ayata - AIATUS AI

Explore the future of artificial intelligence as we move beyond single-modality systems into an era of sophisticated multimodal and agentic AI. This keynote will examine how the convergence of vision, language, and action is reshaping what's possible in AI, and what it means for developers building the next generation of intelligent systems.

Data Unification at Scale: Building a Single Source of Truth from Disparate Sources

Mustafa Barak - EPAM

In today's enterprise landscape, data lives everywhere—legacy systems, cloud platforms, databases, and data lakes. This session explores practical strategies and architectural patterns for unifying disparate data sources into a coherent, reliable single source of truth. Learn how to tackle data quality, governance, and integration challenges at scale while maintaining performance and consistency.
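
One common pattern behind a "single source of truth" is merging records by key with a per-source trust order, so conflicting fields resolve deterministically. The sketch below is a toy illustration of that idea; the source names, field names, and precedence order are all invented for the example, not from the talk.

```python
# Toy sketch: unify customer records from several systems into one
# "golden record" per key, using source precedence to resolve conflicts.
# (Illustrative pattern only; sources and fields are made up.)

PRECEDENCE = ["crm", "billing", "legacy"]  # most trusted first

def unify(records):
    golden = {}
    # Process most-trusted sources first; the first writer of a field wins.
    for rec in sorted(records, key=lambda r: PRECEDENCE.index(r["source"])):
        row = golden.setdefault(rec["id"], {})
        for fname, value in rec.items():
            if fname in ("id", "source") or value is None:
                continue
            row.setdefault(fname, value)
    return golden

records = [
    {"id": 1, "source": "legacy", "email": "old@example.com", "phone": "555-1234"},
    {"id": 1, "source": "crm",    "email": "new@example.com", "phone": None},
]
print(unify(records))  # {1: {'email': 'new@example.com', 'phone': '555-1234'}}
```

Note how the trusted CRM email wins while the legacy system still contributes the phone number it alone holds: conflict resolution and gap-filling are the two halves of unification.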

Context Engineering

Emre Okcular - OpenAI

As large language models become more powerful, the art and science of context engineering has emerged as a critical skill. Discover how to effectively design, structure, and optimize context for LLM applications, from prompt engineering fundamentals to advanced techniques for retrieval-augmented generation and context management in production systems.
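
A minimal sketch of one core context-engineering concern: fitting the most relevant retrieved chunks into a fixed context budget before calling an LLM. This is a toy, with word count standing in for token count and invented chunk texts; production systems use a real tokenizer and a real retriever.

```python
# Sketch: pack the highest-relevance chunks into a fixed context budget.
# Word count approximates token count for illustration.

def build_context(chunks, budget=20):
    """chunks: list of (relevance_score, text); higher scores are kept first."""
    context, used = [], 0
    for score, text in sorted(chunks, reverse=True):
        cost = len(text.split())
        if used + cost <= budget:
            context.append(text)
            used += cost
    return "\n".join(context)

chunks = [
    (0.9, "The invoice schema changed in the March release."),
    (0.4, "An off-topic aside that would waste the context budget."),
    (0.8, "Downstream reports read the invoice table nightly."),
]
print(build_context(chunks))
```

Greedy packing by relevance is the simplest policy; real context managers also deduplicate, reorder for recency, and reserve budget for system prompts and the model's answer.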

PyData Turkiye Conference @2025

When: Thursday 20th November 2025. Time: arrive at 5:45pm, with talks starting promptly at 6pm. Location: Ecosurety, 2nd Floor, 4 Colston Ave, Bristol BS1 4ST

Complimentary drinks & pizza provided by our sponsor Method Resourcing Solutions.

Session 1 - Metadata-Driven Orchestration with Microsoft Fabric - Matt Collins

Metadata-driven processing is something of a gold standard when it comes to analytics platforms. We’ll use this session to showcase an orchestration framework in Microsoft Fabric that relies on metadata to create a clean environment with parameterised pipelines, notebooks, and other objects, while considering the future-proofing a framework like this provides.
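
The core idea of metadata-driven orchestration can be sketched in a few lines: pipeline runs are generated from a metadata table rather than hard-coded, so onboarding a new source means adding a row, not a pipeline. This is a conceptual toy, not the Microsoft Fabric API; the table columns and the `run_pipeline` stand-in are invented for illustration.

```python
# Conceptual sketch of metadata-driven orchestration: the metadata table
# drives which parameterised loads run. (Not the Microsoft Fabric API.)

METADATA = [
    {"source": "sales",     "load_type": "incremental", "enabled": True},
    {"source": "inventory", "load_type": "full",        "enabled": True},
    {"source": "legacy_hr", "load_type": "full",        "enabled": False},
]

def run_pipeline(entry):
    # In Fabric this would invoke a parameterised pipeline or notebook
    # with the entry's values passed in as parameters.
    return f"loaded {entry['source']} ({entry['load_type']})"

results = [run_pipeline(e) for e in METADATA if e["enabled"]]
print(results)  # ['loaded sales (incremental)', 'loaded inventory (full)']
```

Disabling a source or switching it from full to incremental is then a metadata change, which is the future-proofing the session description refers to.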

About Matt

Matt is a Senior Analytics Consultant working on end-to-end Data Platform solutions with the Microsoft Azure tech stack. He works at Cloud Formations, currently focusing on Data Engineering in both product development and delivery.

Session 2 - Do you really need a Data Lakehouse? Separating Hype from Business Need - Craig Porteous

The Lakehouse promises to blend the best of data lakes and warehouses, but is it the right move for your organisation? Join this session to discuss practical decision criteria to help you determine whether a Lakehouse approach fits your goals or if your data platform is already doing everything you need.

About Craig

Craig is a lifelong learner with a passion for creative problem-solving. He shares his expertise and insights with the data community through talks, video content, and bringing people together with the DATA:Scotland event.

We all look forward to seeing you there!

Nov25 - Metadata-Driven Orchestration & Do you really need a Data Lakehouse?
Simon Floyd – GM Americas, Manufacturing & Mobility @ Microsoft, Heather Kerrick @ Autodesk, Dennis Goetting @ PTC, Alfonso Rodriguez @ Microsoft

Discover how AI, cloud, and robotics are transforming digital engineering across industries. Learn how Microsoft and its partners are enabling faster design, smarter simulation, and scalable automation—from PLM modernization to autonomous systems. See how Azure, GitHub, and agentic AI are powering the next wave of industrial innovation.

AI/ML Azure Cloud Computing GitHub Microsoft
Microsoft Ignite 2025
Preeti Somal – EVP of Engineering @ Temporal, Tobias Macey – host

Summary In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model—workflows, activities, task queues, and replay—and how it eliminates hand‑rolled retry, checkpoint, and error‑handling scaffolding while letting data remain where it lives. Preeti shares real-world patterns for replacing DAG-first orchestration, integrating application and data teams through signals and Nexus for cross-boundary calls, and using Temporal to coordinate long-running, human-in-the-loop, and agentic AI workflows with full observability and auditability. She also discusses heuristics for choosing Temporal alongside (or instead of) traditional orchestrators, managing scale without moving large datasets, and lessons from running durable execution as a cloud service.
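
To ground the "replay" idea discussed in the episode, here is a conceptual sketch of durable execution in plain Python. It is not the Temporal SDK: it just shows the core trick, recording activity results in an event history so that re-running a crashed workflow replays completed steps from history instead of executing them again.

```python
# Conceptual sketch of durable execution (not the Temporal SDK):
# activity results are recorded in an event history, so a re-run after
# a crash replays completed steps instead of executing them again.

history = {}   # durable event log: activity name -> recorded result
calls = []     # tracks which activities actually executed

def activity(name, fn):
    if name in history:      # replay: reuse the recorded result
        return history[name]
    result = fn()            # first execution: run and record
    calls.append(name)
    history[name] = result
    return result

def workflow(fail_after_extract=False):
    data = activity("extract", lambda: [3, 1, 2])
    if fail_after_extract:
        raise RuntimeError("worker crashed")
    return activity("load", lambda: sorted(data))

try:
    workflow(fail_after_extract=True)   # crashes after "extract" completes
except RuntimeError:
    pass
print(workflow())  # [1, 2, 3]
print(calls)       # ['extract', 'load'] -- "extract" ran only once
```

This is why durable execution removes hand-rolled retry and checkpoint scaffolding: the history, not the developer, remembers how far the workflow got. Temporal adds deterministic workflow code, task queues, and a persistent server on top of this basic mechanism.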

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Preeti Somal about how to incorporate durable execution and state management into AI application architectures.

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what durable execution is and how it impacts system architecture?
  • With the strong focus on state maintenance and high reliability, what are some of the most impactful ways that data teams are incorporating tools like Temporal into their work?
  • One of the core primitives in Temporal is a "workflow". How does that compare to similar primitives in common data orchestration systems such as Airflow, Dagster, Prefect, etc.?
  • What are the heuristics that you recommend when deciding which tool to use for a given task, particularly in data/pipeline oriented projects?
  • Even if a team is using a more data-focused orchestration engine, what are some of the ways that Temporal can be applied to handle the processing logic of the actual data?
  • AI applications are also very dependent on reliable data to be effective in production contexts. What are some of the design patterns where durable execution can be integrated into RAG/agent applications?
  • What are some of the conceptual hurdles that teams experience when they are starting to adopt Temporal or other durable execution frameworks?
  • What are the most interesting, innovative, or unexpected ways that you have seen Temporal/durable execution used for data/AI services?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Temporal?
  • When is Temporal/durable execution the wrong choice?
  • What do you have planned for the future of Temporal for data and AI systems?

Contact Info

  • LinkedIn

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

  • Temporal
  • Durable Execution
  • Flink
  • Machine Learning Epoch
  • Spark Streaming
  • Airflow
  • Directed Acyclic Graph (DAG)
  • Temporal Nexus
  • TensorZero
  • AI Engineering Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Airflow Cloud Computing Dagster Data Engineering Data Management Data Quality Datafold dbt ETL/ELT Prefect Python RAG SQL Data Streaming
Ariel Pohoryles – guest @ Rivery , Tobias Macey – host

Summary In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems, only 50% trust their organization's data overall. Ariel explains why truly productionizing AI demands broader, continuously refreshed data with stronger automation and governance, and highlights the challenges posed by unstructured data and vector stores. The conversation covers the need to shift from manual reviews to automated pipelines, the resurgence of metadata and master data management, and the importance of guardrails, traceability, and agent governance. Ariel also predicts a growing convergence between data teams and application integration teams and advises leaders to focus on high-value use cases, aggressive pipeline automation, and cataloging and governing the coming sprawl of AI agents, all while using AI to accelerate data engineering itself.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Ariel Pohoryles about data management investments that organizations are making to enable them to scale AI implementations.

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing the motivation and scope of your recent survey on data management investments for AI across your respondents?
  • What are the key takeaways that were most significant to you?
  • The survey reveals a fascinating paradox: 77% of leaders trust the data used by their AI systems, yet only half trust their organization's overall data quality. For our data engineering audience, what does this suggest about how companies are currently sourcing data for AI? Does it imply they are using narrow, manually-curated "golden datasets," and what are the technical challenges and risks of that approach as they try to scale?
  • The report highlights a heavy reliance on manual data quality processes, with one expert noting companies feel it's "not reliable to fully automate validation" for external or customer data. At the same time, maturity in "Automated tools for data integration and cleansing" is low, at only 42%. What specific technical hurdles or organizational inertia are preventing teams from adopting more automation in their data quality and integration pipelines?
  • There was a significant point made that with generative AI, "biases can scale much faster," making automated governance essential. From a data engineering perspective, how does the data management strategy need to evolve to support generative AI versus traditional ML models? What new types of data quality checks, lineage tracking, or monitoring for feedback loops are required when the model itself is generating new content based on its own outputs?
  • The report champions a "centralized data management platform" as the "connective tissue" for reliable AI. How do you see the scale and data maturity impacting the realities of that effort?
  • How do architectural patterns in the shape of cloud warehouses, lakehouses, data mesh, data products, etc. factor into that need for centralized/unified platforms?
  • A surprising finding was that a third of respondents have not fully grasped the risk of significant inaccuracies in their AI models if they fail to prioritize data management. In your experience, what are the biggest blind spots for data and analytics leaders?
  • Looking at the maturity charts, companies rate themselves highly on "Developing a data management strategy" (65%) but lag significantly in areas like "Automated tools for data integration and cleansing" (42%) and "Conducting bias-detection audits" (24%). If you were advising a data engineering team lead based on these findings, what would you tell them to prioritize in the next 6-12 months to bridge the gap between strategy and a truly scalable, trustworthy data foundation for AI?
  • The report states that 83% of companies expect to integrate more data sources for their AI in the next year. For a data engineer on the ground, what is the most important capability they need to build into their platform to handle this influx?
  • What are the most interesting, innovative, or unexpected ways that you have seen teams addressing the new and accelerated data needs for AI applications?
  • What are some of the noteworthy trends or predictions that you have for the near-term future of the impact that AI is having or will have on data teams and systems?

Contact Info

  • LinkedIn

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

  • Boomi
  • Data Management
  • Integration & Automation Demo
  • Agentstudio
  • Data Connector Agent Webinar
  • Survey Results
  • Data Governance
  • Shadow IT
  • Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Analytics Cloud Computing Data Engineering Data Management Data Quality Datafold dbt ETL/ELT GenAI Marketing Master Data Management Prefect Python SQL Data Streaming
Ido Bronstein – CEO @ Upriver , Omri Lifshitz – CTO @ Upriver , Tobias Macey – host

Summary In this episode of the Data Engineering Podcast, Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand sides of data, highlighting that the bottleneck lies in the "middle layer" of curation, semantics, and serving. Omri and Ido outline a three-part framework for making data usable by LLMs and agents (collect, curate, serve) and share the challenges of scaling from POCs to production, including compounding error rates and reliability concerns. They also explore organizational shifts, patterns for managing context windows, pragmatic views on schema choices, and Upriver's approach to building autonomous data workflows that use determinism and LLMs at the right boundaries. The conversation concludes with a look ahead to AI-first data platforms where engineers supervise business semantics while automation stitches the technical details together end-to-end.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Omri Lifshitz and Ido Bronstein about the challenges of keeping up with the demand for data when supporting AI systems.

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • We're here to talk about "The Growing Gap Between Data & AI". From your perspective, what is this gap, and why do you think it's widening so rapidly right now?
  • How does this gap relate to the founding story of Upriver? What problems were you and your co-founders experiencing that led you to build this?
  • The core premise of new AI tools, from RAG pipelines to LLM agents, is that they are only as good as the data they're given. How does this "garbage in, garbage out" problem change when the "in" is not a static file but a complex, high-velocity, and constantly changing data pipeline?
  • Upriver is described as an "intelligent agent system" and an "autonomous data engineer." This is a fascinating "AI to solve for AI" approach. Can you describe this agent-based architecture and how it specifically works to bridge that data-AI gap?
  • Your website mentions a "Data Context Layer" that turns "tribal knowledge" into a "machine-usable mode." This sounds critical for AI. How do you capture that context, and how does it make data "AI-ready" in a way that a traditional data catalog or quality tool doesn't?
  • What are the most innovative or unexpected ways you've seen companies trying to make their data "AI-ready"? And where are the biggest points of failure you observe?
  • What has been the most challenging or unexpected lesson you've learned while building an AI system (Upriver) that is designed to fix the data foundation for other AI systems?
  • When is an autonomous, agent-based approach not the right solution for a team's data quality problems? What organizational or technical maturity is required to even start closing this data-AI gap?
  • What do you have planned for the future of Upriver? And looking more broadly, how do you see this gap between data and AI evolving over the next few years?

Contact Info
  • Ido - LinkedIn
  • Omri - LinkedIn

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
  • Upriver
  • RAG == Retrieval Augmented Generation
  • AI Engineering Podcast Episode
  • AI Agent
  • Context Window
  • Model Finetuning

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Cloud Computing Data Engineering Data Management Data Quality Datafold dbt ETL/ELT LLM Prefect Python RAG SQL Data Streaming

Azure Data Factory (ADF) is Microsoft’s central orchestration and integration service for reliable and scalable data movement. Combined with Azure Databricks as a powerful platform for data processing and transformation, it enables a flexible and future-proof data engineering architecture. In this session, participants will gain a practical, real-world overview of how to implement complex end-to-end data loading processes with ADF and Databricks - whether on-premises, in the cloud, or in hybrid scenarios. We will cover the full lifecycle of a Data Factory and its integration with Databricks:

· Architecture & best practices for high-performance, scalable, and maintainable pipelines
· Source and target connectivity (SQL Server, Azure Storage, APIs, Data Lake, and more)
· Databricks integration within ADF pipelines as the central transformation tool
· Transformations with Databricks notebooks – from simple cleansing to complex business logic
· Parameterization & reusability of pipelines, datasets, and notebook executions
· DevOps integration: Building a complete CI/CD pipeline in Azure DevOps or GitHub Actions
· Infrastructure as Code with Bicep and Terraform for repeatable, parameterized deployments
· Monitoring & error handling – from alerting to automated remediation

The session guides participants through a real-world end-to-end solution based on an actual customer project. The focus is on practical architecture decisions, proven best practices, and common pitfalls - rather than abstract concepts. Participants will leave with a clear, actionable roadmap that enables them to successfully implement similar data integration scenarios with Azure Data Factory, Azure Databricks, and DevOps-based deployment.
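The parameterization pattern mentioned in the session outline is easiest to see in a concrete artifact. The following sketch (pipeline name, notebook path, and linked-service name are all hypothetical, not taken from the session) builds an ADF pipeline definition as plain JSON that forwards a pipeline parameter into a Databricks notebook activity:

```python
import json

def databricks_notebook_pipeline(pipeline_name: str, notebook_path: str,
                                 environment: str) -> dict:
    """Render an ADF pipeline definition (as a plain dict) that runs a
    Databricks notebook, passing the target environment as a parameter
    so one definition can serve dev/test/prod deployments."""
    return {
        "name": pipeline_name,
        "properties": {
            # Pipeline-level parameter, overridable at trigger time.
            "parameters": {
                "environment": {"type": "string", "defaultValue": environment}
            },
            "activities": [{
                "name": "TransformWithDatabricks",
                "type": "DatabricksNotebook",
                "typeProperties": {
                    "notebookPath": notebook_path,
                    # Forward the pipeline parameter into the notebook.
                    "baseParameters": {"environment": {
                        "value": "@pipeline().parameters.environment",
                        "type": "Expression",
                    }},
                },
                # Name of a Databricks linked service defined elsewhere (assumed).
                "linkedServiceName": {
                    "referenceName": "DatabricksLinkedService",
                    "type": "LinkedServiceReference",
                },
            }],
        },
    }

pipeline = databricks_notebook_pipeline("pl_daily_load", "/Shared/cleanse_orders", "dev")
definition_json = json.dumps(pipeline, indent=2)  # deployable via ARM/Bicep or the ADF API
```

In a real project this JSON would typically live in source control and be deployed through the CI/CD pipeline described above, rather than authored by hand.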

RG Treffen Datamonsters Münsterland 2025/10 - ADF & Databricks
Manasi Vartak – Chief AI Architect and VP of Product Management (AI Platform) @ Cloudera , Richie – host @ DataCamp

The promise of AI in enterprise settings is enormous, but so are the privacy and security challenges. How do you harness AI's capabilities while keeping sensitive data protected within your organization's boundaries? Private AI—using your own models, data, and infrastructure—offers a solution, but implementation isn't straightforward. What governance frameworks need to be in place? How do you evaluate non-deterministic AI systems? When should you build in-house versus leveraging cloud services? As data and software teams evolve in this new landscape, understanding the technical requirements and workflow changes is essential for organizations looking to maintain control over their AI destiny.

Manasi Vartak is Chief AI Architect and VP of Product Management (AI Platform) at Cloudera. She is a product and AI leader with more than a decade of experience at the intersection of AI infrastructure, enterprise software, and go-to-market strategy. At Cloudera, she leads product and engineering teams building low-code and high-code generative AI platforms, driving the company’s enterprise AI strategy and enabling trusted AI adoption across global organizations. Before joining Cloudera through its acquisition of Verta, Manasi was the founder and CEO of Verta, where she transformed her MIT research into enterprise-ready ML infrastructure. She scaled the company to multi-million ARR, serving Fortune 500 clients in finance, insurance, and capital markets, and led the launch of enterprise MLOps and GenAI products used in mission-critical workloads. Manasi earned her PhD in Computer Science from MIT, where she pioneered model management systems such as ModelDB — foundational work that influenced the development of tools like MLflow. Earlier in her career, she held research and engineering roles at Twitter, Facebook, Google, and Microsoft.
In the episode, Richie and Manasi explore AI's role in financial services, the challenges of AI adoption in enterprises, the importance of data governance, the evolving skills needed for AI development, the future of AI agents, and much more.

Links Mentioned in the Show:
  • Cloudera
  • Cloudera Evolve Conference
  • Cloudera Agent Studio
  • Connect with Manasi
  • Course: Introduction to AI Agents
  • Related Episode: RAG 2.0 and The New Era of RAG Agents with Douwe Kiela, CEO at Contextual AI & Adjunct Professor at Stanford University
  • Rewatch RADAR AI

New to DataCamp? Learn on the go using the DataCamp mobile app. Empower your business with world-class data and AI skills with DataCamp for business.

AI/ML Cloud Computing Computer Science Data Governance GenAI Microsoft MLOps RAG Cyber Security
DataFramed

AI-Driven Software Testing explores how Artificial Intelligence (AI) and Machine Learning (ML) are revolutionizing quality engineering (QE), making testing more intelligent, efficient, and adaptive. The book begins by examining the critical role of QE in modern software development and the paradigm shift introduced by AI/ML. It traces the evolution of software testing, from manual approaches to AI-powered automation, highlighting key innovations that enhance accuracy, speed, and scalability. Readers will gain a deep understanding of quality engineering in the age of AI, comparing traditional and AI-driven testing methodologies to uncover their advantages and challenges.

Moving into practical applications, the book delves into AI-enhanced test planning, execution, and defect management. It explores AI-driven test case development, intelligent test environments, and real-time monitoring techniques that streamline the testing lifecycle. Additionally, it covers AI’s impact on continuous integration and delivery (CI/CD), predictive analytics for failure prevention, and strategies for scaling AI-driven testing across cloud platforms. Finally, it looks ahead to the future of AI in software testing, discussing emerging trends, ethical considerations, and the evolving role of QE professionals in an AI-first world. With real-world case studies and actionable insights, AI-Driven Software Testing is an essential guide for QE engineers, developers, and tech leaders looking to harness AI for smarter, faster, and more reliable software testing.

What you will learn:
  • The key principles of AI/ML-driven quality engineering
  • Intelligent test case generation and adaptive test automation
  • Predictive analytics for defect prevention and risk assessment
  • Integration of AI/ML tools in CI/CD pipelines

Who this book is for:
  • Quality Engineers looking to enhance software testing with AI-driven techniques
  • Data Scientists exploring AI applications in software quality assurance and engineering
  • Software Developers seeking to integrate AI/ML into testing and automation workflows

AI/ML Analytics CI/CD Cloud Computing
O'Reilly AI & ML Books

Agentic AI isn't just a buzzword; it's a fundamental shift in software. These intelligent, goal-driven applications operate autonomously, presenting unprecedented challenges for infrastructure teams. The old ways of building and running apps simply won't scale. Join us for this technical deep dive where we'll explore how to harness the power of Kubernetes and modern platform engineering principles to create a scalable, resilient, and observable foundation for your next-generation intelligent applications.

In this webinar, you'll learn:

  • The Shift: Understand why agentic apps require a new approach to infrastructure.
  • Kubernetes as the Orchestration Layer: See how core Kubernetes primitives can be used to manage the complete, often complex, lifecycle of AI agents.
  • The Power of a Platform Mindset: Discover how building an internal platform streamlines workflows, ensures security, and empowers developers to build and deploy intelligent software faster than ever.
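To make the "Kubernetes primitives" point above concrete, here is a hedged sketch (the agent name, image, and task flag are illustrative, not from the webinar) of how a one-shot agent task could be mapped onto a Kubernetes Job manifest, built as a plain Python dict:

```python
def agent_job_manifest(agent_name: str, image: str, task: str) -> dict:
    """Map a one-shot agent task onto a Kubernetes Job: the Job primitive
    provides retries, TTL-based cleanup, and resource limits for free."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"agent-{agent_name}"},
        "spec": {
            "backoffLimit": 2,               # retry a flaky agent run twice
            "ttlSecondsAfterFinished": 600,  # garbage-collect finished runs
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "agent",
                        "image": image,
                        "args": ["--task", task],
                        # Cap a runaway agent's footprint on the cluster.
                        "resources": {"limits": {"cpu": "1", "memory": "2Gi"}},
                    }],
                }
            },
        },
    }

manifest = agent_job_manifest(
    "summarizer", "example.com/agents/summarizer:1.0", "summarize-docs"
)
```

In practice this dict would be serialized to YAML and applied with kubectl (or created through the Kubernetes API); a long-running, conversational agent would map onto a Deployment with a Service instead of a Job.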

By the end of this session, you'll have a clear, actionable roadmap for transforming your infrastructure into a powerful "Agent Fabric" capable of supporting the future of AI.

After a 30-minute talk there’ll be a 15-minute Q&A, for which we encourage you to submit questions in advance. A webinar recording and related materials will be shared with all attendees after the event.


Speaker: Abdel Sghiouar - Cloud Developer Advocate @ Google Cloud

Abdel is a senior Cloud Developer Advocate @ Google Cloud. A co-host of the Kubernetes Podcast by Google and a CNCF Ambassador. His focused areas are GKE/Kubernetes, Service Mesh, and Serverless. Abdel started his career in data centers and infrastructure in Morocco, where he is originally from, before moving to Google's largest EU data center in Belgium. Then in Sweden, he joined Google Cloud Professional Services and spent five years working with Google Cloud customers on architecting and designing large-scale distributed systems before turning to advocacy and community work.

Building and running agentic AI platforms on Kubernetes

Brijesh Tripathi – CEO @ Flex AI , Tobias Macey – host

Summary In this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting how access friction and idle infrastructure slow progress. Join them as they discuss Flex AI's innovative approach to simplifying heterogeneous compute, standardizing on consistent Kubernetes layers, and abstracting inference across various accelerators, allowing teams to iterate faster without wrestling with drivers, libraries, or cloud-by-cloud differences. Brijesh also shares insights into Flex AI's strategies for lifting utilization, protecting real-time workloads, and spanning the full lifecycle from fine-tuning to autoscaled inference, all while keeping complexity at bay.

Pre-amble I hope you enjoy this cross-over episode of the AI Engineering Podcast, another show that I run to act as your guide to the fast-moving world of building scalable and maintainable AI systems. As generative AI models have grown more powerful and are being applied to a broader range of use cases, the lines between data and AI engineering are becoming increasingly blurry. The responsibilities of data teams are being extended into the realm of context engineering, as well as designing and supporting new infrastructure elements that serve the needs of agentic applications. This episode is an example of the types of work that are not easily categorized into one or the other camp.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Brijesh Tripathi about FlexAI, a platform offering a service-oriented abstraction for AI workloads.

Interview
  • Introduction
  • How did you get involved in machine learning?
  • Can you describe what FlexAI is and the story behind it?
  • What are some examples of the ways that infrastructure challenges contribute to friction in developing and operating AI applications?
  • How do those challenges contribute to issues when scaling new applications/businesses that are founded on AI?
  • There are numerous managed services and deployable operational elements for operationalizing AI systems. What are some of the main pitfalls that teams need to be aware of when determining how much of that infrastructure to own themselves?
  • Orchestration is a key element of managing the data and model lifecycles of these applications. How does your approach of "workload as a service" help to mitigate some of the complexities in the overall maintenance of that workload?
  • Can you describe the design and architecture of the FlexAI platform?
  • How has the implementation evolved from when you first started working on it?
  • For someone who is going to build on top of FlexAI, what are the primary interfaces and concepts that they need to be aware of?
  • Can you describe the workflow of going from problem to deployment for an AI workload using FlexAI?
  • One of the perennial challenges of making a well-integrated platform is that there are inevitably pre-existing workloads that don't map cleanly onto the assumptions of the vendor. What are the affordances and escape hatches that you have built in to allow partial/incremental adoption of your service?
  • What are the elements of AI workloads and applications that you are explicitly not trying to solve for?
  • What are the most interesting, innovative, or unexpected ways that you have seen FlexAI used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on FlexAI?
  • When is FlexAI the wrong choice?
  • What do you have planned for the future of FlexAI?

Contact Info
  • LinkedIn

Parting Question
  • From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?

Links
  • Flex AI
  • Aurora Super Computer
  • CoreWeave
  • Kubernetes
  • CUDA
  • ROCm
  • Tensor Processing Unit (TPU)
  • PyTorch
  • Triton
  • Trainium
  • ASIC == Application Specific Integrated Circuit
  • SOC == System On a Chip
  • Loveable
  • FlexAI Blueprints
  • Tenstorrent

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Aurora Cloud Computing Data Engineering Datafold DevOps ETL/ELT GenAI Kubernetes Prefect Data Streaming
Data Engineering Podcast

Comprehensive guide offering actionable strategies for enhancing human-centered AI, efficiency, and productivity in industrial and systems engineering through the power of AI.

Advances in Artificial Intelligence Applications in Industrial and Systems Engineering is the first book in the Advances in Industrial and Systems Engineering series, offering insights into AI techniques, challenges, and applications across various industrial and systems engineering (ISE) domains. Not only does the book chart current AI trends and tools for effective integration, but it also raises pivotal ethical concerns and explores the latest methodologies, tools, and real-world examples relevant to today’s dynamic ISE landscape. Readers will gain a practical toolkit for the effective integration and utilization of AI in system design and operation. The book also presents the current state of AI across big data analytics, machine learning, artificial intelligence tools, cloud-based AI applications, neural-based technologies, modeling and simulation in the metaverse, intelligent systems engineering, and more, and discusses future trends.

Written by renowned international contributors for an international audience, Advances in Artificial Intelligence Applications in Industrial and Systems Engineering includes information on:
  • Reinforcement learning, computer vision and perception, and safety considerations for autonomous systems (AS)
  • Natural language processing (NLP) topics including language understanding and generation, sentiment analysis and text classification, and machine translation
  • AI in healthcare, covering medical imaging and diagnostics, drug discovery and personalized medicine, and patient monitoring and predictive analysis
  • Cybersecurity, covering threat detection and intrusion prevention, fraud detection and risk management, and network security
  • Social good applications including poverty alleviation and education, environmental sustainability, and disaster response and humanitarian aid

Advances in Artificial Intelligence Applications in Industrial and Systems Engineering is a timely, essential reference for engineering, computer science, and business professionals worldwide.

AI/ML Analytics Big Data Cloud Computing Computer Science Data Analytics NLP Cyber Security
O'Reilly AI & ML Books

When: Thursday 18th September 2025. Time: arrive from 5:45pm, with talks starting promptly at 6pm. Location: Ecosurety, 2nd Floor, 4 Colston Ave, Bristol, BS1 4ST

Complimentary drinks & pizza provided by our sponsor Method Resourcing Solutions.


Session 1 - Fabricating Your Move: A Tailored Path from Power BI Premium to Fabric - David Mitchell As Microsoft Power BI evolves into Fabric, organizations gain powerful new capabilities—but also face the challenge of transitioning from Premium capacities. In this session, discover a streamlined, automated approach to migrating from Power BI Premium to Fabric capacities using tools developed within Semantic Link Labs. We’ll cover:

  • Why Fabric capacities matter and the benefits they unlock
  • How to plan and execute your migration with minimal disruption
  • A live demonstration of the migration process in action

Learn how to future-proof your analytics environment with confidence, efficiency, and continuity for your users.

About David David Mitchell is a Cloud Solutions Architect at Microsoft, and is a key contributor to the migration solution. He brings over 10 years’ experience working in Information Analysis, Data Warehousing and Data Engineering.

Session 2 - Getting Started with Terraform and CI/CD for your Fabric Projects - Anna-Maria Wykes Microsoft Fabric is transforming how organizations build unified data platforms for analytics, data science, and business intelligence. Until recently, deploying and managing Fabric resources required manual effort or ad hoc automation. That changed with the release of the Terraform provider for Microsoft Fabric last year, enabling teams to manage Fabric infrastructure as code.

In this session, you'll learn how to get started using Terraform to provision and manage Microsoft Fabric components — including workspaces, pipelines, dataflows, and more — in a repeatable and scalable way. We'll cover core Terraform concepts, walk through practical examples, and share best practices for integrating with Azure and CI/CD workflows. By the end of the session, you'll be equipped to bring automation, consistency, and governance to your Microsoft Fabric environments using Terraform.

About Anna Anna is a veteran software and data engineer as well as a Microsoft AI MVP, boasting over 18 years of experience. She has undertaken various projects, including real-time analytics with Scala and Kafka, constructing Data Lakes with Spark, and applying engineering to Data Science. Anna currently serves as a RSA (Resident Solution Architect) and consultant. With a genuine passion for data, she endeavours to bridge the gap between Software Development and Data Science. Anna's other areas of interest include DevOps (DataOps, MLOps, LLMOps), Agile methodologies, and organizing or participating in local Code Clubs.

We all look forward to seeing you there!!

September 2025 - Fabricating Your Move & Terraform and CI/CD for Fabric Projects