SQL

SQL Server 2025: The AI-ready enterprise database

2025-11-18 · Microsoft Ignite 2025 Watch

breakout

by Bob Ward (Azure Data) , Sirjad Parakkat (Ivanti) , Abhinav Tiwari (Ivanti)

AI/ML Analytics API JSON Fabric

SQL Server 2025 redefines what's possible for the enterprise data platform. With developer-first features and seamless integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already own. Build modern apps with native JSON and REST APIs and harness AI with built-in vector search. Increase application availability with optimized locking and use Fabric mirroring for near real-time analytics. Join us to see why this is the most advanced SQL Server.

Use Azure Migrate for AI assisted insights and cloud transformation

2025-11-18 · Microsoft Ignite 2025 Watch

breakout

by Vishal Jain (Microsoft) , Anant Raigaga (Itron Inc) , Shashank Bansal (Microsoft)

AI/ML Azure Cloud Computing Java Linux postgresql

Discover how you can make the most of your IT estate migrations and modernizations with the newest AI capabilities. This session guides IT teams through assessing current environments, setting goals, and creating a business case with Azure Migrate for all of your workload types like Windows Server, SQL Server, .NET, Linux, PostgreSQL, Java, and more. We’ll explore tools to inventory workloads, map dependencies, and create actionable migration roadmaps.

Build real‑time analytics with Cosmos DB in Microsoft Fabric

2025-11-18 · Microsoft Ignite 2025

talk

by Mark Brown (Microsoft) , Jasmine Greenaway (Microsoft)

Analytics Cosmos ETL/ELT Microsoft Fabric Spark

In this lab you'll help a coffee shop unify their operational and analytical workloads with Cosmos DB in Microsoft Fabric. You'll blend operational data with curated sources using cross-database SQL, stream and visualize real-time POS events, and create a gold layer for personalization. Finally, you'll implement reverse ETL to Cosmos for lightning-fast serving and train a lightweight Spark notebook model to deliver the right offer at the right time before your customer’s order is ready.

Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.

Innovation Session: Microsoft Fabric and Azure Databases - the data estate for AI

2025-11-18 · Microsoft Ignite 2025 Watch

breakout

by Arun Ulag (Microsoft) , Michel Porter (EY)

AI/ML Analytics Azure BI Microsoft Fabric Power BI postgresql

In today’s AI-driven economy, data isn’t just an asset, it’s your differentiator. Join us for major announcements across Microsoft Fabric, Power BI, Azure SQL, and Azure PostgreSQL, and see how these innovations come together to deliver a unified, intelligent data foundation for AI. Experience customer success stories, live demos, and an inside look at the roadmap shaping the future of analytics, transactional workloads, and real-time insights all in one integrated experience.

Elevate SQL development with VSCode, GitHub Copilot and new drivers

2025-11-18 · Microsoft Ignite 2025

theater

by Anna Hoffman (Azure Data) , David Levy (Microsoft)

AI/ML Azure GitHub Fabric Python Cyber Security

This demo-heavy session highlights the enhanced MSSQL extension for Visual Studio Code, now more robust than ever with new AI-driven enhancements to streamline your SQL development experience. With GitHub Copilot, you can move faster from schema to code, generate sample data, explore relationships, and help your app and backend stay in sync. With our latest mssql-python driver, you can develop with ease, security, and performance, across SQL Server, Azure SQL and SQL database in Fabric.

Power intelligent data management from on-premises to the Cloud

2025-11-18 · Microsoft Ignite 2025

theater

by David Stamen (Pure Storage) , Anthony Nocentino (Pure Storage)

Azure Cloud Computing Data Management Microsoft

Discover how Pure partners with Microsoft to deliver a transformative, future-ready data management approach that empowers customers to innovate and efficiently scale from on-premises to cloud. This session will highlight Pure's latest capabilities and product offerings, specifically focusing on advancements in our SQL Server 2025 integration for FlashArray and Pure Storage Cloud Azure Native.

Build new AI Applications with Azure SQL Databases

2025-11-18 · Microsoft Ignite 2025

talk

by Brian Spendolini (Microsoft) , Aaron S Saidi (Microsoft)

AI/ML Azure GenAI Microsoft RAG

Join us for a hands-on workshop showcasing the latest Azure SQL innovations to supercharge your applications. Learn how to harness generative AI alongside Azure SQL Database to elevate your data strategies with AI concepts like language models, prompt engineering, Retrieval Augmented Generation (RAG) and streamline with Microsoft Copilot. Whether you're a developer, architect, or IT professional, this workshop is your ticket to mastering SQL and AI to stay ahead in the data-driven landscape.

Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.

Community Connection Pod: SQL Server Management Studio: Help Drive What's Next

2025-11-18 · Microsoft Ignite 2025

talk

by Makena Watanabe Barickman (Microsoft)

It's been a big year for SSMS with two releases and a lot of new features, and we want to hear what you think. What have you liked? What's been a challenge? What's still missing? Bring your questions and feedback for some open and constructive conversations about SSMS and what's next.

Connection Pods accommodate up to 15 people. Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.

Migrate and Modernize Windows and SQL Server Workloads to Azure

2025-11-18 · Microsoft Ignite 2025

talk

by Raunak Pandya (Microsoft) , Vikram Bansal (Microsoft)

Azure Cloud Computing Microsoft

Learn more about migration and modernization of Windows Server, SQL Server, and .NET apps. Learn how to assess and migrate Windows Server and SQL Server workloads using Azure Migrate and Azure Database Migration Service (DMS). This lab covers discovery, dependency mapping, migration execution, and post-migration optimization as well as leveraging Microsoft Defender for cloud for securing workloads post migration.

Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.

Modern data, modern apps: Innovation with Microsoft Databases

2025-11-18 · Microsoft Ignite 2025 Watch

breakout

by Kirill Gavrylyuk (Microsoft) , Shivakumar Vaithyanathan (Adobe) , Shireesh Thota (Microsoft) , Priya Sathy (Azure Data) , Charles Feddersen (Microsoft)

AI/ML Microsoft NoSQL

Whether you’re modernizing for peak performance and AI readiness or building the next generation of intelligent apps and agents, the Microsoft database portfolio fuels your vision. Join CVP Shireesh Thota for an inside look at the latest innovations across SQL, NoSQL, and open-source databases—featuring dynamic demos that reveal how to boost productivity, deliver rich, personalized experiences, and unlock new possibilities from your data.

State, Scale, and Signals: Rethinking Orchestration with Durable Execution

2025-11-16 · Data Engineering Podcast Listen

podcast_episode

by Preeti Somal (Temporal) , Tobias Macey

AI/ML Airflow Cloud Computing Dagster Data Engineering Data Management Data Quality Datafold dbt ETL/ELT Prefect Python +2 more

Summary In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model—workflows, activities, task queues, and replay—and how it eliminates hand‑rolled retry, checkpoint, and error‑handling scaffolding while letting data remain where it lives. Preeti shares real-world patterns for replacing DAG-first orchestration, integrating application and data teams through signals and Nexus for cross-boundary calls, and using Temporal to coordinate long-running, human-in-the-loop, and agentic AI workflows with full observability and auditability. Shee also discusses heuristics for choosing Temporal alongside (or instead of) traditional orchestrators, managing scale without moving large datasets, and lessons from running durable execution as a cloud service.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Preeti Somal about how to incorporate durable execution and state management into AI application architectures Interview IntroductionHow did you get involved in the area of data management?Can you describe what durable execution is and how it impacts system architecture?With the strong focus on state maintenance and high reliability, what are some of the most impactful ways that data teams are incorporating tools like Temporal into their work?One of the core primitives in Temporal is a "workflow". How does that compare to similar primitives in common data orchestration systems such as Airflow, Dagster, Prefect, etc.? What are the heuristics that you recommend when deciding which tool to use for a given task, particularly in data/pipeline oriented projects? Even if a team is using a more data-focused orchestration engine, what are some of the ways that Temporal can be applied to handle the processing logic of the actual data?AI applications are also very dependent on reliable data to be effective in production contexts. What are some of the design patterns where durable execution can be integrated into RAG/agent applications?What are some of the conceptual hurdles that teams experience when they are starting to adopt Temporal or other durable execution frameworks?What are the most interesting, innovative, or unexpected ways that you have seen Temporal/durable execution used for data/AI services?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Temporal?When is Temporal/durable execution the wrong choice?What do you have planned for the future of Temporal for data and AI systems? Contact Info LinkedIn Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. Links TemporalDurable ExecutionFlinkMachine Learning EpochSpark StreamingAirflowDirected Acyclic Graph (DAG)Temporal NexusTensorZeroAI Engineering Podcast Episode The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Extending Microsoft Copilot Studio with Connectors, Plugins, and Custom Logic

2025-11-12 · M365UK Extend Copilot Studio, Recall in Windows11 & Power Platform docs Nov 2025

talk

by Shadrack Kiprotich

ai ethics api calls copilot studio custom connectors dataverse deployment document summarization external apis governance knowledge mining plugins power platform connectors +3 more

Microsoft Copilot Agents are quickly becoming a core part of modern enterprise applications, blending AI with workflow automation to accelerate digital transformation. With Microsoft Copilot Studio, developers and solution architects can design, extend, and integrate custom AI-powered assistants that operate securely within the Microsoft ecosystem. This session takes a deep dive into the technical capabilities of Copilot Studio and demonstrates how to build Copilot agents that go far beyond simple Q&A. We’ll cover end-to-end development patterns authoring conversational logic, integrating with Power Platform connectors, calling APIs and plugins, and leveraging Dataverse for secure data access. Attendees will also learn how to apply responsible AI principles, manage lifecycle deployment, and optimize performance in real-world scenarios. Technical Takeaways: By the end of this session, attendees will be able to: 1. Author and Customize Copilot Agents – Build a Copilot agent from scratch in Copilot Studio, design conversation flows, and implement prompt engineering patterns. 2. Integrate with Power Platform – Automate approvals, orchestrate workflows, and trigger Power Automate flows directly from Copilot interactions. 3. Connect to Data Sources – Use Dataverse, SharePoint, SQL, and external APIs to fetch, update, and process business-critical data securely. 4. Extend Functionality – Implement custom connectors, plugins, and API calls to extend Copilot beyond Microsoft 365 and tailor it for industry-specific use cases. 5. Enhance Productivity with AI – Embed capabilities like document summarization, knowledge mining, translation, and report generation into enterprise workflows. 6. Manage Governance and Deployment – Apply AI ethics, responsible usage, security, and monitoring practices to ensure compliance and scalable adoption. This session is designed for developers, solution architects, and IT professionals who want to move past demos and actually build enterprise-grade Copilot agents. Through real-world use cases and technical walkthroughs, attendees will leave with a blueprint for integrating Copilot Studio into modern business solutions.

The AI Data Paradox: High Trust in Models, Low Trust in Data

2025-11-09 · Data Engineering Podcast Listen

podcast_episode

by Ariel Pohoryles (Rivery) , Tobias Macey

AI/ML Analytics Cloud Computing Data Engineering Data Management Data Quality Datafold dbt ETL/ELT GenAI Marketing Master Data Management +3 more

Summary In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems, only 50% trust their organization's data overall. Ariel explains why truly productionizing AI demands broader, continuously refreshed data with stronger automation and governance, and highlights the challenges posed by unstructured data and vector stores. The conversation covers the need to shift from manual reviews to automated pipelines, the resurgence of metadata and master data management, and the importance of guardrails, traceability, and agent governance. Ariel also predicts a growing convergence between data teams and application integration teams and advises leaders to focus on high-value use cases, aggressive pipeline automation, and cataloging and governing the coming sprawl of AI agents, all while using AI to accelerate data engineering itself.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Ariel Pohoryles about data management investments that organizations are making to enable them to scale AI implementationsInterview IntroductionHow did you get involved in the area of data management?Can you start by describing the motivation and scope of your recent survey on data management investments for AI across your respondents?What are the key takeaways that were most significant to you?The survey reveals a fascinating paradox: 77% of leaders trust the data used by their AI systems, yet only half trust their organization's overall data quality. For our data engineering audience, what does this suggest about how companies are currently sourcing data for AI? Does it imply they are using narrow, manually-curated "golden datasets," and what are the technical challenges and risks of that approach as they try to scale?The report highlights a heavy reliance on manual data quality processes, with one expert noting companies feel it's "not reliable to fully automate validation" for external or customer data. At the same time, maturity in "Automated tools for data integration and cleansing" is low, at only 42%. What specific technical hurdles or organizational inertia are preventing teams from adopting more automation in their data quality and integration pipelines?There was a significant point made that with generative AI, "biases can scale much faster," making automated governance essential. From a data engineering perspective, how does the data management strategy need to evolve to support generative AI versus traditional ML models? What new types of data quality checks, lineage tracking, or monitoring for feedback loops are required when the model itself is generating new content based on its own outputs?The report champions a "centralized data management platform" as the "connective tissue" for reliable AI. How do you see the scale and data maturity impacting the realities of that effort?How do architectural patterns in the shape of cloud warehouses, lakehouses, data mesh, data products, etc. factor into that need for centralized/unified platforms?A surprising finding was that a third of respondents have not fully grasped the risk of significant inaccuracies in their AI models if they fail to prioritize data management. In your experience, what are the biggest blind spots for data and analytics leaders?Looking at the maturity charts, companies rate themselves highly on "Developing a data management strategy" (65%) but lag significantly in areas like "Automated tools for data integration and cleansing" (42%) and "Conducting bias-detection audits" (24%). If you were advising a data engineering team lead based on these findings, what would you tell them to prioritize in the next 6-12 months to bridge the gap between strategy and a truly scalable, trustworthy data foundation for AI?The report states that 83% of companies expect to integrate more data sources for their AI in the next year. For a data engineer on the ground, what is the most important capability they need to build into their platform to handle this influx?What are the most interesting, innovative, or unexpected ways that you have seen teams addressing the new and accelerated data needs for AI applications?What are some of the noteworthy trends or predictions that you have for the near-term future of the impact that AI is having or will have on data teams and systems?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links BoomiData ManagementIntegration & Automation DemoAgentstudioData Connector Agent WebinarSurvey ResultsData GovernanceShadow ITPodcast EpisodeThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Using dbt to manage your data warehouse

2025-11-05 · Belgium dbt Meetup #12 (in-person)

talk

by Kobe Thuwis (Infofarm)

Airflow dbt Git Trino outsystems

In this twofold session, I'll cover how we've used dbt to bring order in heaps of SQL statements used to manage a datawarehouse. I'd like to share how dbt made our team more efficient and our data warehouse more resilient. Secondly, I'll highlight why dbt enabled a way forward on supporting low-code applications: by leveraging our data warehouse as a backend. I'll dive into systemic design, application architecture & data modelling. Tools/tech covered will be SQL, Trino, Outsystems, GIT, Airflow and of course dbt! Expect practical insights, architectural patterns, and lessons learned from a real-world implementation.

Building Distributed DuckDB Processing for Lakes

2025-11-05 · Small Data SF 2025

talk

by George Fraser (Fivetran)

Data Lake Delta DuckDB Fivetran Iceberg

DuckDB is the best way to execute SQL on a single node. But with its embedding-friendly nature, it makes an excellent foundation for building distributed systems. George Fraser, CEO of Fivetran, will tell us how Fivetran used DuckDB to power its Iceberg data lake writer—coordinating thousands of small, parallel tasks across a fleet of workers, each running DuckDB queries on bounded datasets. The result is a high-throughput, dual-format (Iceberg + Delta) data lake architecture where every write scales linearly, snapshots stay perfectly in sync, and performance rivals a commercial database while remaining open and portable.

From Zero to "Query": Building Your First Serverless Lakehouse with DuckLake

2025-11-04 · Small Data SF 2025

workshop

by Jacob Matson (MotherDuck)

Big Data Cloud Computing Data Lakehouse Motherduck

The lakehouse promised to unify our data, but popular formats can feel bloated and hard to use for most real-world workloads. If you've ever felt that the complexity and operational overhead of "Big Data" tools are overkill, you're not alone. What if your lakehouse could be simple, fast, and maybe even a little fun? Enter DuckLake , the native lakehouse format, managed on MotherDuck. It delivers the powerful features you need like ACID transactions, time travel, and schema evolution without the heavyweight baggage. This approach truly makes massive data sets feel like Small Data. This workshop is a practical, step-by-step walkthrough for the data practitioner. We'll get straight to the point and show you how to build a fully functional, serverless lakehouse from scratch. You will learn: The Architecture: We’ll explore how DuckLake's design choices make it fundamentally simpler and faster for analytical queries compared to its JVM-based cousins. The Workflow: Through hands-on examples, you'll create a DuckLake table, perform atomic updates, and use time travel—all with the simple SQL you already know. The MotherDuck Advantage: Discover how the serverless platform makes it easy to manage, share, and query your DuckLake tables, enabling a seamless hybrid workflow between your laptop and the cloud.

Bridging the AI–Data Gap: Collect, Curate, Serve

2025-11-02 · Data Engineering Podcast Listen

podcast_episode

by Ido Bronstein (Upriver) , Omri Lifshitz (Upriver) , Tobias Macey

AI/ML Cloud Computing Data Engineering Data Management Data Quality Datafold dbt ETL/ELT LLM Prefect Python RAG +1 more

Summary In this episode of the Data Engineering Podcast Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand sides of data, highlighting that the bottleneck lies in the "middle layer" of curation, semantics, and serving. Omri and Ido outline a three-part framework for making data usable by LLMs and agents: collect, curate, serve, and share challenges of scaling from POCs to production, including compounding error rates and reliability concerns. They also explore organizational shifts, patterns for managing context windows, pragmatic views on schema choices, and Upriver's approach to building autonomous data workflows using determinism and LLMs at the right boundaries. The conversation concludes with a look ahead to AI-first data platforms where engineers supervise business semantics while automation stitches technical details end-to-end.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.Your host is Tobias Macey and today I'm interviewing Omri Lifshitz and Ido Bronstein about the challenges of keeping up with the demand for data when supporting AI systemsInterview IntroductionHow did you get involved in the area of data management?We're here to talk about "The Growing Gap Between Data & AI". From your perspective, what is this gap, and why do you think it's widening so rapidly right now?How does this gap relate to the founding story of Upriver? What problems were you and your co-founders experiencing that led you to build this?The core premise of new AI tools, from RAG pipelines to LLM agents, is that they are only as good as the data they're given. How does this "garbage in, garbage out" problem change when the "in" is not a static file but a complex, high-velocity, and constantly changing data pipeline?Upriver is described as an "intelligent agent system" and an "autonomous data engineer." This is a fascinating "AI to solve for AI" approach. Can you describe this agent-based architecture and how it specifically works to bridge that data-AI gap?Your website mentions a "Data Context Layer" that turns "tribal knowledge" into a "machine-usable mode." This sounds critical for AI. How do you capture that context, and how does it make data "AI-ready" in a way that a traditional data catalog or quality tool doesn't?What are the most innovative or unexpected ways you've seen companies trying to make their data "AI-ready"? And where are the biggest points of failure you observe?What has been the most challenging or unexpected lesson you've learned while building an AI system (Upriver) that is designed to fix the data foundation for other AI systems?When is an autonomous, agent-based approach not the right solution for a team's data quality problems? What organizational or technical maturity is required to even start closing this data-AI gap?What do you have planned for the future of Upriver? And looking more broadly, how do you see this gap between data and AI evolving over the next few years?Contact Info Ido - LinkedInOmri - LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links UpriverRAG == Retrieval Augmented GenerationAI Engineering Podcast EpisodeAI AgentContext WindowModel Finetuning)The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

How Boels Unlocks the Power of Snowflake & Fivetran to Improve the Customers’ Rental Journey

2025-10-30 · Snowflake World Tour Amsterdam

session

Analytics Fivetran SAP Snowflake

Boels Rental is one of Europe’s leading providers of equipment and tool hire. The Data & Analytics program sought to unify data on Snowflake, standardize analytics, and establish a cohesive framework for data-driven decision-making.

In this session, Roy Louvenberg and Ralph Knoops at Boels Rental, will share how the team leveraged Fivetran to rapidly and securely connect diverse data sources into a single central platform—empowering business operations, driving insights, and maximizing ROI.

Join this session to discover:

•Why fresh, reliable data is essential for analytics and business processes at Boels Rental •How Fivetran enables seamless integration from a wide range of sources, including SAP S/4HANA, Db2, and SQL Server

Transforming Documents, Images and Audio in Snowflake Cortex AI

2025-10-30 · Snowflake World Tour Amsterdam

session

AI/ML Analytics Snowflake

In this session, discover how organizations are extracting actionable insights from text, documents, images and audio — all in Snowflake Cortex AI. This session reveals practical techniques for building integrated multimodal analytics pipelines using Cortex AI SQL functions and Document AI. Learn how to orchestrate complex, multi-step data analysis across previously siloed data types — simply, with SQL.

The Lego Approach to Data Transformation: Building Modular DBT Pipelines at Scale

2025-10-28 · New York dbt Meetup | The Lego Approach

talk

by Aravind Ramesh (DoubleVerify) , Mahima Arya (DoubleVerify)

dbt

Overview of how DoubleVerify applied core programming principles (abstraction, modularity, DRY) to transform scattered SQL into reusable DBT packages. A three-layer architecture—raw data, standardized signals, and modular packages—enables building scalable, reusable DBT pipelines that work with any conforming input and reduce onboarding time.

talk-data.com

Activity Trend

Top Events

Top Speakers

SQL Server 2025: The AI-ready enterprise database

Use Azure Migrate for AI assisted insights and cloud transformation

Build real‑time analytics with Cosmos DB in Microsoft Fabric

Innovation Session: Microsoft Fabric and Azure Databases - the data estate for AI

Elevate SQL development with VSCode, GitHub Copilot and new drivers

Power intelligent data management from on-premises to the Cloud

Build new AI Applications with Azure SQL Databases

Community Connection Pod: SQL Server Management Studio: Help Drive What's Next

Migrate and Modernize Windows and SQL Server Workloads to Azure

Modern data, modern apps: Innovation with Microsoft Databases

State, Scale, and Signals: Rethinking Orchestration with Durable Execution

Extending Microsoft Copilot Studio with Connectors, Plugins, and Custom Logic

The AI Data Paradox: High Trust in Models, Low Trust in Data

Using dbt to manage your data warehouse

Building Distributed DuckDB Processing for Lakes

From Zero to "Query": Building Your First Serverless Lakehouse with DuckLake

Bridging the AI–Data Gap: Collect, Curate, Serve

How Boels Unlocks the Power of Snowflake & Fivetran to Improve the Customers’ Rental Journey

Transforming Documents, Images and Audio in Snowflake Cortex AI

The Lego Approach to Data Transformation: Building Modular DBT Pipelines at Scale