talk-data.com
Activities & events
| Title & Speakers | Event |
|---|---|
|
Bridging the AI–Data Gap: Collect, Curate, Serve
2025-11-02 · 19:31
Summary

In this episode of the Data Engineering Podcast, Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand sides of data, highlighting that the bottleneck lies in the "middle layer" of curation, semantics, and serving. Omri and Ido outline a three-part framework for making data usable by LLMs and agents (collect, curate, serve) and share the challenges of scaling from POCs to production, including compounding error rates and reliability concerns. They also explore organizational shifts, patterns for managing context windows, pragmatic views on schema choices, and Upriver's approach to building autonomous data workflows using determinism and LLMs at the right boundaries. The conversation concludes with a look ahead to AI-first data platforms where engineers supervise business semantics while automation stitches technical details end to end.

Announcements

- Hello and welcome to the Data Engineering Podcast, the show about modern data management.
- Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed: flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming: Prefect runs it all, from ingestion to activation, in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
- Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.
- Your host is Tobias Macey and today I'm interviewing Omri Lifshitz and Ido Bronstein about the challenges of keeping up with the demand for data when supporting AI systems.

Interview

- Introduction
- How did you get involved in the area of data management?
- We're here to talk about "The Growing Gap Between Data & AI". From your perspective, what is this gap, and why do you think it's widening so rapidly right now?
- How does this gap relate to the founding story of Upriver? What problems were you and your co-founders experiencing that led you to build this?
- The core premise of new AI tools, from RAG pipelines to LLM agents, is that they are only as good as the data they're given. How does this "garbage in, garbage out" problem change when the "in" is not a static file but a complex, high-velocity, and constantly changing data pipeline?
- Upriver is described as an "intelligent agent system" and an "autonomous data engineer". This is a fascinating "AI to solve for AI" approach. Can you describe this agent-based architecture and how it specifically works to bridge that data-AI gap?
- Your website mentions a "Data Context Layer" that turns "tribal knowledge" into a "machine-usable model". This sounds critical for AI. How do you capture that context, and how does it make data "AI-ready" in a way that a traditional data catalog or quality tool doesn't?
- What are the most innovative or unexpected ways you've seen companies trying to make their data "AI-ready"? And where are the biggest points of failure you observe?
- What has been the most challenging or unexpected lesson you've learned while building an AI system (Upriver) that is designed to fix the data foundation for other AI systems?
- When is an autonomous, agent-based approach not the right solution for a team's data quality problems? What organizational or technical maturity is required to even start closing this data-AI gap?
- What do you have planned for the future of Upriver? And looking more broadly, how do you see this gap between data and AI evolving over the next few years?

Contact Info

- Ido - LinkedIn
- Omri - LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

- Upriver
- RAG == Retrieval Augmented Generation
- AI Engineering Podcast Episode
- AI Agent
- Context Window
- Model Finetuning

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA |
Data Engineering Podcast |
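The episode's point about compounding error rates is easy to make concrete: if each step of an agent pipeline succeeds independently, end-to-end reliability decays geometrically with pipeline length. A minimal sketch (illustrative probabilities, not figures from the episode):

```python
# Illustrative only: if each step of an agent pipeline succeeds
# independently with probability p, the chance that the whole run
# succeeds decays geometrically with the number of steps.

def end_to_end_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independence."""
    return p_step ** n_steps

# A 95%-reliable step looks fine in isolation, but chained ten
# times the pipeline succeeds barely 60% of the time.
print(round(end_to_end_success(0.95, 10), 3))  # 0.599
print(round(end_to_end_success(0.99, 50), 3))  # 0.605
```

This is why moving from a POC (a handful of steps) to production (dozens of chained retrievals, transformations, and tool calls) surfaces reliability problems that never appeared in the demo.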
|
Building Flexible RAG Systems
2025-09-19 · 16:00
As part of the Future of Data and AI: Agentic AI Conference, join us for an immersive, hands-on workshop that equips you with practical strategies to build and optimize RAG retrievers for high-performing Retrieval-Augmented Generation systems.
📌 Registration is required. Register now to secure your spot. |
Building Flexible RAG Systems
|
|
From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture
2025-09-18 · 00:24
Marc Brooker
– VP and Distinguished Engineer
@ AWS
,
Tobias Macey
– host
Summary

In this episode of the AI Engineering Podcast, Marc Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, from traditional models to more modern approaches like vectors, RAG, and relational databases. Marc explains why agents require serverless, elastic, and operationally simple databases, and how AWS solutions like Aurora and DSQL address these needs with features such as rapid provisioning, automated patching, geodistribution, and support for spiky usage. The conversation covers tool calling, improved model capabilities, state in agents versus stateless LLM calls, and the role of Lambda and AgentCore for long-running, session-isolated agents. Marc also touches on the shift from local MCP tools to secure, remote endpoints, the rise of object storage as a durable backplane, and the need for better identity and authorization models. The episode highlights real-world patterns like agent-driven SQL fuzzing and plan analysis, while identifying gaps in simplifying data access, hardening ops for autonomous systems, and evolving serverless database ergonomics to keep pace with agentic development.

Announcements

- Hello and welcome to the Data Engineering Podcast, the show about modern data management.
- Your host is Tobias Macey and today I'm interviewing Marc Brooker about the impact of agentic workflows on database usage patterns and how they change the architectural requirements for databases.

Interview

- Introduction
- How did you get involved in the area of data management?
- Can you describe what the role of the database is in agentic workflows?
- There are numerous types of databases, with relational being the most prevalent. How does the type and purpose of an agent inform the type of database that should be used?
- Anecdotally I have heard about how agentic workloads have become the predominant "customers" of services like Neon and Fly.io. How would you characterize the different patterns of scale for agentic AI applications? (e.g. proliferation of agents, monolithic agents, multi-agent, etc.)
- What are some of the most significant impacts on workload and access patterns for data storage and retrieval that agents introduce?
- What are the categorical differences in that behavior as compared to programmatic/automated systems?
- You have spent a substantial amount of time on Lambda at AWS. Given that LLMs are effectively stateless, how does the added ephemerality of serverless functions impact design and performance considerations around having to "re-hydrate" context when interacting with agents?
- What are the most interesting, innovative, or unexpected ways that you have seen serverless and database systems used for agentic workloads?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on technologies that are supporting agentic applications?

Contact Info

- Blog
- LinkedIn

Parting Question

- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

- AWS Aurora DSQL
- AWS Lambda
- Three Tier Architecture
- Vector Database
- Graph Database
- Relational Database
- Vector Embedding
- RAG == Retrieval Augmented Generation
- AI Engineering Podcast Episode
- GraphRAG
- AI Engineering Podcast Episode
- LLM Tool Calling
- MCP == Model Context Protocol
- A2A == Agent 2 Agent Protocol
- AWS Bedrock AgentCore
- Strands
- LangChain
- Kiro

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA |
Data Engineering Podcast |
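The tool-calling pattern discussed in this episode reduces to a dispatch loop: the model emits a structured call (name plus JSON arguments) and the host application routes it to a registered function, feeding the result back for the next model turn. A minimal, hedged sketch in plain Python (the tool names and payloads are invented for illustration; this is not an AWS or Bedrock API):

```python
import json

# Registry of host-side functions the model is allowed to invoke.
# Names and return values here are hypothetical.
TOOLS = {
    "get_row_count": lambda table: {"table": table, "rows": 42},
    "run_sql": lambda query: {"query": query, "status": "ok"},
}

def dispatch(tool_call: str) -> dict:
    """Execute one model-emitted tool call of the form
    {"name": ..., "arguments": {...}} and return its result."""
    call = json.loads(tool_call)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return {"error": f"unknown tool {call['name']!r}"}
    return fn(**call["arguments"])

result = dispatch('{"name": "get_row_count", "arguments": {"table": "events"}}')
print(result)  # {'table': 'events', 'rows': 42}
```

The interesting architectural questions in the episode start where this sketch ends: real agents keep session state across many such calls, which is exactly why ephemeral serverless functions must "re-hydrate" context on each invocation.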
|
MongoDB 8.0 in Action, Third Edition
2025-07-10
Arkadiusz Borucki
– author
Deliver flexible, scalable, and high-performance data storage that's perfect for AI and other modern applications with MongoDB 8.0 and the MongoDB Atlas multi-cloud data platform. In MongoDB 8.0 in Action, Third Edition you'll find comprehensive coverage of the latest version of MongoDB 8.0 and the MongoDB Atlas multi-cloud data platform. Learn to utilize MongoDB's flexible schema design for data modeling, scale applications effectively using advanced sharding features, integrate full-text and vector-based semantic search, and more. This totally revised new edition delivers engaging hands-on tutorials and examples that put MongoDB into action!

In MongoDB 8.0 in Action, Third Edition you'll:

- Master new features in MongoDB 8.0
- Create your first, free Atlas cluster using the Atlas CLI
- Design scalable NoSQL databases with effective data modeling techniques
- Master Vector Search for building GenAI-driven applications
- Utilize advanced search capabilities in MongoDB Atlas, including full-text search
- Build event-driven applications with Atlas Stream Processing
- Deploy and manage MongoDB Atlas clusters both locally and in the cloud using the Atlas CLI
- Leverage the Atlas SQL interface for familiar SQL querying
- Use MongoDB Atlas Online Archive for efficient data management
- Establish robust security practices, including encryption
- Master backup and restore strategies
- Optimize database performance and identify slow queries

MongoDB 8.0 in Action, Third Edition offers a clear, easy-to-understand introduction to everything in MongoDB 8.0 and MongoDB Atlas, including new advanced features such as embedded config servers in sharded clusters and moving an unsharded collection to a different shard. The book also covers Atlas Stream Processing, full-text search, and vector search capabilities for generative AI applications. Each chapter is packed with tips, tricks, and practical examples you can quickly apply to your projects, whether you're brand new to MongoDB or looking to get up to speed with the latest version.

About the Technology

MongoDB is the database of choice for storing structured, semi-structured, and unstructured data like business documents and other text and image files. MongoDB 8.0 introduces a range of exciting new features, from sharding improvements that simplify the management of distributed data to performance enhancements that stay resilient under heavy workloads. Plus, MongoDB Atlas brings vector search and full-text search features that support AI-powered applications.

About the Book

In MongoDB 8.0 in Action, Third Edition, you'll learn how to take advantage of all the new features of MongoDB 8.0, including the powerful MongoDB Atlas multi-cloud data platform. You'll start with the basics of setting up and managing a document database. Then, you'll learn how to use MongoDB for AI-driven applications, implement advanced stream processing, and optimize performance with improved indexing and query handling. Hands-on projects like creating a RAG-based chatbot and building an aggregation pipeline mean you'll really put MongoDB into action!

What's Inside

- The new features in MongoDB 8.0
- Getting familiar with MongoDB's Atlas cloud platform
- Utilizing sharding enhancements
- Using vector-based search technologies
- Full-text search capabilities for efficient text indexing and querying

About the Reader

For developers and DBAs of all levels. No prior experience with MongoDB required.

About the Author

Arek Borucki is a MongoDB Champion and certified MongoDB and MongoDB Atlas administrator with expertise in distributed systems, NoSQL databases, and Kubernetes.

Quotes

- "An excellent resource with real-world examples and best practices to design, optimize, and scale modern applications." - Advait Patel, Broadcom
- "Essential MongoDB resource. Covers new features such as full-text search, vector search, AI, and RAG applications." - Juan Roy, Credit Suisse
- "Reflects the author's practical experience and clear teaching style. It's packed with real-world examples and up-to-date insights." - Rajesh Nair, MongoDB Champion & community leader
- "This book will definitely make you a MongoDB star!" - Vinicios Wentz, JPMorgan Chase & Co. |
O'Reilly Data Engineering Books
|
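The vector search capability the book covers rests on a simple idea: embed documents and queries as vectors and rank by similarity. A dependency-free toy sketch of that ranking step (the hand-made 3-d vectors stand in for real embeddings, and Atlas itself uses an approximate-nearest-neighbor index rather than the linear scan shown here):

```python
import math

# Toy illustration of the idea behind vector search: embed documents
# and the query as vectors, rank by cosine similarity, return top-k.
# The 3-d vectors below are fabricated stand-ins for real embeddings.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

DOCS = {
    "sharding guide":  [0.9, 0.1, 0.0],
    "vector search":   [0.1, 0.9, 0.2],
    "backup strategy": [0.0, 0.2, 0.9],
}

def top_k(query_vec, k=2):
    """Return the k document names most similar to the query vector."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

print(top_k([0.2, 0.8, 0.1]))  # ['vector search', 'sharding guide']
```

In Atlas the same logic is expressed declaratively in an aggregation pipeline against a vector index, but the retrieval semantics (nearest vectors win) are what this sketch shows.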
|
How Contextual AI deploys specialized RAG agents in production with GCP
2025-04-11 · 19:45
Suds Narasimhan
– Product Manager
@ Google Cloud
,
Douwe Kiela
– CEO
@ Contextual AI
As AI adoption accelerates, many enterprises still face challenges building production-grade AI systems for high-value, knowledge-intensive use cases. RAG 2.0 is Contextual AI's unique approach to solving mission-critical AI use cases, where accuracy requirements are high and there is a low tolerance for error. In this talk, Douwe Kiela, CEO of Contextual AI and co-inventor of RAG, will share lessons learned from deploying enterprise AI systems at scale. He will shed light on how RAG 2.0 differs from classic RAG, the common pitfalls and limitations of moving into production, and why AI practitioners would benefit from focusing less on individual model components and more on the systems-level perspective. You will also learn how Google Cloud's flexible, reliable, and performant AI infrastructure enabled Contextual AI to build and operate their end-to-end platform. |
Google Cloud Next '25
|
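The low-tolerance-for-error framing above comes down to grounding: retrieved passages are stitched into the prompt with source tags so the answer can be checked back against its evidence. A generic sketch of that step (this is an illustrative pattern, not Contextual AI's RAG 2.0 implementation; all names are invented):

```python
# Generic "retrieve then ground" step common to RAG systems:
# tag each retrieved passage with its source id so the generated
# answer can cite, and be audited against, its evidence.

def build_grounded_prompt(question, passages):
    """passages: (source_id, text) pairs from the retriever."""
    context = "\n".join(f"[{src}] {text}" for src, text in passages)
    return (
        "Answer using ONLY the sources below and cite them by tag.\n"
        f"{context}\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What is RAG 2.0?",
    [("doc-1", "RAG 2.0 optimizes retriever and generator as one system.")],
)
print(prompt)
```

The systems-level point of the talk is that accuracy lives in this whole loop (retrieval quality, grounding, and verification together), not in any single model component.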
|
PyDataMCR July!
2024-07-30 · 17:30
PyDataMCR July!

THE TALKS

Mastering LLM Workflows: Benchmarking, Tools, and Best Practices - Patrick Callery (he/him)

Having worked helping businesses adopt LLM workflows such as RAG pipelines and function calling, a common challenge is benchmarking and measuring how well the application is performing. This talk aims to explore some of the existing open-source tooling to help with this and look at some best practices when building out such applications. Patrick is a Machine Learning Engineer at Brainpool.ai with experience in productionising machine learning systems and AI applications, focussing on MLOps and integrating software engineering best practices into the domain of ML.

BI on a budget - Shaun Hide (he/him)

Modern BI tools such as Tableau and Power BI come with extensive functionality but also a substantial cost per head for organisations. This talk aims to explore the intricacies of adapting Streamlit to offer a flexible, functional, and budget alternative to modern BI tools. Shaun is Head of Data Science at DailyFeed with experience in solving end-to-end data problems, often on a budget! His previous experience has spanned leveraging Databricks to deliver a data science feature store and churn models at TalkTalk, to interrogating transactional data at the card-linked offer company Reward.

LOCATION

We'll be at Northcoders at Manchester Technology Centre, who are kindly supplying catering. The capacity is limited to 80.

EVENT GUIDELINES

PyDataMCR is a strictly professional event, and as such professional behaviour is expected. PyDataMCR is a chapter of PyData, an educational program of NumFOCUS, and thus abides by the NumFOCUS Code of Conduct: https://pydata.org/code-of-conduct.html Please take a moment to familiarise yourself with its contents.

ACCESSIBILITY

Under 16s are welcome with a responsible guardian. The venue and toilets are accessible with a lift from reception. A quiet room is available if required.

SPONSORS

Thank you to NumFOCUS for sponsoring Meetup and further support. Thank you to AutoTrader for sponsoring PyDataMCR. Thank you to Northcoders for an awesome venue and catering! |
PyDataMCR July!
|
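The benchmarking challenge Patrick's talk addresses starts with the simplest possible metric: exact match against reference answers. A toy harness under that assumption (the data is invented; real evaluation suites layer on semantic similarity, faithfulness, and latency measurements):

```python
# Crudest possible LLM-application benchmark: normalised exact match
# of predictions against reference answers. Real suites add semantic
# similarity and faithfulness metrics on top of this skeleton.

def exact_match_score(predictions, references):
    """Fraction of predictions matching the reference after
    lowercasing and whitespace normalisation."""
    norm = lambda s: " ".join(s.lower().split())
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["Paris", "42 ", "blue whale"]
refs = ["paris", "42", "Blue Whale"]
print(exact_match_score(preds, refs))  # 1.0
```

Even this trivial scorer makes the talk's point: without a fixed reference set and a repeatable metric, "how well is the RAG pipeline performing?" has no answer at all.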
|
AI Meetup: End of Year Celebration for AI, GenAI, LLMs and ML
2023-12-06 · 17:00
*** RSVP: https://www.aicamp.ai/event/eventdetails/W2023120609

As the year winds down and the holiday spirits ramp up, we invite you to the most electrifying AI meetup (end-of-year edition). Instead of our usual scheduled talks, we will throw the most exciting holiday bash for people who build AI! Join AI developers, ML engineers, data scientists, and practitioners to celebrate all your hard work this year.

Agenda:

Tech Talk: Bring your AI application from Prototype to Production
Speaker: Philip Vollet, Edward Schmuhl @ Weaviate
Abstract: In this talk, we will give you insights on how to build AI-native applications with the power of LLMs at scale. Based on a real-life example we will cover:
- Core concepts and aspects of (long-context) Retrieval Augmented Generation
- The role of vector databases in RAG
- Easy integration with leading players in the AI open-source ecosystem
- Choosing various deployment methods
- What to keep in mind for production-ready applications (e.g. scalability, multi-tenancy, and more)

Tech Talk: Building flexible and production-grade RAG applications
Speaker: Mathis Lucka @ Deepset
Abstract: Learn how Haystack lets you take matters into your own hands when implementing RAG applications. You'll walk away with a better understanding of version 2.0 of Haystack and how it streamlines the LLM application building experience.

Tech Talk: Machine Learning Project Pitfalls
Speaker: Andrey Holz @ EPAM Systems
Abstract: This presentation addresses the multifaceted challenges of machine learning (ML) projects, with a focus on striking a balance between technical expertise and strategic business alignment.

Stay tuned as we are updating speakers and schedules. If you have a keen interest in speaking, we invite you to submit topics for consideration: Submit Topics

Startup Showcase: We have 5-10 demo desks available for community showcases and ~5-minute quick demos on the stage. We will have judges and attendees vote for projects, with 3 amazing awards (best of innovation, best of technology, popular vote). You are invited to apply here: Community Showcase

Sponsors: We are actively seeking sponsors to support the AI developer community, whether by offering venue space, providing food, or cash sponsorship. Sponsors will have the chance to speak at the meetups, receive prominent recognition, and gain exposure to our extensive membership base of 8,000+ local or 300K+ developers worldwide.

Community on Slack
- Event chat: chat and connect with speakers and attendees
- Share blogs, events, job openings, and project collaborations
Join Slack (browse/search and join the #berlin channel) |
AI Meetup: End of Year Celebration for AI, GenAI, LLMs and ML
|