talk-data.com

Topic

Data Streaming

realtime event_processing data_flow

739

tagged

Activity Trend

70 peak/qtr
2020-Q1 2026-Q1

Activities

739 activities · Newest first

Advanced Polars: Lazy Queries and Streaming Mode

Do you find yourself struggling with Pandas' limitations when handling massive datasets or real-time data streams?

Discover Polars, the lightning-fast DataFrame library built in Rust. This talk presents two of its advanced features: lazy queries and streaming mode.

Lazy evaluation in Polars allows you to build complex data pipelines without the performance bottlenecks of eager execution. By deferring computation, Polars optimises your queries using techniques like predicate and projection pushdown, reducing unnecessary computations and memory overhead. This leads to significant performance improvements, particularly with datasets larger than your system’s physical memory.

Polars' LazyFrames form the foundation of the library’s streaming mode, enabling efficient streaming pipelines, real-time transformations, and seamless integration with various data sinks.

This session will explore use cases and technical implementations of both lazy queries and streaming mode. We’ll also include live-coding demonstrations to introduce the tool, showcase best practices, and highlight common pitfalls.

Attendees will walk away with practical knowledge of lazy queries and streaming mode, ready to apply these tools in their daily work as data engineers or data scientists.

podcast_episode
by Michelle Meyer (Mastercard), Cris deRitis (Moody's Analytics), Mark Zandi (Moody's Analytics), Marisa DiNatale (Moody's Analytics)

Mastercard Chief Economist Michelle Meyer joins the Inside Economics team to debate the health of the American consumer and the economy’s prospects. The team pushes hard on Michelle’s sanguine perspective, but she holds her ground—and then some. Listen in if you want a preview of the strength of holiday sales and what will power that growth.

Guest: Michelle Meyer – Chief Economist, Mastercard (https://www.mastercardservices.com/en/overview/leadership/michelle-meyer)

Explore the risks and realities shaping the economy in our new webinar, now streaming for free. U.S. Economic Outlook: Under Unprecedented Uncertainty. Watch here: https://events.moodys.com/mc68453-wbn-2025-mau25777-us-macro-outlook-precipice-recession?mkt_tok=OT…

Hosts: Mark Zandi – Chief Economist, Moody’s Analytics; Cris deRitis – Deputy Chief Economist, Moody’s Analytics; and Marisa DiNatale – Senior Director, Head of Global Forecasting, Moody’s Analytics

Follow Mark Zandi on 'X' and BlueSky @MarkZandi, Cris deRitis on LinkedIn, and Marisa DiNatale on LinkedIn.

Questions or Comments, please email us at [email protected]. We would love to hear from you.    To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Summary

In this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting how access friction and idle infrastructure slow progress. Join them as they discuss Flex AI's innovative approach to simplifying heterogeneous compute, standardizing on consistent Kubernetes layers, and abstracting inference across various accelerators, allowing teams to iterate faster without wrestling with drivers, libraries, or cloud-by-cloud differences. Brijesh also shares insights into Flex AI's strategies for lifting utilization, protecting real-time workloads, and spanning the full lifecycle from fine-tuning to autoscaled inference, all while keeping complexity at bay.

Pre-amble

I hope you enjoy this cross-over episode of the AI Engineering Podcast, another show that I run to act as your guide to the fast-moving world of building scalable and maintainable AI systems. As generative AI models have grown more powerful and are being applied to a broader range of use cases, the lines between data and AI engineering are becoming increasingly blurry. The responsibilities of data teams are being extended into the realm of context engineering, as well as designing and supporting new infrastructure elements that serve the needs of agentic applications. This episode is an example of the types of work that are not easily categorized into one or the other camp.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey and today I'm interviewing Brijesh Tripathi about FlexAI, a platform offering a service-oriented abstraction for AI workloads.

Interview

  • Introduction
  • How did you get involved in machine learning?
  • Can you describe what FlexAI is and the story behind it?
  • What are some examples of the ways that infrastructure challenges contribute to friction in developing and operating AI applications?
  • How do those challenges contribute to issues when scaling new applications/businesses that are founded on AI?
  • There are numerous managed services and deployable operational elements for operationalizing AI systems. What are some of the main pitfalls that teams need to be aware of when determining how much of that infrastructure to own themselves?
  • Orchestration is a key element of managing the data and model lifecycles of these applications. How does your approach of "workload as a service" help to mitigate some of the complexities in the overall maintenance of that workload?
  • Can you describe the design and architecture of the FlexAI platform?
  • How has the implementation evolved from when you first started working on it?
  • For someone who is going to build on top of FlexAI, what are the primary interfaces and concepts that they need to be aware of?
  • Can you describe the workflow of going from problem to deployment for an AI workload using FlexAI?
  • One of the perennial challenges of making a well-integrated platform is that there are inevitably pre-existing workloads that don't map cleanly onto the assumptions of the vendor. What are the affordances and escape hatches that you have built in to allow partial/incremental adoption of your service?
  • What are the elements of AI workloads and applications that you are explicitly not trying to solve for?
  • What are the most interesting, innovative, or unexpected ways that you have seen FlexAI used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on FlexAI?
  • When is FlexAI the wrong choice?
  • What do you have planned for the future of FlexAI?

Contact Info

  • LinkedIn

Parting Question

  • From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?

Links

  • Flex AI
  • Aurora Super Computer
  • CoreWeave
  • Kubernetes
  • CUDA
  • ROCm
  • Tensor Processing Unit (TPU)
  • PyTorch
  • Triton
  • Trainium
  • ASIC == Application Specific Integrated Circuit
  • SOC == System On a Chip
  • Loveable
  • FlexAI Blueprints
  • Tenstorrent

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Model Context Protocol: Principles and Practice

Large‑language‑model agents are only as useful as the context and tools they can reach.

Anthropic’s Model Context Protocol (MCP) proposes a universal, bidirectional interface that turns every external system—SQL databases, Slack, Git, web browsers, even your local file‑system—into first‑class “context providers.”

In just 30 minutes we’ll step from high‑level buzzwords to hands‑on engineering details:

  • How MCP’s JSON‑RPC message format, streaming channels, and version‑negotiation work under the hood.
  • Why per‑tool sandboxing via isolated client processes hardens security (and what happens when an LLM tries rm ‑rf /).
  • Techniques for hierarchical context retrieval that stretch a model’s effective window beyond token limits.
  • Real‑world patterns for accessing multiple tools—Postgres, Slack, GitHub—and plugging MCP into GenAI applications.

Expect code snippets and lessons from early adoption.

You’ll leave ready to wire your own services into any MCP‑aware model and level‑up your GenAI applications—without the N×M integration nightmare.
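To make the framing concrete, here is a minimal sketch of the JSON-RPC 2.0 envelope that MCP messages travel in. The `tools/call` method name follows MCP's conventions, but the tool name and arguments below are hypothetical:

```python
import json

def make_request(req_id: int, method: str, params: dict) -> str:
    """Frame a JSON-RPC 2.0 request: the envelope MCP messages use."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    )

# Hypothetical tool invocation: ask an MCP server to run a SQL query
# through a tool named "run_query" (an illustrative name, not a real tool).
raw = make_request(1, "tools/call", {"name": "run_query", "arguments": {"sql": "SELECT 1"}})

msg = json.loads(raw)
print(msg["method"])
```

A real MCP client would also perform initialization and version negotiation before sending tool calls, and would read responses (matched by `id`) off the same channel.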

75% of GenAI projects fail to scale—not because the models lack sophistication, but because they’re built on fragmented data. If your systems don’t know who they're talking about, how can your AI deliver reliable insights?

This talk unveils how real-time Entity Resolution (ER) is becoming the silent engine behind trusted, AI-ready data architecture. We will discuss how organizations across financial services, public safety, and digital platforms are embedding ER into modern data stacks—delivering identity clarity, regulatory confidence, and faster outcomes without the drag of legacy MDM.

You’ll learn:

  • Why ER is foundational for AI trust, governance, and analytics
  • Patterns for embedding ER into streaming and event-driven architectures
  • How ecosystem partners and data platforms are amplifying ER value
  • How to build trust at the entity level—without slowing down innovation

Whether you’re modernizing architecture, launching AI programs, or tightening compliance, this session will equip you to embed trust from the ground up.
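As a toy illustration of the streaming pattern, the sketch below resolves incoming records to entities with a normalized blocking key. All field names and records are hypothetical, and production ER adds fuzzy matching, scoring, and survivorship rules on top of exact-key blocking:

```python
# Toy streaming entity resolution: assign each incoming record to an
# entity id based on a deterministic, normalized blocking key.

def normalize(record):
    # Key on the lowercased, trimmed email address (illustrative choice).
    return (record["email"].strip().lower(),)

class EntityStore:
    def __init__(self):
        self.entities = {}   # blocking key -> entity id
        self.next_id = 0

    def resolve(self, record):
        """Return the entity id for a record, creating one if unseen."""
        key = normalize(record)
        if key not in self.entities:
            self.entities[key] = self.next_id
            self.next_id += 1
        return self.entities[key]

store = EntityStore()
a = store.resolve({"email": "Ada@Example.com", "name": "Ada"})
b = store.resolve({"email": " ada@example.com", "name": "A. Lovelace"})
c = store.resolve({"email": "grace@example.com", "name": "Grace"})
# a and b resolve to the same entity despite different casing and spacing.
```

In an event-driven architecture, `resolve` would sit inside a stream processor, enriching each event with its entity id before the event reaches downstream analytics or AI systems.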

Data leaders today face a familiar challenge: complex pipelines, duplicated systems, and spiraling infrastructure costs. Standardizing around Kafka for real-time workloads and Iceberg for large-scale analytics has gone some way towards addressing this, but it still requires separate stacks, leaving teams to stitch them together at high expense and risk.

This talk will explore how Kafka and Iceberg together form a new foundation for data infrastructure: one that unifies streaming and analytics into a single, cost-efficient layer. By standardizing on these open technologies, organizations can reduce data duplication, simplify governance, and unlock both instant insights and long-term value from the same platform.

You will come away with a clear understanding of why this convergence is reshaping the industry, how it lowers operational risk, and the advantages it offers for building durable, future-proof data capabilities.

We are entering the Era of Experience, where AI agents will transform customer journeys by learning directly from interactions. But most customer-facing agents today are “senseless,” lacking the real-time context needed to deliver relevant, empathetic, and valuable experiences. This session will explore how real-time streaming architectures and proprietary customer data can power the next generation of intelligent, perceptive agents.

Join Snowplow’s Jon Su as he unpacks:

  • Why brands risk commoditization if they rely on third-party agents
  • How real-time context enables smarter, more personalized customer interactions
  • The key ingredients for building agents that perceive, adapt, and self-optimize
  • How Snowplow Signals provides the real-time customer intelligence foundation for agentic applications

Discover how to shift from static personalization to adaptive, agent-driven experiences that improve customer satisfaction, loyalty, and business outcomes.


The most sought-after products don’t just appear on shelves—they arrive at the perfect moment, in perfect condition, thanks to data that works as fast as the business moves.

From premium meats to peak-season produce, Morrisons, one of the UK’s largest retailers, is building a future where shelves are stocked with exactly what customers want, when they want it.

In this session, Peter Laflin, Chief Data Officer at Morrisons, joins Striim to share how real-time data streaming into Google Cloud enables smarter, faster, and more autonomous retail operations. He’ll unpack how Morrisons is moving beyond predictive models to build AI-native, agentic systems that can sense, decide, and act at scale. Topics include:

Live store operations that respond instantly to real-world signals

AI architectures that move from “data-informed” to “data-delegated” decisions

Practical lessons from embedding real-time thinking across teams and tech stacks

This is a session for retail and data leaders who are ready to move beyond dashboards and start building intelligent systems that deliver both customer delight and operational agility.


In this session, Paul Wilkinson, Principal Solutions Architect at Redpanda, will demonstrate Redpanda's native Iceberg capability: a game-changing addition that bridges the gap between real-time streaming and analytical workloads, eliminating the complexity of traditional data lake architectures while maintaining the performance and simplicity that Redpanda is known for.

Paul will explore how this new capability enables organizations to seamlessly transition streaming data into analytical formats without complex ETL pipelines or additional infrastructure overhead. The follow-along demo will let you build your own streaming lakehouse and show it to your team!

Moving data between operational systems and analytics platforms is often a painful process. Traditional pipelines that transfer data in and out of warehouses tend to become complex, brittle, and expensive to maintain over time.

Much of this complexity, however, is avoidable. Data in motion and data at rest—Kafka Topics and Iceberg Tables—can be treated as two sides of the same coin. By establishing an equivalence between Topics and Tables, it’s possible to transparently map between them and rethink how pipelines are built.

This talk introduces a declarative approach to bridging streaming and table-based systems. By shifting complexity into the data layer, we can decompose complex, imperative pipelines into simpler, more reliable workflows.

We’ll explore the design principles behind this approach, including schema mapping and evolution between Kafka and Iceberg, and how to build a system that can continuously materialize and optimize hundreds of thousands of topics as Iceberg tables.

Whether you're building new pipelines or modernizing legacy systems, this session will provide practical patterns and strategies for creating resilient, scalable, and future-proof data architectures.
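The Topic-to-Table equivalence described above can be sketched in plain Python, with no Kafka or Iceberg clients (all names are illustrative): each record appended to the log can widen the materialized schema, and scans project older rows onto the evolved schema:

```python
# Toy materialization of an append-only topic as a table with evolving schema.
class MaterializedTable:
    def __init__(self):
        self.schema = set()   # column names seen so far
        self.rows = []        # raw records, in append order

    def append(self, record):
        # Schema evolution: fields not seen before widen the table schema.
        self.schema |= set(record)
        self.rows.append(record)

    def scan(self):
        # Project every row onto the full, evolved schema (missing -> None).
        cols = sorted(self.schema)
        return [{c: r.get(c) for c in cols} for r in self.rows]

topic = [
    {"id": 1, "v": "a"},                    # original record shape
    {"id": 2, "v": "b", "region": "eu"},    # a new field arrives later
]
table = MaterializedTable()
for rec in topic:
    table.append(rec)
print(table.scan())
```

A production system would additionally enforce compatible type changes, batch appends into table snapshots, and compact small files, but the core idea is the same: the table is a continuously maintained projection of the topic.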

In this session, we will explore how organisations can leverage ArcGIS to analyse spatial data within their data platforms, such as Databricks and Microsoft Fabric. We will discuss the importance of spatial data and its impact on decision-making processes. The session will cover various aspects, including the ingestion of streaming data using ArcGIS Velocity, the processing and management of large volumes of spatial data with ArcGIS GeoAnalytics for Microsoft Fabric, and the use of ArcGIS for visualisation and advanced analytics with GeoAI. Join us to discover how these tools can provide actionable insights and enhance operational efficiency.

podcast_episode
by Cris deRitis (Moody's Analytics), Heather Long (Navy Federal Credit Union), Mark Zandi (Moody's Analytics), Marisa DiNatale (Moody's Analytics)

Heather Long, Chief Economist at Navy Federal Credit Union, joins the Inside Economics team to discuss what she dubbed the K-Shaped Economy. She describes the reasons why the well-to-do are thriving and the bottom 80% of the income distribution is struggling to make ends meet. The team discusses the origins of this skewing in the income distribution and why we should care about it.

Explore the risks and realities shaping the economy in our new webinar, now streaming for free. U.S. Economic Outlook: Under Unprecedented Uncertainty. Watch here: https://events.moodys.com/mc68453-wbn-2025-mau25777-us-macro-outlook-precipice-recession?mkt_tok=OT…

Hosts: Mark Zandi – Chief Economist, Moody’s Analytics; Cris deRitis – Deputy Chief Economist, Moody’s Analytics; and Marisa DiNatale – Senior Director, Head of Global Forecasting, Moody’s Analytics

Follow Mark Zandi on 'X' and BlueSky @MarkZandi, Cris deRitis on LinkedIn, and Marisa DiNatale on LinkedIn.
