I’m passionate about building high-throughput distributed systems and making complex data platforms simple, resilient, and scalable. Today, I’m the Co-Founder and CTO of Ryft, building an Iceberg management platform.
Topic: Apache Iceberg (13 tagged)
Data leaders today face a familiar challenge: complex pipelines, duplicated systems, and spiraling infrastructure costs. Standardizing around Kafka for real-time and Iceberg for large-scale analytics has gone some way towards addressing this, but each still demands its own stack, leaving teams to stitch the two together at high expense and risk.
This talk will explore how Kafka and Iceberg together form a new foundation for data infrastructure: one that unifies streaming and analytics into a single, cost-efficient layer. By standardizing on these open technologies, organizations can reduce data duplication, simplify governance, and unlock both instant insights and long-term value from the same platform.
You will come away with a clear understanding of why this convergence is reshaping the industry, how it lowers operational risk, and the advantages it offers for building durable, future-proof data capabilities.
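To make the pairing concrete, here is a minimal sketch, not taken from the talk, of one common way to land a Kafka topic in an Iceberg table with Spark Structured Streaming. The catalog, bucket, topic, and table names are placeholders, and the required Iceberg and Kafka connector packages are assumed to be on the Spark classpath.

```python
# Minimal sketch: stream a Kafka topic into an Iceberg table with Spark Structured Streaming.
# Assumes the iceberg-spark-runtime and spark-sql-kafka packages are on the classpath;
# the catalog, warehouse path, topic, and table names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("kafka-to-iceberg")
    # Register an Iceberg catalog (here a simple Hadoop catalog on object storage).
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://my-bucket/warehouse")
    .getOrCreate()
)

# Kafka delivers keys and values as binary, so cast them before writing.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .select(
        col("key").cast("string").alias("key"),
        col("value").cast("string").alias("payload"),
        col("timestamp"),
    )
)

# Append each micro-batch to an Iceberg table; the same table then serves batch analytics.
query = (
    events.writeStream
    .format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/events")
    .toTable("lake.raw.events")
)
query.awaitTermination()
```

The point of the pattern is that the stream and the analytical table share the same storage, so there is no second copy of the data to reconcile.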
Discover how to build a powerful AI Lakehouse and unified data fabric natively on Google Cloud. Leverage BigQuery's serverless scale and robust analytics capabilities as the core, seamlessly integrating open data formats with Apache Iceberg and efficient processing using managed Spark environments like Dataproc. Explore the essential components of this modern data environment, including data architecture best practices, robust integration strategies, high data quality assurance, and efficient metadata management with Google Cloud Data Catalog. Learn how Google Cloud's comprehensive ecosystem accelerates advanced analytics, preparing your data for sophisticated machine learning initiatives and enabling direct connection to services like Vertex AI.
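As a rough orientation (not part of the session content), the snippet below shows how a Spark job, for example on Dataproc, might register an Iceberg catalog backed by a Cloud Storage bucket and create a queryable table there; the bucket, namespace, table, and column names are invented.

```python
# Rough sketch: an Iceberg catalog backed by a GCS bucket, as a Spark job on Dataproc might use it.
# The bucket, namespace, table, and column names are invented placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-on-gcs")
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "hadoop")
    .config("spark.sql.catalog.lakehouse.warehouse", "gs://my-analytics-bucket/warehouse")
    .getOrCreate()
)

# Create a partitioned Iceberg table; the data and metadata live in open formats on GCS,
# so other Iceberg-aware engines can work against the same files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.sales.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(10, 2),
        order_ts    TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(order_ts))
""")

# A typical analytical query over the table.
spark.sql("""
    SELECT date_trunc('day', order_ts) AS day, sum(amount) AS revenue
    FROM lakehouse.sales.orders
    GROUP BY 1
    ORDER BY 1
""").show()
```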
Get ready for a customer story that’s as bold as it is eye-opening. In this session, Eutelsat and DataOps.live pull back the curtain on what it really takes to deliver business-changing outcomes, with a specific focus on the use cases addressed with Apache Iceberg at the core. And these use cases are BIG: think big, big numbers, and you still aren’t even close!
You’ll hear the inside story of how Eutelsat found itself with two “competing” cloud data platforms. What could have been an expensive headache turned out to be an advantage: Iceberg made it not only possible but cheaper and simpler to use both together, unlocking agility and cost savings that no single platform alone could provide.
The impact is already tangible. Telemetry pipelines are live and delivering massive value. Next up: interoperable Data Products seamlessly moving from Snowflake to Cloudera and vice versa, driving cross-platform innovation. And that’s just the start—Eutelsat is also positioning Iceberg as a future-proof standard for data sharing and export.
This is a story of scale, speed, and simplification—the kind of transformation only possible when a visionary team meets the right technology.
We’re excited to be back at Big Data LDN this year—huge thanks to the organisers for hosting Databricks London once more!
Join us for an evening of insights, networking, and community with the Databricks Team and Advancing Analytics!
🎤 Agenda:
6:00 PM – 6:10 PM | Kickoff & Warm Welcome
Grab a drink, say hi, and get the lowdown on what’s coming up. We’ll set the scene for an evening of learning and laughs.
6:10 PM – 6:50 PM | The Metadata Marathon: How three projects are racing forward – Holly Smith (Staff Developer Advocate, Databricks)
With the enormous amount of discussion about open storage formats among nerds and even not-nerds, it can be hard to keep track of who’s doing what and what impact any of it actually has on day-to-day data projects.
Holly will take a closer look at the three big projects in this space: Delta, Hudi, and Iceberg. They’re all trying to solve similar data problems and have tackled the various challenges in different ways. Her talk will start with the very basics of how we got here and the history behind each project, before diving deep into the underlying tech, their roadmaps, and their impact on the data landscape as a whole.
6:50 PM – 7:10 PM | What’s New in Databricks & Databricks AI – Simon Whiteley & Gavi Regunath
Hot off the press! Simon and Gavi will walk you through the latest and greatest from Databricks, including shiny new AI features and platform updates you’ll want to try ASAP.
7:10 PM onwards | Q&A Panel + Networking
Your chance to ask the experts anything—then stick around for drinks, snacks, and some good old-fashioned data geekery.
The modern enterprise is increasingly defined by the need for open, governed, and intelligent data access. This session explores how Apache Iceberg, Dremio, and the Model Context Protocol (MCP) come together to enable the Agentic Lakehouse: a data platform that is interoperable, high-performing, and AI-ready.
We’ll begin with Apache Iceberg, which provides the foundation for data interoperability across teams and organisations, ensuring shared datasets can be reliably accessed and evolved. From there, we’ll highlight how Dremio extends Iceberg with turnkey governance, management, and performance acceleration, unifying your lakehouse with databases and warehouses under one platform. Finally, we’ll introduce MCP and showcase how innovations like the Dremio MCP server enable natural-language analytics on your data.
With the power of Dremio’s built-in semantic layer, AI agents and humans alike can ask complex business questions in plain language and receive accurate, governed answers.
Join us to learn how to unlock the next generation of data interaction with the Agentic Lakehouse.
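As a small illustration of the interoperability claim (not drawn from the session itself), Iceberg lets a shared table's schema evolve through metadata-only operations, so every engine attached to the catalog sees the change consistently. The catalog, table, column names, and snapshot id below are hypothetical.

```python
# Illustrative only: Iceberg schema evolution and time travel via Spark SQL.
# The catalog, table, column names, and snapshot id are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-evolution").getOrCreate()

# Adding a column is a metadata-only change; existing data files are not rewritten.
spark.sql("ALTER TABLE lake.shared.customers ADD COLUMN loyalty_tier STRING")

# Renames are also metadata-only, so downstream readers pick them up without a migration.
spark.sql("ALTER TABLE lake.shared.customers RENAME COLUMN zip TO postal_code")

# Time travel: audit what the table looked like at an earlier snapshot.
spark.sql("""
    SELECT * FROM lake.shared.customers VERSION AS OF 1234567890123456789
""").show()
```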
Unlock the true potential of your data with the Qlik Open Lakehouse, a revolutionary approach to Iceberg integration designed for the enterprise. Many organizations face the pain of managing multiple, costly data platforms and struggle to achieve low-latency ingestion. While Apache Iceberg offers robust features like ACID transactions and schema evolution, achieving optimal performance isn't automatic; it requires sophisticated, ongoing maintenance. Introducing the Qlik Open Lakehouse, a fully managed and optimized solution built on Apache Iceberg, powered by Qlik's Adaptive Iceberg Optimizer. Discover how you can do data differently and achieve 10x faster queries, a 33-42% reduction in file API overhead, and ultimately a 50% reduction in costs through streamlined operations and compute savings.
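The maintenance the abstract alludes to is real work whether or not you use Qlik's optimizer. As a hedged sketch of what it typically involves, Iceberg ships Spark procedures for compacting small files, expiring snapshots, and cleaning up orphan files; the catalog and table names below are placeholders.

```python
# Sketch of routine Iceberg table maintenance using standard Iceberg Spark procedures
# (independent of Qlik's Adaptive Iceberg Optimizer). Catalog/table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Compact the many small files produced by low-latency ingestion into larger ones.
spark.sql("""
    CALL lake.system.rewrite_data_files(
        table => 'raw.events',
        options => map('target-file-size-bytes', '536870912')
    )
""")

# Expire old snapshots (and the files only they reference) to keep metadata and storage lean.
spark.sql("""
    CALL lake.system.expire_snapshots(
        table => 'raw.events',
        older_than => TIMESTAMP '2025-01-01 00:00:00',
        retain_last => 10
    )
""")

# Remove orphan files left behind by failed or aborted writes.
spark.sql("CALL lake.system.remove_orphan_files(table => 'raw.events')")
```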
In this session, Paul Wilkinson, Principal Solutions Architect at Redpanda, will demonstrate Redpanda's native Iceberg capability: a game-changing addition that bridges the gap between real-time streaming and analytical workloads, eliminating the complexity of traditional data lake architectures while maintaining the performance and simplicity that Redpanda is known for.
In a follow-along demo, Paul will show how this new capability enables organizations to seamlessly turn streaming data into analytical formats without complex ETL pipelines or additional infrastructure overhead, so you can build your own streaming lakehouse and show it off to your team!
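For readers who want a feel for the shape of this ahead of the demo (this is not Paul's material), the sketch below creates a Redpanda topic with Iceberg materialization enabled through the standard Kafka Admin API. The broker address and, in particular, the "redpanda.iceberg.mode" topic property reflect my understanding of Redpanda's Iceberg Topics feature and should be checked against the documentation for your version.

```python
# Hedged sketch: create a Redpanda topic that is also materialized as an Iceberg table.
# The broker address is a placeholder, and the "redpanda.iceberg.mode" topic property is my
# understanding of Redpanda's Iceberg Topics feature; verify it against your Redpanda version.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "redpanda.example.com:9092"})

# Assumes the cluster already has Iceberg support and an object store/catalog configured.
topic = NewTopic(
    "clickstream",
    num_partitions=6,
    replication_factor=3,
    config={"redpanda.iceberg.mode": "key_value"},
)

for name, future in admin.create_topics([topic]).items():
    future.result()  # raises if creation failed
    print(f"created topic {name}")
```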
So you’ve heard of Databricks, but you’re still not sure what all the fuss is about. Yes, you’ve heard it’s Spark, but then there’s this Delta thing that’s both a data lake and a data warehouse (isn’t that what Iceberg is?). And then there’s Unity Catalog, which isn’t just a catalog: it also handles access management and even surprising things like optimising your data and giving programmatic access to lineage and billing. But then serverless came out, and now you don’t even have to learn Spark? And of course there’s a bunch of AI stuff to use or create yourself. So why not spend 30 minutes learning what Databricks actually does, and how it can turn you into a rockstar Data Engineer?
In today’s rapidly evolving data landscape, organisations face increasing pressure to maintain control and sovereignty over their data. After a quick introduction to Apache Iceberg and Apache Polaris (Incubating), this session will dive into a real-world use case demonstrating how these technologies can power a robust, governance-focused data platform. We’ll explore strategies for securing access to data, discuss upcoming roadmap features like RBAC, FGAC, and ABAC (role-, fine-grained-, and attribute-based access control), and show how to build custom extensions to tailor governance to your organisation’s needs.
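For orientation only (not from the session), connecting an engine to an Iceberg REST catalog such as Polaris looks roughly like the following from Spark. The endpoint, credential, and catalog names are placeholders, the exact properties depend on your deployment, and what such a connection can actually see is decided by the grants attached to the authenticating principal.

```python
# Rough sketch: Spark connecting to an Iceberg REST catalog such as Apache Polaris (Incubating).
# The URI, credential, warehouse, and table names are placeholders; exact settings depend on
# the deployment, and visibility is governed by the roles granted to this principal.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("polaris-rest-catalog")
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.polaris.type", "rest")
    .config("spark.sql.catalog.polaris.uri", "https://polaris.example.com/api/catalog")
    .config("spark.sql.catalog.polaris.warehouse", "analytics")
    # OAuth2 client credentials for the service principal ("client_id:client_secret").
    .config("spark.sql.catalog.polaris.credential", "my-client-id:my-client-secret")
    .getOrCreate()
)

# Only the namespaces and tables this principal has been granted appear here.
spark.sql("SHOW NAMESPACES IN polaris").show()
spark.sql("SELECT count(*) FROM polaris.finance.invoices").show()
```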
Moving data between operational systems and analytics platforms is often a painful process. Traditional pipelines that transfer data in and out of warehouses tend to become complex, brittle, and expensive to maintain over time.
Much of this complexity, however, is avoidable. Data in motion and data at rest—Kafka Topics and Iceberg Tables—can be treated as two sides of the same coin. By establishing an equivalence between Topics and Tables, it’s possible to transparently map between them and rethink how pipelines are built.
This talk introduces a declarative approach to bridging streaming and table-based systems. By shifting complexity into the data layer, we can decompose complex, imperative pipelines into simpler, more reliable workflows.
We’ll explore the design principles behind this approach, including schema mapping and evolution between Kafka and Iceberg, and how to build a system that can continuously materialize and optimize hundreds of thousands of topics as Iceberg tables.
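One existing embodiment of that declarative idea, though not necessarily the mechanism this talk describes, is the Iceberg sink for Kafka Connect: the topic-to-table mapping becomes connector configuration rather than pipeline code. The hostnames, connector class, and property names below are illustrative and may differ across connector distributions.

```python
# Illustrative sketch: register an Iceberg sink with the Kafka Connect REST API so that a
# topic-to-table mapping is declared as configuration instead of imperative pipeline code.
# Hostnames, the connector class, and property names are assumptions based on the Apache
# Iceberg Kafka Connect sink and may differ in your distribution; treat them as placeholders.
import requests

connector = {
    "name": "orders-to-iceberg",
    "config": {
        "connector.class": "org.apache.iceberg.connect.IcebergSinkConnector",
        "topics": "orders",
        # Declare the target table; the connector handles commits and schema mapping.
        "iceberg.tables": "sales.orders",
        # The Iceberg catalog the sink writes through (here a REST catalog).
        "iceberg.catalog.type": "rest",
        "iceberg.catalog.uri": "https://catalog.example.com/api",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
    },
}

resp = requests.post("http://connect.example.com:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()
print(resp.json())
```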
Whether you're building new pipelines or modernizing legacy systems, this session will provide practical patterns and strategies for creating resilient, scalable, and future-proof data architectures.