talk-data.com
Activities & events
| Title & Speakers | Event |
|---|---|
|
Spark Meetup NYC - Apache Spark in production
2025-12-09 · 22:30
Calling all Spark enthusiasts in NYC! We’re excited to announce another in-person Spark meetup right here in New York City! 🎉 Join us for an evening packed with technical talks, real-world insights, and great conversations about different ways to deploy Spark. We are bringing together top Spark experts and real-world practitioners to share best practices and network with the community shaping the future of big data. Space is limited, so RSVP soon to grab your spot: https://luma.com/1ciohwxl?tk=gFeBc0
5:30 PM – Mingling, name tags, pizza, and beer
6:00 PM – Meetup begins
• Kickoff, intros, and logistics
• Preet Mehta, Manager Healthcare Analytics at EXL – "Implementing Medallion Architecture and Unified Data Quality Framework in Databricks for Healthcare data."
• Alex Merced, Head of DevRel at Dremio – "Apache Iceberg Spark Procedures: Simplifying Table Management and Optimization."
7:00 PM – Panel: Spark in production (Databricks, EMR, K8s, ...)
7:30 PM – Networking and mingling
8:00 PM – Wrap-up |
Spark Meetup NYC - Apache Spark in production
|
|
Apache Polaris: The Definitive Guide
2025-09-17
Revolutionize your understanding of modern data management with Apache Polaris (incubating), the open source catalog designed for Apache Iceberg, the industry-standard table format for data lakehouses. This comprehensive guide takes you on a journey through the intricacies of Apache Iceberg data lakehouses, highlighting the pivotal role of Iceberg catalogs. Authors Alex Merced, Andrew Madson, and Tomer Shiran explore Apache Polaris's architecture and features in detail, equipping you with the knowledge needed to leverage its full potential. Data engineers, data architects, data scientists, and data analysts will learn how to integrate Apache Polaris with popular data tools like Apache Spark, Snowflake, and Dremio to enhance data management capabilities, optimize workflows, and secure datasets.
• Get a comprehensive introduction to Iceberg data lakehouses
• Understand how catalogs facilitate efficient data management and querying in Iceberg
• Explore Apache Polaris's unique architecture and its powerful features
• Deploy Apache Polaris locally, and deploy managed Apache Polaris from Snowflake and Dremio
• Perform basic table operations on Apache Spark, Snowflake, and Dremio |
O'Reilly Data Engineering Books
|
|
DataLakehouseHub.com Informal Data Lakehouse Social NYC
2025-07-21 · 22:00
Please join us for this informal time to hang out with data professionals in NYC. Join Alex Merced, creator of DataLakehouseHub.com, Head of DevRel at Dremio, and co-author of O'Reilly books "Apache Iceberg: The Definitive Guide" and "Apache Polaris: The Definitive Guide." Alex will be at a TBD location in midtown Manhattan on Monday, July 21st from 6:00 to 8:00 to discuss data and data lakehouses in an informal, casual setting. Get a drink, talk about the latest data tech, and relax! To find out the location for this event, join the DataLakehouseHub Slack Workspace and post in the confirmation thread for the event in the #meetup-nyc channel.
Invitation link: https://join.slack.com/t/thedatalakehousehub/shared_invite/zt-274yc8sza-mI2zhCW8LGkOh1uxuf8T5Q
RSVP today! |
DataLakehouseHub.com Informal Data Lakehouse Social NYC
|
|
Iceberg Ahead! Navigating Modern Data Lakes with Streaming Power
2025-05-28 · 18:30
Join us for a live, interactive session with Alex Merced, author of Apache Iceberg: The Definitive Guide and Senior Developer Advocate at Dremio. In this power-packed session, we’ll explore:
By the end of this session, you’ll walk away with:
✅ A clear understanding of how Iceberg fits into modern data stacks
✅ Insights on streaming vs. batch in lakehouse architecture
✅ Answers to your burning questions: Alex will be taking live Q&A
📅 Don’t miss this chance to learn from one of the leading voices in the data engineering space!
YouTube live stream link: https://www.youtube.com/watch?v=piLXA5NA4l0
If you are interested in hosting or speaking at a meetup, please email [email protected] |
Iceberg Ahead! Navigating Modern Data Lakes with Streaming Power
|
|
Join us at Akamai’s office from 6-9pm for refreshments and a few presentations about the Iceberg Lakehouse. Presentations include:
———
Architecting an Iceberg Lakehouse
From Alex Merced, Head of DevRel, Dremio
Building a production-grade Iceberg lakehouse takes more than just a table format; it requires a full architectural blueprint. This talk takes a layered view of the modern lakehouse stack, walking through the essential components: cloud object storage, data ingestion pipelines, catalog and governance layers, federation and integration points, and query and consumption engines. We'll explore how Iceberg fits into each layer, how to make key design decisions, and what best practices have emerged for assembling a scalable, open data platform. Whether you're building from scratch or maturing an existing stack, this session is your architectural guide to making Iceberg work in the real world.
———
Building Egnatia: How We’re Powering Next-Gen Data at Akamai with Apache Iceberg
From Endi Caushi, Senior Software Engineer, Akamai
At Akamai, we’ve embarked on a journey to modernize our data infrastructure with Apache Iceberg at its core. In this session, we’ll share how we’re building Egnatia, our internal data platform designed to be scalable, efficient, and cloud-native, empowering teams across the company to work more effectively with data. We’ll talk about why we chose Iceberg to underpin our lakehouse architecture, and how it’s helping us solve critical challenges like data consistency, delayed processing, and rigid schemas. From replacing legacy Perl-based pipelines to enabling real-time backfilling, Egnatia is transforming how we ingest, manage, and analyze data.
We’ll also highlight the technologies surrounding Iceberg in our ecosystem, including DuckDB for preprocessing, Argo Workflows for orchestration, Volcano as a batch scheduler, Hive Metastore as a data catalog, and Trino for interactive queries, as well as how we’re running this stack on Kubernetes and S3-compatible object storage. We’ll close with a behind-the-scenes look at our roadmap: how we’re evolving the Egnatia platform to scale further, improve reliability and observability, and accelerate data delivery across the organization with greater accuracy and efficiency.
———
Amazon S3
From Sean Sullivan, Staff Engineer, Grubhub
Modern systems need to manage an increasing amount of data. Amazon's Simple Storage Service (S3) is an important building block for large data applications. In this talk, we will share our experiences and lessons learned from using S3 in Java-based systems. We'll cover technical challenges that Java developers face when writing S3 code with the AWS SDK. We will discuss design decisions, application dependencies, and error handling. We'll share mistakes that we made and bugs that we encountered.
———
Register today, and make sure to include your full name with your registration; an ID will be needed to enter the venue. Venue details will be sent to all registrants on this meetup.com event. |
Boston Lakehouse Meetup: Iceberg in Action (LIVE - Dremio/Akamai - Cambridge, MA)
|
|
Data Engineering Meetup - Data & AI
2024-11-21 · 18:00
Welcome to the new edition of Data Engineering London on Data & AI! Join us for the fifth edition of the Data Engineering meetup with a range of talks looking at data & AI. You'll have the chance to network and meet fellow data engineers (and other data enthusiasts)!
When?
18:00 - 18:30 Networking with food and drinks from Mastercard
18:30 - 19:45 Talks
19:45 - 20:30 More networking
Where? Mastercard offices (see address)
Speakers and Talks:
1. Building resilient pipelines for MLOps - Lex Avstreikh (Head of Strategy @ Hopsworks)
2. GenAI be your Waze: Navigating the new way of work - Chloé Roumengas (AI Product Manager @ Theodo)
3. Apache Iceberg, Lakehouses and Their Role in AI - Alex Merced (Senior Tech Evangelist @ Dremio)
If you have a topic you're passionate about and wish to see discussed, let us know! We're always looking for more talks for our future events. Places are limited, so make sure you register! |
Data Engineering Meetup - Data & AI
|
|
We're meeting at Criteo on Wednesday, November 20, 2024, from 6:30 PM for the next Modern Data Stack meetup, on the theme of the Composable Data Platform and one of its essential components: Apache Iceberg. We thank Dremio and Criteo, the companies sponsoring this meetup.
👉 First session at 7:00 PM: The Apache Software Foundation, with a close look at its data solutions and in particular Apache Iceberg and Apache Polaris (incubating): today and tomorrow
Jean-Baptiste Onofré, Board Member of The Apache Software Foundation, takes us behind the scenes of the foundation and talks about the trending projects on the data scene. From ingestion to data visualization, by way of storage and streaming, we will discover the important projects and the new gems! After this "Apache" overview, we zoom in on Apache Iceberg in the context of a lakehouse architecture, covering the changes expected in the V3 specification and in the REST protocol. This will be an opportunity to present Apache Polaris (incubating) as an implementation of the REST protocol: its benefits and the features that address new use cases.
👉 We continue around 7:30 PM with a session in English: Apache Iceberg REST Catalog: Making Catalog Interoperability Happen
Alex Merced, Dremio Tech Evangelist, will explore the impact of the Apache Iceberg REST Catalog specification, detailing how it fosters greater compatibility and interoperability with various tools across the data ecosystem. Attendees will understand the challenges associated with disparate catalog systems and how the REST Catalog Interface addresses these issues. By standardizing catalog interactions, the REST Catalog specification enhances the robustness and flexibility of the Iceberg ecosystem, enabling integration and management of diverse data sources. Alex will also discuss real-world applications and best practices for leveraging the REST Catalog to optimize data workflows and improve operational efficiency. These talks are a must-attend for data engineers, architects, and practitioners eager to drive forward the capabilities of their composable data platforms.
👉 From 8:00 PM to 9:00 PM: drinks and fireside chats about data (thanks to our sponsors Dremio and CRITEO).
Information: stephane (at) datanosco.com
If you plan to join the party, please like/comment the following LinkedIn post: https://www.linkedin.com/posts/modern-data-stack-france_data-lakehouse-dataengineers-activity-7255158211882708992-r3Hl/
Please note: registration closes 24 hours before the event. The list of registrants will be shared with Criteo. Remember to update your meetup profile with your first AND last name. An ID card will be requested at the entrance. |
Composable Data Platform, zoom sur Apache Iceberg et Apache Polaris, chez CRITEO
|
|
Open Source Data Deep Dive: London
2024-11-19 · 19:00
REGISTER HERE FOR LOCATION: https://lu.ma/2etm1zve Come hang out at the OSS Data Deep Dive in London, where we'll explore some of the coolest and most innovative use cases of the Iceberg ecosystem. Whether you're new to Iceberg and data lakehouses or a seasoned data engineer, discover how these tools can boost your data projects. Plus, there'll be plenty of networking, cool swag, and delicious food. Hope to see you there! Agenda:
|
Open Source Data Deep Dive: London
|
|
Open Source Data Deep Dives
2024-11-19 · 18:00
RSVP HERE: https://lu.ma/2etm1zve Join us at the OSS Data Deep Dive in London for an in-depth workshop on Data Engineering Best Practices. This event is perfect for professionals who are keen to enhance their skills in handling big data efficiently. Plus, there'll be plenty of networking, cool swag, and delicious food. Hope to see you there. Agenda:
Our expert speakers will delve into topics like data modeling, ETL processes, data pipelines, and database architecture. Whether you are a seasoned data engineer or just starting in the field, this workshop will provide valuable insights and practical tips to streamline your data engineering projects. Don't miss out on this opportunity to network with fellow enthusiasts and take your data engineering skills to the next level! |
Open Source Data Deep Dives
|
|
Apache Iceberg REST Catalog: Making Catalog Interoperability Happen
2024-09-19 · 01:00
Alex Merced – Developer Advocate @ Dremio
apache iceberg
rest catalog
|
|
|
Real-time and Offline Graph Analytics on Apache Iceberg
2024-09-19 · 01:00
Weimo Liu – CEO @ PuppyGraph
apache iceberg
|
|
|
Open Source and the Data Lakehouse - Alex Merced
2024-06-27 · 16:00
Presentation Title: Open Source and the Data Lakehouse
Description: The open data lakehouse offers those frustrated with the costs and complex pipelines of traditional warehouses an alternative that combines performance with affordability and simpler pipelines. This talk covers the technologies that make the open data lakehouse possible.
In this talk we will learn:
• What is a data lakehouse?
• What are the components of a data lakehouse?
• What is Apache Arrow?
• What is Apache Iceberg?
• What is Project Nessie? |
Open Source and the Data Lakehouse - Alex Merced
|
|
Apache Iceberg: The Definitive Guide
2024-05-02
Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool, a cost-prohibitive process for making warehouse features available to all of your data. The lack of flexibility with these patterns requires you to lock into a set of priority tools and formats, which creates data silos and data drift. This practical book shows you a better way. Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this high-performance open source format. Authors Tomer Shiran, Jason Hughes, and Alex Merced from Dremio show you how to get started with Iceberg.
With this book, you'll learn:
• The architecture of Apache Iceberg tables
• What happens under the hood when you perform operations on Iceberg tables
• How to further optimize Iceberg tables for maximum performance
• How to use Iceberg with popular data engines such as Apache Spark, Apache Flink, and Dremio
Discover why Apache Iceberg is a foundational technology for implementing an open data lakehouse. |
O'Reilly Data Engineering Books
|
|
Version Your Data Lakehouse Like Your Software With Nessie
2024-03-10 · 15:45
Alex Merced – Developer Advocate @ Dremio, Tobias Macey – host
Summary
Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. The primary purpose of the catalog is to inform the query engine of what data exists and where, but the Nessie project aims to go beyond that simple utility. In this episode Alex Merced explains how the branching and merging functionality in Nessie allows you to use the same versioning semantics for your data lakehouse that you are used to from Git.
Announcements
• Hello and welcome to the Data Engineering Podcast, the show about modern data management.
• Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!
• Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
• Join us at the top event for the global data community, Data Council Austin. From March 26-28, 2024, we'll play host to hundreds of attendees, 100 top speakers and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data and sharing their insights and learnings through deeply technical talks. As a listener to the Data Engineering Podcast you can get a special discount off regular priced and late bird tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit dataengineeringpodcast.com/data-council and use code dataengpod20 to register today!
• Your host is Tobias Macey and today I'm interviewing Alex Merced, developer advocate at Dremio and co-author of the upcoming O'Reilly book "Apache Iceberg: The Definitive Guide", about Nessie, a Git-like versioned catalog for data lakes using Apache Iceberg.
Interview
• Introduction
• How did you get involved in the area of data management?
• Can you describe what Nessie is and the story behind it?
• What are the core problems/complexities that Nessie is designed to solve?
• The closest analogue to Nessie that I've seen in the ecosystem is LakeFS. What are the features that would lead someone to choose one or the other for a given use case?
• Why would someone choose Nessie over native table-level branching in the Apache Iceberg spec?
• How do the versioning capabilities compare to/augment the data versioning in Iceberg?
• What are some of the sources of, and challenges in resolving, merge conflicts between table branches?
• Can you describe the architecture of Nessie?
• How have the design and goals of the project changed since it was first created?
• What is involved |
Data Engineering Podcast |