talk-data.com

People (132 results)

Showing 10 results

Activities & events

Title & Speakers Event
Ryan Blue – guest @ Netflix
AI Council 2025
Ryan Blue – Creator of Apache Iceberg and co-founder @ Tabular , Ali Ghodsi – CEO @ Databricks

Speakers: Ali Ghodsi, Co-founder and CEO, Databricks; Ryan Blue, Creator of Apache Iceberg and co-founder of Tabular

AI/ML Data Lakehouse Databricks Iceberg
Bilal Aslam – Sr. Director of Product Management @ Databricks , Yejin Choi – Professor and MacArthur Fellow; Senior Research Director for Commonsense AI at AI2 @ University of Washington; AI2 , Darshana Sivakumar – Staff Product Manager @ Databricks , Ryan Blue – Creator of Apache Iceberg and co-founder @ Tabular , Zeashan Pappa – Staff Product Manager @ Databricks , Ali Ghodsi – CEO @ Databricks , Reynold Xin – Co-founder and Chief Architect @ Databricks , Matei Zaharia – Chief Technologist @ Databricks , Hannes Mühleisen – Creator of DuckDB @ DuckDB Labs , Alexander Booth – Assistant Director of R&D @ Texas Rangers Baseball Club , Tareef Kawaf – President @ Posit Software, PBC

Speakers:
- Alexander Booth, Asst Director of Research & Development, Texas Rangers
- Ali Ghodsi, Co-Founder and CEO, Databricks
- Bilal Aslam, Sr. Director of Product Management, Databricks
- Darshana Sivakumar, Staff Product Manager, Databricks
- Hannes Mühleisen, Creator of DuckDB, DuckDB Labs
- Matei Zaharia, Chief Technology Officer and Co-Founder, Databricks
- Reynold Xin, Chief Architect and Co-Founder, Databricks
- Ryan Blue, CEO, Tabular
- Tareef Kawaf, President, Posit Software, PBC
- Yejin Choi, Sr Research Director Commonsense AI, AI2, University of Washington
- Zeashan Pappa, Staff Product Manager, Databricks

About Databricks

Databricks is the Data and AI company. More than 10,000 organizations worldwide (including Block, Comcast, Conde Nast, Rivian, and Shell, and over 60% of the Fortune 500) rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow.

Connect with us:
- Website: https://databricks.com
- Twitter: https://twitter.com/databricks
- LinkedIn: https://www.linkedin.com/company/data…
- Instagram: https://www.instagram.com/databricksinc
- Facebook: https://www.facebook.com/databricksinc

AI/ML Data Lakehouse Databricks Delta DuckDB Spark

You may join the interview via any of the following links:
https://www.youtube.com/watch?v=xjfHgLb9NH8
https://www.linkedin.com/events/7173033569857093632/comments/
https://www.facebook.com/events/417837317590765

If you want to receive the recording and other useful links after the event, please leave your contact details here: https://hubs.li/Q02nXnT_0

Topic: "Open Table Formats Reshaping the Data Industry: A Deep Dive"

Speaker: Ryan Blue, Co-creator of Apache Iceberg

Ryan spent the last decade working on big data infrastructure at Netflix, Cloudera, and now Tabular. He is an ASF member and a committer in the Apache Parquet, Avro, and Spark communities.

Abstract: This talk will be livestreamed on all official ODSC social media channels: LinkedIn, Facebook, YouTube, etc.

Open table formats are in the process of transforming the data industry. Take a deep dive into the unprecedented change Apache Iceberg presents by enabling data warehouses to share storage and the way it will help shape the future.

Don't miss the chance to ask your questions and share your thoughts during the live Q&A.

ODSC Links:
• Get free access to more talks/trainings like this at Ai+ Training platform: https://hubs.li/H0Zycsf0
• ODSC blog: https://opendatascience.com/
• Facebook: https://www.facebook.com/OPENDATASCI
• Twitter: https://twitter.com/_ODSC & @odsc
• LinkedIn: https://www.linkedin.com/company/open-data-science
• Slack Channel: https://hubs.li/Q02mx9-s0
• Code of conduct: https://odsc.com/code-of-conduct/

Interview "Open Table Formats Reshaping the Data Industry: A Deep Dive"

Ryan Blue – guest @ Netflix , Tobias Macey – host

Summary

Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because of their complete ownership of your data they constrain the possibilities of what data you can store and how it can be used. Projects like Apache Iceberg provide a viable alternative in the form of data lakehouses that provide the scalability and flexibility of data lakes, combined with the ease of use and performance of data warehouses. Ryan Blue helped create the Iceberg project, and in this episode he rejoins the show to discuss how it has evolved and what he is doing in his new business Tabular to make it even easier to implement and maintain.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Hey there podcast listener, are you tired of dealing with the headache that is the 'Modern Data Stack'? We feel your pain. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it: it’s all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to do its thing. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70-80% on costs. If you're fed up with the 'Modern Data Stack', give TimeXtender a try. Head over to timextender.com/dataengineering where you can do two things: watch us build a data estate in 15 minutes and start for free today.

Your host is Tobias Macey and today I'm interviewing Ryan Blue about the evolution and applications of the Iceberg table format and how he is making it more accessible at Tabular.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what Iceberg is and its position in the data lake/lakehouse ecosystem?

Since it is fundamentally a specification, how do you manage compatibility and consistency across implementations?

What are the notable changes in the Iceberg project and its role in the ecosystem since our last conversation in October of 2018?

Around the time that Iceberg was first created at Netflix, a number of alternative table formats were also being developed. What are the characteristics of Iceberg that lead teams to adopt it for their lakehouse projects?

Given the constant evolution of the various table formats it can be difficult to determine an up-to-date comparison of their features, particularly earlier in their development. What are the aspects of this problem space that make it so challenging to establish unbiased and comprehensive comparisons?

For someone who wants to manage their data in Iceberg tables, what does the implementation look like?

How does that change based on the type of query/processing engine being used?

Once a table has been created, what are the capabilities of Iceberg that help to support ongoing use and maintenance?

What are the most interesting, innovative, or unexpected ways that you have seen Iceberg used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on Iceberg/Tabular?

When is Iceberg/Tabular the wrong choice?

What do you have planned for the future of Iceberg/Tabular?

Contact Info

LinkedIn

rdblue on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the

AI/ML Cloud Computing Data Engineering Data Lake Data Lakehouse Data Management GitHub Iceberg Modern Data Stack Python
Data Engineering Podcast
Ryan Wade – Senior Solution Consultant @ Blue Granite , Mico Yuk – Co-Founder @ Data Storytelling Academy

Ryan Wade joins us on AOF today to talk about how to use advanced analytics in your organization! Ryan has been in the analytics game for the last 20 years and is now a Senior Solution Consultant at Blue Granite, based in Indianapolis, Indiana. He recently authored the amazing must-read book, Advanced Analytics in Power BI with R and Python, and in today's chat, we get to hear all about why he wrote the book, who it is for and how you can use it to accelerate your data journey! I met Ryan while speaking at a few conferences and was always impressed with his knowledge and great sense of humor! A professional football player turned data scientist, Ryan has a passion for breaking down advanced analytics in a way anyone can understand. Whether you're already using advanced analytics or researching how to get started, Ryan's knowledge on the topic will help you. Tune in with a pencil and paper in hand!

In this episode, you'll learn:
[0:09:22] The rise of the R and Python programming languages in the data world.
[0:16:44] The necessary, well-thought-out preparatory steps for a project utilizing advanced analytics.
[0:19:39] Why attention-grabbing visuals are not the most important part of data storytelling!
[0:23:13] Creating a sufficient team for data analytics and the vital roles of the database administrator, active directory administrator, and more!
[0:39:07] Client conversations around shortcomings and hurdles in advanced analytics.

For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/82

Enjoyed the Show? Please leave us a review on iTunes.

Analytics BI Data Analytics Power BI Python
Analytics on Fire
Ryan Blue – guest @ Netflix , Tobias Macey – host

Summary

With the growth of the Hadoop ecosystem came a proliferation of implementations for the Hive table format. Unfortunately, with no formal specification, each project works slightly differently, which increases the difficulty of integration across systems. The Hive format is also built with the assumptions of a local filesystem, which results in painful edge cases when leveraging cloud object storage for a data lake. In this episode Ryan Blue explains how his work on the Iceberg table format specification and reference implementation has allowed Netflix to improve the performance and simplify operations for their S3 data lake. This is a highly detailed and technical exploration of how a well-engineered metadata layer can improve the speed, accuracy, and utility of large scale, multi-tenant, cloud-native data platforms.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Your host is Tobias Macey and today I’m interviewing Ryan Blue about Iceberg, a Netflix project to implement a high performance table format for batch workloads.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by explaining what Iceberg is and the motivation for creating it?

Was the project built with open-source in mind or was it necessary to refactor it from an internal project for public use?

How has the use of Iceberg simplified your work at Netflix?

How is the reference implementation architected and how has it evolved since you first began work on it?

What is involved in deploying it to a user’s environment?

For someone who is interested in using Iceberg within their own environments, what is involved in integrating it with their existing query engine?

Is there a migration path for pre-existing tables into the Iceberg format?

How is schema evolution managed at the file level?

How do you handle files on disk that don’t contain all of the fields specified in a table definition?

One of the complicated problems in data modeling is managing table partitions. How does Iceberg help in that regard?

What are the unique challenges posed by using S3 as the basis for a data lake?

What are the benefits that outweigh the difficulties?

What have been some of the most challenging or contentious details of the specification to define?

What are some things that you have explicitly left out of the specification?

What are your long-term goals for the Iceberg specification?

Do you anticipate the reference implementation continuing to be used and maintained?

Contact Info

rdblue on GitHub

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Iceberg Reference Implementation, Iceberg Table Specification, Netflix, Hadoop, Cloudera, Avro, Parquet, Spark, S3, HDFS, Hive, ORC, S3mper, Git, Metacat, Presto, Pig, DDL (Data Definition Language), Cost-Based Optimization

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

API Avro Big Data Cloud Computing Data Engineering Data Lake Data Management Data Modelling Git GitHub Hadoop HDFS Hive Iceberg ORC Parquet Presto S3 Spark
Data Engineering Podcast
Real World XML 2003-01-15
Steven Holzner – author

Steven Holzner's friendly, easy-to-read style has turned this book (formerly known as Inside XML) into the leading reference on XML. Unlike other XML books, this one is packed with hundreds of real-world examples, fully tested and ready to use! Holzner teaches you XML like no other author can, covering every major XML topic today and detailing the ways XML is used now--connecting XML to databases (both locally and on web servers), styling XML for viewing in today's web browsers, reading and parsing XML documents in browsers, writing and using XML schemas, creating graphical XML browsers, working with the Simple Object Access Protocol (SOAP), and a great deal more. Real World XML is designed to be the standard in XML coverage--more complete, and more accessible, than any other.

"The author's approach is definitely bottom up, written in a highly personable tone. He makes efficient use of example code, which sets this book apart from many I have read in the past. His examples bring to life the code without overwhelming the reader, and he does not present any examples for which the reader has not been prepared. In addition, no prior knowledge of XML is assumed. As such, this is an excellent book for both beginners and intermediate level web designers and programmers. Experts, too, will find this book of value, due to its emphasis on real world applicability. Overall, this book will benefit all web developers and programmers, with a special emphasis on beginner and intermediate developers."--Donna A. Dulo, MS, MA, Senior Systems Engineer, U.S. Department of Defense

"This book will provide a brilliant basis for anyone wishing to keep up to speed with the new XML developments."--Mr. Andrew Madden, Department of Computer Science, University of Wales

"I found this book's strengths to be: its exhaustive specification reference for the conscientious developer; access to the official specs, which is key; the wide variety of choices provided for all aspects of XML; several alternatives provided for each editor, browser, parser, stylesheet transform engine, and programming language; and working examples that show the power of the tools used."--Jaime Ryan, Software Developer/Documentation Manager, Blue Titan Software

data data-engineering storage-formats XML Computer Science
O'Reilly Data Engineering Books