talk-data.com

Matei Zaharia

Speaker

17 talks

Chief Technologist, Databricks

Matei Zaharia is the CTO and co-founder of Databricks and an Associate Professor of Computer Science at UC Berkeley. He initiated the Apache Spark project during his PhD at UC Berkeley in 2009 and has contributed to MLflow, Delta Lake, and DBRX. His recent research focuses on combining large language models with external data sources to improve efficiency and result quality. His work has been recognized with the 2014 ACM Doctoral Dissertation Award and the U.S. Presidential Early Career Award for Scientists and Engineers (PECASE).

Bio from: Databricks DATA + AI Summit 2023

Talks & appearances

17 activities · Newest first

Founder discussion: Matei on UC, Data Intelligence and AI Governance

Matei is a legend of open source: he started the Apache Spark project in 2009, co-founded Databricks, and worked on other widely used data and AI software, including MLflow, Delta Lake, and Dolly. His most recent research is about combining large language models (LLMs) with external data sources, such as search systems, and improving their efficiency and result quality. This will be a conversation covering the latest and greatest of UC, Data Intelligence, AI Governance, and more.

keynote
with Bilal Aslam (Databricks), Michael Armbrust (Databricks), Arsalan Tavakoli-Shiraji (Databricks), Miranda Luna (Databricks), Michael Flynn (Rivian), Ken Wong (Databricks), Ali Ghodsi (Databricks), Keegan Dubbs (Databricks), Matei Zaharia (Databricks), Michelle Leon (Databricks), Michael Piatek (Databricks)

Discover the latest advances on the Data Intelligence Platform and hear from the companies who are already enjoying success.

talk
with Jonathan Hsieh (LanceDB), Cathy Yin (Databricks), Andrew Shieh (Databricks), Ziyi Yang (Databricks), Andy Konwinski (Databricks), Denny Lee (Databricks), Asfandyar Qureshi (Databricks), Yuki Watanabe (Databricks), Brandon Cui (Databricks), Andrew Drozdov (Databricks), Anand Kannappan (Patronus AI), Harsh Panchal (Databricks), Tomu Hirata (Databricks), Daya Khudia (Databricks), Jose Javier Gonzalez (Databricks), Jasmine Collins (Databricks), Maheswaran Sathiamoorthy (Bespoke Labs), Jonathan Chang (Databricks), Matei Zaharia (Databricks), Alexander Trott (Databricks), Tejas Sundaresan (Databricks), Pallavi Koppol (Databricks), Jonathan Frankle (Databricks), Erich Elsen (Databricks), Ivan Zhou (Databricks), Davis Blalock, Gayathri Murali (Meta)

https://bit.ly/devconnectdais

Announcing Databricks Clean Rooms with Live Demo. Presented by Matei Zaharia and Darshana Sivakumar

Speakers:
- Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks
- Darshana Sivakumar, Staff Product Manager, Databricks

Organizations are looking for ways to securely exchange their data and collaborate with external partners to foster data-driven innovations. In the past, organizations had limited data sharing solutions, relinquishing control over how their sensitive data was shared with partners and gaining little to no visibility into how their data was consumed. This created a risk of data misuse and data privacy breaches. Customers who tried other clean room solutions have told us these solutions are limited and do not meet their needs, as they often require all parties to copy their data into the same platform, do not allow sophisticated analysis beyond basic SQL queries, and offer limited visibility or control over their data.

Organizations need an open, flexible, and privacy-safe way to collaborate on data, and Databricks Clean Rooms meets these critical needs.

See a demo of Databricks Clean Rooms, now in Public Preview on AWS + Azure

Data Sharing and Cross-Organization Collaboration. Presented by Matei Zaharia at Data + AI Summit

Speaker: Matei Zaharia, Original Creator of Apache Spark™ and MLflow; Chief Technologist, Databricks

Summary: Data sharing and collaboration are important aspects of the data space. Matei Zaharia explains the evolution of the Databricks data platform to facilitate data sharing and collaboration for customers and their partners.

Delta Sharing allows you to share parts of your table with third parties authorized to view them. Over 16,000 data recipients use Delta Sharing, and 40% are not on Databricks, a testament to the protocol's open nature.

Databricks Marketplace has been growing rapidly and now has over 2,000 data listings, making it one of the largest data marketplaces available. New Marketplace partners include T-Mobile, Tableau, Atlassian, Epsilon, Shutterstock and more.

To learn more about Delta Sharing features and the expansion of the partner sharing ecosystem, see the recent blog: https://www.databricks.com/blog/whats-new-data-sharing-and-collaboration

Data + AI Summit 2024 - Keynote Day 2 - Full
video
with Bilal Aslam (Databricks), Yejin Choi (University of Washington; AI2), Darshana Sivakumar (Databricks), Ryan Blue (Tabular), Zeashan Pappa (Databricks), Ali Ghodsi (Databricks), Reynold Xin (Databricks), Matei Zaharia (Databricks), Hannes Mühleisen (DuckDB Labs), Alexander Booth (Texas Rangers Baseball Club), Tareef Kawaf (Posit Software, PBC)

Speakers:
- Alexander Booth, Asst Director of Research & Development, Texas Rangers
- Ali Ghodsi, Co-Founder and CEO, Databricks
- Bilal Aslam, Sr. Director of Product Management, Databricks
- Darshana Sivakumar, Staff Product Manager, Databricks
- Hannes Mühleisen, Creator of DuckDB, DuckDB Labs
- Matei Zaharia, Chief Technology Officer and Co-Founder, Databricks
- Reynold Xin, Chief Architect and Co-Founder, Databricks
- Ryan Blue, CEO, Tabular
- Tareef Kawaf, President, Posit Software, PBC
- Yejin Choi, Sr Research Director Commonsense AI, AI2, University of Washington
- Zeashan Pappa, Staff Product Manager, Databricks

About Databricks Databricks is the Data and AI company. More than 10,000 organizations worldwide — including Block, Comcast, Conde Nast, Rivian, and Shell, and over 60% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow.

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data… Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Live from the Lakehouse: AI governance, Unity Catalog, Ethics in AI, and Industry Perspectives

Hear from three guests. First, Matei Zaharia (co-founder and Chief Technologist, Databricks) on AI governance and Unity Catalog. Second, Scott Starbird (General Counsel, Public Affairs and Strategic Partnerships, Databricks) on ethics in AI. Third, Bryan Saftler (Industry Solutions Marketing Director, Databricks) on industry perspectives and solution accelerators. Hosted by Ari Kaplan (Head of Evangelism, Databricks) and Pearl Ubaru (Sr Technical Marketing Engineer, Databricks).

Data + AI Summit Keynote Thursday
video
with Michael Armbrust (Databricks), Marc Andreessen (Andreessen Horowitz), Arsalan, Jitendra Malik (University of California, Berkeley), Eric Schmidt (Google (Alphabet)), Hannes Mühleisen (DuckDB Labs), Ali Ghodsi (Databricks), Reynold Xin (Databricks), Matei Zaharia (Databricks), Lin Qiao (Fireworks AI), Harrison Chase (LangChain)

0:00 Open
6:08 Ali Ghodsi & Marc Andreessen
32:06 Reynold Xin
48:09 Michael Armbrust
1:00:00 Matei Zaharia & Panel
1:27:10 Hannes Mühleisen
1:37:43 Harrison Chase
1:49:15 Lin Qiao
2:05:03 Jitendra Malik
2:21:15 Arsalan & Eric Schmidt

Data + AI Summit Keynote Wednesday
video
with Larry Feinsmith (JP Morgan Chase), Kasey Uhlenhuth (Databricks), Zaheera Valani (Databricks), Wassym Bensaid (Rivian), Satya Nadella (Microsoft), Weston Hutchins (Databricks), Ali Ghodsi (Databricks), Reynold Xin (Databricks), Sai Pradhan Ravuru (JetBlue), Matei Zaharia (Databricks), Caryl Yuhas (Databricks), Patrick Wendell (Databricks), Naveen Rao (Databricks)

0:00 Opener
01:18 Ali Ghodsi, Databricks
06:53 Satya Nadella, Microsoft
15:50 Ali Ghodsi, Databricks
20:40 Larry Feinsmith, JP Morgan Chase
41:09 Ali Ghodsi, Databricks
45:07 Matei Zaharia, Databricks
52:31 Weston Hutchins, Databricks
58:36 Ali Ghodsi, Databricks
1:02:05 Naveen Rao, MosaicML
1:12:15 Patrick Wendell, Databricks
1:27:57 Kasey Uhlenhuth, Databricks
1:39:18 Sai Pradhan Ravuru, JetBlue
01:47 Ali Ghodsi, Databricks
1:49:20 Reynold Xin, Databricks
2:05:07 Ali Ghodsi, Databricks
2:09:26 Matei Zaharia, Databricks
2:17:24 Caryl Yuhas, Databricks
2:24:12 Zaheera Valani, Databricks
2:39:55 Wassym Bensaid, Rivian

Data Governance and Sharing on Lakehouse | Matei Zaharia | Keynote Data + AI Summit 2022

Data + AI Summit Keynote talk from Matei Zaharia on Data Governance and Sharing on Lakehouse

Welcome & "Destination Lakehouse" | Ali Ghodsi | Keynote Data + AI Summit 2022

Join the Day 1 keynote to hear from Databricks co-founders and original creators of Apache Spark and Delta Lake, Ali Ghodsi, Matei Zaharia, and Reynold Xin, on how Databricks and the open source community are taking on the biggest challenges in data. The talks will address the latest updates on the Apache Spark and Delta Lake projects, the evolution of data lakehouse architecture, and how companies like Adobe and Amgen are using lakehouse architecture to advance their data goals.

Day 1 Morning Keynote | Data + AI Summit 2022

- Welcome & "Destination Lakehouse" | Ali Ghodsi
- Apache Spark Community Update | Reynold Xin
- Streaming Lakehouse | Karthik Ramasamy
- Delta Lake | Michael Armbrust
- How Adobe migrated to a unified and open data Lakehouse to deliver personalization at unprecedented scale | Dave Weinstein
- Data Governance and Sharing on Lakehouse | Matei Zaharia
- Analytics Engineering and the Great Convergence | Tristan Handy
- Data Warehousing | Shant Hovespian
- Unlocking the power of data, AI & analytics: Amgen’s journey to the Lakehouse | Kerby Johnson

Get insights on how to launch a successful lakehouse architecture in Rise of the Data Lakehouse by Bill Inmon, the father of the data warehouse. Download the ebook: https://dbricks.co/3ER9Y0K

Spark: The Definitive Guide

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals. You'll explore the basic operations and common functions of Spark's structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark's scalable machine-learning library.

- Get a gentle overview of big data and Spark
- Learn about DataFrames, SQL, and Datasets (Spark's core APIs) through worked examples
- Dive into Spark's low-level APIs, RDDs, and execution of SQL and DataFrames
- Understand how Spark runs on a cluster
- Debug, monitor, and tune Spark clusters and applications
- Learn the power of Structured Streaming, Spark's stream-processing engine
- Learn how you can apply MLlib to a variety of problems, including classification or recommendation

Learning Spark

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.