Search – talk-data.com

Title & Speakers	Event
Event Spark Meetup NYC - Present and future! 2025-06-17
Compaction with Spark: The Fine Print 2025-06-17 · 21:30 Gilad Tal – Co-founder & CTO @ Dualbird Spark
The Future of Big Data Engines 2025-06-17 · 21:30 Meni Shmueli – Co-founder & CEO @ DataFlint Spark Big Data
Panel: Spark & AI – Where Is This Going? 2025-06-17 · 21:30 Spark AI

Fixing small files performance issues in Apache Spark, using DataFlint 2024-04-10 · 15:00 One of the big challenges in big data is interacting with the storage layer, especially in the data lake, where we are the ones who manage the files and partitions. One of the most common performance problems in data lakes is working with small files. In this lecture we will learn about: * Why it's important to read and write files at a best-practice size * How Apache Spark interacts with files under the hood, and how this relates to Spark tasks * How we can easily detect and fix small-files problems (using the open source library DataFlint) * How to handle small-files problems when using storage formats such as Delta Lake and Iceberg. Lecturer: Meni Shmueli, founder and author of DataFlint (https://github.com/dataflint/spark). Ex-81 unit, ex-ZipRecruiter and ex-Granulate. Passionate about everything related to big data, and about working with data teams to solve their day-to-day challenges. Over the years he has helped dozens of companies improve performance, debug issues and increase dev velocity in the big data world, and he is currently working on performance observability in big data with DataFlint.	Fixing small files performance issues in Apache Spark, using DataFlint


People (3 results)

Activities & events