talk-data.com

YouTube 2022-07-19 at 16:48

Improving Interactive Querying Experience on Spark SQL

Description

Pinterest is a data-driven company, and interactive querying on hundreds of petabytes of data is a common and important function there. Interactive querying has different requirements and challenges from batch querying.

In this talk, we will discuss the architectural alternatives one can choose from to perform interactive querying with Spark SQL. By weighing the trade-offs of those architectures against the requirements of interactive querying, we will explain our design choice. We will share the enhancements we made to open source projects, including Apache Spark, Apache Livy and Dr. Elephant, along with the in-house technologies we built to improve the interactive querying experience at Pinterest. These enhancements include DDL query speed-ups, Spark session caching, Spark session sharing, improved Apache YARN diagnostic messages, query failure handling and tuning recommendations. We will also discuss some challenges we faced along the way and the future improvements we are working on.
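The abstract does not say how Spark session caching is implemented, so the following is only a minimal sketch of the general idea: keep an already-started session per key (for example, per user) and reuse it for the next query instead of paying the startup cost again. Everything here is an assumption made for illustration: `SessionCache`, `create_session`, `ttl_seconds` and the per-user key are hypothetical names, not Pinterest's or Livy's API. In a Livy-based setup, `create_session` would presumably wrap a call to Livy's `POST /sessions` endpoint.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Hashable


@dataclass
class CachedSession:
    session_id: int
    last_used: float


class SessionCache:
    """Hypothetical cache: reuse a started session per key until it goes idle."""

    def __init__(
        self,
        create_session: Callable[[], int],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._create = create_session  # assumed to start a session and return its id
        self._ttl = ttl_seconds        # idle time after which a session is not reused
        self._clock = clock            # injectable so the behaviour can be tested
        self._sessions: Dict[Hashable, CachedSession] = {}

    def get(self, key: Hashable) -> int:
        """Return a cached session id for `key`, creating one if needed."""
        now = self._clock()
        entry = self._sessions.get(key)
        if entry is not None and now - entry.last_used <= self._ttl:
            entry.last_used = now      # cache hit: refresh the idle timer
            return entry.session_id
        session_id = self._create()    # cache miss or expired: start a new session
        self._sessions[key] = CachedSession(session_id, now)
        return session_id
```

With a cache like this, consecutive queries from the same user inside the idle window would reuse one session, and a new session would only be started after the window lapses. A production version would also need to stop expired sessions and handle sessions that died on the cluster, which this sketch leaves out.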
