talk-data.com

Topic

Presto

distributed_sql_query_engine big_data analytics

5

tagged

Activity Trend: peak of 6 per quarter, 2020-Q1 to 2026-Q1

Activities

5 activities · Newest first

Presto On Spark: A Unified SQL Experience

Presto was originally designed to run interactive queries against data warehouses, but now it has evolved into a unified SQL engine on top of open data lake analytics for both interactive and batch workloads. However, Presto doesn't scale to very large and complex batch pipelines. Presto Unlimited was designed to address such scalability challenges but it didn’t fully solve fault tolerance, isolation, and resource management.

Spark is the tool of choice across the industry for running large-scale, complex batch ETL pipelines. This motivated the development of Presto on Spark. Presto on Spark runs Presto as a library that is submitted with spark-submit to a Spark cluster. It leverages Spark for scaling shuffle, worker execution, and resource management, and thereby eliminates any query conversion between interactive and batch use cases. The result is a performant and scalable platform with a seamless end-to-end experience for exploring and processing data.
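As a rough illustration of the "submitted with spark-submit" step, the sketch below assembles a command line for a single query. It is a minimal sketch, not the speakers' setup: the launcher class name and the `--package`, `--config`, `--catalogs`, `--catalog`, `--schema`, and `--file` options follow the Presto on Spark documentation as best recalled and should be checked against your Presto version. All file names, paths, and the master URL are hypothetical.

```python
import shlex


def build_presto_on_spark_command(
    launcher_jar,
    package_tarball,
    config_path,
    catalogs_dir,
    sql_file,
    master="spark://localhost:7077",  # hypothetical Spark master URL
    catalog="hive",
    schema="default",
):
    """Return the spark-submit argument list for one Presto on Spark query.

    Assumption: the launcher class and option names mirror the Presto
    documentation; verify them against the release you deploy.
    """
    return [
        "spark-submit",
        "--master", master,
        "--class", "com.facebook.presto.spark.launcher.PrestoSparkLauncher",
        launcher_jar,
        # Everything after the jar is passed to the Presto launcher itself.
        "--package", package_tarball,
        "--config", config_path,
        "--catalogs", catalogs_dir,
        "--catalog", catalog,
        "--schema", schema,
        "--file", sql_file,
    ]


# Hypothetical artifact names and paths, for illustration only.
cmd = build_presto_on_spark_command(
    launcher_jar="presto-spark-launcher.jar",
    package_tarball="presto-spark-package.tar.gz",
    config_path="/presto/etc/config.properties",
    catalogs_dir="/presto/etc/catalogs",
    sql_file="query.sql",
)
print(shlex.join(cmd))
```

The same SQL file an analyst ran interactively against Presto is the one handed to the launcher here, which is the point of the "no query conversion" claim.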

Many analysts at Intuit use Presto to explore data in the Data Lake/S3 and use Spark for batch processing. These analysts previously spent several hours converting exploration SQL written for Presto into Spark SQL in order to operationalize and schedule it as data pipelines. Presto on Spark is now used by analysts at Intuit to run thousands of critical jobs. Because no query conversion is required, analysts are more productive and can deliver insights at high speed.

Benefits from the session: attendees will learn about the Presto on Spark architecture, when to use Spark's execution engine with Presto, and how Intuit runs thousands of Presto jobs daily on the Databricks platform, all of which they can apply to their own work.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Diving into Delta Lake 2.0

The Delta ecosystem rapidly expanded with the release of Delta Lake 1.2, which included integrations with Apache Spark™, Apache Flink, Presto, and Trino, as well as features such as OPTIMIZE, data skipping using column statistics, restore APIs, S3 multi-cluster writes, and more.
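To make "data skipping using column statistics" concrete, here is a toy sketch of the general idea: if each data file records the minimum and maximum value of a column, a range filter can rule out files without reading them. This is not Delta Lake's implementation; the `FileStats` type, the `files_to_scan` helper, and the file names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for a single integer column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lower, upper):
    """Return paths of files whose [min, max] range overlaps [lower, upper].

    A file can be skipped when its value range lies entirely outside the
    filter range, because no row in it can satisfy the predicate.
    """
    return [
        f.path
        for f in files
        if f.max_value >= lower and f.min_value <= upper
    ]


files = [
    FileStats("part-000.parquet", 1, 100),
    FileStats("part-001.parquet", 101, 200),
    FileStats("part-002.parquet", 201, 300),
]

# Filter equivalent to: WHERE col BETWEEN 150 AND 220
selected = files_to_scan(files, 150, 220)
print(selected)  # ['part-001.parquet', 'part-002.parquet']
```

Statistics of this kind prune more files when similar values are stored close together, which is why data layout features tend to be discussed alongside data skipping.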

Join this session to learn how the wider Delta community collaborated to deliver these features and integrations, and to hear about the current roadmap. This will be an interactive session, so come prepared with your questions; we should have answers!

Delta Lake 2.0 Overview

After three years of hard work by the Delta community, we are proud to announce the release of Delta Lake 2.0. Completing the work to open-source all of Delta Lake while tens of thousands of organizations were running it in production was no small feat, and we have the ever-expanding Delta community to thank!

Join this session to learn how the wider Delta community collaborated to bring these features and integrations together. This includes:

Integrations with Apache Spark™, Apache Flink, Apache Pulsar, Presto, Trino, and more.

Features such as OPTIMIZE ZORDER, data skipping using column statistics, S3 multi-cluster writes, Change Data Feed, and more.

Language APIs including Rust, Python, Ruby, Go, Scala, and Java.

Presto 101: An Introduction to Open Source Presto

Presto is a widely adopted distributed SQL engine for data lake analytics. With Presto, you can perform ad hoc querying of data in place, which helps solve challenges around time to discover and the amount of time it takes to do ad hoc analysis. Additionally, new features like the disaggregated coordinator, Presto-on-Spark, scan optimizations, a reusable native engine, and a Pinot connector enable added benefits around performance, scale, and ecosystem.

In this session, Philip and Rohan will introduce the Presto technology and share why it’s becoming so popular. Companies like Facebook, Uber, Twitter, Alibaba, and many more use Presto for interactive ad hoc queries, reporting and dashboarding, data lake analytics, and more. We’ll also show a quick demo of getting Presto running in AWS.

Interactive BI Analytics with Presto by Łukasz Osipiuk and Karol Sobczak

Big Data Europe, onsite and online, 22-25 November 2022. Learn more about the conference: https://bit.ly/3BlUk9q

Join our next Big Data Europe conference on 22-25 November 2022, where you will be able to learn from global experts giving technical talks and hands-on workshops in the fields of Big Data, High Load, Data Science, Machine Learning, and AI. This time, the conference will be held in a hybrid setting, allowing you to attend workshops and listen to expert talks on-site or online.