talk-data.com
Coral and Transport Portable SQL and UDFs for the Interoperability of Spark and Other Engines
Topics
Description
In this talk, we present two open source projects, Coral and Transport, that enable deep SQL and UDF interoperability between Spark and other engines, such as Trino and Hive. Coral is a SQL analysis, rewrite, and translation engine that enables compute engines to interoperate and analyze different SQL dialects and plans, through the conversion to a common relational algebraic intermediate representation. Transport is a UDF framework that enables users to write UDFs against a single API but execute them as native UDFs of multiple engines, such as Spark, Trino, and Hive. Further, we discuss how LinkedIn leverages Coral and Transport, and present a production use case for accessing views of other engines in Spark as well as enhancing Spark DataFrame and Dataset view schema. We discuss other potential applications such as automatic data governance and data obfuscation, query optimization, materialized view selection, incremental compute, and data source SQL and UDF communication.
Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/