talk-data.com

Topic: Apache Hive (Hive)

Tags: data_warehouse, sql, hadoop

13 activities tagged

Activity Trend

Peak of 9 activities/quarter, 2020-Q1 through 2026-Q1

Activities

13 activities · Newest first

Databricks + Apache Iceberg™: Managed and Foreign Tables in Unity Catalog

Unity Catalog support for Apache Iceberg™ brings open, interoperable table formats to the heart of the Databricks Lakehouse. In this session, we’ll introduce new capabilities that allow you to write Iceberg tables from any REST-compatible engine, apply fine-grained governance across all data, and unify access to external Iceberg catalogs like AWS Glue, Hive Metastore, and Snowflake Horizon. Learn how Databricks is eliminating data silos, simplifying performance with Predictive Optimization, and advancing a truly open lakehouse architecture with Delta and Iceberg side by side.
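As a rough illustration of the "any REST-compatible engine" claim, a client such as PyIceberg can read a Unity Catalog-managed Iceberg table through the catalog's Iceberg REST endpoint. This is a minimal sketch: the endpoint path, token, and catalog/table names below are placeholder assumptions, not details from the talk.

```python
# Minimal sketch: reading a UC-managed Iceberg table from an external
# engine via the Iceberg REST catalog API. Endpoint path, token and
# names are illustrative assumptions.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "unity",
    **{
        "type": "rest",
        "uri": "https://<workspace-host>/api/2.1/unity-catalog/iceberg",  # assumed endpoint
        "token": "<personal-access-token>",
        "warehouse": "<uc-catalog-name>",
    },
)

# Unity Catalog's schema.table maps to Iceberg's namespace.table.
table = catalog.load_table("sales.orders")
print(table.scan(limit=10).to_arrow())
```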

Embracing Unity Catalog and Empowering Innovation With Genie Room

Bagelcode, a leader in the social casino industry, has utilized Databricks since 2018 and manages over 10,000 tables via Hive Metastore. In 2024, we embarked on a transformative journey to resolve inefficiencies and unlock new capabilities. Over five months, we redesigned ETL pipelines with Delta Lake, optimized partitioned table logs and executed a seamless migration with minimal disruption. This effort improved governance, simplified management and unlocked Unity Catalog’s advanced features. Post-migration, we integrated the Genie Room with Slack to enable natural language queries, accelerating decision-making and operational efficiency. Additionally, a lineage-powered internal tool allowed us to quickly identify and resolve issues like backfill needs or data contamination. Unity Catalog has revolutionized our data ecosystem, elevating governance and innovation. Join us to learn how Bagelcode unlocked its data’s full potential and discover strategies for your own transformation.

American Airlines Flies to New Heights with Data Intelligence

American Airlines migrated from Hive Metastore to Unity Catalog using automated processes with Databricks APIs and GitHub Actions. This automation streamlined the migration for many applications within AA, ensuring consistency, efficiency and minimal disruption while enhancing data governance and disaster recovery capabilities.
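As a hedged sketch of what one automated step might look like (not American Airlines' actual pipeline), a job triggered from GitHub Actions can enumerate Hive Metastore tables and upgrade each into Unity Catalog with the SYNC command. The schema names below are placeholders.

```python
# Hypothetical sketch of an automated HMS -> Unity Catalog upgrade step,
# e.g. run as a Databricks job from a GitHub Actions workflow.
SOURCE_SCHEMA = "hive_metastore.flight_ops"  # assumed HMS schema
TARGET_SCHEMA = "main.flight_ops"            # assumed UC schema

tables = [row.tableName for row in spark.sql(f"SHOW TABLES IN {SOURCE_SCHEMA}").collect()]

for name in tables:
    # DRY RUN reports what would happen without making changes; drop it
    # to perform the actual upgrade of the external table.
    for row in spark.sql(
        f"SYNC TABLE {TARGET_SCHEMA}.{name} FROM {SOURCE_SCHEMA}.{name} DRY RUN"
    ).collect():
        print(name, row)
```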

Modernizing Critical Infrastructure: AI and Data-Driven Solutions in Nuclear and Utility Operations

This session showcases how both Westinghouse Electric and Alabama Power Company are leveraging cloud-based tools, advanced analytics, and machine learning to transform operational resilience across the energy sector. In the first segment, we'll explore AI's crucial role in enhancing safety, efficiency, and compliance in nuclear operations through technologies like HiVE and Bertha, focusing on the unique reliability and credibility requirements of the regulated nuclear industry. We’ll then highlight how Alabama Power Company has modernized its grid management and storm preparedness by using Databricks to develop SPEAR and RAMP—applications that combine real-time data and predictive analytics to improve reliability, efficiency, and customer service.

Revolutionizing Nuclear AI With HiVE and Bertha on Databricks Architecture

In this session we will explore the revolutionary advancements in nuclear AI capabilities with HiVE and Bertha on Databricks architecture. HiVE, developed by Westinghouse, leverages over a century of proprietary data to deliver unparalleled AI capabilities. At its core is Bertha, a generative AI model designed to tackle the unique challenges of the nuclear industry. This session will delve into the technical architecture of HiVE and Bertha, showcasing how Databricks' scalable environment enhances their performance. We will discuss the secure data infrastructure supporting HiVE, ensuring data integrity and compliance. Real-world applications and use cases will demonstrate the impact of HiVE and Bertha on improving efficiency, innovation and safety in nuclear operations. Discover how the fusion of HiVE and Bertha with Databricks architecture is transforming the nuclear AI landscape and driving the future of nuclear technology.

Unify Your Data and Governance With Lakehouse Federation

In today's data landscape, organizations often grapple with fragmented data spread across various databases, data warehouses and catalogs. Lakehouse Federation addresses this challenge by enabling seamless discovery, querying, and governance of distributed data without the need for duplication or migration. This session will explore how Lakehouse Federation integrates external data sources like Hive Metastore, Snowflake, SQL Server and more into a unified interface, providing consistent access controls, lineage tracking and auditing across your entire data estate. Learn how to streamline analytics and AI workloads, enhance compliance and reduce operational complexity by leveraging a single, cohesive platform for all your data needs.
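For a flavor of how this looks in practice, here is a minimal sketch of registering a Snowflake database as a foreign catalog in Unity Catalog. The connection name, host, and option values are placeholders, and the exact option keys vary by source type.

```python
# Minimal sketch: federating a Snowflake database into Unity Catalog.
# Connection/catalog names and option values are placeholders.
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS snowflake_conn TYPE snowflake
  OPTIONS (
    host '<account>.snowflakecomputing.com',
    port '443',
    sfWarehouse '<warehouse>',
    user '<user>',
    password secret('<scope>', '<key>')
  )
""")

spark.sql("""
  CREATE FOREIGN CATALOG IF NOT EXISTS snowflake_sales
  USING CONNECTION snowflake_conn
  OPTIONS (database 'SALES')
""")

# Federated tables are then queryable in place, with UC access controls,
# lineage tracking and auditing applied.
spark.sql("SELECT count(*) FROM snowflake_sales.public.orders").show()
```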

Transforming Credit Analytics With a Compliant Lakehouse at Rabobank

This presentation outlines Rabobank Credit Analytics' transition to a secure, audit-ready data architecture using Unity Catalog (UC), addressing critical regulatory challenges in credit analytics for IRB and IFRS9 regulatory modeling. Key technical challenges included legacy infrastructure (Hive Metastore, ADLS mounts using service principals and credential passthrough) that lacked granular access controls and data-access auditing and offered limited visibility into lineage, creating governance and compliance gaps. The talk details a framework for a phased migration to UC. Outcomes include data lineage mapping demonstrating compliance with regulatory requirements, granular role-based access control and unified audit trails. Next steps involve a lineage visualization toolkit (a custom app for impact analysis and reporting) and expanding lineage to incorporate upstream banking systems.
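As a hedged illustration of the audit-trail side, Unity Catalog exposes lineage through system tables. The target table name below is a placeholder, and the column names follow the documented system.access.table_lineage schema.

```python
# Sketch: tracing upstream sources of a regulatory model table through
# Unity Catalog's lineage system table. Table name is a placeholder.
spark.sql("""
  SELECT DISTINCT source_table_full_name, entity_type, event_time
  FROM system.access.table_lineage
  WHERE target_table_full_name = 'risk.ifrs9.expected_loss'
  ORDER BY event_time DESC
""").show(truncate=False)
```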

Lakehouse Federation: Access and Governance of External Data Sources from Unity Catalog

Are you tired of spending time and money moving data across multiple sources and platforms to access the right data at the right time? Join our session and discover Databricks new Lakehouse Federation feature, which allows you to access, query, and govern your data in place without leaving the Lakehouse. Our experts will demonstrate how you can leverage the latest enhancements in Unity Catalog, including query federation, Hive interface, and Delta Sharing, to discover and govern all your data in one place, regardless of where it lives.

Talk by: Can Efeoglu and Todd Greenstein


A Technical Deep Dive into Unity Catalog's Practitioner Playbook

Get ready to take a deep dive into Unity Catalog and explore how it can simplify data, analytics and AI governance across multiple clouds. In this session, the expert Databricks team will guide you through a hands-on demo showcasing the latest features and best practices for data governance. You'll learn how to master Unity Catalog and gain a practical understanding of how it can streamline your analytics and AI initiatives. Whether you're migrating from Hive Metastore or just looking to expand your knowledge, this session is for you.
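As a small taste of the governance model the demo covers, Unity Catalog privileges are granted along a three-level namespace. This is a minimal sketch; the catalog, schema, table and group names are placeholders.

```python
# Minimal sketch of Unity Catalog's three-level privilege model.
# Catalog/schema/table and group names are placeholders.
for stmt in [
    "GRANT USE CATALOG ON CATALOG main TO `data-analysts`",
    "GRANT USE SCHEMA ON SCHEMA main.sales TO `data-analysts`",
    "GRANT SELECT ON TABLE main.sales.orders TO `data-analysts`",
]:
    spark.sql(stmt)

# Confirm the effective grants on the table.
spark.sql("SHOW GRANTS ON TABLE main.sales.orders").show(truncate=False)
```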

Talk by: Zeashan Pappa and Ifigeneia Derekli


Advanced Migrations: From Hive to SparkSQL

Learn how Pinterest moved over 6,000 Hive queries to SparkSQL, achieved a 2x runtime-weighted speedup and made significant savings in compute resources. To do migrations at this scale, companies often take one of two approaches: employ hundreds of engineers to migrate queries manually, or change the query engine wholesale to be compatible with Hive; both take significant engineering time. In this session you will learn how Pinterest took a hybrid approach, and the tools and tricks it used to safely migrate thousands of queries at scale.
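A core trick in migrations like this is shadow-running each query on both engines and comparing outputs before cutting over. The sketch below is a generic illustration of that idea, not Pinterest's actual tooling; the engine runners are passed in as hypothetical callables.

```python
# Generic sketch of dual-run validation for a Hive -> SparkSQL migration:
# execute the same query on both engines and compare an order-insensitive
# fingerprint of the results. Not Pinterest's actual tooling.
import hashlib
from typing import Callable, Iterable, Tuple

def result_fingerprint(rows: Iterable[Tuple]) -> str:
    """Order-insensitive checksum of a result set."""
    digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode()).hexdigest() for row in rows
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()

def safe_to_migrate(
    sql: str,
    run_on_hive: Callable[[str], Iterable[Tuple]],   # hypothetical Hive runner
    run_on_spark: Callable[[str], Iterable[Tuple]],  # hypothetical Spark runner
) -> bool:
    """True when both engines return identical rows for the query."""
    return result_fingerprint(run_on_hive(sql)) == result_fingerprint(run_on_spark(sql))
```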


Coral and Transport: Portable SQL and UDFs for the Interoperability of Spark and Other Engines

In this talk, we present two open source projects, Coral and Transport, that enable deep SQL and UDF interoperability between Spark and other engines, such as Trino and Hive. Coral is a SQL analysis, rewrite, and translation engine that enables compute engines to interoperate and analyze different SQL dialects and plans, through the conversion to a common relational algebraic intermediate representation. Transport is a UDF framework that enables users to write UDFs against a single API but execute them as native UDFs of multiple engines, such as Spark, Trino, and Hive. Further, we discuss how LinkedIn leverages Coral and Transport, and present a production use case for accessing views of other engines in Spark as well as enhancing Spark DataFrame and Dataset view schema. We discuss other potential applications such as automatic data governance and data obfuscation, query optimization, materialized view selection, incremental compute, and data source SQL and UDF communication.
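To make the intermediate-representation idea concrete, here is a toy sketch of rendering one logical plan into two SQL dialects. This is emphatically not Coral's actual API (Coral is a JVM library); it only mirrors the concept of dialect-specific output from a shared relational representation.

```python
# Toy illustration of Coral's core idea: one relational IR, many SQL
# dialects. NOT Coral's API; names and structure are invented for clarity.
from dataclasses import dataclass

@dataclass
class Project:
    columns: list  # output column names
    source: str    # input table name

def to_hiveql(plan: Project) -> str:
    # HiveQL quotes identifiers with backticks.
    cols = ", ".join(f"`{c}`" for c in plan.columns)
    return f"SELECT {cols} FROM `{plan.source}`"

def to_trino(plan: Project) -> str:
    # Trino quotes identifiers with double quotes.
    cols = ", ".join(f'"{c}"' for c in plan.columns)
    return f'SELECT {cols} FROM "{plan.source}"'

plan = Project(columns=["member_id", "company"], source="profiles")
print(to_hiveql(plan))  # SELECT `member_id`, `company` FROM `profiles`
print(to_trino(plan))   # SELECT "member_id", "company" FROM "profiles"
```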


Self-Serve, Automated and Robust CDC Pipeline Using AWS DMS, DynamoDB Streams and Databricks Delta

Many companies are trying to solve the challenges of ingesting transactional data into a data lake and dealing with late-arriving updates and deletes.

To address this at Swiggy, we built a CDC (Change Data Capture) system: an incremental processing framework that powers all business-critical data pipelines at low latency and high efficiency.

It offers:
* Freshness: operates in near real time with configurable latency requirements.
* Performance: optimized read and write performance through tuned compaction parameters, partitioning and Delta table optimization.
* Consistency: supports reconciliation based on transaction types, applying inserts, updates and deletes to existing data.

To implement this system, AWS DMS handled initial bootstrapping and CDC replication for MySQL sources, while AWS Lambda and DynamoDB Streams solved bootstrapping and CDC replication for DynamoDB sources.

After setting up the bootstrap and CDC replication process, we used Databricks Delta merge to reconcile the data based on the transaction types.

To support the merge we implemented several supporting features (see the sketch below):
* Deduplicating multiple mutations of the same record using log offset and timestamp.
* Adding optimal partitioning of the dataset.
* Inferring schema and applying proper, backward-compatible schema evolution.
* Extending the Delta table snapshot generation technique to create consistent partitions for partitioned Delta tables.
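A minimal sketch of this reconciliation step, assuming a Delta target table "orders" keyed by order_id and a CDC staging source with op, log_offset and event_ts columns (all names are illustrative, not Swiggy's actual schema):

```python
# Sketch: deduplicate CDC mutations, then Delta-merge by transaction type.
from delta.tables import DeltaTable
from pyspark.sql import functions as F
from pyspark.sql.window import Window

cdc_batch = spark.table("cdc_staging.orders_changes")  # assumed staging table

# Keep only the latest mutation per key, ordered by log offset then timestamp.
w = Window.partitionBy("order_id").orderBy(
    F.col("log_offset").desc(), F.col("event_ts").desc()
)
latest = (
    cdc_batch.withColumn("rn", F.row_number().over(w))
    .filter("rn = 1")
    .drop("rn")
)

# Apply inserts, updates and deletes based on the transaction type.
(DeltaTable.forName(spark, "orders").alias("t")
    .merge(latest.alias("s"), "t.order_id = s.order_id")
    .whenMatchedDelete(condition="s.op = 'delete'")
    .whenMatchedUpdateAll(condition="s.op = 'update'")
    .whenNotMatchedInsertAll(condition="s.op != 'delete'")
    .execute())
```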

Finally, to read the data we use Spark SQL with the Hive Metastore, and Snowflake. Delta tables read with Spark SQL have implicit support for the Hive Metastore. We built our own implementation of the Snowflake sync process to create external tables, internal tables and materialized views on Snowflake.

Stats: 500M CDC logs/day, 600+ tables.
