talk-data.com talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 YouTube Visit website ↗

Activities tracked

50

Filtering by: DWH ×

Sessions & talks

Showing 1–25 of 50 · Newest first

Search within this event →
The Evolution of Delta Lake from Data + AI Summit 2024

The Evolution of Delta Lake from Data + AI Summit 2024

2024-06-17 Watch
video
Shant Hovsepian (Databricks)

Shant Hovsepian, Chief Technology Officer of Data Warehousing at Databricks explains why Delta Lake is the most adopted open lakehouse format.

Includes: - Delta Lake UniForm GA (support for and compatibility with Hudi, Apache Iceberg, Delta) - Delta Lake Liquid Clustering - Delta Lake production-ready catalog (Iceberg REST API) - The growth and strength of the Delta ecosystem - Delta Kernel - DuckDB integration with Delta - Delta 4.0

Announcing Delta Lake 4.0 with Liquid Clustering. Presented by Shant Hovsepian at Data + AI Summit

Announcing Delta Lake 4.0 with Liquid Clustering. Presented by Shant Hovsepian at Data + AI Summit

2024-06-16 Watch
video
Shant Hovsepian (Databricks)

Shant Hovsepian, CTO of Data Warehousing at Databricks announced the biggest Delta Lake release to date, Delta 4.0, during the Data + AI Summit 2024 in San Francisco.

Speaker: Shant Hovsepian, Chief Technology Officer of Data Warehousing, Databricks

Lakehouse Format Interoperability With UniForm. Shant Hovsepian presents at Data + AI Summit 2024

Lakehouse Format Interoperability With UniForm. Shant Hovsepian presents at Data + AI Summit 2024

2024-06-16 Watch
video
Shant Hovsepian (Databricks)

Shant Hovsepian, Chief Technology Officer of Data Warehousing at Databricks, discusses the UniForm data format and its interoperability with other data formats. Shant explains that Delta Lake is the most adopted open lakehouse format.

Speaker: Shant Hovsepian, Chief Technology Officer of Data Warehousing, Databricks

The Best Data Warehouse is a Lakehouse

The Best Data Warehouse is a Lakehouse

2024-06-14 Watch
video
Pearl Ubaru (Databricks) , Reynold Xin (Databricks)

Reynold Xin, Co-founder and Chief Architect at Databricks, presented during Data + AI Summit 2024 on Databricks SQL and its advancements and how to drive performance improvements with the Databricks Data Intelligence Platform.

Speakers: Reynold Xin, Co-founder and Chief Architect, Databricks Pearl Ubaru, Technical Product Engineer, Databricks

Main Points and Key Takeaways (AI-generated summary)

Introduction of Databricks SQL: - Databricks SQL was announced four years ago and has become the fastest-growing product in Databricks history. - Over 7,000 customers, including Shell, AT&T, and Adobe, use Databricks SQL for data warehousing.

Evolution from Data Warehouses to Lakehouses: - Traditional data architectures involved separate data warehouses (for business intelligence) and data lakes (for machine learning and AI). - The lakehouse concept combines the best aspects of data warehouses and data lakes into a single package, addressing issues of governance, storage formats, and data silos.

Technological Foundations: - To support the lakehouse, Databricks developed Delta Lake (storage layer) and Unity Catalog (governance layer). - Over time, lakehouses have been recognized as the future of data architecture.

Core Data Warehousing Capabilities: - Databricks SQL has evolved to support essential data warehousing functionalities like full SQL support, materialized views, and role-based access control. - Integration with major BI tools like Tableau, Power BI, and Looker is available out-of-the-box, reducing migration costs.

Price Performance: - Databricks SQL offers significant improvements in price performance, which is crucial given the high costs associated with data warehouses. - Databricks SQL scales more efficiently compared to traditional data warehouses, which struggle with larger data sets.

Incorporation of AI Systems: - Databricks has integrated AI systems at every layer of their engine, improving performance significantly. - AI systems automate data clustering, query optimization, and predictive indexing, enhancing efficiency and speed.

Benchmarks and Performance Improvements: - Databricks SQL has seen dramatic improvements, with some benchmarks showing a 60% increase in speed compared to 2022. - Real-world benchmarks indicate that Databricks SQL can handle high concurrency loads with consistent low latency.

User Experience Enhancements: - Significant efforts have been made to improve the user experience, making Databricks SQL more accessible to analysts and business users, not just data scientists and engineers. - New features include visual data lineage, simplified error messages, and AI-driven recommendations for error fixes.

AI and SQL Integration: - Databricks SQL now supports AI functions and vector searches, allowing users to perform advanced analysis and query optimizations with ease. - The platform enables seamless integration with AI models, which can be published and accessed through the Unity Catalog.

Conclusion: - Databricks SQL has transformed into a comprehensive data warehousing solution that is powerful, cost-effective, and user-friendly. - The lakehouse approach is presented as a superior alternative to traditional data warehouses, offering better performance and lower costs.

Data + AI Summit Keynote Day 1 - Full

Data + AI Summit Keynote Day 1 - Full

2024-06-14 Watch
video
Patrick Wendall (Databricks) , Fei-Fei Li (Stanford University) , Brian Ames (General Motors) , Ken Wong (Databricks) , Ali Ghodsi (Databricks) , Jackie Brosamer (Block) , Reynold Xin (Databricks) , Jensen Huang (NVIDIA)

Databricks Data + AI Summit 2024 Keynote Day 1

Experts, researchers and open source contributors — from Databricks and across the data and AI community gathered in San Francisco June 10 - 13, 2024 to discuss the latest technologies in data management, data warehousing, data governance, generative AI for the enterprise, and data in the era of AI.

Hear from Databricks Co-founder and CEO Ali Ghodsi on building generative AI applications, putting your data to work, and how data + AI leads to data intelligence.

Plus a fireside chat between Ali Ghodsi and Nvidia Co-founder and CEO, Jensen Huang, on the expanded partnership between Nvidia and Databricks to accelerate enterprise data for the era of generative AI

Product announcements in the video include: - Databricks Data Intelligence Platform - Native support for NVIDIA GPU acceleration on the Databricks Data Intelligence Platform - Databricks open source model DBRX available as an NVIDIA NIM microservice - Shutterstock Image AI powered by Databricks - Databricks AI/BI - Databricks LakeFlow - Databricks Mosaic AI - Mosaic AI Agent Framework - Mosaic AI Agent Evaluation - Mosaic AI Tools Catalog - Mosaic AI Model Training - Mosaic AI Gateway

In this keynote hear from: - Ali Ghodsi, Co-founder and CEO, Databricks (1:45) - Brian Ames, General Motors (29:55) - Patrick Wendall, Co-founder and VP of Engineering, Databricks (38:00) - Jackie Brosamer, Head of AI, Data and Analytics, Block (1:14:42) - Fei Fei Li, Professor, Stanford University and Denning Co-Director, Stanford Institute for Human-Centered AI (1:23:15) - Jensen Huang, Co-founder and CEO of NVIDIA with Ali Ghodsi, Co-founder and CEO of Databricks (1:42:27) - Reynold Xin, Co-founder and Chief Architect, Databricks (2:07:43) - Ken Wong, Senior Director, Product Management, Databricks (2:31:15) - Ali Ghodsi, Co-founder and CEO, Databricks (2:48:16)

Data Warehousing using Fivetran, dbt and DBSQL

Data Warehousing using Fivetran, dbt and DBSQL

2023-08-03 Watch
video

In this video you will learn how to use Fivetran to ingest data from Salesforce into your Lakehouse. After the data has been ingested, you will then learn how you can transform your data using dbt. Then we will use Databricks SQL to query, visualize and govern your data. Lastly, we will show you how you can use AI functions in Databricks SQL to call language learning models.

Read more about Databricks SQL https://docs.databricks.com/en/sql/index.html#what-is-databricks-sql

Internet-Scale Analytics: Migrating a Mission Critical Product to the Cloud

Internet-Scale Analytics: Migrating a Mission Critical Product to the Cloud

2023-07-28 Watch
video

While we may not all agree on a “If it ain’t broke, don’t fix it” approach, we can all agree that “If it shows any crack, migrate it to the cloud and completely re-architect it.” Akamai’s CSI (Cloud Security Intelligence) group is responsible for processing massive amounts of security events arriving from our edge network, which is estimated to process 30% of internet traffic, making it accessible by various internal consumers powering customer-facing products.

In this session, we will visit the reasons for migrating one of our mission critical security products and its 10GB ingest pipeline to the cloud, examine our new architecture and its benefits and touch on the challenges we faced during the process (and still do). While our requirements are unique and our solution contains a few proprietary components, this session will provide you with several concepts involving popular off-the-shelf products you can easily use in your own cloud environment.

Talk by: Yaniv Kunda

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

If a Duck Quacks in the Forest and Everyone Hears, Should You Care?

If a Duck Quacks in the Forest and Everyone Hears, Should You Care?

2023-07-28 Watch
video
Ryan Boyd (Databricks)

YES! "Duck posting" has become an internet meme for praising DuckDB on Twitter. Nearly every quack using DuckDB has done it once or twice. But, why all the fuss? With advances in CPUs, memory, SSDs, and the software that enables it all, our personal machines are powerful beasts relegated to handling a few Chrome tabs and sitting 90% idle. As data engineers and data analysts, this seems like a waste that's not only expensive, but also impacting the environment.

In this session, you will see how DuckDB brings SQL analytics capabilities to a 2MB standalone executable on your laptop that only recently required a large cluster. This session will explain the architecture of DuckDB that enables high performance analytics on a laptop: great query optimization, vectorized execution, continuous improvements in compression and more. We will show its capabilities using live demos, from the pandas library to WASM, to the command-line. We'll demonstrate performance on large datasets, and talk about how we're exploring using the laptop to augment cloud analytics workloads.

Talk by: Ryan Boyd

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Using Lakehouse to Fight Cancer:Ontada’s Journey to Establish a RWD Platform on Databricks Lakehouse

Using Lakehouse to Fight Cancer:Ontada’s Journey to Establish a RWD Platform on Databricks Lakehouse

2023-07-28 Watch
video

Ontada, a McKesson business, is an oncology real-world data and evidence, clinical education and provider of technology business dedicated to transforming the fight against cancer. Core to Ontada’s mission is using real-world data (RWD) and evidence generation to improve patient health outcomes and to accelerate life science research.

To support its mission, Ontada embarked on a journey to migrate its enterprise data warehouse (EDW) from an on-premise Oracle database to Databricks Lakehouse. This move allows Ontada to now consume data from any source, including structured and unstructured data from its own EHR and genomics lab results, and realize faster time to insight. In addition, using the Lakehouse has helped Ontada eliminate data silos, enabling the organization to realize the full potential of RWD – from running traditional descriptive analytics to extracting biomarkers from unstructured data. The session will cover the following topics:

  • Oracle to Databricks: migration best practices and lessons learned
  • People, process, and tools: expediting innovation while protecting patient information using Unity Catalog
  • Getting the most out of the Databricks Lakehouse: from BI to genomics, running all analytics under one platform
  • Hyperscale biomarker abstraction: reducing the manual effort needed to extract biomarkers from large unstructured data (medical notes, scanned/faxed documents) using spaCY and John Snow Lab NLP libraries

Join this session to hear how Ontada is transforming RWD to deliver safe and effective cancer treatment.

Talk by: Donghwa Kim

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Data Democratization at Michelin

Too often business decisions in large organizations are based on time consuming and labor-intensive data extracts, fragile Excel or access sheets that require significant manual intervention. The teams that prepare these manual reports have invaluable heuristic knowledge that, when combined with meaningful data and tools, can make smart business decisions. Imagine a world where these business teams are empowered with tools that help them build meaningful reports despite their limited technical expertise.

In this session, we will discuss: - The value derived from investing in developing citizen data personas within a business organization - How we successfully built a citizen data analytics culture within Michelin - Real examples of the impact of this initiative on the business and on the people themselves

The audience will walk away with some convincing arguments for building a citizen data culture in their organization and a how-to cookbook that they can use to cultivate citizen data personas. Finally, they can interactively uncover key success factors in the case of Michelin that can help drive a similar initiative in their respective companies.

Talk by: Philippe Leonhart and Fabien Cochet

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Delta-rs, Apache Arrow, Polars, WASM: Is Rust the Future of Analytics?

Delta-rs, Apache Arrow, Polars, WASM: Is Rust the Future of Analytics?

2023-07-27 Watch
video

Rust is a unique language whose traits make it very appealing for data engineering. In this session, we'll walk through the different aspects of the language that make it such a good fit for big data processing including: how it improves performance and how it provides greater safety guarantees and compatibility with a wide range of existing tools that make it well positioned to become a major building block for the future of analytics.

We will also take a hands-on look through real code examples at a few emerging technologies built on top of Rust that utilize these capabilities, and learn how to apply them to our modern lakehouse architecture.

Talk by: Oz Katz

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Making Travel More Accessible for Customers Bringing Mobility Devices

Making Travel More Accessible for Customers Bringing Mobility Devices

2023-07-27 Watch
video
Madhan Venkatesan (American Airlines) , Teja Tangeda (American Airlines)

American Airlines takes great pride in caring for customers travel, and recognize the importance of supporting the dignity and independence of everyone who travels with us. As we work to improve the customer experience, we're committed to making our airline more accessible to everyone. Our work to ensure that travel that is accessible to all is well underway. We have been particularly focused on making the journey smoother for customers who rely on wheelchairs or other mobility devices. We have implemented the use of a bag tag specifically for wheelchairs and scooters that gives team members more information, like the mobility device’s weight and battery type, or whether it needs to be returned to a customer before a connecting flight.

As a data engineering and analytics team, we at American Airlines are building a passenger service request data product that will provide timely insights on expected mobility device traffic at each airport so that the front-line team members can provide seamless travel experience to the passengers.

Talk by: Teja Tangeda and Madhan Venkatesan

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: Alation | Unlocking the Power of Real-Time Data to Maximize Data Insights

Sponsored by: Alation | Unlocking the Power of Real-Time Data to Maximize Data Insights

2023-07-27 Watch
video

It’s no secret that access to the right data at the right time is critical for data-driven decision making. In fact, as data culture becomes more and more ingrained in the enterprise, business users increasingly demand real-time, actionable data. But, what happens when it takes up to 24 hours to access your point-of-sale data? RaceTrac faced many of these data accessibility challenges as it sought to derive intelligence from its retail transaction data, specifically the data from their stores, information from their fuel purchasing arms, and delivery data for their fleet.

Through a combination of the Databricks Lakehouse and the lineage and self-discovery capabilities of the Alation Data Intelligence Platform, RaceTrac rose to the challenge. Hear from Raghu Jayachandran, Senior Manager of Enterprise Data at RaceTrac, and discover how RaceTrac gained real-time access to their transaction data in Databricks, and uses Alation to provide insight into which data can drive the business insights they needed.

Talk by: Diby Malakar and Raghu Jayachandran

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksin

Sponsored: Sisense-Developing Data Products: Infusion & Composability Are Changing Expectations

Sponsored: Sisense-Developing Data Products: Infusion & Composability Are Changing Expectations

2023-07-27 Watch
video

Composable analytics is the next progression of business intelligence. We will discuss how current analytics rely on two key principles: composability and agility. Through modularizing our analytics capabilities, we can rapidly “compose” new data applications. An organization uses these building blocks to deliver customized analytics experiences at a customer level.

This session will orientate business intelligence leaders to composable data and analytics.

  • How data teams can use composable analytics to decrease application development time.
  • How an organization can leverage existing and new tools to maximize value-based, data-driven insights.
    • Requirements for effectively deploying composable analytics.
    • Utilizing no, low-code and high-code analytics capabilities.
    • Extracting full value from your customer data and metadata.
    • Leveraging analytics building blocks to create new products and revenue streams.

Talk by: Scott Castle

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

The Future is Open: Data Streaming in an Omni-Cloud Reality

The Future is Open: Data Streaming in an Omni-Cloud Reality

2023-07-27 Watch
video

This session begins with data warehouse trivia and lessons learned from production implementations of multicloud data architecture. You will learn to design future-proof low latency data systems that focus on openness and interoperability. You will also gain a gentle introduction to Cloud FinOps principles that can help your organization reduce compute spend and increase efficiency. 

Most enterprises today are multicloud. While an assortment of low-code connectors boasts the ability to make data available for analytics in real time, they post long-lasting challenges:

  • Inefficient EDW targets
  • Inability to evolve schema
  • Forbiddingly expensive data exports due to cloud and vendor lock-in

The alternative is an open data lake that unifies batch and streaming workloads. Bronze landing zones in open format eliminate the data extraction costs required by proprietary EDW. Apache Spark™ Structured Streaming provides a unified ingestion interface. Streaming triggers allow us to switch back and forth between batch and stream with one-line code changes. Streaming aggregation enables us to incrementally compute on data that arrives near each other.

Specific examples are given on how to use Autoloader to discover newly arrived data and ensure exactly once, incremental processing. How DLT can be configured effectively to further simplify streaming jobs and accelerate the development cycle. How to apply SWE best practices to Workflows and integrate with popular Git providers, either using the Databricks Project or Databricks Terraform provider. 

Talk by: Christina Taylor

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Learnings From the Field: Migration From Oracle DW and IBM DataStage to Databricks on AWS

Learnings From the Field: Migration From Oracle DW and IBM DataStage to Databricks on AWS

2023-07-26 Watch
video

Legacy data warehouses are costly to maintain, unscalable and cannot deliver on data science, ML and real-time analytics use cases. Migrating from your enterprise data warehouse to Databricks lets you scale as your business needs grow and accelerate innovation by running all your data, analytics and AI workloads on a single unified data platform.

In the first part of this session we will guide you through the well-designed process and tools that will help you from the assessment phase to the actual implementation of an EDW migration project. Also, we will address ways to convert PL/SQL proprietary code to an open standard python code and take advantage of PySpark for ETL workloads and Databricks SQL’s data analytics workload power.

The second part of this session will be based on an EDW migration project of SNCF (French national railways); one of the major enterprise customers of Databricks in France. Databricks partnered with SNCF to migrate its real estate entity from Oracle DW and IBM DataStage to Databricks on AWS. We will walk you through the customer context, urgency to migration, challenges, target architecture, nitty-gritty details of implementation, best practices, recommendations, and learnings in order to execute a successful migration project in a very accelerated time frame.

Talk by: Himanshu Arora and Amine Benhamza

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

How Coinbase Built and Optimized SOON, a Streaming Ingestion Framework

How Coinbase Built and Optimized SOON, a Streaming Ingestion Framework

2023-07-26 Watch
video

Data with low latency is important for real-time incident analysis and metrics. Though we have up-to-date data in OLTP databases, they cannot support those scenarios. Data need to be replicated to a data warehouse to serve queries using GroupBy and Join across multiple tables from different systems. At Coinbase, we designed SOON (Spark cOntinuOus iNgestion) based on Kafka, Kafka Connect, and Apache Spark™ as an incremental table replication solution to replicate tables of any size from any database to Delta Lake in a timely manner. It also supports Kafka events ingestion naturally.

SOON incrementally ingests Kafka events as appends, updates, and deletes to an existing table on Delta Lake. The events are grouped into two categories: CDC (change data capture) events generated by Kafka Connect source connectors, and non-CDC events by the frontend or backend services. Both types can be appended or merged into the Delta Lake. Non-CDC events can be in any format, but CDC events must be in the standard SOON CDC schema. We implemented Kafka Connect SMTs to transform raw CDC events into this standardized format. SOON unifies all streaming ingestion scenarios such that users only need to learn one onboarding experience and the team only needs to maintain one framework.

We care about the ingestion performance. The biggest append-only table onboarded has ingress traffic at hundreds of thousands events per second; the biggest CDC-merge table onboarded has a snapshot size of a few TBs and CDC update traffic at hundreds of thousands events per second. A lot of innovative ideas are incorporated in SOON to improve its performance, such as min-max range merge optimization, KMeans merge optimization, no-update merge for deduplication, generated columns as partitions, etc.

Talk by: Chen Guo

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: ThoughtSpot | Drive Self-Service Adoption Through the Roof with Embedded Analytics

Sponsored by: ThoughtSpot | Drive Self-Service Adoption Through the Roof with Embedded Analytics

2023-07-26 Watch
video

When it comes to building stickier apps and products to grow your business, there's no greater opportunity than embedded analytics. Data apps that deliver superior user engagement and business value do analytics differently. They take a user-first approach and know how to deliver real-time, AI-powered insights - not just to internal employees - but to an organization’s customers and partners, as well.

Learn how ThoughtSpot Everywhere is helping companies like Emerald natively integrate analytics with other tools in their modern data stack to deliver a blazing-fast and instantly available analytics experience across all the data their users love. Join this session to learn how you can leverage embedded analytics to: Drive higher app engagement Get your app to market faster And create new revenue streams

Talk by: Krishti Bikal and Vika Smilansky

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Best Exploration of Columnar Shuffle Design

Best Exploration of Columnar Shuffle Design

2023-07-26 Watch
video

To significantly improve the performance of Spark SQL, there is a trend to offload Spark SQL execution to highly optimized native libraries or accelerators in past several years, like Photon from Databricks, Nvidia's Rapids plug-in, and Intel and Kyligence's initiated open source Gluten project. By the multi-fold performance improvement from these solutions, more and more Apache Spark™ users have started to adopt the new technology. One characteristics of native libraries is that they all use columnar data format as the basic data format. It's because the columnar data format has the intrinsic affinity to vectorized data processing using SIMD instructions. While vanilla Spark's shuffle is based on spark's internal row data format. The high overhead of the columnar to row and row to columnar conversion during the shuffle makes reusing current shuffle not possible. Due to the importance of shuffle service in Spark, we have to implement an efficient columnar shuffle, which brings couple of new challenges, like the split of columnar data, or the dictionary support during shuffle.

In this session, we will share the exploration process of the columnar shuffle design during our Gazelle and Gluten development, and best practices for implementing the columnar shuffle service. We will also share how we learned from the development of vanilla Spark's shuffle, for example, how to address the small files issue then we will propose the new shuffle solution. We will show the performance comparison between Columnar shuffle and vanilla Spark's row-based shuffle. Finally, we will share how the new built-in accelerators like QAT and IAA in the latest Intel processor are used in our columnar shuffle service and boost the performance.

Talk by: Binwei Yang and Rong Ma

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Databricks SQL: Why the Best Serverless Data Warehouse is a Lakehouse

Databricks SQL: Why the Best Serverless Data Warehouse is a Lakehouse

2023-07-26 Watch
video

Many organizations rely on complex cloud data architectures that create silos between applications, users and data. This fragmentation makes it difficult to access accurate, up-to-date information for analytics, often resulting in the use of outdated data. Enter the lakehouse, a modern data architecture that unifies data, AI, and analytics in a single location.

This session explores why the lakehouse is the best data warehouse, featuring success stories, use cases and best practices from industry experts. You'll discover how to unify and govern business-critical data at scale to build a curated data lake for data warehousing, SQL and BI. Additionally, you'll learn how Databricks SQL can help lower costs and get started in seconds with on-demand, elastic SQL serverless warehouses, and how to empower analytics engineers and analysts to quickly find and share new insights using their preferred BI and SQL tools such as Fivetran, dbt, Tableau, or Power BI.

Talk by: Miranda Luna and Cyrielle Simeone

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: dbt Labs | Modernizing the Data Stack: Lessons Learned From Evolution at Zurich Insurance

Sponsored: dbt Labs | Modernizing the Data Stack: Lessons Learned From Evolution at Zurich Insurance

2023-07-26 Watch
video
Jose L Sanchez Ros (Zurich Insurance) , Gerard Sola (Zurich Insurance)

In this session, we will explore the path Zurich Insurance took to modernize its data stack and data engineering practices, and the lessons learned along the way. We'll touch on how and why the team chose to:

  • Adopt community standards in code quality, code coverage, code reusability, and CI/CD
  • Rebuild the way data engineering collaborates with business teams
  • Explore data tools accessible to non-engineering users, with considerations for code-first and no-code interfaces
  • Structure our dbt project and orchestration — and the factors that played into our decisions

Talk by: Jose L Sanchez Ros and Gerard Sola

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: Kyvos | Analytics 100x Faster Lowest Cost w/ Kyvos & Databricks, Even on Trillions Rows

Sponsored: Kyvos | Analytics 100x Faster Lowest Cost w/ Kyvos & Databricks, Even on Trillions Rows

2023-07-26 Watch
video

Databricks and Kyvos together are helping organizations build their next-generation cloud analytics platform. A platform that can process and analyze massive amounts of data, even trillions of rows, and provide multidimensional insights instantly. Combining the power of Databricks with the speed, scale and cost optimization capabilities of Kyvos Analytics Acceleration Platform, customers can go beyond the limit of their analytics boundaries. Join our session to know how and also learn about a real-world use case.

Talk by: Leo Duncan

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksin

Best Data Warehouse is a Lakehouse: Databricks Achieves Ops Efficiency w/ Lakehouse Architecture

Best Data Warehouse is a Lakehouse: Databricks Achieves Ops Efficiency w/ Lakehouse Architecture

2023-07-26 Watch
video
Naveen Zutshi (Databricks) , Romit Jadhwani (Databricks)

At Databricks, we use the Lakehouse architecture to build an optimized data warehouse that drives better insights, increased operational efficiency, and reduces costs. In this session, Naveen Zutshi, CIO at Databricks and Romit Jadhwani, Senior Director Analytics and Integrations at Databricks will discuss the Databricks journey and provide technical and business insights into how these results were achieved.

The session will cover topics such as medallion architecture, building efficient third party integrations, how Databricks built various data products/services on the data warehouse, and how to use governance to break down data silos and achieve consistent sources of truth.

Talk by: Naveen Zutshi and Romit Jadhwani

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksin

Building Apps on the Lakehouse with Databricks SQL

Building Apps on the Lakehouse with Databricks SQL

2023-07-26 Watch
video

BI applications are undoubtedly one of the major consumers of a data warehouse. Nevertheless, the prospect of accessing data using standard SQL is appealing to many more stakeholders than just the data analysts. We’ve heard from customers that they experience an increasing demand to provide access to data in their lakehouse platforms from external applications beyond BI, such as e-commerce platforms, CRM systems, SaaS applications, or custom data applications developed in-house. These applications require an “always on” experience, which makes Databricks SQL Serverless a great fit.

In this session, we give an overview of the approaches available to application developers to connect to Databricks SQL and create modern data applications tailored to needs of users across an entire organization. We discuss when to choose one of the Databricks native client libraries for languages such as Python, Go, or node.js and when to use the SQL Statement Execution API, the newest addition to the toolset. We also explain when ODBC and JDBC might not be the best for the task and when they are your best friends. Live demos are included.

Talk by: Adriana Ispas and Chris Stevens

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Combining Privacy Solutions to Solve Data Access at Scale

Combining Privacy Solutions to Solve Data Access at Scale

2023-07-26 Watch
video

The trend that has made data easier to collect and analyze has only aggravated privacy risks. Luckily, a range of privacy technologies have emerged to enable private data management; differential privacy, synthetic data, confidential computing. In isolation, those technologies have had a limited impact because they did not always bring the 10x improvement expected by data leaders.

Combining these privacy technologies has been the real game changer. We will demonstrate that the right mix of technologies brings the optimal balance of privacy and flexibility at the scale of the data warehouse. We will illustrate this by real-life applications of Sarus in three domains:

  • Healthcare: how to make hospital data available for research at scale in full compliance
  • Finance: how to pool data between several banks to fight criminal transactions
  • Marketing: how to build insights on combined data from partners and distributors

The examples will be illustrated using data stored in Databricks and queried using Sarus differential privacy engine.

Talk by: Maxime Agostini

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc