talk-data.com

Topic

Data Lakehouse

data_architecture data_warehouse data_lake

489 tagged activities. Activity trend: peak of 118 per quarter, 2020-Q1 to 2026-Q1.

Activities

489 activities · Newest first

Democratize AI & ML in a Large Company: The Importance of User Enablement & Technical Training

The most critical success factor in a cloud transformation is people. A change management process that manages the impact of the transformation, together with user enablement, is therefore foundational to any large program. In this session, we will dive into how TD Bank democratizes data, how it mobilizes a community of more than 2,000 analytics users, and the tactics we used to successfully enable new use cases on the cloud. The session will focus on the following:

To democratize data:

  • Centralize a data platform that is accessible to all employees and allows for easy data sharing
  • Implement privacy and security controls to protect data and use it ethically
  • Establish compliance and governance so that data is used in a responsible and compliant way
  • Simplify processes and procedures to reduce redundancy and speed adoption

To mobilize end users:

  • Increase data literacy: provide training and resources for employees to increase their abilities and skills
  • Foster a culture of collaboration and openness: cross-functional teams collaborate and share ideas
  • Encourage exploration of innovative ideas that impact the organization's values and customers

Technical enablement and adoption tactics we've used at TD Bank:

  1. Hands-on training for more than 1,300 analytics users, with an emphasis on learning by doing and relating to real-life situations
  2. Online tutorials and documentation for self-paced study
  3. Workshops and office hours on specific topics to empower business users
  4. Coaching that works with teams on a specific use case or complex issue and provides recommendations for faster, more cost-effective solutions
  5. Certification offerings and encouragement of continuous education so that employees stay up to date
  6. Feedback loop: collect user feedback on training and user experience to improve future training

Talk by: Ellie Hajarian

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Five Things You Didn't Know You Could Do with Databricks Workflows

Databricks Workflows has come a long way since the early days of orchestrating simple notebooks and JAR/wheel files. Now we can orchestrate multi-task jobs and chain tasks into a DAG with lineage, using fan-in, fan-out, and many other patterns, or even run one Databricks job directly inside another.

Databricks Workflows takes its tagline, “orchestrate anything anywhere,” seriously. It is a fully managed, cloud-native orchestrator for diverse workloads such as Delta Live Tables, SQL, notebooks, JARs, Python wheels, dbt, Apache Spark™, and ML pipelines, with excellent monitoring, alerting, and observability capabilities. In short, it is a one-stop product for all the orchestration needs of an efficient lakehouse. Even better, it lets you run your jobs in a cloud-agnostic way and is available on AWS, Azure, and GCP.

In this session, we will take a deep dive into some of the most interesting features and showcase end-to-end demos that will allow you to take full advantage of Databricks Workflows for orchestrating the lakehouse.
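To make the fan-out and fan-in patterns concrete, here is a minimal sketch in plain Python. It does not use the Databricks Jobs API; the task names and the `execution_waves` helper are hypothetical and only illustrate how a task DAG resolves into groups of tasks that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key is a task, each value is the set of tasks
# it depends on. "ingest" fans out to two cleaning tasks, which fan in to "report".
TASKS = {
    "ingest": set(),
    "clean_orders": {"ingest"},
    "clean_customers": {"ingest"},
    "report": {"clean_orders", "clean_customers"},
}


def execution_waves(graph):
    """Group tasks into waves; all tasks in one wave have their dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(TASKS), start=1):
        print(number, wave)
```

The middle wave holds both cleaning tasks (fan-out), and the final wave holds the single task that waits on both (fan-in).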

Talk by: Prashanth Babu

Improving Hospital Operations with Streaming Data and Real Time AI/ML

Over the past two years, Providence has developed a robust streaming data platform (SDP) leveraging Databricks in Azure. The SDP enables us to ingest and process real-time data reflecting clinical operations across our 52 hospitals and roughly 1,000 ambulatory clinics. The HL7 messages generated by Epic are parsed using Databricks in our secure cloud environment and used to generate an up-to-the-minute picture of exactly what is happening at the point of care.
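As a rough illustration of what parsing an HL7 message involves, the sketch below splits a synthetic HL7 v2 message into segments and fields using plain Python. It is not Providence's parser: the sample message, the `parse_hl7` and `summarize` helpers, and the choice of fields are assumptions made for this example. It relies only on the HL7 v2 conventions that segments are separated by carriage returns, fields by `|`, and components by `^`.

```python
# Synthetic HL7 v2 admission message (all values are made up for illustration).
SAMPLE = "\r".join([
    r"MSH|^~\&|EPIC|HOSP|SDP|HOSP|202301011200||ADT^A01|MSG0001|P|2.3",
    "PID|1||12345^^^HOSP^MR||DOE^JANE",
    "PV1|1|I|ICU^101^A",
])


def parse_hl7(message):
    """Return a dict mapping segment name to a list of field lists."""
    segments = {}
    for line in message.strip().split("\r"):
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def summarize(message):
    """Extract a few operational fields from the MSH and PV1 segments."""
    segments = parse_hl7(message)
    msh = segments["MSH"][0]
    pv1 = segments["PV1"][0]
    # After splitting on "|", the message type (MSH-9) sits at index 8,
    # because MSH-1 is the field separator character itself.
    message_type, trigger_event = msh[8].split("^")[:2]
    return {
        "message_type": message_type,
        "trigger_event": trigger_event,
        "patient_class": pv1[2],       # PV1-2
        "unit": pv1[3].split("^")[0],  # first component of PV1-3
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A production parser would also handle repetitions, escape sequences, and custom delimiters declared in MSH-2, which this sketch ignores.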

We are already leveraging this information to minimize hospital overcrowding and have been actively integrating AI/ML to accurately forecast future conditions (e.g., arrivals, length of stay, acuity, and discharge requirements). This allows us both to improve resource utilization (e.g., nurse staffing levels) and to optimize patient throughput. The result is both improved patient care and operational efficiency.

In this session, we will share how these outcomes are only possible with the power and elegance afforded by our investments in Azure, Databricks, and increasingly Lakehouse. We will demonstrate Providence's blueprint for enabling real-time analytics which can be generalized to other healthcare providers.

Talk by: Lindsay Mico and Deylo Woo

Jet Streaming Data & Predictive Analytics: How Collins Aerospace Keeps Aircraft Flying

Most have experienced the frustration and disappointment of a flight delay or cancelation due to aircraft issues. The Collins Aerospace business unit at Raytheon Technologies is committed to redefining aerospace by using data to deliver a more reliable, sustainable, efficient, and enjoyable aviation industry.

Ascentia is one product example of this, focused on helping airlines make smarter and more sustainable decisions by anticipating aircraft maintenance issues in advance, leading to more reliable flight schedules and fewer delays. Over the past five years, a variety of products from the Databricks technology suite were employed to achieve this. Leveraging cloud infrastructure and harnessing the Databricks Lakehouse, Apache Spark™ development, and Databricks’ dynamic platform, Collins has been able to accelerate development and deployment of predictive health monitoring (PHM) analytics to generate Ascentia’s aircraft maintenance recommendations.

Labcorp Data Platform Journey: From Selection to Go-Live in Six Months

Join this session to learn about the Labcorp data platform transformation from on-premises Hadoop to AWS Databricks Lakehouse. We will share best practices and lessons learned from cloud-native data platform selection, implementation, and migration from Hadoop (within six months) with Unity Catalog.

We will share the steps taken to retire several legacy on-premises technologies and leverage Databricks native features such as Spark streaming, workflows, job pools, cluster policies, and Spark JDBC within the Databricks platform. We will also cover lessons learned in implementing Unity Catalog and building a security and governance model that scales across applications. Demos will walk you through the batch frameworks, streaming frameworks, and data compare tools used across several applications to improve data quality and speed of delivery.
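The abstract does not describe how the data compare tools work. As a hedged illustration of the general idea, the plain-Python sketch below compares a source and a target dataset by key and reports missing, unexpected, and mismatched rows; the `compare_by_key` helper and the sample rows are hypothetical.

```python
def compare_by_key(source_rows, target_rows, key):
    """Compare two lists of dict rows on `key` and summarize the differences."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}
    shared = source.keys() & target.keys()
    return {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "unexpected_in_target": sorted(target.keys() - source.keys()),
        "mismatched": sorted(k for k in shared if source[k] != target[k]),
    }


if __name__ == "__main__":
    # Hypothetical rows from a legacy system and from the migrated platform.
    legacy = [{"id": 1, "status": "final"}, {"id": 2, "status": "pending"}, {"id": 3, "status": "final"}]
    migrated = [{"id": 1, "status": "final"}, {"id": 2, "status": "final"}, {"id": 4, "status": "final"}]
    print(compare_by_key(legacy, migrated, key="id"))
```

A report like this is typically run after each migration batch so that discrepancies surface before the legacy system is retired.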

Discover how we have improved operational efficiency and resiliency, reduced TCO, and scaled the building of workspaces and associated cloud infrastructure using the Terraform provider.

Talk by: Mohan Kolli and Sreekanth Ratakonda

Making Travel More Accessible for Customers Bringing Mobility Devices

American Airlines takes great pride in caring for customers as they travel and recognizes the importance of supporting the dignity and independence of everyone who travels with us. As we work to improve the customer experience, we're committed to making our airline more accessible to everyone. Our work to ensure that travel is accessible to all is well underway. We have been particularly focused on making the journey smoother for customers who rely on wheelchairs or other mobility devices. We have implemented the use of a bag tag specifically for wheelchairs and scooters that gives team members more information, like the mobility device’s weight and battery type, or whether it needs to be returned to a customer before a connecting flight.

As a data engineering and analytics team, we at American Airlines are building a passenger service request data product that will provide timely insights on expected mobility device traffic at each airport so that front-line team members can provide a seamless travel experience to passengers.

Talk by: Teja Tangeda and Madhan Venkatesan

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Multicloud Data Governance on the Databricks Lakehouse

Across industries, a multicloud setup has quickly become the reality for large organizations. Multicloud introduces new governance challenges: permission models often do not translate from one cloud to the other and, when they do, are insufficiently granular to accommodate privacy requirements and the principle of least privilege. This problem can be especially acute for data and AI workloads that rely on sharing and aggregating large and diverse data sources across business unit boundaries, and where governance models need to incorporate assets such as table rows and columns, ML features, and models.

In this session, we will provide guidelines on how best to overcome these challenges for companies that have adopted the Databricks Lakehouse as their collaborative space for data teams across the organization, by exploiting some of the unique product features of the Databricks platform. We will focus on a common scenario: a data platform team providing data assets to two different ML teams, one using the same cloud and the other one using a different cloud.

We will explain the step-by-step setup of a unified governance model by leveraging the following components and conventions:

  • Unity Catalog for implementing fine-grained access control across all data assets: files in cloud storage, rows and columns in tables and ML features and models
  • The Databricks Terraform provider to automatically enforce guardrails and permissions across clouds
  • Account-level SSO integration and identity federation to centrally administer access across workspaces
  • Delta Sharing to seamlessly propagate changes in provider data sets to consumers in near real-time
  • Centralized audit logging for a unified view on what asset was accessed by whom
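As a small illustration of least-privilege access for the scenario above, the sketch below generates the SQL `GRANT` statements that would give a group read-only access to one table in Unity Catalog. The table name, the group names, and the `read_only_grants` helper are hypothetical; the statements are only built as strings here and would need to be run by a suitably privileged principal in a Databricks SQL context.

```python
def read_only_grants(table, principal):
    """Build the grants needed to read a single three-level table name."""
    catalog, schema, _ = table.split(".")
    return [
        # The principal must be able to use the parent catalog and schema
        # before a SELECT grant on the table becomes effective.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT SELECT ON TABLE {table} TO `{principal}`",
    ]


if __name__ == "__main__":
    # Hypothetical ML teams consuming the same data asset from two clouds.
    for team in ("ml-team-same-cloud", "ml-team-other-cloud"):
        for statement in read_only_grants("main.features.customers", team):
            print(statement)
```

Generating the statements from one function keeps both teams on an identical permission set, which is the point of a unified governance model.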

Talk by: Ioannis Papadopoulos and Volker Tjaden

Nebula: The Journey of Scaling Instacart’s Data Pipelines with Apache Spark™ and Lakehouse

Instacart has gone through immense growth during the pandemic and the trend continues. Instacart ads is no exception in this growth story. We have launched many new product lines including display and video ads covering the full advertising funnel to address the increasing demand of our retail partners. We have built advanced models to auto-suggest optimal bidding to increase the ROI for our CPG partners. Advertisers’ trust is the utmost priority and thus the quest to build a top-class ads measurement platform.

Ads data processing requires complex data verifications to update ads serving stats. In our ETL pipelines, these were implemented through files containing thousands of lines of raw SQL, which were hard to scale, test, and iterate upon. Our data engineers used to spend hours testing small changes due to a lack of local testing mechanisms. These pain points underscored our need for better tools. After some research, we chose Apache Spark™ as our preferred tool to rebuild our ETLs, and the Databricks platform made this move easier. In this session, we'll share our journey of moving our pipelines to Spark and Delta Lake on Databricks. With Spark, Scala, and Delta, we solved many problems that were slowing the team’s productivity. Some key areas that will be covered include:

  • Modular and composable code
  • Unit testing framework
  • Incremental event processing with Spark Structured Streaming
  • Granular resource tuning for better performance and cost efficiency
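The first two items can be sketched without a cluster. The example below is a minimal illustration in plain Python rather than the Scala and Spark code the talk describes; the step names, the event fields, and the cost value are hypothetical. It shows small single-purpose transformations composed into a pipeline, with a unit test that runs locally in milliseconds.

```python
from functools import reduce


def drop_invalid_clicks(events):
    """Remove click events that are not attributed to an ad."""
    return [event for event in events if event["ad_id"] is not None]


def add_cost(events, cost_per_click=0.25):
    """Attach a (hypothetical) flat cost to every remaining click."""
    return [{**event, "cost": cost_per_click} for event in events]


def compose(*steps):
    """Chain transformation steps left to right into one pipeline function."""
    def pipeline(events):
        return reduce(lambda acc, step: step(acc), steps, events)
    return pipeline


def test_pipeline_drops_invalid_and_adds_cost():
    pipeline = compose(drop_invalid_clicks, add_cost)
    result = pipeline([{"ad_id": 1}, {"ad_id": None}])
    assert result == [{"ad_id": 1, "cost": 0.25}]


if __name__ == "__main__":
    test_pipeline_drops_invalid_and_adds_cost()
    print("ok")
```

Because each step is a pure function, it can be tested in isolation, which is what thousands of lines of raw SQL in a single file make difficult.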

Other than the domain business logic, the problems discussed here are quite common for performing data processing at scale. We hope that sharing our learnings will benefit others who are going through similar growth challenges or migrating to Lakehouse.

Talk by: Devlina Das and Arthur Li

Practical Pipelines: A Houseplant Alerting System with ksqlDB

Taking care of houseplants can be difficult; in many cases, over-watering and under-watering can have the same symptoms. Remove the guesswork involved in caring for your houseplants while also gaining valuable experience in building a practical, event-driven pipeline in your own home! This session explores the process of building a houseplant monitoring and alerting system using a Raspberry Pi and Apache Kafka. Moisture and temperature readings are captured from sensors in the soil and streamed into Kafka. From there, we use stream processing to transform the data, create a summary view of the current state, and drive real-time push alerts through Telegram.

In this session, we will talk about how to ingest the data, followed by the tools, including ksqlDB and Kafka Connect, that help transform the raw data into useful information. Finally, you'll be shown how to use Kafka Producers and Consumers to make the entire application more interactive. By the end of this session, you’ll have everything you need to start building practical streaming pipelines in your own home. Roll up your sleeves – let’s get our hands dirty!
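The alerting step can be sketched independently of Kafka and ksqlDB. The plain-Python example below is an assumption-laden stand-in, not the talk's implementation: the `MoistureMonitor` class, the window size, and the threshold are hypothetical. It averages the most recent moisture readings and returns an alert message when the average drops below a threshold.

```python
from collections import deque


class MoistureMonitor:
    """Track recent soil moisture readings and flag a plant that needs water."""

    def __init__(self, low_threshold, window_size=3):
        self.low_threshold = low_threshold
        self.readings = deque(maxlen=window_size)  # keeps only the newest readings

    def observe(self, moisture_percent):
        """Record a reading; return an alert string or None."""
        self.readings.append(moisture_percent)
        if len(self.readings) < self.readings.maxlen:
            return None  # not enough data yet to judge
        average = sum(self.readings) / len(self.readings)
        if average < self.low_threshold:
            return f"Water me! Average moisture {average:.1f}% is below {self.low_threshold}%"
        return None


if __name__ == "__main__":
    monitor = MoistureMonitor(low_threshold=30)
    for reading in (40, 25, 20, 60):
        print(reading, monitor.observe(reading))
```

Averaging over a window instead of alerting on a single reading avoids a push notification for every noisy sensor value.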

Talk by: Danica Fine

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Sponsored by: Alation | Unlocking the Power of Real-Time Data to Maximize Data Insights

It’s no secret that access to the right data at the right time is critical for data-driven decision making. In fact, as data culture becomes more and more ingrained in the enterprise, business users increasingly demand real-time, actionable data. But, what happens when it takes up to 24 hours to access your point-of-sale data? RaceTrac faced many of these data accessibility challenges as it sought to derive intelligence from its retail transaction data, specifically the data from their stores, information from their fuel purchasing arms, and delivery data for their fleet.

Through a combination of the Databricks Lakehouse and the lineage and self-discovery capabilities of the Alation Data Intelligence Platform, RaceTrac rose to the challenge. Hear from Raghu Jayachandran, Senior Manager of Enterprise Data at RaceTrac, and discover how RaceTrac gained real-time access to its transaction data in Databricks and uses Alation to show which data can drive the business insights it needs.

Talk by: Diby Malakar and Raghu Jayachandran

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Sponsored: Impetus | Accelerating ADP’s Business Transformation w/ a Modern Enterprise Data Platform

Learn how ADP’s enterprise data platform is used to drive direct monetization opportunities, differentiate its solutions, and improve operations. ADP is continuously searching for ways to increase innovation velocity, shorten time-to-market, and improve overall enterprise efficiency. Making data and tools available to teams across the enterprise while reducing data governance risk is the key to making progress on all fronts. Learn about ADP’s enterprise data platform, which created a single source of truth with centralized tools, data assets, and services. It allowed teams to innovate and gain insights by leveraging cross-enterprise data and central machine learning operations.

Explore how ADP accelerated creation of the data platform on Databricks and AWS, achieved faster business outcomes, and improved overall business operations. The session will also cover how ADP significantly reduced its data governance risk, elevated the brand by amplifying data and insights as a differentiator, increased data monetization, and leveraged data to drive human capital management differentiation.

Talk by: Chetan Kalanki and Zaf Babin

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Sponsored: Sisense-Developing Data Products: Infusion & Composability Are Changing Expectations

Composable analytics is the next progression of business intelligence. We will discuss how current analytics rely on two key principles: composability and agility. Through modularizing our analytics capabilities, we can rapidly “compose” new data applications. An organization uses these building blocks to deliver customized analytics experiences at a customer level.

This session will orientate business intelligence leaders to composable data and analytics.

  • How data teams can use composable analytics to decrease application development time.
  • How an organization can leverage existing and new tools to maximize value-based, data-driven insights.
  • Requirements for effectively deploying composable analytics.
  • Utilizing no-code, low-code, and high-code analytics capabilities.
  • Extracting full value from your customer data and metadata.
  • Leveraging analytics building blocks to create new products and revenue streams.

Talk by: Scott Castle

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

The Future is Open: Data Streaming in an Omni-Cloud Reality

This session begins with data warehouse trivia and lessons learned from production implementations of multicloud data architecture. You will learn to design future-proof low latency data systems that focus on openness and interoperability. You will also gain a gentle introduction to Cloud FinOps principles that can help your organization reduce compute spend and increase efficiency. 

Most enterprises today are multicloud. While an assortment of low-code connectors boasts the ability to make data available for analytics in real time, they pose long-lasting challenges:

  • Inefficient EDW targets
  • Inability to evolve schema
  • Forbiddingly expensive data exports due to cloud and vendor lock-in

The alternative is an open data lake that unifies batch and streaming workloads. Bronze landing zones in an open format eliminate the data extraction costs required by a proprietary EDW. Apache Spark™ Structured Streaming provides a unified ingestion interface. Streaming triggers allow us to switch back and forth between batch and stream with one-line code changes. Streaming aggregation enables us to compute incrementally on data that arrives close together in time.
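The one-line switch between batch and stream can be sketched as follows. This is a minimal example assuming PySpark 3.3 or later (where the `availableNow` trigger exists) and a Delta sink; the `trigger_options` and `start_ingest` helpers, the mode names, and the one-minute interval are hypothetical. The function only needs a streaming DataFrame at call time, so the module itself imports nothing from Spark.

```python
def trigger_options(mode):
    """Return keyword arguments for DataStreamWriter.trigger()."""
    if mode == "batch":
        # Process everything currently available, then stop (batch-like run).
        return {"availableNow": True}
    if mode == "stream":
        # Keep running and start a micro-batch every minute.
        return {"processingTime": "1 minute"}
    raise ValueError(f"unknown mode: {mode!r}")


def start_ingest(stream_df, target_path, checkpoint_path, mode):
    """Start a streaming write; `stream_df` is assumed to be a streaming DataFrame."""
    return (
        stream_df.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .trigger(**trigger_options(mode))  # the only line that differs by mode
        .start(target_path)
    )
```

Because the checkpoint records progress either way, a job can run in `batch` mode on a schedule and later be switched to `stream` mode without reprocessing old data.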

Specific examples show how to use Auto Loader to discover newly arrived data and ensure exactly-once, incremental processing; how DLT can be configured effectively to further simplify streaming jobs and accelerate the development cycle; and how to apply SWE best practices to Workflows and integrate with popular Git providers, using either the Databricks Project or the Databricks Terraform provider.
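The idea behind incremental file discovery can be shown with a toy model. The sketch below is not Auto Loader and does not use its API; the `IncrementalFileTracker` class and the file names are hypothetical. It only illustrates the principle that a durable record of already-processed files lets each run pick up new arrivals exactly once.

```python
class IncrementalFileTracker:
    """Remember which files were processed so each file is handled only once."""

    def __init__(self):
        # In a real system this state lives in a durable checkpoint, not in memory.
        self.processed = set()

    def new_files(self, listing):
        """Return files in the listing that have not been committed yet."""
        return sorted(set(listing) - self.processed)

    def commit(self, files):
        """Mark files as processed after their data was written successfully."""
        self.processed.update(files)


if __name__ == "__main__":
    tracker = IncrementalFileTracker()
    first = tracker.new_files(["a.json", "b.json"])
    tracker.commit(first)
    print(tracker.new_files(["a.json", "b.json", "c.json"]))
```

Committing only after a successful write is what prevents both data loss and duplicate processing when a run fails midway.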

Talk by: Christina Taylor

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Optimizing Batch and Streaming Aggregations

A client recently asked us to optimize their batch and streaming workloads. The workloads turned out to be aggregations using the DataFrame.groupBy operation with a custom Scala UDAF over a data stream from Kafka. What looked like a single simple request turned into a months-long hunt for a more performant physical plan than ObjectHashAggregateExec, which kept falling back to sort-based aggregation (i.e., the worst possible aggregation runtime performance). It quickly taught us that an aggregation using a custom Scala UDAF cannot be planned as anything other than ObjectHashAggregateExec, but at least tasks don't always have to fall back. And that's just batch workloads. When you throw in streaming semantics and consider the different output modes, windowing, and streaming watermarks, optimizing aggregation can take a long time to get right.
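The difference between hash-based and sort-based aggregation can be illustrated conceptually. The plain-Python sketch below is not how Spark implements its physical operators; the two helper functions are hypothetical and only contrast the strategies: the hash variant updates one in-memory buffer per key in a single pass, while the sort variant must first order all rows by key.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Sum values per key in one pass using an in-memory hash map."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def sort_aggregate(rows):
    """Sum values per key by sorting first, then folding adjacent equal keys."""
    ordered = sorted(rows, key=itemgetter(0))  # the extra O(n log n) step
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


if __name__ == "__main__":
    rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
    print(hash_aggregate(rows))
    print(sort_aggregate(rows))
```

Both produce the same totals; the sort-based path simply pays for ordering the input, which is why a fallback to it is costly.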

Talk by: Jacek Laskowski

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Map Your Lakehouse Content with DiscoverX

An enterprise lakehouse contains many different datasets which are related to different sources and might belong to different business units. These datasets can span across hundreds of tables, and each table has a different schema, and those schemas evolve over time. The cyber security domain is a good example where datasets come from many different source systems and land in the lakehouse. With such a complex dataset ecosystem, answers to simple questions like “Have we ever detected this IP address?” or “Which columns contain IP addresses?” can become impractical and expensive.

DiscoverX can automate the discovery of all columns that might contain specific patterns (e.g., IP addresses, MAC addresses, or fully qualified domain names) and automatically generate search and indexing queries that span multiple tables and columns.
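The underlying idea of pattern-based column discovery can be sketched in plain Python. This example does not use the DiscoverX API; the `columns_matching` helper, the 80% threshold, and the sample tables are assumptions. It samples values from each column and flags the columns where most values match an IPv4 regular expression.

```python
import re

# Matches dotted-quad IPv4 addresses with each octet in the range 0-255.
IPV4 = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)$"
)


def columns_matching(tables, pattern, min_fraction=0.8):
    """Return (table, column) pairs whose sampled values mostly match `pattern`."""
    hits = []
    for table, columns in tables.items():
        for column, samples in columns.items():
            values = [str(value) for value in samples if value is not None]
            if not values:
                continue  # nothing to judge in an all-null sample
            fraction = sum(bool(pattern.match(v)) for v in values) / len(values)
            if fraction >= min_fraction:
                hits.append((table, column))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical column samples keyed by table name.
    sampled = {
        "firewall_logs": {"src": ["10.0.0.1", "192.168.1.20"], "message": ["ok", "denied"]},
        "dns_queries": {"resolver": ["8.8.8.8", None], "domain": ["example.com", "example.org"]},
    }
    print(columns_matching(sampled, IPV4))
```

Once the matching columns are known, a search such as “Have we ever detected this IP address?” only needs to scan those columns instead of every table.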

Talk by: Erni Durdevic and David Tempelmann

Unlocking Near Real Time Data Replication with CDC, Apache Spark™ Streaming, and Delta Lake

Tune into DoorDash's journey to migrate from a flaky ETL system with 24-hour data delays, to standardizing a CDC streaming pattern across more than 150 databases to produce near real-time data in a scalable, configurable, and reliable manner.

During this journey, learn how we use Delta Lake to build a self-serve, read-optimized data lake with data latencies of 15, whilst reducing operational overhead. Furthermore, learn how certain tradeoffs, such as conceding to a non-real-time system, allow for multiple optimizations while still permitting OLTP query use cases, and the benefits this provides.

Talk by: Ivan Peng and Phani Nalluri

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Deploying the Lakehouse to Improve the Viewer Experience on Discovery+

In this session, we will discuss how real-time data streaming can be used to gain insights into user behavior and preferences, and how this data is being used to provide personalized content and recommendations on Discovery+. We will examine techniques that enable faster decision making and insights on accurate real-time data, including data masking and data validation. To enable a wide set of data consumers, from data engineers to data scientists to data analysts, we will discuss how Unity Catalog is leveraged for secure data access and sharing while still allowing teams flexibility.
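The abstract does not say how masking and validation are implemented. As one hedged illustration, the plain-Python sketch below validates an event and then replaces the email address with a salted hash so that downstream consumers can still join on the user without seeing the raw value. The event fields, the salt, and all three helper functions are hypothetical.

```python
import hashlib


def mask_email(email, salt="demo-salt"):
    """Replace an email with a truncated, salted SHA-256 digest."""
    digest = hashlib.sha256((salt + email.lower()).encode("utf-8")).hexdigest()
    return digest[:16]


def validate_event(event):
    """Return a list of validation errors; an empty list means the event is valid."""
    errors = []
    email = event.get("user_email")
    if not email or "@" not in email:
        errors.append("user_email missing or malformed")
    seconds = event.get("watch_seconds")
    if not isinstance(seconds, (int, float)) or seconds < 0:
        errors.append("watch_seconds must be a non-negative number")
    return errors


def prepare(event):
    """Validate first, then mask; invalid events are returned as (None, errors)."""
    errors = validate_event(event)
    if errors:
        return None, errors
    return {**event, "user_email": mask_email(event["user_email"])}, []


if __name__ == "__main__":
    print(prepare({"user_email": "Viewer@example.com", "watch_seconds": 120}))
    print(prepare({"user_email": "not-an-email", "watch_seconds": -5}))
```

In practice the salt would come from a secret store rather than a default argument, and rejected events would be routed to a quarantine table for inspection.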

Operating at this scale requires examining the value created by the data being processed and optimizing along the way, and we will share some of our successes in this area.

Talk by: Deepa Paranjpe

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Extending Lakehouse Architecture with Collaborative Identity

Lakehouse architecture has become a valuable solution for unifying data processing for AI, but faces limitations in maximizing data’s full potential. Additional data infrastructure is helpful for strengthening data consolidation and data connectivity with third-party sources, which are necessary for building full data sets for accurate audience modeling. 

In this session, LiveRamp will demonstrate to data and analytics decision-makers how to build on the Lakehouse architecture with extensions for collaborative identity graph construction, including how to simplify and improve data enrichment, data activation, and data collaboration. LiveRamp will also introduce a complete data marketplace, which enables easy, pseudonymized data enhancements that widen the attribute set for better behavioral model construction.

With these techniques and technologies, enterprises across financial services, retail, media, travel, and more can safely unlock partner insights and ultimately produce more accurate inputs for personalization engines, and more engaging offers and recommendations for customers.

Talk by: Erin Boelkens and Shawn Gilleran

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

How Coinbase Built and Optimized SOON, a Streaming Ingestion Framework

Data with low latency is important for real-time incident analysis and metrics. Though we have up-to-date data in OLTP databases, they cannot support those scenarios. Data need to be replicated to a data warehouse to serve queries using GroupBy and Join across multiple tables from different systems. At Coinbase, we designed SOON (Spark cOntinuOus iNgestion) based on Kafka, Kafka Connect, and Apache Spark™ as an incremental table replication solution to replicate tables of any size from any database to Delta Lake in a timely manner. It also supports Kafka events ingestion naturally.

SOON incrementally ingests Kafka events as appends, updates, and deletes to an existing table on Delta Lake. The events are grouped into two categories: CDC (change data capture) events generated by Kafka Connect source connectors, and non-CDC events by the frontend or backend services. Both types can be appended or merged into the Delta Lake. Non-CDC events can be in any format, but CDC events must be in the standard SOON CDC schema. We implemented Kafka Connect SMTs to transform raw CDC events into this standardized format. SOON unifies all streaming ingestion scenarios such that users only need to learn one onboarding experience and the team only needs to maintain one framework.
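The merge semantics described above can be sketched in plain Python. This is not SOON's implementation or its CDC schema; the event fields `op`, `seq`, and `row` and the `apply_cdc` helper are hypothetical. The sketch keeps only the latest event per key within a batch, then applies it to the target table as an upsert or a delete.

```python
def apply_cdc(table, events, key="id"):
    """Apply a batch of CDC events to a table held as {key_value: row}."""
    # Step 1: deduplicate, keeping the event with the highest sequence per key.
    latest = {}
    for event in events:
        key_value = event["row"][key]
        if key_value not in latest or event["seq"] > latest[key_value]["seq"]:
            latest[key_value] = event

    # Step 2: merge into a copy of the target (insert, update, or delete).
    result = dict(table)
    for key_value, event in latest.items():
        if event["op"] == "delete":
            result.pop(key_value, None)
        else:  # "insert" and "update" are both treated as upserts
            result[key_value] = event["row"]
    return result


if __name__ == "__main__":
    target = {1: {"id": 1, "balance": 10}, 2: {"id": 2, "balance": 20}}
    batch = [
        {"op": "update", "seq": 5, "row": {"id": 1, "balance": 15}},
        {"op": "update", "seq": 4, "row": {"id": 1, "balance": 12}},  # older, ignored
        {"op": "delete", "seq": 6, "row": {"id": 2}},
        {"op": "insert", "seq": 7, "row": {"id": 3, "balance": 30}},
    ]
    print(apply_cdc(target, batch))
```

Deduplicating before the merge matters because events for the same row may arrive out of order within a batch, and only the newest should win.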

We care about ingestion performance. The biggest append-only table onboarded has ingress traffic of hundreds of thousands of events per second; the biggest CDC-merge table onboarded has a snapshot size of a few TBs and CDC update traffic of hundreds of thousands of events per second. Many innovative ideas are incorporated in SOON to improve its performance, such as min-max range merge optimization, KMeans merge optimization, no-update merge for deduplication, and generated columns as partitions.

Talk by: Chen Guo

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Rapidly Implementing Major Retailer API at the Hershey Company

Accurate, reliable, and timely data is critical for CPG companies to stay ahead in highly competitive retailer relationships, and for a company like the Hershey Company, the commercial relationship with Walmart is one of the most important. The team at Hershey found themselves with a looming deadline for their legacy analytics services and targeted a migration to the brand new Walmart Luminate API. Working in partnership with Advancing Analytics, the Hershey Company leveraged a metadata-driven Lakehouse Architecture to rapidly onboard the new Luminate API, helping the category management teams to overhaul how they measure, predict, and plan their business operations.

In this session, we will discuss the impact Luminate has had on Hershey's business, covering key areas such as sales, supply chain, and retail field execution, and the technical building blocks that can be used to rapidly provision business users with the data they need, when they need it. We will discuss how key technologies enable this rapid approach, with Databricks Auto Loader ingesting and shaping our data, Delta streaming processing the data through the lakehouse, and Databricks SQL providing a responsive serving layer. The session will include commentary as well as cover the technical journey.
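The abstract does not detail the metadata-driven design. As a hedged sketch of the general pattern, the plain-Python example below drives ingestion from a list of dataset descriptions instead of one hand-written job per feed; the dataset names, paths, fields, and the `build_ingest_plan` helper are all hypothetical.

```python
# Hypothetical metadata: one entry per feed. Onboarding a new feed means
# adding an entry here rather than writing a new pipeline.
DATASETS = [
    {"name": "sales", "source_path": "/landing/sales", "format": "json", "target": "bronze.sales"},
    {"name": "inventory", "source_path": "/landing/inventory", "format": "csv", "target": "bronze.inventory"},
    {"name": "legacy_feed", "source_path": "/landing/legacy", "format": "csv", "target": "bronze.legacy", "enabled": False},
]


def build_ingest_plan(datasets):
    """Turn dataset metadata into an ordered list of generic load steps."""
    plan = []
    for dataset in datasets:
        if not dataset.get("enabled", True):
            continue  # feeds can be switched off in metadata without code changes
        plan.append(
            f"load {dataset['format']} from {dataset['source_path']} into {dataset['target']}"
        )
    return plan


if __name__ == "__main__":
    for step in build_ingest_plan(DATASETS):
        print(step)
```

In a real lakehouse each plan step would map to a parameterized ingestion job; the point is that the generic loader is written once and the metadata grows.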

Talk by: Simon Whiteley and Jordan Donmoyer
