talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 · YouTube

Activities tracked: 582

Sessions & talks

Showing 101–125 of 582 · Newest first

IFC's MALENA Provides Analytics for ESG Reviews in Emerging Markets Using NLP and LLMs

2023-07-26 · Watch video

International Finance Corporation (IFC) is using data and AI to build machine learning solutions that create analytical capacity to support the review of ESG issues at scale. This includes natural language processing applications such as entity recognition that support the work of IFC's experts and other investors working in emerging markets. These algorithms are available via IFC's Machine Learning ESG Analyst (MALENA) platform to enable rapid analysis, increase productivity, and build investor confidence. In this manner, IFC, a development finance institution with a mandate to address poverty in emerging markets, is using its historical datasets and open source AI solutions to build custom AI applications that democratize the capacity to read and classify ESG text.

In this session, you will learn about the unique flexibility of the Apache Spark™ ecosystem from Databricks and how it has allowed IFC's MALENA project to connect to scalable data lake storage, use different natural language processing models, and seamlessly adopt MLOps.
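As a hedged illustration of entity recognition on Spark (not MALENA's actual code), here is a minimal PySpark sketch that applies an off-the-shelf Hugging Face NER pipeline to a text column via a pandas UDF; the model name, column names, and sample text are assumptions.

```python
# Illustrative only: distributed entity recognition over a text column.
# Model and column names are assumptions, not MALENA's actual setup.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, StringType

spark = SparkSession.builder.getOrCreate()

@pandas_udf(ArrayType(StringType()))
def extract_entities(texts: pd.Series) -> pd.Series:
    # Import inside the UDF so workers load the model; in production,
    # cache the pipeline so it is created once per worker, not per batch.
    from transformers import pipeline
    ner = pipeline("ner", model="dslim/bert-base-NER",
                   aggregation_strategy="simple")
    return texts.apply(lambda t: [e["word"] for e in ner(t)])

docs = spark.createDataFrame(
    [("IFC reviewed the ESG disclosures of Acme Textiles in Bangladesh.",)],
    ["text"],
)
docs.withColumn("entities", extract_entities("text")).show(truncate=False)
```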

Talk by: Atiyah Curmally and Blaise Sandwidi

Increasing Data Trust: Enabling Data Governance on Databricks Using Unity Catalog & ML-Driven MDM

2023-07-26 · Watch video

As part of Comcast Effectv’s transformation into a completely digital advertising agency, it was key to develop an approach to manage and remediate data quality issues related to customer data so that the sales organization is using reliable data to enable data-driven decision making. Like many organizations, Effectv's customer lifecycle processes are spread across many systems utilizing various integrations between them. This results in key challenges like duplicate and redundant customer data that requires rationalization and remediation. Data is at the core of Effectv’s modernization journey with the intended result of winning more business, accelerating order fulfillment, reducing make-goods and identifying revenue.

In partnership with Slalom Consulting, Comcast Effectv built a traditional lakehouse on Databricks to ingest data from all of these systems, but with a twist: they anchored every engineering decision in how it would enable their data governance program.

In this session, we will touch upon the data transformation journey at Effectv and dive deeper into the implementation of data governance leveraging Databricks solutions such as Delta Lake, Unity Catalog, and Databricks SQL (DBSQL). Key focus areas include how we baked master data management into our pipelines by automating the matching and survivorship process, and how we brought it all together for the data consumer via DBSQL, exposing our certified assets in the bronze, silver, and gold layers.

By making thoughtful decisions about structuring data in Unity Catalog and baking MDM into ETL pipelines, you can greatly increase the quality, reliability, and adoption of single-source-of-truth data so your business users can stop spending cycles on wrangling data and spend more time developing actionable insights for your business.
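As a hedged sketch of what automating the survivorship step can look like in a pipeline (not Effectv's actual implementation), the following PySpark snippet keeps one golden record per matched group, preferring the most recently updated row; the match_id, customer_name, and updated_at columns are hypothetical.

```python
# Hedged sketch: a survivorship step after ML-driven matching has
# assigned a match_id to candidate duplicates. Keep one golden record
# per group, preferring the most recently updated row.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

matched = spark.createDataFrame(
    [(1, "Acme Corp", "2023-05-01"),
     (1, "ACME Corporation", "2023-06-15"),
     (2, "Globex", "2023-04-20")],
    ["match_id", "customer_name", "updated_at"],
)

w = Window.partitionBy("match_id").orderBy(F.col("updated_at").desc())
golden = (matched
          .withColumn("rn", F.row_number().over(w))
          .filter("rn = 1")
          .drop("rn"))
golden.show()
```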

Talk by: Maggie Davis and Risha Ravindranath

Real-Time Reporting and Analytics for Construction Data Powered by Delta Lake and DBSQL

2023-07-26 · Watch video

Procore is a construction project management software that helps construction professionals efficiently manage their projects and collaborate with their teams. Our mission is to connect everyone in construction on a global platform.

Procore is the system of record for all construction projects. Our customers need to access the data in near real time for construction insights. Enhanced Reporting is a self-service operational reporting module that provides quick, consistent access to thousands of tables and reports.

The Procore data platform team rebuilt the module (originally built on a relational database) using Databricks and Delta Lake. We used Apache Spark™ Structured Streaming to maintain consistent state on the ingestion side from Kafka, and plan to leverage the full capabilities of DBSQL, using the serverless SQL warehouse to read the medallion models (built via dbt) in Delta Lake. In addition, Unity Catalog and Delta Sharing helped us share data across regions seamlessly. This design enabled us to improve the p95 and p99 read times by xx% (reads were initially timing out).
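For readers unfamiliar with the pattern, here is a minimal Structured Streaming sketch of Kafka-to-Delta ingestion under assumed broker, topic, and storage paths; Procore's production pipeline is naturally more involved.

```python
# Hedged sketch: Kafka -> Delta bronze ingestion with Structured
# Streaming. Broker, topic, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "project-events")
       .option("startingOffsets", "earliest")
       .load())

events = raw.select(
    F.col("key").cast("string").alias("key"),
    F.col("value").cast("string").alias("payload"),
    "timestamp",
)

(events.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/project-events")
 .outputMode("append")
 .start("/mnt/bronze/project_events"))
```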

Attend this session to hear the lessons learned and experience gained building a data lakehouse architecture.

Talk by: Jay Yang and Hari Rajaram

Taking Your Cloud Vendor to the Next Level: Solving Complex Challenges with Azure Databricks

2023-07-26 · Watch video

Akamai's content delivery network (CDN) processes about 30% of the internet's daily traffic, resulting in a massive amount of data that presents engineering challenges, both internally and with cloud vendors. In this session, we will discuss the barriers faced while building a data infrastructure on Azure, Databricks, and Kafka to meet strict SLAs, hitting the limits of some of our cloud vendors’ services. We will describe the iterative process of re-architecting a massive scale data platform using the aforementioned technologies.

We will also delve into how, today, Akamai is able to quickly ingest terabytes of data and make it available to customers, as well as efficiently query petabytes of data and return results within 10 seconds for most queries. This discussion will provide valuable insights for attendees and organizations seeking to effectively process and analyze large amounts of data.

Evaluating LLM-based Applications

2023-07-26 · Watch video

Evaluating LLM-based applications can feel like more of an art than a science. In this workshop, we'll give a hands-on introduction to evaluating language models. You'll come away with knowledge and tools you can use to evaluate your own applications, and answers to questions like:

  • Where do I get evaluation data from, anyway?
  • Is it possible to evaluate generative models in an automated way? (a minimal sketch follows this list)
  • What metrics can I use?
  • What's the role of human evaluation?
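As a hedged sketch of one automated approach (illustrative, not the workshop's material): score predictions against references with simple reference-based metrics such as token-overlap F1.

```python
# Hedged sketch: automated reference-based evaluation with a simple
# token-overlap F1, in the style of QA benchmarks. Data is illustrative.
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

eval_set = [
    {"reference": "Paris", "prediction": "The capital is Paris"},
    {"reference": "42", "prediction": "42"},
]
scores = [token_f1(ex["prediction"], ex["reference"]) for ex in eval_set]
print(f"mean token F1: {sum(scores) / len(scores):.3f}")
```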

Talk by: Josh Tobin

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Best Exploration of Columnar Shuffle Design

2023-07-26 · Watch video

To significantly improve the performance of Spark SQL, there has been a trend over the past several years to offload Spark SQL execution to highly optimized native libraries or accelerators, such as Photon from Databricks, NVIDIA's RAPIDS plug-in, and the open source Gluten project initiated by Intel and Kyligence. Thanks to the multi-fold performance improvements from these solutions, more and more Apache Spark™ users have started to adopt the new technology. One characteristic of these native libraries is that they all use a columnar data format as the basic data format, because the columnar format has an intrinsic affinity to vectorized data processing using SIMD instructions. Vanilla Spark's shuffle, however, is based on Spark's internal row data format. The high overhead of columnar-to-row and row-to-columnar conversion during the shuffle makes reusing the current shuffle impractical. Given the importance of the shuffle service in Spark, we had to implement an efficient columnar shuffle, which brings a couple of new challenges, like the split of columnar data and dictionary support during shuffle.

In this session, we will share the exploration process of the columnar shuffle design during our Gazelle and Gluten development, along with best practices for implementing a columnar shuffle service. We will also share what we learned from the development of vanilla Spark's shuffle, for example how to address the small-files issue, and then propose the new shuffle solution. We will show a performance comparison between the columnar shuffle and vanilla Spark's row-based shuffle. Finally, we will share how new built-in accelerators like QAT and IAA in the latest Intel processors are used in our columnar shuffle service to boost performance.
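To make the split-of-columnar-data challenge concrete, here is a conceptual Python sketch using pyarrow (not Gluten's actual native implementation): rows of a columnar batch are regrouped by shuffle partition while the data stays columnar.

```python
# Conceptual sketch (not Gluten's code) of the columnar "split" problem:
# rows of a columnar batch must be regrouped by shuffle partition
# without converting to Spark's internal row format.
import pyarrow as pa
import pyarrow.compute as pc

num_partitions = 4
batch = pa.record_batch(
    {"key": pa.array([10, 11, 12, 13, 14]), "val": pa.array(list("abcde"))}
)

# Partition id per row; a real implementation computes a vectorized
# hash in native code rather than a Python loop.
pids = pa.array([k.as_py() % num_partitions for k in batch.column("key")])

# Regroup rows per partition with columnar filters, keeping the data
# columnar end to end; this is the step vanilla Spark's row shuffle
# cannot reuse.
partitions = {p: batch.filter(pc.equal(pids, p))
              for p in range(num_partitions)}
for p, part in partitions.items():
    print(p, part.column("val").to_pylist())
```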

Talk by: Binwei Yang and Rong Ma

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Best Practices for Running Efficient Apache Spark™ Workloads on Databricks

2023-07-26 · Watch video
Justin Breese (Databricks)

Every day thousands of customers choose to run business-critical Spark workloads on the Databricks Lakehouse Platform, a platform built by the creators of Apache Spark™. These customers take advantage of platform capabilities such as fully managed compute resources, dynamic autoscaling, an integrated workflow orchestration tool, and Photon, the extremely fast vectorized execution engine. All of these make the Databricks Lakehouse Platform the best place to run Spark workloads, providing operational benefits as well as tremendous price/performance value.

This session, which includes live demos, will cover these and other platform capabilities that can help you build your next optimized Spark application.

Talk by: Justin Breese

Databricks Lakehouse: How BlackBerry is Revolutionizing Cybersecurity Services Worldwide

2023-07-26 · Watch video
Robert Lombardi, Justin Lai (Arctic Wolf)

Cybersecurity incidents are costly, and an endpoint detection and response (EDR) solution enables detecting them as quickly as possible. Effectively detecting cybersecurity incidents requires collecting millions of data points, and storing and querying endpoint data presents considerable engineering challenges, including quickly moving local data from endpoints to a single table in the cloud and enabling performant querying against it.

The need to avoid internal data siloing within BlackBerry was paramount as multiple teams required access to the data to deliver an effective EDR solution for the present and the future. Databricks tooling enabled us to break down our data silos and iteratively improve our EDR pipeline to ingest data faster and reduce querying latency by more than 20% while reducing costs by more than 30%.

In this session, we will share the journey, lessons learned, and the future for collecting, storing, governing, and sharing data from endpoints in Databricks. Building our EDR solution on Databricks helped us accelerate the deployment of our data platform.

Talk by: Justin Lai and Robert Lombardi

Databricks SQL: Why the Best Serverless Data Warehouse is a Lakehouse

2023-07-26 · Watch video

Many organizations rely on complex cloud data architectures that create silos between applications, users and data. This fragmentation makes it difficult to access accurate, up-to-date information for analytics, often resulting in the use of outdated data. Enter the lakehouse, a modern data architecture that unifies data, AI, and analytics in a single location.

This session explores why the lakehouse is the best data warehouse, featuring success stories, use cases and best practices from industry experts. You'll discover how to unify and govern business-critical data at scale to build a curated data lake for data warehousing, SQL and BI. Additionally, you'll learn how Databricks SQL can help lower costs and get started in seconds with on-demand, elastic SQL serverless warehouses, and how to empower analytics engineers and analysts to quickly find and share new insights using their preferred BI and SQL tools such as Fivetran, dbt, Tableau, or Power BI.

Talk by: Miranda Luna and Cyrielle Simeone

Data Extraction and Sharing Via The Delta Sharing Protocol

2023-07-26 · Watch video

The Delta Sharing open protocol for secure sharing and distribution of Lakehouse data is designed to reduce friction in getting data to users. Delivering custom data solutions from this protocol further leverages the technical investment committed to your Delta Lake infrastructure. There are key design and computational concepts unique to Delta Sharing to know when undertaking development. And there are pitfalls and hazards to avoid when delivering modern cloud data to traditional data platforms and users.

In this session, we introduce Delta Sharing Protocol development and examine our journey and the lessons learned while creating the Delta Sharing Excel Add-in. We will demonstrate scenarios of overfetching, underfetching, and interpretation of types. We will suggest methods to overcome these development challenges. The session will combine live demonstrations that exercise the Delta Sharing REST protocol with detailed analysis of the responses. The demonstrations will elaborate on optional capabilities of the protocol’s query mechanism, and how they are used and interpreted in real-life scenarios. As a reference baseline for data professionals, the Delta Sharing exercises will be framed relative to SQL counterparts. Specific attention will be paid to how they differ, and how Delta Sharing’s Change Data Feed (CDF) can power next-generation data architectures. The session will conclude with a survey of available integration solutions for getting the most out of your Delta Sharing environment, including frameworks, connectors, and managed services.
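As a hedged companion to the session's Excel Add-in material, here is a minimal sketch of consuming a share with the open source delta-sharing Python connector; the profile path and table coordinates are placeholders.

```python
# Hedged sketch: read a shared table with the open source Python
# connector (pip install delta-sharing). Profile path and table
# coordinates are placeholders issued by the data provider.
import delta_sharing

profile = "/path/to/config.share"

# Discover what has been shared with us.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one table; the connector performs the REST calls and Parquet
# fetches discussed above. To limit overfetching, prefer loading only
# the tables you need.
url = f"{profile}#my_share.my_schema.my_table"
df = delta_sharing.load_as_pandas(url)
print(df.head())
```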

Attendees are encouraged to be familiar with REST, JSON, and modern programming concepts. A working knowledge of Delta Lake, the Parquet file format, and the Delta Sharing Protocol is advised.

Talk by: Roger Dunn

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Data Globalization at Conde Nast Using Delta Sharing

2023-07-26 · Watch video

Databricks has been an essential part of the Conde Nast architecture for the last few years. Prior to building our centralized data platform, “evergreen,” we had challenges similar to those of many other organizations: siloed data, duplicated effort for engineers, and a lack of collaboration between data teams. These problems led to mistrust in data sets and made it difficult to scale to meet the strategic globalization plan we had for Conde Nast.

Over the last few years we have been extremely successful in building a centralized data platform on Databricks in AWS, fully embracing the lakehouse vision from end-to-end. Now, our analysts and marketers can derive the same insights from one dataset and data scientists can use the same datasets for use cases such as personalization, subscriber propensity models, churn models and on-site recommendations for our iconic brands.

In this session, we’ll discuss how we plan to incorporate Unity Catalog and Delta Sharing as the next phase of our globalization mission. The evergreen platform has become the global standard for data processing and analytics at Conde. In order to manage worldwide data and comply with GDPR requirements, we need to make sure data is processed in the appropriate region and PII data is handled appropriately. At the same time, we need a global view of the data to allow us to make business decisions at the global level. We’ll talk about how Delta Sharing gives us a simple, secure way to share de-identified datasets across regions in order to make these strategic business decisions while complying with security requirements. Additionally, we’ll discuss how Unity Catalog allows us to secure, govern, and audit these datasets in an easy and scalable manner.
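A hedged sketch of the cross-region sharing setup described above, using standard Unity Catalog SQL from a Databricks notebook; the share, table, and recipient names are hypothetical.

```python
# Hedged sketch: Unity Catalog SQL for cross-region Delta Sharing.
# All object names are hypothetical; runs where `spark` is predefined.
spark.sql("CREATE SHARE IF NOT EXISTS emea_deidentified")
spark.sql("""
    ALTER SHARE emea_deidentified
    ADD TABLE evergreen.gold.subscriber_metrics_deidentified
""")

# A recipient in another region receives a credential to read the share.
spark.sql("CREATE RECIPIENT IF NOT EXISTS us_analytics")
spark.sql("GRANT SELECT ON SHARE emea_deidentified TO RECIPIENT us_analytics")
```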

Talk by: Zachary Bannor

Embrace First-Party Customer Data for Marketing and Advertising using Data Cleanrooms

2023-07-26 · Watch video
Jordan Peck (Snowplow)

The digital marketing and advertising industry is going through revolutionary change in 2023, with technical, organizational, cultural, and regulatory overhaul. As a result, measuring digital advertising effectiveness or coordinating and running highly targeted and effective ad campaigns is becoming more challenging than ever.

First-party customer behavioral data provides organizations with a true competitive advantage and the ability to outperform peers in the battle for customer attention and brand loyalty.

However, first-party customer data is still used sparingly across the digital ad ecosystem, and there are few tools or frameworks that allow advertisers to unlock the value in the first-party data they have.

This session will show you how Snowplow allows organizations to deeply understand their users' behavior and intent by creating best-quality behavioral data. It will also explain how, when this is combined with the Databricks Lakehouse and data clean rooms, brands can unlock insights that were previously unachievable and activate their first-party customer behavioral data into highly effective, personalized, and creative ad campaigns.

In this session you will learn:

  • Why first-party data can be the ultimate competitive advantage for digital advertisers
  • How data clean rooms combined with Snowplow behavioral data enable better insights and more impactful ad targeting
  • What specific marketing and advertising use cases are possible when utilizing a data clean room on top of the Databricks Lakehouse

Talk by: Jordan Peck

Embracing the Future of Data Engineering: The Serverless, Real-Time Lakehouse in Action

2023-07-26 · Watch video
Frank Munz (Databricks)

As we venture into the future of data engineering, streaming and serverless technologies take center stage. In this fun, hands-on, in-depth and interactive session you can learn about the essence of future data engineering today.

We will tackle the challenge of processing streaming events continuously created by hundreds of sensors in the conference room from a serverless web app (bring your phone and be a part of the demo). The focus is on the system architecture, the products involved, and the solution they provide. Which Databricks products, capabilities, and settings will be most useful for our scenario? What does streaming really mean, and why does it make our lives easier? What are the exact benefits of serverless, and just how "serverless" is a particular solution?

Leveraging the power of the Databricks Lakehouse Platform, I will demonstrate how to create a streaming data pipeline with Delta Live Tables ingesting data from AWS Kinesis. Further, I’ll utilize advanced Databricks Workflows triggers for efficient orchestration and real-time alerts feeding into a real-time dashboard. And since I don’t want you to leave empty-handed, I will use Delta Sharing to share the results of the demo we built with every participant in the room. Join me in this hands-on exploration of cutting-edge data engineering techniques and witness the future in action.
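A minimal sketch of the Kinesis-to-Delta-Live-Tables pattern described above, with placeholder stream name, region, and payload fields (not the exact demo code):

```python
# Hedged sketch: a Delta Live Tables pipeline ingesting from Kinesis.
# Stream name, region, and payload fields are placeholders; `spark`
# and `dlt` are provided by the DLT runtime.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw sensor events from Amazon Kinesis")
def raw_sensor_events():
    return (spark.readStream
            .format("kinesis")
            .option("streamName", "conference-sensors")
            .option("region", "us-west-2")
            .option("initialPosition", "latest")
            .load())

@dlt.table(comment="Decoded sensor readings")
def sensor_readings():
    json_col = F.col("data").cast("string")  # Kinesis payload is binary
    return (dlt.read_stream("raw_sensor_events")
            .select(
                F.get_json_object(json_col, "$.sensor_id").alias("sensor_id"),
                F.get_json_object(json_col, "$.value").cast("double").alias("value"),
            ))
```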

Talk by: Frank Munz

Essential Data Security Strategies for the Modern Enterprise Data Architecture

2023-07-26 · Watch video

Balancing critical data requirements is a 24-7 task for enterprise-level organizations that must straddle the need to open specific gates to enable self-service data access while closing other access points to maintain internal and external compliance. Data breaches can cost U.S. businesses an average of $9.4 million per occurrence; ignoring this leaves organizations vulnerable to severe losses and crippling costs.

The 2022 Gartner Hype Cycle for Data Security reports that more and more enterprises are modernizing their data architecture with cloud and technology partners to help them collect, store and manage business data; a trend that does not appear to be letting up. According to Gartner®, “by 2025, 30% of enterprises will have adopted the Broad Data Security Platform (bDSP), up from less than 10% in 2021, due to the pent-up demand for higher levels of data security and the rapid increase in product capabilities."

Moving to both a modern data architecture and data-driven culture sets enterprises on the right trajectory for growth, but it’s important to keep in mind individual public cloud platforms are not guaranteed to protect and secure data. To solve this, Privacera pioneered the industry’s first open-standards-based data security platform that integrates privacy and compliance across multiple cloud services.

During this presentation, we will discuss:

  • Why today’s modern data architecture needs a DSP that works across the entire data ecosystem
  • Essential DSP prescriptive measures and adoption strategies
  • Why faster and more responsible access to data insights helps reduce cost, increases productivity, expedites decision making, and leads to exponential growth

Talk by: Piet Loubser

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Generative AI at Scale Using GAN and Stable Diffusion

2023-07-26 · Watch video

Generative AI is under the spotlight and has diverse applications, but there are also many considerations when deploying a generative model at scale. This presentation takes a deep dive into multiple architectures and covers optimization hacks for the sophisticated data pipelines that generative AI requires. The session will cover:

  • How to create and prepare a dataset for training at scale in single-GPU and multi-GPU environments
  • How to optimize your data pipeline for training and inference in production, considering the complex deep learning models that need to be run
  • Tradeoffs between higher-quality outputs versus training time, resources, and processing times

Agenda:

  • Basic concepts in generative AI: GAN networks and Stable Diffusion
  • Training and inference data pipelines
  • Industry applications and use cases

Talk by: Paula Martinez and Rodrigo Beceiro

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Lineage System Table in Unity Catalog

2023-07-26 · Watch video

Unity Catalog provides fully automated data lineage for all workloads in SQL, R, Python, and Scala, and across all asset types on Databricks. The aggregated view has been available to end users through the data explorer and an API. In this session, we are excited to share that lineage is now also available as a Delta table in the UC metastore. It stores the full history of recent lineage records, is updated in near real time, and can be queried through a standard SQL interface. With that, customers can gain significant operational insight into their workloads for impact analysis, troubleshooting, quality assurance, data discovery, and data governance.
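As a brief sketch of what querying lineage through SQL can look like: the system.access.table_lineage name and columns below follow Databricks' documented system tables, but treat the exact schema as an assumption.

```python
# Hedged sketch: impact analysis via the lineage system table.
# Verify the exact table and column names in your workspace; runs
# where `spark` is predefined.
downstream = spark.sql("""
    SELECT source_table_full_name, target_table_full_name,
           entity_type, event_time
    FROM system.access.table_lineage
    WHERE source_table_full_name = 'main.sales.orders'  -- hypothetical
    ORDER BY event_time DESC
    LIMIT 100
""")
downstream.show(truncate=False)
```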

Together with the broader system tables effort, which provides query history, job run operational data, audit logs, and more, the lineage table will be a critical piece linking data assets and entities together, providing better lakehouse observability and unification for customers.

Talk by: Menglei Sun

Processing Prescriptions at Scale at Walgreens

2023-07-26 · Watch video

We designed a scalable Spark Streaming job to manage hundreds of millions of prescription-related operations per day, with an end-to-end SLA of a few minutes and a lookup time of one second using Cosmos DB.

In this session, we will share not only the architecture, but also the challenges and solutions of using the Spark Cosmos connector at scale. We will discuss uses of the Aggregator API, custom implementations of the Cosmos DB connector, and the major roadblocks we encountered, along with the solutions we engineered. In addition, we collaborated closely with the Cosmos development team at Microsoft and will share the new features that resulted. If you ever plan to use Spark with Cosmos, you won't want to miss these gotchas!
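As a hedged sketch (not Walgreens' code) of the core integration, here is a stream write to Cosmos DB with the Azure Cosmos DB Spark 3 OLTP connector; the endpoint, secret scope, database, container, and the prescriptions_stream DataFrame are placeholders.

```python
# Hedged sketch: stream writes to Cosmos DB via the Azure Cosmos DB
# Spark 3 OLTP connector. All names are placeholders; runs in a
# Databricks notebook where `dbutils` is available.
cosmos_config = {
    "spark.cosmos.accountEndpoint": "https://<account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": dbutils.secrets.get("kv-scope", "cosmos-key"),
    "spark.cosmos.database": "pharmacy",
    "spark.cosmos.container": "prescriptions",
}

(prescriptions_stream.writeStream   # a streaming DataFrame defined upstream
 .format("cosmos.oltp")
 .options(**cosmos_config)
 .option("checkpointLocation", "/mnt/checkpoints/cosmos-prescriptions")
 .outputMode("append")
 .start())
```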

Talk by: Daniel Zafar

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Rapidly Scaling Applied AI/ML with Foundational Models and Applying Them to Modern AI/ML Use Cases

2023-07-26 · Watch video
Nick King (Snowplow)

Today many of us are familiar with foundational models such as the LLMs behind ChatGPT. However, there are many more enterprise foundational models that can be rapidly deployed, trained, and applied to enterprise use cases. This approach not only dramatically increases the performance of AI/ML models in production, but also gives AI teams rapid roadmaps for efficiency and for delivering value to the business. Databricks provides the ideal toolset to enable this approach.

In this session, we will provide a logical overview of the foundational models available today, demonstrate a real-world use case, and provide a business framework for data scientists and business leaders to collaborate and rapidly deploy these use cases.

Talk by: Nick King

Real-Time Streaming Solution for Call Center Analytics: Business Challenges and Technical Enablement

2023-07-26 · Watch video

A large international client with a business footprint in North America, Europe and Africa reached out to us with an interest in having a real-time streaming solution designed and implemented for its call center handling incoming and outgoing client calls. The client had a previous bad experience with another vendor, who overpromised and underdelivered on the latency of the streaming solution. The previous vendor had delivered an over-complex streaming data pipeline in which data took over five minutes to reach the visualization layer. The client felt that the architecture was too complex and involved too many services integrated together.

Our immediate challenge was gaining the client's trust and proving that our design and implementation quality would surpass their previous experience. To resolve the immediate problem of the overly complicated pipeline design, we deployed a Databricks Lakehouse architecture with Azure Databricks at the center of the solution. Our reference architecture integrated Genesys Cloud → App Services → Event Hub → Databricks → Data Lake → Power BI.

The streaming solution proved to be low latency (seconds) during the POV stage, which led to subsequent productionalization of the pipeline: deployment of jobs and DLT pipelines, including a multi-notebook workflow, and the business and performance metrics dashboards relied on by call center staff for day-to-day performance monitoring and improvement.

Talk by: Natalia Demidova

Sponsored: Accenture | Factory of the Future: Building Digital Twins Using Knowledge Graphs & Gen AI

2023-07-26 · Watch video

Digital twins are the foundation for the Factory of the Future, providing the data foundation to answer questions like what is happening and what can be done about it. This requires combining data across the business — from R&D, manufacturing, supply chain, and operations — and with partners; that combined data is then used with AI to make decisions.

This session presents a case study of a digital twin implemented for warehouse controllers, designed to support decisions and recommendations for next trips, replacing tribal knowledge and gut decision-making. We share how we use a domain knowledge graph to drive a data-driven approach that combines warehouse data with simulations, AI models, and domain knowledge. Warehouse controllers use a dispatch control board that provides a list of orders by dispatch date and time, destination, carrier, assignments to trailers, and the order and dock number. We show how this new semantic layer works with large language models to make it easier to answer questions on which trip to activate and which trailer to choose, based on assets available, products in inventory, and what's coming out of manufacturing.

Talk by: Teresa Tung

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Sponsored: Anomalo | Data Archaeology: Quickly Understand Unfamiliar Datasets Using Machine Learning

2023-07-26 · Watch video

One of the most daunting and time-consuming activities for data scientists and data analysts is understanding new and unfamiliar data sets. When given such a new data set, how do you understand its shape and structure? How can you quickly understand its important trends and characteristics? The typical answer is hours of manual querying and exploration, a process many call data archaeology.

This session will show a better way to explore new data sets by letting machine learning do the work for you. In particular, we will showcase how Anomalo simplifies the process of understanding and obtaining insights from Databricks tables — without manual querying. With a few clicks, you can generate comprehensive profiles and powerful visualizations that give immediate insight into your data's key characteristics and trends, as well as its shape and structure. With this approach, very little manual data archaeology is required, and you can quickly get to work on getting value out of the data (rather than just exploring it).

Talk by: Elliot Shmukler and Vicky Andonova

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Sponsored: AWS-Real Time Stream Data & Vis Using Databricks DLT, Amazon Kinesis, & Amazon QuickSight

2023-07-26 · Watch video

Amazon Kinesis Data Analytics is a managed service that can capture streaming data from IoT devices. The Databricks Lakehouse Platform makes it easy to process streaming and batch data using Delta Live Tables. Amazon QuickSight provides advanced visualization capabilities with direct integration with Databricks. Combining these services, customers can capture, process, and visualize data from hundreds or thousands of IoT sensors with ease.

Talk by: Venkat Viswanathan

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Sponsored: dbt Labs | Modernizing the Data Stack: Lessons Learned From Evolution at Zurich Insurance

2023-07-26 · Watch video
Jose L Sanchez Ros (Zurich Insurance), Gerard Sola (Zurich Insurance)

In this session, we will explore the path Zurich Insurance took to modernize its data stack and data engineering practices, and the lessons learned along the way. We'll touch on how and why the team chose to:

  • Adopt community standards in code quality, code coverage, code reusability, and CI/CD
  • Rebuild the way data engineering collaborates with business teams
  • Explore data tools accessible to non-engineering users, with considerations for code-first and no-code interfaces
  • Structure our dbt project and orchestration — and the factors that played into our decisions

Talk by: Jose L Sanchez Ros and Gerard Sola

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Sponsored: Matillion - OurFamilyWizard Moves and Transforms Data for Databricks Delta Lake Easy

2023-07-26 · Watch video
Beth Mattson (OurFamilyWizard), Jamie Baker (Matillion)

OurFamilyWizard helps families living separately thrive, empowering parents with needed tools after divorce or separation. Migrating to a modern data stack built on a Databricks Delta Lake seemed like the obvious choice for OurFamilyWizard to start integrating 20 years of on-prem Oracle data with event tracking and SaaS cloud data, but they needed tools to do it. OurFamilyWizard turned to Matillion, a powerful and intuitive solution, to quickly load, combine, and transform source data into reporting tables and data marts, and empower them to turn raw data into information the organization can use to make decisions.

In this session, Beth Mattson, OurFamilyWizard Senior Data Engineer, will detail how Matillion helped OurFamilyWizard migrate their data to Databricks fast and provided end-to-end ETL capabilities. In addition, Jamie Baker, Matillion Director of Product Management, will give a brief demo and discuss the Matillion and Databricks partnership and what is on the horizon.

Talk by: Jamie Baker and Beth Mattson

Streaming Data Analytics with Power BI and Databricks

2023-07-26 · Watch video

This session comprises a series of end-to-end technical demos illustrating the synergy between Databricks and Power BI for streaming use cases, with considerations around when to choose which scenario:

Scenario 1: DLT + Power BI Direct Query and Auto Refresh

Scenario 2: Structured Streaming + Power BI streaming datasets (a push-pattern sketch follows this list)

Scenario 3: DLT + Power BI composite datasets
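A minimal sketch of Scenario 2's push pattern, forwarding micro-batches from a Structured Streaming query to a Power BI streaming dataset's REST push endpoint; the push URL placeholders are issued by Power BI, and metrics_stream stands in for an aggregated streaming DataFrame.

```python
# Hedged sketch of Scenario 2: push micro-batches to a Power BI
# streaming dataset. The URL below is a placeholder; Power BI issues
# the real push URL (with its API key) when you create the dataset.
import requests

PUSH_URL = "https://api.powerbi.com/beta/<workspace>/datasets/<id>/rows?key=<key>"

def push_to_powerbi(batch_df, batch_id):
    # Power BI's push API caps each request, so bound the batch size;
    # row values must be JSON-serializable.
    rows = [r.asDict() for r in batch_df.limit(10000).collect()]
    if rows:
        requests.post(PUSH_URL, json=rows, timeout=30).raise_for_status()

# metrics_stream: an aggregated streaming DataFrame defined elsewhere.
(metrics_stream.writeStream
 .foreachBatch(push_to_powerbi)
 .outputMode("update")
 .start())
```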

Talk by: Liping Huang and Marius Panga
