Databricks

Lineage System Table in Unity Catalog

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Menglei Sun

API Data Governance Data Lakehouse Delta Python Scala SQL

Unity Catalog provides fully automated data lineage for all workloads in SQL, R, Python, Scala and across all asset types at Databricks. The aggregated view has been available to end users through data explorer and API. In this session, we are excited to share that lineage is available via delta table in their UC metastore. It stores full history of recent lineage records and it is near real time. Additionally, customers can query it through standard SQL interface. With that, customers can get significant operational insights about their workload for impact analysis, troubleshooting, quality assurance, data discovery, and data governance.

Together with the system table platform effort, which provides query history, job run operational data, audit logs and more, lineage table will be a critical piece to link all the data asset and entity asset together, providing better lakehouse observability and unification to customers.

Talk by: Menglei Sun

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Processing Prescriptions at Scale at Walgreens

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Daniel Zafar

API Cosmos Data Engineering Data Lakehouse Microsoft Spark Data Streaming

We designed a scalable Spark Streaming job to manage 100s of millions of prescription-related operations per day at an end-to-end SLA of a few minutes and a lookup time of one second using CosmosDB.

In this session, we will share not only the architecture, but the challenges and solutions to using the Spark Cosmos connector at scale. We will discuss usages of the Aggregator API, custom implementations of the CosmosDB connector, and the major roadblocks we encountered with the solutions we engineered. In addition, we collaborated closely with Cosmos development team at Microsoft and will share the new features which resulted. If you ever plan to use Spark with Cosmos, you won't want to miss these gotchas!

Talk by: Daniel Zafar

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Rapidly Scaling Applied AI/ML with Foundational Models and Applying Them to Modern AI/ML Use Cases

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Nick King (Snowplow)

AI/ML LLM

Today many of us are familiar with foundational models such as LLM/ChatGPT. However, there are many more enterprise foundational models that can be rapidly deployed, trained and applied to enterprise use cases. This approach dramatically increases the performance of AI/ML models in production, but also gives AI teams rapid roadmaps for efficiency and delivering value to the business. Databricks provides the ideal toolset to enable this approach.

In this session, we will provide a logically overview of foundational models available today, demonstrate a real-world use case, and provide a business framework for data scientists and business leaders to collaborate to rapidly deploy these use cases.

Talk by: Nick King

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksin

Real-Time Streaming Solution for Call Center Analytics: Business Challenges and Technical Enablement

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Natalia Demidova

Analytics Azure BI Cloud Computing Data Lake Data Lakehouse Power BI Data Streaming

A large international client with a business footprint in North America, Europe and Africa reached out to us with an interest in having a real-time streaming solution designed and implemented for its call center handling incoming and outgoing client calls. The client had a previous bad experience with another vendor, who overpromised and underdelivered on the latency of the streaming solution. The previous vendor delivered an over-complex streaming data pipeline resulting in the data taking over five minutes to reach a visualization layer. The client felt that architecture was too complex and involved many services integrated together.

Our immediate challenges involved gaining the client's trust and proving that our design and implementation quality would supersede a previous experience. To resolve an immediate challenge of the overly complicated pipeline design, we deployed a Databricks Lakehouse architecture with Azure Databricks at the center of the solution. Our reference architecture integrated Genesys Cloud : App Services : Event Hub : Databricks : : Data Lake : Power BI.

The streaming solution proved to be low latency (seconds) during the POV stage, which led to subsequent productionalization of the pipeline with deployment of jobs, DLTs pipeline, including multi-notebook workflow and business and performance metrics dashboarding relied on by the call center staff for a day-to-day performance monitoring and improvements.

Talk by: Natalia Demidova

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: Accenture | Factory of the Future: Building Digital Twins Using Knowledge Graphs & Gen AI

Streaming Data Analytics with Power BI and Databricks

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Liping Huang , Marius Panga

Analytics BI Data Analytics Power BI Data Streaming

This session is comprised of a series of end-to-end technical demos illustrating the synergy between Databricks and Power BI for streaming use cases, and considerations around when to choose which scenario:

Scenario 1: DLT + Power BI Direct Query and Auto Refresh

Scenario 2: Structured Streaming + Power BI streaming datasets

Scenario 3: DLT + Power BI composite datasets

Talk by: Liping Huang and Marius Panga

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Testing Generative AI Models: What You Need to Know

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Yaron Singer

AI/ML AWS GenAI LLM MLOps Cyber Security Singer

Generative AI shows incredible promise for enterprise applications. The explosion of generative AI can be attributed to the convergence of several factors. Most significant is that the barrier to entry has dropped for AI application developers through customizable prompts (few-shot learning), enabling laypeople to generate high-quality content. The flexibility of models like ChatGPT and DALLE-2 have sparked curiosity and creativity about new applications that they can support. The number of tools will continue to grow in a manner similar to how AWS fueled app development. But excitement must be tampered by concerns about new risks imposed to business and society. Increased capability and adoption also increase risk exposure. As organizations explore creative boundaries of generative models, measures to reduce risk must be put in place. However, the enormous size of the input space and inherent complexity make this task more challenging than traditional ML models.

In this session, we summarize the new risks introduced by the new class of generative foundation models through several examples, and compare how these risks relate to the risks of mainstream discriminative models. Steps can be taken to reduce the operational risk, bias and fairness issues, and privacy and security of systems that leverage LLM for automation. We’ll explore model hallucinations, output evaluation, output bias, prompt injection, data leakage, stochasticity, and more. We’ll discuss some of the larger issues common to LLMs and show how to test for them. A comprehensive, test-based approach to generative AI development will help instill model integrity by proactively mitigating failure and the associated business risk.

Talk by: Yaron Singer

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Unleashing the Magic of Large Language Modeling with Dolly 2.0

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Gavita Regunath

AI/ML LLM MLOps

As the field of artificial intelligence continues to advance at an unprecedented pace, LLMs are becoming increasingly powerful and transformative. LLMs use deep learning techniques to analyze vast amounts of text data, and can generate language that is like human language. These models have been used for a wide range of applications, including language translation, chatbots, text summarization, and more.

Dolly 2.0 is the first open-source, instruction-following LLM that has been fine-tuned on a human-generated instruction dataset – with zero chance of copyright implications. This makes it an ideal tool for research and commercial use, and opens up new possibilities for businesses looking to streamline their operations and enhance their customer service offerings.

In this session, we will provide an overview of Dolly 2.0, discuss its features and capabilities, and showcase its potential through a demo of Dolly in action. Attendees will gain insights into the LLMs, and learn how to maximize the impact of this cutting-edge technology in their organizations. By the end of the session, attendees will have a deep understanding of the capabilities of Dolly 2.0, and will be equipped with the knowledge they need to integrate LLMs into their own operations in order to achieve greater efficiency, productivity, and customer satisfaction.

Talk by: Gavita Regunath

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Weaving the Data Mesh in the Department of Defense

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Brad Corwin , Cody Ferguson

AI/ML Analytics Data Lakehouse

The Chief Digital and AI Office (CDAO) was created to lead the strategy and policy on data, analytics, and AI adoption across the Department of Defense. To enable that vision, the Department must achieve new ways to scale and standardize delivery under a global strategy while enabling decentralized workflows that capture the wealth of data and domain expertise.

CDAO’s strategy and goals are aligned with data mesh principles. This alignment starts with providing enterprise-level infrastructure and services to advance the adoption of data, analytics, and AI, creating the self-service data infrastructure as a platform. And it continues through implementing policy for federated computational governance centered around decentralizing data ownership to become domain-oriented but enforcing the quality and trustworthiness of data. CDAO seeks to expand and make enterprise data more accessible through providing data as a product and leveraging a federated data catalog to designate authoritative data and common data models. This results in domain-oriented, decentralized data ownership to empower the business domains across the Department to increase mission and business impact that result in significant cost savings, saving lives, and data serving as a “public good.”

Please join us in our session as we discuss how the CDAO leverages modern, innovative implementations that accelerate the delivery of data and AI throughout one of the largest distributed organizations in the world; the Department of Defense. We will walk through how this enables delivery in various Department of Defense use cases.

Talk by: Brad Corwin and Cody Ferguson

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Delta Sharing: The Key Data Mesh Enabler

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Francesco Pizzolon

Azure Cloud Computing Delta

Data Mesh is an emerging architecture pattern that challenges the centralized data platform approach by empowering different engineering teams to own the data products in a specific business domain. One of the keys to the success of any Data Mesh initiative is selecting the right protocol for Data Sharing between different business data domains that could potentially be implemented through different technologies and cloud providers.

In this session you will learn about how the Delta Sharing protocol and the Delta table format have enabled the historically stuck-in-the-past energy and construction industry to be catapulted to the 21st century by way of a modern Data Mesh implementation based on Azure Databricks.

Talk by: Francesco Pizzolon

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

How Mars Achieved a People Analytics Transformation with a Modern Data Stack

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Rachel Belino , Sreeharsha Alagani

AI/ML Analytics Data Governance Data Lakehouse Modern Data Stack

People Analytics at Mars was formed two years ago as part of an ambitious journey to transform our HR analytics capabilities. To transform, we needed to build foundational services to provide our associates with helpful insights through fast results and resolving complex problems. Critical in that foundation are data governance and data enablement which is the responsibility of the Mars People Data Office team whose focus is to deliver high quality and reliable data that is reusable for current and future People Analytics use cases. Come learn how this team used Databricks in helping Mars achieve its People Analytics Transformation.

Talk by: Rachel Belino and Sreeharsha Alagani

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Simplifying Migrations to Lakehouse

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Dan Smith

Data Lakehouse

This session will cover:

Challenges with legacy platforms
Perenti Databricks migration journey
Reimagining migrations the Databricks way
The Databricks migration methodology and approach

Talk by: Dan Smith

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Unlocking the Value of Data Sharing in Financial Services with Lakehouse

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Spencer Cook (Databricks)

Data Lakehouse Delta Data Streaming

The emergence of secure data sharing is already having a tremendous economic impact, in large part due to the increasing ease and safety of sharing financial data. McKinsey predicts that the impact of open financial data will be 1-4.5% of GDP globally by 2030. This indicates there is a narrowing window on a massive opportunity for financial institutions and it is critical that they prioritize data sharing. This session will first address the ways in which Delta Sharing and Unity Catalog on a Databricks Lakehouse architecture provides a simple and open framework for building a Secure Data Sharing platform in the financial services industry. Next we will use a Databricks environment to walk through different use cases for open banking data and secure data sharing, demonstrating how they will be implemented using Delta Sharing, Unity Catalog, and other parts of the Lakehouse platform. The use cases will include examples of new product features such as Databricks to Databricks sharing, change data feed and streaming on Delta Sharing, table/column lineage, and the Delta Sharing Excel plugin to demonstrate state of the art sharing capabilities.

In this session, we will discuss secure data sharing on Databricks Lakehouse and will demonstrate architecture and code for common sharing use cases in the finance industry.

Talk by: Spencer Cook

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Feeding the World One Plant at a Time

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Naveed Farooqui , Fahad Khan (Volt Active Data)

AI/ML CI/CD Data Management Delta LLM MLOps

Join this session to learn how the CVML and Data Platform team at BlueRiver Technology utilized Databricks to maximize savings on herbicide usage and revolutionize Precision Agriculture.

Blue River Technology is an agricultural technology company that uses computer vision and machine learning (CVML) to revolutionize the way crops are grown and harvested. BRT’s See & Spray technology, which uses CVML to identify and precisely determine whether the plant is a weed or a crop so it can deliver a small, targeted dose of herbicide directly to the plant, while leaving the crop unharmed. By using this approach, Blue River significantly reduces the amount of herbicides used in agriculture by over 70% and has a positive impact on the environment and human health.

The technical challenges we seek to overcome are: - Processing massive petabytes of proprietary data at scale and in real time. Equipment in the field can generate up to 40TBs of data per hour per machine. - Aggregating, curating and visualizing at scale data can often be convoluted, error-prone and complex. - Streamlining pipelines runs from weeks to hours to ensure continuous delivery of data. - Abstracting and automating the infra, deployment and data management from each program. - Building downstream data products based on descriptive analysis, predictive analysis or prescriptive analysis to drive the machine behavior.

The business questions we seek to answer for any machine are: - Are we getting the spray savings we anticipated? - Are we reducing the use of herbicide at the scale we expected? - Are spraying nozzles performing at the expected rate? - Finding the relevant data to troubleshoot new edge conditions. - Providing a simple interface for data exploration to both technical and non-technical personas to help improve our model. - Identifying repetitive and new faults in our machines. - Filtering out data based on certain incidents. - Identifying anomalies for e.g. sudden drop in spray saving, like frequency of broad spray suddenly is too high.

How we are addressing and plan to address these challenges: - Designating Databricks as our purposeful DB for all data - using the bronze, silver and gold layer standards. - Processing new machine logs using a Delta Live table as a source both in batch and incremental manner. - Democratize access for data scientists, product managers, data engineers who are not proficient with the robotic software stack via notebooks for quick development as well as real time dashboards.

Talk by: Fahad Khan and Naveed Farooqui

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksin

Sponsored: Kyvos | Analytics 100x Faster Lowest Cost w/ Kyvos & Databricks, Even on Trillions Rows

Activate Your Lakehouse with Unity Catalog

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Anup Segu (YipitData)

Analytics AWS Data Lakehouse Delta ETL/ELT Cyber Security

Building a lakehouse is straightforward today thanks to many open source technologies and Databricks. However, it can be taxing to extract value from lakehouses as they grow without robust data operations. Join us to learn how YipitData uses the Unity Catalog to streamline data operations and discover best practices to scale your own Lakehouse. At YipitData, our 15+ petabyte Lakehouse is a self-service data platform built with Databricks and AWS, supporting analytics for a data team of over 250. We will share how leveraging Unity Catalog accelerates our mission to help financial institutions and corporations leverage alternative data by:

Enabling clients to universally access our data through a spectrum of channels, including Sigma, Delta Sharing, and multiple clouds
Fostering collaboration across internal teams using a data mesh paradigm that yields rich insights
Strengthening the integrity and security of data assets through ACLs, data lineage, audit logs, and further isolation of AWS resources
Reducing the cost of large tables without downtime through automated data expiration and ETL optimizations on managed delta tables

Through our migration to Unity Catalog, we have gained tactics and philosophies to seamlessly flow our data assets internally and externally. Data platforms need to be value-generating, secure, and cost-effective in today's world. We are excited to share how Unity Catalog delivers on this and helps you get the most out of your lakehouse.

Talk by: Anup Segu

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

talk-data.com

Activity Trend

Top Events

Top Speakers

Lineage System Table in Unity Catalog

Processing Prescriptions at Scale at Walgreens

Rapidly Scaling Applied AI/ML with Foundational Models and Applying Them to Modern AI/ML Use Cases

Real-Time Streaming Solution for Call Center Analytics: Business Challenges and Technical Enablement

Sponsored: Accenture | Factory of the Future: Building Digital Twins Using Knowledge Graphs & Gen AI

Sponsored: Anomalo | Data Archaeology: Quickly Understand Unfamiliar Datasets Using Machine Learning

Sponsored: AWS-Real Time Stream Data & Vis Using Databricks DLT, Amazon Kinesis, & Amazon QuickSight

Sponsored: dbt Labs | Modernizing the Data Stack: Lessons Learned From Evolution at Zurich Insurance

Sponsored: Matillion - OurFamilyWizard Moves and Transforms Data for Databricks Delta Lake Easy

Streaming Data Analytics with Power BI and Databricks

Testing Generative AI Models: What You Need to Know

Unleashing the Magic of Large Language Modeling with Dolly 2.0

Weaving the Data Mesh in the Department of Defense

Delta Sharing: The Key Data Mesh Enabler

How Mars Achieved a People Analytics Transformation with a Modern Data Stack

Simplifying Migrations to Lakehouse

Unlocking the Value of Data Sharing in Financial Services with Lakehouse

Feeding the World One Plant at a Time

Sponsored: Kyvos | Analytics 100x Faster Lowest Cost w/ Kyvos & Databricks, Even on Trillions Rows

Activate Your Lakehouse with Unity Catalog