talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 YouTube

Activities tracked: 287

Filtering by: AI/ML

Sessions & talks

Showing 26–50 of 287 · Newest first

Data + AI Summit 2024 - Keynote Day 2 - Full

2024-06-14
Bilal Aslam (Databricks), Yejin Choi (University of Washington; AI2), Darshana Sivakumar (Databricks), Ryan Blue (Tabular), Zeashan Pappa (Databricks), Ali Ghodsi (Databricks), Reynold Xin (Databricks), Matei Zaharia (Databricks), Hannes Mühleisen (DuckDB Labs), Alexander Booth (Texas Rangers Baseball Club), Tareef Kawaf (Posit Software, PBC)

Speakers:

  • Alexander Booth, Asst. Director of Research & Development, Texas Rangers
  • Ali Ghodsi, Co-Founder and CEO, Databricks
  • Bilal Aslam, Sr. Director of Product Management, Databricks
  • Darshana Sivakumar, Staff Product Manager, Databricks
  • Hannes Mühleisen, Creator of DuckDB, DuckDB Labs
  • Matei Zaharia, Chief Technology Officer and Co-Founder, Databricks
  • Reynold Xin, Chief Architect and Co-Founder, Databricks
  • Ryan Blue, CEO, Tabular
  • Tareef Kawaf, President, Posit Software, PBC
  • Yejin Choi, Sr. Research Director, Commonsense AI, AI2, University of Washington
  • Zeashan Pappa, Staff Product Manager, Databricks

About Databricks: Databricks is the Data and AI company. More than 10,000 organizations worldwide, including Block, Comcast, Conde Nast, Rivian, and Shell, and over 60% of the Fortune 500, rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow.

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data… Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Data + AI Summit Keynote Day 1 - Full

2024-06-14
Patrick Wendell (Databricks), Fei-Fei Li (Stanford University), Brian Ames (General Motors), Ken Wong (Databricks), Ali Ghodsi (Databricks), Jackie Brosamer (Block), Reynold Xin (Databricks), Jensen Huang (NVIDIA)

Databricks Data + AI Summit 2024 Keynote Day 1

Experts, researchers and open source contributors from Databricks and across the data and AI community gathered in San Francisco on June 10–13, 2024, to discuss the latest technologies in data management, data warehousing, data governance, generative AI for the enterprise, and data in the era of AI.

Hear from Databricks Co-founder and CEO Ali Ghodsi on building generative AI applications, putting your data to work, and how data + AI leads to data intelligence.

Plus a fireside chat between Ali Ghodsi and NVIDIA Co-founder and CEO Jensen Huang on the expanded partnership between NVIDIA and Databricks to accelerate enterprise data for the era of generative AI.

Product announcements in the video include:

  • Databricks Data Intelligence Platform
  • Native support for NVIDIA GPU acceleration on the Databricks Data Intelligence Platform
  • Databricks open source model DBRX available as an NVIDIA NIM microservice
  • Shutterstock Image AI powered by Databricks
  • Databricks AI/BI
  • Databricks LakeFlow
  • Databricks Mosaic AI
  • Mosaic AI Agent Framework
  • Mosaic AI Agent Evaluation
  • Mosaic AI Tools Catalog
  • Mosaic AI Model Training
  • Mosaic AI Gateway

In this keynote, hear from:

  • Ali Ghodsi, Co-founder and CEO, Databricks (1:45)
  • Brian Ames, General Motors (29:55)
  • Patrick Wendell, Co-founder and VP of Engineering, Databricks (38:00)
  • Jackie Brosamer, Head of AI, Data and Analytics, Block (1:14:42)
  • Fei-Fei Li, Professor, Stanford University, and Denning Co-Director, Stanford Institute for Human-Centered AI (1:23:15)
  • Jensen Huang, Co-founder and CEO, NVIDIA, with Ali Ghodsi, Co-founder and CEO, Databricks (1:42:27)
  • Reynold Xin, Co-founder and Chief Architect, Databricks (2:07:43)
  • Ken Wong, Senior Director, Product Management, Databricks (2:31:15)
  • Ali Ghodsi, Co-founder and CEO, Databricks (2:48:16)

Data + AI Summit Keynote Day 1 - Ali Ghodsi, Co-founder and CEO of Databricks

2024-06-12
Ali Ghodsi (Databricks)

Ali Ghodsi spoke to an audience of thousands at the Data + AI Summit keynote in San Francisco on the three biggest challenges in enterprise GenAI today and how the Databricks Data Intelligence platform is helping organizations solve them.

Data Warehousing using Fivetran, dbt and DBSQL

2023-08-03

In this video, you will learn how to use Fivetran to ingest data from Salesforce into your Lakehouse. After the data has been ingested, you will learn how to transform it using dbt. You will then use Databricks SQL to query, visualize and govern your data. Lastly, we will show you how to use AI functions in Databricks SQL to call large language models.

Read more about Databricks SQL https://docs.databricks.com/en/sql/index.html#what-is-databricks-sql

Distributing Data Governance: How Unity Catalog Allows for a Collaborative Approach

2023-08-01

As one of the world’s largest providers of content delivery network (CDN) and security solutions, Akamai owns thousands of data assets of various shapes and sizes, some of them reaching multiple petabytes. Several departments within the company use Databricks for their data and AI workloads, so we have more than a hundred Databricks workspaces within a single Databricks account. Some assets are shared across products, while others are product-specific.

In this presentation, we will describe how to use the capabilities of Unity Catalog to distribute the administration burden between departments, while still maintaining a unified governance model.

We will also share the benefits we’ve found in using Unity Catalog, beyond just access management, such as:

  • Visibility into which data assets we have in the organization
  • Ability to identify and potentially eliminate duplicate data workloads between departments
  • Removing boilerplate code for accessing external sources
  • Increasing innovation of product teams by exposing the data assets in a better, more efficient way
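The delegation model described above can be sketched in a few lines. This is a minimal illustration, not Akamai's actual setup: it assumes one Unity Catalog catalog per department, owned by that department's admin group, plus a centrally owned `shared` catalog whose `reference` schema every department can read. All catalog, schema, and group names are hypothetical, and depending on your metastore configuration `CREATE CATALOG` may also need a `MANAGED LOCATION`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Department:
    catalog: str       # hypothetical catalog owned by the department
    admin_group: str   # hypothetical account-level group that administers it
    reader_group: str  # hypothetical group allowed to read shared reference data


def quote(principal: str) -> str:
    # Principal names with hyphens or spaces must be backtick-quoted in Databricks SQL.
    return f"`{principal}`"


def delegation_statements(departments, shared_catalog="shared", shared_schema="reference"):
    """Return SQL that hands each department its own catalog plus read access to shared data."""
    statements = []
    schema = f"{shared_catalog}.{shared_schema}"
    for d in departments:
        # Ownership lets the department's admins manage grants inside their own catalog
        # without going through the central platform team.
        statements.append(f"CREATE CATALOG IF NOT EXISTS {d.catalog}")
        statements.append(f"ALTER CATALOG {d.catalog} OWNER TO {quote(d.admin_group)}")
        # The shared catalog stays centrally owned; departments only get read access.
        statements.append(f"GRANT USE CATALOG ON CATALOG {shared_catalog} TO {quote(d.reader_group)}")
        statements.append(f"GRANT USE SCHEMA ON SCHEMA {schema} TO {quote(d.reader_group)}")
        statements.append(f"GRANT SELECT ON SCHEMA {schema} TO {quote(d.reader_group)}")
    return statements


if __name__ == "__main__":
    depts = [Department("sales", "sales-admins", "sales-analysts"),
             Department("security", "security-admins", "security-analysts")]
    for stmt in delegation_statements(depts):
        print(stmt + ";")
```

Because ownership sits with each department's group, the central team only has to manage the shared catalog and the account-level groups.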

Talk by: Gilad Asulin and Pulkit Chadha

Cross-Platform Data Lineage with OpenLineage

2023-07-28
Willy Lulciuc, Julien Le Dem (Astronomer)

There are more data tools available than ever before, and it is easier to build a pipeline than it has ever been. These tools and advancements have created an explosion of innovation, and as a result, data within today's organizations has become increasingly distributed: it can no longer be contained within a single brain, a single team, or a single platform. Data lineage can help by tracing the relationships between datasets and providing a map of your entire data universe.

OpenLineage provides a standard for lineage collection that spans multiple platforms, including Apache Airflow, Apache Spark™, Flink®, and dbt. This empowers teams to diagnose and address widespread data quality and efficiency issues in real time. In this session, we will show how to trace data lineage across Apache Spark and Apache Airflow. There will be a walk-through of the OpenLineage architecture and a live demo of a running pipeline with real-time data lineage.
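To make the architecture a little more concrete, here is a rough sketch of what lineage collection produces: each job run emits an OpenLineage `RunEvent` (JSON) naming the job and its input and output datasets, and a consumer stitches those events into a graph it can walk upstream. This is a hand-rolled illustration, not the OpenLineage client library; the namespaces, job names, and producer URI are hypothetical, and the `schemaURL` version string is an assumption to check against the current spec.

```python
import uuid
from datetime import datetime, timezone

# Assumed spec version; verify against the current OpenLineage spec before relying on it.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
PRODUCER = "https://example.com/hypothetical-producer"  # hypothetical producer URI


def run_event(event_type, job_namespace, job_name, inputs, outputs):
    """Build an OpenLineage-style RunEvent; inputs/outputs are (namespace, name) pairs."""
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def upstream_datasets(events, target):
    """Return every dataset that `target` transitively depends on."""
    parents = {}
    for e in events:
        if e["eventType"] != "COMPLETE":  # only count runs that finished
            continue
        sources = {(d["namespace"], d["name"]) for d in e["inputs"]}
        for d in e["outputs"]:
            parents.setdefault((d["namespace"], d["name"]), set()).update(sources)
    seen, stack = set(), [target]
    while stack:  # depth-first walk up the lineage graph
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # An Airflow task copies orders into the lake; a Spark job then aggregates them.
    events = [
        run_event("COMPLETE", "airflow", "load_orders",
                  [("postgres://crm", "public.orders")], [("s3://lake", "raw/orders")]),
        run_event("COMPLETE", "spark", "daily_revenue",
                  [("s3://lake", "raw/orders")], [("s3://lake", "analytics/daily_revenue")]),
    ]
    print(upstream_datasets(events, ("s3://lake", "analytics/daily_revenue")))
```

The point of a shared event format is that the Airflow and Spark events above can come from different integrations and still join on dataset namespace and name.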

Talk by: Julien Le Dem and Willy Lulciuc

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Sponsored: EY | Business Value Unleashed: Real-World Accelerating AI & Data-Centric Transformation

2023-07-28

Data and AI are revolutionizing industries and transforming businesses at an unprecedented pace. These advancements pave the way for groundbreaking outcomes such as fresh revenue streams, optimized working capital, and captivating, personalized customer experiences.

Join Hugh Burgin, Luke Pritchard and Dan Diasio as we explore a range of real-world examples of AI and data-driven transformation opportunities powered by Databricks, including the business value realized and the technical solutions implemented. We will focus on how to integrate and leverage business insights, a diverse network of cloud-based solutions, and Databricks to unlock new business value. Drawing on real-world use cases, we will discuss:

  • Examples of how Manufacturing, Retail, Financial Services and other sectors are using Databricks services to scale AI, gain insights that matter and secure their data
  • The ways data monetization is changing how companies view data and incentivizing better data management
  • Examples of Generative AI and LLMs changing how businesses operate, how their customers engage, and what you can do about it

Talk by: Hugh Burgin and Luke Pritchard

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

AI Regulation is Coming: The EU AI Act and How Databricks Can Help with Compliance

2023-07-27
Matteo Quattrocchi (BSA | The Software Alliance), Scott Starbird (Databricks)

With the heightened attention on LLMs and what they can do, and the widening impact of AI on day-to-day life, regulators across the globe are intensifying their push to regulate AI. As with GDPR in the privacy realm, the EU is leading the way with the EU Artificial Intelligence Act (AIA). Regulators everywhere will look to the AIA as a precedent, so understanding the requirements it imposes is important for all players in the AI value chain. Although the AIA is not finalized, the basic framework of how it will work is becoming clearer, and the impact on developers and deployers of AI (‘providers’ and ‘users’ under the AIA) will be substantial. The AIA will probably not go into effect until early 2025, but AI applications developed today will likely be affected, so design and development decisions made now should take the future regulations into account.

In this session, Matteo Quattrocchi, Brussels-based Director, Policy – EMEA, for BSA | The Software Alliance (the leading advocacy organization representing the enterprise software sector), will present an overview of the current proposed requirements under the AIA and give an update on the ongoing deliberations and the likely timing of enactment. We will also highlight some of the ways the Lakehouse platform, including Managed MLflow, can help providers and users of ML-based applications meet the requirements of the AIA and other upcoming AI regulations.

Talk by: Matteo Quattrocchi and Scott Starbird

Data Architecture: IQVIA's Migration to Databricks Lakehouse for High-Performance Analytics

2023-07-27

As the healthcare and life sciences (HLS) industry has grown and evolved, a need has emerged for scalable, cost-effective ETL solutions capable of processing billions of records at terabyte scale. IQVIA has the largest global healthcare data network in the world, with more than one million data sources providing access to 1.2B non-identified patient records and 100 billion healthcare records processed annually in more than 100 countries. IQVIA’s ability to combine, centralize, and integrate various sources of HLS data enables clinical-to-commercial operational intelligence and omnichannel analytics for its clients. Databricks Lakehouse allows IQVIA to onboard its rapidly growing number of clients while delivering strong business value to customers, cost-efficiently and at scale.

During this session, you will learn how IQVIA is leveraging Databricks Lakehouse, as well as how HLS organizations will soon be able to access IQVIA data assets through the Databricks Marketplace for quick and secure data sharing.

Talk by: Venkat Dasari and William Zanine

Data Caching Strategies for Data Analytics and AI

2023-07-27

The increasing popularity of data analytics and artificial intelligence (AI) has led to a dramatic increase in the volume of data used in these fields, creating a growing need for more computational capability. Caching plays a crucial role in accelerating data and AI computations, but these domains have different data access patterns, so they require different cache strategies. In this session, you will see our observations on data access patterns in the analytical SQL and AI training domains, based on practical experience with large-scale systems. We will discuss evaluation results for various caching strategies for analytical SQL and AI, and provide caching recommendations for different use cases. Over the years, we have learned some best practices from big internet companies about the following aspects of our journey:

  1. Traffic pattern for analytical SQL and cache strategy recommendation
  2. Traffic pattern for AI training and how we can measure the cache efficiency for different AI training processes
  3. Cache capacity planning based on real-time metrics of the working set
  4. Adaptive caching admission and eviction for uncertain traffic patterns
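As an illustration of admission and eviction working together, the sketch below combines LRU eviction with a simple admission rule: a key enters the cache only on its second request, so one-off scans (common in analytical SQL) cannot flush hot entries. This is a generic textbook technique chosen for illustration, not necessarily the strategy the speakers recommend; the `admit_after` threshold and the `load` callback standing in for remote storage are assumptions. It also tracks the hit ratio, a basic measure of cache efficiency.

```python
from collections import OrderedDict


class AdmissionLRUCache:
    """LRU cache that admits a key only after `admit_after` requests for it."""

    def __init__(self, capacity, admit_after=2):
        self.capacity = capacity
        self.admit_after = admit_after
        self.data = OrderedDict()  # cached entries, least recently used first
        self.seen = {}             # request counts for not-yet-admitted keys (unbounded here)
        self.hits = 0
        self.requests = 0

    def get(self, key, load):
        self.requests += 1
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        value = load(key)  # miss: fetch from the slower backing store
        count = self.seen.get(key, 0) + 1
        if count >= self.admit_after:
            self.seen.pop(key, None)
            self.data[key] = value
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)  # evict the least recently used entry
        else:
            self.seen[key] = count
        return value

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0
```

A real cache would bound `seen` (for example with a probabilistic sketch) and tune `admit_after` from observed traffic; the fixed threshold here is only the simplest possible admission policy.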

Talk by: Chunxu Tang and Beinan Wang

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Democratize AI & ML in a Large Company: The Importance of User Enablement & Technical Training

2023-07-27

The most critical success factor in a cloud transformation is people. Having a change management process in place to manage the impact of the transformation, along with user enablement, is therefore foundational to any large program. In this session, we will dive into how TD Bank democratizes data and mobilizes a community of more than 2,000 analytics users, and the tactics we used to successfully enable new use cases on the cloud. The session will focus on the following:

To democratize data:

  • Centralize a data platform that is accessible to all employees and allows easy data sharing
  • Implement privacy and security controls to protect data and use it ethically
  • Apply compliance and governance so that data is used in a responsible and compliant way
  • Simplify processes and procedures to reduce redundancy and speed up adoption

To mobilize end users:

  • Increase data literacy: provide training and resources that help employees build their abilities and skills
  • Foster a culture of collaboration and openness: form cross-functional teams to collaborate and share ideas
  • Encourage exploration of innovative ideas that impact the organization's values and customers

Technical enablement and adoption tactics we've used at TD Bank:

  1. Hands-on training for more than 1,300 analytics users, with an emphasis on learning by doing so that lessons relate to real-life situations
  2. Online tutorials and documentation for self-paced study
  3. Workshops and office hours on specific topics to empower business users
  4. Coaching teams on a specific use case or complex issue and providing recommendations for faster, more cost-effective solutions
  5. Offering certification and encouraging continuous education so employees keep up to date with the latest developments
  6. A feedback loop: gathering user feedback on training and user experience to improve future trainings

Talk by: Ellie Hajarian

Five Things You Didn't Know You Could Do with Databricks Workflows

2023-07-27
Prashanth Babu (Databricks)

Databricks Workflows has come a long way since the early days of orchestrating simple notebooks and JAR/wheel files. Now you can orchestrate multi-task jobs and chain tasks into a DAG with lineage, using fan-in, fan-out, and many other patterns, or even run one Databricks job directly inside another job.

Databricks Workflows takes its tagline, “orchestrate anything anywhere,” seriously. It is a fully managed, cloud-native orchestrator for diverse workloads such as Delta Live Tables, SQL, notebooks, JARs, Python wheels, dbt, Apache Spark™, and ML pipelines, with strong monitoring, alerting, and observability capabilities. In short, it aims to be a one-stop product for all the orchestration needs of an efficient lakehouse. It also lets you run your jobs in a cloud-agnostic way and is available across AWS, Azure, and GCP.

In this session, we will deep dive into some of these features and showcase end-to-end demos that will help you take full advantage of Databricks Workflows for orchestrating the lakehouse.
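A multi-task job is a DAG of tasks. The sketch below writes one as a Python dict shaped like a Databricks Jobs API 2.1 payload, with a fan-out from `ingest`, a fan-in into `report`, and a `run_job_task` that triggers another existing job, then checks locally that the dependencies form a valid DAG and derives one valid execution order with the standard library's `graphlib`. Task bodies (notebook, SQL, dbt, and so on) are omitted, and the job name, task keys, and `job_id` are hypothetical.

```python
from graphlib import TopologicalSorter

# Shaped like a Databricks Jobs API 2.1 payload; task bodies omitted, all names hypothetical.
job_spec = {
    "name": "nightly-lakehouse",
    "tasks": [
        {"task_key": "ingest"},
        {"task_key": "clean_orders", "depends_on": [{"task_key": "ingest"}]},     # fan-out
        {"task_key": "clean_customers", "depends_on": [{"task_key": "ingest"}]},  # fan-out
        {"task_key": "report",                                                    # fan-in
         "depends_on": [{"task_key": "clean_orders"}, {"task_key": "clean_customers"}]},
        {"task_key": "retrain_model",  # runs another existing job (hypothetical job_id)
         "depends_on": [{"task_key": "report"}],
         "run_job_task": {"job_id": 123}},
    ],
}


def execution_order(spec):
    """Return one valid run order; raises graphlib.CycleError if the tasks form a cycle."""
    graph = {t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
             for t in spec["tasks"]}
    keys = set(graph)
    for task, deps in graph.items():
        missing = deps - keys
        if missing:  # a dependency on a task that is not defined in this job
            raise ValueError(f"{task} depends on unknown tasks: {sorted(missing)}")
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(job_spec))
```

In the managed service the scheduler can run `clean_orders` and `clean_customers` in parallel; a topological order is just one serialization of the same constraints.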

Talk by: Prashanth Babu

Improving Hospital Operations with Streaming Data and Real Time AI/ML

2023-07-27

Over the past two years, Providence has developed a robust streaming data platform (SDP) using Databricks on Azure. The SDP enables us to ingest and process real-time data reflecting clinical operations across our 52 hospitals and roughly 1,000 ambulatory clinics. The HL7 messages generated by Epic are parsed using Databricks in our secure cloud environment and used to generate an up-to-the-minute picture of exactly what is happening at the point of care.

We are already using this information to minimize hospital overcrowding and have been actively integrating AI/ML to accurately forecast future conditions (e.g., arrivals, length of stay, acuity, and discharge requirements). This allows us both to improve resource utilization (e.g., nurse staffing levels) and to optimize patient throughput. The result is both improved patient care and improved operational efficiency.

In this session, we will share how these outcomes are only possible with the power and elegance afforded by our investments in Azure, Databricks, and increasingly Lakehouse. We will demonstrate Providence's blueprint for enabling real-time analytics which can be generalized to other healthcare providers.
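For readers unfamiliar with HL7 v2, the sketch below shows the basic idea behind turning ADT messages into a live census: segments are separated by carriage returns and fields by `|`, the trigger event in MSH-9 distinguishes admits (`A01`) from discharges (`A03`), and PV1-3 carries the assigned patient location. This is a simplified illustration, not Providence's parser; it ignores transfers, escape sequences, and repeated segments, and the sample messages are synthetic. A production pipeline would typically use a dedicated HL7 library inside a streaming job.

```python
from collections import Counter


def parse_hl7(message):
    """Map each segment ID to its fields (first occurrence only; repeats are ignored)."""
    segments = {}
    for raw in message.replace("\n", "\r").split("\r"):
        if raw:
            fields = raw.split("|")
            segments.setdefault(fields[0], fields)
    return segments


def trigger_event(segments):
    # MSH-9 sits at index 8 because MSH-1 is the "|" field separator itself.
    return segments["MSH"][8].split("^")[1]  # "ADT^A01" -> "A01"


def unit(segments):
    # PV1-3 is the assigned patient location; its first component is the point of care.
    return segments["PV1"][3].split("^")[0]


def census(messages):
    """Net admissions minus discharges per unit."""
    counts = Counter()
    for msg in messages:
        seg = parse_hl7(msg)
        event = trigger_event(seg)
        if event == "A01":    # admit
            counts[unit(seg)] += 1
        elif event == "A03":  # discharge
            counts[unit(seg)] -= 1
    return counts


def synthetic_adt(event, location, control_id):
    """Build a tiny synthetic ADT message for testing (not real Epic output)."""
    return "\r".join([
        f"MSH|^~\\&|EPIC|HOSP|RECV|HOSP|202307271200||ADT^{event}|{control_id}|P|2.5",
        "PID|1||12345^^^HOSP^MR||DOE^JANE",
        f"PV1|1|I|{location}^401^A^HOSP",
    ])
```

In a streaming setting the same per-message logic would run on each micro-batch, with the running counts kept as state.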

Talk by: Lindsay Mico and Deylo Woo

Managing Data Encryption in Apache Spark™

2023-07-27

Sensitive data sets can be encrypted directly by new Apache Spark™ versions (3.2 and higher). Setting several configuration parameters and DataFrame options will trigger the Apache Parquet modular encryption mechanism that protects select columns with column-specific keys. The upcoming Spark 3.4 version will also support uniform encryption, where all DataFrame columns are encrypted with the same key.
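As a rough illustration of the configuration involved, the helper below builds the Hadoop properties and DataFrame write options documented for Spark's Parquet columnar encryption, using the mock `InMemoryKMS` class that ships with parquet-mr for testing. The key IDs and column names are hypothetical, and real deployments replace the mock KMS with a client class for their own key management service.

```python
import base64
import os

FACTORY = "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory"
MOCK_KMS = "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS"  # testing only, never production


def mock_key_list(key_ids):
    """Random 128-bit master keys in the 'id:base64' list format read by the mock KMS."""
    return ", ".join(f"{k}:{base64.b64encode(os.urandom(16)).decode()}" for k in key_ids)


def hadoop_conf(key_ids):
    """Hadoop properties that activate Parquet modular encryption."""
    return {
        "parquet.crypto.factory.class": FACTORY,
        "parquet.encryption.kms.client.class": MOCK_KMS,
        "parquet.encryption.key.list": mock_key_list(key_ids),
    }


def write_options(column_keys, footer_key):
    """DataFrame write options; column_keys maps a master key ID to the columns it protects."""
    owner = {}
    for key_id, columns in column_keys.items():
        for col in columns:
            if col in owner:  # each column can be encrypted with only one key
                raise ValueError(f"column {col!r} assigned to {owner[col]!r} and {key_id!r}")
            owner[col] = key_id
    spec = ";".join(f"{k}:{','.join(cols)}" for k, cols in column_keys.items())
    return {
        "parquet.encryption.column.keys": spec,  # e.g. "keyA:ssn,dob;keyB:salary"
        "parquet.encryption.footer.key": footer_key,
    }
```

In PySpark, one common pattern is to set each `hadoop_conf` entry with `spark.sparkContext._jsc.hadoopConfiguration().set(key, value)` (an internal handle, so treat it as an assumption for your Spark version) and to pass the write options with `df.write.options(**opts).parquet(path)`.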

Spark data encryption is already used by a number of companies to protect personal or business-confidential data in their production environments. The main integration effort focuses on key access control and on building Spark/Parquet plug-in code that can interact with the company’s key management service (KMS).

In this session, we will briefly cover the basics of Spark/Parquet encryption usage and dive into the details of encryption key management that will help you integrate this Spark data protection mechanism into your deployment. You will learn how to run a HelloWorld encryption sample and how to extend it into real-world production code integrated with your organization’s KMS and access control policies. We will talk about the standard envelope encryption approach to big data protection, the performance-vs-security trade-offs between single and double envelope wrapping, and internal versus external key metadata storage. We will see a demo and discuss new features such as uniform encryption and two-tier management of encryption keys.

Talk by: Gidon Gershinsky

Multicloud Data Governance on the Databricks Lakehouse

2023-07-27

Across industries, a multicloud setup has quickly become the reality for large organizations. Multicloud introduces new governance challenges: permission models often do not translate from one cloud to another, and when they do, they are not granular enough to meet privacy requirements and the principle of least privilege. This problem can be especially acute for data and AI workloads that share and aggregate large, diverse data sources across business unit boundaries, where governance models must cover assets such as table rows and columns as well as ML features and models.

In this session, we will provide guidelines on how best to overcome these challenges for companies that have adopted the Databricks Lakehouse as their collaborative space for data teams across the organization, by exploiting some of the unique product features of the Databricks platform. We will focus on a common scenario: a data platform team providing data assets to two different ML teams, one using the same cloud and the other one using a different cloud.

We will explain the step-by-step setup of a unified governance model by leveraging the following components and conventions:

  • Unity Catalog for implementing fine-grained access control across all data assets: files in cloud storage, rows and columns in tables and ML features and models
  • The Databricks Terraform provider to automatically enforce guardrails and permissions across clouds
  • Account-level SSO integration and identity federation to centrally administer access across workspaces
  • Delta Sharing to seamlessly propagate changes in provider datasets to consumers in near real time
  • Centralized audit logging for a unified view on what asset was accessed by whom
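The last point, a unified view of who accessed what, amounts to normalizing audit records from each cloud into one schema and indexing them by asset. The sketch below assumes the records have already been exported and mapped into a hypothetical `AccessEvent` shape; it does not use the real Databricks audit log schema, and the principals and table names are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    cloud: str      # e.g. "aws" or "azure"
    principal: str  # user or service principal, federated at the account level
    asset: str      # three-level Unity Catalog name: catalog.schema.table
    action: str     # e.g. "SELECT" or "MODIFY"


def who_accessed(events_by_cloud, actions=None):
    """Merge per-cloud audit streams into one asset -> {(principal, cloud)} view."""
    view = defaultdict(set)
    for events in events_by_cloud.values():
        for e in events:
            if actions is not None and e.action not in actions:
                continue  # keep only the actions the caller asked about
            view[e.asset].add((e.principal, e.cloud))
    return dict(view)
```

Identity federation is what makes this merge meaningful: the same principal name refers to the same person regardless of which cloud produced the record.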

Talk by: Ioannis Papadopoulos and Volker Tjaden

Real-Time ML in Marketplace at Lyft

2023-07-27

Lyft is a ride-sharing company that operates a two-sided marketplace, balancing supply and demand with various levers (passenger pricing, driver incentives, etc.) to keep the system efficient. Lyft has built a real-time optimization platform that helps teams build products faster. This complex system makes real-time decisions using various data sources, machine learning models, and a streaming infrastructure built for low latency, reliability, and scalability. The infrastructure consumes a massive number of events from different sources to make real-time product decisions.

In this session, we will discuss how Lyft organically evolved and scaled the streaming platform that provides a consistent view of the marketplace, so that individual teams can run their optimizations independently. The platform offers both online and offline feature access, which lets teams backtest their models. It provides various other powerful capabilities, such as replaying production ML features in PyNotebook, feature validation, near-real-time model training, and executing multiple layers of models in a DAG. The speaker will elaborate on what helped him scale the systems to process millions of events per minute and power T0 products with tighter latency SLAs.
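A feature such as a regional demand-to-supply ratio shows why sharing one code path between online and offline access matters. The sketch below keeps a sliding-window count of ride requests and available drivers per region; the same class serves the streaming path and a `replay` function that rebuilds the feature from logged events for backtesting. Event kinds, region names, and the window length are hypothetical, events are assumed to arrive in timestamp order, and this is an illustration of the concept, not Lyft's implementation.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts events per (region, kind) over the trailing `window_s` seconds."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.events = deque()           # (timestamp, region, kind) in arrival order
        self.counts = defaultdict(int)

    def add(self, ts, region, kind):
        self.events.append((ts, region, kind))
        self.counts[(region, kind)] += 1
        self._expire(ts)

    def _expire(self, now):
        # Keep only events in the window (now - window_s, now].
        while self.events and self.events[0][0] <= now - self.window_s:
            _, region, kind = self.events.popleft()
            self.counts[(region, kind)] -= 1

    def demand_supply_ratio(self, region):
        # Evaluated as of the most recent event seen.
        demand = self.counts[(region, "request")]
        supply = self.counts[(region, "driver_available")]
        return demand / supply if supply else float("inf")


def replay(events, window_s):
    """Offline path: rebuild the same feature from logged events for backtesting."""
    window = SlidingWindowCounter(window_s)
    for ts, region, kind in sorted(events):
        window.add(ts, region, kind)
    return window
```

Because `replay` drives the exact class used online, a backtest sees the same feature values the production model would have seen, which is the consistency the platform is designed to provide.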

Sponsored by: Labelbox | Unlocking Enterprise AI with Your Proprietary Data and Foundation Models

2023-07-27
Manu Sharma (Labelbox)

We are starting to see a paradigm shift in how AI systems are built across enterprises. In 2023 and beyond, this shift is being propelled by foundation models, which can be seen as the next evolution of using "pre-trained" models and transfer learning. To fully leverage these breakthrough models, we’ve seen a common formula for success: leading AI teams within enterprises must be able to harness their own store of unstructured data and pair it with the right model in order to ship intelligent applications that deliver next-generation experiences to their customers.

In this session, you will learn how to incorporate foundation models into your data and machine learning workflows so that anyone can build AI faster and, in many cases, achieve the business outcome without needing to build AI models at all. You will learn which foundation models can be used to pre-label or enrich data and which data pipeline (data engine) enables this, and you will see real-world use cases of when to incorporate large language models and fine-tuning to improve machine learning models in real time. Discover the power of combining Labelbox and Databricks to streamline data management and model deployment.
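One common way to use foundation-model pre-labels is to route them by confidence: accept high-confidence labels automatically and send the rest to human reviewers, least confident first. The sketch below is a generic illustration, not the Labelbox API; the `PreLabel` record and the 0.9 threshold are hypothetical and would normally be tuned against a labeled validation set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreLabel:
    item_id: str
    label: str
    confidence: float  # model-reported score in [0, 1]


def route(prelabels, accept_at=0.9):
    """Split pre-labels into auto-accepted ones and a human review queue."""
    accepted, review = [], []
    for p in prelabels:
        (accepted if p.confidence >= accept_at else review).append(p)
    review.sort(key=lambda p: p.confidence)  # least confident first
    return accepted, review
```

Raising `accept_at` trades reviewer time for label quality; the right value depends on how well the model's confidence scores are calibrated.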

Talk by: Manu Sharma

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Sponsored: Impetus | Accelerating ADP’s Business Transformation w/ a Modern Enterprise Data Platform

2023-07-27

Learn how ADP’s enterprise data platform is used to drive direct monetization opportunities, differentiate its solutions, and improve operations. ADP is continuously searching for ways to increase innovation velocity, shorten time to market, and improve overall enterprise efficiency. Making data and tools available to teams across the enterprise while reducing data governance risk is the key to making progress on all fronts. Learn about ADP’s enterprise data platform, which created a single source of truth with centralized tools, data assets, and services. It allowed teams to innovate and gain insights by leveraging cross-enterprise data and central machine learning operations.

Explore how ADP accelerated the creation of its data platform on Databricks and AWS, achieved faster business outcomes, and improved overall business operations. The session will also cover how ADP significantly reduced its data governance risk, elevated its brand by amplifying data and insights as a differentiator, increased data monetization, and leveraged data to differentiate its human capital management offerings.

Talk by: Chetan Kalanki and Zaf Babin

Sponsored: Lightup Data | How McDonald's Leveraged Lightup Data Quality

2023-07-27
Manu Bansal, Matt Sandler (McDonald’s)

As one of the world's largest fast-food chains, McDonald's manages massive amounts of data for customers, sales, inventory, marketing, and more. At that scale, ensuring the accuracy, reliability, and quality of all that data comes with a new set of complex challenges. Developing manual data quality checks with legacy tools was too time-consuming and resource-intensive, requiring developer support and data domain expertise. Ultimately, the team struggled to scale its checks across its enterprise data pipelines.

Join our featured customer session, where you’ll hear from Matt Sandler, Senior Director of Data and Analytics at McDonald’s, about how they use the Lightup Deep Data Quality platform to deploy pushdown data quality checks in minutes, not months, without developer support. Moving from reactive to proactive, the McDonald’s data team uses Lightup to scale its data quality checks across petabytes of data, ensuring high-quality data and reliable analytics for its products and services. During the session, you’ll learn:

  • The key challenges of scaling Data Quality checks with legacy tools
  • Why fixing data quality (fast) was critical to launching their new loyalty program and personalized marketing initiatives
  • How quickly McDonald’s ramped up with Lightup, transforming their data quality struggles into success

After the session, you’ll understand:

  • Why McDonald’s phased out their legacy data quality tools
  • The benefits of using pushdown data quality checks, AI-powered anomaly detection, and incident alerts
  • Best practices for scaling data quality checks in your own organization
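The abstract names pushdown checks and anomaly detection without describing Lightup's API. As a minimal sketch of the general idea only, using SQLite as a stand-in for the warehouse and a hypothetical `orders.loyalty_id` column, a pushdown check sends the metric query to the database and compares only the returned aggregate against a threshold. A simple z-score test over past metric values stands in for anomaly detection; it is a plain statistical substitute, not Lightup's AI-powered method.

```python
import sqlite3
import statistics

# Hypothetical check: the null-rate metric is computed by SQL inside the
# database ("pushed down"), so only one aggregate value leaves the store
# instead of every row.
NULL_RATE_SQL = """
SELECT CAST(SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS REAL) / COUNT(*)
FROM {table}
"""


def null_rate_check(conn, table, column, max_null_rate):
    """Run a pushed-down null-rate check; return (null_rate, passed)."""
    # Table and column names are formatted into the SQL directly, so they must
    # come from trusted check configuration, never from user input.
    (rate,) = conn.execute(NULL_RATE_SQL.format(table=table, column=column)).fetchone()
    rate = rate or 0.0  # an empty table yields NULL (SUM is NULL, COUNT(*) is 0)
    return rate, rate <= max_null_rate


def is_anomalous(history, value, z_threshold=3.0):
    """Flag a metric value whose z-score against past runs exceeds the threshold.
    A plain statistical stand-in, not Lightup's AI-based detection."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # needs at least two past values
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > z_threshold


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, loyalty_id TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "a"), (2, None), (3, "c"), (4, "d")])
rate, passed = null_rate_check(conn, "orders", "loyalty_id", max_null_rate=0.1)
print(rate, passed)  # 0.25 False
print(is_anomalous([0.01, 0.02, 0.015, 0.012], rate))  # True
```

The design point is that the database does the scanning: the check ships a small query and receives one number, which is what lets this style of check scale to very large tables.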

Talk by: Matt Sandler and Manu Bansal

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Sponsored: Gathr | Achieve 50x Faster Outcomes From Data at Scale - Using ML-Powered, No-Code Apps

2023-07-27
video

Data engineers love data, and business users need outcomes. How do we cross the chasm? While there is no dearth of data in today’s world, managing and analyzing large datasets can be daunting. Data may also lose its value over time, so it needs to be analyzed and acted upon quickly to accelerate decision-making and realize business outcomes faster.

Take a deep dive into the future of the data economy and learn how to reach value 50 times faster. Hear how United Airlines used Gathr to process massive volumes of complex digital interaction and operational data in real time, creating breakthroughs in operations and customer experience.

The session will feature a live demo showing how enterprises across domains use Gathr’s machine-learning-powered, zero-code applications for ingestion, ETL, ML, XOps, Cloud Cost Control, Business Process Automation, and more to accelerate their journey from data to outcomes.

Talk by: Sameer Bhide and Sarang Bapat

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Learnings From the Field: Migration From Oracle DW and IBM DataStage to Databricks on AWS

2023-07-26
video

Legacy data warehouses are costly to maintain, unscalable and cannot deliver on data science, ML and real-time analytics use cases. Migrating from your enterprise data warehouse to Databricks lets you scale as your business needs grow and accelerate innovation by running all your data, analytics and AI workloads on a single unified data platform.

In the first part of this session, we will guide you through a well-designed process and set of tools that take an EDW migration project from the assessment phase to the actual implementation. We will also address ways to convert proprietary PL/SQL code to open-standard Python code, and to take advantage of PySpark for ETL workloads and Databricks SQL for data analytics workloads.

The second part of this session is based on an EDW migration project at SNCF (the French national railway company), one of Databricks' major enterprise customers in France. Databricks partnered with SNCF to migrate its real estate entity from Oracle DW and IBM DataStage to Databricks on AWS. We will walk you through the customer context, the urgency of the migration, the challenges, the target architecture, the nitty-gritty implementation details, best practices, recommendations, and learnings for executing a successful migration project in a very accelerated time frame.

Talk by: Himanshu Arora and Amine Benhamza

Data Sharing and Beyond with Delta Sharing

2023-07-26
video
Milos Colic (Databricks) , Vuong Nguyen

Stepping into this brave new digital world, we are certain that data will be a central product for many organizations, and that data and analytics will be how they convey their knowledge and assets. Delta Sharing was the world's first open protocol for secure and scalable real-time data sharing. In our customer conversations, we see strong anticipation for extending Delta Sharing to non-tabular assets, such as machine learning experiments and models.

In this session, we will cover how we extended the Delta Sharing protocol to other sharing workflows, enabling the sharing of ML models, arbitrary files, and more. The work resulted in Arcuate, a Databricks Labs project with a data sharing flavor. The session starts with the high-level approach and how it can be extended to cover similar use cases, then moves to our implementation and how it integrates seamlessly with the Databricks-managed Delta Sharing server and notebooks. Finally, we conclude with lessons learned and our vision for the future of data sharing and beyond.

Talk by: Vuong Nguyen and Milos Colic

Extending Lakehouse Architecture with Collaborative Identity

2023-07-26
video
Erin Boelkens (LiveRamp) , Shawn Gilleran (LiveRamp)

Lakehouse architecture has become a valuable solution for unifying data processing for AI, but faces limitations in maximizing data’s full potential. Additional data infrastructure is helpful for strengthening data consolidation and data connectivity with third-party sources, which are necessary for building full data sets for accurate audience modeling. 

In this session, LiveRamp will demonstrate to data and analytics decision-makers how to build on the Lakehouse architecture with extensions for collaborative identity graph construction, including how to simplify and improve data enrichment, data activation, and data collaboration. LiveRamp will also introduce a complete data marketplace, which enables easy, pseudonymized data enhancements that widen the attribute set for better behavioral model construction.

With these techniques and technologies, enterprises across financial services, retail, media, travel, and more can safely unlock partner insights and ultimately produce more accurate inputs for personalization engines, and more engaging offers and recommendations for customers.
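The abstract does not describe how LiveRamp actually builds its identity graph or pseudonymizes data. A common generic approach, sketched here under stated assumptions, is to normalize identifiers and replace them with a keyed hash (HMAC-SHA256 with a hypothetical shared key), then link records that share any token into connected components using union-find. The record IDs and identifiers below are made up for illustration.

```python
import hashlib
import hmac

# Assumption: collaborating parties share this secret key, so the same email
# or phone number maps to the same token without exchanging raw values.
SHARED_KEY = b"hypothetical-shared-key"


def pseudonymize(identifier: str) -> str:
    """Normalize an identifier, then replace it with an HMAC-SHA256 token."""
    normalized = identifier.strip().lower()
    return hmac.new(SHARED_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


class UnionFind:
    """Disjoint-set forest over record IDs."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def build_identity_graph(records):
    """records: iterable of (record_id, [raw identifiers]).
    Returns clusters (sets of record IDs) linked by any shared token."""
    uf = UnionFind()
    first_owner = {}  # token -> first record that carried it
    for record_id, identifiers in records:
        uf.find(record_id)  # register records even if they link to nothing
        for raw in identifiers:
            token = pseudonymize(raw)
            if token in first_owner:
                uf.union(first_owner[token], record_id)
            else:
                first_owner[token] = record_id
    clusters = {}
    for record_id in list(uf.parent):
        clusters.setdefault(uf.find(record_id), set()).add(record_id)
    return list(clusters.values())


records = [
    ("crm-1", ["Ana@Example.com"]),
    ("pos-7", ["ana@example.com ", "+1-555-0100"]),
    ("app-3", ["+1-555-0100"]),
    ("crm-2", ["bo@example.com"]),
]
print(sorted(sorted(c) for c in build_identity_graph(records)))
# [['app-3', 'crm-1', 'pos-7'], ['crm-2']]
```

Keyed hashing lets partners match on the same token without exchanging raw emails or phone numbers, but anyone holding the key can still hash guessed values and compare, so the key itself must be protected.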

Talk by: Erin Boelkens and Shawn Gilleran

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

JetBlue’s Real-Time AI & ML Digital Twin Journey Using Databricks

2023-07-26
video

Over the past year, JetBlue has embarked on an AI and ML transformation. Databricks has been instrumental in this transformation because it integrates streaming pipelines, ML training using MLflow, ML API serving using the ML registry, and more in one cohesive platform. Real-time streams of weather data, aircraft sensor readings, FAA data feeds, JetBlue operations data, and more feed the world's first AI and ML operating system orchestrating a digital twin, known as BlueSky, for efficient and safe operations. JetBlue has more than 10 ML products in production (each with multiple models) across multiple verticals, including dynamic pricing, customer recommendation engines, supply chain optimization, customer sentiment NLP, and several more.

The core JetBlue data science and analytics team consists of Operations Data Science, Commercial Data Science, AI and ML Engineering, and Business Intelligence. To support rapid growth and a faster go-to-market strategy, the team has built BlueML, an internal Data Catalog + AutoML + AutoDeploy wrapper based on Databricks features, which lets data scientists, including advanced analysts, train and deploy ML models in fewer than five lines of code.

Talk by: Derrick Olson and Rob Bajra

Self-Service Data Analytics and Governance at Enterprise Scale with Unity Catalog

2023-07-26
video

This session focuses on one of the first Unity Catalog implementations at a large-scale enterprise. The scenario is a cloud-scale analytics platform, based on the lakehouse approach, with 7,500 active users and a potential 1,500 additional users who are subject to special governance rules. These users consume more than 600 TB of data stored in Delta Lake, which is growing by more than 1 TB per day and may grow further as local country data is added. Therefore, the existing data platform must be extended to let users combine global data with local data from their own countries. This required a new data management approach that reflects strict, need-to-know information security rules. The core requirements are read-only access to global data, write access to local data, and the ability to share the results.

Until now, strong information security awareness and a lack of suitable technology made it difficult, and often impossible, to analyze and exchange data across disciplines. As a result, much business potential could not be identified or realized.

Building on the lakehouse approach and on new technology developments, Unity Catalog in particular, we developed a solution that meets high security and process requirements and enables secure, global, cross-disciplinary data exchange and analysis at scale. This democratizes the data: it not only yields better insights for business management, but also makes it possible to create entirely new business cases or products that require a higher degree of data integration, and it encourages cultural change. We highlight technical challenges and solutions, present best practices, and point out the benefits of implementing Unity Catalog for enterprises.
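The abstract does not show the grant model the team used. As a minimal sketch, assuming one catalog for global data and one catalog per country (the catalog names `global_data` and `local_<cc>` and the group names are hypothetical), the "read global, write local, share results" requirement could map onto Unity Catalog SQL privileges as below. The Python only builds the statement strings; running them would require a Databricks SQL session with sufficient privileges.

```python
# Hypothetical names: "global_data" catalog, "local_<cc>" per-country catalogs,
# and account groups such as "analysts_de". None of these come from the talk.


def country_grants(country_code: str, group: str,
                   global_catalog: str = "global_data") -> list:
    """Return GRANT statements giving `group` read-only access to global data
    and read/write access to its own country's catalog."""
    local_catalog = f"local_{country_code.lower()}"
    principal = f"`{group}`"
    return [
        # Global data: the group can browse and query, but has no MODIFY
        # or CREATE TABLE privilege, so it cannot write here.
        f"GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG {global_catalog} "
        f"TO {principal}",
        # Local data: privileges granted on a catalog are inherited by its
        # schemas and tables, so the group can read, write, and create tables.
        f"GRANT USE CATALOG, USE SCHEMA, SELECT, MODIFY, CREATE TABLE "
        f"ON CATALOG {local_catalog} TO {principal}",
    ]


def share_results_grants(country_code: str, results_schema: str,
                         reader_group: str) -> list:
    """Return GRANT statements letting another group read one results schema."""
    local_catalog = f"local_{country_code.lower()}"
    principal = f"`{reader_group}`"
    return [
        # USE CATALOG alone exposes no data; it only lets the reader reach
        # schemas it has separately been granted.
        f"GRANT USE CATALOG ON CATALOG {local_catalog} TO {principal}",
        f"GRANT USE SCHEMA, SELECT ON SCHEMA {local_catalog}.{results_schema} "
        f"TO {principal}",
    ]


for stmt in country_grants("DE", "analysts_de") + share_results_grants(
        "DE", "shared_results", "analysts_global"):
    print(stmt + ";")
```

Granting at the catalog level keeps the rule set small, since new schemas and tables inherit the privileges; the results schema is opened to other groups explicitly, matching the need-to-know principle.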

Talk by: Artem Meshcheryakov and Pascal van Bellen

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU
