talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 · YouTube

Activities tracked

561

Filtering by: Databricks

Sessions & talks

Showing 301–325 of 561 · Newest first

Migrate and Modernize your Data Platform with Confluent and Databricks

2022-07-19 · Watch video

Moving and building in the cloud to accelerate analytics development requires enterprises to rethink their data infrastructure. Whether you are moving from an on-prem legacy system or you were born in the cloud, businesses are turning to Confluent and Databricks to help them unlock new real-time customer experiences and intelligence for their backend operations.

Join us to see how Confluent and Databricks enable companies to set data in motion across any system, at any scale, in near real-time. Connecting Confluent with Databricks allows companies to migrate and connect data from on-prem databases and data warehouses like Netezza, Oracle, and Cloudera to Databricks in the cloud to power real-time analytics.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Operational Analytics: Expanding the Reach of Data in the Lakehouse Era

2022-07-19 · Watch video

Organizations want data lakes to be the source of truth for analytics. But operational teams rarely recognize the power of the data lake, which shortens the reach of all the valuable data within it. Instead, these business users often treat operational tools like Salesforce, Marketo, and NetSuite as their source of truth.

The reality is that lakehouses and operational tools alike are missing critical pieces of data and don’t provide the full customer picture. Operational Analytics solves this last-mile problem by making it possible to sync transformed data directly from your data lake back into these systems, expanding the reach of your data.

In this talk you’ll learn:

  • What Operational Analytics and Reverse ETL are, and why they're taking off
  • How Operational Analytics helps companies today activate and expand the reach of their data
  • Real-life use cases from companies using Operational Analytics to empower their data teams and give them the seat at the table they deserve


Power to the (SQL) People: Python UDFs in DBSQL

2022-07-19 · Watch video

Databricks SQL (DB SQL) allows customers to leverage the simple and powerful Lakehouse architecture with up to 12x better price/performance compared to traditional cloud data warehouses. Analysts can use standard SQL to easily query data and share insights using a query editor, dashboards or a BI tool of their choice, and analytics engineers can build and maintain efficient data pipelines, including with tools like dbt.

While SQL is great at querying and transforming data, sometimes you need to extend its capabilities with the power of Python, a full programming language. Users of Databricks notebooks already enjoy seamlessly mixing SQL, Python and several other programming languages. Use cases include masking or encrypting and decrypting sensitive data, complex transformation logic, using popular open source libraries or simply reusing code that has already been written elsewhere in Databricks. In many cases, it is simply prohibitive or even impossible to rewrite the logic in SQL.

Up to now, there was no way to use Python from within DBSQL. We are removing this restriction with the introduction of Python User Defined Functions (UDFs). DBSQL users can now create, manage and use Python UDFs using standard SQL. UDFs are registered in Unity Catalog, which means they can be governed and used throughout Databricks, including in notebooks.
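As a rough sketch of what this looks like in practice, a Python UDF can be created and invoked directly from SQL once registered in Unity Catalog. The three-level function name and the masking logic below are hypothetical illustrations, not from the session:

```sql
-- Hypothetical catalog/schema and masking logic, for illustration only.
CREATE FUNCTION main.default.mask_email(email STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  name, _, domain = email.partition("@")
  return name[:1] + "***@" + domain
$$;

SELECT main.default.mask_email('alice@example.com');  -- a***@example.com
```

Because the function lives in Unity Catalog, the same definition is governed centrally and callable from DBSQL queries and notebooks alike.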


Accelerating the Pace of Autism Diagnosis with Machine Learning Models

2022-07-19 · Watch video

A formal autism diagnosis can be an inefficient and lengthy process. Families may wait months or longer before receiving a diagnosis for their child despite evidence that earlier intervention leads to better treatment outcomes. Digital technologies which detect the presence of behaviors related to autism can scale access to pediatric diagnoses. This work aims to demonstrate the feasibility of deep learning technologies for detecting hand flapping from unstructured home videos as a first step towards validating whether models and digital technologies can be leveraged to aid with autism diagnoses. We used the Self-Stimulatory Behavior Dataset (SSBD), which contains 75 videos of hand flapping, head banging, and spinning exhibited by children. From all the hand flapping videos, we extracted 100 positive and control videos of hand flapping, each between 2 to 5 seconds in duration. Utilizing both landmark-driven-approaches and MobileNet V2’s pretrained convolutional layers, our highest performing model achieved a testing F1 score of 84% (90% precision and 80% recall) on the Self-Stimulatory Behavior Dataset (SSBD). This work provides the first step towards developing precise deep learning methods for activity detection of autism-related behaviors.
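As a quick sanity check on the quoted metrics (a minimal sketch, not from the talk): the F1 score is the harmonic mean of precision and recall, and 90% precision with 80% recall does land at roughly the reported 84%:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# 90% precision and 80% recall give an F1 of ~0.847, i.e. the ~84% reported.
print(round(f1_score(0.90, 0.80), 3))  # 0.847
```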


Adversarial Drifts, Model Monitoring, and Feedback Loops: Building Human-in-the-Loop Machine Learning

2022-07-19 · Watch video

Protecting the user community and the platform from illegal or undesirable behavior is an important problem for most large online platforms. Content moderation (aka Integrity) systems aim to define, detect and take action on bad behavior/content at scale, usually accomplished with a combination of machine learning and human review.

Building hybrid human/ML systems for content moderation presents unique challenges, some of which we will discuss in this talk:

  • Human review annotation guidelines and how they impact label quality for ML models
  • Bootstrapping labels for new categories of content violation policies
  • The role of adversarial drift in model performance degradation
  • Best practices for monitoring model performance and ecosystem health
  • Building adaptive machine learning models

The talk is a distillation of learnings from building such systems at Facebook, and from talking to other ML practitioners & researchers who’ve worked on similar systems elsewhere.


Agile Data Engineering: Reliability and Continuous Delivery at Scale

2022-07-19 · Watch video

With businesses competing to deliver value while growing rapidly and adapting to changing markets, it is more important than ever for data teams to support faster, more reliable insights. We need to fail fast, learn, adapt, release, and repeat. For us, a trusted and unified data infrastructure with standardized practices is at the crux of it all.

In this talk, we'll go over Atlassian's data engineering team organization, infrastructure, and development practices:

  • Team organization and roles
  • Overview of our data engineering technical stack
  • Code repositories and CICD setup
  • Testing framework
  • Development walkthrough
  • Production data quality & integrity
  • Alerting & Monitoring
  • Tracking operational metrics (SLI/SLO, Cost)


Amgen’s Journey To Building a Global 360 View of its Customers with the Lakehouse

2022-07-19 · Watch video

Serving patients in over 100 countries, Amgen is a leading global biotech company focused on developing therapies that have the power to save lives. Delivering on this mission requires our commercial teams to regularly meet with healthcare providers to discuss new treatments that can help patients in need. With the onset of the pandemic, when face-to-face interactions with doctors and other Healthcare Providers (HCPs) were severely impacted, Amgen had to rethink these interactions. With that in mind, the Amgen Commercial Data and Analytics team leveraged a modern data and AI architecture built on the Databricks Lakehouse to help accelerate its digital and data insights capabilities. This foundation enabled Amgen’s teams to develop a comprehensive, customer-centric view to support flexible go-to-market models and provide personalized experiences to our customers.

In this presentation, we will share our recent journey: how we took an agile approach to bringing together over 2.2 petabytes of internally generated and externally sourced vendor data and onboarding it into our AWS Cloud and Databricks environments, enabling standardized, scalable, and robust capabilities that meet the business requirements of our fast-changing life sciences environment. We will share use cases of how we harmonized and managed our diverse sets of data to deliver efficiency, simplification, and performance outcomes for the business. We will cover the following aspects of our journey, along with best practices we learned over time:

  • Our architecture supporting Amgen’s Commercial Data & Analytics processing running constantly around the globe
  • Engineering best practices for building large-scale data lakes and analytics platforms, such as team organization, data ingestion and data quality frameworks, a DevOps toolkit, maturity frameworks, and more
  • Databricks capabilities adopted, such as Delta Lake, workspace policies, SQL workspace endpoints, and MLflow for model registry and deployment, along with various tools built for Databricks workspace administration
  • Databricks capabilities being explored for the future, such as multi-task orchestration, container-based Apache Spark processing, Feature Store, and Repos for Git integration
  • The types of commercial analytics use cases we are building on the Databricks Lakehouse platform

Attendees building global, enterprise-scale data engineering solutions to meet diverse sets of business requirements will benefit from learning about our journey. Technologists will learn how we addressed specific business problems via reusable capabilities built to maximize value.


A Modern Approach to Big Data for Finance

2022-07-19 · Watch video
  • There are unique challenges associated with working with big data for finance (volume of data, disparate storage, variable sharing protocols etc...)
  • Leveraging open source technologies, like Databricks' Delta Sharing, in combination with a flexible data management stack, can allow organizations to be more nimble in testing and deploying more strategies
  • Live demonstration of Delta Sharing in combination with Nasdaq Data Fabric


Implementing a Framework for Data Security and Policy at a Large Public Sector Agency

2022-07-19 · Watch video

Most large public sector and government agencies have multiple data-driven initiatives being implemented or considered across functional domains. But as they scale these efforts, they need to ensure data security and quality are top priorities.

In this session, the presenters discuss the core elements of a successful data security and quality framework, including best practices, potential pitfalls, and recommendations based on success with a large federal agency.


Implementing an End-to-End Demand Forecasting Solution Through Databricks and MLflow

2022-07-19 · Watch video

In retail, the right quantity at the right time is crucial for success. In this session we share how a demand forecasting solution helped some of our retailers to improve efficiencies and sharpen fresh product production and delivery planning.

With the setup in place, we train hundreds of models in parallel, at various levels including store level, product level, and the combination of the two. By leveraging the distributed computation of Spark, we can do all of this in a scalable and fast way. Powered by Delta Lake, Feature Store, and MLflow, this session clarifies how we built a highly reliable ML factory.
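The fan-out pattern can be sketched without Spark as follows. This is a deliberately simplified, hypothetical illustration: a naive mean forecast stands in for a real model, and the sales data is invented. On Spark, the same per-group training is typically expressed with `groupBy(...).applyInPandas(...)` so each group's model fits in parallel:

```python
from statistics import mean

# Toy sales history: (store, product, units sold per day). Invented data.
history = [
    ("store_1", "bread", [12, 15, 14]),
    ("store_1", "milk",  [30, 28, 31]),
    ("store_2", "bread", [ 7,  9,  8]),
]

# "Train" one model per (store, product) group; here a trivial mean forecast
# stands in for a real time-series model fitted per group.
models = {}
for store, product, series in history:
    models[(store, product)] = mean(series)

forecast = models[("store_1", "bread")]
print(forecast)  # roughly 13.67 units/day
```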

We show how this setup runs at various retailers and feeds accurate demand forecasts back to the ERP system, supporting clients in their production planning and delivery. Through this session we want to inspire retailers and conference attendees to use data and AI not only to gain efficiency but also to decrease food waste.


Implementing Data Governance 3.0 for the Lakehouse Era: Community-Led and Bottom-Up

2022-07-19 · Watch video

In this session, I cover our lessons from rethinking data governance as an enablement function, drawn from implementing more than 200 data projects. I’ll go into the nuts and bolts of how the tooling and cultural practices governing our team and data helped us complete projects twice as fast with teams one-third our normal size.

The session concludes with why organizations should start believing in and investing in true data governance and implementing governance tools and processes that are agile and collaborative, rather than top-down.


Improving Apache Spark Application Processing Time by Configurations, Code Optimizations, etc.

2022-07-19 · Watch video

In this session, we'll go over several use cases and describe the process of improving our Spark Structured Streaming application's micro-batch time from ~55 to ~30 seconds in several steps.

Our app processes ~700 MB/s of compressed data, has very strict KPIs, and uses several technologies and frameworks, including Spark 3.1, Kafka, Azure Blob Storage, AKS, and Java 11.

We'll share our work and experience in those fields, and go over a few tips to create better Spark structured streaming applications.

The main areas that will be discussed are Spark configuration changes, code optimizations, and the implementation of a custom Spark data source.


Improving Interactive Querying Experience on Spark SQL

2022-07-19 · Watch video

As a data-driven company, Pinterest treats interactive querying over hundreds of petabytes of data as a common and important function. Interactive querying has different requirements and challenges from batch querying.

In this talk, we will cover the various architectural alternatives one can choose from to perform interactive querying with Spark SQL. By discussing the trade-offs of those architectures and the requirements of interactive querying, we will elaborate on our design choice. We will share enhancements we made to open source projects including Apache Spark, Apache Livy, and Dr. Elephant, along with in-house technologies we built to improve the interactive querying experience at Pinterest: DDL query speed-ups, Spark session caching, Spark session sharing, Apache YARN diagnostic message improvements, query failure handling, and tuning recommendations. We will also discuss some challenges we faced along the way and future improvements we are working on.


Improving patient care with Databricks

2022-07-19 · Watch video

Learn how Wipro helped a world leader in medical technology modernize its data platform, using the PySpark interface on Azure Databricks to create reusable generic frameworks, including slowly changing dimensions (SCDs), data validation/reconciliation tools, and Delta Lake tables created from metadata.


Introduction to Flux and OSS Replication

2022-07-19 · Watch video

In this breakout session we’ll learn about Flux, the data scripting and query language for InfluxDB. InfluxDB is the leading time series database platform. With Flux you can perform time series lifecycle management tasks, data preparation and analytics, alert tasks, and more. InfluxDB has two offerings: InfluxDB Cloud and InfluxDB OSS. Finally, we’ll learn about how you can use Flux and the replication tool to consolidate data from your OSS instances running at the edge to InfluxDB Cloud.


Learn to Efficiently Test ETL Pipelines

2022-07-19 · Watch video

This talk is a story, told with examples in Python and PySpark, about testing ETL pipelines efficiently. I won’t try to convince you that you need unit tests or automated tests – that’s up to you. If you do have unit tests for your ETL pipelines, or if you want them, it can be useful to make sure you aren’t testing more than you need.

I’ll be describing how a practical (non-pyramid shaped) heuristic helps me efficiently cover edge cases and unexpected bugs in my code by ensuring I test only the code needed for the feature I’m building.
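The spirit of that heuristic can be illustrated with a small, hypothetical example (the function and its mapping rule are invented): when the logic of a transformation lives in a plain Python function, its edge cases can be covered without spinning up a Spark session, and the Spark wiring around it need not be re-tested for every case:

```python
# Hypothetical transformation under test: name and mapping rule are
# illustrative, not from the talk.
def normalize_country(code):
    """Map messy country codes to ISO-2, or None for unknowns/empties."""
    mapping = {"usa": "US", "us": "US", "uk": "GB"}
    return mapping.get(code.strip().lower()) if code else None

def test_normalize_country():
    # Edge cases covered cheaply, with no Spark session needed.
    assert normalize_country(" USA ") == "US"
    assert normalize_country("uk") == "GB"
    assert normalize_country("atlantis") is None
    assert normalize_country("") is None

test_normalize_country()
```

In a pipeline, the same function would then be wrapped in a UDF or applied per row, with only a thin integration test exercising the Spark layer itself.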


Leveraging ML-Powered Analytics for Rapid Insights and Action (a demonstration)

2022-07-19 · Watch video

The modern data stack makes it possible to query high-volume data with extremely high granularity, dimensionality, and cardinality. Operationalized machine learning is a great way to address this complex data, focusing the scope of analyst inquiry and quickly exposing dimensions, groups, and sub-groups of data with the greatest impact on key metrics.

This session will discuss how to leverage operationalized AI/ML to automatically define millions of features and perform billions of simultaneous hypothesis tests across a wide dataset to identify key drivers of metric change. A technical demonstration will include an overview of leveraging the Databricks Lakehouse using Sisu’s AI/ML-powered decision intelligence platform: connecting to Databricks, defining metrics, automated AI/ML-powered analysis, and exposing actionable business insights.


Live Analytics: The next user engagement frontier

2022-07-19 · Watch video

The last couple of years have put a new lens on how organizations approach analytics: day-old data became useless, and only in-the-moment insights stayed relevant, pushing data and analytics teams to their breaking point. The result: everyone has fast-forwarded their transformation and modernization plans, and it has also made us look differently at who engages with data and how.

At ThoughtSpot, we believe analytics is not just for data people. It’s for everyone, everywhere. Join us in this session to:

  • Learn how to transform the user experience with self-service, interactive analytics
  • Get real-life tips on implementing a modern analytics strategy
  • See a demo of Live Analytics in ThoughtSpot
  • Hear how Norwegian airline Flyr is resetting analytics in their industry by putting data first


Log Processing at Scale

2022-07-19 · Watch video

FlashBlade's engineering code factory generates 5 million log lines per second into log files. We scan a stream of these log files looking for known anomalies. This helps reduce time to triage code factory build and test errors.
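The core idea of scanning a log stream against a catalog of known anomaly signatures can be sketched like this. The patterns and log lines are hypothetical, and at 5 million lines per second the real system would need a distributed, heavily optimized implementation rather than a single-process loop:

```python
import re

# Hypothetical known-anomaly signatures; the real catalog would be far larger.
KNOWN_ANOMALIES = {
    "oom": re.compile(r"OutOfMemoryError"),
    "timeout": re.compile(r"timed out after \d+ ?ms"),
}

def scan(lines):
    """Count occurrences of each known anomaly signature in a log stream."""
    hits = {name: 0 for name in KNOWN_ANOMALIES}
    for line in lines:
        for name, pattern in KNOWN_ANOMALIES.items():
            if pattern.search(line):
                hits[name] += 1
    return hits

log = [
    "INFO build step 3 ok",
    "ERROR java.lang.OutOfMemoryError: Java heap space",
    "WARN request timed out after 5000 ms",
]
print(scan(log))  # {'oom': 1, 'timeout': 1}
```

Matching lines can then be surfaced directly to engineers, cutting the time spent triaging build and test failures.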


Low-Code Machine Learning on Databricks with AutoML

2022-07-19 · Watch video

Teams across an organization should be able to use predictive analytics for their business. While there are data scientists and data engineers who can leverage code to build ML models, there are domain experts and analysts who can benefit from low-code tools to build ML solutions.

Join this session to learn how you can leverage Databricks AutoML and other low-code tools to build, train, and deploy ML models into production. Additionally, Databricks takes a unique glass-box approach, so you can take the code behind an ML model, tweak it further to fine-tune performance, and integrate it into production systems. See these capabilities in action and learn how Databricks empowers users of varying levels of expertise to build ML solutions.


Managing Straggler Executors at Apache Spark 3.3

2022-07-19 · Watch video

Tuning high-performance Apache Spark applications to handle misbehaving executors is at best challenging and at worst impossible. Apache Spark does provide some built-in support to kill and recreate executors under certain conditions, such as long GC delays or application errors. However, this still leaves open various scenarios where slow-running executors can impact the overall performance of your application, even when you enable features such as task speculation. In this talk, we are going to describe Apache Spark 3.3’s new feature, Executor Rolling. Apache Spark 3.3 (SPARK-37810) provides a built-in executor rolling driver plugin with three configurations:

  • spark.kubernetes.executor.rollInterval (default: 0s, meaning disabled)
  • spark.kubernetes.executor.rollPolicy (default: OUTLIER)
  • spark.kubernetes.executor.minTasksPerExecutorBeforeRolling (default: 0)

This driver plugin tries to choose and decommission a single executor at every interval, using the given policy. The following are the built-in policies and their targets.

  • ID: An executor with the smallest executor ID
  • ADD_TIME: An executor with the smallest add-time
  • TOTAL_GC_TIME: An executor with the biggest GC time
  • TOTAL_DURATION: An executor with the biggest total task time
  • AVERAGE_DURATION: An executor with the biggest average task duration
  • FAILED_TASKS: An executor with the largest number of failed tasks
  • OUTLIER: An outlier executor or, if none is found, the executor with the biggest total task time

In short, Apache Spark 3.3 keeps the set of live executors fresh and greatly reduces the engineering burden of handling executor JVM misbehavior across diverse production jobs through these built-in executor rolling policies.
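As a sketch, enabling rolling on Kubernetes might look like the following spark-defaults fragment. The interval and task threshold values are illustrative; the plugin class name is the one added by SPARK-37810, and rolling relies on executor decommissioning being enabled:

```
spark.plugins                                               org.apache.spark.scheduler.cluster.k8s.ExecutorRollPlugin
spark.decommission.enabled                                  true
spark.kubernetes.executor.rollInterval                      1800s
spark.kubernetes.executor.rollPolicy                        OUTLIER
spark.kubernetes.executor.minTasksPerExecutorBeforeRolling  100
```

With this, every 30 minutes the driver decommissions the OUTLIER-selected executor, provided it has run at least 100 tasks, and the cluster manager replaces it with a fresh one.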


Migrate Your Existing DAGs to Databricks Workflows

2022-07-19 · Watch video

In this session, you will learn the benefits of orchestrating your business-critical ETL and ML workloads within the lakehouse, as well as how to migrate and consolidate your existing workflows to Databricks Workflows - a fully managed lakehouse orchestration service that allows you to run workflows on any cloud. We’ll walk you through different migration scenarios and share lessons learned and recommendations to help you reap the benefits of orchestration with Databricks Workflows.


Migrating Complex SAS Processes to Databricks - Case Study

2022-07-19 · Watch video

Many federal agencies use SAS software for critical operational data processes. While SAS has historically been a leader in analytics, data analysts have often used it for ETL purposes as well. However, the demands modern data science places on ever-increasing volumes and types of data require a shift to modern cloud architectures and to modern data management tools and paradigms for ETL/ELT. In this presentation, we provide a case study from the Centers for Medicare and Medicaid Services (CMS) detailing the approach and results of migrating a large, complex legacy SAS process to modern, open-source/open-standard technology, Spark SQL and Databricks. The migrated process produces results ~75% faster without reliance on proprietary constructs of the SAS language, with more scalability, and in a manner that can more easily ingest old rules and better govern the inclusion of new rules and data definitions. Significant technical and business benefits derived from this modernization effort are described in this session.


ML on the Lakehouse: Bringing Data and ML Together to Accelerate AI Use Cases

2022-07-19 · Watch video

Discover the latest innovations from Databricks that can help you build and operationalize the next generation of machine learning solutions. This session will dive into Databricks Machine Learning, a data-centric AI platform that spans the full machine learning lifecycle - from data ingestion and model training to production MLOps. You'll learn about key capabilities that you can leverage in your ML use cases and see the product in action. You will also directly hear how Databricks ML is being used to maximize supply chain logistics and keep millions of Coca-Cola products on the shelf.


MLOps at DoorDash

2022-07-19 · Watch video

MLOps is one of the most widely discussed topics in the ML practitioner community. Streamlining ML development and productionizing ML are important ingredients in realizing the power of ML; however, they require a vast and complex infrastructure. The ROI of ML projects starts only when they are in production. The journey to implementing MLOps is unique to each company. At DoorDash, we’ve been applying MLOps for a couple of years to support a diverse set of ML use cases and to perform large-scale predictions at low latency.

This session will share our approach to MLOps, as well as some of the learnings and challenges. In addition, it will share some details about the DoorDash ML stack, which consists of a mixture of homegrown solutions, open source solutions and vendor solutions like Databricks.
