talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 YouTube

Activities tracked: 582

Sessions & talks

Perplexity: A Copilot for All Your Web Searches and Research

2023-07-26

In this demo, we will show you Perplexity.ai, the fastest and most functional answer engine and search copilot available today. It can solve a wide array of problems, from giving you fast answers on any topic to planning trips and researching markets unfamiliar to you, all in a trustworthy way, without hallucinations, by providing references in the form of citations. This is made possible by combining LLMs with retrieval-augmented generation (RAG) over traditional search engines and indexes.
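The retrieve-then-generate pattern described above can be sketched in a few lines. This is a minimal illustration, not Perplexity's implementation. The in-memory `DOCS` index, its example URLs and the term-overlap scoring are hypothetical stand-ins for a real search backend, and the final LLM call is left out. The sketch shows the overall shape: retrieve sources first, then number them in the prompt so the model's answer can cite them.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical "search index": URL -> page text.
DOCS: Dict[str, str] = {
    "https://example.com/delta": "Delta Lake is an open-source storage format with ACID transactions.",
    "https://example.com/spark": "Apache Spark is an engine for large-scale data processing.",
    "https://example.com/mlflow": "MLflow manages the machine learning lifecycle.",
}


def tokens(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, k: int = 2) -> List[Tuple[str, str]]:
    """Rank documents by term overlap with the query (stand-in for a real search engine)."""
    q = Counter(tokens(query))
    scored = []
    for url, text in DOCS.items():
        d = Counter(tokens(text))
        score = sum(min(count, d[t]) for t, count in q.items())
        if score > 0:
            scored.append((score, url, text))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties broken by URL
    return [(url, text) for _, url, text in scored[:k]]


def build_prompt(query: str, k: int = 2) -> str:
    """Number the retrieved sources so the LLM can cite them as [1], [2], ..."""
    sources = retrieve(query, k)
    context = "\n".join(f"[{i}] {text} ({url})" for i, (url, text) in enumerate(sources, start=1))
    return (
        "Answer using only the numbered sources and cite them like [1].\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


print(build_prompt("What is Delta Lake?"))
```

Grounding the prompt in numbered sources is what makes citations possible; a production system would replace the toy scoring with a real search index and pass the prompt to an LLM.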

We will also show you how information discovery can now be fully personalized to you through prompt engineering. Finally, we will look at use cases showing how this search copilot can help with day-to-day tasks on a data team, whether you are a data engineer, data scientist, or data analyst.

Talk by: Aravind Srinivas

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: Accenture | Databricks Enables Employee Data Domain to Align People w/ Business Outcomes

2023-07-26

A global franchise retailer was struggling to understand the value of its employees and had not fostered a data-driven enterprise. On its journey toward using facts as the basis for decision making, Databricks became the facilitator of its data mesh, providing the pipelines, analytics and source engine for a three-layer (bronze, silver, gold) lakehouse that supports the HR domain and drives the integration of multiple additional domains: sales, customer satisfaction, product quality and more. In this talk, we will walk through:

  • The business rationale and drivers
  • The core data sources
  • The data products, analytics and pipelines
  • The adoption of Unity Catalog for data privacy compliance and adherence, and for data management
  • Data quality metrics

Join us to see the analytic product and the design behind this innovative view of employees and their business outcomes.
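The bronze, silver, gold layering mentioned above can be illustrated in plain Python. This is a minimal sketch that assumes hypothetical HR records and field names (`employee_id`, `dept`, `revenue`). A real lakehouse would implement each layer as Delta tables fed by Spark jobs rather than as in-memory lists.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical raw HR records as they might land from a source system.
RAW: List[Dict[str, Optional[str]]] = [
    {"employee_id": "E1", "dept": "Sales ", "revenue": "1200"},
    {"employee_id": "E2", "dept": "sales", "revenue": "800"},
    {"employee_id": "E2", "dept": "sales", "revenue": "800"},  # duplicate delivery
    {"employee_id": None, "dept": "HR", "revenue": "50"},      # missing key
]


def to_bronze(raw: List[dict]) -> List[dict]:
    """Bronze: land records as-is (copied) so the raw history is preserved."""
    return [dict(r) for r in raw]


def to_silver(bronze: List[dict]) -> List[dict]:
    """Silver: drop rows without a key, normalize text, cast types, de-duplicate."""
    seen = set()
    clean = []
    for r in bronze:
        if r["employee_id"] is None:
            continue
        row = (r["employee_id"], r["dept"].strip().lower(), float(r["revenue"]))
        if row not in seen:
            seen.add(row)
            clean.append({"employee_id": row[0], "dept": row[1], "revenue": row[2]})
    return clean


def to_gold(silver: List[dict]) -> Dict[str, dict]:
    """Gold: business-level aggregate per department (headcount and revenue)."""
    revenue: Dict[str, float] = defaultdict(float)
    people: Dict[str, set] = defaultdict(set)
    for r in silver:
        revenue[r["dept"]] += r["revenue"]
        people[r["dept"]].add(r["employee_id"])
    return {dept: {"headcount": len(people[dept]), "revenue": revenue[dept]} for dept in revenue}


print(to_gold(to_silver(to_bronze(RAW))))
```

Keeping bronze untouched allows silver and gold to be rebuilt if cleaning rules change. The gold layer is what an HR analytics product would query.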

Talk by: Rebecca Bucnis

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in the 2022 Gartner® Magic Quadrant™ for CDBMS: https://dbricks.co/3phw20d

Sponsored by: Anomalo | Scaling Data Quality with Unsupervised Machine Learning Methods

2023-07-26

The challenge is no longer how big, diverse, or distributed your data is. It's that you can't trust it. Companies use rules and metrics to monitor data quality, but these are tedious to set up and maintain. We will present a set of fully unsupervised machine learning algorithms for monitoring data quality at scale that require no setup, catch unexpected issues, and prevent alert fatigue by minimizing false positives. By the end of this talk, participants will understand unsupervised data quality monitoring, its advantages and limitations, and how it can help scale trust in your data.
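The talk's own algorithms are not described here. The following is therefore a deliberately simple statistical stand-in for unsupervised monitoring, not Anomalo's method. It scores today's value of a table metric (for example, a daily row count) against that metric's own history using a median/MAD "modified z-score", so no per-table rule has to be written by hand. The metric, the history values and the 3.5 threshold are assumptions.

```python
import statistics
from typing import List, Tuple


def modified_z_score(history: List[float], value: float) -> float:
    """Robust z-score of `value` relative to `history`, using the median and MAD."""
    median = statistics.median(history)
    mad = statistics.median([abs(x - median) for x in history])
    if mad == 0:
        # Perfectly flat history: any deviation at all is treated as maximally surprising.
        return 0.0 if value == median else float("inf")
    return 0.6745 * (value - median) / mad


def is_anomalous(history: List[float], value: float, threshold: float = 3.5) -> Tuple[float, bool]:
    """Return the score and whether it exceeds the (assumed) threshold."""
    score = modified_z_score(history, value)
    return score, abs(score) > threshold


# Hypothetical daily row counts for one table.
row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]
print(is_anomalous(row_counts, 1012))  # within normal variation
print(is_anomalous(row_counts, 400))   # sudden drop gets flagged
```

The median and MAD are used instead of the mean and standard deviation so that past outliers do not inflate the baseline. This is one simple way to keep false positives, and therefore alert fatigue, down.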

Talk by: Vicky Andonova

Sponsored by: Avanade | Enabling Real-Time Analytics with Structured Streaming and Delta Live Tables

2023-07-26

Join the panel to hear how Avanade is helping clients enable real-time analytics and tackle the people and process problems that accompany technology, powered by Azure Databricks.

Talk by: Thomas Kim, Dael Williamson, Zoé Durand

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Sponsored by: Fivetran | Fivetran and Catalyst Enable Businesses & Solve Critical Market Challenges

2023-07-26

Fivetran helps enterprise and commercial companies improve the efficiency of their data movement, infrastructure, and analysis by providing a secure, scalable platform for high-volume data movement. In this fireside chat, we will dive into the pain points that led Catalyst (a cloud-based platform that helps software companies grow revenue with advanced insights and workflows that strengthen customer adoption, retention, expansion and advocacy) to search for a partner that would automate and simplify data management, and into the success that followed its implementation of Fivetran and Databricks.

Discover how together Fivetran and Databricks:

  • Deliver scalable, real-time analytics to customers with minimal configuration and centralize customer data into customer success tools.
  • Improve Catalyst’s visibility into customer health, opportunities, and risks across all teams.
  • Turn data into revenue-driving insights around digital customer behavior with improved targeting and AI/machine learning.
  • Provide a robust and scalable data infrastructure that supports Catalyst’s growing data needs, with improvements in data availability, data quality, and overall efficiency in data operations.

Talk by: Edward Chiu and Lauren Schwartz

Sponsored by: Wipro | Personalized Price Transparency Using Generative AI

2023-07-26

Patients are increasingly taking an active role in managing their healthcare costs and are more likely to choose providers and treatments based on cost considerations. Learn how technology can help build cost-efficient care models across the healthcare continuum, delivering higher quality care while improving patient experience and operational efficiency.

Talk by: Janine Pratt

Structured Streaming: Demystifying Arbitrary Stateful Operations

2023-07-26
Angela Chu (Databricks)

Let’s face it: data is messy. And your company’s business requirements? Even messier. You’re staring at your screen, knowing there is a tool that will let you give your business partners the information they need as quickly as they need it. There’s even a Python version of it now. But… it looks kind of scary. You’ve never used it before, and you don’t know where to start. Yes, we’re talking about the dreaded flatMapGroupsWithState. But fear not: we’ve got you covered.

In this session, we’ll take a real-world use case and use it to show you how to break down flatMapGroupsWithState into its basic building blocks. We’ll explain each piece in both Scala and the newly released Python API, and at the end we’ll illustrate how it all comes together to enable arbitrary stateful operations with Spark Structured Streaming.
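To make the building blocks concrete, here is a toy, pure-Python model of what flatMapGroupsWithState does. It is not the Spark API. Each micro-batch is grouped by key. A user function receives the key, the new values and the key's previous state, and returns zero or more output rows plus the updated state. Keys that receive no data for a while time out and get one final call. The running-total logic and the two-batch timeout are hypothetical. In Spark, the engine manages the state store, checkpointing and timeouts, and PySpark exposes the pattern as `applyInPandasWithState`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Event = Tuple[str, int]        # (key, value), e.g. (user_id, clicks)
State = Dict[str, int]         # hypothetical per-key state: running total + last active batch
Output = Tuple[str, str, int]  # (key, status, total)

TIMEOUT_BATCHES = 2  # assumption: expire a key after 2 micro-batches without new data


def update_user(key: str, values: List[int], state: Optional[State],
                batch_id: int, timed_out: bool) -> Tuple[List[Output], Optional[State]]:
    """User function, called once per key per micro-batch (the flatMapGroupsWithState role)."""
    if timed_out:
        # Final call for an idle key: emit a closing row and drop the state.
        return [(key, "closed", state["total"])], None
    total = (state["total"] if state else 0) + sum(values)
    new_state = {"total": total, "last_batch": batch_id}
    # "flatMap": any number of rows may be returned; here it is exactly one.
    return [(key, "running", total)], new_state


def run_stream(batches: Iterable[List[Event]]) -> List[Output]:
    """Toy engine: group each micro-batch by key, call the user function, manage state and timeouts."""
    store: Dict[str, State] = {}
    output: List[Output] = []
    for batch_id, batch in enumerate(batches):
        groups: Dict[str, List[int]] = defaultdict(list)
        for key, value in batch:
            groups[key].append(value)
        for key, values in groups.items():
            rows, new_state = update_user(key, values, store.get(key), batch_id, timed_out=False)
            output.extend(rows)
            store[key] = new_state
        expired = [k for k, s in store.items() if batch_id - s["last_batch"] >= TIMEOUT_BATCHES]
        for key in expired:
            rows, _ = update_user(key, [], store.pop(key), batch_id, timed_out=True)
            output.extend(rows)
    return output


demo_batches = [[("a", 1), ("b", 2)], [("a", 3)], [], [("b", 1)]]
for row in run_stream(demo_batches):
    print(row)
```

In this toy model, key `b` times out after two idle batches and later starts a fresh state. That separation of "new data" calls from "timeout" calls is the part of the real API that usually needs the most care.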

Talk by: Angela Chu

Taking Your Cloud Vendor to the Next Level: Solving Complex Challenges with Azure Databricks

2023-07-26
Tomer Patel, Itai Yaffe (Nielsen Identity Engine)

Akamai's content delivery network (CDN) processes about 30% of the internet's daily traffic, resulting in a massive amount of data that presents engineering challenges, both internally and with cloud vendors. In this session, we will discuss the barriers faced while building a data infrastructure on Azure, Databricks, and Kafka to meet strict SLAs, hitting the limits of some of our cloud vendors’ services. We will describe the iterative process of re-architecting a massive scale data platform using the aforementioned technologies.

We will also delve into how Akamai today quickly ingests terabytes of data and makes them available to customers, and how it efficiently queries petabytes of data, returning results within 10 seconds for most queries. This discussion will provide valuable insights for attendees and organizations seeking to process and analyze large amounts of data effectively.

Talk by: Tomer Patel and Itai Yaffe

The Future of Data Sharing and Collaboration: A Perspective from Industry Leaders

2023-07-26

More and more, organizations must exchange data with their customers, suppliers and partners. And yet, efficiency and immediate accessibility are equally important. To be truly data-driven, organizations need a better way to share data.

Join a panel of industry leaders from London Stock Exchange, AccuWeather, ZoomInfo and CoreLogic as they dive into the significance of open standards for data sharing and the game-changing impact of marketplaces that enable the exchange of not just data but also notebooks, dashboards, ML models, and applications. Discover how collaboration can break down walled-garden approaches and unlock limitless potential for innovation. Gain valuable insights into the future of data sharing and collaboration as the panelists share their experiences and successful strategies for effective data collaboration.

This session covers it all, from the role of technology in secure sharing to ethical considerations. Ask any questions that you might have. Don't wait to transform the future of your industry: register now and join the data-sharing and collaboration revolution.

Talk by: Jay Bhankharia, Sneh Kakileti, Naftali Cohen, Brian Battaglia, and Paul Lentz

Under the Hood: Intelligent Workload Management

2023-07-26
Priyam Dutta (Databricks)

In this talk, a senior staff engineer at Databricks explains how machine learning is used to make Databricks SQL more responsive and efficient. This is a “bits and bytes” talk for those interested in knowing how our engine works.

Talk by: Priyam Dutta

Unleashing Large Language Models with Databricks SQL's AI Functions

2023-07-26

This talk introduces AI Functions, a new feature in Databricks SQL that enables seamless integration of large language models (LLMs) into SQL workflows. We illustrate how AI Functions simplify the use of LLMs like OpenAI’s ChatGPT for tasks such as text classification, bypassing the need for complex pipelines.

By demonstrating how to set up and apply AI Functions, the talk shows how this tool democratizes AI and puts the power of LLMs directly into the hands of your data analysts and scientists. It concludes with a look toward the future of AI Functions and the exciting possibilities they unlock for businesses.

Talk by: Shitao Li and Yu Gong

Unlocking the Power of Databricks SDKs: The Power to Integrate, Streamline, and Automate

2023-07-26
Serge Smertin (Databricks)

In today's data-driven landscape, the demands placed on data engineers are diverse and multifaceted. For teams integrating Java, Python, or Go microservices, the Databricks SDKs provide a powerful bridge between established ecosystems and Databricks. They allow data engineers to unlock new levels of integration and collaboration, and to integrate Unity Catalog into their processes to create advanced workflows straight from notebooks.

In this session, learn best practices for when and how to use the SDKs, the command-line interface, or the Terraform integration to integrate seamlessly with the Databricks Lakehouse. The session also covers using shell scripts to automate complex tasks and streamline operations in ways that improve scalability.

Talk by: Serge Smertin

Using NLP to Evaluate 100 Million Global Webpages Daily to Contextually Target Consumers

2023-07-26
Xuefu Wang, Mark Lee (Databricks)

This session covers the challenges The Trade Desk faced, and the solution it built, in scaling its NLP models to 100 million web pages per day.

TTD's contextual targeting team needs to analyze 100 million web pages per day. Fifty percent of those pages are non-English, so half of the content was not being properly analyzed and targeted. TTD attempted to build a model using Spark NLP; however, the package could not scale, GPU utilization was low, and the solution was cost-prohibitive. In early 2022, TTD engaged with Databricks to build an NLP model on Databricks, and our teams partnered closely. Together we built a solution using distributed inference (150–200 GPUs running at over 80% utilization). Each day, it translates 50 million web pages in more than 35 languages, two hundred times faster and at a fraction of the cost. This lets TTD teams standardize on English for their contextual targeting ML models, and TTD can now be a one-stop shop for its customers' global advertising needs.

The Trade Desk is headquartered in Ventura, California. It is the largest independent demand-side platform in the world, competing against Google, Facebook, and others. Unlike traditional marketing, programmatic marketing is driven by real-time, split-second decisions based on user identity, device information, and other data points. It enables highly personalized consumer experiences and improves return on investment for companies and advertisers.

Talk by: Xuefu Wang and Mark Lee

Deep Dive Into Grammarly's Data Platform

2023-07-25

Grammarly helps 30 million people and 50,000 teams communicate more effectively. Using the Databricks Lakehouse Platform, we can rapidly ingest, transform, aggregate, and query complex data sets from an ecosystem of sources, all governed by Unity Catalog. This session will give an overview of Grammarly’s data platform and the decisions that shaped its implementation. We will dive deep into some architectural challenges the Grammarly Data Platform team overcame as we developed a self-service framework for incremental event processing.

Our investment in the lakehouse and Unity Catalog has dramatically improved the speed of our data value chain, making 5 billion events (ingested, aggregated, de-identified, and governed) available to stakeholders (data scientists, business analysts, sales, marketing) and downstream services (feature store, reporting/dashboards, customer support, operations) within 15. As a result, we have improved our query cost-performance (110% faster at 10% of the cost) compared to our legacy system on AWS EMR.

We will share architecture diagrams and their implications at scale, code samples, and problems solved and yet to be solved in a technology-focused discussion about Grammarly’s iterative lakehouse data platform.

Talk by: Faraz Yasrobi and Christopher Locklin

Unity Catalog, Delta Sharing and Data Mesh on Databricks Lakehouse

2023-07-25

In this technical deep dive, we will detail how customers implemented data mesh on Databricks and how standardizing on the Delta format enabled Delta-to-Delta sharing, as well as sharing with non-Databricks consumers.

  • Current state of the IT landscape
  • Data silos (problems with organizations not having connected data in the ecosystem)
  • A look back on why we moved away from data warehouses and chose the cloud in the first place
  • What caused the data chaos in the cloud (instrumentation, too much stitching together, and a “periodic table” of cloud services)
  • How to strike the balance between autonomy and centralization
  • Why Databricks Unity Catalog puts you on the right path to implementing a data mesh strategy
  • What processes and features enable an end-to-end implementation of a data strategy
  • How customers were able to successfully implement data mesh on out-of-the-box Unity Catalog and Delta Sharing without overwhelming their IT tool stack
  • Use cases
  • Delta-to-delta data sharing
  • Delta-to-others data sharing
  • How to navigate a world where data is spread across regions, clouds, on-prem and external systems
  • Change data feed to share only “data that has changed”
  • Data stewardship
  • Why ABAC is important
  • How file-based access policies and governance play an important role
  • Future state and its pitfalls
  • Egress costs
  • Data compliance
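The change data feed item above refers to Delta Lake's change data feed. It records row-level changes at write time and tags each row with a `_change_type` of `insert`, `update_preimage`, `update_postimage` or `delete`. Consumers can then read only the changes since a given version, for example with Spark's `readChangeFeed` and `startingVersion` read options. The sketch below only reproduces that output shape by diffing two hypothetical keyed snapshots in memory. It shows why a recipient receives just the changed rows, but it is not how Delta computes the feed.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def change_feed(old: Dict[str, Row], new: Dict[str, Row]) -> List[Tuple[str, Row]]:
    """Diff two keyed snapshots into (change_type, row) pairs shaped like Delta CDF output."""
    changes: List[Tuple[str, Row]] = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(("insert", new[key]))
        elif key not in new:
            changes.append(("delete", old[key]))
        elif old[key] != new[key]:
            # An update is recorded as the row before and the row after the change.
            changes.append(("update_preimage", old[key]))
            changes.append(("update_postimage", new[key]))
        # Unchanged rows produce nothing, so they are never re-shared.
    return changes


# Hypothetical versions of a small keyed table.
v1 = {"1": {"id": "1", "qty": 5}, "2": {"id": "2", "qty": 3}, "4": {"id": "4", "qty": 9}}
v2 = {"1": {"id": "1", "qty": 7}, "3": {"id": "3", "qty": 1}, "4": {"id": "4", "qty": 9}}
for change_type, row in change_feed(v1, v2):
    print(change_type, row)
```

Sharing only these rows is what keeps recurring shares small, which also bears on the egress-cost concern listed above.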

Talk by: Surya Turaga and Thomas Roach

Deep Dive into the New Features of Apache Spark™ 3.4

2023-07-25

Join us for this Technical Deep Dive session. In 2022, Apache Spark™ was awarded the prestigious SIGMOD Systems Award, because Spark is the de facto standard for data processing.

In this session, we will share the latest progress in the Apache Spark community. With tremendous contributions from the open source community, Spark 3.4 resolved more than 2,400 Jira tickets. We will talk about the major features and improvements in Spark 3.4: Spark Connect, numerous PySpark and SQL language features, engine performance enhancements, and operational improvements in Spark UX and error handling.

Talk by: Xiao Li and Daniel Tenedorio

PII Detection at Scale on the Lakehouse

2023-07-25

SEEK is Australia’s largest online employment marketplace and a market leader spanning ten countries across Asia Pacific and Latin America. SEEK provides employment opportunities for roughly 16 million monthly active users and processes 25 million candidate applications to listings. Processing millions of resumes involves handling and managing highly sensitive candidate information, usually entered in a highly unstructured format. With recent high-profile data leaks in Australia, protecting personally identifiable information (PII) has become a major focus area for large digital organizations.

The first step is detection: SEEK has developed a custom framework built using HuggingFace transformers fine-tuned on the nuances of employment data. For example, “Software Engineer at Databricks” is not PII, but “CEO at Databricks” is PII. After identifying and anonymizing PII in streaming and batch data, SEEK uses Unity Catalog’s data lineage to track PII through its reporting, ETL, and other downstream ML use cases and to govern access control, achieving an organization-wide data management capability driven by deep learning and enforced using Databricks.
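The detect-then-anonymize step can be sketched with regular expressions. This is a plainly swapped-in technique, not SEEK's transformer-based framework. The two patterns below are assumptions that cover only easy, well-formatted PII (emails and phone numbers). They cannot tell “CEO at Databricks” from “Software Engineer at Databricks”, and that gap is exactly why a fine-tuned model is needed.

```python
import re
from typing import Dict, List, Tuple

# Assumed patterns for two easy PII types only; names, addresses and
# context-dependent cases (job title + employer) need a learned model.
PII_PATTERNS: Dict[str, re.Pattern] = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def anonymize(text: str) -> Tuple[str, List[str]]:
    """Replace each match with a typed placeholder and report which PII types were found."""
    found: List[str] = []
    for label, pattern in PII_PATTERNS.items():
        found.extend([label] * len(pattern.findall(text)))
        text = pattern.sub(f"<{label}>", text)
    return text, found


masked, kinds = anonymize(
    "Reach me at jane.doe@example.com or +61 400 123 456. Software Engineer at Databricks."
)
print(masked)
print(kinds)
```

Returning the detected types alongside the masked text makes it possible to tag rows or columns as containing PII. Lineage-based governance can then follow those tags downstream.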

Talk by: Ajmal Aziz and Rachael Straiton

How Rec Room Processes Billions of Events Per Day with Databricks and RudderStack

2023-07-25

Learn how Rec Room, a fast-growing augmented and virtual reality software startup, is saving 50% of their engineering team's time by using Databricks and RudderStack to power real-time analytics and insights for their 85 million gaming customers.

In this session, we will walk step by step through how Rec Room set up efficient processes for ingestion into its data lakehouse, transformation, reverse ETL and product analytics. You will also see how Rec Room uses incremental materialization of tables to save costs and achieve uptime of close to 100%.
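Incremental materialization avoids recomputing a table from scratch on every run. A common way to do this is to remember a high-water mark and fold in only newer rows. The sketch below shows that idea with a hypothetical `(event_time, player_id)` schema and a per-player event count. It is not Rec Room's implementation. A real system would push the watermark filter into the source read rather than scanning everything, and it would need a policy for late-arriving events, which this sketch silently skips.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (event_time, player_id): hypothetical schema


def incremental_refresh(events: List[Event], totals: Dict[str, int],
                        watermark: int) -> Tuple[Dict[str, int], int]:
    """Fold only events newer than the stored watermark into the materialized totals."""
    updated = dict(totals)  # leave the previous materialization untouched
    new_watermark = watermark
    for ts, player in events:
        if ts > watermark:  # already-processed events are skipped, not recomputed
            updated[player] = updated.get(player, 0) + 1
            new_watermark = max(new_watermark, ts)
    return updated, new_watermark


source = [(1, "a"), (2, "b"), (3, "a")]
totals, mark = incremental_refresh(source, {}, watermark=0)          # first run: everything is new
source.append((4, "b"))
totals, mark = incremental_refresh(source, totals, watermark=mark)  # second run: only ts=4 is folded in
print(totals, mark)
```

The cost of each run scales with the new data rather than the full history, which is where the savings come from.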

Talk by: Albert Hu and Lewis Mbae

What’s New with Data Sharing and Collaboration on the Lakehouse: From Delta Sharing to Clean Rooms

2023-07-25

Get ready to accelerate your data and AI collaboration game with the Databricks product team. Join us as we build the next generation of secure data collaboration capabilities on the lakehouse. Whether you're just starting your data sharing journey or exploring advanced data collaboration features like data clean rooms, this session is tailor-made for you.

In this demo-packed session, you'll discover what’s new in Delta Sharing, including dynamic and materialized views for sharing, sharing of other assets such as notebooks and ML models, new open source Delta Sharing connectors for the tools of your choice, and updates to Databricks Clean Rooms. Learn why the lakehouse is the perfect solution for your data and AI collaboration requirements across clouds, regions and platforms, without any vendor lock-in. Plus, you'll get a peek into our upcoming roadmap. Ask any burning questions you have for our expert product team as they build a collaborative lakehouse for data, analytics and AI.

Talk by: Erika Ehrli, Kelly Albano, and Xiaotong Sun

Sponsored by: Microsoft | Next-Level Analytics with Power BI and Databricks

2023-07-25

The widely-adopted combination of Power BI and Databricks has been a game-changer in providing a comprehensive solution for modern data analytics. In this session, you’ll learn how self-service analytics combined with the Databricks Lakehouse Platform can allow users to make better-informed decisions by unlocking insights hidden in complex data. We’ll provide practical examples of how organizations have leveraged these technologies together to drive digital transformation, lower total cost of ownership (TCO), and increase revenue. By the end of the presentation and demo, you’ll understand how Power BI and Databricks can help drive real-time insights at scale for organizations in any industry.

Talk by: Bob Zhang and Mahesh Prakriya

Taking Control of Streaming Healthcare Data

2023-07-25

Chesapeake Regional Information System for our Patients (CRISP), a nonprofit healthcare information exchange (HIE), initially partnered with Slalom to build a Databricks data lakehouse architecture in response to the analytics demands of the COVID-19 pandemic; since then, CRISP has expanded the platform to additional use cases. Recently, they have worked together to engineer streaming data pipelines that process healthcare messages, such as HL7, to help CRISP become vendor-independent.

This session will focus on the improvements CRISP has made to their data lakehouse platform to support streaming use cases and the impact these changes have had for the organization. We will touch on using Databricks Auto Loader to efficiently ingest incoming files, ensuring data quality with Delta Live Tables, and sharing data internally with a SQL warehouse, as well as some of the work CRISP has done to parse and standardize HL7 messages from hundreds of sources. These efforts have allowed CRISP to stream over 4 million messages daily in near real-time with the scalability it needs to continue to onboard new healthcare providers so it can continue to facilitate care and improve health outcomes.
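The HL7 parsing mentioned above starts from HL7 v2's delimited layout. Segments are separated by carriage returns, fields by `|`, and components by `^`. The sketch below illustrates that layout and is not CRISP's parser. It assumes the default delimiters and ignores repetitions (`~`), escape sequences and site-specific Z-segments, all of which real feeds from hundreds of sources will contain. The sample message is made up. One quirk is handled explicitly: in the MSH segment, the field separator itself counts as field MSH-1.

```python
from typing import Dict, List

SEGMENT_SEP, FIELD_SEP, COMPONENT_SEP = "\r", "|", "^"  # assumed default HL7 v2 delimiters


def parse_hl7(message: str) -> Dict[str, List[List[str]]]:
    """Split an HL7 v2 message into segments and fields, keyed by segment id (MSH, PID, ...)."""
    normalized = message.replace("\r\n", "\r").replace("\n", "\r")
    segments: Dict[str, List[List[str]]] = {}
    for raw in normalized.split(SEGMENT_SEP):
        if raw:
            fields = raw.split(FIELD_SEP)
            segments.setdefault(fields[0], []).append(fields)
    return segments


def get_field(segments: Dict[str, List[List[str]]], seg_id: str, n: int, occurrence: int = 0) -> str:
    """Return field n (1-based, as in HL7 notation such as PID-5) of a segment, or '' if absent."""
    fields = segments[seg_id][occurrence]
    if seg_id == "MSH":
        if n == 1:
            return FIELD_SEP  # MSH-1 is the field separator character itself
        index = n - 1         # so every later MSH field sits one position to the left
    else:
        index = n
    return fields[index] if index < len(fields) else ""


msg = "\r".join([
    r"MSH|^~\&|SENDAPP|SENDFAC|RECVAPP|RECVFAC|202301011200||ADT^A01|MSG00001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JOHN||19800101|M",
])
seg = parse_hl7(msg)
print(get_field(seg, "MSH", 9))                       # message type
print(get_field(seg, "PID", 5).split(COMPONENT_SEP))  # family and given name
```

Normalizing messages into a consistent field structure like this is the kind of step that has to happen before HL7 data from many senders can be loaded into shared lakehouse tables.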

Talk by: Andy Hanks and Chris Mantz

Databricks As Code: Effectively Automate a Secure Lakehouse Using Terraform for Resource Provisioning

2023-07-25

At Rivian, we have automated more than 95% of our Databricks resource provisioning workflows using an in-house Terraform module, allowing a lean admin team to manage more than 750 users. In this session, we will cover the following elements of our approach and how others can benefit from improved team efficiency:

  • User and service principal management
  • Our permission model on Unity Catalog for data governance
  • Workspace and secrets resource management
  • Managing internal package dependencies using init scripts
  • Facilitating dashboards, SQL queries and their associated permissions
  • Scaling ingestion jobs and workflows for petabyte-scale, source-of-truth Delta Lake tables

Talk by: Jason Shiverick and Vadivel Selvaraj

MLOps at Gucci: From Zero to Hero

2023-07-25

Delta Lake is an open-source storage format well suited to storing large-scale datasets, which can then be used for single-node and distributed training of deep learning models. The Delta Lake storage format gives deep learning practitioners unique data management capabilities for working with their datasets. The challenge is that, as of now, it’s not possible to use Delta Lake to train PyTorch models directly.

The PyTorch community has recently introduced TorchData, a library for efficient data loading. It supports many formats out of the box, but not Delta Lake. This talk will demonstrate how to use the Delta Lake storage format for single-node and distributed PyTorch training with the TorchData framework and delta-rs, a standalone Delta Lake implementation.
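Moving from single-node to distributed training mainly changes how rows are split across processes. The sketch below shows one such split in plain Python. It assumes the table's rows have already been read into memory, for example through the `deltalake` Python bindings of delta-rs. It does not use TorchData or PyTorch. The round-robin scheme is an assumption, similar in spirit to PyTorch's `DistributedSampler`.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def shard(rows: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Disjoint slice for one training process: every world_size-th row, starting at rank."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return list(rows[rank::world_size])


def batches(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


rows = list(range(10))  # stand-in for rows read from a Delta table
for rank in range(2):
    print(rank, list(batches(shard(rows, rank, world_size=2), batch_size=2)))
```

With `world_size=1`, the same code degenerates to single-node loading. Disjoint shards ensure that no two processes train on the same row within an epoch.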

Talk by: Michael Shtelma

What's New in Databricks SQL -- With Live Demos

2023-07-25
Can Efeoglu (Databricks)

We’ve been pushing ahead to make the lakehouse even better for data warehousing across several pillars: a native serverless experience, best-in-class price/performance, intelligent workload management and observability, and enhanced connectivity and analyst and developer experiences. As we double down on that pace of innovation, we want to deep dive into everything that’s been keeping us busy.

In this session, we will share an update on key roadmap items. To bring things to life, you will see live demos of the most recent capabilities across data ingestion, transformation, and consumption, using the modern data stack along with Databricks SQL.

Talk by: Can Efeoglu

How to Build Metadata-Driven Data Pipelines with Delta Live Tables

2023-07-25

In this session, you will learn how you can use metaprogramming to automate the creation and management of Delta Live Tables pipelines at scale. The goal is to make it easy to use DLT for large-scale migrations, and other use cases that require ingesting and managing hundreds or thousands of tables, using generic code components and configuration-driven pipelines that can be dynamically reused across different projects or datasets.
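The metaprogramming idea can be sketched without Databricks. Table definitions are generated in a loop from configuration rather than written by hand. In DLT's Python API, this is commonly done by calling a function in a loop that defines a `@dlt.table`-decorated function. Here, a hypothetical `table` decorator and in-memory `REGISTRY` stand in so that the example runs anywhere. The config schema (`name`, `source`, `required`) and the not-null rule are assumptions for illustration.

```python
from typing import Callable, Dict, List

Rows = List[dict]
Sources = Dict[str, Rows]

# Hypothetical metadata: one entry per table; adding a table means adding a line here, not code.
PIPELINE_CONFIG: List[dict] = [
    {"name": "orders_bronze", "source": "orders", "required": ["order_id"]},
    {"name": "customers_bronze", "source": "customers", "required": ["customer_id", "email"]},
]

REGISTRY: Dict[str, Callable[[Sources], Rows]] = {}


def table(name: str):
    """Hypothetical stand-in for a DLT-style table decorator: registers a function under a name."""
    def register(fn: Callable[[Sources], Rows]) -> Callable[[Sources], Rows]:
        REGISTRY[name] = fn
        return fn
    return register


def make_table(spec: dict) -> None:
    """Generic component: builds one table definition from one metadata entry.

    The factory gives each definition its own `spec`; a function defined directly in the
    loop body would late-bind the loop variable, so every body would read the last entry.
    """
    source, required = spec["source"], spec["required"]

    @table(spec["name"])
    def load(sources: Sources) -> Rows:
        # Keep only rows where every required column is present and non-null.
        return [row for row in sources[source] if all(row.get(col) is not None for col in required)]


for spec in PIPELINE_CONFIG:
    make_table(spec)


def run_pipeline(sources: Sources) -> Dict[str, Rows]:
    """Materialize every registered table from the given source data."""
    return {name: fn(sources) for name, fn in REGISTRY.items()}


sample = {
    "orders": [{"order_id": 1}, {"order_id": None}],
    "customers": [{"customer_id": 7, "email": "a@example.com"}, {"customer_id": 8, "email": None}],
}
print(run_pipeline(sample))
```

Scaling to hundreds or thousands of tables then becomes a matter of growing `PIPELINE_CONFIG`, for example by loading it from a file or a control table, while the generic code stays the same.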

Talk by: Mojgan Mazouchi and Ravi Gawai
