talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 YouTube Visit website ↗

Activities tracked

582

Sessions & talks

Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive NLP


2023-07-26 Watch
video

In this talk, you will learn how retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LMs) and retrieval models (RMs). Existing work has combined these in simple “retrieve-then-read” pipelines in which the RM retrieves passages that are inserted into the LM prompt.

To begin to fully realize the potential of frozen LMs and RMs, we propose Demonstrate–Search–Predict (DSP), a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM. DSP can express high-level programs that bootstrap pipeline-aware demonstrations, search for relevant passages, and generate grounded predictions, systematically breaking down problems into small transformations that the LM and RM can handle more reliably.

We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings, establishing in early evaluations new state-of-the-art in-context learning results and delivering 37–125%, 8–40%, and 80–290% relative gains against vanilla LMs, a standard retrieve-then-read pipeline, and a contemporaneous self-ask pipeline, respectively.
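The three stages described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the DSP framework's API: `retrieve` is a hypothetical word-overlap retriever standing in for a real RM, `lm` is any callable that maps a prompt to text, and the prompt layout is invented for the example.

```python
import re
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Toy retrieval model (assumption): rank passages by word overlap with the query."""
    query_tokens = set(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_tokens & set(tokenize(passage))),
        reverse=True,
    )
    return ranked[:k]


def dsp_answer(
    question: str,
    demos: List[Tuple[str, str]],
    corpus: List[str],
    lm: Callable[[str], str],
    hops: int = 2,
) -> Tuple[str, List[str]]:
    """Run a demonstrate / search / predict loop with a frozen LM and RM."""
    # Demonstrate: prepend worked examples to every prompt.
    demo_text = "\n".join(f"Q: {q}\nA: {a}" for q, a in demos)
    context: List[str] = []
    # Search: let the LM write a query per hop, then ask the RM for passages.
    for _ in range(hops):
        query = lm(
            f"{demo_text}\nContext: {' '.join(context)}\n"
            f"Question: {question}\nSearch query:"
        )
        for passage in retrieve(query, corpus):
            if passage not in context:
                context.append(passage)
    # Predict: answer the question grounded in the collected passages.
    answer = lm(
        f"{demo_text}\nContext: {' '.join(context)}\n"
        f"Question: {question}\nAnswer:"
    )
    return answer, context
```

Because both models are passed in as plain callables, each stage is a small text transformation that can be tested with a scripted stand-in before a real LM or RM is attached.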

Talk by: Keshav Santhanam

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Future Data Access Control: Booz Allen Hamilton’s Way of Securing Databricks Lakehouse with Immuta


2023-07-26 Watch
video

In this talk, I’ll review how we utilize Attribute-Based Access Control (ABAC) to enforce policy via Immuta. I’ll discuss the differences between the ABAC and legacy Role-Based Access Control (RBAC) approaches to control access and how the RBAC approach is not sufficient to keep up with today’s growing big data market. With so much data available, there also comes substantial risk. Data can contain many sensitive data elements, including PII and PHI. Industry leaders like Databricks are pushing the boundaries of data technology, which leads to constantly evolving data use cases. And that’s a good thing. However, the RBAC approach is struggling to keep up with those advancements.

So what is RBAC? It’s an approach to data access that permits system access based on the end user’s role. For legacy systems, it’s meant as a simple but effective approach to securing data. Are you a manager? Then you’ll get access to data meant for managers. This is great for small deployments with clearly defined roles. Here at Booz Allen, we invested in Databricks because we have an environment of over 30 thousand users and billions of rows of data, a scale at which maintaining a role mapping for every user is hard to sustain.

To mitigate this problem and align with our forward-thinking company standard, we introduced Immuta into our stack. Immuta uses ABAC to allow for dynamic data access control. Users are automatically assigned certain attributes, and access is based on those attributes instead of just their role. This allows for more flexibility and allows data access control to easily scale without the need to constantly map a user to their role. Using attributes, we can write policies in one place and have them applied across all our data platforms. This makes for a truly holistic data governance approach and provides immediate ROI and time savings for the company.
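To make the contrast with role mapping concrete, here is a minimal sketch of an attribute-based check in Python. It is not Immuta's API; the `User` and `Policy` classes, the tag names, and the attribute names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class User:
    name: str
    attributes: Dict[str, str]  # e.g. assigned automatically from an HR system


@dataclass(frozen=True)
class Policy:
    protects: FrozenSet[str]    # data tags this policy guards (hypothetical)
    required: Dict[str, str]    # attribute values a user must hold


def can_read(user: User, column_tags: FrozenSet[str], policies: List[Policy]) -> bool:
    """Allow access unless a policy guarding one of the tags is not satisfied."""
    for policy in policies:
        guards_column = bool(policy.protects & column_tags)
        satisfied = all(
            user.attributes.get(key) == value
            for key, value in policy.required.items()
        )
        if guards_column and not satisfied:
            return False
    return True


# Policies are written once, against tags, not against individual users or roles.
POLICIES = [
    Policy(protects=frozenset({"PHI"}), required={"hipaa_trained": "yes"}),
    Policy(protects=frozenset({"PII"}), required={"clearance": "high"}),
]
```

Adding a user only requires setting their attributes; no policy or role mapping has to change, which is the scaling property the abstract points to.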

Talk by: Jeffrey Hess

How to Train Your Own Large Language Models

2023-07-26 Watch
video

Given the success of OpenAI’s GPT-4 and Google’s PaLM, every company is now assessing its own use cases for Large Language Models (LLMs). Many companies will ultimately decide to train their own LLMs for a variety of reasons, ranging from data privacy to increased control over updates and improvements. One of the most common reasons will be to make use of proprietary internal data.

In this session, we’ll go over how to train your own LLMs, from raw data to deployment in a user-facing production environment. We’ll discuss the engineering challenges, and the vendors that make up the modern LLM stack: Databricks, Hugging Face, and MosaicML. We’ll also break down what it means to train an LLM using your own data, including the various approaches and their associated tradeoffs.

Topics covered in this session:
- How Replit trained a state-of-the-art LLM from scratch
- The different approaches to using LLMs with your internal data
- The differences between fine-tuning, instruction tuning, and RLHF

Talk by: Reza Shabani

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

JoinBoost: In Data Base Machine Learning for Tree-Models

2023-07-26 Watch
video

Data and machine learning (ML) are crucial for enterprise operations. Enterprises store data in databases for management and use ML to gain business insights. However, there is a mismatch between the way ML expects data to be organized (a single table) and the way data is organized in databases (a join graph of multiple tables), which leads to inefficiencies when joining and materializing tables.

In this talk, you will see how we successfully address this issue. We introduce JoinBoost, a lightweight Python library that trains tree models (such as random forests and gradient boosting) for join graphs in databases. JoinBoost acts as a query rewriting layer that is compatible with cloud databases, and eliminates the need for costly join materialization.
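As a rough illustration of why rewriting queries can avoid materializing a join, the sketch below computes the per-group count and sum of a target column (the kind of statistics a tree split needs) in two ways using Python's built-in `sqlite3`. This is not JoinBoost's API or its actual rewriting; the tables, columns, and queries are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (store_id INTEGER, amount REAL);
    CREATE TABLE stores (store_id INTEGER, region TEXT);
    INSERT INTO sales VALUES (1, 10.0), (1, 20.0), (2, 5.0), (3, 7.0), (3, 9.0);
    INSERT INTO stores VALUES (1, 'east'), (2, 'east'), (3, 'west');
    """
)

# Naive approach: join every fact row to its dimension row, then aggregate.
naive = conn.execute(
    """
    SELECT st.region, COUNT(*), SUM(s.amount)
    FROM sales s JOIN stores st ON s.store_id = st.store_id
    GROUP BY st.region ORDER BY st.region
    """
).fetchall()

# Rewritten approach: aggregate the fact table per join key first, so the join
# only touches one pre-aggregated row per store instead of every sale.
rewritten = conn.execute(
    """
    WITH per_store AS (
        SELECT store_id, COUNT(*) AS c, SUM(amount) AS s
        FROM sales GROUP BY store_id
    )
    SELECT st.region, SUM(p.c), SUM(p.s)
    FROM per_store p JOIN stores st ON p.store_id = st.store_id
    GROUP BY st.region ORDER BY st.region
    """
).fetchall()

print(naive)
print(rewritten)
```

Both queries return the same statistics per region, but the second never builds the full row-level join, which is where the savings would come from on large fact tables.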

Talk by: Zachary Huang

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Lakehouse Architecture to Advance Security Analytics at the Department of State

2023-07-26 Watch
video

In 2023, the Department of State surged forward on implementing a lakehouse architecture to get faster, smarter, and more effective on cybersecurity log monitoring and incident response. In addition to getting us ahead of federal mandates, this approach promises to enable advanced analytics and machine learning across our highly federated global IT environment while minimizing costs associated with data retention and aggregation.

This talk will include a high-level overview of the technical and policy challenges and a deeper technical dive into the tactical implementation choices made. We’ll share lessons learned related to governance and securing organizational support, connecting between multiple cloud environments, and standardizing data to make it useful for analytics. And finally, we’ll discuss how the lakehouse leverages Databricks in multicloud environments to promote decentralized ownership of data while enabling strong, centralized data governance practices.

Talk by: Timothy Ahrens and Edward Moe

Leveraging Data Science for Game Growth and Matchmaking Optimization

2023-07-26 Watch
video
Shuo Chen (Databricks), Zhenyu Zhao (Databricks)

For video games, data science solutions can be applied throughout the player lifecycle, from ad tech and LTV forecasting to in-game economy monitoring and experimentation. Databricks serves as the data and computation foundation for these solutions, enabling data scientists to easily develop and deploy them for different use cases.

In this session, we will share insights on how Databricks-powered data science solutions drive game growth and improve player experiences using different advanced analytics, modeling, experimentation, and causal inference methods. We will introduce the business use cases, data science techniques, as well as Databricks demos.

Talk by: Zhenyu Zhao and Shuo Chen

Security Best Practices and Tools to Build a Secure Lakehouse

2023-07-26 Watch
video
Arun Pamulapati (Databricks), Anindita Mahapatra (Databricks)

To learn more, visit the Databricks Security and Trust Center: https://www.databricks.com/trust

As you embark on a lakehouse project or evolve your existing data lake, you may want to improve your security posture and take advantage of new security features; there may even be a security team at your company that demands it. Databricks has worked with thousands of customers to securely deploy the Databricks Platform to meet their architecture and security requirements. While many organizations deploy security differently, we have found a common set of guidelines and features among organizations that require a high level of security. In this session, we will detail the security features and architectural choices frequently used by these organizations and walk through a series of threat models for the risks that most concern security teams. This session is great for people who already know Databricks, but that knowledge isn’t required. You will walk away with a full handbook detailing all the concepts, configurations, checklists, the Security Analysis Tool (SAT), and the Security Reference Architecture (SRA) automation scripts from the session so that you can make immediate progress when you get back to the office. Security can be hard, but we’ve collected the hard work already done by some of the best in the industry, and built tools, to make it easier. Come learn how, and see what good looks like via a demo.

Talk by: Arun Pamulapati and Anindita Mahapatra

Sponsored by: AccuWeather | Weather Matters: How to Harness its Power to Improve Your Bottom Line

2023-07-26 Watch
video

AccuWeather provides the world's most sophisticated weather intelligence to make lives and businesses simpler, safer, better, and more informed. To achieve its core mission and attain its technical and business ambitions, AccuWeather uses Databricks to support its next-generation forecasting engine and historical database, enabling its teams of scientists and engineers to develop accurate, scalable weather data solutions. Businesses can harness the power of AccuWeather's data suite by easily integrating weather information into their own decision support systems and analytics to improve their bottom line.

Talk by: Timothy Loftus and Eric Michielli

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Sponsored by: Infosys | Topaz AI First Innovations

2023-07-26 Watch
video
Neeraj Dixit (Infosys)

Insights into Infosys' Topaz AI First Innovations, including AI-enabled analytics and AI-enabled automation, which help clients achieve significant cost savings, improved efficiency, and better customer experience across industry segments.

Talk by: Neeraj Dixit

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Sponsored: KPMG | Multicloud Enterprise Delta Sharing & Governance using Unity Catalog @ S&P Global

2023-07-26 Watch
video

Cloud technologies have revolutionized global data access across a number of industries. However, many enterprise organizations face challenges in adopting these technologies effectively, as comprehensive cloud data governance strategies and solutions are complex and evolving – particularly in hybrid or multicloud scenarios involving multiple third parties. KPMG and S&P Global have harnessed the power of Databricks Lakehouse to create a novel approach.

By integrating Unity Catalog, Delta Sharing, and the KPMG Modern Data Platform, S&P Global has enabled scalable, transformative cross-enterprise data sharing and governance. This demonstration highlights a collaboration between the S&P Global Sustainable1 (S1) ESG program and the KPMG ESG Analytics Accelerators to enable large-scale SFDR ESG portfolio analytics. Join us to discover our solution that drives transformative change, fosters data-driven decision-making, and bolsters sustainability efforts in a wide range of industries.

Talk by: Niels Hanson and Dennis Tally

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Sponsored: Matillion | Using Matillion to Boost Productivity w/ Lakehouse and your Full Data Stack

2023-07-26 Watch
video
Rick Wear, Sarah Pollitt (Matillion)

In this presentation, Matillion’s Sarah Pollitt, Group Product Manager for ETL, will discuss how you can use Matillion to load data from popular sources such as Salesforce and SAP, using over a hundred out-of-the-box connectors, into your data lakehouse. You can quickly transform this data using powerful tools like Matillion or dbt, or your own custom notebooks, to derive valuable insights. She will also explore how you can run streaming pipelines to ensure real-time data processing, and how you can extract and manage this data using popular governance tools such as Alation or Collibra, ensuring compliance and data quality. Finally, Sarah will showcase how you can seamlessly integrate this data into your analytics tools of choice, such as ThoughtSpot, Power BI, or any other analytics tool that fits your organization's needs.

Talk by: Rick Wear

Sponsored: Snowplow | Revolutionize Your Customer Engagement Strategy w/ First-Party Customer Data

2023-07-26 Watch
video

In today's highly competitive market, personalized experiences are the key to winning customer engagement and loyalty. But how can you deliver these experiences at scale? The answer lies in a single unified view of your customers, powered by rich first-party customer data. With complete 360 visibility into your customer's journey, you can predict their next best action and deliver the most relevant experience based on their unique needs and behaviors.

Join this session to learn how to unlock the full potential of your first-party customer data by empowering your data team to collaborate seamlessly with your marketing team by removing technology barriers. Learn how to create a data-driven next-best action (NBA) strategy by building solutions that will set you apart in the competitive landscape and captivate your customers at every touchpoint. In this session, you'll discover:
- The critical importance of personalized experiences in today's hyper-competitive market
- Proven strategies for building a data-driven NBA approach that drives results
- A live demo of how Snowplow and Databricks can be combined to produce powerful ML models for NBA, revolutionizing your customer data strategy
- Best practices for fostering strong collaboration between marketing and data teams to achieve business outcomes and deliver next-gen customer experiences

Don't miss out on this opportunity to unlock the full potential of your first-party customer data and revolutionize your customer engagement strategy.

Talk by: Yali Sassoon

Sponsor: Sigma Computing | Using Sigma Input Tables Helps Databricks Data Science & ML Apps

2023-07-26 Watch
video

In this talk, we will dive into the powerful analytics combination of Databricks and Sigma for data science and machine learning use cases. Databricks offers a scalable and flexible platform for building and deploying machine learning models at scale, while Sigma enhances this framework with real-time data insights, analysis, and visualization capabilities. Going a step further, we will demonstrate how input tables can be utilized from Sigma to create seamless workflows in Databricks for data science and machine learning. From this workflow, business users can leverage data science and ML models to do ad-hoc analysis and make data-driven decisions.

This talk is perfect for data scientists, data analysts, business users, and anyone interested in harnessing the power of Databricks and Sigma to drive business value. Join us and discover how these two platforms can revolutionize the way you analyze and leverage your data.

Talk by: Mitch Ertle and Greg Owen

Using Open Source Tools to Build Privacy-Conscious Data Systems

2023-07-26 Watch
video

With the rapid proliferation of consumer data privacy laws across the world, it is becoming a strict requirement for data organizations to be mindful of data privacy risks. Privacy violation fines are reaching record highs and will only get higher as governments continue to crack down on the runaway abuse of user data. To continue producing value without becoming a liability, data systems must include privacy protections at a foundational level.

The most practical way to do this is to enable privacy as code, shifting privacy left and including it as a foundational part of the organization's software development life cycle. The promise of privacy as code is that data organizations can be liberated from inefficient, manual workflows for producing the compliance deliverables their legal teams need, and instead ship at speed with pre-defined privacy guardrails built into the structure of their preferred workflows.

Although this is an emerging and complex problem, there are already powerful open source tools available designed to help organizations of all sizes achieve this outcome. Fides is an open source privacy as code tool, written in Python and TypeScript, that is engineered to tackle a variety of privacy problems throughout the application lifecycle. The most relevant feature for data organizations is the ability to annotate systems and their datasets with data privacy metadata, thus enabling automatic rejection of dangerous or illegal uses. Fides empowers data organizations to be proactive, not reactive, in terms of protecting user privacy and reducing organizational risk. Moving forward, data privacy will need to be top of mind for data teams.
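The idea of annotating datasets with privacy metadata and rejecting disallowed uses automatically can be sketched as a small check that runs in CI. This is a generic illustration, not Fides's manifest format or API; the category labels, data-use labels, and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Dataset:
    name: str
    field_categories: Dict[str, str]  # field name -> data category label


@dataclass(frozen=True)
class SystemDeclaration:
    name: str
    data_use: str                     # declared purpose of the system
    datasets: Tuple[str, ...]         # names of datasets the system reads


# Hypothetical policy: (data category, data use) pairs that must never occur.
FORBIDDEN: Set[Tuple[str, str]] = {
    ("health", "advertising"),
    ("contact", "third_party_sharing"),
}


def find_violations(
    systems: List[SystemDeclaration],
    datasets: List[Dataset],
    forbidden: Set[Tuple[str, str]],
) -> List[str]:
    """Return one message per annotated field used for a forbidden purpose."""
    by_name = {dataset.name: dataset for dataset in datasets}
    violations: List[str] = []
    for system in systems:
        for dataset_name in system.datasets:
            fields = by_name[dataset_name].field_categories
            for field_name, category in sorted(fields.items()):
                if (category, system.data_use) in forbidden:
                    violations.append(
                        f"{system.name}: {dataset_name}.{field_name} "
                        f"({category}) used for {system.data_use}"
                    )
    return violations
```

A CI job could fail the build whenever `find_violations` returns a non-empty list, which is one way to express the "shift left" guardrail described above.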

Talk by: Thomas La Piana

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU


Vector Data Lakes

Vector databases such as Elasticsearch and Pinecone offer fast ingestion and querying on vector embeddings with approximate nearest neighbor (ANN) search. However, they typically do not decouple compute and storage, making them hard to integrate in production data stacks. Because data storage in these databases is expensive and not easily accessible, data teams typically maintain ETL pipelines to offload historical embedding data to blob stores. When that data needs to be queried, it gets loaded back into the vector database in another ETL process. This is reminiscent of loading data from an OLTP database to cloud storage, then loading said data into an OLAP warehouse for offline analytics.

Recently, “lakehouse” offerings allow direct OLAP querying on cloud storage, removing the need for the second ETL step. The same could be done for embedding data. While embedding storage in blob stores cannot satisfy the high TPS requirements in online settings, we argue it’s sufficient for offline analytics use cases like slicing and dicing data based on embedding clusters. Instead of loading the embedding data back into the vector database for offline analytics, we propose direct processing on embeddings stored in Parquet files in Delta Lake. You will see that offline embedding workloads typically touch a large portion of the stored embeddings without the need for random access.

As a result, the workload is entirely bound by network throughput instead of latency, making it quite suitable for blob storage backends. On a test dataset of one billion vectors, ETL into cloud storage takes around one hour on a dedicated GPU instance, while batched nearest neighbor search can be done in under one minute with four CPU instances. We believe future “lakehouses” will ship with native support for these embedding workloads.
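The access pattern described here, a sequential scan over all stored embeddings with no random access, can be sketched in plain Python. This is a toy under stated assumptions: `iter_chunks` stands in for reading row groups from Parquet files in blob storage, similarity is a simple dot product, and a real implementation would likely use a columnar reader and vectorized math instead of nested loops.

```python
from typing import Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def iter_chunks(
    vectors: List[Vector], chunk_size: int
) -> Iterator[Tuple[int, List[Vector]]]:
    """Yield (offset, chunk) pairs, mimicking sequential reads of stored blocks."""
    for start in range(0, len(vectors), chunk_size):
        yield start, vectors[start:start + chunk_size]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def batched_nearest(
    queries: List[Vector], chunks: Iterator[Tuple[int, List[Vector]]]
) -> List[int]:
    """For each query, return the index of the stored vector with the highest dot product.

    Every chunk is read exactly once and scored against the whole query batch,
    so throughput of the scan matters rather than per-lookup latency.
    """
    best: List[Tuple[float, int]] = [(float("-inf"), -1)] * len(queries)
    for offset, chunk in chunks:
        for query_index, query in enumerate(queries):
            for position, vector in enumerate(chunk):
                score = dot(query, vector)
                if score > best[query_index][0]:
                    best[query_index] = (score, offset + position)
    return [index for _, index in best]
```

Since each chunk is independent, the scan could be split across several workers and the per-query winners merged afterward, which fits the multi-instance CPU setup mentioned in the abstract.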

Talk by: Tony Wang and Chang She

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Building the Multi-Modal Future: Open Ecosystems and Data at Play

2023-07-26 Watch
video

Join Nathan as he explores Stability AI's latest advancements in open source generative AI, focused on building the multimodal information infrastructure of the future. Get an insider's perspective on our recently released model, Stable Diffusion XL v0.9, and Stability's behind-the-scenes efforts. Discover how advancements in open-source generative AI models enable efficient development of multimodal AI systems, and learn how researchers worldwide are customizing these models and leveraging unique datasets. 

Nathan will discuss the dynamic interplay between open-source models and enterprise AI adoption, resulting in efficient, tailored solutions. At Stability AI, our focus is on unlocking the inherent value and competitive advantage found in unique data and AI ownership. Combining open-source models with proprietary data assets creates a strategic advantage for enterprises.

Despite the growing AI trend, the need for human judgment and creativity remains pivotal. At Stability AI, our goal is to augment rather than replace human capabilities using AI collaboration and co-creation. Join us in shaping a collaborative generative future.

Talk by: Nathan Lile

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Delta Lake AMA

2023-07-26 Watch
video
Robert Pack (Databricks), Bart Samwel (Databricks), Allison Portis, Tathagata Das (Databricks)

Have some great questions about Delta Lake? Well, come by and ask the experts your questions!

Talk by: Bart Samwel, Tathagata Das, Robert Pack, and Allison Portis

Discuss How LLMs Will Change the Way We Work

2023-07-26 Watch
video
Ben Harvey, Sean Owen (Databricks), Ankit Mathur (Databricks), Debu Sinha, Jan van der Vegt

Will LLMs change the way we work? Ask a panel of LLM and AI experts what problems LLMs will solve and what new challenges they may bring.

Talk by: Ben Harvey, Jan van der Vegt, Ankit Mathur, Debu Sinha, and Sean Owen

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Foundation Models in the Modern Data Stack

2023-07-26 Watch
video

As Foundation Models (FMs) continue to grow in size, innovations continue to push the boundaries of what these models can do on language and image tasks. This talk will describe our work on applying FMs to structured data tasks like data linkage, cleaning and querying. We will then discuss challenges and solutions that these models present for production deployment in the modern data stack.

Talk by: Ines Chami

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Going Beyond SQL: Python UDFs in Unity Catalog for All Your Lakehouse

2023-07-26 Watch
video
Jakob Mund (Databricks)

While SQL is powerful, it does have some limits. Fear not: this lightning talk introduces user-defined functions (UDFs) written in Python, managed and governed in Databricks Unity Catalog, and usable across the Lakehouse. It covers the basics, from how to create and govern UDFs, to more advanced topics including networking and observability, and provides a glimpse of how it all works under the hood. After this session, you will be equipped to take SQL and the Lakehouse to the next level using Python UDFs.
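For readers who have not seen one, a Python UDF in Unity Catalog is registered with a SQL `CREATE FUNCTION ... LANGUAGE PYTHON` statement whose body is Python. The sketch below builds such a statement as a string and, as a convenience, compiles the same body locally so it can be unit tested before registration. The function name `main.default.mask_email`, the masking logic, and the `compile_locally` helper are assumptions made for the example; running the statement requires a Databricks workspace with Unity Catalog.

```python
import textwrap

# Body of the UDF: plain Python that ends in a return statement.
UDF_BODY = r'''
import re
if s is None:
    return None
return re.sub(r"[^@\s]+@", "***@", s)
'''

# SQL that would register the function (hypothetical three-level name).
DDL = (
    "CREATE OR REPLACE FUNCTION main.default.mask_email(s STRING)\n"
    "RETURNS STRING\n"
    "LANGUAGE PYTHON\n"
    "AS $$\n"
    f"{UDF_BODY.strip()}\n"
    "$$"
)


def compile_locally(body: str, arg: str = "s"):
    """Wrap the UDF body in a local function so it can be tested off-platform."""
    source = f"def _udf({arg}):\n" + textwrap.indent(body.strip(), "    ")
    namespace: dict = {}
    exec(source, namespace)  # defines _udf inside the scratch namespace
    return namespace["_udf"]


mask_email = compile_locally(UDF_BODY)
print(mask_email("alice@example.com"))
print(DDL)
```

On a cluster, the statement could be submitted with `spark.sql(DDL)`, after which the function would be callable from SQL like any built-in and governed through the catalog's permissions.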

Talk by: Jakob Mund

How Comcast Effectv Drives Data Observability with Databricks and Monte Carlo

2023-07-26 Watch
video
Scott Lerner (Comcast Effectv), Robinson Creighton (Comcast Effectv)

Comcast Effectv, the 2,000-employee advertising wing of Comcast, America’s largest telecommunications company, provides custom video ad solutions powered by aggregated viewership data. As a global technology and media company connecting millions of customers to personalized experiences and processing billions of transactions, Comcast Effectv was challenged with handling massive loads of data, monitoring hundreds of data pipelines, and managing timely coordination across data teams.

In this session, we will discuss Comcast Effectv’s journey to building a more scalable, reliable lakehouse and driving data observability at scale with Monte Carlo. This has enabled Effectv to have a single pane of glass view of their entire data environment to ensure consumer data trust across their entire AWS, Databricks, and Looker environment.

Talk by: Scott Lerner and Robinson Creighton

How to Create and Manage a High-Performance Analytics Team

2023-07-26 Watch
video

Data science and analytics teams are unique. Large and small corporations want to build and manage analytics teams to convert their data and analytic assets into revenue and competitive advantage, but many are failing before they make their first hire. In this session, the audience will learn how to structure, hire, manage and grow an analytics team. Organizational structure, project and program portfolios, neurodiversity, developing talent, and more will be discussed.

Questions and discussion are encouraged. The audience will leave with a deeper understanding of how to succeed in turning data and analytics into tangible results.

Talk by: John Thompson

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

How We Made a Unified Talent Solution Using Databricks Machine Learning, Fine-Tuned LLM & Dolly 2.0

2023-07-26
video

Using Databricks, we built a “Unified Talent Solution” backed by a robust data and AI engine. It analyzes the skills of a combined pool of permanent employees, contractors, part-time employees and vendors, and infers skill gaps, future trends and recommended priority areas for bridging talent gaps. This ultimately improved our client's operational efficiency, transparency, commercial model, and talent experience. We leveraged a variety of ML algorithms, such as boosting, neural networks and NLP transformers, to provide better AI-driven insights.

One inevitable part of developing these models within a typical DS workflow is iteration. Databricks' end-to-end ML/DS workflow service, MLflow, helped streamline this process by organizing iterations into experiments that tracked the data used for training and testing, model artifacts, lineage, and the corresponding results and metrics. To check the health of our models with drift detection, bias, and explainability techniques, we relied extensively on MLflow's deployment and monitoring services.

Our solution, built on the Databricks platform, simplified ML by defining a data-centric workflow that unified best practices from DevOps, DataOps, and ModelOps. Databricks Feature Store allowed us to productionize our models and features jointly. Insights were presented as charts and graphs built with Power BI, Plotly, and Matplotlib that answer the business questions most relevant to clients. We built our own advanced custom analytics platform on top of Delta Lake, as Delta’s ACID guarantees allow us to build a real-time reporting app that displays consistent and reliable data. The app uses React for the front end and Structured Streaming to ingest data from Delta tables, with live query analytics on real-time data and ML predictions based on analytics data.

Talk by: Nitu Nivedita

LLM in Practice: How to Productionize Your LLMs

2023-07-26
video
Sam Raymond (Databricks), Conor Murphy, Colton Peltier, David Talby (John Snow Labs and Pacific AI), Cheng Yin Eng

Put your questions to a panel of data science experts who have deployed LLMs and AI models into production.

Talk by: David Talby, Conor Murphy, Cheng Yin Eng, Sam Raymond, and Colton Peltier

PaLM 2: A Smaller, Faster and More Capable LLM

2023-07-26
video

PaLM 2 is a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language tasks and on reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction.

PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.

Talk by: Andy Dai

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz
