Databricks DATA + AI Summit 2023

Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive NLP

2023-07-26

Keshav Santhanam

AI/ML Databricks NLP

In this talk, you will learn how retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LMs) and retrieval models (RMs). Existing work has combined these in simple “retrieve-then-read” pipelines in which the RM retrieves passages that are inserted into the LM prompt.

To begin to fully realize the potential of frozen LMs and RMs, we propose Demonstrate–Search–Predict (DSP), a framework that relies on passing natural language texts in sophisticated pipelines between an LM and an RM. DSP can express high-level programs that bootstrap pipeline-aware demonstrations, search for relevant passages, and generate grounded predictions, systematically breaking down problems into small transformations that the LM and RM can handle more reliably.

We have written novel DSP programs for answering questions in open-domain, multi-hop, and conversational settings, establishing in early evaluations new state-of-the-art in-context learning results and delivering 37–125%, 8–40%, and 80–290% relative gains against vanilla LMs, a standard retrieve-then-read pipeline, and a contemporaneous self-ask pipeline, respectively.
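To make the contrast between a one-shot “retrieve-then-read” pipeline and a multi-step pipeline concrete, here is a toy sketch in plain Python. It is not the DSP framework's API: `retrieve`, `retrieve_then_read`, `multi_hop`, and `stub_lm` are hypothetical names, the retriever is a simple word-overlap ranker, and the LM is a stub standing in for a frozen model.

```python
from typing import Callable, List

LM = Callable[[str], str]  # hypothetical frozen LM: prompt in, text out


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Toy RM: rank passages by how many lowercase words they share with the query."""
    words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(words & set(p.lower().split())), reverse=True)
    return ranked[:k]


def build_prompt(context: List[str], question: str, task: str) -> str:
    return "Context:\n" + "\n".join(context) + f"\nQuestion: {question}\n{task}"


def retrieve_then_read(question: str, corpus: List[str], lm: LM) -> str:
    """Baseline: a single retrieval whose passages are pasted into the prompt."""
    return lm(build_prompt(retrieve(question, corpus), question, "Answer:"))


def multi_hop(question: str, corpus: List[str], lm: LM, hops: int = 2) -> str:
    """Alternate RM and LM calls: the LM writes each follow-up query from the context so far."""
    context: List[str] = []
    query = question
    for hop in range(hops):
        for passage in retrieve(query, corpus):
            if passage not in context:
                context.append(passage)
        if hop < hops - 1:
            query = lm(build_prompt(context, question, "Next search query:"))
    return lm(build_prompt(context, question, "Answer:"))


# Stub LM for demonstration only: returns a fixed follow-up query,
# and otherwise echoes the prompt so the gathered context can be inspected.
def stub_lm(prompt: str) -> str:
    if prompt.endswith("Next search query:"):
        return "London capital of"
    return prompt


corpus = [
    "Ada Lovelace was born in London.",
    "London is the capital of England.",
    "Paris is the capital of France.",
]
question = "Where was Ada Lovelace born?"
single = retrieve_then_read(question, corpus, stub_lm)
multi = multi_hop(question, corpus, stub_lm)
```

With the stub, the single-step prompt contains only the first passage, while the two-hop version also pulls in the passage about London, which is the kind of intermediate transformation the abstract describes.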

Talk by: Keshav Santhanam

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Future Data Access Control: Booz Allen Hamilton’s Way of Securing Databricks Lakehouse with Immuta

2023-07-26

Jeffrey Hess

Big Data Data Governance Data Lakehouse Databricks

In this talk, I’ll review how we utilize Attribute-Based Access Control (ABAC) to enforce policy via Immuta. I’ll discuss the differences between the ABAC and legacy Role-Based Access Control (RBAC) approaches to control access and how the RBAC approach is not sufficient to keep up with today’s growing big data market. With so much data available, there also comes substantial risk. Data can contain many sensitive data elements, including PII and PHI. Industry leaders like Databricks are pushing the boundaries of data technology, which leads to constantly evolving data use cases. And that’s a good thing. However, the RBAC approach is struggling to keep up with those advancements.

So what is RBAC? It’s an approach to data access that grants system access based on the end user’s role. For legacy systems, it was meant as a simple but effective approach to securing data. Are you a manager? Then you’ll get access to data meant for managers. This is great for small deployments with clearly defined roles. Here at Booz Allen, however, we invested in Databricks because we have an environment of over 30,000 users and billions of rows of data, a scale at which constantly mapping users to roles becomes hard to maintain.

To mitigate this problem and align with our forward-thinking company standard, we introduced Immuta into our stack. Immuta uses ABAC to allow for dynamic data access control. Users are automatically assigned certain attributes, and access is based on those attributes instead of just their role. This allows for more flexibility and allows data access control to easily scale without the need to constantly map a user to their role. Using attributes, we can write policies in one place and have them applied across all our data platforms. This makes for a truly holistic data governance approach and provides immediate ROI and time savings for the company.
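The difference between the two models can be sketched in a few lines of Python. This is an illustrative toy, not Immuta's API or policy language: the `User` and `Dataset` classes, the tag names, and the matching rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class User:
    name: str
    roles: Set[str] = field(default_factory=set)               # used by RBAC
    attributes: Dict[str, str] = field(default_factory=dict)   # used by ABAC


@dataclass
class Dataset:
    name: str
    allowed_roles: Set[str] = field(default_factory=set)       # RBAC: list maintained per dataset
    tags: Dict[str, str] = field(default_factory=dict)         # ABAC: metadata describing the data


def rbac_allows(user: User, dataset: Dataset) -> bool:
    """RBAC: access only if the user holds one of the roles listed on the dataset."""
    return bool(user.roles & dataset.allowed_roles)


def abac_allows(user: User, dataset: Dataset) -> bool:
    """ABAC (hypothetical rule): every tag on the data must match the user's attribute of the same name."""
    return all(user.attributes.get(key) == value for key, value in dataset.tags.items())


payroll = Dataset(
    "payroll",
    allowed_roles={"manager"},
    tags={"department": "finance", "clearance": "pii"},
)
alice = User("alice", roles={"analyst"}, attributes={"department": "finance", "clearance": "pii"})
bob = User("bob", roles={"analyst"}, attributes={"department": "marketing"})

alice_rbac = rbac_allows(alice, payroll)
alice_abac = abac_allows(alice, payroll)
bob_abac = abac_allows(bob, payroll)
```

In this sketch the single `abac_allows` rule applies to any dataset that carries tags, whereas the RBAC check depends on a role list kept up to date for each dataset.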

Talk by: Jeffrey Hess

How to Train Your Own Large Language Models

2023-07-26

Reza Shabani

Databricks LLM MLOps

Given the success of OpenAI’s GPT-4 and Google’s PaLM, every company is now assessing its own use cases for Large Language Models (LLMs). Many companies will ultimately decide to train their own LLMs for a variety of reasons, ranging from data privacy to increased control over updates and improvements. One of the most common reasons will be to make use of proprietary internal data.

In this session, we’ll go over how to train your own LLMs, from raw data to deployment in a user-facing production environment. We’ll discuss the engineering challenges, and the vendors that make up the modern LLM stack: Databricks, Hugging Face, and MosaicML. We’ll also break down what it means to train an LLM using your own data, including the various approaches and their associated tradeoffs.

Topics covered in this session:
- How Replit trained a state-of-the-art LLM from scratch
- The different approaches to using LLMs with your internal data
- The differences between fine-tuning, instruction tuning, and RLHF

Talk by: Reza Shabani

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

JoinBoost: In Data Base Machine Learning for Tree-Models

2023-07-26

Zachary Huang

AI/ML Cloud Computing Databricks Python

Data and machine learning (ML) are crucial for enterprise operations. Enterprises store data in databases for management and use ML to gain business insights. However, there is a mismatch between the way ML expects data to be organized (a single table) and the way data is organized in databases (a join graph of multiple tables), which leads to inefficiencies when joining and materializing tables.

In this talk, you will see how we successfully address this issue. We introduce JoinBoost, a lightweight Python library that trains tree models (such as random forests and gradient boosting) for join graphs in databases. JoinBoost acts as a query rewriting layer that is compatible with cloud databases, and eliminates the need for costly join materialization.
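One general way a query rewrite can avoid working on the fully joined table is to push aggregates below the join. The sketch below uses Python's built-in `sqlite3` to show that per-group counts and sums (the kind of statistics regression-tree split criteria are typically built from) come out the same either way. The table and column names are hypothetical, and these are not JoinBoost's actual queries or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales(store_id INTEGER, y REAL);                   -- fact table holding the target y
    CREATE TABLE stores(store_id INTEGER PRIMARY KEY, region TEXT); -- dimension table holding a feature
    INSERT INTO sales VALUES (1, 10.0), (1, 20.0), (2, 5.0), (3, 7.0);
    INSERT INTO stores VALUES (1, 'east'), (2, 'west'), (3, 'east');
    """
)

# Aggregate over the full join: every sales row is paired with its store first.
over_join = conn.execute(
    """
    SELECT region, COUNT(*), SUM(y)
    FROM sales JOIN stores USING (store_id)
    GROUP BY region ORDER BY region
    """
).fetchall()

# Rewritten form: pre-aggregate sales per join key, then join the much smaller result.
pushed_down = conn.execute(
    """
    SELECT region, SUM(c), SUM(s)
    FROM (SELECT store_id, COUNT(*) AS c, SUM(y) AS s
          FROM sales GROUP BY store_id) AS agg
    JOIN stores USING (store_id)
    GROUP BY region ORDER BY region
    """
).fetchall()
```

Both queries return the same per-region counts and sums, but the second joins one row per `store_id` rather than one row per sale.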

Talk by: Zachary Huang

Lakehouse Architecture to Advance Security Analytics at the Department of State

2023-07-26

Edward Moe, Timothy Ahrens

AI/ML Analytics Cloud Computing Data Governance Data Lakehouse Databricks

In 2023, the Department of State surged forward on implementing a lakehouse architecture to get faster, smarter, and more effective on cybersecurity log monitoring and incident response. In addition to getting us ahead of federal mandates, this approach promises to enable advanced analytics and machine learning across our highly federated global IT environment while minimizing costs associated with data retention and aggregation.

This talk will include a high-level overview of the technical and policy challenges and a deeper technical dive into the tactical implementation choices made. We’ll share lessons learned related to governance and securing organizational support, connecting between multiple cloud environments, and standardizing data to make it useful for analytics. And finally, we’ll discuss how the lakehouse leverages Databricks in multicloud environments to promote decentralized ownership of data while enabling strong, centralized data governance practices.

Talk by: Timothy Ahrens and Edward Moe

Leveraging Data Science for Game Growth and Matchmaking Optimization

2023-07-26

Shuo Chen (Databricks), Zhenyu Zhao (Databricks)

Analytics Data Science Databricks

For video games, data science solutions can be applied throughout the player lifecycle, from adtech, LTV forecasting, and in-game economic system monitoring to experimentation. Databricks is used as a data and computation foundation to power these data science solutions, enabling data scientists to easily develop and deploy them for different use cases.

In this session, we will share insights on how Databricks-powered data science solutions drive game growth and improve player experiences using different advanced analytics, modeling, experimentation, and causal inference methods. We will introduce the business use cases, data science techniques, as well as Databricks demos.

Talk by: Zhenyu Zhao and Shuo Chen

Security Best Practices and Tools to Build a Secure Lakehouse

2023-07-26

Arun Pamulapati (Databricks), Anindita Mahapatra (Databricks)

Data Lake Data Lakehouse Databricks Cyber Security

To learn more, visit the Databricks Security and Trust Center: https://www.databricks.com/trust

As you embark on a lakehouse project or evolve your existing data lake, you may want to improve your security posture and take advantage of new security features; there may even be a security team at your company that demands it. Databricks has worked with thousands of customers to securely deploy the Databricks Platform to meet their architecture and security requirements. While many organizations deploy security differently, we have found a common set of guidelines and features among organizations that require a high level of security.

In this session, we will detail the security features and architectural choices frequently used by these organizations and walk through a series of threat models for the risks that most concern security teams. This session is great for people who already know Databricks, but that knowledge isn’t required. You will walk away with a full handbook detailing all the concepts, configurations, checklists, the Security Analysis Tool (SAT), and Security Reference Architecture (SRA) automation scripts from the session so that you can make immediate progress when you get back to the office.

Security can be hard, but we’ve collected the hard work already done by some of the best in the industry, and built tools, to make it easier. Come learn how, and see what good looks like in a demo.

Talk by: Arun Pamulapati and Anindita Mahapatra

Sponsored by: AccuWeather | Weather Matters: How to Harness its Power to Improve Your Bottom Line

Sponsor: Sigma Computing | Using Sigma Input Tables Helps Databricks Data Science & ML Apps

2023-07-26

Mitch Ertle (Sigma), Greg Owen

AI/ML Analytics Data Science Databricks

In this talk, we will dive into the powerful analytics combination of Databricks and Sigma for data science and machine learning use cases. Databricks offers a scalable and flexible platform for building and deploying machine learning models at scale, while Sigma enhances this framework with real-time data insights, analysis, and visualization capabilities. Going a step further, we will demonstrate how input tables can be utilized from Sigma to create seamless workflows in Databricks for data science and machine learning. From this workflow, business users can leverage data science and ML models to do ad-hoc analysis and make data-driven decisions.

This talk is perfect for data scientists, data analysts, business users, and anyone interested in harnessing the power of Databricks and Sigma to drive business value. Join us and discover how these two platforms can revolutionize the way you analyze and leverage your data.

Talk by: Mitch Ertle and Greg Owen

Using Open Source Tools to Build Privacy-Conscious Data Systems

2023-07-26

Thomas La Piana

AI/ML Analytics Databricks Python TypeScript

With the rapid proliferation of consumer data privacy laws across the world, it is becoming a strict requirement for data organizations to be mindful of data privacy risks. Privacy violation fines are reaching record highs and will only get higher as governments continue to crack down on the runaway abuse of user data. To continue producing value without becoming a liability, data systems must include privacy protections at a foundational level.

The most practical way to do this is to enable privacy as code, shifting privacy left and including it as a foundational part of the organization's software development life cycle. The promise of privacy as code is that data organizations can be liberated from inefficient, manual workflows for producing the compliance deliverables their legal teams need, and instead ship at speed with pre-defined privacy guardrails built into the structure of their preferred workflows.

Despite this being an emerging and complex problem, powerful open source tools are already available to help organizations of all sizes achieve this outcome. Fides is an open source privacy-as-code tool, written in Python and TypeScript, that is engineered to tackle a variety of privacy problems throughout the application lifecycle. The most relevant feature for data organizations is the ability to annotate systems and their datasets with data privacy metadata, thus enabling automatic rejection of dangerous or illegal uses. Fides empowers data organizations to be proactive, not reactive, in terms of protecting user privacy and reducing organizational risk. Moving forward, data privacy will need to be top of mind for data teams.
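The idea of annotating datasets with privacy metadata and rejecting disallowed uses automatically can be illustrated with a small Python sketch. This is not the Fides API or its taxonomy: the field names, data categories, purposes, and the `violations` and `check` functions are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical annotations: each dataset field is tagged with a data category.
DATASET_ANNOTATIONS: Dict[str, str] = {
    "users.email": "contact",
    "users.birth_date": "demographic",
    "orders.total": "financial",
}

# Hypothetical policy: data categories that must not be used for a given purpose.
FORBIDDEN: Dict[str, Set[str]] = {
    "advertising": {"demographic", "financial"},
}


def violations(fields: List[str], purpose: str) -> List[str]:
    """Return the declared fields whose category is forbidden for this purpose."""
    banned = FORBIDDEN.get(purpose, set())
    return [name for name in fields if DATASET_ANNOTATIONS[name] in banned]


def check(fields: List[str], purpose: str) -> None:
    """Fail fast (for example in CI) when a declared use breaks the policy."""
    bad = violations(fields, purpose)
    if bad:
        raise PermissionError(f"{purpose!r} may not use: {', '.join(bad)}")


flagged = violations(["users.email", "users.birth_date"], "advertising")
```

Running a check like this as part of the build is one way to shift privacy review left: a pipeline that declares a forbidden use fails before it ships rather than being caught in a later manual review.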

Talk by: Thomas La Piana

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Vector Data Lakes

2023-07-26

Chang She, Tony Wang

AI/ML Analytics Cloud Computing Cloud Storage Data Lakehouse Databricks

Vector databases such as Elasticsearch and Pinecone offer fast ingestion and querying on vector embeddings with approximate nearest neighbor (ANN) search. However, they typically do not decouple compute and storage, making them hard to integrate into production data stacks. Because data storage in these databases is expensive and not easily accessible, data teams typically maintain ETL pipelines to offload historical embedding data to blob stores. When that data needs to be queried, it gets loaded back into the vector database in another ETL process. This is reminiscent of loading data from an OLTP database to cloud storage, then loading said data into an OLAP warehouse for offline analytics.

Recently, “lakehouse” offerings allow direct OLAP querying on cloud storage, removing the need for the second ETL step. The same could be done for embedding data. While embedding storage in blob stores cannot satisfy the high TPS requirements in online settings, we argue it’s sufficient for offline analytics use cases like slicing and dicing data based on embedding clusters. Instead of loading the embedding data back into the vector database for offline analytics, we propose direct processing on embeddings stored in Parquet files in Delta Lake. You will see that offline embedding workloads typically touch a large portion of the stored embeddings without the need for random access.

As a result, the workload is entirely bound by network throughput instead of latency, making it quite suitable for blob storage backends. On a test dataset of one billion vectors, ETL into cloud storage takes around one hour on a dedicated GPU instance, while batched nearest neighbor search can be done in under one minute with four CPU instances. We believe future “lakehouses” will ship with native support for these embedding workloads.
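The access pattern described above, a sequential scan over stored embeddings with no random access, can be sketched as a batched exact nearest-neighbor search. The example is plain Python so that it is self-contained: `batched_nearest` is a hypothetical helper, each chunk stands in for one batch of vectors read from a file, and a real workload would use a vectorized library rather than nested loops.

```python
from typing import Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batched_nearest(
    queries: List[Vector], chunks: Iterable[List[Vector]]
) -> List[Tuple[int, float]]:
    """For each query, return (global index, squared distance) of its nearest stored vector.

    Chunks are consumed once, in order, so the scan needs throughput rather than
    low-latency random reads.
    """
    best: List[Tuple[int, float]] = [(-1, float("inf"))] * len(queries)
    offset = 0  # global index of the first vector in the current chunk
    for chunk in chunks:
        for qi, query in enumerate(queries):
            for ci, vector in enumerate(chunk):
                dist = squared_distance(query, vector)
                if dist < best[qi][1]:
                    best[qi] = (offset + ci, dist)
        offset += len(chunk)
    return best


# Two chunks standing in for two batches of stored embeddings.
chunks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(5.0, 5.0), (2.0, 2.0)],
]
queries = [(0.9, 0.9), (4.0, 4.0)]
result = batched_nearest(queries, iter(chunks))
nearest_indices = [index for index, _ in result]
```

Because every query is compared against each chunk as it streams past, the whole batch of queries is answered in a single pass over the data.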

Talk by: Tony Wang and Chang She

Building the Multi-Modal Future: Open Ecosystems and Data at Play

2023-07-26

Nathan Lile

AI/ML Databricks GenAI

Join Nathan as he explores Stability AI's latest advancements in open source generative AI, focused on building the multimodal information infrastructure of the future. Get an insider's perspective on our recently released model, Stable Diffusion XL v0.9, and Stability's behind-the-scenes efforts. Discover how advancements in open-source generative AI models enable efficient development of multimodal AI systems, and learn how researchers worldwide are customizing these models and leveraging unique datasets.

Nathan will discuss the dynamic interplay between open-source models and enterprise AI adoption, resulting in efficient, tailored solutions. At Stability AI, our focus is on unlocking the inherent value and competitive advantage found in unique data and AI ownership. Combining open-source models with proprietary data assets creates a strategic advantage for enterprises.

Despite the growing AI trend, the need for human judgment and creativity remains pivotal. At Stability AI, our goal is to augment rather than replace human capabilities using AI collaboration and co-creation. Join us in shaping a collaborative generative future.

Talk by: Nathan Lile

Delta Lake AMA

2023-07-26

Robert Pack (Databricks), Bart Samwel (Databricks), Allison Portis, Tathagata Das (Databricks)

Databricks Delta

Have some great questions about Delta Lake? Well, come by and ask the experts your questions!

Talk by: Bart Samwel, Tathagata Das, Robert Pack, and Allison Portis

Discuss How LLMs Will Change the Way We Work

2023-07-26

Ben Harvey, Sean Owen (Databricks), Ankit Mathur (Databricks), Debu Sinha, Jan van der Vegt

AI/ML Databricks LLM MLOps

Will LLMs change the way we work? Ask a panel of LLM and AI experts what problems LLMs will solve and what new challenges they may bring.

Talk by: Ben Harvey, Jan van der Vegt, Ankit Mathur, Debu Sinha, and Sean Owen

Foundation Models in the Modern Data Stack

2023-07-26

Ines Chami

Databricks LLM Modern Data Stack MLOps

As Foundation Models (FMs) continue to grow in size, innovations continue to push the boundaries of what these models can do on language and image tasks. This talk will describe our work on applying FMs to structured data tasks like data linkage, cleaning and querying. We will then discuss challenges and solutions that these models present for production deployment in the modern data stack.

Talk by: Ines Chami

Going Beyond SQL: Python UDFs in Unity Catalog for All Your Lakehouse

2023-07-26

Jakob Mund (Databricks)

Data Lakehouse Databricks Python SQL

While SQL is powerful, it does have some limits. Fear not: this lightning talk introduces user-defined functions (UDFs) written in Python, managed and governed in Databricks Unity Catalog, and usable across the Lakehouse. It covers the basics, from how to create and govern UDFs, to more advanced topics including networking and observability, and provides a glimpse of how it works under the hood. After this session, you will be equipped to take SQL and the Lakehouse to the next level using Python UDFs.

Talk by: Jakob Mund

How Comcast Effectv Drives Data Observability with Databricks and Monte Carlo

2023-07-26

Scott Lerner (Comcast Effectv), Robinson Creighton (Comcast Effectv)

AWS Data Lakehouse Databricks Looker Monte Carlo

Comcast Effectv, the 2,000-employee advertising wing of Comcast, America’s largest telecommunications company, provides custom video ad solutions powered by aggregated viewership data. As a global technology and media company connecting millions of customers to personalized experiences and processing billions of transactions, Comcast Effectv was challenged with handling massive loads of data, monitoring hundreds of data pipelines, and managing timely coordination across data teams.

In this session, we will discuss Comcast Effectv’s journey to building a more scalable, reliable lakehouse and driving data observability at scale with Monte Carlo. This has enabled Effectv to have a single pane of glass view of their entire data environment to ensure consumer data trust across their entire AWS, Databricks, and Looker environment.

Talk by: Scott Lerner and Robinson Creighton

How to Create and Manage a High-Performance Analytics Team

2023-07-26

John Thompson

AI/ML Analytics Data Lakehouse Data Science Databricks

Data science and analytics teams are unique. Large and small corporations want to build and manage analytics teams to convert their data and analytic assets into revenue and competitive advantage, but many are failing before they make their first hire. In this session, the audience will learn how to structure, hire, manage and grow an analytics team. Organizational structure, project and program portfolios, neurodiversity, developing talent, and more will be discussed.

Questions and discussion are encouraged. The audience will leave with a deeper understanding of how to succeed in turning data and analytics into tangible results.

Talk by: John Thompson

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

How We Made a Unified Talent Solution Using Databricks Machine Learning, Fine-Tuned LLM & Dolly 2.0

2023-07-26

Nitu Nivedita

AI/ML Analytics Databricks DataOps Delta DevOps

Using Databricks, we built a “Unified Talent Solution” backed by a robust data and AI engine. It analyzes the skills of a combined pool of permanent employees, contractors, part-time employees, and vendors, and infers skill gaps, future trends, and recommended priority areas for bridging talent gaps. This ultimately greatly improved our client’s operational efficiency, transparency, commercial model, and talent experience. We leveraged a variety of ML algorithms, such as boosting, neural networks, and NLP transformers, to provide better AI-driven insights.

One inevitable part of developing these models within a typical DS workflow is iteration. Databricks' end-to-end ML/DS workflow service, MLflow, helped streamline this process by organizing iterations into experiments that tracked the data used for training and testing, model artifacts, lineage, and the corresponding results and metrics. To check the health of our models using drift detection, bias, and explainability techniques, we made extensive use of MLflow's deployment and monitoring services.

Our solution, built on the Databricks platform, simplified ML by defining a data-centric workflow that unified best practices from DevOps, DataOps, and ModelOps. Databricks Feature Store allowed us to productionize our models and features jointly. Insights were presented as visually appealing charts and graphs, built with Power BI, Plotly, and Matplotlib, that answer the business questions most relevant to clients. We built our own advanced custom analytics platform on top of Delta Lake, as Delta’s ACID guarantees allow us to build a real-time reporting app that displays consistent and reliable data. The app uses React for the front end and Structured Streaming to ingest data from Delta tables, with live query analytics on real-time data and ML predictions based on the analytics data.

Talk by: Nitu Nivedita

LLM in Practice: How to Productionize Your LLMs

2023-07-26

Sam Raymond (Databricks), Conor Murphy, Colton Peltier, David Talby (John Snow Labs and Pacific AI), Cheng Yin Eng

AI/ML Data Science Databricks LLM

Ask questions of a panel of data science experts who have deployed LLMs and AI models into production.

Talk by: David Talby, Conor Murphy, Cheng Yin Eng, Sam Raymond, and Colton Peltier

PaLM 2: A Smaller, Faster and More Capable LLM

2023-07-26

Andy Dai

AI/ML Databricks LLM MLOps

PaLM 2 is a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction.

PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.

Talk by: Andy Dai
