talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 YouTube Visit website ↗

Activities tracked

287

Filtering by: AI/ML ×

Sessions & talks

Showing 101–125 of 287 · Newest first

Sponsored: Snowplow | Revolutionize Your Customer Engagement Strategy w/ First-Party Customer Data

2023-07-26

In today's highly competitive market, personalized experiences are the key to winning customer engagement and loyalty. But how can you deliver these experiences at scale? The answer lies in a single unified view of your customers, powered by rich first-party customer data. With complete 360 visibility into your customer's journey, you can predict their next best action and deliver the most relevant experience based on their unique needs and behaviors.

Join this session to learn how to unlock the full potential of your first-party customer data by removing technology barriers so that your data team can collaborate seamlessly with your marketing team. Learn how to create a data-driven next-best action (NBA) strategy by building solutions that will set you apart in the competitive landscape and captivate your customers at every touchpoint. In this session, you'll discover:

  • The critical importance of personalized experiences in today's hyper-competitive market
  • Proven strategies for building a data-driven NBA approach that drives results
  • A live demo of how Snowplow and Databricks can be combined to produce powerful ML models for NBA, revolutionizing your customer data strategy
  • Best practices for fostering strong collaboration between marketing and data teams to achieve business outcomes and deliver next-gen customer experiences

Don't miss out on this opportunity to unlock the full potential of your first-party customer data and revolutionize your customer engagement strategy.

Talk by: Yali Sassoon

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsor: Sigma Computing | Using Sigma Input Tables Helps Databricks Data Science & ML Apps

2023-07-26

In this talk, we will dive into the powerful analytics combination of Databricks and Sigma for data science and machine learning use cases. Databricks offers a scalable and flexible platform for building and deploying machine learning models at scale, while Sigma enhances this framework with real-time data insights, analysis, and visualization capabilities. Going a step further, we will demonstrate how Sigma input tables can be used to create seamless workflows in Databricks for data science and machine learning. With this workflow, business users can leverage data science and ML models to do ad-hoc analysis and make data-driven decisions.

This talk is perfect for data scientists, data analysts, business users, and anyone interested in harnessing the power of Databricks and Sigma to drive business value. Join us and discover how these two platforms can revolutionize the way you analyze and leverage your data.

Talk by: Mitch Ertle and Greg Owen

Using Open Source Tools to Build Privacy-Conscious Data Systems

2023-07-26

With the rapid proliferation of consumer data privacy laws across the world, it is becoming a strict requirement for data organizations to be mindful of data privacy risks. Privacy violation fines are reaching record highs and will only get higher as governments continue to crack down on the runaway abuse of user data. To continue producing value without becoming a liability, data systems must include privacy protections at a foundational level.

The most practical way to do this is to enable privacy as code, shifting privacy left and including it as a foundational part of the organization's software development life cycle. The promise of privacy as code is that data organizations can be liberated from inefficient, manual workflows for producing the compliance deliverables their legal teams need, and instead ship at speed with pre-defined privacy guardrails built into the structure of their preferred workflows.

Although this is an emerging and complex problem, powerful open source tools already exist to help organizations of all sizes achieve this outcome. Fides is an open source privacy as code tool, written in Python and TypeScript, that is engineered to tackle a variety of privacy problems throughout the application lifecycle. The most relevant feature for data organizations is the ability to annotate systems and their datasets with data privacy metadata, thus enabling automatic rejection of dangerous or illegal uses. Fides empowers data organizations to be proactive, not reactive, in terms of protecting user privacy and reducing organizational risk. Moving forward, data privacy will need to be top of mind for data teams.
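The idea of annotating datasets with privacy metadata and rejecting disallowed uses automatically can be illustrated with a minimal sketch. This is not the Fides API or its file format: the class names, the category and use labels, and the rule set are all hypothetical, chosen only to show how such a check might run in a CI step.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class DatasetField:
    name: str
    data_category: str  # hypothetical label, e.g. "user.contact.email"


@dataclass(frozen=True)
class SystemDeclaration:
    name: str
    data_use: str  # hypothetical label, e.g. "advertising"
    fields: Tuple[DatasetField, ...]


# Hypothetical policy: (data category prefix, data use) pairs that are not allowed.
ForbiddenRules = FrozenSet[Tuple[str, str]]


def category_matches(category: str, prefix: str) -> bool:
    """True if the category equals the prefix or is nested under it."""
    return category == prefix or category.startswith(prefix + ".")


def find_violations(system: SystemDeclaration, rules: ForbiddenRules) -> List[str]:
    """Return one message per annotated field whose declared use is forbidden."""
    violations = []
    for field in system.fields:
        for prefix, use in sorted(rules):
            if system.data_use == use and category_matches(field.data_category, prefix):
                violations.append(
                    f"{system.name}: field '{field.name}' ({field.data_category}) "
                    f"may not be used for {use}"
                )
    return violations


RULES: ForbiddenRules = frozenset({("user.contact", "advertising")})

ads = SystemDeclaration(
    name="ad_targeting",
    data_use="advertising",
    fields=(
        DatasetField("email", "user.contact.email"),
        DatasetField("page_views", "user.behavior"),
    ),
)
billing = SystemDeclaration(
    name="invoicing",
    data_use="billing",
    fields=(DatasetField("email", "user.contact.email"),),
)

for message in find_violations(ads, RULES) + find_violations(billing, RULES):
    print(message)
```

A build could fail whenever `find_violations` returns a non-empty list, which is one way to turn a privacy rule into a guardrail rather than a manual review step.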

Talk by: Thomas La Piana

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Vector Data Lakes

Vector databases such as Elasticsearch and Pinecone offer fast ingestion and querying on vector embeddings with approximate nearest neighbor (ANN) search. However, they typically do not decouple compute and storage, making them hard to integrate into production data stacks. Because data storage in these databases is expensive and not easily accessible, data teams typically maintain ETL pipelines to offload historical embedding data to blob stores. When that data needs to be queried, it gets loaded back into the vector database in another ETL process. This is reminiscent of loading data from an OLTP database to cloud storage, then loading that data into an OLAP warehouse for offline analytics.

Recently, “lakehouse” offerings allow direct OLAP querying on cloud storage, removing the need for the second ETL step. The same could be done for embedding data. While embedding storage in blob stores cannot satisfy the high TPS requirements in online settings, we argue it’s sufficient for offline analytics use cases like slicing and dicing data based on embedding clusters. Instead of loading the embedding data back into the vector database for offline analytics, we propose direct processing on embeddings stored in Parquet files in Delta Lake. You will see that offline embedding workloads typically touch a large portion of the stored embeddings without the need for random access.

As a result, the workload is entirely bound by network throughput instead of latency, making it quite suitable for blob storage backends. On a test dataset of one billion vectors, ETL into cloud storage takes around one hour on a dedicated GPU instance, while batched nearest neighbor search can be done in under one minute with four CPU instances. We believe future “lakehouses” will ship with native support for these embedding workloads.
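The access pattern described above, a sequential scan over all stored embeddings with no random access, can be sketched in a few lines. This is an illustrative, standard-library-only toy and not the speakers' implementation: `iter_batches` stands in for reading Parquet row groups from blob storage, and the function names are assumptions.

```python
import heapq
from typing import Iterable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def iter_batches(vectors: List[Vector], batch_size: int) -> Iterator[Tuple[int, List[Vector]]]:
    """Yield (start_offset, batch) pairs, standing in for sequential row-group reads."""
    for start in range(0, len(vectors), batch_size):
        yield start, vectors[start:start + batch_size]


def squared_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batched_knn(
    queries: List[Vector],
    batches: Iterable[Tuple[int, List[Vector]]],
    k: int,
) -> List[List[Tuple[int, float]]]:
    """Exact k-nearest-neighbor search in one pass over the stored batches.

    Each query keeps a bounded heap of its k best candidates, so every stored
    vector is read exactly once and never revisited.
    """
    heaps: List[List[Tuple[float, int]]] = [[] for _ in queries]
    for start, batch in batches:
        for query, heap in zip(queries, heaps):
            for offset, vector in enumerate(batch):
                # Negate the distance so the heap root is the worst kept candidate.
                item = (-squared_l2(query, vector), start + offset)
                if len(heap) < k:
                    heapq.heappush(heap, item)
                elif item > heap[0]:
                    heapq.heapreplace(heap, item)
    # Closest first: the largest negated distance is the smallest distance.
    return [[(idx, -neg) for neg, idx in sorted(heap, reverse=True)] for heap in heaps]


stored = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
queries = [(0.9, 0.1), (5.4, 5.0)]
results = batched_knn(queries, iter_batches(stored, batch_size=2), k=2)
neighbor_ids = [[idx for idx, _ in hits] for hits in results]
print(neighbor_ids)
```

Because each batch is consumed once and discarded, throughput rather than per-read latency dominates the cost, which matches the argument for blob storage made above.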

Talk by: Tony Wang and Chang She

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Building the Multi-Modal Future: Open Ecosystems and Data at Play

2023-07-26

Join Nathan as he explores Stability AI's latest advancements in open source generative AI, focused on building the multimodal information infrastructure of the future. Get an insider's perspective on our recently released model, Stable Diffusion XL v0.9, and Stability's behind-the-scenes efforts. Discover how advancements in open-source generative AI models enable efficient development of multimodal AI systems, and learn how researchers worldwide are customizing these models and leveraging unique datasets. 

Nathan will discuss the dynamic interplay between open-source models and enterprise AI adoption, resulting in efficient, tailored solutions. At Stability AI, our focus is on unlocking the inherent value and competitive advantage found in unique data and AI ownership. Combining open-source models with proprietary data assets creates a strategic advantage for enterprises.

Despite the growing AI trend, the need for human judgment and creativity remains pivotal. At Stability AI, our goal is to augment rather than replace human capabilities using AI collaboration and co-creation. Join us in shaping a collaborative generative future.

Talk by: Nathan Lile

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Discuss How LLMs Will Change the Way We Work

2023-07-26
Ben Harvey, Sean Owen (Databricks), Ankit Mathur (Databricks), Debu Sinha, Jan van der Vegt

Will LLMs change the way we work? Ask a panel of LLM and AI experts which problems LLMs will solve and what new challenges they may bring.

Talk by: Ben Harvey, Jan van der Vegt, Ankit Mathur, Debu Sinha, and Sean Owen

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

How to Create and Manage a High-Performance Analytics Team

2023-07-26

Data science and analytics teams are unique. Large and small corporations want to build and manage analytics teams to convert their data and analytic assets into revenue and competitive advantage, but many are failing before they make their first hire. In this session, the audience will learn how to structure, hire, manage and grow an analytics team. Organizational structure, project and program portfolios, neurodiversity, developing talent, and more will be discussed.

Questions and discussion are encouraged. The audience will leave with a deeper understanding of how to succeed in turning data and analytics into tangible results.

Talk by: John Thompson

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

How We Made a Unified Talent Solution Using Databricks Machine Learning, Fine-Tuned LLM & Dolly 2.0

2023-07-26

Using Databricks, we built a “Unified Talent Solution” backed by a robust data and AI engine. It analyzes the skills of a combined pool of permanent employees, contractors, part-time employees and vendors, and infers skill gaps, future trends and recommended priority areas for bridging talent gaps. This greatly improved our client's operational efficiency, transparency, commercial model, and talent experience. We leveraged a variety of ML algorithms such as boosting, neural networks and NLP transformers to provide better AI-driven insights.

One inevitable part of developing these models within a typical DS workflow is iteration. Databricks' end-to-end ML/DS workflow service, MLflow, helped streamline this process by organizing iterations into experiments that tracked the data used for training/testing, model artifacts, lineage and the corresponding results/metrics. To check the health of our models using drift detection, bias and explainability techniques, we made extensive use of MLflow's deployment and monitoring services.
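The abstract does not say which drift detection technique was used. As one common illustration, the sketch below computes the Population Stability Index (PSI) between a baseline sample and a newer sample of a single feature. The function names, bin edges and the epsilon floor are assumptions made for this example, and the code does not use MLflow.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin; values outside the edges are clamped to the end bins."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for value in values:
        index = bisect_right(edges, value) - 1
        index = min(max(index, 0), n_bins - 1)
        counts[index] += 1
    return [count / len(values) for count in counts]


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,  # assumed floor that avoids log(0) for empty bins
) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    total = 0.0
    for e_frac, a_frac in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        e_frac = max(e_frac, eps)
        a_frac = max(a_frac, eps)
        total += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return total


baseline = list(range(100))             # training-time feature values
shifted = [v + 30 for v in baseline]    # same feature after a shift in production
edges = [0, 25, 50, 75, 100]

psi_same = population_stability_index(baseline, baseline, edges)
psi_shifted = population_stability_index(baseline, shifted, edges)
print(psi_same, psi_shifted)
```

A PSI of zero means the binned distributions match; larger values indicate a larger shift, and a monitoring job could log the value per feature and alert above a threshold chosen by the team.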

Our solution, built on the Databricks platform, simplified ML by defining a data-centric workflow that unified best practices from DevOps, DataOps, and ModelOps. Databricks Feature Store allowed us to productionize our models and features jointly. Insights were presented in visually appealing charts and graphs built with Power BI, Plotly and Matplotlib that answer the business questions most relevant to clients. We built our own advanced custom analytics platform on top of Delta Lake, as Delta’s ACID guarantees allow us to build a real-time reporting app that displays consistent and reliable data. The app uses React for the front end and Structured Streaming to ingest data from Delta tables, with live query analytics on real-time data and ML predictions based on analytics data.

Talk by: Nitu Nivedita

LLM in Practice: How to Productionize Your LLMs

2023-07-26
Sam Raymond (Databricks), Conor Murphy, Colton Peltier, David Talby (John Snow Labs and Pacific AI), Cheng Yin Eng

Put your questions to a panel of data science experts who have deployed LLMs and AI models into production.

Talk by: David Talby, Conor Murphy, Cheng Yin Eng, Sam Raymond, and Colton Peltier

PaLM 2: A Smaller, Faster and More Capable LLM

2023-07-26

PaLM 2 is a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction.

PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.

Talk by: Andy Dai

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Perplexity: A Copilot for All Your Web Searches and Research

2023-07-26

In this demo, we will show you the fastest and most functional answer engine and search copilot that exists right now: Perplexity.ai. It can solve a wide array of problems, from giving you fast answers on any topic to planning trips and doing market research on things unfamiliar to you, all in a trustworthy way without hallucinations, providing you references in the form of citations. This is made possible by harnessing the power of LLMs along with retrieval augmented generation from traditional search engines and indexes.

We will also show you how information discovery can now be fully personalized to you through prompt engineering. Finally, we will see use cases of how this search copilot can help you in your day-to-day tasks in a data team, whether you are a data engineer, data scientist, or data analyst.

Talk by: Aravind Srinivas

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Sponsored: Accenture | Databricks Enables Employee Data Domain to Align People w/ Business Outcomes

2023-07-26

A global franchise retailer was struggling to understand the value of its employees and had not fostered a data-driven enterprise. During the journey to use facts as the basis for decision making, Databricks became the facilitator of DataMesh and created the pipelines, analytics and source engine for a three-layer — bronze, silver, gold — lakehouse that supports the HR domain and drives the integration of multiple additional domains: sales, customer satisfaction, product quality and more. In this talk, we will walk through:

  • The business rationale and drivers
  • The core data sources
  • The data products, analytics and pipelines
  • The adoption of Unity Catalog for data privacy compliance/adherence and data management
  • Data quality metrics

Join us to see the analytic product and the design behind this innovative view of employees and their business outcomes.

Talk by: Rebecca Bucnis

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Sponsored by: Anomalo | Scaling Data Quality with Unsupervised Machine Learning Methods

2023-07-26

The challenge is no longer how big, diverse, or distributed your data is. It's that you can't trust it. Companies are utilizing rules and metrics to monitor data quality, but they’re tedious to set up and maintain. We will present a set of fully unsupervised machine learning algorithms for monitoring data quality at scale, which require no setup, catch unexpected issues and prevent alert fatigue by minimizing false positives. At the end of this talk, participants will be equipped with insight into unsupervised data quality monitoring, its advantages and limitations, and how it can help scale trust in your data.

Talk by: Vicky Andonova

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Sponsored by: Avanade | Enabling Real-Time Analytics with Structured Streaming and Delta Live Tables

2023-07-26

Join the panel to hear how Avanade is helping clients enable real-time analytics and tackle the people and process problems that accompany technology, powered by Azure Databricks.

Talk by: Thomas Kim, Dael Williamson, Zoé Durand

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Sponsored by: Fivetran | Fivetran and Catalyst Enable Businesses & Solve Critical Market Challenges

2023-07-26

Fivetran helps enterprise and commercial companies improve the efficiency of their data movement, infrastructure, and analysis by providing a secure, scalable platform for high-volume data movement. Catalyst is a cloud-based platform that helps software companies grow revenue with advanced insights and workflows that strengthen customer adoption, retention, expansion and advocacy. In this fireside chat, we will dive into the pain points that drove Catalyst to search for a partnership that would automate and simplify data management, and into the pivotal success driven by the implementation of Fivetran and Databricks.

Discover how together Fivetran and Databricks:

  • Deliver scalable, real-time analytics to customers with minimal configuration and centralize customer data into customer success tools.
  • Improve Catalyst’s visibility into customer health, opportunities, and risks across all teams.
  • Turn data into revenue-driving insights around digital customer behavior with improved targeting and AI/machine learning.
  • Provide a robust and scalable data infrastructure that supports Catalyst’s growing data needs, with improvements in data availability, data quality, and overall efficiency in data operations.

Talk by: Edward Chiu and Lauren Schwartz

Sponsored by: Wipro | Personalized Price Transparency Using Generative AI

2023-07-26

Patients are increasingly taking an active role in managing their healthcare costs and are more likely to choose providers and treatments based on cost considerations. Learn how technology can help build cost-efficient care models across the healthcare continuum, delivering higher quality care while improving patient experience and operational efficiency.

Talk by: Janine Pratt

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

The Future of Data Sharing and Collaboration: A Perspective from Industry Leaders

2023-07-26

More and more, organizations must exchange data with their customers, suppliers and partners. And yet, efficiency and immediate accessibility are equally important. To be truly data-driven, organizations need a better way to share data.

Join a panel of industry leaders from London Stock Exchange, AccuWeather, ZoomInfo and CoreLogic as they dive into the significance of open standards for data sharing and the game-changing impact of marketplaces that enable the exchange of not just data, but also notebooks, dashboards, ML models, and applications. Discover how collaboration can break down walled-garden approaches and unlock limitless potential for innovation. Gain valuable insights into the future of data sharing and collaboration as the panelists share their experiences and successful strategies for effective data collaboration.

This session covers it all, from the role of technology in secure sharing to ethical considerations. Ask any questions that you might have. Don't wait to transform the future of your industry - register now and join the data-sharing and collaboration revolution.

Talk by: Jay Bhankharia, Sneh Kakileti, Naftali Cohen, Brian Battaglia, and Paul Lentz

Under the Hood: Intelligent Workload Management

2023-07-26
Priyam Dutta (Databricks)

Join this talk to learn from a senior staff engineer at Databricks how machine learning is leveraged to make Databricks SQL more responsive and efficient. This is a “bits and bytes” talk for those interested in knowing how our engine works.

Talk by: Priyam Dutta

Unleashing Large Language Models with Databricks SQL's AI Functions

2023-07-26

This talk introduces AI Functions, a new feature in Databricks SQL that enables seamless integration of Large Language Models (LLMs) into SQL workflows. We illustrate how AI Functions simplifies the use of LLMs like OpenAI’s ChatGPT for tasks such as text classification, bypassing the need for complex pipelines.

By demonstrating the setup and application of AI Functions, we show how this tool democratizes AI and puts the power of LLMs directly into the hands of your data analysts and scientists. The talk concludes with a look towards the future of AI Functions and the exciting possibilities they unlock for businesses.

Talk by: Shitao Li and Yu Gong

Using NLP to Evaluate 100 Million Global Webpages Daily to Contextually Target Consumers

2023-07-26
Xuefu Wang, Mark Lee (Databricks)

This session will cover the challenges The Trade Desk faced, and the solution it built, in scaling its NLP models to 100 million web pages per day.

TTD's contextual targeting team needs to analyze 100 million web pages per day. Fifty percent of the web pages are non-English, which meant half of the content was not being properly analyzed and targeted intelligently. TTD attempted to build a model using Spark NLP; however, the package could not scale, GPU utilization was low, and the solution was cost-prohibitive. TTD engaged with Databricks in early 2022 to build an NLP model on Databricks, and our teams partnered closely. We were able to build a solution using distributed inference (150-200 GPUs running at 80%+ utilization). Each day, the solution translates 50 million web pages in more than 35 languages, two hundred times faster and at a fraction of the cost. This solution enables TTD teams to standardize on English for contextual targeting ML models. TTD can now be a one-stop shop for their customers' global advertising needs.

The Trade Desk is headquartered in Ventura, California. It is the largest independent demand-side platform in the world, competing against Google, Facebook, and others. Unlike traditional marketing, programmatic marketing is operated by real-time, split-second decisions based on user identity, device information, and other data points. It enables highly personalized consumer experiences and improves return-on-investment for companies and advertisers.

Talk by: Xuefu Wang and Mark Lee

PII Detection at Scale on the Lakehouse

2023-07-25

SEEK is Australia’s largest online employment marketplace and a market leader spanning ten countries across Asia Pacific and Latin America. SEEK provides employment opportunities for roughly 16 million monthly active users and processes 25 million candidate applications to listings. Processing millions of resumes involves handling and managing highly sensitive candidate information, usually entered in a highly unstructured format. With recent high-profile data leaks in Australia, personally identifiable information (PII) protection has become a major focus area for large digital organizations.

The first step is detection, and SEEK has developed a custom framework built using Hugging Face transformers fine-tuned with nuances around employment. For example, “Software Engineer at Databricks” is not PII, but “CEO at Databricks” is PII. After identifying and anonymizing PII in stream and batch data, SEEK uses Unity Catalog’s data lineage to track PII through its reporting, ETL, and other downstream ML use cases and to govern access control, achieving an organization-wide data management capability driven by deep learning and enforced using Databricks.
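The detect-then-anonymize step can be illustrated with a deliberately simple stand-in. SEEK's framework uses fine-tuned transformer models; the sketch below swaps in two regular expressions so that it runs with the standard library alone. The patterns, labels and function names are assumptions for this example, and a regex approach cannot make contextual judgments such as the “CEO at Databricks” case.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Assumed, simplified patterns; real PII detection needs far broader coverage.
PATTERNS: Dict[str, Pattern[str]] = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def detect_pii(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) spans for every pattern match, ordered by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def anonymize(text: str) -> str:
    """Replace each detected span with a [LABEL] placeholder, skipping overlaps."""
    pieces = []
    cursor = 0
    for start, end, label in detect_pii(text):
        if start < cursor:
            continue  # span overlaps one that was already replaced
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


resume_line = "Contact jane.doe@example.com or +61 400 123 456 for details."
print(anonymize(resume_line))
```

Keeping detection (`detect_pii`) separate from replacement (`anonymize`) means the detector could later be swapped for a model-based one without changing the anonymization step.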

Talk by: Ajmal Aziz and Rachael Straiton

What’s New with Data Sharing and Collaboration on the Lakehouse: From Delta Sharing to Clean Rooms

2023-07-25

Get ready to accelerate your data and AI collaboration game with the Databricks product team. Join us as we build the next generation of secure data collaboration capabilities on the lakehouse. Whether you're just starting your data sharing journey or exploring advanced data collaboration features like data cleanrooms, this session is tailor-made for you.

In this demo-packed session, you'll discover what’s new in Delta Sharing, including dynamic and materialized views for sharing, sharing of other assets such as notebooks and ML models, new Delta Sharing open source connectors for the tools of your choice, and updates to Databricks cleanroom. Learn how the lakehouse is the perfect solution for your data and AI collaboration requirements, across clouds, regions and platforms and without any vendor lock-in. Plus, you'll get a peek into our upcoming roadmap. Ask any burning questions you have for our expert product team as they build a collaborative lakehouse for data, analytics and AI.

Talk by: Erika Ehrli, Kelly Albano, and Xiaotong Sun

Building & Managing a Data Platform for a Delta Lake Exceeding 13PB & 1000s of Users: AT&T's Story

2023-07-25 Watch
video

Data runs AT&T’s business, just like it runs most businesses these days. Data can lead to a greater understanding of a business and, when translated correctly into information, can give people and business systems valuable insights for making better decisions. Unique to AT&T are the volume of data we support, how much of our work is driven by AI, and the scale at which data and AI drive value for our customers and stakeholders.

Our cloud migration journey includes making data and AI more accessible to employees throughout AT&T so they can apply their deep business expertise to data more easily and rapidly. We have always had to balance this data democratization and desire for speed with keeping our data private and secure. We loved the open ecosystem model of the Lakehouse, which enables data, BI and ML tools to be seamlessly integrated in a single-pane arena; it simplifies the architecture and reduces dependencies between technologies in the cloud. Being clear in our architecture guidelines and patterns was very important to our success.

We are seeing more interest from our business unit partners and are continuing to build out AI as a service to support more citizen data scientists. To scale up our Lakehouse journey, we built a Databricks center of excellence (CoE) function at AT&T, which today has more than 1,400 active members. It further concentrates existing expertise and resources in the ML/AI discipline to collaborate on all things Databricks, such as technical support, training, FAQs and best practices, to attain and sustain world-class performance and drive business value for AT&T. Join us to learn more about how we process and manage over 10 petabytes of our network Lakehouse with Delta Lake and Databricks.

Build Your Data Lakehouse with a Modern Data Stack on Databricks

2023-07-25 Watch
video
Pearl Ubaru (Databricks), Ari Kaplan (Databricks)

Are you looking for an introduction to the Lakehouse and what the related technology is all about? This session is for you. It explains the value that lakehouses bring to the table, using examples of companies that are actually modernizing their data, with demos throughout. The data lakehouse is the future for modern data teams that want to simplify data workloads, ease collaboration, and maintain the flexibility and openness to stay agile as a company scales.

Come to this session and learn about the full stack, including data engineering, data warehousing in a lakehouse, data streaming, governance, and data science and AI. Learn how you can create modern data solutions of your own.

Talk by: Ari Kaplan and Pearl Ubaru

How to Build LLMs on Your Company’s Data While on a Budget

2023-07-25 Watch
video
Sean Owen (Databricks)

Large Language Models (LLMs) are taking AI mainstream across companies and individuals. However, public LLMs are trained on general-purpose data. They do not include your own corporate data, and how they were trained is a black box. Because terminology differs across healthcare, financial, retail, digital-native and other industries, companies today are looking for industry-specific LLMs that better understand the terminology, context and knowledge relevant to their needs. In contrast to closed LLMs, open source-based models can be used commercially or customized to suit an enterprise’s needs on its own data. Learn how Databricks makes it easy for you to build, tune and use custom models, including a deep dive into Dolly, the first open source, instruction-following LLM fine-tuned on a human-generated instruction dataset licensed for research and commercial use.

In this session, you will:

  • See a real-life demo of creating your own LLMs specific to your industry
  • Learn how to securely train on your own documents if needed
  • Learn how Databricks makes it quick, scalable and inexpensive
  • Deep dive into Dolly and its applications

Talk by: Sean Owen
