talk-data.com talk-data.com

Topic

Databricks

big_data analytics spark

1286

tagged

Activity Trend

515 peak/qtr
2020-Q1 2026-Q1

Activities

1286 activities · Newest first

Security Best Practices and Tools to Build a Secure Lakehouse

To learn more, visit the Databricks Security and Trust Center: https://www.databricks.com/trust

As you embark on a lakehouse project or evolve your existing data lake, you may want to improve your security posture and take advantage of new security features—there may even be a security team at your company that demands it. Databricks has worked with thousands of customers to securely deploy the Databricks Platform to meet their architecture and security requirements. While many organizations deploy security differently, we have found a common set of guidelines and features among organizations that require a high level of security. In this session, we will detail the security features and architectural choices frequently used by these organizations and walk through a series of threat models for the risks that most concern security teams. While this session is great for people who already know Databricks—don’t worry—that knowledge isn’t required. You will walk away with a full handbook detailing all the concepts, configurations, check lists, security analysis tool (SAT), and security reference architecture (SRA) automation scripts from the session so that you can make immediate progress when you get back to the office. Security can be hard, but we’ve collected the hard work already done by some of the best in the industry, and built tools, to make it easier. Come learn how. See how good looks like via a demo.

Talk by: Arun Pamulapati and Anindita Mahapatra

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: AccuWeather | Weather Matters: How to Harness its Power to Improve Your Bottom Line

AccuWeather provides the world's most sophisticated weather intelligence to make lives and businesses simpler, safer, better, and more-informed. To achieve AccuWeather's core mission and obtain its technical and business ambitions, AccuWeather uses Databricks to support its next-generation forecasting engine and historical database, enabling their teams of scientists and engineers to develop accurate, scalable weather data solutions. Businesses can take advantage and truly harness the power of AccuWeather's data suite by easily integrating weather information into their own decision support systems and analytics to effectively improve their bottom line.

Talk by: Timothy Loftus and Eric Michielli

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: Infosys | Topaz AI First Innovations

Insights into Infosys' Topaz AI First Innovations including AI-enabled Analytics and AI-enabled Automation to help clients in significant cost savings, improved efficiency and customer experience across industry segments.

Talk by: Neeraj Dixit

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: KPMG | Multicloud Enterprise Delta Sharing & Governance using Unity Catalog @ S&P Global

Cloud technologies have revolutionized global data access across a number of industries. However, many enterprise organizations face challenges in adopting these technologies effectively, as comprehensive cloud data governance strategies and solutions are complex and evolving – particularly in hybrid or multicloud scenarios involving multiple third parties. KPMG and S&P Global have harnessed the power of Databricks Lakehouse to create a novel approach.

By integrating Unity Catalogue, Delta Sharing, and the KPMG Modern Data Platform, S&P Global has enabled scalable, transformative cross-enterprise data sharing and governance. This demonstration highlights a collaboration between S&P Global Sustainable1 (S1) ESG program and the KPMG ESG Analytics Accelerators to enable large-scale SFDR ESG portfolio analytics. Join us to discover our solution that drives transformative change, fosters data-driven decision-making, and bolsters sustainability efforts in a wide range of industries.

Talk by: Niels Hanson,Dennis Tally

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: Matillion | Using Matillion to Boost Productivity w/ Lakehouse and your Full Data Stack

In this presentation, Matillion’s Sarah Pollitt, Group Product Manager for ETL, will discuss how you can use Matillion to load data from popular data sources such as Salesforce, SAP, and over a hundred out-of-the-box connectors into your data lakehouse. You can quickly transform this data using powerful tools like Matillion or dbt, or your own custom notebooks, to derive valuable insights. She will also explore how you can run streaming pipelines to ensure real-time data processing, and how you can extract and manage this data using popular governance tools such as Alation or Collibra, ensuring compliance and data quality. Finally, Sarah will showcase how you can seamlessly integrate this data into your analytics tools of choice, such as Thoughtspot, PowerBI, or any other analytics tool that fits your organization's needs.

Talk by: Rick Wear

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: Snowplow | Revolutionize Your Customer Engagement Strategy w/ First-Party Customer Data

In today's highly competitive market, personalized experiences are the key to winning customer engagement and loyalty. But how can you deliver these experiences at scale? The answer lies in a single unified view of your customers, powered by rich first-party customer data. With complete 360 visibility into your customer's journey, you can predict their next best action and deliver the most relevant experience based on their unique needs and behaviors.

Join this session to learn how to unlock the full potential of your first-party customer data by empowering your data team to collaborate seamlessly with your marketing team by removing technology barriers. Learn how to create a data-driven next-best action (NBA) strategy by building solutions that will set you apart in the competitive landscape and captivate your customers at every touchpoint. In this session, you'll discover: - The critical importance of personalized experiences in today's hyper-competitive market Proven strategies for building a data-driven NBA approach that drives results - See a live demo of how Snowplow and Databricks can be combined to produce powerful ML models for NBA revolutionizing your customer data strategy - Best practices for fostering strong collaboration between marketing and data teams to achieve business outcomes and deliver next-gen customer experiences

Don't miss out on this opportunity to unlock the full potential of your first-party customer data and revolutionize your customer engagement strategy.

Talk by: Yali Sassoon

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsor: Sigma Computing | Using Sigma Input Tables Helps Databricks Data Science & ML Apps

In this talk, we will dive into the powerful analytics combination of Databricks and Sigma for data science and machine learning use cases. Databricks offers a scalable and flexible platform for building and deploying machine learning models at scale, while Sigma enhances this framework with real-time data insights, analysis, and visualization capabilities. Going a step further, we will demonstrate how input tables can be utilized from Sigma to create seamless workflows in Databricks for data science and machine learning. From this workflow, business users can leverage data science and ML models to do ad-hoc analysis and make data-driven decisions.

This talk is perfect for data scientists, data analysts, business users, and anyone interested in harnessing the power of Databricks and Sigma to drive business value. Join us and discover how these two platforms can revolutionize the way you analyze and leverage your data.

Talk by: Mitch Ertle and Greg Owen

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Using Open Source Tools to Build Privacy-Conscious Data Systems

With the rapid proliferation of consumer data privacy laws across the world, it is becoming a strict requirement for data organizations to be mindful of data privacy risks. Privacy violation fines are reaching record highs and will only get higher as governments continue to crack down on the runaway abuse of user data. To continue producing value without becoming a liability, data systems must include privacy protections at a foundational level.

The most practical way to do this is to enable privacy as code, shifting privacy left and including it as a foundational part of the organization's software development life cycle. The promise of privacy as code is that data organizations can be liberated from inefficient, manual workflows for producing the compliance deliverables their legal teams need, and instead ship at speed with pre-defined privacy guardrails built into the structure of their preferred workflows.

Despite being an emerging and complex problem, there are already powerful open source tools available designed to help organizations of all sizes achieve this outcome. Fides is an open source privacy as code tool, written in Python and Typescript, that is engineered to tackle a variety of privacy problems throughout the application lifecycle. The most relevant feature for data organizations is the ability to annotate systems and their datasets with data privacy metadata, thus enabling automatic rejection of dangerous or illegal uses. Fides empowers data organizations to be proactive, not reactive, in terms of protecting user privacy and reducing organizational risk. Moving forward data privacy will need to be top of mind for data teams.

Talk by: Thomas La Piana

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Vector Data Lakes

Vector databases such as ElasticSearch and Pinecone offer fast ingestion and querying on vector embeddings with ANNs. However, they typically do not decouple compute and storage, making them hard to integrate in production data stacks. Because data storage in these databases is expensive and not easily accessible, data teams typically maintain ETL pipelines to offload historical embedding data to blob stores. When that data needs to be queried, they get loaded back into the vector database in another ETL process. This is reminiscent of loading data from OLTP database to cloud storage, then loading said data into an OLAP warehouse for offline analytics.

Recently, “lakehouse” offerings allow direct OLAP querying on cloud storage, removing the need for the second ETL step. The same could be done for embedding data. While embedding storage in blob stores cannot satisfy the high TPS requirements in online settings, we argue it’s sufficient for offline analytics use cases like slicing and dicing data based on embedding clusters. Instead of loading the embedding data back into the vector database for offline analytics, we propose direct processing on embeddings stored in Parquet files in Delta Lake. You will see that offline embedding workloads typically touch a large portion of the stored embeddings without the need for random access.

As a result, the workload is entirely bound by network throughput instead of latency, making it quite suitable for blob storage backends. On a test one billion vector dataset, ETL into cloud storage takes around one hour on a dedicated GPU instance, while batched nearest neighbor search can be done in under one minute with four CPU instances. We believe future “lakehouses” will ship with native support for these embedding workloads.

Talk by: Tony Wang and Chang She

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic QuadrantTM CDBMS: https://dbricks.co/3phw20d

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Building the Multi-Modal Future: Open Ecosystems and Data at Play

Join Nathan as he explores Stability AI's latest advancements in open source generative AI, focused on building the multimodal information infrastructure of the future. Get an insider's perspective on our recently released model, Stable Diffusion XL v0.9, and Stability's behind-the-scenes efforts. Discover how advancements in open-source generative AI models enable efficient development of multimodal AI systems, and learn how researchers worldwide are customizing these models and leveraging unique datasets. 

Nathan will discuss the dynamic interplay between open-source models and enterprise AI adoption, resulting in efficient, tailored solutions. At Stability AI, our focus is on unlocking the inherent value and competitive advantage found in unique data and AI ownership. Combining open-source models with proprietary data assets creates a strategic advantage for enterprises.

Despite the growing AI trend, the need for human judgment and creativity remains pivotal. At Stability AI, our goal is to augment rather than replace human capabilities using AI collaboration and co-creation. Join us in shaping a collaborative generative future.

Talk by: Nathan Lile

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic QuadrantTM CDBMS: https://dbricks.co/3phw20d

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Delta Lake AMA

Have some great questions about Delta Lake?  Well, come by and ask the experts your questions!

Talk by: Bart Samwel, Tathagata Das, Robert Pack, and Allison Portis

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Discuss How LLMs Will Change the Way We Work

Will LLMs change the way we work?  Ask questions from a panel of LLM and AI experts on what problems LLMs will solve and its potential new challenges

Talk by: Ben Harvey, Jan van der Vegt, Ankit Mathur, Debu Sinha, and Sean Owen

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Foundation Models in the Modern Data Stack

As Foundation Models (FMs) continue to grow in size, innovations continue to push the boundaries of what these models can do on language and image tasks. This talk will describe our work on applying FMs to structured data tasks like data linkage, cleaning and querying. We will then discuss challenges and solutions that these models present for production deployment in the modern data stack.

Talk by: Ines Chami

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Going Beyond SQL: Python UDFs in Unity Catalog for All Your Lakehouse

While SQL is powerful, it does have some limits. Fear not, this lightning talk introduces user-defined functions (UDFs) written in Python, managed and governed in Databricks Unity Catalog, and usable across the Lakehouse. This covers the basics from how to create and govern UDFs to more advanced topics including networking, observability and provide a glimpse of how it works under the hood. After this session, you will be equipped to take SQL and the Lakehouse to the next level using Python UDFs.

Talk by: Jakob Mund

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

How Comcast Effectv Drives Data Observability with Databricks and Monte Carlo

Comcast Effectv, the 2,000-employee advertising wing of Comcast, America’s largest telecommunications company, provides custom video ad solutions powered by aggregated viewership data. As a global technology and media company connecting millions of customers to personalized experiences and processing billions of transactions, Comcast Effectv was challenged with handling massive loads of data, monitoring hundreds of data pipelines, and managing timely coordination across data teams.

In this session, we will discuss Comcast Effectv’s journey to building a more scalable, reliable lakehouse and driving data observability at scale with Monte Carlo. This has enabled Effectv to have a single pane of glass view of their entire data environment to ensure consumer data trust across their entire AWS, Databricks, and Looker environment.

Talk by: Scott Lerner and Robinson Creighton

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

How to Create and Manage a High-Performance Analytics Team

Data science and analytics teams are unique. Large and small corporations want to build and manage analytics teams to convert their data and analytic assets into revenue and competitive advantage, but many are failing before they make their first hire. In this session, the audience will learn how to structure, hire, manage and grow an analytics team. Organizational structure, project and program portfolios, neurodiversity, developing talent, and more will be discussed.

Questions and discussion will be encouraged and engaged in. The audience will leave with a deeper understanding of how to succeed in turning data and analytics into tangible results.

Talk by: John Thompson

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

How We Made a Unified Talent Solution Using Databricks Machine Learning, Fine-Tuned LLM & Dolly 2.0

Using Databricks, we built a “Unified Talent Solution” backed by a robust data and AI engine for analyzing skills of a combined pool of permanent employees, contractors, part-time employees and vendors, inferring skill gaps, future trends and recommended priority areas to bridge talent gaps, which ultimately greatly improved operational efficiency, transparency, commercial model, and talent experience of our client. We leveraged a variety of ML algorithms such as boosting, neural networks and NLP transformers to provide better AI-driven insights.

One inevitable part of developing these models within a typical DS workflow is iteration. Databricks' end-to-end ML/DS workflow service, MLflow, helped streamline this process by organizing them into experiments that tracked the data used for training/testing, model artifacts, lineage and the corresponding results/metrics. For checking the health of our models using drift detection, bias and explainability techniques, MLflow's deploying, and monitoring services were leveraged extensively.

Our solution built on Databricks platform, simplified ML by defining a data-centric workflow that unified best practices from DevOps, DataOps, and ModelOps. Databricks Feature Store allowed us to productionize our models and features jointly. Insights were done with visually appealing charts and graphs using PowerBI, plotly, matplotlib, that answer business questions most relevant to clients. We built our own advanced custom analytics platform on top of delta lake as Delta’s ACID guarantees allows us to build a real-time reporting app that displays consistent and reliable data - React (for front-end), Structured Streaming for ingesting data from Delta table with live query analytics on real time data ML predictions based on analytics data.

Talk by: Nitu Nivedita

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

LLM in Practice: How to Productionize Your LLMs

Ask questions from a panel of data science experts who have deployed LLMs and AI models into production.

Talk by: David Talby, Conor Murphy, Cheng Yin Eng, Sam Raymond, and Colton Peltier

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

PaLM 2: A Smaller, Faster and More Capable LLM

PaLM 2 is a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction.

PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.

Talk by: Andy Dai

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Perplexity: A Copilot for All Your Web Searches and Research

In this demo, we will show you the fastest and functional answer engine and search copilot that exists right now: Perplexity.ai. It can solve a wide array of problems starting from giving you fast answers to any topic to planning trips and doing market research on things unfamiliar to you, all in a trustworthy way without hallucinations, providing you references in the form of citations. This is made possible by harnessing the power of LLMs along with retrieval augmented generation from traditional search engines and indexes.

We will also show you how information discovery can now be fully personalized to you: personalization through prompt engineering. Finally, we will see use cases of how this search copilot can help you in your day to day tasks in a data team: be it a data engineer, data scientist, or a data analyst.

Talk by: Aravind Srinivas

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc