talk-data.com

Topic: Databricks (big data, analytics, Spark), 561 tagged activities
Activity trend: peak of 515 activities per quarter, 2020-Q1 to 2026-Q1

Activities (filtered by: Databricks Data + AI Summit 2023)
Photon for Dummies: How Does this New Execution Engine Actually Work?

Did you finish the Photon whitepaper and think, wait, what? I know I did; it’s my job to understand it, explain it, and then use it. If your role involves using Apache Spark™ on Databricks, then you need to know about Photon and where to use it. Join me, chief dummy, nay "supreme" dummy, as I break down this whitepaper into easy-to-understand explanations that don’t require a computer science degree. Together we will unravel mysteries such as:

  • Why is the Java Virtual Machine the current bottleneck for Spark enhancements?
  • What does "vectorized" even mean? And how was it done before?
  • Why is the relationship status between Spark and Photon "complicated"?

In this session, we’ll start with the basics of Apache Spark, the details we pretend to know, and where those performance cracks are starting to show through. Only then will we look at Photon: how it’s different, where the clever design choices are, and how you can make the most of it in your own workloads. I’ve spent over 50 hours going over the paper in excruciating detail, every reference and, in some instances, the references of the references, so that you don’t have to.
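As a taste of what "vectorized" means in practice, here is a minimal pure-Python sketch (this is not Photon or Spark code, just the core idea): a row-at-a-time interpreter pays per-value dispatch overhead on every row, while a vectorized engine evaluates a whole column batch in one tight loop over a columnar layout.

```python
# Illustrative sketch of row-at-a-time vs. vectorized (columnar) execution.
# NOT Photon code -- just the underlying idea in plain Python.

def eval_row_at_a_time(rows, expr):
    """Interpret the expression once per row (per-value dispatch overhead)."""
    out = []
    for row in rows:
        out.append(expr(row))  # one interpreter "call" per value
    return out

def eval_vectorized(column_a, column_b):
    """Evaluate a whole column batch in one tight loop (amortized overhead)."""
    return [a + b for a, b in zip(column_a, column_b)]

rows = [{"a": i, "b": i * 2} for i in range(5)]
col_a = [r["a"] for r in rows]  # columnar layout: one list per column
col_b = [r["b"] for r in rows]

assert eval_row_at_a_time(rows, lambda r: r["a"] + r["b"]) == eval_vectorized(col_a, col_b)
print(eval_vectorized(col_a, col_b))  # [0, 3, 6, 9, 12]
```

Real vectorized engines add SIMD instructions and cache-friendly memory layouts on top of this loop structure, which is where the large speedups come from.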

Talk by: Holly Smith

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Top Mistakes to Avoid in Streaming Applications

Are you a data engineer seeking to enhance the performance of your streaming applications? Join our session where we will share valuable insights and best practices gained from handling diverse customer streaming use cases using Apache Spark™ Structured Streaming.

In this session, we will delve into the common pitfalls that can hinder your streaming workflows. Learn practical tips and techniques to overcome these challenges during different stages of application development. By avoiding these errors, you can unlock faster performance, improved data reliability, and smoother data processing.
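One classic pitfall of the kind such sessions cover is unbounded state growth in stateful aggregations. The toy simulation below is plain Python, not Structured Streaming code (the window size, event times, and watermark delay are invented for illustration): it shows why evicting state for windows older than a watermark keeps memory bounded on an unbounded stream.

```python
# Toy illustration of the "unbounded state" streaming pitfall.
# We count events per tumbling event-time window and evict windows that
# fall behind the watermark -- without eviction, state grows forever.

def process_stream(events, watermark_delay):
    """events: iterable of (event_time, key) pairs; returns final state."""
    state = {}  # window_start -> count
    max_event_time = 0
    for event_time, _key in events:
        max_event_time = max(max_event_time, event_time)
        window = event_time - (event_time % 10)  # 10-unit tumbling windows
        state[window] = state.get(window, 0) + 1
        watermark = max_event_time - watermark_delay
        # Evict windows that can no longer receive late data.
        for w in [w for w in state if w + 10 <= watermark]:
            del state[w]
    return state

events = [(t, "k") for t in range(100)]
bounded = process_stream(events, watermark_delay=20)
print(sorted(bounded))  # [70, 80, 90] -- only recent windows survive
```

Forgetting the eviction step (or, in Spark terms, omitting the watermark on a streaming aggregation) is exactly the kind of error that degrades performance and reliability over time.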

Don't miss out on this opportunity to level up your streaming skills and excel in your data engineering journey. Join us to gain valuable knowledge and practical techniques that will empower you to optimize your streaming applications and drive exceptional results.

Talk by: Vikas Reddy Aravabhumi

The English SDK for Apache Spark™

In the fast-paced world of data science and AI, we will explore how large language models (LLMs) can elevate the development process of Apache Spark applications.

We'll demonstrate how LLMs can simplify SQL query creation, data ingestion, and DataFrame transformations, leading to faster development and clearer code that's easier to review and understand. We'll also show how LLMs can assist in creating visualizations and clarifying data insights, making complex data easy to understand.

Furthermore, we'll discuss how LLMs can be used to create user-defined data sources and functions, offering a higher level of adaptability in Apache Spark applications.

Our session, filled with practical examples, highlights the innovative role of LLMs in the realm of Apache Spark development. We invite you to join us in this exploration of how these advanced language models can drive innovation and boost efficiency in the sphere of data science and AI.

Talk by: Gengliang Wang and Allison Wang

Learn How to Reliably Monitor Your Data and Model Quality in the Lakehouse

Developing and maintaining production data engineering and machine learning pipelines is challenging for many data teams. Even more challenging is monitoring the quality of your data and models once they go into production. Building on untrustworthy data can cause many complications for data teams. Without a monitoring service, it is hard to proactively discover when your ML models degrade over time, or to identify the root causes. Furthermore, without lineage tracking, it is even more painful to debug errors in your models and data. Databricks Lakehouse Monitoring offers a unified service to monitor the quality of all your data and ML assets.

In this session, you’ll learn how to:

  • Use one unified tool to monitor the quality of any data product: data or AI 
  • Quickly diagnose errors in your data products with root cause analysis
  • Set up a monitor with low friction, requiring only a button click or a single API call to start and automatically generate out-of-the-box metrics
  • Enable self-serve experiences for data analysts by providing reliability status for every data asset
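To make "out-of-the-box metrics" concrete, here is a sketch of one kind of statistic such a service might compute, in plain Python. This is not the Databricks Lakehouse Monitoring API; the bin edges and samples are invented. It compares a baseline column against a production window with a Population-Stability-Index-style drift score.

```python
import math

# Illustrative drift metric (PSI style): a high score means the current
# data's distribution has shifted away from the baseline's.

def bin_frequencies(values, edges):
    """Fraction of values falling into each [edges[i], edges[i+1]) bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return [c / len(values) for c in counts]

def psi(baseline, current, edges, eps=1e-6):
    """Sum of (q - p) * ln(q / p) over bins; eps guards empty bins."""
    p = bin_frequencies(baseline, edges)
    q = bin_frequencies(current, edges)
    return sum((qi - pi) * math.log((qi + eps) / (pi + eps))
               for pi, qi in zip(p, q))

edges = [0, 25, 50, 75, 100]
baseline = list(range(100))          # uniform over [0, 100)
shifted = list(range(50, 100)) * 2   # mass concentrated in the upper half
print(round(psi(baseline, baseline, edges), 6))  # 0.0 (no drift)
print(psi(baseline, shifted, edges) > 0.5)       # True (strong drift flagged)
```

A monitoring service computes metrics like this on a schedule per table and column, then alerts when scores cross a threshold, which is what enables proactive detection of model and data degradation.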

Talk by: Kasey Uhlenhuth and Alkis Polyzotis

Advancements in Open Source LLM Tooling, Including MLflow

MLflow is one of the most widely used open source machine learning frameworks, with over 13 million monthly downloads. With the recent advancements in generative AI, MLflow has been rapidly integrating support for popular AI tools such as Hugging Face, LangChain, and OpenAI. This means it’s becoming easier than ever to build AI pipelines with your data as the foundation while expanding your capabilities with the incredible advancements of the AI community.

Come to this session to learn how MLflow can help you:

  • Easily grab open source models from Hugging Face and use Transformers pipelines in MLflow
  • Integrate LangChain for more advanced services and to add context into your model pipelines
  • Bring in OpenAI APIs as part of your pipelines
  • Quickly track and deploy models on the lakehouse using MLflow

Talk by: Corey Zumar and Ben Wilson

What’s New in Databricks Workflows -- With Live Demos

Databricks Workflows provides unified orchestration for the Lakehouse. Since it was first announced last year, thousands of organizations have been leveraging Workflows for orchestrating lakehouse workloads such as ETL, BI dashboard refresh and ML model training.

In this session, the Workflows product team will cover and demo the latest features and capabilities of Databricks Workflows in the areas of workflow authoring, observability and more. This session will also include an outlook for future innovations you can expect to see in the coming months.

Talk by: Muhammad Bilal Aslam

Databricks Asset Bundles: A Standard, Unified Approach to Deploying Data Products on Databricks

In this session, we will introduce Databricks Asset Bundles, demonstrate how they work for a variety of data products, and show how to fit them into an overall CI/CD strategy for the well-architected Lakehouse.

Data teams produce a variety of assets: datasets, reports and dashboards, ML models, and business applications. These assets depend upon code (notebooks, repos, queries, pipelines), infrastructure (clusters, SQL warehouses, serverless endpoints), and supporting services and resources like Unity Catalog, Databricks Workflows, and DBSQL dashboards. Today, each organization must figure out its own deployment strategy for the data products it builds on Databricks, as there is no consistent way to describe the infrastructure and services associated with project code.

Databricks Asset Bundles is a new capability on Databricks that standardizes and unifies the deployment strategy for all data products developed on the platform. It allows developers to describe the infrastructure and resources of their project through a YAML configuration file, regardless of whether they are producing a report, dashboard, online ML model, or Delta Live Tables pipeline. Behind the scenes, these configuration files use Terraform to manage resources in a Databricks workspace, but knowledge of Terraform is not required to use Databricks Asset Bundles.
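As an illustration, a minimal databricks.yml for a bundle might look roughly like the sketch below. The resource names, paths, and host are made-up placeholders; consult the Databricks Asset Bundles documentation for the authoritative schema.

```yaml
# Hypothetical databricks.yml sketch -- names, paths, and hosts below
# are illustrative placeholders, not a canonical example.
bundle:
  name: my_data_product

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/etl.py

targets:
  dev:
    default: true
    workspace:
      host: https://my-workspace.cloud.databricks.com
```

Running `databricks bundle deploy` against a file like this then creates or updates the declared resources in the target workspace, which is what lets the same project definition move cleanly through dev, staging, and production.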

Talk by: Rafi Kurlansik and Pieter Noordhuis

A Technical Deep Dive into Unity Catalog's Practitioner Playbook

Get ready to take a deep dive into Unity Catalog and explore how it can simplify data, analytics and AI governance across multiple clouds. In this session, the expert Databricks team will guide you through a hands-on demo, showcasing the latest features and best practices for data governance. You'll learn how to master Unity Catalog and gain a practical understanding of how it can streamline your analytics and AI initiatives. Whether you're migrating from Hive Metastore or just looking to expand your knowledge of Unity Catalog, this session is for you.

Talk by: Zeashan Pappa and Ifigeneia Derekli

Introduction to Data Engineering on the Lakehouse

Data engineering is a requirement for any data, analytics or AI workload. With the increased complexity of data pipelines, the need to handle real-time streaming data and the challenges of orchestrating reliable pipelines, data engineers require the best tools to help them achieve their goals. The Databricks Lakehouse Platform offers a unified platform to ingest, transform and orchestrate data and simplifies the task of building reliable ETL pipelines.

This session will provide an introductory overview of the end-to-end data engineering capabilities of the platform, including Delta Live Tables and Databricks Workflows. We’ll see how these capabilities come together to provide a complete data engineering solution and how they are used in the real world by organizations leveraging the lakehouse to turn raw data into insights.

Talk by: Jibreal Hamenoo and Ori Zohar

Introduction to Data Streaming on the Lakehouse

Streaming is the future of all data pipelines and applications. It enables businesses to make data-driven decisions sooner and react faster, develop data-driven applications previously considered impossible, and deliver new and differentiated experiences to customers. However, many organizations have not realized the full promise of streaming because it requires them to completely redevelop their data pipelines and applications on new, complex, proprietary, and disjointed technology stacks.

The Databricks Lakehouse Platform is a simple, unified, and open platform that supports all streaming workloads, from ingestion and ETL to event processing, event-driven applications, and ML inference. In this session, we will discuss the streaming capabilities of the Databricks Lakehouse Platform and demonstrate how easy it is to build end-to-end, scalable streaming pipelines and applications that fulfill the promise of streaming for your business.

Talk by: Zoe Durand and Yue Zhang

LLMOps: Everything You Need to Know to Manage LLMs

With the recent surge in popularity of ChatGPT and other LLMs such as Dolly, many people are going to start training, tuning, and deploying their own custom models to solve their domain-specific challenges. When training and tuning these models, there are certain considerations in the MLOps process that differ from traditional machine learning. Watch this session to gain a better understanding of what to look out for as you enter the world of applying LLMs in your domain.

In this session, you’ll learn about:

  • Grabbing foundational models and fine-tuning them
  • Optimizing resource management such as GPUs
  • Integrating human feedback and reinforcement learning to improve model performance
  • Different evaluation methods for LLMs
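As one concrete example of the last bullet, here is a tiny exact-match evaluation loop in plain Python. The model function and evaluation set are made up purely for illustration; a real LLMOps evaluation harness would add many more metrics (similarity scoring, LLM-as-judge, toxicity checks, and so on).

```python
# Minimal sketch of an exact-match evaluation harness for an LLM.
# `model` is a stand-in callable; in practice it would wrap a real
# fine-tuned model or serving endpoint.

def exact_match_eval(model, dataset):
    """dataset: list of (prompt, expected_answer); returns accuracy in [0, 1]."""
    correct = 0
    for prompt, expected in dataset:
        prediction = model(prompt).strip().lower()
        if prediction == expected.strip().lower():
            correct += 1
    return correct / len(dataset)

# Hypothetical toy model and eval set, purely for illustration.
def toy_model(prompt):
    return {"capital of france?": "Paris"}.get(prompt.lower(), "unknown")

dataset = [("Capital of France?", "paris"), ("Capital of Mars?", "olympus")]
print(exact_match_eval(toy_model, dataset))  # 0.5
```

Running a harness like this before and after fine-tuning, and on every new model version, is the LLM analogue of the regression tests traditional MLOps already applies to tabular models.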

Talk by: Joseph Bradley and Eric Peter

Delta Live Tables A to Z: Best Practices for Modern Data Pipelines

Join Databricks' Distinguished Principal Engineer Michael Armbrust for a technical deep dive into how Delta Live Tables (DLT) reduces the complexity of data transformation and ETL. Learn what’s new, what’s coming, and how to master the ins and outs of DLT.

Michael will describe and demonstrate:

  • What’s new in Delta Live Tables (DLT) - Enzyme, Enhanced Autoscaling, and more
  • How to easily create and maintain your DLT pipelines
  • How to monitor pipeline operations
  • How to optimize data for analytics and ML
  • A sneak peek at the DLT roadmap

Talk by: Michael Armbrust

What’s New in Unity Catalog -- With Live Demos

Join the Unity Catalog product team and dive into the cutting-edge world of data, analytics and AI governance. With Unity Catalog’s unified governance solution for data, analytics, and AI on any cloud, you’ll discover the latest and greatest enhancements we’re shipping, including fine-grained governance with row- and column-level filtering, enhancements to automated data lineage, and governance for ML assets.
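For a flavor of the row-level filtering mentioned above, Unity Catalog lets you attach a SQL UDF to a table as a row filter. The sketch below is modeled on the documented syntax, but the function, table, column, and group names are hypothetical:

```sql
-- Hypothetical names throughout; pattern follows the Unity Catalog
-- row-filter feature: admins see every row, others see only US rows.
CREATE FUNCTION us_only(region STRING)
RETURN IF(is_account_group_member('admins'), TRUE, region = 'US');

ALTER TABLE sales SET ROW FILTER us_only ON (region);
```

Column masks work analogously, substituting a masked value for a sensitive column based on the caller's group membership.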

In this demo-packed session, you’ll learn how new capabilities in Unity Catalog can further simplify your data governance and accelerate your analytics and AI initiatives. Plus, get an exclusive sneak peek at our upcoming roadmap. And don’t forget, you’ll have the chance to ask the product teams themselves any burning questions you have about the best governance solution for the lakehouse. Don’t miss out on this exciting opportunity to level up your data game with Unity Catalog.

Talk by: Paul Roome

Live from the Lakehouse: AI governance, Unity Catalog, Ethics in AI, and Industry Perspectives

Hear from three guests. First, Matei Zaharia (co-founder and Chief Technologist, Databricks) on AI governance and Unity Catalog. Second guest, Scott Starbird (General Counsel, Public Affairs and Strategic Partnerships, Databricks) on ethics in AI. Third guest, Bryan Saftler (Industry Solutions Marketing Director, Databricks) on industry perspectives and solution accelerators. Hosted by Ari Kaplan (Head of Evangelism, Databricks) and Pearl Ubaru (Sr Technical Marketing Engineer, Databricks).

Live from the Lakehouse: Data sharing, Databricks marketplace, and Fivetran & cloud data platforms

Hear from two guests. First, Zaheera Valani (Sr Director, Engineering at Databricks) on data sharing and Databricks Marketplace. Second guest, Taylor Brown (COO and co-founder, Fivetran), discusses cloud data platforms, automating data ingestion from thousands of disparate sources, and how Fivetran and Databricks partner. Hosted by Holly Smith (Sr Resident Solutions Architect, Databricks) and Jimmy Obeyeni (Strategic Account Executive, Databricks).

Live from the Lakehouse: Day 1 wrap-up with Ari Kaplan & Pearl Ubaru, & interviews with attendees

A Day 1 wrap-up of all the exciting happenings at the Data & AI Summit by Databricks; hear directly from a variety of attendees on their thoughts on the day. Hosted by Ari Kaplan (Head of Evangelism, Databricks) and Pearl Ubaru (Sr Technical Marketing Engineer, Databricks).

Live from the Lakehouse: Day 2 pre-show sideline reporting, from the Data & AI Summit by Databricks

With 75k attendees (and 12k in person at the sold-out show), Day 2 of the conference is kicked off by co-hosts Holly Smith (Sr Resident Solutions Architect, Databricks) and Jimmy Obeyeni (Strategic Account Executive, Databricks). Hear their take on Day 1 of the conference, the state of data and AI, Databricks, and what to expect for the excitement and buzz of Day 2.

Live from the Lakehouse: Developer relations, generative AI, and conference wrap-up

Hear from two guests: Mary Grace Moesta and Sam Raymond (both Sr Data Scientists at Databricks) on developer relations and generative AI. Plus, the co-hosts wrap up the entire conference and all the exciting happenings at the Data & AI Summit by Databricks. Hosted by Holly Smith (Sr Resident Solutions Architect, Databricks) and Jimmy Obeyeni (Strategic Account Executive, Databricks).

Live from the Lakehouse: Ethics in AI with Adi Polak & gaining from open source with Vini Jaiswal

Hear from two guests. First, Adi Polak (VP of Developer Experience, Treeverse, and author of the #1 new release Scaling ML with Spark) on how AI helps us be more productive. Second guest, Vini Jaiswal (Principal Developer Advocate, ByteDance) on gaining from the open source community, overcoming scalability challenges, and taking innovation to the next stage. Hosted by Pearl Ubaru (Sr Technical Marketing Engineer, Databricks).

Live from the Lakehouse: industry outlook from Simon Whiteley & AI policy from Matteo Quattrocchi

Hear from two guests. First, Simon Whiteley (co-owner, Advancing Analytics) on his reaction to industry announcements, where he sees the industry heading, and an introduction to his community at Advancing Analytics. Second guest, Matteo Quattrocchi (Director - Policy, EMEA at BSA | The Software Alliance) on the current state of AI policies by international governments, global committees, and individual companies. Hosted by Ari Kaplan (Head of Evangelism, Databricks) and Pearl Ubaru (Sr Technical Marketing Engineer, Databricks).