Topic

Pandas

data_manipulation data_analysis python

Activities: 3 tagged

Activity Trend: peak of 17 activities per quarter, 2020-Q1 to 2026-Q2

Top Events

O'Reilly Data Science Books 72
Data Engineering Podcast 11
O'Reilly Data Visualization Books 10
O'Reilly Data Engineering Books 9
Databricks DATA + AI Summit 2023 8
PyConDE & PyData Berlin 2023 8
SciPy 2025 8
Data + AI Summit 2025 3
The Joe Reis Show 3
PyData Paris 2024 3
DataTopics: All Things Data, AI & Tech 2
April 5-6: FREE 2-Day Deep Learning Fundamentals NVIDIA DLI Certification Course 2

Top Speakers

Tobias Macey 11
Wes McKinney (Posit) 6
Dr. Yasin Ceran (KAIST) 4
Joe Reis (DeepLearning.AI) 3
Michael Heydt 3
Patrick Hoefler 3
Stefanie Molin 3
Thomas Joseph 3
Antonio Rueda-Toicen (Hasso Plattner Institute) 3
Fabio Nelli 3
Robert Thas John 3
Marco Gorelli (Narwhals) 3

Activities

No-Code Change in Your Python UDF for Arrow Optimization

2025-06-10 · Data + AI Summit 2025

lightning_talk

by Hyukjin Kwon (Databricks)

API Arrow Python Spark

Apache Spark™ has introduced Arrow-optimized APIs such as Pandas UDFs and the Pandas Functions API, providing high performance for Python workloads. Yet many users continue to rely on regular Python UDFs because of their simple interface, especially when advanced Python expertise is not readily available. This talk introduces a new feature in Apache Spark that brings Arrow optimization to regular Python UDFs. With this enhancement, users can gain performance without modifying their existing UDFs, simply by enabling a configuration setting or toggling a UDF-level parameter. We will also cover practical tips and features for using Arrow-optimized Python UDFs effectively, exploring their strengths and limitations. Whether you’re a Spark beginner or an experienced user, this session will help you get both simplicity and performance in your workflows with regular Python UDFs.
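The two switches mentioned in the abstract might look roughly like the sketch below. It is not taken from the talk: `normalize_name`, `build_udfs`, and `enable_arrow_for_session` are hypothetical names, and the `useArrow` parameter and the `spark.sql.execution.pythonUDF.arrow.enabled` key are assumed from the PySpark 3.5 documentation, so check them against your Spark version.

```python
def normalize_name(name):
    """A regular, row-at-a-time Python function. Its body stays unchanged."""
    if name is None:
        return None
    return name.strip().title()


def build_udfs():
    """Wrap the same function as a plain UDF and as an Arrow-optimized UDF.

    pyspark is imported lazily so that normalize_name can be exercised
    without a Spark installation.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    plain_udf = udf(normalize_name, StringType())
    # UDF-level toggle (assumed available in PySpark 3.5 and later).
    arrow_udf = udf(normalize_name, StringType(), useArrow=True)
    return plain_udf, arrow_udf


def enable_arrow_for_session(spark):
    """Session-wide toggle for regular Python UDFs (assumed config key)."""
    spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", "true")
```

With an active `SparkSession`, either UDF could then be applied in the usual way, for example `df.select(arrow_udf("name"))`. The function being wrapped is the same in both cases; only the registration differs.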

Data Preparation for Machine Learning

2025-06-09 · Data + AI Summit 2025

talk

AI/ML DataViz Databricks Matplotlib PySpark Python Scikit-learn

In this course, you’ll learn the fundamentals of preparing data for machine learning using Databricks. We’ll cover exploring, cleaning, and organizing data for traditional machine learning applications, as well as data visualization, feature engineering, and optimal feature storage strategies. By building a strong foundation in data preparation, this course equips you with the essential skills to create high-quality datasets that can power accurate and reliable machine learning and AI models. Whether you're developing predictive models or enabling downstream AI applications, these capabilities are critical for delivering impactful, data-driven solutions.

Pre-requisites: Familiarity with the Databricks workspace, notebooks, and Unity Catalog; intermediate-level knowledge of Python (scikit-learn, Matplotlib), Pandas, and PySpark; and familiarity with the concepts of exploratory data analysis, feature engineering, standardization, and imputation methods.

Labs: Yes

Certification Path: Databricks Certified Machine Learning Associate

Machine Learning at Scale

2025-06-09 · Data + AI Summit 2025

talk

AI/ML API Databricks Python Spark

This course aims to equip professional-level machine learning practitioners with knowledge and hands-on experience in using Apache Spark™ for machine learning, including model fine-tuning. It also covers using the Pandas library for scalable machine learning tasks. The first section covers the fundamentals of Apache Spark™ and its machine learning capabilities. The second section covers fine-tuning models with the Hyperopt library. The final section covers using the Pandas API within Apache Spark™, including guidance on Pandas UDFs (User-Defined Functions) and the Functions API for model inference.

Pre-requisites: Familiarity with the Databricks workspace and notebooks; knowledge of machine learning model development and deployment with MLflow (e.g., a basic understanding of DS/ML concepts, common model metrics, and Python libraries, as well as a basic understanding of scaling workloads with Spark).

Labs: Yes

Certification Path: Databricks Certified Machine Learning Professional