Topic

PySpark

big_data distributed_computing python

Activities

2 tagged

Activity Trend

Peak of 14 per quarter, 2020-Q1 to 2026-Q1

Top Events

O'Reilly Data Engineering Books: 19
Databricks DATA + AI Summit 2023: 16
Data + AI Summit 2025: 13
Data Engineering Podcast: 4
O'Reilly Data Science Books: 2
PyData Berlin 2025: 2
PyData Cardiff - July 2025: 1
From a Fintech lens: MCP server live-coding & feature selection data hacks: 1
dbt Coalesce 2025: 1
PyData Seattle 2025: 1
PyConDE & PyData Berlin 2023: 1
SciPy 2025: 1

Top Speakers

Tobias Macey: 4
Marco Gorelli (Narwhals): 3
Denny Lee (Databricks): 3
Pramod Singh: 3
Sundar Krishnan: 2
Tomasz Drabas: 2
Raju Kumar Mishra: 2
Allison Wang (Databricks): 2
Ramcharan Kakarla: 2
Xiao Li (Databricks): 2
Stuart Moncada (Google Cloud): 1
Benjamin Bengfort: 1

Activities

Filtered by: Tomasz Drabas

PySpark Cookbook

2018-06-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Denny Lee (Databricks) , Tomasz Drabas

AI/ML Analytics Big Data Cloud Computing Python Spark Data Streaming apache-spark data data-engineering

Dive into big data processing and analytics with the "PySpark Cookbook". This book provides over 60 hands-on recipes for implementing efficient data-intensive solutions using Apache Spark and Python. By mastering these recipes, you'll be equipped to tackle challenges in large-scale data processing, machine learning, and stream analytics.

What this book will help me do

Set up and configure PySpark environments effectively, including working with Jupyter for enhanced interactivity.
Understand and use DataFrames for data manipulation, analysis, and transformation tasks.
Develop end-to-end machine learning solutions using the ML and MLlib modules in PySpark.
Implement structured streaming and graph-processing solutions to analyze and visualize data streams and relationships.
Deploy PySpark applications to cloud infrastructure efficiently using best practices.

Author(s)

This book is co-authored by Denny Lee and Tomasz Drabas, who are experienced professionals in data processing and analytics with Python and Apache Spark. With deep technical expertise and a passion for teaching through practical examples, they aim to make the complex concepts of PySpark accessible to developers of varied experience levels.

Who is it for?

This book is ideal for Python developers who are keen to delve into the Apache Spark ecosystem. Whether you're just starting with big data or have some experience with Spark, this book provides practical recipes to enhance your skills. Readers looking to solve real-world data-intensive challenges using PySpark will find this resource invaluable.

Learning PySpark

2017-02-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Denny Lee (Databricks) , Tomasz Drabas

AI/ML Big Data Cloud Computing Data Engineering Python Spark Data Streaming apache-spark data data-engineering

"Learning PySpark" guides you through integrating Python with Apache Spark to build scalable and efficient data applications. You'll delve into Spark 2.0's architecture, process data efficiently, and explore PySpark's capabilities ranging from machine learning to structured streaming. By the end, you'll be equipped to craft and deploy robust data pipelines and applications.

What this book will help me do

Master the Spark 2.0 architecture and its Python integration with PySpark.
Leverage PySpark DataFrames and RDDs for effective data manipulation and analysis.
Develop scalable machine learning models using PySpark's ML and MLlib libraries.
Understand advanced PySpark features such as GraphFrames for graph processing and TensorFrames for deep learning models.
Gain expertise in deploying PySpark applications locally and on the cloud for production-ready solutions.

Author(s)

Authors Tomasz Drabas and Denny Lee bring extensive experience in data engineering and Python programming. They combine a practical, example-driven approach with deep insights into Apache Spark's ecosystem. Their expertise and clear writing make this book accessible to anyone aiming to excel in big data technologies with Python.

Who is it for?

This book is best suited for Python developers who want to integrate Apache Spark 2.0 into their workflow to process large-scale data. Ideal readers will have foundational knowledge of Python and seek to build scalable data-intensive applications using Spark, regardless of prior experience with Spark itself.