Topic

Scala

programming_language functional_programming jvm

Activities: 3 tagged

Activity Trend: peak of 12 per quarter (2020-Q1 to 2026-Q2)

Top Events

- O'Reilly Data Engineering Books: 34
- Data Engineering Podcast: 33
- Databricks DATA + AI Summit 2023: 11
- ADSP: Algorithms + Data Structures = Programs: 3
- Scala Talks: Hands-On Capture Checking & Scala Native live-coding ☀️: 2
- Scala Talks: A deep dive into streaming with fs2 & Scala Meets GenAI: 2
- Data + AI Summit 2025: 2
- Scala Talks: Tour of error handling & Functional Programming at Huge Companies: 2
- Women in Scala: From Paradigms to Percussion & Hands On with Creative Scala: 2
- Meetup Paris Scala User Group (PSUG) – Hébergé par DataDome!: 2
- Scala Talks: Write a book about Scala during Covid & AI tooling for developers: 2
- DataDome x PSUG #116 : My First Year in Scala! + TBA: 2

Top Speakers

- Tobias Macey: 33
- Holden Karau (Fight Health Insurance): 3
- Conor Hoekstra: 3
- Bryce Adelstein Lelbach (NVIDIA): 3
- Raúl Estrada: 2
- Zainab Ali (London Scala User Group): 2
- Josh Wills: 2
- Sourav Gulati (Databricks): 2
- Mohammed Guller: 2
- Romeo Kienzler: 2
- Sandy Ryza (Databricks): 2
- Sean Owen (Databricks): 2

Activities

Filtering by: Holden Karau

High Performance Spark

2017-05-25 · O'Reilly Data Engineering Books

book

by Rachel Warren, Holden Karau (Fight Health Insurance)

AI/ML Spark SQL Data Streaming apache-spark data data-engineering

Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.

Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing.

With this book, you’ll explore:

- How Spark SQL’s new interfaces improve performance over Spark’s RDD data structure
- The choice between data joins in Core Spark and Spark SQL
- Techniques for getting the most out of standard RDD transformations
- How to work around performance issues in Spark’s key/value pair paradigm
- Writing high-performance Spark code without Scala or the JVM
- How to test for functionality and performance when applying suggested improvements
- Using Spark MLlib and Spark ML machine learning libraries
- Spark’s Streaming components and external community packages

Fast Data Processing with Spark 2 - Third Edition

2016-10-24 · O'Reilly Data Engineering Books

book

by Holden Karau (Fight Health Insurance), Krishna Sankar

AI/ML Analytics API Big Data Cloud Computing Data Analytics Data Engineering Java Spark apache-spark data data-engineering

Fast Data Processing with Spark 2 takes you through the essentials of leveraging Spark for big data analysis. You will learn how to install and set up Spark, handle data using its APIs, and apply advanced functionality like machine learning and graph processing. By the end of the book, you will be well-equipped to use Spark in real-world data processing tasks.

What this book will help you do

- Install and configure Apache Spark for optimal performance.
- Interact with distributed datasets using the resilient distributed dataset (RDD) API.
- Leverage the flexibility of the DataFrame API for efficient big data analytics.
- Apply machine learning models using Spark MLlib to solve complex problems.
- Perform graph analysis using GraphX to uncover structural insights in data.

Author(s)

Krishna Sankar is an experienced data scientist and thought leader in big data technologies. With a deep understanding of machine learning, distributed systems, and Apache Spark, Krishna has guided numerous projects in data engineering and big data processing. Holden Karau, the co-author, is also widely recognized in the field of distributed systems and contributes to Apache Spark development.

Who is it for?

This book is intended for software developers and data engineers with a foundational understanding of Scala or Java programming. A beginner to intermediate understanding of big data processing concepts is recommended. If you aspire to solve big data problems using scalable distributed computing frameworks, this book is a good fit. By the end, you will be confident in building Spark-powered applications and analyzing data efficiently.

Learning Spark

2015-02-17 · O'Reilly Data Engineering Books

book

by Andy Konwinski (Databricks), Holden Karau (Fight Health Insurance), Matei Zaharia (Databricks), Patrick Wendell (Databricks)

Analytics API Data Analytics Java Python Spark SQL Data Streaming apache-spark data data-engineering

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.