Topic

ORC

Optimized Row Columnar (ORC)

columnar_storage big_data compression file_format storage

Activities: 1 tagged

Activity Trend: 1 peak/qtr, 2020-Q1 to 2026-Q2

Top Events

Data Engineering Podcast: 7
O'Reilly Data Engineering Books: 2
Databricks DATA + AI Summit 2023: 1

Top Speakers

Tobias Macey: 7
Dipti Borkar (Microsoft): 2
Brock Noland (PhData): 1
Toby Mao (SQLMesh): 1
Jordan Birdsell (PhData): 1
Ryan Blue (Tabular): 1
Brooke Wenig: 1
Jules S. Damji (Anyscale Inc): 1
Tanmay Deshpande: 1
Tathagata Das (Databricks): 1
Aneesh Karve (Quilt Data): 1
Yoni Iny (Upsolver): 1

Activities

Filtering by: Brooke Wenig

Learning Spark, 2nd Edition

2020-07-16 · O'Reilly Data Engineering Books

book

by Denny Lee (Databricks), Brooke Wenig, Jules S. Damji (Anyscale Inc), Tathagata Das (Databricks)

AI/ML Analytics API Avro CSV Data Analytics Delta Hive Java JSON Kafka Parquet +9 more

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to:

Learn Python, SQL, Scala, or Java high-level Structured APIs
Understand Spark operations and SQL Engine
Inspect, tune, and debug Spark operations with Spark configurations and Spark UI
Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
Perform analytics on batch and streaming data using Structured Streaming
Build reliable data pipelines with open source Delta Lake and Spark
Develop machine learning pipelines with MLlib and productionize models using MLflow