Josh Wills

Activities

4

talks

author

Frequent Collaborators

Sandy Ryza (Databricks): 3, Sean Owen (Databricks): 3, Uri Laserson: 3

Talks & appearances

Advanced Analytics with PySpark

2022-06-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

with Sandy Ryza (Databricks) , Sean Owen (Databricks) , Akash Tandon , Josh Wills , Uri Laserson

data data-engineering apache-spark PySpark AI/ML Analytics

The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best practices in Spark programming. Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills offer an introduction to the Spark ecosystem, then dive into patterns that apply common techniques, including classification, clustering, collaborative filtering, and anomaly detection, to fields such as genomics, security, and finance. This updated edition also covers NLP and image processing. If you have a basic understanding of machine learning and statistics and you program in Python, this book will get you started with large-scale data analysis. Familiarize yourself with Spark's programming model and ecosystem. Learn general approaches in data science. Examine complete implementations that analyze large public datasets. Discover which machine learning tools make sense for particular problems. Explore code that can be adapted to many uses.

Advanced Analytics with Spark, 2nd Edition

2017-06-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

with Sandy Ryza (Databricks) , Sean Owen (Databricks) , Josh Wills , Uri Laserson

data data-engineering apache-spark AI/ML Analytics Data Science

In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition acts as an introduction to these techniques and other best practices in Spark programming. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques, including classification, clustering, collaborative filtering, and anomaly detection, to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find the book’s patterns useful for working on your own data applications. With this book, you will: familiarize yourself with the Spark programming model; become comfortable within the Spark ecosystem; learn general approaches in data science; examine complete implementations that analyze large public data sets; discover which machine learning tools make sense for particular problems; and acquire code that can be adapted to many uses.

Advanced Analytics with Spark

2015-04-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

with Sandy Ryza (Databricks) , Sean Owen (Databricks) , Josh Wills , Uri Laserson

data data-engineering apache-spark AI/ML Analytics Java

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.

DuckDBT: Not a database or a dbt adapter but a secret third thing – DuckCon #3 (San Francisco)

DuckCon #3 San Francisco 2023 Watch

video

dbt DuckDB

Speaker: Josh Wills

Slides: https://blobs.duckdb.org/events/duckcon3/josh-wills-duckdbt.pdf