talk-data.com
Activities & events
| Title & Speakers | Event |
|---|---|
|
Delta Lake: The Definitive Guide
2024-10-31
Ready to simplify the process of building data lakehouses and data pipelines at scale? In this practical guide, learn how Delta Lake is helping data engineers, data scientists, and data analysts overcome key data reliability challenges with modern data engineering and management techniques. Authors Denny Lee, Tristen Wentling, Scott Haines, and Prashanth Babu (with contributions from Delta Lake maintainer R. Tyler Croy) share expert insights on all things Delta Lake, including how to run batch and streaming jobs concurrently and accelerate the usability of your data. You'll also uncover how ACID transactions bring reliability to data lakehouses at scale. This book helps you: understand key data reliability challenges and how Delta Lake solves them; explain the critical role of Delta transaction logs as a single source of truth; learn the Delta Lake ecosystem with technologies like Apache Flink, Kafka, and Trino; architect data lakehouses with the medallion architecture; and optimize Delta Lake performance with features like deletion vectors and liquid clustering. |
O'Reilly Data Engineering Books
|
|
Scott Haines – Databricks Beacon @ Databricks
Join Scott Haines (Databricks Beacon) as he teaches you to write your own notebook-style service (like Jupyter, Zeppelin, or Databricks) for fun (and profit?). After all, haven't we all been a little curious about how notebook environments work? From the outside things probably seem magical, but just below the surface there is a world of possibilities waiting to be exploited (both figuratively and literally) to build new creations. Curiosity is the foundation for creativity and novel ideation, and armed with the knowledge from this session you'll gain an additional perspective and mental model for solving complex problems using dynamic, procedural (on-the-fly) code compilation. Did I mention you'll use Spark Structured Streaming to create a "live" communication channel between your notebook service and the "outside world"? Overview: During this session you'll learn to build your own notebook-style service on top of Apache Spark and the Scala ILoop. Along the way, you'll uncover how to harness the SparkContext to manage, drive, and scale your own procedurally defined Apache Spark applications by mixing core configuration and other "magic". As we move through the steps necessary to achieve this end result, you'll learn to run individual paragraphs, or the entire synchronous waterfall of paragraphs, leading to the dynamic generation of applications. Dive deep into the possibilities that follow from a solid understanding of procedurally generated, on-the-fly code compilation (live injection) and its security ramifications (because of course this is unsafe!), and come away with a new mental model focused on architecting composite applications, or auto-generated |
Databricks DATA + AI Summit 2023 |
|
The Rise of Operational Analytics
2019-12-25
Scott Haines – author
Fast access to data has become a critical game changer. Today, a new breed of company understands that the faster they can build, access, and share well-defined datasets, the more competitive they'll be in our data-driven world. In this practical report, Scott Haines from Twilio introduces you to operational analytics, a new approach for making sense of all the data flooding into business systems. Data architects and data scientists will see how Apache Kafka and other tools and processes laid the groundwork for fast analytics on a mix of historical and near-real-time data. You'll learn how operational analytics feeds minute-by-minute customer interactions, and how NewSQL databases have entered the scene to drive machine learning algorithms, AI programs, and ongoing decision-making within an organization. This report helps you: understand the key advantages that data-driven companies have over traditional businesses; explore the rise of operational analytics and how this method relates to current tech trends; examine the impact of can't-wait business decisions and won't-wait customer experiences; discover how NewSQL databases support cloud native architecture and set the stage for operational databases; and learn how to choose the right database to support operational analytics in your organization. |
O'Reilly Data Engineering Books
|