BI – talk-data.com

Elasticsearch Query Language the Definitive Guide

2026-06-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bahaaldine Azarmi , Alexis Charveriat , Stephen Brown , Farbod Shirzadian , Alejandro Sanchez

Analytics Data Analytics Data Management ELK Cyber Security data data-engineering elasticsearch search

Streamline your workflow with ES|QL, enhance data analysis with real-time insights, and speed up aggregations and visualizations.

Key Features: Apply ES|QL efficiently in analytics, observability, and cybersecurity. Optimize performance and scalability for high-demand environments. Discover how to visualize and debug ES|QL queries. Purchase of the print or Kindle book includes a free PDF eBook.

Book Description: Built to simplify high-scale data analytics in Elasticsearch, this practical guide will take you from foundational concepts to advanced applications across search, observability, and security. It will help you overcome common challenges such as efficiently querying large datasets, applying advanced analytics without deep prior knowledge, and settling on a single, consolidated query language. Written by senior experts at Elastic with extensive field experience, this book delivers actionable guidance rooted in solving today’s data challenges at scale. After introducing ES|QL and its architecture, the chapters explore real-world applications across various domains, including analytics, raw log analysis, observability, and cybersecurity. Advanced topics such as scaling, optimization, and future developments are also covered to help you maximize your ES|QL capabilities. By the end of this book, you’ll be able to leverage ES|QL for comprehensive data management and analysis, optimizing your workflows and enhancing your productivity with Elasticsearch.

What you will learn: Gain a solid understanding of ES|QL and its architecture. Use ES|QL for data analysis and performance monitoring. Apply ES|QL in cybersecurity for threat detection and incident response. Find out how to perform advanced searches using ES|QL. Prepare for future ES|QL developments. Showcase ES|QL in action through real-world, persona-driven use cases.

Who this book is for: If you’re an Elasticsearch user, this book is essential for your growth. Whether you’re a data analyst looking to build analytics on top of Elasticsearch, an SRE monitoring the health of your IT system, or a cybersecurity analyst, this book will give you a complete understanding of how ES|QL is built and used. Additionally, database administrators, business intelligence professionals, and operational intelligence professionals will find this book invaluable. Even with a beginner-level knowledge of Elasticsearch, you’ll be able to get started and make the most of this comprehensive guide.

Understanding ETL (Updated Edition)

2025-09-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Matt Palmer

AI/ML Data Lakehouse ETL/ELT data data-engineering etl

"Extract, transform, load" (ETL) is at the center of every application of data, from business intelligence to AI. Constant shifts in the data landscape—including the implementations of lakehouse architectures and the importance of high-scale real-time data—mean that today's data practitioners must approach ETL a bit differently. This updated technical guide offers data engineers, engineering managers, and architects an overview of the modern ETL process, along with the challenges you're likely to face and the strategic patterns that will help you overcome them. You'll come away equipped to make informed decisions when implementing ETL and confident about choosing the technology stack that will help you succeed. Discover what ETL looks like in the new world of data lakehouses Learn how to deal with real-time data Explore low-code ETL tools Understand how to best achieve scale, performance, and observability

Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

2025-08-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Donna Strok , Dmitry Foshin , Dmitry Anoshin

Analytics Cloud Computing Data Analytics Databricks DWH ETL/ELT Iceberg Matillion Cyber Security Snowflake Tableau data +1 more

This book is your guide to the modern market of data analytics platforms and the benefits of using Snowflake, the data warehouse built for the cloud. As organizations increasingly rely on modern cloud data platforms, the core of any analytics framework, the data warehouse, is more important than ever. This updated 2nd edition ensures you are ready to make the most of the industry’s leading data warehouse. This book will onboard you to Snowflake and present best practices for deploying and using the Snowflake data warehouse. The book also covers modern analytics architecture, integration with leading analytics software such as Matillion ETL, Tableau, and Databricks, and migration scenarios for on-premises legacy data warehouses. This new edition includes expanded coverage of Snowpark for developing complex data applications, an introduction to managing large datasets with Apache Iceberg tables, and instructions for creating interactive data applications using Streamlit, ensuring readers are equipped with the latest advancements in Snowflake's capabilities.

What You Will Learn: Master key functionalities of Snowflake. Set up security and access with cluster. Bulk load data into Snowflake using the COPY command. Migrate from a legacy data warehouse to Snowflake. Integrate the Snowflake data platform with modern business intelligence (BI) and data integration tools. Manage large datasets with Apache Iceberg tables. Implement continuous data loading with Snowpipe and Dynamic Tables.

Who This Book Is For: Data professionals, business analysts, IT administrators, and existing or potential Snowflake users.

Narrative SQL: Crafting Data Analysis Queries That Tell Stories

2025-07-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Hamed Tabrizchi

Analytics Data Analytics SQL data data-engineering

This book addresses an important gap in data analytics education: the interplay between complex query-making and storytelling. While many resources cover the fundamentals of SQL queries and the technical skills required to manipulate data, few also explore moving beyond the numbers and figures to tell stories that drive strategic business decisions. By weaving together both SQL and narrative mechanics, author Hamed Tabrizchi has assembled a powerful tool for data analysts, aspiring database professionals, and business intelligence specialists. A strong foundation is laid in the first part of the book, which examines the technical skills necessary to access and manipulate data. You’ll explore foundational SQL commands, advanced querying techniques, data manipulation, data integrity, and optimization of queries for performance. The second half moves from the "how" of SQL to the "why," examining the meaning-making practices we can apply to data, and the stories data can tell. You'll learn how SQL queries can be interpreted, how to prepare data for visualization, and most importantly, how to convey the findings in a way that engages and informs the audience. In each chapter, practical exercises reinforce the techniques learned and help you apply them in real-world situations. In addition to strengthening technical skills, these exercises encourage readers to take a critical view of the data they are studying, considering the larger story it represents. Upon completing this book, you will not only be proficient in SQL, but also possess the key skill of converting data into narratives that can influence strategic direction and operational decisions in the modern workplace.

What You Will Learn: Advanced SQL Techniques: Master data manipulation and retrieval skills using advanced SQL queries. Data Analysis Proficiency: Develop analytical skills to uncover key insights and understand significant data patterns. Storytelling with Data: Learn to translate data analytics into compelling narratives for effective stakeholder communication. Complex Querying Skills: Understand advanced SQL concepts such as common table expressions (CTEs), subqueries, and window functions. Query Optimization: Optimize query execution time, resource usage, and scalability by mastering indexes and views. Practical Application of Techniques: Gain hands-on experience with practical examples of advanced SQL techniques in real-world data analysis scenarios. Effective Data Presentation: Discover strategies for visually presenting data stories to enhance engagement and understanding among diverse audiences.

Who This Book Is For: Data analysts and business analysts, SQL developers, data-driven managers and executives, and academics and students looking to enhance their advanced querying and narrative-building skills to better interpret and convey data.

SnowPro Core Certification Study Guide

2025-02-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jatin Verma

Analytics Cloud Computing Data Analytics Snowflake SQL data data-engineering

The "SnowPro Core Certification Study Guide" provides a comprehensive resource for mastering Snowflake data cloud concepts and passing the SnowPro Core exam. Through detailed explanations and practical exercises, you will gain the knowledge and skills necessary to successfully implement and manage Snowflake's powerful features and integrate data solutions effectively.

What this Book will help me do: Efficiently load and manage data in Snowflake for modern data processing. Optimize queries and configure Snowflake's performance features for data analytics. Securely implement access control and user roles to ensure data privacy. Apply Snowflake's sharing features to collaborate within and between organizations. Prepare effectively for the SnowPro Core exam with mock tests and review tools.

Author(s): Jatin Verma is a renowned expert in Snowflake technologies and a certified SnowPro Core professional. With years of hands-on experience working with data solutions, Jatin excels at breaking down complex concepts into digestible lessons. His approachable writing style and dedication to education make this book a trusted resource for both aspiring and current professionals.

Who is it for? This book is perfect for data engineers, analysts, database administrators, and business intelligence professionals who are looking to gain expertise in Snowflake and achieve SnowPro Core certification. It is particularly suited for those with foundational knowledge of databases, data warehouses, and SQL, seeking to advance their skills in Snowflake and become certified professionals. By leveraging this guide, readers can solidify their Snowflake knowledge and confidently approach the SnowPro Core certification exam.

Practical Lakehouse Architecture

2024-07-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Gaurav Ashok Thalpati

AI/ML Data Governance Data Lakehouse Cyber Security data data-engineering data-lake storage-repositories

This concise yet comprehensive guide explains how to adopt a data lakehouse architecture to implement modern data platforms. It reviews the design considerations, challenges, and best practices for implementing a lakehouse and provides key insights into the ways that using a lakehouse can impact your data platform, from managing structured and unstructured data and supporting BI and AI/ML use cases to enabling more rigorous data governance and security measures.

Practical Lakehouse Architecture shows you how to: Understand key lakehouse concepts and features like transaction support, time travel, and schema evolution. Understand the differences between traditional and lakehouse data architectures. Differentiate between various file formats and table formats. Design lakehouse architecture layers for storage, compute, metadata management, and data consumption. Implement data governance and data security within the platform. Evaluate technologies and decide on the best technology stack to implement the lakehouse for your use case. Make critical design decisions and address practical challenges to build a future-ready data platform. Start your lakehouse implementation journey and migrate data from existing systems to the lakehouse.

Big Data on Kubernetes

2024-07-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Neylson Crepalde

Airflow Big Data Docker Kafka Kubernetes Python Spark SQL YAML data data-engineering streaming-messaging

Big Data on Kubernetes is your comprehensive guide to leveraging Kubernetes for scalable and efficient big data solutions. You will learn key concepts of Kubernetes architecture and explore tools like Apache Spark, Airflow, and Kafka. Gain hands-on experience building complete data pipelines to tackle real-world data challenges.

What this Book will help me do: Understand Kubernetes architecture and learn to deploy and manage clusters. Build and orchestrate big data pipelines using Spark, Airflow, and Kafka. Develop scalable and resilient data solutions with Docker and Kubernetes. Integrate and optimize data tools for real-time ingestion and processing. Apply concepts to hands-on projects addressing actual big data scenarios.

Author(s): Neylson Crepalde is an experienced data specialist with extensive knowledge of Kubernetes and big data solutions. With deep practical experience, Neylson brings real-world insights to his writing. His approach emphasizes actionable guidance and relatable problem-solving with a strong foundation in scalable architecture.

Who is it for? This book is ideal for data engineers, BI analysts, data team leaders, and tech managers familiar with Python, SQL, and YAML. Targeted at professionals seeking to develop or expand their expertise in scalable big data solutions, it provides practical insights into Docker, Kubernetes, and prominent big data tools.

IBM Storage DS8900F Architecture and Implementation: Updated for Release 9.3.2

2024-05-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Daniel Beukers , Connie Riggins , Jörg Klemm , Peter Kimmel , Bozhidar Feraliev , Gauurav Sabharwal , Jeff Cook

AI/ML IBM data data-engineering

This IBM® Redbooks® publication describes the concepts, architecture, and implementation of the IBM Storage DS8900F family. The book provides reference information to assist readers who need to plan for, install, and configure the DS8900F systems. This edition applies to DS8900F systems with IBM Storage DS8000® Licensed Machine Code (LMC) 7.9.30 (bundle version 89.30.xx.x), referred to as Release 9.3. The DS8900F systems are all-flash exclusively, and they are offered as three classes: DS8980F: Analytic Class: The DS8980F Analytic Class offers best performance for organizations that want to expand their workload possibilities to artificial intelligence (AI), Business Intelligence (BI), and machine learning (ML). IBM DS8950F: Agility Class: The Agility Class consolidates all your mission-critical workloads for IBM Z®, IBM LinuxONE, IBM Power, and distributed environments under a single all-flash storage solution. IBM DS8910F: Flexibility Class: The Flexibility Class reduces complexity while addressing various workloads at the lowest DS8900F family entry cost. The DS8900F architecture relies on powerful IBM POWER9™ processor-based servers that manage the cache to streamline disk input/output (I/O), which maximizes performance and throughput. These capabilities are further enhanced by High-Performance Flash Enclosures (HPFE) Gen2. Like its predecessors, the DS8900F supports advanced disaster recovery (DR) solutions, business continuity solutions, and thin provisioning.

IBM Storage FlashSystem 9500 Product Guide for IBM Storage Virtualize 8.6

2024-04-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jon Herd , Hartmut Lonzer , Gucer Vasfi

Cloud Computing IBM Cyber Security data data-engineering

This IBM® Redpaper® Product Guide describes the IBM Storage FlashSystem® 9500 solution, which is a next-generation IBM Storage FlashSystem control enclosure. It combines the performance of flash and a Non-Volatile Memory Express (NVMe)-optimized architecture with the reliability and innovation of IBM FlashCore® technology and the rich feature set and high availability (HA) of IBM Storage Virtualize. Often, applications exist that are foundational to the operations and success of an enterprise. These applications might function as prime revenue generators, guide or control important tasks, or provide crucial business intelligence, among many other jobs. Whatever their purpose, they are mission critical to the organization. They demand the highest levels of performance, functionality, security, and availability. They also must be protected against the newer threat of cyberattacks. To support such mission-critical applications, enterprises of all types and sizes turn to the IBM Storage FlashSystem 9500. IBM Storage FlashSystem 9500 provides a rich set of software-defined storage (SDS) features that are delivered by IBM Storage Virtualize, including the following examples: data reduction and deduplication; dynamic tiering; thin provisioning; snapshots; cloning; replication and data copy services; cyber resilience; Transparent Cloud Tiering; IBM HyperSwap®, including 3-site replication for HA; and scale-out and scale-up configurations that further enhance capacity and throughput for better availability. This Redpaper applies to IBM Storage Virtualize V8.6.

Architecting a Modern Data Warehouse for Large Enterprises: Build Multi-cloud Modern Distributed Data Warehouses with Azure and AWS

2023-12-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Abhishek Mishra , Anjani Kumar , Sanjeev Kumar (Tesa SE)

AWS Azure Big Data Cloud Computing Data Governance Data Lake Data Lakehouse Delta DWH Pandas Cyber Security Data Streaming +4 more

Design and architect new-generation cloud-based data warehouses using Azure and AWS. This book provides an in-depth understanding of how to build modern cloud-native data warehouses, as well as their history and evolution. The book starts by covering foundational data warehouse concepts, and introduces modern features such as distributed processing, big data storage, data streaming, and processing data on the cloud. You will gain an understanding of the synergy, relevance, and usage of standard data warehousing practices in the modern world of distributed data processing. The authors walk you through the essential concepts of Data Mesh, Data Lake, Lakehouse, and Delta Lake. And they demonstrate the services and offerings available on Azure and AWS that deal with data orchestration, data democratization, data governance, data security, and business intelligence. After completing this book, you will be ready to design and architect enterprise-grade, cloud-based modern data warehouses using industry best practices and guidelines.

What You Will Learn: Understand the core concepts underlying modern data warehouses. Design and build cloud-native data warehouses. Gain a practical approach to architecting and building data warehouses on Azure and AWS. Implement modern data warehousing components such as Data Mesh, Data Lake, Delta Lake, and Lakehouse. Process data through pandas and evaluate your model’s performance using metrics such as F1-score, precision, and recall. Apply deep learning to supervised, semi-supervised, and unsupervised anomaly detection tasks for tabular datasets and time series applications.

Who This Book Is For: Experienced developers, cloud architects, and technology enthusiasts looking to build cloud-based modern data warehouses using Azure and AWS.

Learning and Operating Presto

2023-09-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tim Meehan , Ying Su , Angelica Lo Duca , Vivek Bharathan

Cloud Computing DWH Hadoop IBM Presto Cyber Security SQL data data-engineering

The Presto community has mushroomed since its origins at Facebook in 2012. But ramping up this open source distributed SQL query engine can be challenging even for the most experienced engineers. With this practical book, data engineers and architects, platform engineers, cloud engineers, and software engineers will learn how to use Presto operations at your organization to derive insights on datasets wherever they reside. Authors Angelica Lo Duca, Tim Meehan, Vivek Bharathan, and Ying Su explain what Presto is, where it came from, and how it differs from other data warehousing solutions. You'll discover why Facebook, Uber, Alibaba Cloud, Hewlett Packard Enterprise, IBM, Intel, and many more use Presto and how you can quickly deploy Presto in production.

With this book, you will: Learn how to install and configure Presto. Use Presto with business intelligence tools. Understand how to connect Presto to a variety of data sources. Extend Presto for real-time business insight. Learn how to apply best practices and tuning. Get troubleshooting tips for logs, error messages, and more. Explore Presto's architectural concepts and usage patterns. Understand Presto security and administration.

IBM Storage DS8900F Product Guide Release 9.3.2

2023-03-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Connie Riggins , Peter Kimmel , Jeff Cook

AI/ML IBM data data-engineering

This IBM® Redbooks Product Guide provides an overview of the features and functions that are available with the IBM Storage DS8900F models that run microcode Release 9.3.2 (Bundle 89.32/Licensed Machine Code 7.9.32). As of February 2023, the DS8900F with DS8000 Release 9.3.2 is the latest addition. The DS8900F is an all-flash system exclusively, and it offers three classes: IBM DS8980F: Analytic Class: The DS8980F Analytic Class offers best performance for organizations that want to expand their workload possibilities to artificial intelligence (AI), Business Intelligence, and Machine Learning. IBM DS8950F: Agility Class: The agility class is efficiently designed to consolidate all your mission-critical workloads for IBM zSystems, IBM LinuxONE, IBM Power Systems, and distributed environments under a single all-flash storage solution. IBM DS8910F: Flexibility Class: The flexibility class delivers significant performance for midrange organizations that are looking to meet storage challenges with advanced functionality delivered as a single rack solution.

IBM FlashSystem 9500 Product Guide

2023-01-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Shu Mookerjee , Konrad Trojok , Jon Herd , Hartmut Lonzer , Carsten Larsen , Douwe van Terwisga , Kendall Williams , Corne Lottering , Gucer Vasfi

Cloud Computing IBM Cyber Security data data-engineering

This IBM® Redpaper® Product Guide describes the IBM FlashSystem® 9500 solution, which is a next-generation IBM FlashSystem control enclosure. It combines the performance of flash and a Non-Volatile Memory Express (NVMe)-optimized architecture with the reliability and innovation of IBM FlashCore® technology and the rich feature set and high availability (HA) of IBM Spectrum® Virtualize. Often, applications exist that are foundational to the operations and success of an enterprise. These applications might function as prime revenue generators, guide or control important tasks, or provide crucial business intelligence, among many other jobs. Whatever their purpose, they are mission critical to the organization. They demand the highest levels of performance, functionality, security, and availability. They also must be protected against the modern scourge, cyberattacks. To support such mission-critical applications, enterprises of all types and sizes turn to the IBM FlashSystem 9500. IBM FlashSystem 9500 provides a rich set of software-defined storage (SDS) features that are delivered by IBM Spectrum Virtualize, including the following examples: data reduction and deduplication; dynamic tiering; thin provisioning; snapshots; cloning; replication and data copy services; cyber resilience; Transparent Cloud Tiering; IBM HyperSwap®, including 3-site replication for HA; and scale-out and scale-up configurations that further enhance capacity and throughput for better availability. With the release of IBM Spectrum Virtualize V8.5, extra functions and features are available, including support for new third-generation IBM FlashCore Modules NVMe-type drives within the control enclosure, and 100 Gbps Ethernet adapters that provide NVMe Remote Direct Memory Access (RDMA) options. New software features include GUI enhancements and security enhancements, including multifactor authentication (MFA) and single sign-on (SSO), and Fibre Channel (FC) portsets.

Azure Data Engineering Cookbook - Second Edition

2022-09-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Nagaraj Venkatesan , Ahmad Osama , Luca Zanna

Analytics Azure ADF Cloud Computing Data Engineering Data Lake Databricks Microsoft Power BI RDBMS Synapse data +1 more

Azure Data Engineering Cookbook is your ultimate guide to mastering data engineering on Microsoft's Azure platform. Through an engaging collection of recipes, this book breaks down procedures to build sophisticated data pipelines, leveraging tools like Azure Data Factory, Data Lake, Databricks, and Synapse Analytics.

What this Book will help me do: Efficiently process large datasets using Azure Synapse Analytics and Azure Databricks pipelines. Transform and shape data within systems by leveraging Azure Synapse data flows. Implement and manage relational databases in Azure with performance tuning and administration. Configure data pipeline solutions integrated with Power BI for insightful reporting. Monitor, optimize, and ensure lineage tracking for your data systems efficiently with Purview and Log Analytics.

Author(s): Nagaraj Venkatesan is an experienced cloud architect specializing in Microsoft Azure, with years of hands-on data engineering expertise. Ahmad Osama is a seasoned data professional. The authors share an emphasis on practical learning that translates into actionable skills.

Who is it for? This book is essential for data engineers seeking expertise in Azure's rich engineering capabilities. It's tailored for professionals with a foundational knowledge of cloud services, looking to achieve advanced proficiency in Azure data engineering pipelines.

Simplifying Data Engineering and Analytics with Delta

2022-07-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Anindita Mahapatra (Databricks)

AI/ML Analytics Data Engineering Data Governance Data Modelling Delta Python SQL Data Streaming data data-engineering

This book will guide you through mastering Delta, a robust and versatile protocol for data engineering and analytics. You'll discover how Delta simplifies data workflows, supports both batch and streaming data, and is optimized for analytics applications in various industries. By the end, you will know how to create high-performing, analytics-ready data pipelines.

What this Book will help me do: Understand Delta's unique offering for unifying batch and streaming data processing. Learn approaches to address data governance, reliability, and scalability challenges. Gain technical expertise in building data pipelines optimized for analytics and machine learning use. Master core concepts like data modeling, distributed computing, and Delta's schema evolution features. Develop and deploy production-grade data engineering solutions leveraging Delta for business intelligence.

Author(s): Anindita Mahapatra is an experienced data engineer and author with years of expertise in working on Delta and data-driven solutions. Her hands-on approach to explaining complex data concepts makes this book an invaluable resource for professionals in data engineering and analytics.

Who is it for? Ideal for data engineers, data analysts, and anyone involved in AI/BI workflows, this book suits learners with some basic knowledge of SQL and Python. Whether you're an experienced professional or looking to upgrade your skills with Delta, this book will provide practical insights and actionable knowledge.

The Azure Data Lakehouse Toolkit: Building and Scaling Data Lakehouses on Azure with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake

2022-07-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ron L'Esteve

AI/ML Analytics Azure Cloud Computing Data Lakehouse Databricks Delta ETL/ELT Microsoft PySpark Snowflake Spark +6 more

Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse using highly performant and cutting-edge Apache Spark capabilities in Azure Databricks, Azure Synapse Analytics, and Snowflake. You will learn to write efficient PySpark code for batch and streaming ELT jobs on Azure. And you will follow along with practical, scenario-based examples showing how to apply the capabilities of Delta Lake and Apache Spark to optimize performance, and secure, share, and manage a high volume, high velocity, and high variety of data in your lakehouse with ease. The patterns of success that you acquire from reading this book will help you hone your skills to build high-performing and scalable ACID-compliant lakehouses using flexible and cost-efficient decoupled storage and compute capabilities. Extensive coverage of Delta Lake ensures that you are aware of and can benefit from all that this new, open source storage layer can offer. In addition to the deep examples on Databricks in the book, there is coverage of alternative platforms such as Synapse Analytics and Snowflake so that you can make the right platform choice for your needs. After reading this book, you will be able to implement Delta Lake capabilities, including Schema Evolution, Change Feed, Live Tables, Sharing, and Clones to enable better business intelligence and advanced analytics on your data within the Azure Data Platform.

What You Will Learn: Implement the Data Lakehouse Paradigm on Microsoft’s Azure cloud platform. Benefit from the new Delta Lake open-source storage layer for data lakehouses. Take advantage of schema evolution, change feeds, live tables, and more. Write functional PySpark code for data lakehouse ELT jobs. Optimize Apache Spark performance through partitioning, indexing, and other tuning options. Choose between alternatives such as Databricks, Synapse Analytics, and Snowflake.

Who This Book Is For: Data, analytics, and AI professionals at all levels, including data architect and data engineer practitioners. Also for data professionals seeking patterns of success by which to remain relevant as they learn to build scalable data lakehouses for their organizations and customers who are migrating into the modern Azure Data Platform.

Getting Started with Elastic Stack 8.0

2022-03-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Asjad Athick

ELK Kibana Logstash Cyber Security data data-engineering elastic-stack-elk-stack elastic stack (elk stack) elasticsearch search

Discover how to harness the power of the Elastic Stack 8.0 to manage, analyze, and secure complex data environments. You will learn to combine components such as Elasticsearch, Kibana, Logstash, and more to build scalable and effective solutions for your organization. By focusing on hands-on implementations, this book ensures you can apply your knowledge to real-world use cases.

What this Book will help me do: Set up and manage Elasticsearch clusters tailored to various architecture scenarios. Utilize Logstash and Elastic Agent to ingest and process diverse data sources efficiently. Create interactive dashboards and data models in Kibana, enabling business intelligence insights. Implement secure and effective search infrastructures for enterprise applications. Deploy Elastic SIEM to fortify your organization's security against modern cybersecurity threats.

Author(s): Asjad Athick is a seasoned technologist and author with expertise in developing scalable data solutions. With years of experience working with the Elastic Stack, Asjad brings a pragmatic approach to teaching complex architectures. His dedication to explaining technical concepts in an accessible manner makes this book a valuable resource for learners.

Who is it for? This book is ideal for developers seeking practical knowledge in search, observability, and security solutions using Elastic Stack. Solutions architects who aim to design scalable data platforms will also benefit greatly. Even tech leads or managers keen to understand the Elastic Stack's impact on their operations will find the insights valuable. No prior experience with Elastic Stack is needed.

Mastering Snowflake Solutions: Supporting Analytics and Data Sharing

2022-02-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Adam Morton

Agile/Scrum Analytics GDPR/CCPA Cyber Security Snowflake data data-engineering

Design for large-scale, high-performance queries using Snowflake’s query processing engine to empower data consumers with timely, comprehensive, and secure access to data. This book also helps you protect your most valuable data assets using built-in security features such as end-to-end encryption for data at rest and in transit. It demonstrates key features in Snowflake and shows how to exploit those features to deliver a personalized experience to your customers. It also shows how to ingest the high volumes of both structured and unstructured data that are needed for game-changing business intelligence analysis. Mastering Snowflake Solutions starts with a refresher on Snowflake’s unique architecture before getting into the advanced concepts that make Snowflake the market-leading product it is today. Progressing through each chapter, you will learn how to leverage storage, query processing, cloning, data sharing, and continuous data protection features. This approach allows for greater operational agility in responding to the needs of modern enterprises, for example in supporting agile development techniques via database cloning. The practical examples and in-depth background on theory in this book help you unleash the power of Snowflake in building a high-performance system with little to no administrative overhead. The result will be a deep understanding of Snowflake that enables you to take full advantage of its architecture to deliver valuable analytics insight to your business.
What You Will Learn: Optimize performance and costs associated with your use of the Snowflake data platform. Enable data security to help in complying with consumer privacy regulations such as CCPA and GDPR. Share data securely both inside your organization and with external partners. Gain visibility into each interaction with your customers using continuous data feeds from Snowpipe. Break down data silos to gain complete visibility into your business-critical processes. Transform customer experience and product quality through real-time analytics. Who This Book Is For: Data engineers, scientists, and architects who have had some exposure to the Snowflake data platform or bring some experience from working with another relational database. This book is for those beginning to struggle with new challenges as their Snowflake environment begins to mature, becoming more complex with ever-increasing amounts of data, users, and requirements. New problems require a new approach, and this book aims to arm you with the practical knowledge required to take advantage of Snowflake’s unique architecture to get the results you need.

Analytics Optimization with Columnstore Indexes in Microsoft SQL Server: Optimizing OLAP Workloads

2022-02-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Edward Pollack

Analytics Microsoft SQL SQL Server data data-engineering microsoft-sql-server relational-databases

Meet the challenge of storing and accessing analytic data in SQL Server quickly and efficiently. This book illustrates how columnstore indexes can provide an ideal solution for storing analytic data that leads to faster-performing analytic queries and the ability to ask and answer business intelligence questions with alacrity. The book provides a complete walkthrough of columnstore indexing, suitable for professionals of all skill levels, that encompasses an introduction, best practices, hands-on demonstrations, explanations of common mistakes, and a detailed architecture. With little or no knowledge of columnstore indexing, you can become proficient with columnstore indexes as used in SQL Server and apply that knowledge in development, test, and production environments. This book serves as a comprehensive guide to the use of columnstore indexes and provides definitive guidelines. You will learn when columnstore indexes should be used and the performance gains that you can expect. You will also become familiar with best practices around architecture, implementation, and maintenance. Finally, you will know the limitations and common pitfalls to be aware of and avoid. As analytic data can become quite large, the expense to manage it or migrate it can be high. This book shows that columnstore indexing represents an effective storage solution that saves time and money and improves performance for any applications that use it. You will see that columnstore indexes are an effective performance solution that is included in all versions of SQL Server, with no additional costs or licensing required.
What You Will Learn: Implement columnstore indexes in SQL Server. Know best practices for the use and maintenance of analytic data in SQL Server. Use metadata to fully understand the size and shape of data stored in columnstore indexes. Employ optimal ways to load, maintain, and delete data from large analytic tables. Know how columnstore compression saves storage, memory, and time. Understand when a columnstore index should be used instead of a rowstore index. Be familiar with advanced features and analytics. Who This Book Is For: Database developers, administrators, and architects who are responsible for analytic data, especially those working with very large data sets who are looking for new ways to achieve high performance in their queries, and those with immediate or future challenges to analytic data and query performance who want a methodical and effective solution.

IBM DS8900F Architecture and Implementation: Updated for Release 9.2

2022-02-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Connie Riggins , Lisa Martinez , Bertrand Dufrasne , Mike Stenson , Jeff Cook , Sherri Brunson

AI/ML IBM data data-engineering

This IBM® Redbooks® publication describes the concepts, architecture, and implementation of the IBM DS8900F family. The book provides reference information to assist readers who need to plan for, install, and configure the DS8900F systems. This edition applies to DS8900F systems with IBM DS8000® Licensed Machine Code (LMC) 7.9.20 (bundle version 89.20.xx.x), referred to as Release 9.2. The DS8900F is an all-flash system exclusively, and it offers three classes: IBM DS8980F: Analytic Class: The DS8980F Analytic Class offers best performance for organizations that want to expand their workload possibilities to artificial intelligence (AI), Business Intelligence (BI), and machine learning (ML). IBM DS8950F: Agility Class all-flash: The Agility Class consolidates all your mission-critical workloads for IBM Z®, IBM LinuxONE, IBM Power Systems, and distributed environments under a single all-flash storage solution. IBM DS8910F: Flexibility Class all-flash: The Flexibility Class reduces complexity while addressing various workloads at the lowest DS8900F family entry cost. The DS8900F architecture relies on powerful IBM POWER9™ processor-based servers that manage the cache to streamline disk input/output (I/O), which maximizes performance and throughput. These capabilities are further enhanced by High-Performance Flash Enclosures (HPFE) Gen2. Like its predecessors, the DS8900F supports advanced disaster recovery (DR) solutions, business continuity solutions, and thin provisioning. The IBM DS8910F Rack-Mounted model 993 is described in IBM DS8910F Model 993 Rack-Mounted Storage System Release 9.1, REDP-5566.
