O'Reilly Data Engineering Books

Mastering Ceph - Second Edition

2019-03-05 O'Reilly Amazon

book

Nick Fisk

data data-engineering ceph Cloud Computing

Mastering Ceph is your comprehensive guide to understanding and deploying Ceph for scalable storage solutions. From planning and design to advanced disaster recovery practices, this book equips you with practical knowledge and hands-on techniques to harness the power of Ceph effectively. What this Book will help me do: Design and deploy scalable Ceph clusters tailored to your needs. Optimize Ceph's performance with state-of-the-art tuning techniques. Implement effective disaster recovery strategies for robust storage systems. Extend Ceph's functionality with programming using Librados. Troubleshoot and maintain Ceph to ensure reliability and performance. Author(s): Nick Fisk is a recognized expert in storage infrastructure. With years of hands-on experience with Ceph and storage systems, Fisk has been involved in numerous successful deployments and performance optimizations. Drawing from real-world scenarios, the author's insights make this guide invaluable for professionals. Who is it for? This book is tailored for storage administrators, cloud engineers, and system administrators aiming to enhance their expertise in storage technologies. Whether you're new to Ceph or looking to deepen your knowledge, the clear examples and practical advice make it a perfect pick.

Walmart and the CICS Asynchronous API: An Adoption Experience

2019-03-01 O'Reilly Amazon

book

Frank De Gilio , Pradeep Gohil , Nick Garrod , Randy Frerking , Rich Jackson , Kellie Mathis

data data-engineering IBM API

Abstract This IBM® Redbooks® publication discusses practical uses of the IBM CICS asynchronous API capability. It describes the methodology, design and thought process used by a large client, Walmart, and the considerations of the choices made. The Redbooks publication provides real life examples and application patterns that benefit from the performance and scalability offered by the new API. The book discusses the homegrown methodology used by Walmart before the API was available and compares it with the design using the new API. A discussion of the process used to migrate older applications to begin using the new API is included so the reader will understand the ease of implementing the new API. A description of real world usage patterns describes the current production application Walmart has deployed as well as other patterns to give the reader a sense of what's possible applying creative thinking with technology improvements. Finally, a section is included on the areas to be considered as you begin to plan and implement asynchronous API capabilities. This book should be read by: Enterprise Architects searching for faster ways to service strategic applications across the enterprise. Solution Architects who want to better understand implementation possibilities for improved response times and better performance for CICS applications. CICS programmers looking to modernize and provide improved response times.

Mastering Hadoop 3

2019-02-28 O'Reilly Amazon

book

Timothy Wong , Chanchal Singh , Manish Kumar

data data-engineering Hadoop Flink Big Data Data Engineering

"Mastering Hadoop 3" is your in-depth guide to understanding and mastering the advanced features of the Hadoop ecosystem. With a focus on distributed computing and data processing, this book covers essential tools such as YARN, MapReduce, and Apache Spark to help you build scalable, efficient data pipelines. What this Book will help me do Gain a comprehensive understanding of Hadoop Distributed File System (HDFS) and YARN for effective resource management. Master data processing with MapReduce and learn to integrate with real-time processing engines like Spark and Flink. Develop and secure enterprise-grade Hadoop-based data pipelines by implementing robust security and governance measures. Explore techniques for batch data processing, data modeling, and designing applications tailored for Hadoop environments. Understand best practices for optimizing and troubleshooting Hadoop clusters for enhanced performance and reliability. Author(s) The authors, including None Wong, None Singh, and None Kumar, bring together years of experience in big data engineering, distributed systems, and enterprise application development. They aim to provide a clear pathway to mastering Hadoop ecosystem tools. Who is it for? This book is ideal for budding big data professionals who have some familiarity with Java and basic Hadoop concepts and wish to elevate their expertise. If you're a Hadoop career practitioner keen to expand your understanding of the ecosystem's advanced capabilities or a professional looking to implement Hadoop in organizational workflows, this book is well-suited for you.

IBM DS8880 Architecture and Implementation (Release 8.51)

2019-02-26 O'Reilly Amazon

book

Sherry Brunson , Bert Dufrasne , Peter Kimmel , Stephen Manthorpe , Andreas Reinhardt , Connie Riggins , Tamas Toser , Axel Westphal

data data-engineering IBM Analytics Cloud Computing

Abstract * Updated for R8.51 * This IBM® Redbooks® publication describes the concepts, architecture, and implementation of the IBM DS8880 family. The book provides reference information to assist readers who need to plan for, install, and configure the DS8880 systems. The IBM DS8000® family is a high-performance, high-capacity, highly secure, and resilient series of disk storage systems. The DS8880 family is the latest and most advanced of the DS8000 offerings to date. The high availability, multiplatform support, including IBM Z, and simplified management tools help provide a cost-effective path to on-demand and cloud-based infrastructures. The IBM DS8880 family now offers business-critical, all-flash, and hybrid data systems that span a wide range of price points: DS8882F: Rack Mounted storage system; DS8884: Business Class; DS8886: Enterprise Class; DS8888: Analytics Class. The DS8884 and DS8886 are available as hybrid models or can be configured as all-flash. Each model represents the most recent in this series of high-performance, high-capacity, flexible, and resilient storage systems. These systems are intended to address the needs of the most demanding clients. Two powerful IBM POWER8® processor-based servers manage the cache to streamline disk I/O, maximizing performance and throughput. These capabilities are further enhanced with the availability of the second generation of high-performance flash enclosures (HPFEs Gen-2) and newer flash drives. Like its predecessors, the DS8880 supports advanced disaster recovery (DR) solutions, business continuity solutions, and thin provisioning. All disk drives in the DS8880 storage system include the Full Disk Encryption (FDE) feature. The DS8880 can automatically optimize the use of each storage tier, particularly flash drives, by using the IBM Easy Tier® feature. Release 8.5 introduces the Safeguarded Copy feature.
The DS8882F Rack Mounted is described in a separate publication, Introducing the IBM DS8882F Rack Mounted Storage System, REDP-5505.

IBM FlashSystem A9000 and A9000R Architecture and Implementation (Version 12.3.1)

2019-02-18 O'Reilly Amazon

book

Lisa Martinez , Francesco Anderloni , Stephen Solewin , Bert Dufrasne , Roger Eriksson

data data-engineering IBM

Abstract * Version 12.3.1 * This IBM® Redbooks publication presents the architecture, design, concepts, and technology that are used in IBM FlashSystem® A9000 and IBM FlashSystem A9000R. FlashSystem A9000 and FlashSystem A9000R deliver the microsecond latency and high availability of IBM FlashCore® technology with grid architecture, simple scalability, and industry-leading IBM software that is designed to drive your business into the cognitive era. The highly intuitive Hyper-Scale Manager user interface simplifies management. Comprehensive data reduction capabilities, including inline deduplication and a powerful compression engine, help lower total cost of ownership. With software version 12.3.1 and Hyper-Scale Manager version 5.5.1 (or later), the system can compute reclaimable and attributed capacity information without performance impact. From a functional standpoint, FlashSystem A9000 and FlashSystem A9000R take advantage of most of the software-defined storage features that are offered by the IBM Spectrum™ Accelerate software, including multi-tenancy and business continuity functions. FlashSystem A9000 and FlashSystem A9000R support HyperSwap and Multi-site High Availability / Disaster Recovery (HA/DR) configurations. This publication is intended for those individuals who need to plan, install, tailor, and configure FlashSystem A9000 and FlashSystem A9000R. For detailed information about configuration, management, and replication functions and their usage, see the following publications: , SG24-8376 IBM Spectrum Accelerate Family Storage Configuration and Usage for IBM FlashSystem A9000, IBM FlashSystem A9000R, and IBM XIV Gen3 , REDP-5401 IBM FlashSystem A9000 and A9000R Business Continuity Solutions , REDP-5434 IBM HyperSwap and Multi-site HA/DR solution for IBM FlashSystem A9000 and A9000R , SG24-8368. IBM Spectrum Accelerate Family: Host Attachment and Interoperability

IBM Elastic Storage Server Implementation Guide for Version 5.3

2019-02-05 O'Reilly Amazon

book

Kiran Ghag , Ravindra Sure , Vasfi Gucer , Nikhil Khandelwal , Poornima Gupte , Puneet Chaudhary , Luis Bolinches

data data-engineering IBM Big Data Cloud Computing ELK

This IBM® Redpaper™ publication introduces and describes the IBM Elastic Storage™ Server as a scalable, high-performance data and file management solution. The solution is built on proven IBM Spectrum™ Scale technology, formerly IBM General Parallel File System (GPFS™). IBM Elastic Storage Servers can be implemented for a range of diverse requirements, providing reliability, performance, and scalability. This publication helps you to understand the solution and its architecture and helps you to plan the installation and integration of the environment. The following combination of physical and logical components is required: hardware, operating system, storage, network, and applications. This paper provides guidelines for several usage and integration scenarios. Typical scenarios include Cluster Export Services (CES) integration, disaster recovery, and multicluster integration. This paper addresses the needs of technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who must deliver cost-effective cloud services and big data solutions.

Using the IBM Spectrum Accelerate Family in VMware Environments: IBM XIV, IBM FlashSystem A9000 and IBM FlashSystem A9000R, and IBM Spectrum Accelerate

2019-02-04 O'Reilly Amazon

book

Markus Oscheka , Abilio Oliveira , Grant Kabobel , Bert Dufrasne

data data-engineering IBM VMware

This IBM® Redpaper™ publication is a brief overview of synergistic aspects between various VMware offerings and the IBM Spectrum™ Accelerate family, including IBM XIV® and IBM FlashSystem® A9000 and IBM FlashSystem A9000R servers. After reviewing different integration concepts and explaining general implementation aspects for attaching the IBM Spectrum Accelerate™ family to VMware ESXi deployments, the paper focuses on components that are enabled by IBM Spectrum Connect v3.4. This paper is intended for those who are planning to use or implement the IBM Spectrum Accelerate family of storage systems in a VMware environment.

Advanced MySQL 8

2019-01-31 O'Reilly Amazon

book

Tejaswi Malepati , Birju Shah , Eric Vanier

data data-engineering relational-databases MySQL Cyber Security

Dive into the world of MySQL 8.0 with this comprehensive guide, tailored for professionals seeking to optimize and expand their database capabilities. You will master techniques to improve performance, scalability, and security in your database applications, making them robust and efficient. What this Book will help me do: Learn to implement and analyze large queries efficiently in MySQL 8.0. Gain insights into optimizing performance by leveraging MySQL indexing and settings. Understand advanced replication techniques, including Group Replication and its applications in InnoDB clusters. Master the essentials of database monitoring and managing large distributed instances. Explore methods for backup, recovery, and enhancing data security within MySQL. Author(s): Eric Vanier, Birju Shah, and Tejaswi Malepati are seasoned database experts with extensive experience in MySQL and database management. They have worked on scaling enterprise-level database applications, focusing on improving performance and reliability. They bring real-world insights and a clear, practical approach to this book, making it an invaluable resource for developers and administrators alike. Who is it for? This book is perfect for database administrators, developers, and system architects who already have a foundational understanding of MySQL and are looking to deepen their expertise. If you're someone interested in enhancing database application performance, mastering advanced techniques, or handling distributed databases and scaling challenges, this book will surely cater to your professional aspirations.

Apache Spark Quick Start Guide

2019-01-31 O'Reilly Amazon

book

Akash Grade , Shrey Mehrotra

data data-engineering apache-spark AI/ML API Big Data

Dive into the world of scalable data processing with the "Apache Spark Quick Start Guide." This book offers a foundational introduction to Spark, empowering readers to harness its capabilities for big data processing. With clear explanations and hands-on examples, you'll learn to implement Spark applications that handle complex data tasks efficiently. What this Book will help me do: Understand and implement Spark's RDDs and DataFrame APIs to process large datasets effectively. Set up a local development environment for Spark-based projects. Develop skills to debug and optimize slow-performing Spark applications. Harness built-in modules of Spark for SQL, streaming, and machine learning applications. Adopt best practices and optimization techniques for high-performance Spark applications. Author(s): Shrey Mehrotra is a seasoned software developer with expertise in big data technologies, particularly Apache Spark. With years of hands-on industry experience, Shrey focuses on making complex technical concepts accessible to all. Through his writing, he aims to share clear, practical guidance for developers of all levels. Who is it for? This guide is perfect for big data enthusiasts and professionals looking to learn Apache Spark's capabilities from scratch. It's aimed at data engineers interested in optimizing application performance and data scientists wanting to integrate machine learning with Spark. A basic familiarity with Scala, Python, or Java is recommended.

Ceph: Designing and Implementing Scalable Storage Systems

2019-01-31 O'Reilly Amazon

book

Vikhyat Umrao , Nick Fisk , Michael Hackett , Karan Singh

data data-engineering ceph Ansible Cloud Computing Linux

Get to grips with the unified, highly scalable distributed storage system and learn how to design and implement it. Key Features: Explore Ceph's architecture in detail; implement a Ceph cluster successfully and gain deep insights into its best practices; leverage the advanced features of Ceph, including erasure coding, tiering, and BlueStore. Book Description: This Learning Path takes you through the basics of Ceph all the way to gaining in-depth understanding of its advanced features. You'll gather skills to plan, deploy, and manage your Ceph cluster. After an introduction to the Ceph architecture and its core projects, you'll be able to set up a Ceph cluster and learn how to monitor its health, improve its performance, and troubleshoot any issues. By following the step-by-step approach of this Learning Path, you'll learn how Ceph integrates with OpenStack, Glance, Manila, Swift, and Cinder. With knowledge of federated architecture and CephFS, you'll use Calamari and VSM to monitor the Ceph environment. In the upcoming chapters, you'll study the key areas of Ceph, including BlueStore, erasure coding, and cache tiering. More specifically, you'll discover what they can do for your storage system. In the concluding chapters, you will develop applications that use Librados and distributed computations with shared object classes, and see how Ceph and its supporting infrastructure can be optimized. By the end of this Learning Path, you'll have the practical knowledge of operating Ceph in a production environment.
This Learning Path includes content from the following Packt products: Ceph Cookbook by Michael Hackett, Vikhyat Umrao and Karan Singh; Mastering Ceph by Nick Fisk; Learning Ceph, Second Edition by Anthony D'Atri, Vaibhav Bhembre and Karan Singh. What you will learn: Understand the benefits of using Ceph as a storage solution; combine Ceph with OpenStack, Cinder, Glance, and Nova components; set up a test cluster with Ansible and a virtual machine with VirtualBox; develop solutions with Librados and shared object classes; configure BlueStore and see its interaction with other configurations; tune, monitor, and recover storage systems effectively; build an erasure-coded pool by selecting intelligent parameters. Who this book is for: If you are a developer, system administrator, storage professional, or cloud engineer who wants to understand how to deploy a Ceph cluster, this Learning Path is ideal for you. It will help you discover ways in which Ceph features can solve your data storage problems. Basic knowledge of storage systems and GNU/Linux will be beneficial.

Hands-On Deep Learning with Apache Spark

2019-01-31 O'Reilly Amazon

book

Guglielmo Iozzia

data data-engineering apache-spark AI/ML Keras RNNs

"Hands-On Deep Learning with Apache Spark" is an essential resource for mastering distributed deep learning frameworks and applications on Apache Spark. Through practical examples and guided tutorials, this book teaches you to deploy scalable deep learning solutions for handling complex data challenges efficiently. What this Book will help me do Understand how to set up Apache Spark for deep learning workflows. Gain practical insight into implementing neural networks, including CNNs and RNNs, on distributed platforms. Learn to train and optimize models using popular frameworks like TensorFlow and Keras. Develop expertise in analyzing large datasets with textual and image-based deep learning methods. Acquire skills to deploy trained models for real-world applications in distributed environments. Author(s) None Iozzia is an accomplished software engineer and data scientist with a strong background in distributed computing and machine learning. With years of experience working with Apache Spark and deep learning technologies, None brings a wealth of practical knowledge to the table. Their passion for providing clear, hands-on guidance makes this book an approachable and valuable resource for learners of all levels. Who is it for? This book is aimed at Scala developers, data scientists, and data analysts who are looking to extend their skill set to include distributed deep learning on Apache Spark. It's ideally suited for readers familiar with machine learning basics and those with prior exposure to Apache Spark workflows. If you aim to create scalable machine learning solutions that handle complex data, this book offers precisely what you need.

Learning PostgreSQL 11 - Third Edition

2019-01-31 O'Reilly Amazon

book

Christopher Travers , Andrey Volkov

data data-engineering relational-databases postgresql Data Modelling DWH

Immerse yourself in the capabilities of PostgreSQL 11 with this comprehensive beginner's guide. Learning PostgreSQL 11 will take you through relational database fundamentals and advanced database functionality, empowering you to build efficient and scalable database solutions with confidence. By the end of this book, you'll have mastery over PostgreSQL's features to develop, manage, and optimize your own databases. What this Book will help me do: Gain a solid understanding of relational database principles and the PostgreSQL ecosystem. Learn to install PostgreSQL, create a database, and design a data model effectively. Develop skills to create, manipulate, and optimize tables, views, and efficient indexes. Utilize server-side programming with PL/pgSQL and advanced data types like JSONB. Enhance database reliability and performance, and connect to your Python applications seamlessly. Author(s): Christopher Travers and Andrey Volkov bring their collective expertise and practical experience to this book. Christopher has a strong background in software development and database systems, with years of hands-on involvement with PostgreSQL. Andrey has contributed significantly to innovative database solutions, emphasizing clear and actionable instructions. Together, they aim to demystify PostgreSQL for learners of all backgrounds. Who is it for? This book is crafted for developers, database administrators, and tech enthusiasts who want to delve into PostgreSQL. Beginners with no prior database experience will find its approach accessible, while those aiming to enhance their skills with PostgreSQL's latest features will benefit immensely. It's ideal for anyone seeking to build solid database or data warehousing applications with modern capabilities and best practices.

QGIS Quick Start Guide

2019-01-31 O'Reilly Amazon

book

Andrew Cutts

data data-engineering location-data geographic-information-system-gis geographic information system (gis) GIS

QGIS Quick Start Guide is your hands-on introduction to QGIS 3.4, an open-source Geographic Information System (GIS) software. Through step-by-step instructions, you'll explore creating, loading, and styling geospatial data while developing the skills to generate professional-grade maps and analyses. What this Book will help me do: Understand and navigate QGIS 3.4's interface and core functionality. Import and manage spatial datasets for analysis and visualization. Create custom-styled vector and raster layers for effective presentation. Design detailed and aesthetically pleasing maps for various uses. Extend QGIS with plugins and automate tasks using modeling tools. Author(s): The author, Andrew Cutts, is a passionate GIS professional with years of experience working with geographic data and related software. With a strong focus on making GIS knowledge accessible, they provide practical advice and workflows in an easy-to-follow manner, helping readers achieve proficiency with a powerful tool like QGIS. Who is it for? This book is perfect for GIS beginners and intermediate-level users looking to develop hands-on skills with QGIS 3.4. If you're a student, geospatial analyst, or enthusiast aiming to master an accessible and robust GIS toolset, this guide aligns with your learning goals. Gain the confidence to analyze geographic data and create compelling maps as part of your studies or projects.

Best Practices Guide for Databases on IBM FlashSystem

2019-01-30 O'Reilly Amazon

book

Jagadeesh Papaiah

data data-engineering IBM

Best Practices Guide for Databases on IBM FlashSystem

DS8000 Global Mirror Best Practices

2019-01-11 O'Reilly Amazon

book

Nick Clayton , Peter Klee , Robert Tondini , Alcides Bertazi , Bert Dufrasne

data data-engineering IBM

This IBM® Redpaper™ publication reviews the architecture and operations of the IBM DS8000® Global Mirror function. The document looks at different aspects of the solution in terms of performance, infrastructure requirements, data integrity, business continuity, and impact on production. Hints and tips are provided on how to best configure the overall Global Mirror environment, in terms of connectivity, storage configuration, and specific parameters tuning. The guidelines that are provided are in general related to performance, which ultimately ensures a better recovery point objective (RPO). Therefore, we encourage you to follow those guidelines.

IBM Spectrum Scale and IBM StoredIQ: Identifying and securing your business data to support regulatory requirements

2019-01-11 O'Reilly Amazon

book

Atul V Gore , Nils Haustein , Sasikanth Eda , Sandeep R Patil

data data-engineering IBM GDPR/CCPA Cyber Security

Having the appropriate storage for hosting business-critical data and the proper analytic software for deep inspection of that data is becoming necessary to get deeper insights into the data so that users can categorize which data qualifies for compliance. This IBM® Redpaper™ publication explains why the storage features of IBM Spectrum™ Scale, when combined with the data analysis and categorization features of IBM StoredIQ®, provide an excellent platform for hosting unstructured business data that is subject to regulatory compliance guidelines, such as General Data Protection Regulation (GDPR). In this paper, we describe how IBM StoredIQ can be used to identify files that are stored in an IBM Spectrum Scale™ file system that include personal information, such as phone numbers. These files can be secured in another file system partition by encrypting those files by using IBM Spectrum Scale functions. Encrypting files prevents unauthorized access to those files because only users that can access the encryption key can decrypt those files. This paper is intended for chief technology officers, solution and security architects, and systems administrators.

Java XML and JSON: Document Processing for Java SE

2019-01-10 O'Reilly Amazon

book

Jeff Friesen

data data-engineering storage-formats XML API Java

Use this guide to master the XML metalanguage and JSON data format along with significant Java APIs for parsing and creating XML and JSON documents from the Java language. New in this edition is coverage of Jackson (a JSON processor for Java) and Oracle’s own Java API for JSON processing (JSON-P), which is a JSON processing API for Java EE that also can be used with Java SE. This new edition of Java XML and JSON also expands coverage of DOM and XSLT to include additional API content and useful examples. All examples in this book have been tested under Java 11. In some cases, source code has been simplified to use Java 11’s var language feature. The first six chapters focus on XML along with the SAX, DOM, StAX, XPath, and XSLT APIs. The remaining six chapters focus on JSON along with the mJson, Gson, JsonPath, Jackson, and JSON-P APIs. Each chapter ends with select exercises designed to challenge your grasp of the chapter's content. An appendix provides the answers to these exercises. What You'll Learn: Master the XML language; create, validate, parse, and transform XML documents; apply Java’s SAX, DOM, StAX, XPath, and XSLT APIs; master the JSON format for serializing and transmitting data; code against third-party APIs such as Jackson, mJson, Gson, and JsonPath; master Oracle’s JSON-P API in a Java SE context. Who This Book Is For: Intermediate and advanced Java programmers who are developing applications that must access data stored in XML or JSON documents. The book also targets developers wanting to understand the XML language and JSON data format.

IBM Z Connectivity Handbook

2019-01-03 O'Reilly Amazon

book

Octavian Lascu

data data-engineering IBM

This IBM® Redbooks® publication describes the connectivity options that are available for use within and beyond the data center for the IBM Z family of mainframes, which includes these systems: IBM z14®, IBM z14 Model ZR1, IBM z13®, IBM z13s™, IBM zEnterprise® EC12 (zEC12), and IBM zEnterprise BC12 (zBC12). This book highlights the hardware and software components, functions, typical uses, coexistence, and relative merits of these connectivity features. It helps readers understand the connectivity alternatives that are available when planning and designing their data center infrastructures. The changes to this edition are based on the IBM Z hardware announcement dated April 10, 2018. This book is intended for data center planners, IT professionals, systems engineers, and network planners who are involved in the planning of connectivity solutions for IBM mainframes.

IBM DS8880 Product Guide (Release 8.51)

2019-01-02 O'Reilly Amazon

book

Peter Kimmel , Tamas Toser , Stephen Manthorpe , Bert Dufrasne

data data-engineering IBM

This IBM Redbooks® Product Guide gives an overview of the features and functions that are available with the IBM DS8880 models running microcode Release 8.51 (DS8000 Licensed Machine Code 8.8.51.xx.xx). The IBM DS8880 architecture relies on powerful IBM POWER8® processor-based servers that manage the cache to streamline disk input/output (I/O), maximizing performance and throughput. These capabilities are further enhanced with the availability of the second generation of high-performance flash enclosures (HPFE Gen-2). The IBM DS8888, DS8886, and DS8884 models excel at supporting the IBM Z Enterprise server and IBM Power server environments, offering many synergy features.

Apache Kafka Quick Start Guide

2018-12-27 O'Reilly Amazon

book

Raúl Estrada

data data-engineering streaming-messaging Kafka Java Data Streaming

Dive into the world of Apache Kafka with this concise guide that focuses on its practical use for real-time data processing in distributed systems. You'll explore Kafka's capabilities, covering essentials like configuration, messaging, serialization, and handling complex data streams using Kafka Streams and KSQL. By the end, you'll be equipped to tackle real-world streaming challenges confidently. What this Book will help me do: Understand how to set up and configure Apache Kafka for real-time processing environments. Master key concepts like message validation, enrichment, and serialization. Learn to use the Schema Registry for data validation and versioning. Gain hands-on experience with data streaming and aggregation using Kafka Streams. Develop skills in using KSQL for data manipulation and stream querying. Author(s): Raúl Estrada is an experienced software engineer with a deep understanding of distributed systems and real-time data processing. With expertise in Apache Kafka and other event-streaming platforms, Estrada approaches technical writing with an emphasis on clarity and practical application. Their passion for helping developers achieve success is reflected in their authoritative yet approachable style. Who is it for? This book is perfect for software engineers and backend developers interested in mastering real-time data processing using Apache Kafka. It is designed for readers who are eager to solve practical problems in distributed systems, irrespective of whether they have prior Kafka experience. Some familiarity with Java or other JVM languages will be helpful, although not strictly necessary. This is an ideal resource for learners seeking a hands-on, practical approach to Apache Kafka.

Dynamic SQL: Applications, Performance, and Security in Microsoft SQL Server

2018-12-27 O'Reilly Amazon

book

Edward Pollack

data data-engineering relational-databases microsoft-sql-server Analytics BI

Take a deep dive into the many uses of dynamic SQL in Microsoft SQL Server. This edition has been updated to use the newest features in SQL Server 2016 and SQL Server 2017 as well as incorporating the changing landscape of analytics and database administration. Code examples have been updated with new system objects and functions to improve efficiency and maintainability. Executing dynamic SQL is key to large-scale searching based on user-entered criteria. Dynamic SQL can generate lists of values and even code with minimal impact on performance. Dynamic SQL enables dynamic pivoting of data for business intelligence solutions as well as customizing of database objects. Yet dynamic SQL is feared by many due to concerns over SQL injection or code maintainability. Dynamic SQL: Applications, Performance, and Security in Microsoft SQL Server helps you bring the productivity and user satisfaction of flexible and responsive applications to your organization safely and securely. Your organization’s increased ability to respond to rapidly changing business scenarios will build competitive advantage in an increasingly crowded and competitive global marketplace. With a focus on new applications and modern database architecture, this edition illustrates that dynamic SQL continues to evolve and be a valuable tool for administration, performance optimization, and analytics. What You'll Learn: Build flexible applications that respond to changing business needs; take advantage of creative, innovative, and productive uses of dynamic SQL; know about SQL injection and be confident in your defenses against it; address performance concerns in stored procedures and dynamic SQL; troubleshoot and debug dynamic SQL to ensure correct results; automate your administration of features within SQL Server. Who This Book Is For: Developers and database administrators looking to hone and build their T-SQL coding skills.
The book is ideal for developers wanting to plumb the depths of application flexibility and troubleshoot performance issues involving dynamic SQL. The book is also ideal for programmers wanting to learn what dynamic SQL is about and how it can help them deliver competitive advantage to their organizations.

BizTalk Server 2016: Performance Tuning and Optimization

2018-12-26 O'Reilly Amazon

book

Agustín Mántaras

data data-engineering streaming-messaging enterprise-service-bus microsoft-biztalk-server

Gain an in-depth view of optimizing the performance of BizTalk Server. This book provides best practices and techniques for improving the development of mission-critical solutions. You'll see how the BizTalk Server engine works and how to proactively detect and remedy potential bottlenecks before they occur. The book starts with an overview of the BizTalk Server internal mechanisms that will help you understand the optimizations detailed throughout the book. You'll then see how the mechanisms can be applied to a BizTalk Server environment to improve low and high latency throughput scenarios. A section on testing BizTalk Server solutions will guide you through the most frequently adopted techniques used to develop solutions, such as performance and unit testing as part of the development cycle. With BizTalk Server 2016 you'll see how to apply side-by-side versioning to your solutions to reduce the chances of downtime. You'll also review instrumentation techniques using Event Tracing for Windows (ETW) and business activity monitoring (BAM). While the book is focused on the latest version of BizTalk Server, most of the topics discussed will also work with BizTalk Server 2013 R2. What You'll Learn Review BizTalk Server internals and how the message engine works Understand BizTalk Server architecture Gather and analyze BizTalk Server performance data Develop BizTalk Server performance solutions Use advanced troubleshooting tools to help diagnose your platform Who This Book Is For Those who have strong BizTalk and .NET Framework knowledge and want to get their BizTalk Server knowledge to the next level.

Machine Learning with Apache Spark Quick Start Guide

2018-12-26 O'Reilly Amazon

book

Jillur Quddus

data data-engineering apache-spark AI/ML Analytics Big Data

"Machine Learning with Apache Spark Quick Start Guide" introduces you to the fundamental concepts and tools needed to harness the power of Apache Spark for data processing and machine learning. This book combines practical examples and real-world scenarios to show you how to manage big data efficiently while uncovering actionable insights through advanced analytics. What this Book will help me do Understand the role of Apache Spark in the big data ecosystem. Set up and configure an Apache Spark development environment. Learn and implement supervised and unsupervised learning models using Spark MLlib. Apply advanced analytical algorithms to real-world big data problems. Develop and deploy real-time machine learning pipelines with Apache Spark. Author(s) None Quddus is an experienced practitioner in the fields of big data, distributed technologies, and machine learning. With a career dedicated to using advanced analytics to solve real-world problems, Quddus brings practical expertise to each topic addressed. Their approachable writing style ensures readers can apply concepts effectively, even in complex scenarios. Who is it for? This book is ideal for business analysts, data analysts, and data scientists who are eager to gain hands-on experience with big data technologies. Whether you are new to Apache Spark or looking to expand your knowledge of its machine learning capabilities, this guide provides the tools and insights necessary to achieve those goals. Technical professionals wanting to develop their skills in processing and analyzing big data will find this resource invaluable.

Fast Data Architectures for Streaming Applications, 2nd Edition

2018-12-25 O'Reilly Amazon

book

Dean Wampler

data data-engineering streaming-messaging Kafka Flink Big Data

Why have stream-oriented data systems become so popular, when batch-oriented systems have served big data needs for many years? In the updated edition of this report, Dean Wampler examines the rise of streaming systems for handling time-sensitive problems—such as detecting fraudulent financial activity as it happens. You’ll explore the characteristics of fast data architectures, along with several open source tools for implementing them. Batch processing isn’t going away, but exclusive use of these systems is now a competitive disadvantage. You’ll learn that, while fast data architectures using tools such as Kafka, Akka, Spark, and Flink are much harder to build, they represent the state of the art for dealing with mountains of data that require immediate attention. Learn how a basic fast data architecture works, step-by-step Examine how Kafka’s data backplane combines the best abstractions of log-oriented and message queue systems for integrating components Evaluate four streaming engines, including Kafka Streams, Akka Streams, Spark, and Flink Learn which streaming engines work best for different use cases Get recommendations for making real-world streaming systems responsive, resilient, elastic, and message driven Explore an example IoT streaming application that includes telemetry ingestion and anomaly detection

Apache Spark 2: Data Processing and Real-Time Analytics

2018-12-21 O'Reilly Amazon

book

Romeo Kienzler, Sridhar Alla, Md. Rezaul Karim, Siamak Amirghodsi

data data-engineering apache-spark AI/ML Analytics Big Data

Build efficient data flow and machine learning programs with this flexible, multi-functional open-source cluster-computing framework Key Features Master the art of real-time big data processing and machine learning Explore a wide range of use-cases to analyze large data Discover ways to optimize your work by using many features of Spark 2.x and Scala Book Description Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your own data flow and machine learning programs on this platform. You will work with the different modules in Apache Spark, such as interactive querying with Spark SQL, using DataFrames and datasets, implementing streaming analytics with Spark Streaming, and applying machine learning and deep learning techniques on Spark using MLlib and various external tools. By the end of this elaborately designed Learning Path, you will have all the knowledge you need to master Apache Spark, and build your own big data processing and analytics pipeline quickly and without any hassle. This Learning Path includes content from the following Packt products: Mastering Apache Spark 2.x by Romeo Kienzler Scala and Spark for Big Data Analytics by Md. Rezaul Karim, Sridhar Alla Apache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei What you will learn Get to grips with all the features of Apache Spark 2.x Perform highly optimized real-time big data processing Use ML and DL techniques with Spark MLlib and third-party tools Analyze structured and unstructured data using SparkSQL and GraphX Understand tuning, debugging, and monitoring of big data applications Build scalable and fault-tolerant streaming applications Develop scalable recommendation engines Who this book is for If you are an intermediate-level Spark developer looking to master the advanced capabilities and use-cases of Apache Spark 2.x, this Learning Path is ideal for you. Big data professionals who want to learn how to integrate and use the features of Apache Spark and build a strong big data pipeline will also find this Learning Path useful. To grasp the concepts explained in this Learning Path, you must know the fundamentals of Apache Spark and Scala.
