talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

3377

Collection of O'Reilly books on Data Engineering.

Filtering by: data-engineering

Sessions & talks

Showing 876–900 of 3377 · Newest first

SQL Server Advanced Data Types: JSON, XML, and Beyond

Deliver advanced functionality faster and cheaper by exploiting SQL Server's ever-growing built-in support for modern data formats. Learn about the growing support within SQL Server for operations and data transformations that previously required third-party software, with all the associated licensing and development costs. Benefit from a better understanding of what can be done inside the database engine, with no additional cost or development time invested in outside software. Widely used types such as JSON and XML are well supported by the database engine, as are hierarchical data and even temporal data. Knowledge of these advanced types is crucial to unleashing the full power of your organization's SQL Server database investment. SQL Server Advanced Data Types explores each of the complex data types supplied within SQL Server. Common usage scenarios for each complex data type are discussed, followed by a detailed discussion of how to work with each type. Each chapter demystifies the complex data, and you learn how to use the data types most efficiently. The book offers a practical guide to working with complex data, using real-world examples to demonstrate how each data type can be leveraged. Performance considerations are also discussed, including the implementation of special indexes such as XML indexes and spatial indexes.

What You'll Learn
- Understand the implementation of basic data types and why using the correct type is so important
- Work with XML data through the XML data type
- Construct XML data from relational result sets
- Store and manipulate JSON data using the JSON data type
- Model and analyze spatial data for geographic information systems
- Define hierarchies and query them efficiently through the HierarchyID type

Who This Book Is For
SQL Server developers and application developers who need to store and access complex data structures
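
For a concrete taste of the built-in JSON support the book covers, here is a minimal, hypothetical sketch that queries a JSON column from Python via pyodbc; the server, the dbo.Products table, and the Attributes column are illustrative assumptions, not examples from the book.

    # Hypothetical sketch: SQL Server's JSON functions (ISJSON, JSON_VALUE)
    # called from Python. Connection details and table are placeholders.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=Demo;Trusted_Connection=yes;"
    )
    cur = conn.cursor()
    # JSON_VALUE extracts a scalar from JSON stored in an NVARCHAR column.
    cur.execute("""
        SELECT JSON_VALUE(Attributes, '$.color') AS Color
        FROM dbo.Products
        WHERE ISJSON(Attributes) = 1
    """)
    for row in cur.fetchall():
        print(row.Color)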

Beginning Apache Spark 2: With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library

Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you’ll discover resilient distributed datasets (RDDs); use Spark SQL for structured data; and learn stream processing and build real-time applications with Spark Structured Streaming. Furthermore, you’ll learn the fundamentals of Spark ML for machine learning and much more. After you read this book, you will have the fundamentals to become proficient in using Apache Spark and know when and how to apply it to your big data applications.

What You Will Learn
- Understand Spark's unified data processing platform
- Run Spark in Spark Shell or Databricks
- Use and manipulate RDDs
- Deal with structured data using Spark SQL through its operations and advanced functions
- Build real-time applications using Spark Structured Streaming
- Develop intelligent applications with the Spark Machine Learning library

Who This Book Is For
Programmers and developers active in big data, Hadoop, and Java but who are new to the Apache Spark platform.
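
A minimal PySpark sketch of two ideas from the outline above, the RDD API and Spark SQL on DataFrames; it assumes a local Spark installation and is not taken from the book.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("intro").getOrCreate()

    # RDD API: low-level functional transformations.
    rdd = spark.sparkContext.parallelize([("a", 1), ("b", 2), ("a", 3)])
    print(rdd.reduceByKey(lambda x, y: x + y).collect())  # [('a', 4), ('b', 2)]

    # Spark SQL: the same aggregation on structured data.
    df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
    df.groupBy("key").sum("value").show()

    spark.stop()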

SAP HANA and ESS: A Winning Combination

SAP HANA on IBM® POWER® is an established HANA solution with which customers can run HANA-based analytic and business applications on a flexible IBM Power based infrastructure. IT assets, such as servers, storage, skills, and operational procedures, can easily be used and reused instead of forcing more investment into dedicated SAP HANA-only appliances. In this scenario, IBM Spectrum™ Scale as the underlying block storage and file system adds further benefits to this solution stack by taking advantage of scale effects, higher availability, simplification, and performance. With the IBM Elastic Storage™ Server (ESS) based on IBM Spectrum Scale™, RAID capabilities are added to the file system. By using the intelligent internal logic of the IBM Spectrum Scale RAID code, reasonable performance and significant disk failure recovery improvements are achieved. This IBM Redpaper™ publication focuses on the benefits and advantages of implementing a HANA solution on top of the IBM Spectrum Scale storage file system. This paper is intended to help experienced administrators and IT specialists plan and set up an IBM Spectrum Scale cluster and configure an ESS for SAP HANA workloads. It provides important tips and preferred practices for managing IBM Spectrum Scale's availability and performance. If you are familiar with ESS, IBM Spectrum Scale, and IBM Spectrum Scale RAID, and you need only the pertinent documentation about how to configure an IBM Spectrum Scale cluster with an ESS for SAP HANA, see Chapter 5, "IBM Spectrum Scale customization for HANA" on page 25. Before reading this IBM Redpaper publication, you should be familiar with the basic concepts of IBM Spectrum Scale and IBM Spectrum Scale RAID. This publication can be helpful for architects and specialists who are planning an SAP HANA on POWER deployment with the IBM Spectrum Scale file system. For more information about planning considerations for Power, see the SAP HANA on Power Planning Guide.

Cosmos DB for MongoDB Developers: Migrating to Azure Cosmos DB and Using the MongoDB API

Learn Azure Cosmos DB and its MongoDB API with hands-on samples and advanced features such as the multi-homing API, geo-replication, custom indexing, TTL, request units (RUs), consistency levels, partitioning, and much more. Each chapter explains Azure Cosmos DB’s features and functionalities by comparing it to MongoDB, with coding samples. Cosmos DB for MongoDB Developers starts with an overview of NoSQL and Azure Cosmos DB and moves on to demonstrate how geo-replication in Azure Cosmos DB differs from MongoDB. Along the way you’ll cover subjects including indexing, partitioning, consistency, and sizing, all of which will help you understand the concept of request units and how their consumption can be derived from an existing MongoDB deployment's usage. The next part of the book shows you the process and strategies for migrating to Azure Cosmos DB. You will learn the day-to-day scenarios of using Azure Cosmos DB, its sizing strategies, and optimization techniques for the MongoDB API. This information will help you when planning to migrate from MongoDB, or if you would like to compare MongoDB to the Azure Cosmos DB MongoDB API before considering the switch.

What You Will Learn
- Migrate from MongoDB to Azure Cosmos DB and understand the strategies involved
- Develop a sample application using MongoDB’s client driver
- Make use of sizing best practices and performance optimization scenarios
- Optimize partitioning and indexing for the MongoDB API

Who This Book Is For
MongoDB developers who wish to learn Azure Cosmos DB. It specifically caters to a technical audience working with MongoDB.
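
Because Cosmos DB speaks the MongoDB wire protocol, an ordinary MongoDB driver can talk to it. A hedged sketch with pymongo follows; the account name, key, and collection are placeholders for values copied from the Azure portal.

    from pymongo import MongoClient

    # Placeholder connection string for a Cosmos DB account with the MongoDB API.
    client = MongoClient(
        "mongodb://<account>:<key>@<account>.mongo.cosmos.azure.com:10255/?ssl=true"
    )
    coll = client["store"]["orders"]

    coll.insert_one({"customer": "alice", "total": 42})
    for doc in coll.find({"customer": "alice"}):
        print(doc)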

Getting Started with IBM zHyperLink for z/OS

With the pressures to drive transaction processing 24/7 because of online banking and other business demands, IBM® zHyperLink on the IBM DS8880 is making it easy to accelerate transaction processing for the mainframe. This IBM Redpaper™ publication helps you to understand the concepts, business perspectives, and reference architecture of installing, tailoring, and configuring zHyperLink in your own environment.

Expert GeoServer

"Expert GeoServer" guides readers through the process of building, optimizing, and securing GeoServer-powered web mapping applications. By exploring concepts like spatial analysis platforms, tile caching, and secure authentication, this book equips you to create highly performant and secure geospatial applications. What this Book will help me do Learn to develop spatial analysis platforms using web processing services. Master tile caching to significantly enhance the speed of your mapping applications. Implement secure authentication to protect sensitive geospatial data. Optimize GeoServer for improved performance and resource utilization. Deploy your GeoServer-backed applications on modern cloud-hosting infrastructures. Author(s) None Mearns is an experienced software developer and geospatial technology expert. With a strong background in GeoServer implementation, None has helped organizations optimize and secure their geospatial platforms. Their writing aims to provide clear and actionable instructions for professionals and learners alike. Who is it for? This book is perfect for geospatial developers and professionals aiming to take their GeoServer skills to the next level. A basic understanding of GeoServer is assumed, as this guide tackles advanced topics like performance optimization and security. If you are looking to enhance the speed, usability, and security of your mapping applications, this is for you. Those aiming to confidently deploy production-ready applications will find it invaluable.

Professional Azure SQL Database Administration

Learn everything you need to manage Azure SQL Database with 'Professional Azure SQL Database Administration'. This book covers critical tasks such as migration, performance optimization, security, and disaster recovery. Perfect for those transitioning to the cloud, it equips you with the skills to ensure your database runs smoothly and efficiently.

What this Book will help me do
- Effectively migrate on-premises SQL Server databases to Azure.
- Master backup, restore, and security operations with Azure SQL Database.
- Optimize performance and scalability using monitoring and tuning techniques.
- Implement high availability and disaster recovery strategies.
- Simplify database management through automation and advanced techniques.

Author(s)
Ahmad Osama is a seasoned database administrator and Azure expert with extensive experience in SQL Server and cloud database management. As a consultant and trainer, he has guided numerous organizations through cloud transitions. Ahmad's teaching philosophy blends practical insights with clear instruction.

Who is it for?
This book is intended for database administrators and developers looking to transition their skills to Azure SQL Database. If you have some experience with on-premises SQL Server and are familiar with PowerShell, you'll find this guide invaluable. Ideal for those wanting to develop, migrate, or manage Azure SQL solutions.
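
The blurb assumes portal- and PowerShell-driven administration; purely as an illustration of reaching an Azure SQL Database programmatically, here is a hedged Python sketch with pyodbc. The server, database, and credentials are placeholders.

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=tcp:myserver.database.windows.net,1433;"
        "DATABASE=mydb;UID=dbadmin;PWD=<password>;"
        "Encrypt=yes;TrustServerCertificate=no;"
    )
    # Quick sanity check that the connection works.
    print(conn.execute("SELECT @@VERSION").fetchone()[0])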

Introduction to IBM Common Data Provider for z Systems

IBM Common Data Provider for z Systems collects, filters, and formats IT operational data in near real time and provides that data to target analytics solutions. It enables authorized IT operations teams, through a single web-based interface, to specify which IT operational data to gather and how it is to be handled. This data is provided to both on- and off-platform analytic solutions in a consistent, consumable format for analysis. This Redpaper discusses the value of IBM Common Data Provider for z Systems, provides a high-level reference architecture for it, and introduces the key components of that architecture. It shows how IBM Common Data Provider for z Systems provides operational data to various analytic solutions. The publication provides high-level integration guidance, preferred practices, tips on planning for IBM Common Data Provider for z Systems, and example integration scenarios.

Streaming Systems

Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way. Expanded from Tyler Akidau’s popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You’ll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.

You’ll explore:
- How streaming and batch data processing patterns compare
- The core principles and concepts behind robust out-of-order data processing
- How watermarks track progress and completeness in infinite datasets
- How exactly-once data processing techniques ensure correctness
- How the concepts of streams and tables form the foundations of both batch and streaming data processing
- The practical motivations behind a powerful persistent state mechanism, driven by a real-world example
- How time-varying relations provide a link between stream processing and the world of SQL and relational algebra
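
The book is deliberately platform-agnostic, but the "where in event time" idea is easy to see in code. Below is a minimal, illustrative Apache Beam sketch of fixed event-time windows; it is an assumption-laden toy, not an example from the book.

    import apache_beam as beam
    from apache_beam.transforms.window import FixedWindows

    with beam.Pipeline() as p:
        (p
         | beam.Create([("user1", 1), ("user2", 1), ("user1", 1)])
         | beam.WindowInto(FixedWindows(60))  # 60-second event-time windows
         | beam.CombinePerKey(sum)            # per-key, per-window totals
         | beam.Map(print))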

IBM Software-Defined Storage Guide

Today, new business models in the marketplace coexist with traditional ones and their well-established IT architectures. They generate new business needs and new IT requirements that can only be satisfied by new service models and new technological approaches. These changes are reshaping traditional IT concepts. Cloud in its three main variants (Public, Hybrid, and Private) represents the major and most viable answer to those IT requirements, and software-defined infrastructure (SDI) is its major technological enabler. IBM® technology, with its rich and complete set of storage hardware and software products, supports SDI both in an open standard framework and in other vendors' environments. IBM services can deliver solutions to customers, drawing on extensive knowledge of the topic and experience gained in partnership with clients. This IBM Redpaper™ publication focuses on software-defined storage (SDS) and IBM Storage Systems product offerings for software-defined environments (SDEs). It also provides use case examples across various industries that cover different client needs, proposed solutions, and results. This paper can help you to understand current organizational capabilities and challenges, and to identify specific business objectives to be achieved by implementing an SDS solution in your enterprise.

Apache Spark Deep Learning Cookbook

Embark on a journey to master distributed deep learning with the "Apache Spark Deep Learning Cookbook". Designed specifically for leveraging the capabilities of Apache Spark, TensorFlow, and Keras, this book offers over 80 problem-solving recipes to efficiently train and deploy state-of-the-art neural networks, addressing real-world AI challenges.

What this Book will help me do
- Set up and configure a working Apache Spark environment optimized for deep learning tasks.
- Implement distributed training practices for deep learning models using TensorFlow and Keras.
- Develop and test neural networks such as CNNs and RNNs targeting specific big data problems.
- Apply Spark's built-in libraries and integrations for enhanced NLP and computer vision applications.
- Effectively manage and preprocess large datasets using Spark DataFrames for machine learning tasks.

Author(s)
Authors Ahmed Sherif and Ravindra bring years of experience in deep learning, Apache Spark use cases, and hands-on practical training. Their collective expertise shaped this cookbook's approach, focusing on clarity and usability for readers tackling challenging machine learning scenarios.

Who is it for?
This book is ideal for IT professionals, data scientists, and software developers with a foundational understanding of machine learning concepts and the Apache Spark framework. If you aim to scale deep learning and integrate efficient computing with Spark's power, this guide is for you. Familiarity with Python will help you get the most from the book.
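
A common pattern in this space is to prepare data with Spark and train with Keras. The sketch below is a toy illustration of that pattern under stated assumptions (tiny synthetic data, local training), not a recipe from the book.

    from pyspark.sql import SparkSession
    import numpy as np
    from tensorflow import keras

    spark = SparkSession.builder.appName("dl-demo").getOrCreate()
    df = spark.createDataFrame(
        [(float(i), float(i % 2)) for i in range(100)], ["x", "label"]
    )

    # Collect only because the demo data is tiny; real datasets stay distributed.
    data = np.array(df.collect())
    X, y = data[:, :1], data[:, 1]

    model = keras.Sequential([
        keras.layers.Dense(8, activation="relu", input_shape=(1,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(X, y, epochs=2, verbose=0)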

Getting Started with Kudu

Fast data ingestion, serving, and analytics in the Hadoop ecosystem have forced developers and architects to choose solutions using the least common denominator—either fast analytics at the cost of slow data ingestion or fast data ingestion at the cost of slow analytics. There is an answer to this problem. With the Apache Kudu column-oriented data store, you can easily perform fast analytics on fast data. This practical guide shows you how. Begun as an internal project at Cloudera, Kudu is an open source solution compatible with many data processing frameworks in the Hadoop environment. In this book, current and former solutions professionals from Cloudera provide use cases, examples, best practices, and sample code to help you get up to speed with Kudu.

- Explore Kudu’s high-level design, including how it spreads data across servers
- Fully administer a Kudu cluster, enable security, and add or remove nodes
- Learn Kudu’s client-side APIs, including how to integrate Apache Impala, Spark, and other frameworks for data manipulation
- Examine Kudu’s schema design, including basic concepts and primitives necessary to make your project successful
- Explore case studies for using Kudu for real-time IoT analytics, predictive modeling, and in combination with another storage engine
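
Kudu ships client APIs for several languages. Here is a hedged sketch with the kudu-python client, assuming a running cluster and an existing table with a "key" column (both placeholders, not from the book).

    import kudu

    client = kudu.connect(host="kudu-master", port=7051)
    table = client.table("example_table")  # assumed to exist

    # Insert one row, then scan the table back.
    session = client.new_session()
    session.apply(table.new_insert({"key": 1}))
    session.flush()

    scanner = table.scanner()
    scanner.open()
    print(scanner.read_all_tuples())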

Apache Hive Essentials - Second Edition

"Apache Hive Essentials" provides a focused guide to mastering the essential techniques of processing and analyzing big data with Apache Hive. What this Book will help me do Set up and configure a Hive environment for big data analysis. Compose effective queries using Hive's SQL-like language to extract insights. Optimize Hive performance to handle complex datasets efficiently. Implement data security and user-defined functions to extend capabilities. Integrate Hive with Hadoop tools for comprehensive data solutions. Author(s) Dayong Du, the author of "Apache Hive Essentials," has years of experience working with big data technologies and tools. With hands-on expertise in Hadoop and the entire ecosystem, he brings a practical and informed perspective to this complex field. His approach is to make these technologies accessible to developers and analysts of all levels. Who is it for? This book is perfect for data analysts, developers, or professionals familiar with SQL who are looking to start with Apache Hive for big data processing. It is suitable for those acquainted with Hadoop and its environment and want to expand their skills into efficient data querying and management. Readers should have an interest in how to leverage big data tools for real-world solutions.

PySpark Cookbook

Dive into the world of big data processing and analytics with the "PySpark Cookbook". This book provides over 60 hands-on recipes for implementing efficient data-intensive solutions using Apache Spark and Python. By mastering these recipes, you'll be equipped to tackle challenges in large-scale data processing, machine learning, and stream analytics.

What this Book will help me do
- Set up and configure PySpark environments effectively, including working with Jupyter for enhanced interactivity.
- Understand and utilize DataFrames for data manipulation, analysis, and transformation tasks.
- Develop end-to-end machine learning solutions using the ML and MLlib modules in PySpark.
- Implement structured streaming and graph-processing solutions to analyze and visualize data streams and relationships.
- Deploy PySpark applications to cloud infrastructure efficiently using best practices.

Author(s)
This book is co-authored by Lee and Drabas, experienced professionals in data processing and analytics leveraging Python and Apache Spark. With deep technical expertise and a passion for teaching through practical examples, they aim to make the complex concepts of PySpark accessible to developers of varied experience levels.

Who is it for?
This book is ideal for Python developers who are keen to delve into the Apache Spark ecosystem. Whether you're just starting with big data or have some experience with Spark, this book provides practical recipes to enhance your skills. Readers looking to solve real-world data-intensive challenges using PySpark will find this resource invaluable.
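
In the spirit of the book's recipes, a minimal PySpark sketch: load a small DataFrame, aggregate it, and print the result. The column names and data are illustrative only.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("cookbook-demo").getOrCreate()
    df = spark.createDataFrame(
        [("2018-01-01", "web", 120.0), ("2018-01-01", "store", 80.0)],
        ["day", "channel", "revenue"],
    )
    df.groupBy("day").agg(F.sum("revenue").alias("total_revenue")).show()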

Streaming Change Data Capture

There are many benefits to becoming a data-driven organization, including the ability to accelerate and improve business decision accuracy through the real-time processing of transactions, social media streams, and IoT data. But those benefits require significant changes to your infrastructure. You need flexible architectures that can copy data to analytics platforms at near-zero latency while maintaining 100% production uptime. Fortunately, a solution already exists. This ebook demonstrates how change data capture (CDC) can meet the scalability, efficiency, real-time, and zero-impact requirements of modern data architectures. Kevin Petrie, Itamar Ankorion, and Dan Potter—technology marketing leaders at Attunity—explain how CDC enables faster and more accurate decisions based on current data and reduces or eliminates full reloads that disrupt production and efficiency.

The book examines:
- How CDC evolved from a niche feature of database replication software to a critical data architecture building block
- Architectures where data workflow and analysis take place, and their integration points with CDC
- How CDC identifies and captures source data updates to assist high-speed replication to one or more targets
- Case studies on cloud-based streaming and streaming to a data lake, and related architectures
- Guiding principles for effectively implementing CDC in cloud, data lake, and streaming environments
- The Attunity Replicate platform for efficiently loading data across all major database, data warehouse, cloud, streaming, and Hadoop platforms
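
The core CDC idea is simple to state in code: apply a stream of insert/update/delete events to a replica. The toy sketch below illustrates only that concept; it is in no way Attunity Replicate's implementation.

    # Toy change-data-capture applier: replay source events onto a target copy.
    events = [
        {"op": "insert", "key": 1, "row": {"name": "alice"}},
        {"op": "update", "key": 1, "row": {"name": "alice b."}},
        {"op": "delete", "key": 1},
    ]

    target = {}  # stands in for the analytics-side copy of the table
    for ev in events:
        if ev["op"] == "delete":
            target.pop(ev["key"], None)
        else:  # insert and update both upsert the row
            target[ev["key"]] = ev["row"]
    print(target)  # {} once the delete has been applied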

IBM Db2 11.1 Certification Guide

Delve into the IBM Db2 11.1 Certification Guide to comprehensively prepare for the IBM C2090-600 exam and master database programming and administration tasks in Db2 environments. Across its insightful chapters, this guide provides practical steps, expert guidance, and over 150 practice questions aimed at ensuring your success.

What this Book will help me do
- Master Db2 server management, including configuration and maintenance tasks, to ensure optimized performance.
- Implement advanced features such as BLU Acceleration and Db2 pureScale to enhance database functionality.
- Gain expertise in security protocols, including data encryption and integrity enforcement, for secure database environments.
- Troubleshoot common Db2 issues using advanced diagnostic tools like db2pd and dsmtop, improving efficiency and uptime.
- Develop skills in creating and altering database objects, enabling robust database design and management.

Author(s)
The authors, Collins and Saraswatipura, are seasoned database professionals with vast experience in administering and optimizing Db2 environments. Their expertise in guiding students and professionals shines through in the accessible language and practical approach of the book. They bring a blend of theoretical and hands-on insights to ensure learners not only understand but also apply the knowledge effectively.

Who is it for?
This book is ideal for database administrators, architects, and application developers who are pursuing certification in Db2. It caters to readers with a basic understanding of Db2 who are seeking to advance their skills. Whether you're aiming for professional growth or practical expertise, this guide serves your goals by covering certification essentials while enriching your practical knowledge.
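
Alongside command-line tools such as db2pd and dsmtop, Db2 can be reached programmatically. A hedged sketch with the ibm_db driver, using placeholder connection values:

    import ibm_db

    dsn = ("DATABASE=SAMPLE;HOSTNAME=db2host;PORT=50000;"
           "PROTOCOL=TCPIP;UID=db2inst1;PWD=<password>;")
    conn = ibm_db.connect(dsn, "", "")

    # Report the server's service level (for example, a Db2 11.1 fix pack).
    stmt = ibm_db.exec_immediate(
        conn, "SELECT service_level FROM sysibmadm.env_inst_info"
    )
    print(ibm_db.fetch_assoc(stmt))
    ibm_db.close(conn)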

Practical Enterprise Data Lake Insights: Handle Data-Driven Challenges in an Enterprise Big Data Lake

Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake, and learn industry best practices to resolve issues. When designing an enterprise data lake you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data. Starting from sourcing data into the Hadoop ecosystem, you will go through stages that can bring up tough questions such as data processing, data querying, and security. Concepts such as change data capture and data streaming are covered. The book takes an end-to-end solution approach in a data lake environment that includes data security, high availability, data processing, data streaming, and more. Each chapter includes application of a concept, code snippets, and use case demonstrations to provide you with a practical approach. You will learn the concept, scope, application, and starting point.

What You'll Learn
- Get to know data lake architecture and design principles
- Implement data capture and streaming strategies
- Implement data processing strategies in Hadoop
- Understand the data lake security framework and availability model

Who This Book Is For
Big data architects and solution architects

Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution

This IBM® Redpaper™ publication provides guidance on building an enterprise-grade data lake by using IBM Spectrum™ Scale and Hortonworks Data Platform for performing in-place Hadoop or Spark-based analytics. It covers the benefits of the integrated solution, and gives guidance about the types of deployment models and considerations during the implementation of these models. Hortonworks Data Platform (HDP) is a leading Hadoop and Spark distribution. HDP addresses the complete needs of data-at-rest, powers real-time customer applications, and delivers robust analytics that accelerate decision making and innovation. IBM Spectrum Scale™ is flexible and scalable software-defined file storage for analytics workloads. Enterprises around the globe have deployed IBM Spectrum Scale to form large data lakes and content repositories to perform high-performance computing (HPC) and analytics workloads. It can scale both performance and capacity without bottlenecks.

Big Data Architect's Handbook

Big Data Architect's Handbook is your comprehensive guide to mastering the art of building sophisticated big data solutions. As you delve into this book, you'll learn to design end-to-end big data pipelines and integrate data from various sources for insightful analysis.

What this Book will help me do
- Understand the Hadoop ecosystem and familiarize yourself with major Apache projects.
- Make informed decisions when designing cloud infrastructures for big data needs.
- Gain expertise in analyzing structured and unstructured data using machine learning.
- Develop skills to implement scalable and efficient big data pipelines.
- Enhance your ability to visualize and monitor data insights effectively.

Author(s)
Akhtar has amassed a wealth of experience in big data architecture and related technologies. With years of hands-on involvement in the development, analysis, and implementation of big data systems, Akhtar brings a pragmatic and insightful perspective. A passion for educating others about data-driven technologies shines through in a user-first approach to making complex topics accessible.

Who is it for?
This book caters to aspiring data professionals, software developers, and tech enthusiasts aiming to enhance their expertise in big data. Readers with basic programming and data analysis skills will find the content approachable yet challenging enough to deepen their understanding. If your career goal involves managing, analyzing, and making decisions based on large datasets, this book will help bridge the gap between skill and application.

Introducing the MySQL 8 Document Store

Learn the new Document Store feature of MySQL 8 and build applications around a mix of the best features from the SQL and NoSQL database paradigms. Don’t allow yourself to be forced into one paradigm or the other, but combine both approaches by using the Document Store. MySQL 8 was designed from the beginning to bridge the gap between NoSQL and SQL. Oracle recognizes that many solutions need the capabilities of both. More specifically, developers need to store objects as loose collections of schema-less documents, but those same developers also need the ability to run structured queries on their data. With MySQL 8, you can do both! Introducing the MySQL 8 Document Store presents new tools and features that make creating a hybrid database solution far easier than ever before. This book covers the vitally important MySQL Document Store, the new X Protocol for developing applications, and a new client shell called the MySQL Shell. Also covered are supporting technologies and concepts such as JSON, schema-less documents, and more. The book gives insight into how features work and how to apply them to get the most out of your MySQL experience.

The book covers topics such as:
- The headline feature in MySQL 8
- MySQL's answer to NoSQL
- New APIs and client protocols

What You'll Learn
- Create NoSQL-style applications by using the Document Store
- Mix the NoSQL and SQL approaches by using each to its best advantage in a hybrid solution
- Work with the new X Protocol for application connectivity in MySQL 8
- Master the new X Developer Application Programming Interfaces
- Combine SQL and JSON in the same database and application
- Migrate existing applications to the MySQL Document Store

Who This Book Is For
Developers and database professionals wanting to learn about the most profound paradigm-changing features of the MySQL 8 Document Store
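
The X DevAPI treats a collection of JSON documents as a first-class object. Here is a hedged sketch using MySQL Connector/Python's mysqlx module, with placeholder connection details (the X Protocol defaults to port 33060):

    import mysqlx

    session = mysqlx.get_session(
        host="localhost", port=33060, user="root", password="<password>"
    )
    schema = session.get_schema("test")
    coll = schema.create_collection("contacts")  # or get_collection() if it exists

    coll.add({"name": "alice", "email": "alice@example.com"}).execute()
    result = coll.find("name = :n").bind("n", "alice").execute()
    for doc in result.fetch_all():
        print(doc)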

Designing Fast Data Application Architectures

Today’s digital companies demand real-time insights and immediate action for everything from purchase to fulfillment, recommendation, and more. As a result, many organizations are adopting fast data applications to accelerate the value they extract from data as it flows into the system. With this practical ebook, you’ll learn the common architectural patterns that form the foundation of successful fast data deployments. Engineers from Lightbend identify the key characteristics of fast data architectures, separate them into functional blocks, and show you how to implement those functions using components like those in the SMACK stack—Spark, Mesos, Akka, Cassandra, and Kafka, as well as others. Architects will learn how to choose, combine, and run SMACK stack technologies to build the resilient, scalable, and responsive systems that your company requires.

This ebook examines:
- The anatomy of fast data applications: the application model, streaming data sources, processing engines, and data sinks
- Functional composition of the SMACK stack and extensions
- The event backbone that connects all the major components of a fast data platform together
- Compute engines for transforming data into valuable insights
- Storage systems that form the transition between the fast data domain and client applications
- Patterns you can use in the data serving layer, including data-driven microservices
- Container orchestrators in the substrate layer that provide resources to services, frameworks, and applications
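
The event backbone described here is typically Kafka: components exchange events through topics instead of calling each other directly. A small illustrative consumer with kafka-python, using a placeholder broker and topic:

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "orders",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: b.decode("utf-8"),
    )
    for msg in consumer:  # each message is one event on the backbone
        print(msg.offset, msg.value)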

Security on IBM z/VSE

Abstract One of a firm’s most valuable resources is its data: client lists, accounting data, employee information, and so on. This critical data must be securely managed and controlled, and simultaneously made available to those users authorized to see it. The IBM® z/VSE® system features extensive capabilities to simultaneously share the firm’s data among multiple users and protect it. Threats to this data come from various sources. Insider threats and malicious hackers are not only difficult to detect and prevent; they might be using resources without the business being aware. This IBM Redbooks® publication was written to assist z/VSE support and security personnel in providing the enterprise with a safe, secure, and manageable environment. This book provides an overview of the security that is provided by z/VSE and the processes for the implementation and configuration of z/VSE security components: Basic Security Manager (BSM), IBM CICS® security, TCP/IP security, single sign-on using LDAP, and connector security.

Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark

Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies. Next-Generation Big Data takes a holistic approach, covering the most important aspects of modern enterprise big data. The book covers not only the main technology stack but also the next-generation tools and applications used for big data warehousing, data warehouse optimization, real-time and batch data ingestion and processing, real-time data visualization, big data governance, data wrangling, big data cloud deployments, and distributed in-memory big data computing. Finally, the book has extensive and detailed coverage of big data case studies from Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard.

What You’ll Learn
- Install Apache Kudu, Impala, and Spark to modernize enterprise data warehouse and business intelligence environments, complete with real-world, easy-to-follow examples and practical advice
- Integrate HBase, Solr, Oracle, SQL Server, MySQL, Flume, Kafka, HDFS, and Amazon S3 with Apache Kudu, Impala, and Spark
- Use StreamSets, Talend, Pentaho, and CDAP for real-time and batch data ingestion and processing
- Utilize Trifacta, Alteryx, and Datameer for data wrangling and interactive data processing
- Turbocharge Spark with Alluxio, a distributed in-memory storage platform
- Deploy big data in the cloud using Cloudera Director
- Perform real-time data visualization and time series analysis using Zoomdata, Apache Kudu, Impala, and Spark
- Understand enterprise big data topics such as big data governance, metadata management, data lineage, impact analysis, and policy enforcement, and how to use Cloudera Navigator to perform common data governance tasks
- Implement big data use cases such as big data warehousing, data warehouse optimization, Internet of Things, real-time data ingestion and analytics, complex event processing, and scalable predictive modeling
- Study real-world big data case studies from innovative companies, including Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard

Who This Book Is For
BI and big data warehouse professionals interested in gaining practical and real-world insight into next-generation big data processing and analytics using Apache Kudu, Impala, and Spark; and those who want to learn more about other advanced enterprise topics
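
One way the Kudu, Impala, and Spark stack fits together is Spark reading a Kudu table directly. A hedged sketch of the kudu-spark integration from PySpark follows (the connector JAR must be on the classpath; the master address and table name are placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kudu-demo").getOrCreate()
    df = (spark.read.format("org.apache.kudu.spark.kudu")
          .option("kudu.master", "kudu-master:7051")
          .option("kudu.table", "impala::default.events")  # placeholder table
          .load())
    df.createOrReplaceTempView("events")
    spark.sql("SELECT COUNT(*) FROM events").show()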

BizTalk

Why do businesses continue to use Microsoft’s BizTalk Server as the backbone to integrate line-of-business applications with their trading partners, and how do recent changes make it even more effective? With the advent of Azure, we have a unique opportunity to enhance BizTalk functionality, including reducing the cost of operations and maintenance. This book offers three solutions for the reader on ways to leverage BizTalk to get more from existing deployments or find ways to modernize the deployment via Azure. Microsoft partners are playing a significant role in enhancing the capabilities of BizTalk, and this book includes sections that provide an in-depth review of BizTalk360© and the WPC HIPAA DB Toolkit©. Over the recent past, Web 3.0 has also introduced many new concepts and open source technologies, and this book covers ways to leverage these to enhance your BizTalk deployment. The authors start with a survey of the existing BizTalk Server – its history, patterns, and state of affairs – and go on to provide an in-depth elaboration of three messaging patterns that customers use for BizTalk; the advantages of updating to SQL Server 2016; a review of partner solutions that enhance BizTalk; and BizTalk with Web 3.0 for custom solutions. The book concludes with a comparison of the three viable BizTalk Azure application solutions that will enable you to make the best choice for your business.

Implementing IBM FlashSystem 900 Model AE3

Abstract Today’s global organizations depend on being able to unlock business insights from massive volumes of data. Now, with IBM® FlashSystem 900 Model AE3, powered by IBM FlashCore® technology, they can make faster decisions based on real-time insights and unleash the power of the most demanding applications, including online transaction processing (OLTP) and analytics databases, virtual desktop infrastructures (VDIs), technical computing applications, and cloud environments. This IBM Redbooks® publication introduces clients to the IBM FlashSystem® 900 Model AE3. It provides in-depth knowledge of the product architecture, software and hardware, implementation, and hints and tips. Also illustrated are use cases that show real-world solutions for tiering, flash-only, and preferred-read, and also examples of the benefits gained by integrating the FlashSystem storage into business environments. This book is intended for pre-sales and post-sales technical support professionals and storage administrators, and for anyone who wants to understand how to implement this new and exciting technology.