talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked: 3432

Collection of O'Reilly books on Data Engineering.

Sessions & talks

Showing 801–825 of 3432 · Newest first

IBM FlashSystem A9000 Product Guide (Version 12.3.2)

This IBM® Redbooks® Product Guide is an overview of the main characteristics, features, and technology that are used in IBM FlashSystem® A9000 Model 425, with IBM FlashSystem A9000 Software V12.3.2. Software version 12.3.2, with Hyper-Scale Manager version 5.6 or later, introduces support for VLAN tagging and port trunking. The IBM FlashSystem A9000 storage system uses IBM FlashCore® technology to help realize higher capacity and improved response times over disk-based systems and other competing flash and solid-state drive (SSD)-based storage. The extreme performance of IBM FlashCore technology with a grid architecture and comprehensive data reduction creates one powerful solution. Whether you are a service provider who requires highly efficient management or an enterprise that is implementing cloud on a budget, FlashSystem A9000 provides consistent and predictable microsecond response times and the simplicity that you need. The A9000 features always-on data reduction and now offers intelligent capacity management for deduplication. As a cloud-optimized solution, FlashSystem A9000 suits the requirements of public and private cloud providers that require features such as inline data deduplication, multi-tenancy, and quality of service. It also uses powerful software-defined storage capabilities from IBM Spectrum™ Accelerate, such as Hyper-Scale technology and integration with VMware and storage containers.

IBM FlashSystem A9000R Product Guide (Version 12.3.2)

This IBM® Redbooks® Product Guide is an overview of the main characteristics, features, and technology that are used in IBM FlashSystem® A9000R Model 415 and Model 425, with IBM FlashSystem A9000R Software V12.3.2. Software version 12.3.2, with Hyper-Scale Manager version 5.6 or later, introduces support for VLAN tagging and port trunking. IBM FlashSystem A9000R is a grid-scale, all-flash storage platform designed for industry leaders with rapidly growing cloud storage and mixed workload environments to help drive your business into the cognitive era. FlashSystem A9000R provides consistent, extreme performance for dynamic data at scale, integrating the microsecond latency and high availability of IBM FlashCore® technology. The rack-based offering comes integrated with the world-class software features that are built with IBM Spectrum™ Accelerate. For example, comprehensive data reduction, including inline pattern removal, data deduplication, and compression, helps lower total cost of ownership (TCO), while the grid architecture and IBM Hyper-Scale framework simplify and automate storage administration. The A9000R features always-on data reduction and now offers intelligent capacity management for deduplication. Ready for the cloud and well-suited for large deployments, FlashSystem A9000R delivers predictable high performance and ultra-low latency, even under heavy workloads with full data reduction enabled. The grid-scale architecture maintains this performance by automatically self-optimizing workloads across all storage resources without manual intervention.

Managing Your Data Science Projects: Learn Salesmanship, Presentation, and Maintenance of Completed Models

At first glance, the skills required to work in the data science field appear to be self-explanatory. Do not be fooled. Impactful data science demands an interdisciplinary knowledge of business philosophy, project management, salesmanship, presentation, and more. In Managing Your Data Science Projects, author Robert de Graaf explores important concepts that are frequently overlooked in much of the instructional literature that is available to data scientists new to the field. If your completed models are to be used and maintained most effectively, you must be able to present and sell them within your organization in a compelling way. The value of data science within an organization cannot be overstated. Thus, it is vital that strategies and communication between teams are dexterously managed. The three main ways that data science strategy is used in a company are to research its customers, assess risk, and log operational measurements. These all require different managerial instincts, backgrounds, and experiences, and de Graaf cogently breaks down the unique reasons behind each. They must align seamlessly to eventually be adopted as dynamic models. Data science is a relatively new discipline, and as such, internal processes for it are not as well-developed within an operational business as others. With Managing Your Data Science Projects, you will learn how to create products that solve important problems for your customers and ensure that the initial success is sustained throughout the product’s intended life. Your users will trust you and your models, and most importantly, you will be a more well-rounded and effectual data scientist throughout your career. Who This Book Is For Early-career data scientists, managers of data scientists, and those interested in entering the field of data science

Stream Processing with Apache Spark

Before you can build analytics tools to gain quick insights, you first need to know how to process data in real time. With this practical guide, developers familiar with Apache Spark will learn how to put this in-memory framework to use for streaming data. You’ll discover how Spark enables you to write streaming jobs in almost the same way you write batch jobs. Authors Gerard Maas and François Garillot help you explore the theoretical underpinnings of Apache Spark. This comprehensive guide features two sections that compare and contrast the streaming APIs Spark now supports: the original Spark Streaming library and the newer Structured Streaming API. Learn fundamental stream processing concepts and examine different streaming architectures Explore Structured Streaming through practical examples; learn different aspects of stream processing in detail Create and operate streaming jobs and applications with Spark Streaming; integrate Spark Streaming with other Spark APIs Learn advanced Spark Streaming techniques, including approximation algorithms and machine learning algorithms Compare Apache Spark to other stream processing projects, including Apache Storm, Apache Flink, and Apache Kafka Streams
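
To give a flavor of the batch-like style of streaming code described above, here is a minimal PySpark Structured Streaming word count. It is an illustrative sketch under assumed defaults (a local Spark session and a text source on localhost port 9999, for example one started with "nc -lk 9999"), not code taken from the book.

# A minimal Structured Streaming word count (illustrative sketch, not from the book).
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("streaming-wordcount").getOrCreate()

# Read a stream of text lines from a local socket source.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# The transformation looks just like a batch job: split lines into words and count them.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Emit the running counts to the console until the query is stopped.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()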

Obtaining Value from Big Data for Service Systems, Volume II, 2nd Edition

Volume II of this series discusses the technology used to implement a big data analysis capability within a service-oriented organization. It discusses the technical architecture necessary to implement a big data analysis capability, some issues and challenges in big data analysis and utilization that an organization will face, and how to capture value from it. It will help readers understand what technology is required for a basic capability and what the expected benefits are from establishing a big data capability within their organization.

Pro SQL Server 2019 Wait Statistics: A Practical Guide to Analyzing Performance in SQL Server

Here is a practical guide for analyzing and troubleshooting SQL Server performance using wait statistics. Learn to identify precisely why your queries are running slowly. Measure the amount of time consumed by each bottleneck so that you can focus attention on making the largest improvements first. This edition is updated to cover analysis of wait statistics inside Query Store and the CXCONSUMER wait event, and to be current with SQL Server 2019. Whether you are new to wait statistics or already familiar with them, this book provides a deeper understanding of how wait statistics are generated and what they can mean for your SQL Server instance’s performance. Pro SQL Server 2019 Wait Statistics goes beyond the most common wait types into the more complex and performance-threatening wait types. You’ll learn about per-query wait statistics and session-based wait statistics, and the types of problems they each can help you solve. The different wait types are categorized by their area of impact, including CPU, IO, Lock, and many more. The book presents clear examples to help you gain practical knowledge of why and how specific wait times increase or decrease, and how they impact your SQL Server’s performance. After reading this book you won’t want to be without the valuable information that wait statistics provide regarding where you should be spending your limited tuning time to maximize performance and value to your business. What You'll Learn Identify resource bottlenecks in a running SQL Server instance Locate wait statistics information inside DMVs and Query Store Analyze the root cause of sub-optimal performance Diagnose I/O contention and locking contention Benchmark SQL Server performance Lower the wait time of the most popular wait types Who This Book Is For Database administrators who want to identify and resolve performance bottlenecks, those who want to learn more about how the SQL Server engine accesses and uses resources inside SQL Server, and administrators concerned with achieving—and knowing they have achieved—optimal performance
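
As a small, hedged illustration of where wait statistics can be read from, the following Python sketch queries the sys.dm_os_wait_stats dynamic management view with pyodbc. The connection string values and the excluded wait types are placeholders and examples, not recommendations from the book.

# Illustrative sketch: top wait types from sys.dm_os_wait_stats via pyodbc.
import pyodbc

# Placeholder connection string; adjust driver, server, and authentication as needed.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=master;Trusted_Connection=yes;"
)

query = """
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'BROKER_TASK_STOP')  -- skip a couple of benign waits
ORDER BY wait_time_ms DESC;
"""

for row in conn.execute(query):
    # signal_wait_time_ms (time spent waiting for CPU) is included in wait_time_ms.
    print(row.wait_type, row.waiting_tasks_count, row.wait_time_ms, row.signal_wait_time_ms)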

Geospatial Data Science Quick Start Guide

"Geospatial Data Science Quick Start Guide" provides a practical and effective introduction to leveraging geospatial data in data science. In this book, you will learn techniques for analyzing location-based data, building intelligent models, and performing geospatial operations for various applications. What this Book will help me do Understand the principles and techniques for analyzing geospatial data. Set up Python tools to work effectively with location intelligence. Perform advanced spatial operations such as geocoding and proximity analysis. Develop systems such as geofencing and location-based recommendation engines. Obtain actionable insights by visualizing and processing spatial data effectively. Author(s) Abdishakur Hassan and Jayakrishnan Vijayaraghavan are experts in geospatial analysis. With extensive experience in applying data science to location intelligence, they bring a practical and hands-on approach to coding, teaching, and problem-solving. They are passionate about sharing their knowledge through their clear explanations and structured learning paths. Who is it for? This book is ideal for data scientists interested in integrating geospatial analysis into their models and workflows. It is also suitable for GIS developers looking to enhance existing systems with advanced data analysis capabilities. Readers should have experience with Python and a basic understanding of data science concepts. If location-based data intrigues you, this book is your guide.

Learning Elastic Stack 7.0 - Second Edition

"Learning Elastic Stack 7.0" introduces you to the tools and techniques of Elastic Stack, covering Elasticsearch, Logstash, Beats, and Kibana. With clear explanations and practical examples, this book helps you grasp the 7.0 version's new features and capabilities, empowering you to build and deploy robust, real-time data processing applications. What this Book will help me do Gain the necessary skills to install and configure Elastic Stack for professional use. Master the data handling capabilities of Elasticsearch for distributed search and analytics. Develop expertise in creating data pipelines with Logstash and other ingestion tools. Learn to utilize Kibana to visualize and interpret complex datasets. Acquire knowledge of deploying Elastic Stack solutions both on-premise and in cloud environments. Author(s) Pranav Shukla and Sharath Kumar M N are experienced software engineers and data professionals with a profound knowledge of databases, distributed systems, and cloud architectures. They specialize in educating developers through structured guidance and proven methodologies related to data handling and visualization. Who is it for? This book is designed for software engineers, data analysts, and technical architects interested in learning the Elastic Stack tools from the ground up. Readers familiar with database concepts but new to Elastic Stack will find this book particularly helpful. Advanced users seeking to understand the updates in Elastic Stack 7.0 are also a complementary audience. If you wish to apply Elastic Stack to real-time data processing and analytics, this book provides a strong foundation.

Mastering SAP ABAP

Mastering SAP ABAP guides you through learning and applying the powerful SAP ABAP programming language. You will start with foundational concepts of programming within SAP environments and progress towards advanced topics such as UI development with SAPUI5 and optimizing ABAP code performance. What this Book will help me do Master the ABAP programming language, from fundamental constructs to advanced techniques. Learn to design and implement efficient and maintainable SAP applications. Gain expertise in creating modern UIs for SAP systems using SAPUI5. Understand performance optimization techniques for SAP ABAP programs. Acquire skills to handle exceptions and perform robust testing in ABAP. Author(s) The authors, Paweł Grzełkowiak, Philipp Deth, Wojciech Ciesielski, and Wojciech Łuźwik, are seasoned SAP technologists with years of practical experience in development and consulting. Their dedication to clarity and usefulness is evident in this book, where they share their collective expertise. Who is it for? This book is for SAP developers, both budding and experienced, who want to increase their efficiency in ABAP programming. Prior exposure to programming concepts and a desire to understand SAP-specific technologies are required prerequisites. Whether you are delving deeper into your career as an SAP developer or are aiming to bring new technical solutions to your organization, this guide is ideal for you.

Obtaining Value from Big Data for Service Systems, Volume I, 2nd Edition

Volume I of this two-volume series focuses on the role of big data in service delivery systems. It discusses the definition of and orientation to big data, applications of it in service delivery systems, how to obtain results that can affect and enhance service delivery, and how to build an effective big data organization. This volume will assist readers in fitting big data analysis into their service-based organizations and will help them understand how to improve the use of big data to enhance their service-oriented organizations.

Electronic Health Records with Epic and IBM FlashSystem 9100 Blueprint Version 2 Release 1

This information is intended to facilitate the deployment of IBM® FlashSystem for the Epic Corporation electronic health record (EHR) solution by describing the requirements and specifications for configuring IBM FlashSystem® 9100 and its parameters. The document also describes the steps that are required to configure the servers that host the EHR application. To complete the tasks, you must have a working knowledge of IBM FlashSystem 9100 and Epic applications. The information in this document is distributed on an "as is" basis, without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM FlashSystem storage devices are supported and entitled and where the issues are not specific to a blueprint implementation.

Pro Oracle SQL Development: Best Practices for Writing Advanced Queries

Write SQL statements that are more powerful, simpler, and faster using Oracle SQL and its full range of features. This book provides a clearer way of thinking about SQL by building sets, and provides practical advice for using complex features while avoiding anti-patterns that lead to poor performance and wrong results. Relevant theories, real-world best practices, and style guidelines help you get the most out of Oracle SQL. Pro Oracle SQL Development is for anyone who already knows Oracle SQL and is ready to take their skills to the next level. Many developers, analysts, testers, and administrators use Oracle databases frequently, but their queries are limited because they do not have the knowledge, experience, or right environment to help them take full advantage of Oracle’s advanced features. This book will inspire you to achieve more with your Oracle SQL statements through tips for creating your own style for writing simple, yet powerful, SQL. It teaches you how to think about and solve performance problems in Oracle SQL, and covers advanced topics and shows you how to become an Oracle expert. What You'll Learn Understand the power of Oracle SQL and where to apply it Create a database development environment that is simple, scalable, and conducive to learning Solve complex problems that were previously solved in a procedural language Write large Oracle SQL statements that are powerful, simple, and fast Apply coding styles to make your SQL statements more readable Tune large Oracle SQL statements to eliminate and avoid performance problems Who This Book Is For Developers, testers, analysts, and administrators who want to harness the full power of Oracle SQL to solve their problems as simply and as quickly as possible. For traditional database professionals the book offers new ways of thinking about the language they have used for so long. For modern full stack developers the book explains how a database can be much more than simply a place to store data.

Loss Models, 5th Edition

A guide that provides in-depth coverage of modeling techniques used throughout many branches of actuarial science, revised and updated. Now in its fifth edition, Loss Models: From Data to Decisions puts the focus on material tested in the Society of Actuaries' (SOA) newly revised Exams STAM (Short-Term Actuarial Mathematics) and LTAM (Long-Term Actuarial Mathematics). Updated to reflect these exam changes, this vital resource offers actuaries, and those aspiring to the profession, a practical approach to the concepts and techniques needed to succeed in the field. The techniques are also valuable for anyone who uses loss data to build models for assessing risks of any kind. Loss Models contains a wealth of examples that highlight the real-world applications of the concepts presented, and puts the emphasis on calculations and spreadsheet implementation. With a focus on the loss process, the book reviews the essential quantitative techniques such as random variables, basic distributional quantities, and the recursive method, and discusses techniques for classifying and creating distributions. Parametric, non-parametric, and Bayesian estimation methods are thoroughly covered. In addition, the authors offer practical advice for choosing an appropriate model. This important text: • Presents a revised and updated edition of the classic guide for actuaries that aligns with newly introduced Exams STAM and LTAM • Contains a wealth of exercises taken from previous exams • Includes fresh and additional content related to the material required by the Society of Actuaries (SOA) and the Canadian Institute of Actuaries (CIA) • Offers a solutions manual available for further insight, and all the data sets and supplemental material are posted on a companion site Written for students and aspiring actuaries who are preparing to take the SOA examinations, Loss Models offers an essential guide to the concepts and techniques of actuarial science.
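
As a loose illustration of the parametric estimation the book covers, the following sketch fits a gamma severity distribution to simulated claim amounts by maximum likelihood with SciPy; the data, distribution choice, and parameters are invented for the example and are not taken from the text.

# Illustrative sketch: maximum-likelihood fit of a parametric loss severity distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated claim severities; in practice these would come from observed loss data.
claims = rng.gamma(shape=2.0, scale=1500.0, size=5000)

# Fit a gamma distribution by maximum likelihood, holding the location parameter at zero.
shape, loc, scale = stats.gamma.fit(claims, floc=0)
fitted = stats.gamma(shape, loc=loc, scale=scale)

print(f"fitted shape={shape:.2f}, scale={scale:.1f}")
print(f"mean severity ~ {fitted.mean():.0f}, 95th percentile ~ {fitted.ppf(0.95):.0f}")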

Data Science from Scratch, 2nd Edition

To really learn data science, you should not only master the tools—data science libraries, frameworks, modules, and toolkits—but also understand the ideas and principles underlying them. Updated for Python 3.6, this second edition of Data Science from Scratch shows you how these tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Packed with new material on deep learning, statistics, and natural language processing, this updated book shows you how to find the gems in today’s messy glut of data. Get a crash course in Python Learn the basics of linear algebra, statistics, and probability—and how and when they’re used in data science Collect, explore, clean, munge, and manipulate data Dive into the fundamentals of machine learning Implement models such as k-nearest neighbors, Naïve Bayes, linear and logistic regression, decision trees, neural networks, and clustering Explore recommender systems, natural language processing, network analysis, MapReduce, and databases
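
In the spirit of the from-scratch approach described above, here is a minimal k-nearest-neighbors classifier in plain Python. It is an illustrative sketch, not the author's code.

# Illustrative sketch: a tiny k-nearest-neighbors classifier built from scratch.
import math
from collections import Counter

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(k, labeled_points, new_point):
    """labeled_points is a list of (vector, label) pairs."""
    # Order the training points by distance to the query point.
    by_distance = sorted(labeled_points, key=lambda pl: distance(pl[0], new_point))
    # Majority vote among the labels of the k closest points.
    k_nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(k_nearest_labels).most_common(1)[0][0]

# Tiny usage example with two clusters.
training = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((5.0, 5.0), "b"), ((5.5, 4.5), "b")]
print(knn_classify(3, training, (1.1, 1.0)))  # expected: "a"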

IBM GDPS Family: An Introduction to Concepts and Capabilities

This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex™ (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for the future and immediate business requirements. Tables provide easy-to-use summaries and comparisons of the offerings. The extra planning and implementation services available from IBM also are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently for those readers who are interested in specific topics. Therefore, if you read all of the chapters, be aware that some information is intentionally repeated.

Learn T-SQL Querying

Dive into the world of T-SQL with 'Learn T-SQL Querying,' a book designed to enhance your database querying skills and help you master Microsoft's SQL Server and Azure SQL Database. Through this guide, you'll explore best practices, learn advanced techniques for analyzing execution plans, and create efficient T-SQL queries. What this Book will help me do Understand the fundamentals of query optimization to write performant T-SQL queries. Analyze query execution plans to identify and troubleshoot performance issues effectively. Utilize dynamic management views and functions to monitor and optimize query performance. Implement features like Query Store to streamline troubleshooting and maintain performance changes. Avoid common T-SQL anti-patterns and embrace best practices to ensure scalable query design. Author(s) Pedro Lopes and Pam Lahoud bring years of expertise in SQL Server and database systems. Pedro has extensive experience as a database engineer, where he specializes in query processing and optimization. Pam has a deep understanding of T-SQL development, focusing on practical solutions. Together, they provide in-depth insights and actionable advice. Who is it for? This book is perfect for database administrators, database developers, and data analysts at any level looking to improve their T-SQL expertise. Beginners will gain foundational skills in T-SQL querying, while experienced professionals will find advanced strategies for optimizing SQL Server performance. Readers aiming to master both practical querying and troubleshooting will benefit the most.
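
To give a concrete sense of the Query Store troubleshooting mentioned above, the sketch below pulls the slowest captured queries from the Query Store catalog views via pyodbc. Server and database names are placeholders, and this is an illustrative example rather than one of the book's own listings.

# Illustrative sketch: longest-running queries recorded by Query Store.
import pyodbc

# Placeholder connection details; Query Store must be enabled on the target database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=MyAppDb;Trusted_Connection=yes;"
)

query = """
SELECT TOP (10)
       qt.query_sql_text,
       rs.avg_duration / 1000.0 AS avg_duration_ms,  -- avg_duration is in microseconds
       rs.count_executions
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_plan AS p ON p.plan_id = rs.plan_id
JOIN sys.query_store_query AS q ON q.query_id = p.query_id
JOIN sys.query_store_query_text AS qt ON qt.query_text_id = q.query_text_id
ORDER BY rs.avg_duration DESC;
"""

for text, avg_ms, executions in conn.execute(query):
    print(f"{avg_ms:10.1f} ms  x{executions}  {text[:80]}")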

PostgreSQL 11 Administration Cookbook

Discover practical solutions for administering PostgreSQL 11 databases in "PostgreSQL 11 Administration Cookbook." This recipe-style book provides actionable, step-by-step guidance for efficiently managing PostgreSQL databases, leveraging its features, and optimizing performance. You'll gain comprehensive knowledge to troubleshoot, maintain, and enhance enterprise database systems. What this Book will help me do Understand and implement robust database backup and recovery techniques. Improve the performance of PostgreSQL solutions through expert tuning and diagnostics. Master high availability and replication strategies for PostgreSQL 11. Use hands-on recipes to enhance PostgreSQL security and user management. Learn efficient database management techniques for production environments. Author(s) Simon Riggs, an experienced database architect, along with co-authors Gianni Ciolli and Sudheer Kumar Meesala, brings years of PostgreSQL expertise to this book. Their collaborative effort ensures a practical yet comprehensive approach to PostgreSQL 11. With rich industry experience, they provide readers with valuable insights to address real-world database challenges. Who is it for? The ideal readers are database administrators, architects, or developers working with PostgreSQL databases. This book is perfect for professionals seeking actionable solutions to PostgreSQL 11 challenges. Prior PostgreSQL knowledge will enhance the learning experience and practical application. If managing and optimizing databases is your goal, this book is tailored for you.
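
As a small, hedged example of the backup tasks this kind of recipe book addresses, the following sketch drives pg_dump from Python to take a custom-format logical backup. Host, user, and database names are placeholders, authentication is assumed to be handled by a .pgpass file or the PGPASSWORD environment variable, and it is not a recipe from the book itself.

# Illustrative sketch: a custom-format logical backup of one database with pg_dump.
import subprocess
from datetime import date

backup_file = f"mydb-{date.today()}.dump"

subprocess.run(
    [
        "pg_dump",
        "--host", "localhost",
        "--username", "postgres",
        "--format", "custom",   # custom format is compressed and works with pg_restore
        "--file", backup_file,
        "mydb",
    ],
    check=True,  # raise if pg_dump exits with an error
)

# A later restore could use: pg_restore --dbname=mydb_restored <backup_file>
print(f"Backup written to {backup_file}")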

IBM High-Performance Computing Insights with IBM Power System AC922 Clustered Solution

This IBM® Redbooks® publication documents and addresses topics to set up a complete infrastructure environment and tune the applications to use an IBM POWER9™ hardware architecture with the technical computing software stack. This publication is driven by a CORAL project solution. It explores, tests, and documents how to implement an IBM High-Performance Computing (HPC) solution on a POWER9 processor-based system by using IBM technical innovations to help solve challenging scientific, technical, and business problems. This book documents the HPC clustering solution with InfiniBand on IBM Power Systems™ AC922 8335-GTH and 8335-GTX servers with NVIDIA Tesla V100 SXM2 graphics processing units (GPUs) with NVLink, software components, and the IBM Spectrum™ Scale parallel file system. This solution includes recommendations about the components that are used to provide a cohesive clustering environment that includes job scheduling, parallel application tools, scalable file systems, administration tools, and a high-speed interconnect. This book is divided into three parts: Part 1 focuses on the planners of the solution, Part 2 focuses on the administrators, and Part 3 focuses on the developers. This book targets technical professionals (consultants, technical support staff, IT architects, and IT specialists) who are responsible for delivering cost-effective HPC solutions that help uncover insights among clients' data so that they can act to optimize business results, product development, and scientific discoveries.

IBM zPDT Guide and Reference

This IBM® Redbooks® publication provides both introductory information and technical details about the IBM System z® Personal Development Tool (IBM zPDT®), which produces a small System z environment suitable for application development. zPDT is a PC Linux application. When zPDT is installed (on Linux), normal System z operating systems (such as IBM z/OS®) can be run on it. zPDT provides the basic System z architecture and emulated IBM 3390 disk drives, 3270 interfaces, OSA interfaces, and so on. The systems that are discussed in this document are complex. They have elements of Linux (for the underlying PC machine), IBM z/Architecture® (for the core zPDT elements), System z I/O functions (for emulated I/O devices), z/OS (the most common System z operating system), and various applications and subsystems under z/OS. The reader is assumed to be familiar with general concepts and terminology of System z hardware and software elements, and with basic PC Linux characteristics. This book provides the primary documentation for zPDT.

Data Architecture: A Primer for the Data Scientist, 2nd Edition

Over the past 5 years, the concept of big data has matured, data science has grown exponentially, and data architecture has become a standard part of organizational decision-making. Throughout all this change, the basic principles that shape the architecture of data have remained the same. There remains a need for people to take a look at the "bigger picture" and to understand where their data fit into the grand scheme of things. Data Architecture: A Primer for the Data Scientist, Second Edition addresses the larger architectural picture of how big data fits within the existing information infrastructure or data warehousing systems. This is an essential topic not only for data scientists, analysts, and managers but also for researchers and engineers who increasingly need to deal with large and complex sets of data. Until data are gathered and can be placed into an existing framework or architecture, they cannot be used to their full potential. Drawing upon years of practical experience and using numerous examples and case studies from across various industries, the authors seek to explain this larger picture into which big data fits, giving data scientists the necessary context for how pieces of the puzzle should fit together. New case studies include expanded coverage of textual management and analytics New chapters on visualization and big data Discussion of new visualizations of the end-state architecture

Elasticsearch 7.0 Cookbook - Fourth Edition

"Elasticsearch 7.0 Cookbook" is a practical guide to effectively using Elasticsearch, packed with over 100 recipes that cover everything from simple setup tasks to advanced query creation. Whether you're deploying Elasticsearch nodes or integrating with various technologies, this book will empower you to make the most out of Elasticsearch's robust search capabilities. What this Book will help me do Understand how to efficiently deploy and manage Elasticsearch architectures within your enterprise. Learn to create and optimize queries for effective analytics and data retrieval. Explore advanced indexing and mapping techniques to enhance data searchability. Monitor and scale your Elasticsearch clusters to ensure optimal performance. Integrate Elasticsearch with programming languages and big data applications. Author(s) Alberto Paro, a seasoned Elasticsearch expert, brings years of experience in designing and implementing large-scale search and analytics solutions. His practical experience in guiding teams through complex Elasticsearch deployments is evident in his clear and solution-focused writing approach. Alberto's passion for technology drives his mission to make advanced technical topics accessible. Who is it for? This book is ideal for software engineers, data professionals, and Elasticsearch developers who are looking to expand their technical capabilities in search and data analytics. It is also suited for individuals in industries like e-commerce utilizing Elastic for insights. A basic understanding of Elasticsearch will allow readers to gain deeper value from this book.

Fifty Years of Data Management and Beyond

Every decade since the 1960s, researchers at companies like IBM, Amazon, and many others have introduced major new frameworks and techniques to handle rising data management problems. This concise ebook explains how these new systems helped data science evolve quickly—from hierarchical and relational databases to big data and cloud computing to streaming and graph data. Computer scientist Paco Nathan shows members of your data science team how major companies created each of these data management systems not just to deal with new data types but also to take full advantage of the opportunities the data presented. Their efforts over the years have propelled an entire industry. This report covers the historical progression of data management topics including: Hierarchical databases—1960s mainframe batch systems are still used in finance, healthcare, manufacturing, energy, and other industries. Relational databases—these enabled faster transactions, mathematical optimization, and budgeting guarantees for many businesses. Big data—this includes relatively cheap horizontal scale-out systems for collecting huge amounts of customer data. Cloud computing—large companies began managing reliable, scalable, cost-effective data centers; Amazon turned the concept into a business. Cluster schedulers—managing horizontal clusters was difficult before schedulers such as Apache Mesos appeared. Streaming data—data continuously generated by different sources requires responses in "real time"—generally milliseconds.

Data Science and Engineering at Enterprise Scale

As enterprise-scale data science sharpens its focus on data-driven decision making and machine learning, new tools have emerged to help facilitate these processes. This practical ebook shows data scientists and enterprise developers how the notebook interface, Apache Spark, and other collaboration tools are particularly well suited to bridge the communication gap between their teams. Through a series of real-world examples, author Jerome Nilmeier demonstrates how to generate a model that enables data scientists and developers to share ideas and project code. You’ll learn how data scientists can approach real-world business problems with Spark and how developers can then implement the solution in a production environment. Dive deep into data science technologies, including Spark, TensorFlow, and the Jupyter Notebook Learn how Spark and Python notebooks enable data scientists and developers to work together Explore how the notebook environment works with Spark SQL for structured data Use notebooks and Spark as a launchpad to pursue supervised, unsupervised, and deep learning data models Learn additional Spark functionality, including graph analysis and streaming Explore the use of analytics in the production environment, particularly when creating data pipelines and deploying code
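
To illustrate the notebook-plus-Spark SQL workflow described above, here is a minimal sketch that registers a small DataFrame as a temporary view and queries it with SQL; the dataset and column names are made up for the example.

# Illustrative sketch: exploring structured data with Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("notebook-spark-sql").getOrCreate()

# A tiny in-memory DataFrame standing in for a real enterprise dataset.
orders = spark.createDataFrame(
    [("widget", "EMEA", 120.0), ("widget", "APAC", 80.0), ("gadget", "EMEA", 200.0)],
    ["product", "region", "revenue"],
)

# Register the DataFrame as a temporary view so it can be queried with plain SQL.
orders.createOrReplaceTempView("orders")

spark.sql("""
    SELECT product, SUM(revenue) AS total_revenue
    FROM orders
    GROUP BY product
    ORDER BY total_revenue DESC
""").show()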

SQL All-In-One For Dummies, 3rd Edition

The latest on SQL databases. SQL All-In-One For Dummies, 3rd Edition, is a one-stop shop for everything you need to know about SQL and SQL-based relational databases. Everyone from database administrators to application programmers and the people who manage them will find clear, concise explanations of the SQL language and its many powerful applications. With the ballooning amount of data out there, more and more businesses, large and small, are moving from spreadsheets to SQL databases like Access, Microsoft SQL Server, Oracle databases, MySQL, and PostgreSQL. This compendium of information covers designing, developing, and maintaining these databases. Cope with any issue that arises in SQL database creation and management Get current on the newest SQL updates and capabilities Reference information on querying SQL-based databases in the SQL language Understand relational databases and their importance to today’s organizations SQL All-In-One For Dummies is a timely update to the popular reference for readers who want detailed information about SQL databases and queries.

IBM Spectrum Archive Enterprise Edition V1.2.6 Installation and Configuration Guide

Note: This is a republication of IBM Spectrum Archive Enterprise Edition V1.2.6: Installation and Configuration Guide with new book number SG24-8445 to keep the content available on the Internet along with the recent publication IBM Spectrum Archive Enterprise Edition V1.3.0: Installation and Configuration Guide, SG24-8333. This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum™ Archive V1.2.6 for the IBM TS3310, IBM TS3500, IBM TS4300, and IBM TS4500 tape libraries. IBM Spectrum Archive™ EE enables the use of the Linear Tape File System (LTFS) for the policy management of tape as a storage tier in an IBM Spectrum Scale™ based environment. It helps encourage the use of tape as a critical tier in the storage environment. This is the sixth edition of IBM Spectrum Archive Installation and Configuration Guide. IBM Spectrum Archive EE can run any application that is designed for disk files on physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 8, 7, 6, and 5 tape drives in IBM TS3310, TS3500, TS4300, and TS4500 tape libraries. In addition, IBM TS1155, TS1150, and TS1140 tape drives are supported in TS3500 and TS4500 tape library configurations. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. The use of IBM Spectrum Archive EE to replace disks with physical tape in tier 2 and tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure. This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation. This book is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.