talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

615

Collection of O'Reilly books on Data Engineering.

Filtering by: Cyber Security

Sessions & talks

Showing 226–250 of 615 · Newest first

Cleaning Up the Data Lake with an Operational Data Hub

The data lake was once heralded as the answer to the flood of big data that arrived in a variety of structured and unstructured formats. However, because of the ease of integration and the lack of governance, data lakes in many companies have devolved into unusable data swamps. This short ebook shows you how to solve this problem by using an Operational Data Hub (ODH) to collect, store, index, cleanse, harmonize, and master data of all shapes and formats. Gerhard Ungerer, CTO and co-founder of Random Bit LLC, explains how the ODH supports transactional integrity so that the hub can serve as an integration point for enterprise applications. You’ll also learn how the ODH helps you leverage the investment in your data lake (or swamp), so that the data trapped there can finally be ingested, processed, and provisioned. With this ebook, you’ll learn how an ODH:
• Allows you to focus on categorizing data for easy and fast retrieval
• Provides flexible storage models, indexing support, query capabilities, security, and a governance framework
• Delivers flexible storage models; support for indexing, scripting, and automation; query capabilities; transactional integrity; and security
• Includes a governance model to help you access, ingest, harmonize, materialize, provision, and consume data

MarkLogic Cookbook

Learn how to get the most out of MarkLogic with recipes from people who understand this powerful multi-model database platform from the inside out. MarkLogic comes with a broad set of capabilities to help you quickly integrate data from silos, but it takes time to learn how to harness that power. In this three-part series, key members of the MarkLogic team, including engineers who built the database, provide targeted recipes to get you up to speed. In Part 1, you’ll learn how to solve real-world problems with XQuery, the functional language for working with hierarchical data structures such as XML. Part 2 helps you solve common search-related problems with recipes that work with MarkLogic 9 as well as with older versions. With the recipes in Part 3, you’ll explore the multiple ways MarkLogic represents data.
• XQuery: Achieve peak XQuery performance, and explore its use in maps, documents, document security, the task server, and administration
• Search-related problems: Conduct document searches, score search results, understand how data is used, and search with the Optic API
• MarkLogic and data: Work with input transformations, tokenization, template-driven extraction, and redaction

Camel in Action, Second Edition

Camel in Action, Second Edition is the most complete Camel book on the market. Written by core developers of Camel and the authors of the highly acclaimed first edition, this book distills their experience and practical insights so that you can tackle integration tasks like a pro.
About the Technology
Apache Camel is a Java framework that implements enterprise integration patterns (EIPs) and comes with over 200 adapters to third-party systems. A concise DSL lets you build integration logic into your app with just a few lines of Java or XML. By using Camel, you benefit from the testing and experience of a large and vibrant open source community.
About the Book
Camel in Action, Second Edition is the definitive guide to the Camel framework. It starts with core concepts like sending, receiving, routing, and transforming data. It then goes in depth on many topics, such as how to develop, debug, test, deal with errors, secure, scale, cluster, deploy, and monitor your Camel applications. The book also discusses how to run Camel with microservices, reactive systems, containers, and in the cloud.
What's Inside
• Coverage of all relevant EIPs
• Camel microservices with Spring Boot
• Camel on Docker and Kubernetes
• Error handling, testing, security, clustering, monitoring, and deployment
• Hundreds of examples in Java and XML
About the Reader
Readers should be familiar with Java. This book is accessible to beginners and invaluable to experts.
About the Authors
Claus Ibsen is a senior principal engineer working for Red Hat, specializing in cloud and integration. He has worked on Apache Camel for the last nine years, where he heads the project. Claus lives in Denmark. Jonathan Anstey is an engineering manager at Red Hat and a core Camel contributor. He lives in Newfoundland, Canada.
Quotes
I highly recommend this book to anyone with even a passing interest in Apache Camel. Do take Camel for a ride...and don't get the hump!
- From the Foreword by James Strachan, Creator of Apache Camel
Claus and Jon are great writers, relying on figures and diagrams where needed and presenting lots of code snippets and worked examples.
- From the Foreword by Dr. Mark Little, Technical Director of JBoss
The second edition of this all-time classic is an indispensable companion for your Apache Camel rides.
- Gregor Zurowski, Apache Camel Committer
The absolute best way to learn and use Camel - top to bottom, front to back, and all the way through. Camel is a fantastic tool - every Java coder should have a copy of this book.
- Rick Wagner, Red Hat
An excellent book and the definite reference for experienced engineers.
- Yan Guo, EventBrite

SQL Server 2017 Administration Inside Out, First Edition

Conquer SQL Server 2017 administration from the inside out. Dive into SQL Server 2017 administration and really put your SQL Server DBA expertise to work. This supremely organized reference packs hundreds of timesaving solutions, tips, and workarounds: all you need to plan, implement, manage, and secure SQL Server 2017 in any production environment, whether on-premises, cloud, or hybrid. Four SQL Server experts offer a complete tour of the DBA capabilities available in the SQL Server 2017 Database Engine, SQL Server Data Tools, SQL Server Management Studio, and PowerShell. Discover how experts tackle today’s essential tasks, and challenge yourself to new levels of mastery.
• Install, customize, and use SQL Server 2017’s key administration and development tools
• Manage memory, storage, clustering, virtualization, and other components
• Architect and implement database infrastructure, including IaaS, Azure SQL, and hybrid cloud configurations
• Provision SQL Server and Azure SQL databases
• Secure SQL Server via encryption, row-level security, and data masking
• Safeguard Azure SQL databases using platform threat protection, firewalling, and auditing
• Establish SQL Server IaaS network security groups and user-defined routes
• Administer SQL Server user security and permissions
• Efficiently design tables using keys, data types, columns, partitioning, and views
• Utilize BLOBs and external, temporal, and memory-optimized tables
• Master powerful optimization techniques involving concurrency, indexing, parallelism, and execution plans
• Plan, deploy, and perform disaster recovery in traditional, cloud, and hybrid environments
For Experienced SQL Server Administrators and Other Database Professionals
• Your role: Intermediate-to-advanced level SQL Server database administrator, architect, developer, or performance tuning expert
• Prerequisites: Basic understanding of database administration procedures

Teradata Cookbook

Are you ready to master Teradata, one of the leading relational database management systems for data warehousing? In the "Teradata Cookbook," you will find over 85 recipes covering vital tasks like querying, performance tuning, and administrative operations. With clear and practical instructions, this book will equip you with the skills necessary to optimize data storage and analytics in your organization.
What this book will help me do
• Master Teradata's advanced features for efficient data warehousing applications.
• Understand and employ Teradata SQL for effective data manipulation and analytics.
• Explore practical solutions for Teradata administration tasks, including user and security management.
• Learn performance tuning techniques to enhance the efficiency of your queries and processes.
• Acquire detailed knowledge about Teradata's architecture and its unique capabilities.
Author(s)
The authors of "Teradata Cookbook" are experienced professionals in database management and data warehousing. With a deep understanding of Teradata's architecture and use in real-world applications, they bring a wealth of knowledge to each of the book's recipes. Their focus is to provide practical, actionable insights to help you tackle challenges you may face.
Who is it for?
This book is ideal for database administrators, data analysts, and professionals working with data warehousing who want to leverage the power of Teradata. Whether you are new to this database management system or looking to enhance your expertise, this cookbook provides practical solutions and in-depth insights, making it an essential resource.

IBM z14 Technical Guide

Abstract
This IBM® Redbooks® publication describes the new member of the IBM Z family, IBM z14®. IBM z14 is the trusted enterprise platform for pervasive encryption, integrating data, transactions, and insights into the data. A data-centric infrastructure must always be available with 99.999% or better availability, have flawless data integrity, and be secured from misuse. It also must be an integrated infrastructure that can support new applications. Finally, it must have integrated capabilities that can provide new mobile capabilities with real-time analytics that are delivered by a secure cloud infrastructure. IBM z14 servers are designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows z14 servers to deliver a record level of capacity over the prior IBM Z platforms. In its maximum configuration, z14 is powered by up to 170 client-characterizable microprocessors (cores) running at 5.2 GHz. This configuration can deliver more than 146,000 million instructions per second (MIPS) and support up to 32 TB of client memory. The IBM z14 Model M05 is estimated to provide up to 35% more total system capacity than the IBM z13® Model NE1. This Redbooks publication provides information about IBM z14 and its functions, features, and associated software support. More information is offered in areas that are relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand IBM Z server functions and plan for their usage. It is not intended as an introduction to mainframes. Readers are expected to be generally familiar with existing IBM Z technology and terminology.

Mastering PostgreSQL 10

Mastering PostgreSQL 10 delves into the depths of PostgreSQL development and administration, guiding readers through advanced functionalities of the database. Covering topics such as query optimization, replication, high availability, and migration, this book equips you with the skills needed to harness the full power of PostgreSQL 10.
What this book will help me do
• Learn to optimize database queries to enhance performance in PostgreSQL 10.
• Understand advanced replication techniques and how to implement high availability.
• Gain expertise in managing security and backups, and in performing data migrations effectively.
• Explore query tuning and indexing strategies to speed up your database applications.
• Handle troubleshooting challenges by understanding problems and their solutions.
Author(s)
The authors of Mastering PostgreSQL 10 are experts in the field of databases, with years of experience in designing, developing, and managing PostgreSQL systems. They are passionate educators dedicated to helping professionals maximize their potential with PostgreSQL. Their practical and approachable style ensures that even complex topics are clearly explained.
Who is it for?
This book is ideal for PostgreSQL data architects and administrators who want to master advanced features of PostgreSQL 10. It is best suited for individuals who have prior database administration experience and a working knowledge of SQL. Readers aiming to enhance performance and implement transformations in their PostgreSQL setups will benefit immensely. Those tasked with ensuring high availability, migration, and recovery of PostgreSQL will find this book invaluable.

MySQL 8 Cookbook

With "MySQL 8 Cookbook," dive into over 150 practical recipes tailored for database professionals aiming to master MySQL 8. You will explore setup, querying, and advanced features like security and performance tuning. This book is your comprehensive guide to efficient database handling in MySQL 8.
What this book will help me do
• Efficiently set up and configure a MySQL 8 environment.
• Master advanced querying techniques using new MySQL features such as CTEs and window functions.
• Execute robust data backup and recovery strategies with MySQL 8.
• Implement performance improvements with tools and features like descending indexes and the query optimizer.
• Secure, manage, and optimize databases to support scalable, high-performance applications.
Author(s)
Karthik Appigatla is a seasoned database administrator and developer with extensive expertise in MySQL and relational database systems. With years of industry experience, he brings a practical perspective to database solutions. His passion is to empower learners by simplifying complex database concepts with a hands-on approach.
Who is it for?
This book is tailored for MySQL developers or administrators who seek ready solutions for their MySQL challenges. Whether you're upgrading to MySQL 8 or want to leverage its latest features, this cookbook is for you. It is ideal for those with basic Linux and SQL experience who aim to build advanced MySQL knowledge and skills.
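As a rough illustration of the two MySQL 8 query features named above, the sketch below runs a common table expression (CTE), a window function, and a recursive CTE. It uses Python's built-in sqlite3 module as a stand-in engine, not MySQL itself. The `WITH`, `WITH RECURSIVE`, and `SUM(...) OVER (PARTITION BY ... ORDER BY ...)` syntax shown is also accepted by MySQL 8.0, but the `sales` table, its columns, and its rows are hypothetical, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sales rows: (region, month, amount). North has two rows for month 2.
ROWS = [
    ("north", 1, 100), ("north", 2, 150), ("north", 2, 50), ("north", 3, 120),
    ("south", 1, 80), ("south", 2, 90), ("south", 3, 200),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)

# The CTE "monthly" collapses duplicate rows into one total per region and month.
# The window function then adds a running total per region while keeping every
# monthly row, which a plain GROUP BY could not do.
RUNNING_TOTAL_SQL = """
WITH monthly AS (
    SELECT region, month, SUM(amount) AS total
    FROM sales
    GROUP BY region, month
)
SELECT region, month, total,
       SUM(total) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM monthly
ORDER BY region, month
"""
running = conn.execute(RUNNING_TOTAL_SQL).fetchall()

# A recursive CTE: the anchor SELECT yields 1, and each step adds 1 while n < 5.
SEQUENCE_SQL = """
WITH RECURSIVE nums(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 5
)
SELECT n FROM nums ORDER BY n
"""
seq = [n for (n,) in conn.execute(SEQUENCE_SQL)]

conn.close()

for row in running:
    print(row)
print(seq)  # [1, 2, 3, 4, 5]
```

For the north region, the CTE merges the two month-2 rows into 200, so the running totals are 100, 300, and 420. The window function keeps one output row per month rather than collapsing the region to a single sum.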

ABCs of IBM z/OS System Programming Volume 3

Abstract
The ABCs of IBM z/OS® System Programming is a 13-volume collection that provides an introduction to the z/OS operating system and the hardware architecture. Whether you are a beginner or an experienced system programmer, the ABCs collection provides the information that you need to start your research into z/OS and related subjects. The ABCs collection serves as a powerful technical tool to help you become more familiar with z/OS in your current environment, or to help you evaluate platforms to consolidate your e-business applications. This edition is updated to z/OS Version 2 Release 3. The other volumes contain the following content:
• Volume 1: Introduction to z/OS and storage concepts, TSO/E, ISPF, JCL, SDSF, and z/OS delivery and installation
• Volume 2: z/OS implementation and daily maintenance, defining subsystems, IBM Job Entry Subsystem 2 (JES2) and JES3, link pack area (LPA), LNKLST, authorized libraries, System Modification Program Extended (SMP/E), IBM Language Environment
• Volume 4: Communication Server, TCP/IP, and IBM VTAM®
• Volume 5: Base and IBM Parallel Sysplex®, System Logger, Resource Recovery Services (RRS), global resource serialization (GRS), z/OS system operations, automatic restart manager (ARM), IBM Geographically Dispersed Parallel Sysplex™ (IBM GDPS)
• Volume 6: Introduction to security, IBM RACF®, Digital certificates and PKI, Kerberos, cryptography and z990 integrated cryptography, zSeries firewall technologies, LDAP, and Enterprise Identity Mapping (EIM)
• Volume 7: Printing in a z/OS environment, Infoprint Server, and Infoprint Central
• Volume 8: An introduction to z/OS problem diagnosis
• Volume 9: z/OS UNIX System Services
• Volume 10: Introduction to IBM z/Architecture®, the IBM Z platform, IBM Z connectivity, LPAR concepts, HCD, and DS Storage Solution
• Volume 11: Capacity planning, performance management, WLM, IBM RMF™, and SMF
• Volume 12: WLM
• Volume 13: JES3, JES3 SDSF

Liberty in IBM CICS: Deploying and Managing Java EE Applications

Abstract
This IBM® Redbooks® publication is intended for IBM CICS® system programmers and IBM Z architects. It describes how to deploy and manage Java EE 7 web-based applications in an IBM CICS Liberty JVM server and access data on IBM Db2® for IBM z/OS® and IBM MQ for z/OS subsystems. In this book, we describe the key steps to create and install a Liberty JVM server within a CICS region. We then describe how best to use the different deployment techniques for Java EE applications and the specific considerations when deploying applications that use JDBC, JMS, and the new CICS link to Liberty API. Finally, we describe how to secure web applications in CICS Liberty, including transport-level security and request authentication and authorization by using IBM RACF® and LDAP registries. Information is also provided about how to build a high-availability infrastructure and how to use the logging and monitoring functions that are available in the CICS Liberty environment. This book is based on IBM CICS Transaction Server (CICS TS) V5.4, which uses the embedded IBM WebSphere® Application Server Liberty technology. It also applies to CICS TS V5.3 with the fixes for the continuous delivery APAR PI77502 applied. Sample applications are used throughout this publication and are freely available for download from the IBM CICSDev GitHub organization, along with detailed deployment instructions.

IBM QRadar Version 7.3 Planning and Installation Guide

Abstract
With the advances of technology and the recurrence of data leaks, cyber security is a bigger challenge than ever before. Cyber attacks evolve as quickly as the technology itself, and hackers are finding more innovative ways to break security controls to access confidential data and to interrupt services. Hackers reinvent themselves, using new technology features as tools to expose companies and individuals. Therefore, cyber security cannot be reactive; it must go a step further by implementing proactive security controls that protect one of the most important assets of every organization: the company's information. This IBM® Redbooks® publication provides information about implementing IBM QRadar® for Security Information and Event Management (SIEM) and protecting an organization's networks through a sophisticated technology that permits a proactive security posture. It is divided into the following major sections to facilitate the integration of QRadar with any network architecture:
• Chapter 2, "Before the installation," reviews important requirements before the installation of the product.
• Chapter 3, "Installing IBM QRadar V7.3," provides step-by-step procedures to guide you through the installation process.
• Chapter 4, "After the installation," helps you configure additional features and perform checks after the product is installed.
QRadar is a leading IBM Security product that is designed to be integrated with corporate network devices to provide real-time monitoring of security events through a centralized console. Through this book, any network or security administrator can understand the product's features and benefits.

Security and Privacy in Cyber-Physical Systems

Written by a team of experts at the forefront of the cyber-physical systems (CPS) revolution, this book provides an in-depth look at security and privacy, two of the most critical challenges facing both the CPS research and development community and ICT professionals. It explores the key technical, social, and legal issues at stake, and it provides readers with the information they need to advance research and development in this exciting area. Cyber-physical systems (CPS) are engineered systems that are built from, and depend upon, the seamless integration of computational algorithms and physical components. Advances in CPS will enable capability, adaptability, scalability, resiliency, safety, security, and usability far in excess of what today’s simple embedded systems can provide. Just as the Internet revolutionized the way we interact with information, CPS technology has already begun to transform the way people interact with engineered systems. In the years ahead, smart CPS will drive innovation and competition across industry sectors, from agriculture, energy, and transportation to architecture, healthcare, and manufacturing. A priceless source of practical information and inspiration, Security and Privacy in Cyber-Physical Systems: Foundations, Principles and Applications is certain to have a profound impact on ongoing R&D and education at the confluence of security, privacy, and CPS.

MongoDB Administrator's Guide

The "MongoDB Administrator's Guide" is an indispensable resource for database administrators and developers looking to master the administration of MongoDB systems. This book offers over 100 practical recipes designed to simplify the tasks of maintaining, optimizing, and securing MongoDB deployments.
What this book will help me do
• Deploy and configure production-grade MongoDB environments efficiently.
• Manage and optimize MongoDB indexing to improve query performance.
• Implement and maintain high availability through replication and sharding.
• Ensure database security with robust authentication and authorization.
• Perform efficient backups, recovery, and database performance monitoring.
Author(s)
Dasadia is a seasoned MongoDB expert with extensive experience in database administration and optimization. Having worked extensively on developing and managing high-performance database systems, Dasadia takes a hands-on and practical approach to writing. Their aim is to guide readers to effectively solve real-world database challenges with MongoDB.
Who is it for?
This book is ideal for database administrators with a foundational understanding of MongoDB, as well as developers aiming to enhance their administration skills in this NoSQL ecosystem. Whether you're seeking best practices for routine tasks or scalable solutions for enterprise-level applications, this guide has comprehensive coverage tailored for you.

Learning Neo4j 3.x - Second Edition

"Learning Neo4j 3.x" provides a comprehensive introduction to the world of graph databases, focusing on practical usage of Neo4j. This book guides you through the fundamentals, from installation and modeling to advanced features including security and optimization. You'll gain the skills to harness Neo4j for effective data management and visualization.
What this book will help me do
• Understand the basics of graph databases and how to use them effectively in real-world scenarios.
• Master the Cypher query language to query and manipulate graph data powerfully and intuitively.
• Learn to implement and optimize advanced graph techniques using the APOC library.
• Develop the ability to extend Neo4j's core functionality using available plugins and advanced extensions.
• Acquire skills to design and deploy scalable, secure enterprise-grade graph database solutions.
Author(s)
Jerome Baton and Van Bruggen are experienced Neo4j specialists who share a passion for making complex technical concepts accessible. Jerome brings years of real-world experience in graph database applications, while Van Bruggen contributes expertise in data modeling and visualization. Together, they provide clear, focused insights with practical examples and hands-on guidance.
Who is it for?
This book is tailored for developers looking to extend their knowledge with graph databases to take on modern connected-data challenges. It is suitable for those new to Neo4j, including beginners with databases, and will serve as a valuable guide for professionals aiming to deepen their expertise in data storage and query optimization using Neo4j.

IBM Spectrum Virtualize Considerations for PCI-DSS Compliance

The Payment Card Industry Data Security Standard (PCI-DSS) is the global information security standard for organizations that process, store, or transmit cardholder data for any of the major credit card brands. More and more organizations are seeking compliance with this standard. This IBM® Redpaper™ describes how the features and functions of IBM Spectrum™ Virtualize help organizations move their IT infrastructure toward compliance in the relevant areas of the PCI-DSS standard. IBM Spectrum Virtualize™ is the software common to all IBM Storwize® products, such as IBM SAN Volume Controller (SVC), the IBM Storwize V5000 family, IBM Storwize V7000, IBM FlashSystem® V9000, and IBM Spectrum Virtualize as Software. Therefore, all recommendations in this paper apply equally to these storage products.

IBM TS4500 R4 Tape Library Guide

Abstract
The IBM® TS4500 (TS4500) tape library is a next-generation tape solution that offers higher storage density and integrated management than previous solutions. This IBM Redbooks® publication gives you a close-up view of the new IBM TS4500 tape library. In the TS4500, IBM delivers the density that today's and tomorrow's data growth requires. It has the cost-effectiveness and the manageability to grow with business data needs, while you preserve existing investments in IBM tape library products. Now, you can achieve both a low cost per terabyte (TB) and a high TB density per square foot, because the TS4500 can store up to 8.25 petabytes (PB) of uncompressed data in a single-frame library or scale up at 1.5 PB per square foot to over 263 PB, which is more than 4 times the capacity of the IBM TS3500 tape library. The TS4500 offers these benefits:
• High availability: Dual active accessors with integrated service bays reduce inactive service space by 40%. The Elastic Capacity option can be used to completely eliminate inactive service space.
• Flexibility to grow: The TS4500 library can grow from both the right side and the left side of the first L frame because models can be placed in any active position.
• Increased capacity: The TS4500 can grow from a single L frame up to an additional 17 expansion frames, with a capacity of over 23,000 cartridges.
• High-density (HD) generation 1 frames from the existing TS3500 library can be redeployed in a TS4500.
• Capacity on demand (CoD): CoD is supported through entry-level, intermediate, and base-capacity configurations.
• Advanced Library Management System (ALMS): ALMS supports dynamic storage management, which enables users to create and change logical libraries and configure any drive for any logical library.
• Support for the IBM TS1155 while also supporting the TS1150 and TS1140 tape drives: The TS1155 gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention, all at a lower cost than disk solutions. The TS1155 offers high-performance, flexible data storage with support for data encryption. Also, this enhanced fifth-generation drive can help protect investments in tape automation by offering compatibility with existing automation. The new TS1155 Tape Drive Model 55E delivers a 10 Gb Ethernet host attachment interface optimized for cloud-based and hyperscale environments. The TS1155 Tape Drive Model 55F delivers a native data rate of 360 MBps, the same load/ready and locate speeds and access times as the TS1150, and includes dual-port 8 Gb Fibre Channel support.
• Support of the IBM Linear Tape-Open (LTO) Ultrium 7 tape drive: The LTO Ultrium 7 offering represents significant improvements in capacity, performance, and reliability over the previous generation, LTO Ultrium 6, while still protecting your investment in the previous technology.
• Integrated TS7700 back-end Fibre Channel (FC) switches are available.
• Up to four library-managed encryption (LME) key paths per logical library are available.
This book describes the TS4500 components, feature codes, specifications, supported tape drives, encryption, new integrated management console (IMC), and command-line interface (CLI). You learn how to accomplish several specific tasks:
• Improve storage density with increased expansion frame capacity up to 2.4 times, and support 33% more tape drives per frame.
• Manage storage by using the ALMS feature.
• Improve business continuity and disaster recovery with dual active accessor, automatic control path failover, and data path failover.
• Help ensure security and regulatory compliance with tape-drive encryption and Write Once Read Many (WORM) media.
• Support IBM LTO Ultrium 7, 6, and 5, IBM TS1155, TS1150, and TS1140 tape drives.
• Provide a flexible upgrade path for users who want to expand their tape storage as their needs grow.
• Reduce the storage footprint and simplify cabling with 10 U of rack space on top of the library.
This guide is for anyone who wants to understand more about the IBM TS4500 tape library. It is particularly suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

EU General Data Protection Regulation (GDPR): An Implementation and Compliance Guide - Second edition

The updated second edition of the bestselling guide to the changes your organisation needs to make to comply with the EU GDPR.
“The clear language of the guide and the extensive explanations, help to explain the many doubts that arise reading the articles of the Regulation.”
- Giuseppe G. Zorzino
The EU General Data Protection Regulation (GDPR) will supersede the 1995 EU Data Protection Directive (DPD) and all EU member states’ national laws based on it, including the UK Data Protection Act 1998, in May 2018. All organisations, wherever they are in the world, that process the personal data of EU residents must comply with the Regulation. Failure to do so could result in fines of up to €20 million or 4% of annual global turnover, whichever is greater. This book provides a detailed commentary on the GDPR, explains the changes you need to make to your data protection and information security regimes, and tells you exactly what you need to do to avoid severe financial penalties.
Product overview
Now in its second edition, EU GDPR – An Implementation and Compliance Guide is a clear and comprehensive guide to this new data protection law, explaining the Regulation and setting out the obligations of data processors and controllers in terms you can understand. Topics covered include:
• The role of the data protection officer (DPO), including whether you need one and what they should do.
• Risk management and data protection impact assessments (DPIAs), including how, when and why to conduct a DPIA.
• Data subjects’ rights, including consent and the withdrawal of consent; subject access requests and how to handle them; and data controllers’ and processors’ obligations.
• International data transfers to “third countries”, including guidance on adequacy decisions and appropriate safeguards; the EU-US Privacy Shield; international organisations; limited transfers; and Cloud providers.
• How to adjust your data protection processes to transition to GDPR compliance, and the best way of demonstrating that compliance.
• A full index of the Regulation to help you find the articles and stipulations relevant to your organisation.
New for the second edition:
• Additional definitions.
• Further guidance on the role of the DPO.
• Greater clarification on data subjects’ rights.
• Extra guidance on data protection impact assessments.
• More detailed information on subject access requests (SARs).
• Clarification of consent and the alternative lawful bases for processing personal data.
• New appendix: implementation FAQ.
The GDPR will have a significant impact on organisational data protection regimes around the world. EU GDPR – An Implementation and Compliance Guide shows you exactly what you need to do to comply with the new law.
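The fine ceiling quoted above follows a "whichever is higher" rule. Under Article 83(5) GDPR, the maximum for an undertaking is €20 million or 4% of total worldwide annual turnover for the preceding financial year, whichever is higher. The sketch below computes only that upper bound. The function name is hypothetical, the sketch ignores the lower €10 million/2% tier of Article 83(4), and actual fines are set case by case by supervisory authorities, normally well below the ceiling.

```python
def gdpr_max_fine_eur(annual_global_turnover_eur: float) -> float:
    """Hypothetical helper: upper bound of an Article 83(5) GDPR fine for an undertaking.

    The ceiling is EUR 20 million or 4% of total worldwide annual turnover
    of the preceding financial year, whichever is higher.
    """
    if annual_global_turnover_eur < 0:
        raise ValueError("turnover cannot be negative")
    return max(20_000_000.0, 0.04 * annual_global_turnover_eur)


if __name__ == "__main__":
    # EUR 100 million turnover: 4% is EUR 4 million, so the EUR 20 million figure applies.
    print(gdpr_max_fine_eur(100_000_000))    # 20000000.0
    # EUR 2 billion turnover: 4% is EUR 80 million, which exceeds EUR 20 million.
    print(gdpr_max_fine_eur(2_000_000_000))  # 80000000.0
```

The two figures cross at €500 million turnover. Below that, the fixed €20 million amount is the binding ceiling. Above it, the percentage takes over.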

Data Warehousing with Greenplum

Relational databases haven’t gone away, but they are evolving to integrate messy, disjointed unstructured data into a cleansed repository for analytics. Through massively parallel processing (MPP), the latest generation of analytic data warehouses is helping organizations move beyond business intelligence to processing a variety of advanced analytic workloads. These MPP databases expose their power through the familiarity of SQL. This report introduces the Greenplum Database, recently released as an open source project by Pivotal Software. Lead author Marshall Presser of Pivotal Data Engineering takes you through the Greenplum approach to data analytics and data-driven decisions, beginning with Greenplum’s shared-nothing architecture. You’ll explore data organization and storage, data loading, running queries, and performing analytics in the database. You’ll learn:
• How each networked node in Greenplum’s architecture features an independent operating system, memory, and storage
• Four deployment options to help you balance security, cost, and time to usability
• Ways to organize data, including distribution, storage, partitioning, and loading
• How to use Apache MADlib for in-database analytics, and GPText to process and analyze free-form text
• Tools for monitoring, managing, securing, and optimizing query responses available in the Pivotal Greenplum commercial database

IBM z14 Technical Introduction

This IBM® Redpaper Redbooks® publication introduces the latest IBM Z platform, the IBM z14®. It describes the Z environment and how it helps integrate data and transactions more securely and can infuse insight for faster, more accurate business decisions. The z14 is a state-of-the-art data and transaction system that delivers advanced capabilities vital to the digital era and the trust economy. These capabilities include:
- Securing data with pervasive encryption
- Transforming a transactional platform into a data powerhouse
- Getting more out of the platform with IT Operational Analytics
- Providing the resilience that is key to zero downtime
- Accelerating digital transformation with agile service delivery
- Revolutionizing business processes
- Blending open source and Z technologies
This book explains how this system uses both new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and security. With the z14 as the base, applications can run in a trusted, reliable, and secure environment that both improves operations and lessens business risk.

Moving Hadoop to the Cloud

Until recently, Hadoop deployments existed on hardware owned and run by organizations. Now, of course, you can acquire the computing resources and network connectivity to run Hadoop clusters in the cloud. But there’s a lot more to deploying Hadoop to the public cloud than simply renting machines. This hands-on guide shows developers and systems administrators familiar with Hadoop how to install, use, and manage cloud-born clusters efficiently. You’ll learn how to architect clusters that work with cloud-provider features, not just to avoid pitfalls but also to take full advantage of these services. You’ll also compare the Amazon, Google, and Microsoft clouds, and learn how to set up clusters in each of them.
• Learn how Hadoop clusters run in the cloud, the problems they can help you solve, and their potential drawbacks
• Examine the common concepts of cloud providers, including compute capabilities, networking and security, and storage
• Build a functional Hadoop cluster on cloud infrastructure, and learn what the major providers require
• Explore use cases for high availability, relational data with Hive, and complex analytics with Spark
• Get patterns and practices for running cloud clusters, from designing for price and security to dealing with maintenance

Building on Multi-Model Databases

In many organizations today, businesspeople are busy requesting unified views of data stored across multiple sources. But integrating multiple data types from multiple data stores is a complex, error-prone, and time-consuming process of cobbling everything together manually. This concise book examines how multi-model databases can help you integrate data storage and access across your organization in a seamless and elegant way. Authors Pete Aven and Diane Burley from MarkLogic explain how this latest evolution in data management naturally accepts heterogeneous data, enabling you to eventually phase out technical data silos. Through several case studies, you’ll discover how organizations use multi-model databases to reduce complexity, save money, take advantage of opportunities, lessen risk, and shorten time to value.
• Get unified views across disparate data models and formats within a single database
• Learn how multi-model databases leverage the inherent structure of the data being stored
• Load and use unstructured and semi-structured data (such as documents and text) as is
• Provide agility in data access and delivery through APIs, interfaces, and indexes
• Learn how to scale a multi-model database, and provide ACID capabilities and security
• Examine how a multi-model database would fit into your existing architecture

Advanced Analytics with Spark, 2nd Edition

In the second edition of this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. Updated for Spark 2.1, this edition serves as an introduction to these techniques and to other best practices in Spark programming. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques, including classification, clustering, collaborative filtering, and anomaly detection, to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find the book’s patterns useful for working on your own data applications. With this book, you will:
• Familiarize yourself with the Spark programming model
• Become comfortable within the Spark ecosystem
• Learn general approaches in data science
• Examine complete implementations that analyze large public data sets
• Discover which machine learning tools make sense for particular problems
• Acquire code that can be adapted to many uses

Hadoop 2.x Administration Cookbook

Gain mastery over managing and maintaining large Apache Hadoop clusters with the Hadoop 2.x Administration Cookbook. This book provides practical step-by-step recipes for efficiently setting up, optimizing, and troubleshooting Hadoop clusters, ensuring high availability, security, and optimal performance in your data operations.
What this book will help me do:
• Set up and deploy an operational Hadoop 2.x cluster suitable for large-scale data operations.
• Monitor and maintain Hadoop’s HDFS, YARN, and MapReduce systems for optimized performance.
• Plan, configure, and enhance cluster availability using ZooKeeper and JournalNode strategies.
• Develop workflows and manage data ingestion processes with tools such as Flume and Oozie.
• Secure, troubleshoot, and optimize Hadoop environments to meet enterprise and operational standards.
Author(s): Aman Singh is a Hadoop administrator with years of hands-on experience managing robust and efficient Hadoop clusters. Aman has a deep understanding of the practical challenges in this field and a talent for breaking down complex topics into actionable steps. Through clear, problem-oriented language, Aman helps readers achieve fluency in Hadoop administration.
Who is it for? This book is ideal for system administrators and IT professionals who have a foundational understanding of Hadoop and want to strengthen their administrative skills. It is especially useful for experienced Hadoop administrators looking for a quick, practical reference for cluster management. Whether you’re working in a large enterprise or exploring Hadoop ecosystems for personal development, you’ll find this book invaluable.

PostgreSQL Administration Cookbook, 9.5/9.6 Edition - Third Edition

Dive into the world of PostgreSQL database management with this hands-on guide. This book takes you through essential administration tasks and advanced features of PostgreSQL 9.5 and 9.6, equipping you with the tools to efficiently manage and optimize your databases.
What this book will help me do:
• Set up and configure PostgreSQL servers for optimal performance and reliability.
• Implement robust backup and disaster recovery strategies tailored to your needs.
• Master replication techniques, including high availability and logical replication.
• Analyze and troubleshoot performance issues with advanced diagnostic tools.
• Secure and protect your databases using best practices and security features.
Author(s): Simon Riggs, Gianni Ciolli, and Gabriele Bartolini are leading figures in the PostgreSQL community. With extensive experience in database architecture and system administration, they have guided numerous professionals in mastering PostgreSQL. Their practical insights and clear instructions make this book an invaluable resource.
Who is it for? This book is ideal for system administrators, database administrators, and developers who are responsible for database management. Whether you want to deepen your expertise in PostgreSQL or are already working with databases and seeking advanced knowledge, this guide caters to intermediate and advanced skill levels.

Oracle Database 12c Release 2 New Features

Leverage the New and Improved Features of Oracle Database 12c
Written by Oracle experts Bob Bryla and Robert G. Freeman, this Oracle Press guide describes the myriad new and enhanced capabilities available in the latest Oracle Database release. Inside, you’ll find everything you need to know to get up and running quickly on Oracle Database 12c Release 2. Supported by contributions from Oracle expert Eric Yen, Oracle Database 12c Release 2 New Features offers detailed coverage of:
• Installing Oracle Database 12c and Grid Infrastructure
• Architectural changes, such as Oracle Multitenant
• The most current information on upgrading and migrating to Oracle Database 12c
• The pre-upgrade information tool and parallel processing for database upgrades
• Oracle Real Application Clusters new features, such as Oracle Flex Cluster, Oracle Flex Automatic Storage Management, and Oracle Automatic Storage Management Cluster File System
• Enhanced and new online operations: tables, indexes, and PDBs
• Oracle RMAN enhancements, including cross-platform backup and recovery
• Oracle Data Guard improvements, such as Fast Sync, and Oracle Active Data Guard new features, such as Far Sync
• SQL, PL/SQL, DML, and DDL new features
• Improvements to partitioning manageability, performance, and availability
• Advanced business intelligence and data warehousing capabilities
• Security enhancements, including privilege analysis, data redaction, and new administrative-level privileges
• Manageability, performance, and optimization improvements