talk-data.com

Topic

data-engineering

3377

tagged

Activities

Filtering by: O'Reilly Data Engineering Books
The Definitive Guide to MongoDB: A complete guide to dealing with Big Data using MongoDB, Third Edition

The Definitive Guide to MongoDB, Third Edition, is updated for MongoDB 3 and includes all of the latest MongoDB features, including the aggregation framework introduced in version 2.2 and hashed indexes in version 2.4. The Third Edition also now includes Node.js along with Python. MongoDB is the most popular of the "Big Data" NoSQL database technologies, and it's still growing. David Hows from 10gen, along with experienced MongoDB authors Peter Membrey and Eelco Plugge, provide their expertise and experience in teaching you everything you need to know to become a MongoDB pro.

IBM z/OS V2R2: Security

This IBM® Redbooks® publication helps you to become familiar with the technical changes that were introduced to the security areas with IBM z/OS® V2R2. The following chapters are included:
- Chapter 1, “RACF updates”: In this chapter, we describe the read-only auditor attribute, password security enhancements, RACDCERT (granular certificate administration), UNIX search authority, and RACF Remote sharing facility (RRSF).
- Chapter 2, “LDAP updates”: In this chapter, we describe the activity log enhancements, compatibility level upgrade without LDAP outage, dynamic group performance enhancements, and replication of password policy attributes from a read-only replica.
- Chapter 3, “PKI updates”: In this chapter, we describe the Network Authentication Service (KERBEROS) PKINIT, PKI nxm authorization, PKI OCSP enhancement, and RACDCERT (granular certificate administration).
- Chapter 4, “z/OS UNIX search and file execution authority”: z/OS UNIX search authority, z/OS UNIX file execution, and examples for exploiting the new functions.
This book is one of a series of IBM Redbooks that take a modular approach to providing information about the updates that are included with z/OS V2R2. This approach has the following goals:
- Provide modular content
- Group the technical changes into a topic
- Provide a more streamlined way of finding relevant information that is based on the topic
We hope you find this approach useful and we welcome your feedback.

IBM z/OS V2R2: Storage Management and Utilities

This IBM® Redbooks® publication helps you to become familiar with the technical changes that were introduced into the Storage Management and Utilities areas with IBM z/OS V2R2. This book is one of a series of IBM Redbooks that take a modular approach to providing information about the updates that are included with z/OS V2R2. This approach has the following goals:
- Provide modular content
- Group the technical changes into a topic
- Provide a more streamlined way of finding relevant information that is based on the topic
We hope you find this approach useful and we welcome your feedback.

Apache Oozie Essentials

Apache Oozie Essentials serves as your guide to mastering Apache Oozie, a powerful workflow scheduler for Hadoop environments. Through lucid explanations and practical examples, you will learn how to create, schedule, and enhance workflows for data ingestion, processing, and machine learning tasks using Oozie.
What this Book will help me do
- Install and configure Apache Oozie in your Hadoop environment to start managing workflows.
- Develop seamless workflows that integrate tools like Hive, Pig, and Sqoop to automate data operations.
- Set up coordinators to handle timed and dependent job executions efficiently.
- Deploy Spark jobs within your workflows for machine learning on large datasets.
- Harness Oozie security features to improve your system's reliability and trustworthiness.
Author(s)
Authored by Singh, a seasoned developer with a deep understanding of big data processing and Apache Oozie. Drawing on the author's practical experience, the book intersperses technical detail with real-world examples for an effective learning experience. The author's goal is to make Oozie accessible and useful to professionals.
Who is it for?
This book is ideal for data engineers and Hadoop professionals looking to streamline their workflow management using Apache Oozie. Whether you're a novice to Oozie or aiming to implement complex data and ML pipelines, the book offers comprehensive guidance tailored to your needs.

Accelerating Data Transformation with IBM DB2 Analytics Accelerator for z/OS: Understanding and Using Accelerator-only Tables

Transforming data from operational data models to purpose-oriented data structures has been commonplace for the last decades. Data transformations are heavily used in all types of industries to provide information to various users at different levels. Depending on individual needs, the transformed data is stored in various different systems. Sending operational data to other systems for further processing is then required, and introduces much complexity to an existing information technology (IT) infrastructure. Although maintenance of additional hardware and software is one component, potential inconsistencies and individually managed refresh cycles are others. For decades, there was no simple and efficient way to perform data transformations on the source system of operational data. With IBM® DB2® Analytics Accelerator, DB2 for z/OS is now in a unique position to complete these transformations in an efficient and well-performing way. DB2 for z/OS completes these while connecting to the same platform as for operational transactions, helping you to minimize your efforts to manage existing IT infrastructure. Real-time analytics on incoming operational transactions is another demand. Creating a comprehensive scoring model to detect specific patterns inside your data can easily require multiple iterations and multiple hours to complete. By enabling a first set of analytical functionality in DB2 Analytics Accelerator, those dedicated mining algorithms can now be run on an accelerator to efficiently perform these modeling tasks. Given the speed of query processing on an accelerator, these modeling tasks can now be performed much quicker compared to traditional relational database management systems. This speed enables you to keep your scoring algorithms more up-to-date, and ultimately adapt more quickly to constantly changing customer behaviors. 
This IBM Redbooks® publication describes the new table type that is introduced with DB2 Analytics Accelerator V4.1 PTF5 that enables more efficient data transformations. These tables are called accelerator-only tables, and can exist on an accelerator only. The tables benefit from the accelerator performance characteristics, while maintaining access through existing DB2 for z/OS application programming interfaces (APIs). Additionally, we describe the newly introduced analytical capabilities with DB2 Analytics Accelerator V5.1, putting you in the position to efficiently perform data modeling for online analytical requirements in your DB2 for z/OS environment. This book is intended for technical decision-makers who want to get a broad understanding about the analytical capabilities and accelerator-only tables of DB2 Analytics Accelerator. In addition, you learn about how these capabilities can be used to accelerate in-database transformations and in-database analytics in various environments and scenarios, including the following scenarios:
- Multi-step processing and reporting in IBM DB2 Query Management Facility™, IBM Campaign, or MicroStrategy environments
- In-database transformations using IBM InfoSphere® DataStage®
- Ad hoc data analysis for data scientists
- In-database analytics using IBM SPSS® Modeler

SQL Server AlwaysOn Revealed

Get a fast start to using AlwaysOn, the SQL Server solution to high-availability and disaster recovery. Read this short, 150-page book that is adapted from Peter Carter’s Pro SQL Server Administration to gain a solid and accurate understanding of how to implement systems requiring consistent and continuous uptime. Begin with an introduction to high-availability and disaster recovery concepts such as Recovery Point Objectives (RPOs), Recovery Time Objectives (RTOs), availability levels, and the cost of downtime. Then move into detailed coverage of implementing and configuring the AlwaysOn feature set in order to meet the business objectives set by your organization. The book offers real-world advice on how to build and configure the most appropriate topology to meet the high-availability and disaster recovery requirements you are faced with. Content includes strong coverage on implementing clusters, on building AlwaysOn failover clustered instances, and on configuring AlwaysOn Availability Groups. This is a practical and hands-on book to get you started quickly in using one of the most talked-about SQL Server feature sets. SQL Server AlwaysOn Revealed:
- Teaches you to build HA and DR solutions using the AlwaysOn feature set
- Provides real-world advice on configuration and performance considerations
- Demonstrates administrative techniques for the AlwaysOn feature set
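The availability levels and cost of downtime mentioned in this description come down to simple arithmetic. The sketch below is not taken from the book; the function names and the cost-per-minute figure are assumptions chosen only for illustration. It converts an availability percentage into the downtime it permits per year.

```python
# Illustrative sketch only: helper names are hypothetical, not from the book.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Maximum downtime per year, in minutes, at a given availability level."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def downtime_cost(availability_pct: float, cost_per_minute: float) -> float:
    """Worst-case yearly cost of downtime, given an assumed cost per minute."""
    return allowed_downtime_minutes(availability_pct) * cost_per_minute


if __name__ == "__main__":
    # Compare common availability levels ("two nines" through "five nines").
    for level in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(level)
        print(f"{level}% availability allows {minutes:.1f} minutes of downtime per year")
```

Each extra "nine" cuts the permitted downtime by a factor of ten, which is why the target availability level tends to drive the choice of topology.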

IBM CICS Interdependency Analyzer

The IBM® CICS® Interdependency Analyzer (CICS IA®) is a runtime tool for use with IBM CICS Transaction Server for z/OS®. CICS IA allows both system programmers and application developers to get an understanding of the relationships and dependencies of your CICS applications and the environment on which they run. By analyzing data collected by CICS IA, you can make changes to your environment in a safe and controlled but timely manner to address changing demands on your business applications. In this IBM Redbooks® publication, we first provide a detailed overview of what CICS IA is and what business issues it addresses before we review how to configure CICS IA to collect the data that you require with the minimum performance impact. We then show how you can analyze this data to assist with day-to-day application changes and major projects such as application onboarding.

IBM Financial Transaction Manager for Automated Clearing House Services

Automated Clearing House (ACH) payment volume is increasing every year. NACHA estimates that ACH payments crossed 21 billion several years ago. Financial institutions are re-evaluating their current payment platforms. Financial Transaction Manager is a single interface that can handle ACH needs that cross various platforms. IBM® Financial Transaction Manager for ACH Services provides pre-built support for processing all ACH transactions that flow through financial systems. This includes ingestion, validation, transaction management, and distribution. The robust rules-based environment handles payment routing and exception management, and an automated import and export facility handles ACH processing rules. Further functions include administration, process management, data warehousing, and reporting and extracts. This IBM Redbooks® publication is written for the business analyst (banker) and the computer administrators responsible for configuration of the system. A business analyst can use this book to see which processes within Financial Transaction Manager are associated with their banking terms. A bridge is built from banking terms to configuration terms. A system administrator can look into this publication to see exactly how to configure Financial Transaction Manager for ACH to the needs of their financial institution. By creating reference points for both the business analyst and the system administrator, communication and understanding are enhanced as both teams understand each other's terminology and how to use Financial Transaction Manager for ACH.

IBM Wave for z/VM Installation, Implementation, and Exploitation

IBM® Wave for z/VM® (IBM Wave) is a virtualization management solution for IBM z/VM and Linux on z Systems™. This virtualization management software provides a simplified and cost-effective way for companies to harness the consolidation capabilities of the IBM z™ Systems platform and its ability to host the workloads of tens of thousands of commodity servers. IBM Wave is a complete management solution for z Systems based virtual server farms. This IBM Redbooks® publication provides a guide to understanding IBM Wave by providing information about the IBM Wave architecture and how it fits into the cloud. This publication also provides a planning and design guide that is based on common scenarios. This publication also provides installation and configuration task information and how to manage and operate the environment. The intended audience for this publication is IT Architects who are responsible for planning their IBM Wave environments and IT Specialists who are responsible for implementing them.

SAP Data Services 4.x Cookbook

Dive into "SAP Data Services 4.x Cookbook" to master the SAP Data Services platform and learn how to efficiently prepare, implement, and optimize ETL processes. This comprehensive guide makes it easy for you to understand both fundamental and advanced techniques of this powerful tool.
What this Book will help me do
- Develop a thorough understanding of SAP Data Services concepts and architecture.
- Effectively set up and configure an ETL environment using SAP Data Services.
- Master advanced ETL design techniques to process and manipulate data effectively.
- Gain expertise in data cleansing, validation, and applying data quality methods.
- Build real-time ETL workflows and integrate various data systems seamlessly.
Author(s)
Shomnikov is an experienced IT professional specializing in SAP Data Services and ETL processes. With years of practical experience, they bring a wealth of knowledge to help readers grasp concepts quickly and apply them effectively. The author enjoys sharing practical solutions to complex problems in a clear and approachable manner.
Who is it for?
This book is ideal for IT professionals and engineers who are seeking to deepen their understanding of SAP Data Services. Readers should have a basic background in programming concepts and SQL to fully benefit from this book. It is particularly suited for professionals involved in ETL development and data quality management. By the end of the book, you will have a strong grasp of building reliable ETL workflows and managing data services efficiently.

Systems of Insight for Digital Transformation: Using IBM Operational Decision Manager Advanced and Predictive Analytics

Systems of record (SORs) are engines that generate value for your business. Systems of engagement (SOEs) are always evolving and generating new customer-centric experiences and new opportunities to capitalize on the value in the systems of record. The highest value is gained when systems of record and systems of engagement are brought together to deliver insight. Systems of insight (SOI) monitor and analyze what is going on with various behaviors in the systems of engagement and information being stored or transacted in the systems of record. SOIs seek new opportunities, risks, and operational behavior that needs to be reported or have action taken to optimize business outcomes. Systems of insight are at the core of the Digital Experience, which tries to derive insights from the enormous amount of data generated by automated processes and customer interactions. Systems of Insight can also provide the ability to apply analytics and rules to real-time data as it flows within, throughout, and beyond the enterprise (applications, databases, mobile, social, Internet of Things) to gain the wanted insight. Deriving this insight is a key step toward being able to make the best decisions and take the most appropriate actions. Examples of such actions are to improve the number of satisfied clients, identify clients at risk of leaving and incentivize them to stay loyal, identify patterns of risk or fraudulent behavior and take action to minimize it as early as possible, and detect patterns of behavior in operational systems and transportation that lead to failures, delays, and maintenance and take early action to minimize risks and costs. IBM® Operational Decision Manager is a decision management platform that provides capabilities that support both event-driven insight patterns, and business-rule-driven scenarios. It also can easily be used in combination with other IBM Analytics solutions, as the detailed examples will show.
IBM Operational Decision Manager Advanced, along with complementary IBM software offerings that also provide capability for systems of insight, provides a way to deliver the greatest value to your customers and your business. IBM Operational Decision Manager Advanced brings together data from different sources to recognize meaningful trends and patterns. It empowers business users to define, manage, and automate repeatable operational decisions. As a result, organizations can create and shape customer-centric business moments. This IBM Redbooks® publication explains the key concepts of systems of insight and how to implement a system of insight solution with examples. It is intended for IT architects and professionals who are responsible for implementing a systems of insight solution requiring event-based context pattern detection and deterministic decision services to enhance other analytics solution components with IBM Operational Decision Manager Advanced.

Getting Started with KVM for IBM z Systems

This IBM® Redbooks® publication gives a broad explanation of the kernel-based virtual machine (KVM) for IBM z™ Systems and how it uses the architecture of IBM z Systems™. It focuses on the planning and design of the environment and provides installation and configuration definitions that are necessary to build and manage KVM for IBM z Systems. It also helps you plan, install, and configure IBM Cloud Manager with OpenStack for use with KVM for IBM z Systems in a cloud environment. This book is useful to IT architects and system administrators who plan for and install KVM for IBM z Systems. The reader is expected to have a good understanding of IBM z Systems hardware, KVM, Linux on z Systems, and cloud concepts.

Learning PostgreSQL

Unlock the potential of PostgreSQL, a powerful open-source relational database system, with 'Learning PostgreSQL.' This book takes you through essential concepts of relational databases, SQL syntax, and the advanced features of PostgreSQL, equipping you to build and manage efficient database solutions.
What this Book will help me do
- Learn the foundational concepts behind relational databases and relational algebra.
- Set up and configure a PostgreSQL server and client for development use.
- Develop SQL queries for robust data manipulation and retrieval.
- Implement advanced features of PostgreSQL, including procedural programming with PL/pgSQL.
- Integrate PostgreSQL with Java applications using JDBC and Hibernate frameworks.
Author(s)
The authors of 'Learning PostgreSQL,' Juba, Achim Vannahme, and Volkov, bring extensive experience and expertise in software development and database management. They have a deep understanding of PostgreSQL as well as its integration with applications. Their collective approach emphasizes practical techniques, real-world scenarios, and enriched learning, making this book a valuable resource for learners of all levels.
Who is it for?
This book is perfect for students, database developers, and administrators seeking to learn PostgreSQL. It suits beginners with no prior knowledge and helps intermediates deepen their expertise. Readers will learn how to develop, maintain, and optimize PostgreSQL databases, making it ideal for those aiming to advance their database development skills.

Python Geospatial Analysis Cookbook

Explore the fascinating world of geospatial analysis with "Python Geospatial Analysis Cookbook". This guide offers practical, recipe-based solutions for common spatial analysis tasks using Python, helping you tackle real-world spatial challenges effectively. From data preparation to topology checks and network analysis, the book ensures you're equipped to create powerful geospatial applications.
What this Book will help me do
- Understand the projection and coordinate system details of geospatial data to ensure accurate analysis.
- Transform and manipulate spatial data formats for diverse analysis requirements and projects.
- Leverage the capabilities of PostGIS within Python for advanced geospatial operations.
- Apply vector and raster data analysis techniques to solve practical spatial problems.
- Develop a functional geospatial web application using GeoDjango to demonstrate analysis outputs.
Author(s)
Diener is an accomplished professional in the field of geospatial analysis utilizing Python. With years of experience in coding and implementing geospatial systems, Diener bridges the gap between theoretical techniques and practical applications. Their writing is aimed at beginners and professionals alike, delivering clear and precise guidance for building geospatial solutions.
Who is it for?
This book is perfect for GIS analysts, programmers, data scientists, and researchers with a baseline understanding of geospatial concepts who are looking to enhance their skills. Beginners eager to explore Python's utility in geospatial analysis will also benefit. Whether you're solving intricate spatial problems or building web-based GIS applications, this guide has you covered.

Data Munging with Hadoop

The Example-Rich, Hands-On Guide to Data Munging with Apache Hadoop™. Data scientists spend much of their time “munging” data: handling day-to-day tasks such as data cleansing, normalization, aggregation, sampling, and transformation. These tasks are both critical and surprisingly interesting. Most important, they deepen your understanding of your data’s structure and limitations: crucial insight for improving accuracy and mitigating risk in any analytical project. Now, two leading Hortonworks data scientists, Ofer Mendelevitch and Casey Stella, bring together powerful, practical insights for effective Hadoop-based data munging of large datasets. Drawing on extensive experience with advanced analytics, the authors offer realistic examples that address the common issues you’re most likely to face. They describe each task in detail, presenting example code based on widely used tools such as Pig, Hive, and Spark. This concise, hands-on eBook is valuable for every data scientist, data engineer, and architect who wants to master data munging: not just in theory, but in practice with the field’s #1 platform–Hadoop. Coverage includes:
- A framework for understanding the various types of data quality checks, including cell-based rules, distribution validation, and outlier analysis
- Assessing tradeoffs in common approaches to imputing missing values
- Implementing quality checks with Pig or Hive UDFs
- Transforming raw data into “feature matrix” format for machine learning algorithms
- Choosing features and instances
- Implementing text features via “bag-of-words” and NLP techniques
- Handling time-series data via frequency- or time-domain methods
- Manipulating feature values to prepare for modeling
Data Munging with Hadoop is part of a larger, forthcoming work entitled Data Science Using Hadoop.
To be notified when the larger work is available, register your purchase of Data Munging with Hadoop at informit.com/register and check the box “I would like to hear from InformIT and its family of brands about products and special offers.”
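Several of the munging tasks listed above (cell-based rules, outlier analysis, imputing missing values, and bag-of-words text features) can be sketched in a few lines. The example below is not from the book, which presents its code with Pig, Hive, and Spark. It is a minimal illustration in plain Python on small in-memory lists, and every function name and threshold in it is an assumption made for this sketch.

```python
import re
from collections import Counter
from statistics import median, quantiles


def violates_range_rule(value, low, high):
    """Cell-based rule: flag a single value that falls outside [low, high]."""
    return not (low <= value <= high)


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Outlier analysis: return values more than k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def bag_of_words(text):
    """Text feature: count lowercase word tokens, ignoring order."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


if __name__ == "__main__":
    ages = [23, 25, None, 24, 26, 120]  # made-up sample with a gap and a suspect value
    filled = impute_median(ages)
    print(filled)
    print(iqr_outliers(filled))
    print(bag_of_words("Data munging: data cleansing and sampling"))
```

Median imputation is only one of the approaches whose tradeoffs the description mentions. It resists extreme values better than the mean, but it shrinks the variance of the imputed column.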

Essential SQLAlchemy, 2nd Edition

Dive into SQLAlchemy, the popular, open-source code library that helps Python programmers work with relational databases such as Oracle, MySQL, PostgreSQL, and SQLite. Using real-world examples, this practical guide shows you how to build a simple database application with SQLAlchemy, and how to connect to multiple databases simultaneously with the same metadata. SQL is a powerful language for querying and manipulating data, but it’s tough to integrate it with your application. SQLAlchemy helps you map Python objects to database tables without substantially changing your existing Python code. If you’re an intermediate Python developer with knowledge of basic SQL syntax and relational theory, this book serves as both a learning tool and a handy reference. Essential SQLAlchemy includes several sections:
- SQLAlchemy Core: Provide database services to your applications in a Pythonic way with the SQL Expression Language
- SQLAlchemy ORM: Use the object relational mapper to bind database schema and operations to data objects in your application
- Alembic: Use this lightweight database migration tool to handle changes to the database as your application evolves
- Cookbook: Learn how to use SQLAlchemy with web frameworks like Flask and libraries like SQLAcodegen

Modeling Service Systems

This book invites the reader on a journey of discovery of service systems. From a Service-Dominant-Logic perspective, such systems are the building blocks of all economic activity, and innovation of new service systems holds the promise of a new industrial revolution. Users navigating web sites, customers interacting with intelligent mobile retail applications, patients interpreting advice from health-care professionals and other sources, students interacting with teachers and learning materials, city dwellers invoking smart service applications for transportation routing, and the unlimited variations of smart service systems that will be enabled by the Internet of Things and other technologies provide ample evidence of the need for service innovation. This book presents an overview of the foundational constructs of service science and models of co-creative systems, with the aim of enabling the reader to be a service innovator. The value proposition of this book is the opportunity to fill each reader's knowledge gaps and offer a comprehensive, coherent, and introductory overview of service system modeling.

Oracle SOA Suite 12c Administrator's Guide

Dive into the world of Oracle SOA Suite 12c administration with this comprehensive guide. You'll learn all the core administrative tasks, including deployments, monitoring, and performance tuning, along with setting up clusters for high availability. This book offers practical step-by-step guidance to help you effectively manage your SOA environment, ensuring optimal performance and reliability.
What this Book will help me do
- Effectively deploy and promote SOA composite applications in Oracle SOA Suite 12c.
- Monitor services and troubleshoot issues to maintain operational stability.
- Configure and administer key components like the dehydration store and Oracle Enterprise Scheduler.
- Set up high availability clusters and deploy robust disaster recovery solutions.
- Optimize system performance through advanced tuning techniques and best practices.
Author(s)
The authors, Pareek, Dost, and Ahmed Aboulnaga, bring a wealth of experience in Oracle SOA administration. Their combined expertise spans years of industry practice in designing, managing, and optimizing Oracle solutions. Their approach to writing is based on practical guidance and real-world applications, making complex topics accessible and actionable.
Who is it for?
This book is perfect for administrators of Oracle SOA Suite 12c, ranging from novices who need a comprehensive introduction to experienced professionals looking for advanced tips. It serves those managing middleware infrastructures, especially if your goal is to improve reliability, performance, and scalability of SOA services. If you're seeking to enhance your administration skills with Oracle SOA Suite, this book is for you.

Pro Couchbase Server, Second Edition

This new edition is a hands-on guide for developers and administrators who want to use the power and flexibility of Couchbase Server 4.0 in their applications. The second edition extends coverage of N1QL, the SQL-like query language for Couchbase. It also brings coverage of multiple new features, including the new generation of client SDKs, security and LDAP integration, secondary indexes, and multi-dimensional scaling. Pro Couchbase Server covers everything you need to develop Couchbase solutions and deploy them in production. The NoSQL movement has fundamentally changed the database world in recent years. Influenced by the growing needs of web-scale applications, NoSQL databases such as Couchbase Server provide new approaches to scalability, reliability, and performance. Never have document databases been so powerful and performant. With the power and flexibility of Couchbase Server, you can model your data however you want, and easily change the data model any time you want. Pro Couchbase Server shows what is possible and helps you take full advantage of Couchbase Server and all the performance and scalability that it offers.
• Helps you design and develop a document database using Couchbase Server.
• Covers the latest features such as the N1QL query language.
• Gives you the tools to scale out your application as needed.

Data Lake Development with Big Data

In "Data Lake Development with Big Data," you will explore the fundamental principles and techniques for constructing and managing a Data Lake tailored for your organization's big data challenges. This book provides practical advice and architectural strategies for ingesting, managing, and analyzing large-scale data efficiently and effectively.
What this Book will help me do
- Learn how to architect a Data Lake from scratch tailored to your organizational needs.
- Master techniques for ingesting data using real-time and batch processing frameworks efficiently.
- Understand data governance, quality, and security considerations essential for scalable Data Lakes.
- Discover strategies for enabling users to explore data within the Data Lake effectively.
- Gain insights into integrating Data Lakes with Big Data analytic applications for high performance.
Author(s)
Pasupuleti and Beulah Salome Purra bring their extensive expertise in big data and enterprise data management to this book. With years of hands-on experience designing and managing large-scale data architectures, their insights are rooted in practical knowledge and proven techniques.
Who is it for?
This book is ideal for data architects and senior managers tasked with adapting or creating scalable data solutions in enterprise contexts. Readers should have foundational knowledge of master data management and be familiar with Big Data technologies to derive maximum value from the content presented.