talk-data.com

Topic: data-engineering (3377 tagged)


Activities

Filtering by: O'Reilly Data Engineering Books
Mastering Apache Storm

Mastering Apache Storm is your step-by-step guide to real-time data streaming with this robust framework. You'll learn how to process big data efficiently and integrate Apache Storm with popular technologies like Kafka, HBase, and Redis to maximize its potential. The book walks you from basic concepts through advanced implementations of Apache Storm in real-world scenarios.

What this book will help me do:
- Understand the core features and operation of Apache Storm for real-time data streaming.
- Integrate Apache Storm with other big data frameworks like Kafka, HBase, Redis, and Hadoop.
- Deploy and manage multi-node Apache Storm clusters in real-world environments.
- Monitor and analyze your data streams and system health using built-in and external tools.
- Implement fault-tolerant, scalable, distributed stream processing applications in Apache Storm.

Author(s): Jain is an experienced software developer and technical instructor specializing in distributed systems and real-time data processing. With years of experience working with Apache Storm and related technologies, the author focuses on practical, hands-on learning that equips readers with actionable skills.

Who is it for? This book is ideal for Java developers who want to build expertise in real-time data streaming and distributed processing with Apache Storm. Beginners can start with the fundamentals, while readers with prior knowledge can move on to intermediate and advanced implementations.

Essentials of Cloud Application Development on IBM Bluemix

Abstract

This IBM® Redbooks® publication is based on the Presentations Guide of the course Essentials of Cloud Application Development on IBM Bluemix, which was developed by the IBM Redbooks team in partnership with the IBM Skills Academy Program. The course is designed to teach university students the basic skills that are required to develop, deploy, and test cloud-based applications that use IBM Bluemix® cloud services.

The primary target audience for this course is university students in undergraduate computer science and computer engineering programs with no previous experience working in cloud environments. However, anyone new to cloud computing can also benefit from this course.

After completing this course, you should be able to accomplish the following tasks:
- Define cloud computing
- Describe the factors that lead to the adoption of cloud computing
- Describe the choices that developers have when creating cloud applications
- Describe infrastructure as a service, platform as a service, and software as a service
- Describe IBM Bluemix and its architecture
- Identify the runtimes and services that IBM Bluemix offers
- Describe IBM Bluemix infrastructure types
- Create an application in IBM Bluemix
- Describe the IBM Bluemix dashboard, catalog, and documentation features
- Explain how the application route is used to test an application from the browser
- Create services in IBM Bluemix
- Describe how to bind services to an application in IBM Bluemix
- Describe the environment variables that are used with IBM Bluemix services
- Explain what IBM Bluemix organizations, domains, spaces, and users are
- Describe how to create an IBM SDK for Node.js application that runs on IBM Bluemix
- Explain how to manage your IBM Bluemix account with the Cloud Foundry CLI
- Describe how to set up and use the IBM Bluemix plug-in for Eclipse
- Describe the role of Node.js for server-side scripting
- Describe IBM Bluemix DevOps Services and its capabilities
- Identify the Web IDE features in IBM Bluemix DevOps
- Describe how to connect a Git repository client to a Bluemix DevOps Services project
- Explain the pipeline build and deploy processes that IBM Bluemix DevOps Services uses
- Describe how IBM Bluemix DevOps Services integrates with the IBM Bluemix cloud
- Describe the agile planning tools in IBM Bluemix
- Describe the characteristics of REST APIs
- Explain the advantages of the JSON data format
- Describe an example of REST APIs using Watson
- Describe the main types of data services in IBM Bluemix
- Describe the benefits of IBM Cloudant®
- Explain how Cloudant databases and documents are accessed from IBM Bluemix
- Describe how to use REST APIs to interact with a Cloudant database
- Describe Bluemix mobile backend as a service (MBaaS) and the MBaaS architecture
- Describe the Push Notifications service
- Describe the App ID service
- Describe the Kinetise service
- Describe how to create Bluemix Mobile applications by using the MobileFirst Services Starter Boilerplate

The workshop materials were created in June 2017. Therefore, all IBM Bluemix features that are described in this Presentations Guide and the IBM Bluemix user interfaces that are used in the examples are current as of June 2017.
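The course objectives include binding services to an application and the environment variables used with IBM Bluemix services. On Cloud Foundry, the platform Bluemix is built on, the details of bound services reach the application as JSON in the `VCAP_SERVICES` environment variable, keyed by service label. The Python sketch below reads credentials from that variable; the `service_credentials` helper, the `cloudantNoSQLDB` label, and the `url` credential field are illustrative assumptions, not taken from the course material.

```python
import json
import os


def service_credentials(label, env=None):
    """Return the credentials of the first bound instance of a service.

    `label` is the top-level key in VCAP_SERVICES (assumed here; check the
    actual label of your bound service). `env` defaults to os.environ;
    passing a plain dict makes the helper easy to test locally.
    """
    env = os.environ if env is None else env
    services = json.loads(env.get("VCAP_SERVICES", "{}"))
    instances = services.get(label, [])
    if not instances:
        return None  # service not bound, or not running on Cloud Foundry
    return instances[0].get("credentials", {})


if __name__ == "__main__":
    # Hypothetical VCAP_SERVICES payload simulating one bound Cloudant instance.
    fake_env = {
        "VCAP_SERVICES": json.dumps({
            "cloudantNoSQLDB": [
                {"name": "my-cloudant", "credentials": {"url": "https://example.invalid"}}
            ]
        })
    }
    print(service_credentials("cloudantNoSQLDB", fake_env))
```

Passing the environment as a parameter is a design choice for testability; in a deployed app you would call `service_credentials("<your label>")` and let it fall back to `os.environ`.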

IBM Spectrum Archive Enterprise Edition V1.2.4: Installation and Configuration Guide

Abstract

This IBM® Redbooks® publication helps you with the planning, installation, and configuration of the new IBM Spectrum™ Archive (formerly IBM Linear Tape File System™ (LTFS)) Enterprise Edition (EE) V1.2.4.0 for the IBM TS3310, IBM TS3500, and IBM TS4500 tape libraries. IBM Spectrum Archive™ EE enables the use of LTFS for the policy management of tape as a storage tier in an IBM Spectrum Scale™ based environment and encourages the use of tape as a critical tier in the storage environment.

This is the fourth edition of IBM Spectrum Archive V1.2 (SG24-8333). It builds on the prior editions of IBM Linear Tape File System Enterprise Edition V1.1.1.2: Installation and Configuration Guide, SG24-8143.

IBM Spectrum Archive EE can run any application that is designed for disk files on physical tape media. IBM Spectrum Archive EE supports the IBM Linear Tape-Open (LTO) Ultrium 7, 6, and 5 tape drives in IBM TS3310, TS3500, and TS4500 tape libraries. In addition, IBM TS1155, TS1150, and TS1140 tape drives are supported in TS3500 and TS4500 tape library configurations.

IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk. Using IBM Spectrum Archive EE to replace disks with physical tape in tier 2 and tier 3 storage can improve data access compared with other storage solutions, because it improves efficiency and streamlines management for files on tape. IBM Spectrum Archive EE simplifies the use of tape by making it transparent to the user and manageable by the administrator under a single infrastructure.

This publication is intended for anyone who wants to understand more about IBM Spectrum Archive EE planning and implementation. It is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

Data Warehousing with Greenplum

Relational databases haven’t gone away, but they are evolving to integrate messy, disjointed unstructured data into a cleansed repository for analytics. Using massively parallel processing (MPP), the latest generation of analytic data warehouses is helping organizations move beyond business intelligence to a variety of advanced analytic workloads. These MPP databases expose their power through the familiarity of SQL.

This report introduces the Greenplum Database, recently released as an open source project by Pivotal Software. Lead author Marshall Presser of Pivotal Data Engineering takes you through the Greenplum approach to data analytics and data-driven decisions, beginning with Greenplum’s shared-nothing architecture. You’ll explore data organization and storage, data loading, running queries, and performing analytics in the database.

You’ll learn:
- How each networked node in Greenplum’s architecture has its own independent operating system, memory, and storage
- Four deployment options to help you balance security, cost, and time to usability
- Ways to organize data, including distribution, storage, partitioning, and loading
- How to use Apache MADlib for in-database analytics, and GPText to process and analyze free-form text
- Tools for monitoring, managing, securing, and optimizing query responses that are available in the Pivotal Greenplum commercial database
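The report's starting point is Greenplum's shared-nothing architecture, in which each table's rows are spread across segments according to a distribution key (declared in SQL with a `DISTRIBUTED BY (column)` clause). The Python sketch below illustrates hash distribution in general terms only: MD5 stands in for Greenplum's own hash function, and `segment_for`, `distribute`, and the `orders` rows are hypothetical names, not Greenplum APIs.

```python
import hashlib


def segment_for(key, num_segments):
    """Map a distribution-key value to a segment index.

    MD5 is a stand-in for the database's internal hash function (assumption).
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def distribute(rows, key_column, num_segments):
    """Group rows by the segment that would store them under hash distribution."""
    segments = {i: [] for i in range(num_segments)}
    for row in rows:
        segments[segment_for(row[key_column], num_segments)].append(row)
    return segments


if __name__ == "__main__":
    # Hypothetical orders table distributed on customer_id across 4 segments.
    orders = [{"customer_id": i % 50, "amount": i} for i in range(1000)]
    placement = distribute(orders, "customer_id", 4)
    print({seg: len(rows) for seg, rows in placement.items()})
```

Because rows that share a key value always land on the same segment, a join between two tables distributed on the same key can run locally on each segment; a key dominated by a few very frequent values would load some segments far more heavily than others.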

Mastering Complexity

The author covers fourteen tools to help you find the information you need and offers step-by-step instructions for constructing each one. He shows you how these tools can be combined with a set of simple problem-solving steps that can act as a powerful change agent to reduce or eliminate process problems.

Five-Step Problem-Solving Process
1. Identify the problem: Clearly state what needs improvement.
2. Analyze: Determine what causes the problem to occur.
3. Evaluate alternatives: Identify and select actions to reduce or eliminate the problem.
4. Test and implement: Implement these actions on a trial basis to determine their effectiveness.
5. Standardize: Ensure that useful actions are preserved.

Apache Spark 2.x for Java Developers

Master big data processing with Apache Spark 2.x for Java Developers. This book is a practical guide to implementing Apache Spark with the Java APIs, giving Java developers a way to use Spark's powerful framework without switching to Scala.

What this book will help me do:
- Learn how to process data in formats like XML, JSON, and CSV using Spark Core.
- Implement real-time analytics using Spark Streaming and third-party tools like Kafka.
- Query data with Spark SQL and master SQL schema processing.
- Apply machine learning techniques with Spark MLlib to real-world scenarios.
- Explore graph processing and analytics using Spark GraphX.

Author(s): Kumar and Gulati, experienced professionals in Java development and big data, bring a wealth of practical experience and a passion for teaching to this book. With a clear and concise writing style, they aim to simplify Spark for Java developers and make big data approachable.

Who is it for? This book is for Java developers who want to expand their skills into big data processing with Apache Spark. Whether you are a seasoned Spark user or new to big data concepts, the book meets you at your level, with practical examples and straightforward explanations that help you unlock the potential of Spark in real-world scenarios.

IBM z14 Technical Introduction

Abstract

This IBM® Redpaper Redbooks® publication introduces the latest IBM Z platform, the IBM z14®. It includes information about the Z environment and how it helps integrate data and transactions more securely and can infuse insight for faster and more accurate business decisions.

The z14 is a state-of-the-art data and transaction system that delivers advanced capabilities that are vital to the digital era and the trust economy. These capabilities include:
- Securing data with pervasive encryption
- Transforming a transactional platform into a data powerhouse
- Getting more out of the platform with IT Operational Analytics
- Providing resilience that is key to zero downtime
- Accelerating digital transformation with agile service delivery
- Revolutionizing business processes
- Blending open source and Z technologies

This book explains how this system uses both new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and security. With the z14 as the base, applications can run in a trusted, reliable, and secure environment that both improves operations and lessens business risk.

Mastering Apache Spark 2.x - Second Edition

Mastering Apache Spark 2.x is an essential guide to harnessing the power of big data processing. Dive into real-time data analytics, machine learning, and cluster computing using Apache Spark's advanced features and modules like Spark SQL and MLlib.

What this book will help me do:
- Gain proficiency in Spark's batch and real-time data processing with Spark SQL.
- Master techniques for machine learning and deep learning using SparkML and SystemML.
- Understand the principles of Spark's graph processing with GraphX and GraphFrames.
- Learn to deploy Apache Spark efficiently on platforms like Kubernetes and IBM Cloud.
- Optimize Spark cluster performance by configuring parameters effectively.

Author(s): Romeo Kienzler is a seasoned professional in big data and machine learning technologies. With years of experience in cloud-based distributed systems, Romeo brings practical insights into using Apache Spark. He combines deep technical expertise with a clear and engaging writing style.

Who is it for? This book is tailored for intermediate Apache Spark users who want to deepen their knowledge of Spark 2.x's advanced features. It is ideal for data engineers and big data professionals seeking to enhance their analytics pipelines with Spark. A basic understanding of Spark and Scala is necessary. If you're aiming to optimize Spark for real-world applications, this book is for you.

SQL Server 2016 High Availability Unleashed (includes Content Update Program)

Book + Content Update Program

SQL Server 2016 High Availability Unleashed provides start-to-finish coverage of SQL Server’s powerful high availability (HA) solutions for your traditional on-premises databases, cloud-based databases (Azure or AWS), hybrid databases (on-premises coupled with the cloud), and your emerging Big Data solutions. This complete guide introduces an easy-to-follow, formal HA methodology that has been refined over the past several years and helps you identify the right HA solution for your needs. It also covers disaster recovery and business continuity architectures and considerations. Step-by-step guides, examples, and sample code help you set up, manage, and administer these highly available solutions. All examples are based on existing production deployments at major Fortune 500 companies around the globe. This book is for all intermediate-to-advanced SQL Server and Big Data professionals, but it is also organized so that the first few chapters are great foundation reading for CIOs, CTOs, and even some tech-savvy CFOs.

- Learn a formal high availability methodology for understanding and selecting the right HA solution for your needs
- Deep dive into Microsoft Cluster Services
- Use selective data replication topologies
- Explore thorough details on AlwaysOn and availability groups
- Learn about HA options with log shipping and database mirroring/snapshots
- Get details on Microsoft Azure for Big Data and Azure SQL
- Explore business continuity and disaster recovery
- Learn about on-premises, cloud, and hybrid deployments
- Address all types of database needs, including online transaction processing, data warehouse and business intelligence, and Big Data
- Explore the future of HA and disaster recovery

In addition, this book is part of InformIT’s exciting Content Update Program, which provides content updates for major technology improvements! As significant updates are made to SQL Server, sections of this book will be updated or new sections will be added to match the updates to the technologies. As updates become available, they will be delivered to you via a free Web Edition of this book, which can be accessed with any Internet connection. To learn more, visit informit.com/cup.

How to access the Web Edition: Follow the instructions inside to learn how to register your book to access the FREE Web Edition.

* The companion material is not available with the online edition on O'Reilly Learning.

IBM Db2: Investigating Automatic Storage Table Spaces and Data Skew

The scope of this IBM® Redpaper™ publication is to provide a high-level overview of automatic storage table spaces, table space maps, table space extent maps, and physically unbalanced data across automatic storage table space containers (that is, data skew). The objective of this paper is to investigate the causes of data skew and suggest how to resolve it. This paper is intended for IBM Db2® database administrators (DBAs) who have general Db2 knowledge and skills. The environment used to create this document was Db2 Version 11.1 on the IBM AIX® operating system, and the document is based on the results of testing various scenarios.
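Data skew, as the paper frames it, means that some automatic storage table space containers hold noticeably more data than others. The Python sketch below shows one simple way to flag it, assuming you have already collected per-container used-page counts (the collection step is not shown). The `container_skew` helper, the container names, and the 1.10 threshold are illustrative assumptions, not Db2 settings and not the paper's method.

```python
from statistics import mean


def container_skew(used_pages, threshold=1.10):
    """Return {container: usage / average} for containers above `threshold`.

    used_pages maps a container name to the number of pages it holds.
    The threshold is an illustrative ratio, not a Db2 setting.
    """
    if not used_pages:
        return {}
    avg = mean(used_pages.values())
    if avg == 0:
        return {}
    return {
        name: round(pages / avg, 2)
        for name, pages in used_pages.items()
        if pages / avg > threshold
    }


if __name__ == "__main__":
    # Hypothetical page counts for three containers of one table space.
    sample = {"/db2/c0": 1000, "/db2/c1": 1000, "/db2/c2": 1600}
    print(container_skew(sample))  # prints {'/db2/c2': 1.33}
```

Comparing each container with the average, rather than with the smallest container, keeps the ratio stable when a single container is unusually empty.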

IBM Spectrum Accelerate Deployment, Usage, and Maintenance

Abstract

This edition applies to IBM® Spectrum Accelerate V11.5.4.

IBM Spectrum Accelerate™, a member of IBM Spectrum Storage™, is an agile, software-defined storage solution for enterprise and cloud that builds on the customer-proven, mature IBM XIV® storage software. The key characteristic of Spectrum Accelerate is that it can be easily deployed and run on purpose-built or existing hardware chosen by the customer. IBM Spectrum Accelerate enables rapid deployment of high-performance, scalable block data storage infrastructure over commodity hardware, on-premises or off-premises.

This IBM Redbooks® publication provides a broad understanding of IBM Spectrum Accelerate. The book introduces Spectrum Accelerate and describes the planning and preparation that are essential for a successful deployment of the solution. Deployment is described step by step, using either a graphical user interface (GUI) based method or a simple command-line interface (CLI) based procedure. Chapters in this book describe the logical configuration of the system, host support and business continuity functions, and migration. Although it makes many references to the XIV storage software, the book also highlights where IBM Spectrum Accelerate differs from XIV. Finally, a substantial portion of the book is dedicated to maintenance and troubleshooting, providing detailed guidance for customer support personnel.

SAP MII: Functional and Technical Concepts in Manufacturing Industries

Leverage the flexibility and power of SAP MII to integrate your business operations with your manufacturing processes. You'll explore important new features of the product and see how to apply best practices to connect all the stakeholders in your business. This book starts with an overview of SAP's Manufacturing Integration and Intelligence application and explains why it is so important. You'll then see how it is applied in various manufacturing sectors.

The biggest challenge in manufacturing industries is reducing manual work and human intervention so that processes become automatic. The book explains how SAP MII bridges the gap between management and production and brings vital information to the shop floor in real time. You'll see how to ensure that existing manufacturing and information systems share a common interface for all users in your enterprise.

What You'll Learn
- Understand the functional aspects of SAP MII
- Implement SAP MII in different manufacturing sectors
- Explore new technical features of SAP MII 12.x
- Integrate scenarios with SAP MII
- Discover best-practice guidelines

Who This Book Is For
SAP manufacturing professionals at all levels.

IBM Z Connectivity Handbook

Abstract

This IBM® Redbooks® publication describes the connectivity options that are available for use within and beyond the data center for the IBM Z family of mainframes, which includes these systems:
- IBM z14
- IBM z13®
- IBM z13s™
- IBM zEnterprise® EC12 (zEC12)
- IBM zEnterprise BC12 (zBC12)

This book highlights the hardware and software components, functions, typical uses, coexistence, and relative merits of these connectivity features. It helps readers understand the connectivity alternatives that are available when planning and designing their data center infrastructures. The changes to this edition are based on the IBM Z hardware announcement dated 17 July 2017. This book is intended for data center planners, IT professionals, systems engineers, and network planners who are involved in the planning of connectivity solutions for IBM mainframes.

Moving Hadoop to the Cloud

Until recently, Hadoop deployments existed on hardware owned and run by organizations. Now, of course, you can acquire the computing resources and network connectivity to run Hadoop clusters in the cloud. But there’s a lot more to deploying Hadoop to the public cloud than simply renting machines. This hands-on guide shows developers and systems administrators familiar with Hadoop how to install, use, and manage cloud-born clusters efficiently. You’ll learn how to architect clusters that work with cloud-provider features, not just to avoid pitfalls but also to take full advantage of these services. You’ll also compare the Amazon, Google, and Microsoft clouds, and learn how to set up clusters in each of them.

- Learn how Hadoop clusters run in the cloud, the problems they can help you solve, and their potential drawbacks
- Examine the common concepts of cloud providers, including compute capabilities, networking and security, and storage
- Build a functional Hadoop cluster on cloud infrastructure, and learn what the major providers require
- Explore use cases for high availability, relational data with Hive, and complex analytics with Spark
- Get patterns and practices for running cloud clusters, from designing for price and security to dealing with maintenance

Learning SAP Analytics Cloud

Discover the power of SAP Analytics Cloud for solving business intelligence challenges through concise, clear instruction. This book is an essential guide for beginners, giving you a comprehensive understanding of the platform's features and capabilities. By the end, you'll be able to create reports, models, and dashboards and make data-driven decisions with confidence.

What this book will help me do:
- Learn how to navigate and use the SAP Analytics Cloud interface effectively.
- Create data models from various sources, such as Excel or text files, for comprehensive insights.
- Design and compile visually engaging stories, reports, and dashboards.
- Master the collaboration and presentation tools in SAP Digital Boardroom.
- Understand how to plan, predict, and analyze within a single platform.

Author(s): Ahmed is an experienced SAP consultant and analytics professional with years of practical experience in BI tools and enterprise analytics. As an expert in SAP Analytics Cloud, Ahmed has guided numerous teams in deploying effective analytics solutions, and the book aims to demystify complex tools for learners.

Who is it for? This book is ideal for IT professionals, business analysts, and newcomers eager to understand SAP Analytics Cloud. Beginner-level BI developers and managers looking for guided steps to master the platform will find it invaluable. If you aim to advance your career in cloud-based analytics, this book is tailored for you.

Building on Multi-Model Databases

In many organizations today, businesspeople are requesting unified views of data stored across multiple sources. But integrating multiple data types from multiple data stores by cobbling everything together manually is a complex, error-prone, and time-consuming process. This concise book examines how multi-model databases can help you integrate data storage and access across your organization in a seamless and elegant way. Authors Pete Aven and Diane Burley from MarkLogic explain how this latest evolution in data management naturally accepts heterogeneous data, enabling you to eventually phase out technical data silos. Through several case studies, you’ll discover how organizations use multi-model databases to reduce complexity, save money, take advantage of opportunities, lessen risk, and shorten time to value.

- Get unified views across disparate data models and formats within a single database
- Learn how multi-model databases leverage the inherent structure of the data being stored
- Load and use unstructured and semi-structured data (such as documents and text) as is
- Provide agility in data access and delivery through APIs, interfaces, and indexes
- Learn how to scale a multi-model database and provide ACID capabilities and security
- Examine how a multi-model database would fit into your existing architecture

Streaming Data

Streaming Data introduces the concepts and requirements of streaming and real-time data systems.

About the Technology
As humans, we're constantly filtering and deciphering the information streaming toward us. In the same way, streaming data applications can accomplish amazing tasks like reading live location data to recommend nearby services, tracking machinery faults in real time, and sending digital receipts before your customers leave the shop. Recent advances in streaming data technology and techniques make it possible for any developer with the right mindset to build these applications. This book will let you join them.

About the Book
Streaming Data is an idea-rich tutorial that teaches you to think about efficiently interacting with fast-flowing data. Through relevant examples and illustrated use cases, you'll explore designs for applications that read, analyze, share, and store streaming data. Along the way, you'll discover the roles of key technologies like Spark, Storm, Kafka, Flink, RabbitMQ, and more. This book offers the perfect balance between big-picture thinking and implementation details.

What's Inside
- The right way to collect real-time data
- Architecting a streaming pipeline
- Analyzing the data
- Which technologies to use and when

About the Reader
Written for developers familiar with relational database concepts. No experience with streaming or real-time applications required.

About the Author
Andrew Psaltis is a software engineer focused on massively scalable real-time analytics.

Quotes
"The definitive book if you want to master the architecture of an enterprise-grade streaming application." - Sergio Fernandez Gonzalez, Accenture
"A thorough explanation and examination of the different systems, strategies, and tools for streaming data implementations." - Kosmas Chatzimichalis, Mach 7x
"A well-structured way to learn about streaming data and how to put it into practice in modern real-time systems." - Giuliano Araujo Bertoti, FATEC
"This book is all you need to understand what streaming is all about!" - Carlos Curotto, Globant

Building Custom Tasks for SQL Server Integration Services

Learn to build custom SSIS tasks using Visual Studio Community Edition and Visual Basic. Bring all the power of Microsoft .NET to bear on your data integration and ETL processes at no added cost over what you’ve already spent on licensing SQL Server: if you already have a SQL Server license, you do not need to spend more money to extend SSIS with custom tasks and components.

Why are custom components necessary? Because even though the SSIS catalog of built-in tasks and components is a marvel of engineering, gaps remain in the functionality it provides. These gaps are especially relevant to enterprises practicing Data Integration Lifecycle Management (DILMS) and/or DevOps. One of the gaps is a limitation of the SSIS Execute Package task: developers using the stock version of that task are unable to select SSIS packages from other projects. Yet it’s useful to be able to select and execute packages across projects, and the example used throughout this book helps you create an Execute Catalog Package task that does allow you to execute a package from another project. Building on the example’s pattern, you can create any task you like, custom tailored to your specific data integration and ETL needs.

What You Will Learn
- Configure and execute Visual Studio in the way that best supports SSIS task development
- Create a class library as the basis for an SSIS task, and reference the needed SSIS assemblies
- Properly sign assemblies that you create in order to invoke them from your task
- Implement source code control via Visual Studio Team Services, or your own favorite tool set
- Code not only your tasks themselves, but also the associated task editors
- Troubleshoot and then execute your custom tasks as part of your own project

Who This Book Is For
Database administrators and developers who are involved in ETL projects built around SQL Server Integration Services (SSIS). Readers should have a background in programming along with a desire to optimize their ETL efforts by creating custom-tailored tasks for execution from SSIS packages.

JSON at Work

JSON is becoming the backbone for meaningful data interchange over the internet. The format is now supported by an entire ecosystem of standards, tools, and technologies for building truly elegant, useful, and efficient applications. With this hands-on guide, author and architect Tom Marrs shows you how to build enterprise-class applications and services by leveraging JSON tooling and message/document design.

JSON at Work provides application architects and developers with guidelines, best practices, and use cases, along with lots of real-world examples and code samples. You’ll start with a comprehensive JSON overview, explore the JSON ecosystem, and then dive into JSON’s use in the enterprise.

- Get acquainted with JSON basics and learn how to model JSON data
- Learn how to use JSON with Node.js, Ruby on Rails, and Java
- Structure JSON documents with JSON Schema to design and test APIs
- Search the contents of JSON documents with JSON Search tools
- Convert JSON documents to other data formats with JSON Transform tools
- Compare JSON-based hypermedia formats, including HAL and jsonapi
- Leverage MongoDB to store and access JSON documents
- Use Apache Kafka to exchange JSON-based messages between services
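The blurb mentions structuring JSON documents with JSON Schema to design and test APIs. To show the idea, the Python sketch below checks only three JSON Schema keywords (`type`, `required`, and `properties`). It is a hypothetical, deliberately incomplete validator: `validate`, `_matches_type`, and the speaker schema are illustrative names, and a real project would use a full implementation such as the `jsonschema` package instead.

```python
import json

# JSON Schema type names mapped to the Python types that json.loads produces.
_PY_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "boolean": bool,
    "null": type(None),
}


def _matches_type(value, expected):
    # bool is a subclass of int in Python, so exclude it from numeric checks.
    if expected == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if expected == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, _PY_TYPES[expected])


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means the instance conforms.

    Supports only the "type", "required", and "properties" keywords.
    """
    expected = schema.get("type")
    if expected is not None and not _matches_type(instance, expected):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property '{name}'")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    return errors


if __name__ == "__main__":
    # Hypothetical schema for a speaker record.
    speaker_schema = {
        "type": "object",
        "required": ["id", "name"],
        "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    }
    doc = json.loads('{"id": "42", "email": "ada@example.com"}')
    for problem in validate(doc, speaker_schema):
        print(problem)
```

Returning a list of messages instead of raising on the first failure lets an API test report every schema violation in a document at once.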