talk-data.com

Topic: data-engineering (3395 tagged)

Activity Trend: 2020-Q1 to 2026-Q1

Activities

3395 activities · Newest first

Cloud-Native Microservices with Apache Pulsar: Build Distributed Messaging Microservices

Apply different enterprise integration and processing strategies available with Pulsar, Apache's multi-tenant, high-performance, cloud-native messaging and streaming platform. This book is a comprehensive guide that examines using Pulsar Java libraries to build distributed applications with message-driven architecture. You'll begin with an introduction to Apache Pulsar architecture. The first few chapters build a foundation of message-driven architecture. Next, you'll set up all the required Pulsar components. The book also covers working with the Apache Pulsar client library to build producers and consumers for the discussed patterns. You'll then explore the transformation, filter, resiliency, and tracing capabilities available with Pulsar. Moving forward, the book discusses best practices for building message schemas and demonstrates integration patterns using microservices. Security is an important aspect of any application; the book covers authentication and authorization in Apache Pulsar, including Transport Layer Security (TLS), OAuth 2.0, and JSON Web Token (JWT). The final chapters cover Apache Pulsar deployment in Kubernetes. You'll build microservices and serverless components such as AWS Lambda integrated with Apache Pulsar on Kubernetes. After completing the book, you'll be able to comfortably work with the large set of out-of-the-box integration options offered by Apache Pulsar. What You'll Learn: Examine the important Apache Pulsar components; build applications using Apache Pulsar client libraries; use Apache Pulsar effectively with microservices; deploy Apache Pulsar to the cloud. Who This Book Is For: Cloud architects and software developers who build systems with cloud-native technologies.
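
The producer/consumer pattern at the core of the book can be sketched in a few lines. The book itself works with the Java client; the minimal sketch below uses the official Python client (the pulsar-client package) instead, and the broker URL, topic, and subscription name are placeholders rather than examples from the book:

```python
import pulsar  # assumes the pulsar-client package and a broker at the URL below

client = pulsar.Client("pulsar://localhost:6650")

# Produce one message to a placeholder topic
producer = client.create_producer("persistent://public/default/orders")
producer.send("order-created".encode("utf-8"))

# Consume it back on a named subscription and acknowledge it
consumer = client.subscribe("persistent://public/default/orders",
                            subscription_name="order-service")
msg = consumer.receive()
print(msg.data())
consumer.acknowledge(msg)

client.close()
```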

The Language of SQL, 3rd Edition

Get Started Fast with SQL! The only book you need to gain a quick working knowledge of SQL and relational databases. Many SQL texts attempt to serve as an encyclopedic reference on SQL syntax, an approach that is often counterproductive because that information is readily available in online references published by the major database vendors. For SQL beginners, it's more important for a book to focus on general concepts and to offer clear explanations and examples of what various SQL statements can accomplish. This is that book. Several features make The Language of SQL unique among introductory SQL books. First, you will not be required to download software or sit with a computer as you read the text. The intent of this book is to provide examples of SQL usage that can be understood simply by reading. Second, topics are organized in an intuitive and logical sequence. SQL keywords are introduced one at a time, allowing you to grow your understanding as you encounter new terms and concepts. Finally, this book covers the syntax of the latest releases of three widely used databases: Microsoft SQL Server 2019, MySQL 8.0, and Oracle 18c. Special Database Differences sidebars clearly show you any differences in syntax among these three databases, and instructions are included on how to obtain and install free versions of the databases. You'll learn to: use SQL to retrieve data from relational databases; apply functions and calculations to data; group and summarize data in a variety of useful ways; use complex logic to retrieve only the data you need; design relational databases so that data retrieval is easy and intuitive; update data and create new tables; use spreadsheets to transform your data into meaningful displays; retrieve data from multiple tables via joins, subqueries, views, and set logic; create, modify, and execute stored procedures; and install Microsoft SQL Server, MySQL, or Oracle.
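
To illustrate the kind of grouping and summarizing the book teaches, here is a minimal, self-contained sketch using Python's built-in sqlite3 module (the book itself targets SQL Server, MySQL, and Oracle); the table and column names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("Ann", 40.0), ("Ann", 10.0), ("Bo", 25.0)])

# Group and summarize: total spend per customer, largest first
query = ("SELECT customer, SUM(amount) AS total "
         "FROM orders GROUP BY customer ORDER BY total DESC")
for customer, total in conn.execute(query):
    print(customer, total)
```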

Efficient MySQL Performance

You'll find several books on basic or advanced MySQL performance, but nothing in between. That's because explaining MySQL performance without addressing its complexity is difficult. This practical book bridges the gap by teaching software engineers mid-level MySQL knowledge beyond the fundamentals, but well shy of the deep-level internals required by database administrators (DBAs). Daniel Nichter shows you how to apply the best practices and techniques that directly affect MySQL performance. You'll learn how to improve performance by analyzing query execution, indexing for common SQL clauses and table joins, optimizing data access, and understanding the most important MySQL metrics. You'll also discover how replication, transactions, row locking, and the cloud influence MySQL performance. Understand why query response time is the North Star of MySQL performance. Learn query metrics in detail, including aggregation, reporting, and analysis. See how to index effectively for common SQL clauses and table joins. Explore the most important server metrics and what they reveal about performance. Dive into transactions and row locking to gain deep, actionable insight. Achieve remarkable MySQL performance at any scale.
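
As a taste of the query-metric aggregation and reporting the blurb mentions, here is a small self-contained Python sketch; the query digests and response times are made-up slow-log-style samples, not data from the book:

```python
import statistics
from collections import defaultdict

# Made-up (query digest, response time in seconds) samples
samples = [
    ("SELECT * FROM orders WHERE customer_id = ?", 0.012),
    ("SELECT * FROM orders WHERE customer_id = ?", 0.009),
    ("UPDATE inventory SET qty = qty - ? WHERE sku = ?", 0.150),
    ("SELECT * FROM orders WHERE customer_id = ?", 0.011),
]

by_query = defaultdict(list)
for digest, rt in samples:
    by_query[digest].append(rt)

# Report per-query aggregates, worst total response time first
for digest, times in sorted(by_query.items(), key=lambda kv: -sum(kv[1])):
    print(f"{digest}\n  calls={len(times)}  total={sum(times):.3f}s  "
          f"max={max(times):.3f}s  mean={statistics.mean(times):.4f}s")
```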

IBM Supply Chain Transformation

In the midst of global disruptions, every element of IBM® Supply Chain has been affected. The IBM cognitive supply chain is positioned to win the future by using the exponential technologies inherent to our supply chains, with flexibility, resiliency, and end-to-end visibility. The constant commitment of IBM to building smarter supply chains over the past decade has primed IBM to quickly and effectively navigate these disruptions and course-correct by using cognitive innovation. As a result, IBM Supply Chain teams were able to deliver exceptional outcomes without client disruption. In addition, this widespread impact inspired numerous new solutions that include exponential technologies that better prepare IBM for future disruptions in constantly changing markets.

Innovative SAP SuccessFactors Recruiting: A Guide to Creating Custom Integration and Automation

Get creative and optimize your SAP SuccessFactors Recruiting implementation with this guide, which examines a variety of integration and automation opportunities throughout the recruiting process outside of the standard integrations. Innovative SAP SuccessFactors Recruiting walks you through the end-to-end recruiting process and highlights opportunities to create interfaces and automation at each stage using a variety of methods and tools. After a brief overview of the market demands driving growth in this area and an introduction to OData, Anand Athanur, Mark Ingram, and Michael A. Wellens detail each step in the recruiting process, starting with automating and integrating requisition creation using APIs and middleware. They then explore ways of enhancing candidate attraction and experience for the initial application process. After that, they jump into automation for overall candidate selection and processing, including automation using Robotic Process Automation, Integration Center, the assessment integration framework, custom OData integrations, the background check integration framework, and Business Rules. Additionally, you’ll be shown onboarding optimization techniques using Intelligent Services, as well as hiring into third-party HRIS systems. After finishing this book, you will have a thorough understanding of how to utilize SAP SuccessFactors to recruit the right candidates for every position. What You Will Learn: Integrate and automate the requisition creation process in innovative ways outside of SAP documentation; enhance candidate attraction and experience; leverage integration and automation opportunities within the application processing stage; automate hiring into third-party HRIS systems. Who This Book Is For: Customers, consultants, and third-party vendors wishing to connect their solutions to SAP SuccessFactors Recruiting.
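
Since the book leans heavily on OData for custom integrations, here is a rough, hypothetical sketch of what querying a recruiting entity over OData v2 can look like from Python with the requests package; the endpoint, credentials, entity, and field names below are illustrative placeholders, not taken from SAP documentation:

```python
import requests  # assumes the requests package

# All URLs, credentials, and entity/field names are hypothetical placeholders
BASE = "https://api.successfactors.example.com/odata/v2"
resp = requests.get(
    f"{BASE}/JobRequisition",
    params={"$filter": "status eq 'Open'", "$select": "jobReqId,title", "$format": "json"},
    auth=("API_USER@COMPANY", "secret"),
    timeout=30,
)
resp.raise_for_status()

# OData v2 JSON responses wrap results in a "d" envelope
for requisition in resp.json()["d"]["results"]:
    print(requisition["jobReqId"], requisition["title"])
```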

High Performance MySQL, 4th Edition

How can you realize MySQL's full power? With High Performance MySQL, you'll learn advanced techniques for everything from setting service-level objectives to designing schemas, indexes, and queries to tuning your server, operating system, and hardware to achieve your platform's full potential. This guide also teaches database administrators safe and practical ways to scale applications through replication, load balancing, high availability, and failover. Updated to reflect recent advances in cloud- and self-hosted MySQL, InnoDB performance, and new features and tools, this revised edition helps you design a relational data platform that will scale with your business. You'll learn best practices for database security along with hard-earned lessons in both performance and database stability. Dive into MySQL's architecture, including key facts about its storage engines. Learn how server configuration works with your hardware and deployment choices. Make query performance part of your software delivery process. Examine enhancements to MySQL's replication and high availability. Compare different MySQL offerings in managed cloud environments. Explore MySQL's full-stack optimization from application-side configuration to server tuning. Turn traditional database management tasks into automated processes.
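
One concrete way to make query performance part of a software delivery process, as the blurb puts it, is to inspect the optimizer's plan programmatically. A minimal sketch, assuming a reachable MySQL server, the mysql-connector-python package, and placeholder credentials and table names:

```python
import json
import mysql.connector  # assumes the mysql-connector-python package

# Connection details and schema below are placeholders
conn = mysql.connector.connect(host="127.0.0.1", user="app",
                               password="secret", database="shop")
cur = conn.cursor()

# Ask the optimizer for its plan; a full table scan here would be a
# regression worth flagging before the query ships
cur.execute("EXPLAIN FORMAT=JSON SELECT id, total FROM orders "
            "WHERE customer_id = %s", (42,))
plan = json.loads(cur.fetchone()[0])
print(json.dumps(plan["query_block"], indent=2))

cur.close()
conn.close()
```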

Storage as a Service Offering Guide

IBM® Storage as a Service (STaaS) extends your hybrid cloud experience with a new flexible consumption model enabled for both your on-premises and hybrid cloud infrastructure needs, giving you the agility, cash flow efficiency, and services of cloud storage with the flexibility to dynamically scale up or down and only pay for what you use beyond the minimal capacity. This IBM Redpaper provides a detailed introduction to the IBM STaaS service. The paper is targeted for data center managers and storage administrators.

IBM HyperSwap and Multi-site HA/DR for IBM FlashSystem A9000 and A9000R

IBM® HyperSwap® is the high availability (HA) solution that provides continuous data availability in case of hardware failure, power failure, connectivity failure, or disasters. The HyperSwap capability is available for IBM FlashSystem® A9000 and IBM FlashSystem A9000R, starting with software version 12.2.1. Version 12.3 introduces a function that combines HyperSwap and Asynchronous replication, which creates a solution that entails HA and Disaster Recovery (DR). One side of the HyperSwap pair has an active async link to the third system, and the other side has a standby link. Known as Multi-site HA/DR, this configuration provides HyperSwap active-active HA while keeping data mirrored to a third copy to ensure two levels of business continuity. This IBM Redpaper™ publication gives a broad understanding of the architecture, design, and implementation of HyperSwap and Multi-site HA/DR solution. It also discusses and illustrates various use cases pertaining to their use and functionality. This paper is intended for those users who want to deploy solutions that take advantage of HyperSwap and Multi-site HA/DR for FlashSystem A9000 and A9000R.

Expert Oracle Database Architecture: Techniques and Solutions for High Performance and Productivity

Now in its fourth edition and covering Oracle Database 21c, this best-selling book continues to bring you some of the best thinking on how to apply Oracle Database to produce scalable applications that perform well and deliver correct results. Tom Kyte and Darl Kuhn share a simple philosophy: "you can treat Oracle as a black box and just stick data into it, or you can understand how it works and exploit it as a powerful computing environment." If you choose the latter, then you’ll find that there are few information management problems that you cannot solve quickly and elegantly. This fully revised fourth edition covers the developments and new features up to Oracle Database 21c. Up-to-date features are covered for tables, indexes, data types, sequences, partitioning, data loading, temporary tables, and more. All the examples are demonstrated using modern techniques and are executed in container and pluggable databases. The book’s proof-by-example approach encourages you to let evidence be your guide. Try something. See the result. Understand why the result is what it is. Apply your newfound knowledge with confidence. The book covers features by explaining how each one works, how to implement software using it, and the common pitfalls associated with it. Don’t treat Oracle Database as a black box. Get this book. Dive deeply into Oracle Database’s most powerful features that many do not invest the time to learn about. Set yourself apart from your competition and turbo-charge your career. What You Will Learn: Identify and effectively resolve application performance issues and bottlenecks; architect systems to leverage the full power and feature set of Oracle’s database engine; configure a database to maximize the use of memory structures and background processes; understand internal locking and latching technology and how it impacts your system; proactively recommend best practices around performance for table and index structures; take advantage of advanced features such as table partitioning and parallel execution. Who This Book Is For: Oracle developers and Oracle DBAs. If you’re a developer and want a stronger understanding of Oracle features and architecture that will enable your applications to scale regardless of the workload, this book is for you. If you’re a DBA and want to intelligently work with developers to design applications that effectively leverage Oracle technology, then look no further.
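
To make the partitioning feature mentioned above concrete, here is a minimal sketch that creates a range-partitioned table through the python-oracledb driver; the connection details, DSN, and table definition are placeholders and assume a reachable Oracle database:

```python
import oracledb  # assumes the python-oracledb package and a reachable database

# User, password, and DSN are placeholders
conn = oracledb.connect(user="app", password="secret", dsn="localhost/FREEPDB1")
cur = conn.cursor()

# A range-partitioned table: rows are routed to a partition by sale_date
cur.execute("""
    CREATE TABLE sales (
        sale_id   NUMBER,
        sale_date DATE,
        amount    NUMBER
    )
    PARTITION BY RANGE (sale_date) (
        PARTITION p_2023 VALUES LESS THAN (DATE '2024-01-01'),
        PARTITION p_2024 VALUES LESS THAN (DATE '2025-01-01')
    )
""")

cur.close()
conn.close()
```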

Kafka: The Definitive Guide, 2nd Edition

Every enterprise application creates data, whether it consists of log messages, metrics, user activity, or outgoing messages. Moving all this data is just as important as the data itself. With this updated edition, application architects, developers, and production engineers new to the Kafka streaming platform will learn how to handle data in motion. Additional chapters cover Kafka's AdminClient API, transactions, new security features, and tooling changes. Engineers from Confluent and LinkedIn responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream processing applications with this platform. Through detailed examples, you'll learn Kafka's design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer. You'll examine: best practices for deploying and configuring Kafka; Kafka producers and consumers for writing and reading messages; patterns and use-case requirements to ensure reliable data delivery; best practices for building data pipelines and applications with Kafka; how to perform monitoring, tuning, and maintenance tasks with Kafka in production; the most critical metrics among Kafka's operational measurements; and Kafka's delivery capabilities for stream processing systems.
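
For a feel of the producer and consumer APIs the book covers, here is a minimal sketch using the confluent-kafka Python client; the broker address, topic, and consumer group are placeholders and assume a locally reachable cluster:

```python
from confluent_kafka import Producer, Consumer  # assumes the confluent-kafka package

# Broker, topic, and group id below are placeholders
producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("page-views", key="user-17", value="/pricing")
producer.flush()  # block until the broker acknowledges the message

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "analytics",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["page-views"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```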

Essential PySpark for Scalable Data Analytics

Dive into the world of scalable data processing with 'Essential PySpark for Scalable Data Analytics'. This book is a comprehensive guide that helps beginners understand and utilize PySpark to process, analyze, and draw insights from large datasets effectively. With hands-on tutorials and clear explanations, you will gain the confidence to tackle big data analytics challenges. What this Book will help me do: Understand and apply the distributed computing paradigm for big data; learn to perform scalable data ingestion, cleansing, and preparation using PySpark; create and utilize data lakes and the Lakehouse paradigm for efficient data storage and access; develop and deploy machine learning models with scalability in mind; master real-time analytics pipelines and create impactful data visualizations. Author(s): Nudurupati is an experienced data engineer and educator, specializing in distributed systems and big data technologies. With years of practical experience in the field, the author brings a clear and approachable teaching style to technical topics. Passionate about empowering readers, the author has designed this book to be both practical and inspirational for aspiring data practitioners. Who is it for? This book is ideal for data professionals, including data scientists, engineers, and analysts, looking to scale their data analytics processes. It assumes familiarity with basic data science concepts and Python, as well as some experience with SQL-like data analysis. It is particularly suitable for individuals aiming to expand their knowledge of distributed computing and PySpark to handle big data challenges. Achieving scalable and efficient data solutions is at the core of this guide.
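
A minimal sketch of the kind of scalable aggregation the book builds toward, using a local PySpark session; it assumes the pyspark package, and the data and column names are invented:

```python
from pyspark.sql import SparkSession, functions as F  # assumes the pyspark package

spark = SparkSession.builder.appName("sales-summary").getOrCreate()

# Invented sample data; in practice this would be read from a data lake
df = spark.createDataFrame(
    [("2024-01-01", "EU", 120.0), ("2024-01-01", "US", 80.0), ("2024-01-02", "EU", 95.0)],
    ["day", "region", "revenue"],
)

# The same aggregation code runs unchanged on a cluster
df.groupBy("region").agg(F.sum("revenue").alias("total_revenue")).show()

spark.stop()
```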

IBM DS8000 and IBM Z Synergy DS8000: Release 9.2 and z/OS 2.5

IBM® Z has a close and unique relationship to its storage. Over the years, improvements to the Z processors and storage software, the disk storage systems, and their communication architecture consistently reinforced this synergy. This IBM Redpaper publication summarizes and highlights the various aspects, advanced functions, and technologies that are often pioneered by IBM, and that make the IBM Z® and the IBM DS8000 products an ideal combination. This paper is intended for users who have some familiarity with IBM Z and the IBM DS8000® series and want a condensed but comprehensive overview of the synergy items up to the IBM z15™ server with z/OS v2.5 and the IBM DS8900 Release 9.2 firmware.

Beginning Hibernate 6: Java Persistence from Beginner to Pro

Get started with Hibernate, an open source Java persistence layer, and gain a clear introduction to the current standard for object-relational persistence in Java. This updated edition covers the new Hibernate 6.0 framework, including new configuration, object-relational mapping changes, and enhanced integration with Spring, Spring Boot, Quarkus, and other Java frameworks. The book keeps its focus on Hibernate without wasting time on nonessential third-party tools, so you’ll be able to immediately start building transaction-based engines and applications. Experienced authors Joseph Ottinger, Dave Minter, and Jeff Linwood provide more in-depth examples than any other book for Hibernate beginners. They present their material in a lively, example-based manner—not a dry, theoretical, hard-to-read fashion. What You'll Learn: Build enterprise Java-based transaction-type applications that access complex data with Hibernate; work with Hibernate 6 using a present-day build process; integrate into the persistence life cycle; search and query with the new version of Hibernate; keep track of versioned data with Hibernate Envers. Who This Book Is For: Programmers experienced in Java with databases (the traditional, or connected, approach), but new to open-source, lightweight Hibernate.

Optimize Video Streaming Delivery

Media content today is increasingly streamed video, and this trend will only grow as the speed of consumer internet and video quality improve. Traditional video streaming platforms, such as Netflix and Hulu, now account for only a portion of this content as more and more live events are streamed over the internet. And consumer-generated content on video-based social networks such as Twitch and TikTok is now more accessible and gaining popularity. This report focuses on the current state of video delivery, including the challenges content providers face and the various solutions they're pursuing. The findings in this report are based on a recent survey conducted by Edgecast, a content delivery network (CDN) that helps companies accelerate and deliver static and dynamic content to end users around the world. You'll explore: the current state of video streaming, how it works, and how streams are delivered; responses from a survey of CDN users that produce video streams; how content providers are addressing recent video streaming challenges; and how the information in this report can help you identify KPIs.

Data Engineering with Apache Spark, Delta Lake, and Lakehouse

Data Engineering with Apache Spark, Delta Lake, and Lakehouse is a comprehensive guide packed with practical knowledge for building robust and scalable data pipelines. Throughout this book, you will explore the core concepts and applications of Apache Spark and Delta Lake, and learn how to design and implement efficient data engineering workflows using real-world examples. What this Book will help me do: Master the core concepts and components of Apache Spark and Delta Lake; create scalable and secure data pipelines for efficient data processing; learn best practices and patterns for building enterprise-grade data lakes; discover how to operationalize data models into production-ready pipelines; gain insights into deploying and monitoring data pipelines effectively. Author(s): Kukreja is a seasoned data engineer with over a decade of experience working with big data platforms. He specializes in implementing efficient and scalable data solutions to meet the demands of modern analytics and data science. Writing with clarity and a practical approach, he aims to provide actionable insights that professionals can apply to their projects. Who is it for? This book is tailored for aspiring data engineers and data analysts who wish to delve deeper into building scalable data platforms. It is suitable for those with basic knowledge of Python, Spark, and SQL who are seeking to learn Delta Lake and advanced data engineering concepts. Readers should be eager to develop practical skills for tackling real-world data engineering challenges.
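
As one possible flavor of the pipelines the book describes, here is a minimal sketch that appends raw records to a Delta table from PySpark; it assumes the delta-spark and pyspark packages, and the local path, schema, and layer name are placeholders:

```python
from delta import configure_spark_with_delta_pip  # assumes the delta-spark package
from pyspark.sql import SparkSession

builder = (SparkSession.builder.appName("bronze-ingest")
           .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog",
                   "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Invented raw records; the path below is a local placeholder
raw = spark.createDataFrame([(1, "2024-01-01", 9.99)], ["order_id", "order_date", "amount"])
raw.write.format("delta").mode("append").save("/tmp/lakehouse/bronze/orders")

spark.read.format("delta").load("/tmp/lakehouse/bronze/orders").show()
spark.stop()
```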

Enhanced Cyber Resilience Threat Detection with IBM FlashSystem Safeguarded Copy and IBM QRadar

The focus of this document is to demonstrate early threat detection by using IBM® QRadar® and the Safeguarded Copy feature that is available as part of IBM FlashSystem® and IBM SAN Volume Controller. Such early detection protects the data and enables quick recovery if a cyberattack occurs. This document describes integrating IBM FlashSystem audit logs with IBM QRadar, and the configuration steps for IBM FlashSystem and IBM QRadar. It also explains how to use the IBM QRadar device support module (DSM) editor to normalize events and assign an IBM QRadar identifier (QID) map to the events. After the IBM QRadar configuration, we review configuring Safeguarded Copy on the application volumes by using volume groups and applying Safeguarded backup policies to the volume group. Finally, we demonstrate using the orchestration software IBM Copy Services Manager to start recovery and restore operations for data restoration on online volumes, and to start a backup of data volumes.

IBM Spectrum Protect Plus Protecting Database Applications

IBM® Spectrum Protect Plus is a data protection solution that provides near-instant recovery, replication, retention management, and reuse for virtual machines, databases, and application backups in hybrid multicloud environments. This IBM Redpaper publication focuses on protecting database applications. IBM Spectrum® Protect Plus supports backup, restore, and data reuse for multiple databases, such as Oracle, IBM Db2®, MongoDB, Microsoft Exchange, and Microsoft SQL Server. Although other IBM Spectrum Protect Plus features focus on virtual environments, the database and application support of IBM Spectrum Protect Plus includes databases on both virtual and physical servers.

IBM FlashSystem Best Practices and Performance Guidelines

This IBM® Redbooks® publication captures several of the preferred practices and describes the performance gains that can be achieved by implementing the IBM FlashSystem products. These practices are based on field experience. This book highlights configuration guidelines and preferred practices for the storage area network (SAN) topology, clustered system, back-end storage, storage pools and managed disks, volumes, remote copy services, and hosts. It explains how you can optimize disk performance with the IBM System Storage® Easy Tier® function. It also provides preferred practices for monitoring, maintaining, and troubleshooting. This book is intended for experienced storage, SAN, IBM FlashSystem, SAN Volume Controller (SVC), and IBM Storwize® administrators and technicians. Understanding this book requires advanced knowledge of these environments.

Storage Systems

Storage Systems: Organization, Performance, Coding, Reliability and Their Data Processing was motivated by the 1988 Redundant Array of Inexpensive/Independent Disks proposal to replace large form factor mainframe disks with an array of commodity disks. Disk loads are balanced by striping data into strips, with one strip per disk, and storage reliability is enhanced via replication or erasure coding, which at best dedicates k strips per stripe to tolerate k disk failures. Flash memories have resulted in a paradigm shift, with Solid State Drives (SSDs) replacing Hard Disk Drives (HDDs) for high performance applications. RAID and Flash have resulted in the emergence of new storage companies, namely EMC, NetApp, SanDisk, and Pure Storage, and a multibillion-dollar storage market. Key new conferences and publications are reviewed in this book. The goal of the book is to expose students, researchers, and IT professionals to the more important developments in storage systems, while covering the evolution of storage technologies, traditional and novel databases, and novel sources of data. We describe several prototypes: FAWN at CMU, RAMCloud at Stanford, and Lightstore at MIT; Oracle's Exadata, AWS' Aurora, Alibaba's PolarDB, and Fungible Data Center; and the author's paper designs for cloud storage, namely heterogeneous disk arrays and hierarchical RAID. The book: surveys storage technologies and lists sources of data (measurements, text, audio, images, and video); familiarizes readers with paradigms to improve performance (caching, prefetching, log-structured file systems, and log-structured merge trees (LSMs)); describes RAID organizations and analyzes their performance and reliability; shows how to conserve storage via data compression, deduplication, and compaction, and to secure data via encryption; specifies the implications of storage technologies for performance and power consumption; and exemplifies database parallelism for big data, analytics, and deep learning via multicore CPUs, GPUs, FPGAs, and ASICs (e.g., Google's Tensor Processing Units).
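
The replication-versus-erasure-coding trade-off mentioned above is easy to see with single-parity striping: one extra strip per stripe tolerates one lost disk. A tiny self-contained Python sketch with made-up byte strips:

```python
from functools import reduce

# Toy RAID-5-style stripe: three data strips plus one XOR parity strip
strips = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

# Lose strip 1 and rebuild it from the surviving strips plus parity
survivors = [strips[0], strips[2], parity]
rebuilt = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*survivors))
assert rebuilt == strips[1]
print("rebuilt strip:", rebuilt.hex())
```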

Snowflake Security: Securing Your Snowflake Data Cloud

This book is your complete guide to Snowflake security, covering account security, authentication, data access control, logging and monitoring, and more. It will help you make sure that you are using the security controls in the right way, are on top of access control, and are making the most of the security features in Snowflake. Snowflake is the fastest growing cloud data warehouse in the world, and having the right methodology to protect the data is important to both data engineers and security teams. It allows for faster data enablement for organizations, as well as reducing security risks, meeting compliance requirements, and solving data privacy challenges. There are currently tens of thousands of people who are either data engineers/data ops in Snowflake-using organizations, or security people in such organizations. This book provides guidance when you want to apply certain capabilities, such as data masking, row-level security, column-level security, tackling role hierarchy, and building monitoring dashboards, to your organization. What You Will Learn: Implement security best practices for Snowflake; set up user provisioning, MFA, OAuth, and SSO; set up a Snowflake security model; design a roles architecture; use advanced access control such as row-based security and dynamic masking; audit and monitor your Snowflake Data Cloud. Who This Book Is For: Data engineers, data privacy professionals, and security teams, either with security knowledge (preferably some data security knowledge) or with data engineering knowledge; in other words, either “Snowflake people” or “data people” who want to get security right, or “security people” who want to make sure that Snowflake gets handled right in terms of security
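
To make the dynamic-masking capability mentioned above concrete, here is a rough sketch of applying a masking policy through the snowflake-connector-python driver; the account, credentials, role, and object names are placeholders, and the policy logic is only an illustration:

```python
import snowflake.connector  # assumes the snowflake-connector-python package

# Account, credentials, and object names below are placeholders
conn = snowflake.connector.connect(account="myorg-myaccount", user="SECURITY_ADMIN",
                                   password="***", warehouse="ADMIN_WH",
                                   database="SALES", schema="PUBLIC")
cur = conn.cursor()

# Dynamic data masking: only an approved role sees raw e-mail addresses
cur.execute("""
    CREATE MASKING POLICY IF NOT EXISTS email_mask AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val ELSE '*** MASKED ***' END
""")
cur.execute("ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask")

cur.close()
conn.close()
```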