talk-data.com

Topic: data-engineering

[Activity trend chart: 2020-Q1 to 2026-Q1]

3395 activities tagged · Newest first

PROC SQL, 3rd Edition

PROC SQL: Beyond the Basics Using SAS®, Third Edition, is a step-by-step, example-driven guide that helps readers master the language of PROC SQL. Packed with analysis and examples illustrating an assortment of PROC SQL options, statements, and clauses, this book not only covers all the basics but also offers extensive guidance on complex topics such as set operators and correlated subqueries. Programmers at all levels will appreciate Kirk Lafler’s easy-to-follow examples, clear explanations, and handy tips to extend their knowledge of PROC SQL. This third edition explores new and powerful features in SAS® 9.4, including IFC and IFN functions, nearest neighbor processing, the HAVING clause, and indexes. It also features two completely new chapters on fuzzy matching and data-driven programming. Delving into the workings of PROC SQL with greater analysis and discussion, PROC SQL: Beyond the Basics Using SAS®, Third Edition, explores this powerful database language through discussion and numerous real-world examples.

PySpark SQL Recipes: With HiveQL, Dataframe and Graphframes

Carry out data analysis with PySpark SQL, graphframes, and graph data processing using a problem-solution approach. This book provides solutions to problems related to dataframes, data manipulation, summarization, and exploratory analysis. You will improve your skills in graph data analysis using graphframes and see how to optimize your PySpark SQL code. PySpark SQL Recipes starts with recipes on creating dataframes from different types of data sources, data aggregation and summarization, and exploratory data analysis using PySpark SQL. You’ll also discover how to solve problems in graph analysis using graphframes. On completing this book, you’ll have ready-made code for all your PySpark SQL tasks, including creating dataframes using data from different file formats as well as from SQL or NoSQL databases. What you will learn: understand PySpark SQL and its advanced features; use SQL and HiveQL with PySpark SQL; work with structured streaming; optimize PySpark SQL; and master graphframes and graph processing. Who this book is for: data scientists, Python programmers, and SQL programmers.

The Enterprise Big Data Lake

The data lake is a daring new approach for harnessing the power of big data technology and providing convenient self-service capabilities. But is it right for your company? This book is based on discussions with practitioners and executives from more than a hundred organizations, ranging from data-driven companies such as Google, LinkedIn, and Facebook to governments and traditional corporate enterprises. You’ll learn what a data lake is, why enterprises need one, and how to build one successfully with the best practices in this book. Alex Gorelik, CTO and founder of Waterline Data, explains why old systems and processes can no longer support data needs in the enterprise. Then, in a collection of essays about data lake implementation, you’ll examine data lake initiatives, analytic projects, experiences, and best practices from data experts working in various industries. In this book, you will: get a succinct introduction to data warehousing, big data, and data science; learn various paths enterprises take to build a data lake; explore how to build a self-service model and best practices for providing analysts access to the data; use different methods for architecting your data lake; and discover ways to implement a data lake from experts in different industries.

Mastering Ceph - Second Edition

Mastering Ceph is your comprehensive guide to understanding and deploying Ceph for scalable storage solutions. From planning and design to advanced disaster recovery practices, this book equips you with practical knowledge and hands-on techniques to harness the power of Ceph effectively. What this book will help you do: design and deploy scalable Ceph clusters tailored to your needs; optimize Ceph's performance with state-of-the-art tuning techniques; implement effective disaster recovery strategies for robust storage systems; extend Ceph's functionality by programming with Librados; and troubleshoot and maintain Ceph to ensure reliability and performance. Author(s): Nick Fisk is a recognized expert in storage infrastructure. With years of hands-on experience with Ceph and storage systems, Fisk has been involved in numerous successful deployments and performance optimizations. Drawing from real-world scenarios, the author's insights make this guide invaluable for professionals. Who is it for? This book is tailored for storage administrators, cloud engineers, and system administrators aiming to enhance their expertise in storage technologies. Whether you're new to Ceph or looking to deepen your knowledge, the clear examples and practical advice make it a perfect pick.

Walmart and the CICS Asynchronous API: An Adoption Experience

This IBM® Redbooks® publication discusses practical uses of the IBM CICS asynchronous API capability. It describes the methodology, design, and thought process used by a large client, Walmart, and the considerations behind the choices made. The publication provides real-life examples and application patterns that benefit from the performance and scalability offered by the new API. The book discusses the homegrown methodology Walmart used before the API was available and compares it with the design using the new API. It also discusses the process used to migrate older applications to the new API, so that the reader will understand how easy the new API is to implement. A description of real-world usage patterns covers the production application Walmart currently has deployed, as well as other patterns that give the reader a sense of what is possible when creative thinking is applied to technology improvements. Finally, a section covers the areas to consider as you begin to plan and implement asynchronous API capabilities. This book should be read by enterprise architects searching for faster ways to service strategic applications across the enterprise; solution architects who want to better understand implementation possibilities for improved response times and better performance for CICS applications; and CICS programmers looking to modernize and provide improved response times.

Mastering Hadoop 3

"Mastering Hadoop 3" is your in-depth guide to understanding and mastering the advanced features of the Hadoop ecosystem. With a focus on distributed computing and data processing, this book covers essential tools such as YARN, MapReduce, and Apache Spark to help you build scalable, efficient data pipelines. What this book will help you do: gain a comprehensive understanding of the Hadoop Distributed File System (HDFS) and YARN for effective resource management; master data processing with MapReduce and learn to integrate with real-time processing engines like Spark and Flink; develop and secure enterprise-grade Hadoop-based data pipelines by implementing robust security and governance measures; explore techniques for batch data processing, data modeling, and designing applications tailored for Hadoop environments; and understand best practices for optimizing and troubleshooting Hadoop clusters for enhanced performance and reliability. Author(s): The authors, including Wong, Singh, and Kumar, bring together years of experience in big data engineering, distributed systems, and enterprise application development. They aim to provide a clear pathway to mastering Hadoop ecosystem tools. Who is it for? This book is ideal for budding big data professionals who have some familiarity with Java and basic Hadoop concepts and wish to elevate their expertise. If you're a Hadoop practitioner keen to expand your understanding of the ecosystem's advanced capabilities, or a professional looking to implement Hadoop in organizational workflows, this book is well suited for you.

SAP Business Intelligence Quick Start Guide

This book is your practical guide to understanding and using the SAP BusinessObjects Business Intelligence (BI) Platform. Through hands-on examples and clear instructions, you'll learn how to create insightful data visualizations, manage business intelligence reports, and deploy and maintain the BI platform effectively, empowering better data-driven decision making. What this book will help you do: learn how to use SAP Web Intelligence to develop insightful dashboards and reports; understand the use of SAP Crystal Reports for Enterprise in creating detailed analytics; gain proficiency in SAP Lumira for advanced data visualization and storytelling; learn to configure and deploy the SAP BusinessObjects BI platform in a business environment; and develop skills in using SAP Predictive Analytics to perform advanced data analysis. Author(s): Vinay Singh brings significant expertise in data analysis and the SAP BusinessObjects platform. With years of experience implementing and consulting on SAP solutions across industries, Vinay offers a unique ability to demystify complex technical subjects for readers. His practical approach and commitment to empowering readers make his book a valuable learning resource. Who is it for? This book is ideal for business intelligence professionals seeking to explore advanced tools for data analysis. It caters to SAP users eager to expand their expertise in leveraging SAP BusinessObjects for improved decision-making. It serves IT consultants and data analysts wishing to gain deeper insights into deployment and utilization strategies. It is also appropriate for beginners with a foundational understanding of BI principles who aim to learn a globally recognized BI tool.

IBM DS8880 Architecture and Implementation (Release 8.51)

Updated for R8.51. This IBM® Redbooks® publication describes the concepts, architecture, and implementation of the IBM DS8880 family. The book provides reference information to assist readers who need to plan for, install, and configure the DS8880 systems. The IBM DS8000® family is a high-performance, high-capacity, highly secure, and resilient series of disk storage systems. The DS8880 family is the latest and most advanced of the DS8000 offerings to date. The high availability, multiplatform support (including IBM Z), and simplified management tools help provide a cost-effective path to on-demand and cloud-based infrastructures. The IBM DS8880 family now offers business-critical, all-flash, and hybrid data systems that span a wide range of price points: DS8882F (Rack Mounted storage system), DS8884 (Business Class), DS8886 (Enterprise Class), and DS8888 (Analytics Class). The DS8884 and DS8886 are available as hybrid models or can be configured as all-flash. Each model represents the most recent in this series of high-performance, high-capacity, flexible, and resilient storage systems. These systems are intended to address the needs of the most demanding clients. Two powerful IBM POWER8® processor-based servers manage the cache to streamline disk I/O, maximizing performance and throughput. These capabilities are further enhanced by the availability of the second generation of high-performance flash enclosures (HPFEs Gen-2) and newer flash drives. Like its predecessors, the DS8880 supports advanced disaster recovery (DR) solutions, business continuity solutions, and thin provisioning. All disk drives in the DS8880 storage system include the Full Disk Encryption (FDE) feature. The DS8880 can automatically optimize the use of each storage tier, particularly flash drives, by using the IBM Easy Tier® feature. Release 8.5 introduces the Safeguarded Copy feature.
The DS8882F Rack Mounted is described in a separate publication, Introducing the IBM DS8882F Rack Mounted Storage System, REDP-5505.

IBM FlashSystem A9000 and A9000R Architecture and Implementation (Version 12.3.1)

Version 12.3.1. This IBM® Redbooks publication presents the architecture, design, concepts, and technology that are used in IBM FlashSystem® A9000 and IBM FlashSystem A9000R. FlashSystem A9000 and FlashSystem A9000R deliver the microsecond latency and high availability of IBM FlashCore® technology with grid architecture, simple scalability, and industry-leading IBM software that is designed to drive your business into the cognitive era. The highly intuitive Hyper-Scale Manager user interface simplifies management. Comprehensive data reduction capabilities, including inline deduplication and a powerful compression engine, help lower total cost of ownership. With software version 12.3.1 and Hyper-Scale Manager version 5.5.1 (or later), the system can compute reclaimable and attributed capacity information without performance impact. From a functional standpoint, FlashSystem A9000 and FlashSystem A9000R take advantage of most of the software-defined storage features that are offered by the IBM Spectrum™ Accelerate software, including multi-tenancy and business continuity functions. FlashSystem A9000 and FlashSystem A9000R support HyperSwap and Multi-site High Availability / Disaster Recovery (HA/DR) configurations. This publication is intended for those individuals who need to plan, install, tailor, and configure FlashSystem A9000 and FlashSystem A9000R. For detailed information about configuration, management, and replication functions and their usage, see the following publications: IBM Spectrum Accelerate Family Storage Configuration and Usage for IBM FlashSystem A9000, IBM FlashSystem A9000R, and IBM XIV Gen3, SG24-8376; IBM FlashSystem A9000 and A9000R Business Continuity Solutions, REDP-5401; IBM HyperSwap and Multi-site HA/DR solution for IBM FlashSystem A9000 and A9000R, REDP-5434; and IBM Spectrum Accelerate Family: Host Attachment and Interoperability, SG24-8368.

IBM Elastic Storage Server Implementation Guide for Version 5.3

This IBM® Redpaper™ publication introduces and describes the IBM Elastic Storage™ Server as a scalable, high-performance data and file management solution. The solution is built on proven IBM Spectrum™ Scale technology, formerly IBM General Parallel File System (GPFS™). IBM Elastic Storage Servers can be implemented for a range of diverse requirements, providing reliability, performance, and scalability. This publication helps you to understand the solution and its architecture and helps you to plan the installation and integration of the environment. The solution requires the following combination of physical and logical components: hardware, operating system, storage, network, and applications. This paper provides guidelines for several usage and integration scenarios. Typical scenarios include Cluster Export Services (CES) integration, disaster recovery, and multicluster integration. This paper addresses the needs of technical professionals (consultants, technical support staff, IT architects, and IT specialists) who must deliver cost-effective cloud services and big data solutions.

Using the IBM Spectrum Accelerate Family in VMware Environments: IBM XIV, IBM FlashSystem A9000 and IBM FlashSystem A9000R, and IBM Spectrum Accelerate

This IBM® Redpaper™ publication is a brief overview of synergistic aspects between various VMware offerings and the IBM Spectrum™ Accelerate family, including IBM XIV® and IBM FlashSystem® A9000 and IBM FlashSystem A9000R servers. After reviewing different integration concepts and explaining general implementation aspects for attaching the IBM Spectrum Accelerate™ family to VMware ESXi deployments, the paper focuses on components that are enabled by IBM Spectrum Connect v3.4. This paper is intended for those who are planning to use or are implementing the IBM Spectrum Accelerate family of storage systems in a VMware environment.

Advanced MySQL 8

Dive into the world of MySQL 8.0 with this comprehensive guide, tailored for professionals seeking to optimize and expand their database capabilities. You will master techniques to improve performance, scalability, and security in your database applications, making them robust and efficient. What this book will help you do: learn to implement and analyze large queries efficiently in MySQL 8.0; gain insights into optimizing performance by leveraging MySQL indexing and settings; understand advanced replication techniques, including Group Replication and its applications in InnoDB clusters; master the essentials of database monitoring and managing large distributed instances; and explore methods for backup, recovery, and enhancing data security within MySQL. Author(s): Vanier, Shah, and Malepati are seasoned database experts with extensive experience in MySQL and database management. They have worked on scaling enterprise-level database applications, focusing on improving performance and reliability. They bring real-world insights and a clear, practical approach to this book, making it an invaluable resource for developers and administrators alike. Who is it for? This book is perfect for database administrators, developers, and system architects who already have a foundational understanding of MySQL and are looking to deepen their expertise. If you're interested in enhancing database application performance, mastering advanced techniques, or handling distributed databases and scaling challenges, this book will cater to your professional aspirations.

Apache Spark Quick Start Guide

Dive into the world of scalable data processing with the "Apache Spark Quick Start Guide." This book offers a foundational introduction to Spark, empowering readers to harness its capabilities for big data processing. With clear explanations and hands-on examples, you'll learn to implement Spark applications that handle complex data tasks efficiently. What this book will help you do: understand and implement Spark's RDD and DataFrame APIs to process large datasets effectively; set up a local development environment for Spark-based projects; develop skills to debug and optimize slow-performing Spark applications; harness Spark's built-in modules for SQL, streaming, and machine learning applications; and adopt best practices and optimization techniques for high-performance Spark applications. Author(s): Shrey Mehrotra is a seasoned software developer with expertise in big data technologies, particularly Apache Spark. With years of hands-on industry experience, Shrey focuses on making complex technical concepts accessible to all. Through his writing, he aims to share clear, practical guidance for developers of all levels. Who is it for? This guide is perfect for big data enthusiasts and professionals looking to learn Apache Spark's capabilities from scratch. It's aimed at data engineers interested in optimizing application performance and data scientists wanting to integrate machine learning with Spark. A basic familiarity with Scala, Python, or Java is recommended.

Ceph: Designing and Implementing Scalable Storage Systems

Get to grips with the unified, highly scalable distributed storage system and learn how to design and implement it. Key features: explore Ceph's architecture in detail; implement a Ceph cluster successfully and gain deep insights into its best practices; and leverage the advanced features of Ceph, including erasure coding, tiering, and BlueStore. Book description: This Learning Path takes you through the basics of Ceph all the way to an in-depth understanding of its advanced features. You'll gather skills to plan, deploy, and manage your Ceph cluster. After an introduction to the Ceph architecture and its core projects, you'll be able to set up a Ceph cluster and learn how to monitor its health, improve its performance, and troubleshoot any issues. By following the step-by-step approach of this Learning Path, you'll learn how Ceph integrates with OpenStack, Glance, Manila, Swift, and Cinder. With knowledge of federated architecture and CephFS, you'll use Calamari and VSM to monitor the Ceph environment. In the upcoming chapters, you'll study the key areas of Ceph, including BlueStore, erasure coding, and cache tiering, and discover what they can do for your storage system. In the concluding chapters, you will develop applications that use Librados and distributed computations with shared object classes, and see how Ceph and its supporting infrastructure can be optimized. By the end of this Learning Path, you'll have the practical knowledge to operate Ceph in a production environment.
This Learning Path includes content from the following Packt products: Ceph Cookbook by Michael Hackett, Vikhyat Umrao, and Karan Singh; Mastering Ceph by Nick Fisk; and Learning Ceph, Second Edition by Anthony D'Atri, Vaibhav Bhembre, and Karan Singh. What you will learn: understand the benefits of using Ceph as a storage solution; combine Ceph with OpenStack, Cinder, Glance, and Nova components; set up a test cluster with Ansible and virtual machines in VirtualBox; develop solutions with Librados and shared object classes; configure BlueStore and see its interaction with other configurations; tune, monitor, and recover storage systems effectively; and build an erasure-coded pool by selecting intelligent parameters. Who this book is for: If you are a developer, system administrator, storage professional, or cloud engineer who wants to understand how to deploy a Ceph cluster, this Learning Path is ideal for you. It will help you discover ways in which Ceph features can solve your data storage problems. Basic knowledge of storage systems and GNU/Linux will be beneficial.

Hands-On Deep Learning with Apache Spark

"Hands-On Deep Learning with Apache Spark" is an essential resource for mastering distributed deep learning frameworks and applications on Apache Spark. Through practical examples and guided tutorials, this book teaches you to deploy scalable deep learning solutions for handling complex data challenges efficiently. What this book will help you do: understand how to set up Apache Spark for deep learning workflows; gain practical insight into implementing neural networks, including CNNs and RNNs, on distributed platforms; learn to train and optimize models using popular frameworks like TensorFlow and Keras; develop expertise in analyzing large datasets with textual and image-based deep learning methods; and acquire skills to deploy trained models for real-world applications in distributed environments. Author(s): Iozzia is an accomplished software engineer and data scientist with a strong background in distributed computing and machine learning. With years of experience working with Apache Spark and deep learning technologies, Iozzia brings a wealth of practical knowledge to the table. Their passion for providing clear, hands-on guidance makes this book an approachable and valuable resource for learners of all levels. Who is it for? This book is aimed at Scala developers, data scientists, and data analysts who are looking to extend their skill set to include distributed deep learning on Apache Spark. It's ideally suited for readers familiar with machine learning basics and those with prior exposure to Apache Spark workflows. If you aim to create scalable machine learning solutions that handle complex data, this book offers precisely what you need.

Learning PostgreSQL 11 - Third Edition

Immerse yourself in the capabilities of PostgreSQL 11 with this comprehensive beginner's guide. Learning PostgreSQL 11 will take you through relational database fundamentals and advanced database functionality, empowering you to build efficient and scalable database solutions with confidence. By the end of this book, you'll have mastered PostgreSQL's features to develop, manage, and optimize your own databases. What this book will help you do: gain a solid understanding of relational database principles and the PostgreSQL ecosystem; learn to install PostgreSQL, create a database, and design a data model effectively; develop skills to create, manipulate, and optimize tables, views, and efficient indexes; use server-side programming with PL/pgSQL and advanced data types like JSONB; and enhance database reliability and performance, and connect seamlessly to your Python applications. Author(s): Christopher Travers and Volkov bring their collective expertise and practical experience to this book. Christopher has a strong background in software development and database systems, with years of hands-on involvement with PostgreSQL. Volkov has contributed significantly to innovative database solutions, emphasizing clear and actionable instructions. Together, they aim to demystify PostgreSQL for learners of all backgrounds. Who is it for? This book is crafted for developers, database administrators, and tech enthusiasts who want to delve into PostgreSQL. Beginners with no prior database experience will find its approach accessible, while those aiming to enhance their skills with PostgreSQL's latest features will benefit immensely. It's ideal for anyone seeking to build solid database or data warehousing applications with modern capabilities and best practices.

QGIS Quick Start Guide

QGIS Quick Start Guide is your hands-on introduction to QGIS 3.4, an open-source Geographic Information System (GIS) software. Through step-by-step instructions, you'll explore creating, loading, and styling geospatial data while developing the skills to generate professional-grade maps and analyses. What this book will help you do: understand and navigate QGIS 3.4's interface and core functionality; import and manage spatial datasets for analysis and visualization; create custom-styled vector and raster layers for effective presentation; design detailed and aesthetically pleasing maps for various uses; and extend QGIS with plugins and automate tasks using modeling tools. Author(s): Cutts is a passionate GIS professional with years of experience working with geographic data and related software. With a strong focus on making GIS knowledge accessible, they provide practical advice and workflows in an easy-to-follow manner, helping readers achieve proficiency with a powerful tool like QGIS. Who is it for? This book is perfect for GIS beginners and intermediate-level users looking to develop hands-on skills with QGIS 3.4. If you're a student, geospatial analyst, or enthusiast aiming to master an accessible and robust GIS toolset, this guide aligns with your learning goals. Gain the confidence to analyze geographic data and create compelling maps as part of your studies or projects.

DS8000 Global Mirror Best Practices

This IBM® Redpaper™ publication reviews the architecture and operations of the IBM DS8000® Global Mirror function. The document looks at different aspects of the solution in terms of performance, infrastructure requirements, data integrity, business continuity, and impact on production. Hints and tips are provided on how best to configure the overall Global Mirror environment in terms of connectivity, storage configuration, and tuning of specific parameters. The guidelines provided relate mostly to performance, which ultimately ensures a better recovery point objective (RPO). Therefore, we encourage you to follow those guidelines.

IBM Spectrum Scale and IBM StoredIQ: Identifying and securing your business data to support regulatory requirements

Having appropriate storage for hosting business-critical data, and proper analytic software for deep inspection of that data, is becoming necessary to gain deeper insights into the data so that users can categorize which data qualifies for compliance. This IBM® Redpaper™ publication explains why the storage features of IBM Spectrum™ Scale, when combined with the data analysis and categorization features of IBM StoredIQ®, provide an excellent platform for hosting unstructured business data that is subject to regulatory compliance guidelines, such as the General Data Protection Regulation (GDPR). In this paper, we describe how IBM StoredIQ can be used to identify files that are stored in an IBM Spectrum Scale™ file system and that include personal information, such as phone numbers. These files can be secured in another file system partition by encrypting them with IBM Spectrum Scale functions. Encrypting files prevents unauthorized access to them because only users who can access the encryption key can decrypt those files. This paper is intended for chief technology officers, solution and security architects, and systems administrators.