talk-data.com

Topic: data · 5765 tagged

5765 activities · Newest first

Mastering Ceph

Mastering Ceph offers a comprehensive guide to the Ceph distributed storage system, helping you implement and manage scalable storage solutions effectively. As you work through the chapters, you'll gain the practical experience needed to handle Ceph with confidence, optimize resource use, and ensure high availability for critical applications.

What this book will help me do:
• Understand and use Ceph's advanced capabilities, such as erasure coding and tiering, for storage efficiency.
• Implement and manage scalable, resilient Ceph clusters and allocate resources effectively.
• Use tools like Ansible and Vagrant to deploy Ceph clusters quickly and reproducibly.
• Sharpen your troubleshooting skills to resolve complex storage issues and keep the cluster stable.
• Develop applications that integrate with Ceph using librados and distributed computation classes.

Author(s): This book was written by Nick Fisk, an experienced professional in cloud and distributed storage systems. Known for expertise in Ceph, Fisk shares practical insights developed over years of working as an administrator and developer. Through accessible, systematic writing, Fisk guides readers through real-world storage challenges.

Who is it for? This detailed guide is ideal for developers and system administrators already familiar with deploying Ceph who want to deepen their understanding of its advanced features. If you aim to optimize performance and design robust storage solutions, this is the book for you. Prior experience with Ceph is recommended to benefit fully from the book's insights.

Mastering PostgreSQL 9.6

This comprehensive guide, 'Mastering PostgreSQL 9.6,' explores the advanced features of PostgreSQL, equipping you with the skills to optimize queries, manage replication, and ensure high availability. Whether you are performing advanced administrative tasks or improving database performance, this book provides the tools and knowledge you need.

What this book will help me do:
• Master advanced database functionality in PostgreSQL 9.6.
• Optimize queries and use indexes effectively.
• Manage replication and ensure high availability.
• Develop skills in server maintenance, monitoring, and resilience.
• Learn effective troubleshooting strategies for PostgreSQL database problems.

Author(s): Hans-Jürgen Schönig is an experienced database professional specializing in PostgreSQL consulting and training. With decades of experience developing robust solutions, he brings a pragmatic approach to database management, and his emphasis on practical application and clear explanations makes his writing accessible to learners at all levels.

Who is it for? This book is ideal for PostgreSQL data architects and administrators who want to deepen their understanding of PostgreSQL's advanced functionality. It assumes prior experience with PostgreSQL administration and a working knowledge of SQL. If you want to master complex database tasks and get more out of PostgreSQL, you'll find this book valuable.

Python Web Scraping - Second Edition

"Python Web Scraping" is a practical guide to extracting and processing online data with the Python programming language. You'll learn step by step how to build web scrapers and crawlers that handle a range of data sources and structures, and you'll finish equipped to tackle real-world web scraping challenges.

What this book will help me do:
• Extract structured data from standard web pages using Python.
• Use libraries such as Selenium and PyQt to handle dynamic, JavaScript-dependent content.
• Build concurrent scrapers that process large volumes of web pages in parallel.
• Automate form interaction to extract data from complex websites.
• Develop advanced scrapers with Scrapy for sophisticated crawling tasks.

Author(s): Katharine Jarmul is an experienced data scientist and programmer with extensive Python knowledge. Jarmul brings practical expertise from real-world web scraping projects and focuses on content that demystifies complex technical topics.

Who is it for? This book suits software developers who want to get into web scraping with Python, even if they're new to the subject. If you have basic to intermediate Python skills and want to automate data collection and processing, this is the book for you. The techniques apply to a wide range of data extraction scenarios.

Hadoop 2.x Administration Cookbook

Gain mastery over managing and maintaining large Apache Hadoop clusters with the Hadoop 2.x Administration Cookbook. This book provides practical, step-by-step recipes for setting up, optimizing, and troubleshooting Hadoop clusters, ensuring high availability, security, and performance in your data operations.

What this book will help me do:
• Set up and deploy an operational Hadoop 2.x cluster suitable for large-scale data operations.
• Monitor and maintain HDFS, YARN, and MapReduce for optimized performance.
• Plan, configure, and improve cluster availability using ZooKeeper and JournalNode strategies.
• Develop workflows and manage data ingestion with tools like Flume and Oozie.
• Secure, troubleshoot, and optimize Hadoop environments to meet enterprise and operational standards.

Author(s): Aman Singh is an experienced Hadoop administrator with years of hands-on experience managing robust, efficient Hadoop clusters. Aman understands the practical challenges of the field and has a talent for breaking complex topics into actionable steps, helping readers become fluent in Hadoop administration through clear, problem-oriented language.

Who is it for? This book is ideal for system administrators and IT professionals who have a foundational understanding of Hadoop and want to strengthen their administrative skills. It is especially useful for experienced Hadoop administrators looking for a quick, practical reference for cluster management. Whether you work in a large enterprise or are exploring the Hadoop ecosystem for personal development, you'll find this book valuable.

Learning Social Media Analytics with R

Explore how to use R for social media analytics with 'Learning Social Media Analytics with R'. This guide introduces tools and techniques to extract, analyze, and visualize data from popular platforms like Twitter and Facebook, and covers advanced methods such as sentiment analysis, topic modeling, and social network analysis.

What this book will help me do:
• Use R to retrieve, process, and clean data from major social media platforms.
• Apply insights from sentiment analysis and topic modeling to improve decision-making.
• Understand social network structures by analyzing community connections and user interactions.
• Create data visualizations that show trends and insights effectively using the R ecosystem.
• Integrate R packages such as ggplot2, dplyr, and caret to streamline data analysis workflows.

Author(s): The authors, Dipanjan Sarkar, Karthik Ganapathy, Raghav Bali, and Tushar Sharma, are experts in data science and R programming with extensive industry experience. They bring a passion for teaching and a clear, step-by-step methodology that helps learners grasp complex concepts.

Who is it for? This book is ideal for data scientists, analysts, IT professionals, and social media marketers who want actionable insights from social data. Whether you're a beginner or have some experience with R, you'll find the book approachable, with practical examples and tutorials suited to your level.

High Performance Spark

Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations that help your Spark queries run faster and handle larger data sizes while using fewer resources.

Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you’ll also learn how to make it sing.

With this book, you’ll explore:
• How Spark SQL’s new interfaces improve performance over Spark’s RDD data structure
• The choice between data joins in Core Spark and Spark SQL
• Techniques for getting the most out of standard RDD transformations
• How to work around performance issues in Spark’s key/value pair paradigm
• Writing high-performance Spark code without Scala or the JVM
• How to test for functionality and performance when applying suggested improvements
• Using the Spark MLlib and Spark ML machine learning libraries
• Spark’s Streaming components and external community packages

IBM PowerHA SystemMirror V7.2.1 for IBM AIX Updates

This IBM® Redbooks® publication helps strengthen the position of the IBM PowerHA® SystemMirror® solution with well-defined and documented deployment models within an IBM Power Systems™ virtualized environment, providing customers with a planned foundation for business resilience and disaster recovery for their IBM Power Systems infrastructure solutions. It addresses topics that help meet customers' complex high availability and disaster recovery requirements on IBM Power Systems servers, helping to maximize system availability and resources, and provides technical documentation to transfer how-to skills to users and support teams. This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing high availability and disaster recovery solutions and support with IBM PowerHA SystemMirror Standard and Enterprise Editions on IBM Power Systems servers.

Oracle on IBM z Systems

Oracle Database 12c Release 1 running on Linux is available for deployment on IBM® z Systems®. The enterprise-grade Linux on IBM z Systems solution is designed to add value to Oracle Database solutions, including the new functions that are introduced in Oracle Database 12c. In this IBM Redbooks® publication, we explore the IBM and Oracle Alliance and describe how Oracle Database benefits from IBM z Systems. We then explain how to set up Linux guests to install Oracle Database 12c. We also describe how to use the Oracle Enterprise Manager Cloud Control Agent to manage Oracle Database 12c Release 1, and we describe a successful consolidation project from sizing to migration, performance management topics, and high availability. Finally, we conclude with a chapter about surrounding Oracle with open source software. The audience for this publication includes database consultants, installers, administrators, and system programmers. This publication is not meant to replace Oracle documentation, but to supplement it with our experiences while installing and using Oracle products.

Practical Statistics for Data Scientists

Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.

Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R programming language and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you’ll learn:
• Why exploratory data analysis is a key preliminary step in data science
• How random sampling can reduce bias and yield a higher-quality dataset, even with big data
• How the principles of experimental design yield definitive answers to questions
• How to use regression to estimate outcomes and detect anomalies
• Key classification techniques for predicting which categories a record belongs to
• Statistical machine learning methods that “learn” from data
• Unsupervised learning methods for extracting meaning from unlabeled data

Implementing the IBM System Storage SAN Volume Controller with IBM Spectrum Virtualize V7.8

This IBM® Redbooks® publication is a detailed technical guide to the IBM System Storage® SAN Volume Controller, which is powered by IBM Spectrum Virtualize™ Version 7.8. IBM SAN Volume Controller is a virtualization appliance solution, which maps virtualized volumes that are visible to hosts and applications to physical volumes on storage devices. Each server within the storage area network (SAN) has its own set of virtual storage addresses that are mapped to physical addresses. If the physical addresses change, the server continues running by using the same virtual addresses that it had before. Therefore, volumes or storage can be added or moved while the server is still running. The IBM virtualization technology improves the management of information at the "block" level in a network, which enables applications and servers to share storage devices on a network.
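The address indirection described above can be illustrated with a minimal Python sketch. It assumes a simple in-memory mapping table; the class names, method names, device names, and block numbers are hypothetical and are not part of the IBM Spectrum Virtualize interface. The point is only that a host keeps using the same virtual address while the physical location behind it changes.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class PhysicalLocation:
    """A block on a backing storage device (hypothetical model)."""
    device: str
    block: int


@dataclass
class VirtualizationLayer:
    """Toy block-level virtualization layer; not the SVC implementation."""
    # (host, virtual block address) -> physical location
    table: Dict[Tuple[str, int], PhysicalLocation] = field(default_factory=dict)
    # Simulated contents of physical blocks
    storage: Dict[PhysicalLocation, bytes] = field(default_factory=dict)

    def assign(self, host: str, vblock: int, loc: PhysicalLocation) -> None:
        """Map a host's virtual block address to a physical location."""
        self.table[(host, vblock)] = loc

    def write(self, host: str, vblock: int, data: bytes) -> None:
        # Resolve the virtual address, then store at the physical location.
        self.storage[self.table[(host, vblock)]] = data

    def read(self, host: str, vblock: int) -> bytes:
        # Resolve the virtual address, then read from the physical location.
        return self.storage[self.table[(host, vblock)]]

    def migrate(self, host: str, vblock: int, new_loc: PhysicalLocation) -> None:
        """Move data to a new physical location and repoint the mapping.

        The host's virtual address does not change.
        """
        old_loc = self.table[(host, vblock)]
        self.storage[new_loc] = self.storage.pop(old_loc)
        self.table[(host, vblock)] = new_loc


# Usage: the server keeps reading virtual block 0 after the data moves.
layer = VirtualizationLayer()
layer.assign("server-a", 0, PhysicalLocation("array-1", 42))
layer.write("server-a", 0, b"payload")
layer.migrate("server-a", 0, PhysicalLocation("array-2", 7))
print(layer.read("server-a", 0))  # b'payload'
```

Keeping the mapping in a single table is what lets the migration step touch only the layer, never the host: the host-facing key stays fixed while its value is repointed.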

Breaking Data Science Open

Over the past decade, data science has come out of the back office to become a force of change across the entire organization. At the forefront of this change is the open data science movement, which advocates the use of open source tools in a powerful, connected ecosystem. This report explores how open data science can help your organization break free from the shackles of proprietary tools, embrace a more open and collaborative work style, and unleash new intelligent applications quickly.

Authors Michele Chambers and Christine Doig explain how open source tools have helped bring about many facets of the data science evolution, including collaboration, self-service, and deployment. But you’ll discover that open data science is about more than tools; it’s about a new way of working as an organization.

• Learn how data science, particularly open data science, has become part of everyday business
• Understand how open data science engages people from other disciplines, not just statisticians
• Examine tools and practices that enable data science to be open across technical, operational, and organizational aspects
• Learn the benefits of open data science, including rich resources, agility, transparency, and collective intelligence
• Explore case studies that demonstrate different ways to implement open data science
• Discover how open data science can help you break down department barriers and make bold market moves

Michele Chambers, Chief Marketing Officer and VP Products at Continuum Analytics, is an entrepreneurial executive with over 25 years of industry experience. Before Continuum Analytics, Michele held executive leadership roles at several database and analytics companies, including Netezza, IBM, Revolution Analytics, MemSQL, and RapidMiner.

Christine Doig is a senior data scientist at Continuum Analytics, where she has worked on several projects, including MEMEX, a DARPA-funded open data science project to help stop human trafficking. She has 5+ years of experience in analytics, operations research, and machine learning in a variety of industries.

POWER8 High-performance Computing Guide IBM Power System S822LC (8335-GTB) Edition

This IBM® Redbooks® publication documents and addresses topics to provide step-by-step, customizable application and programming solutions for tuning applications and workloads to use the IBM Power Systems™ hardware architecture. It explores, tests, and documents how to use the architectural technologies and the software solutions that are available from IBM to help solve challenging technical and business problems. It also demonstrates that the combination of IBM high-performance computing (HPC) solutions (hardware and software) delivers significant value to technical computing clients who need cost-effective, highly scalable, and robust solutions.

First, the book provides a high-level overview of the HPC solution, including all of the components that make up the HPC cluster: the IBM Power System S822LC (8335-GTB), software components, interconnect switches, and the IBM Spectrum™ Scale parallel file system. The rest of the publication is divided into three parts: Part 1 focuses on developers, Part 2 on administrators, and Part 3 on evaluators and planners of the solution.

This IBM Redbooks publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective HPC solutions that help uncover insights from vast amounts of clients’ data so they can optimize business results, product development, and scientific discoveries.

IBM DB2 Web Query for i: The Nuts and Bolts

Business Intelligence (BI) is a broad term that relates to applications that analyze data to understand and act on the key metrics that drive profitability in an enterprise. Key to analyzing that data is providing fast, easy access to it while delivering it in the formats or tools that best fit the needs of the user. At the core of any BI solution are user query and reporting tools that provide intuitive access to data, supporting a spectrum of users from executives to “power users,” and from spreadsheet aficionados to the external Internet consumer.

IBM® DB2® Web Query for i offers a set of modernized tools for a more robust, extensible, and productive reporting solution than the popular IBM Query for System i® tool (also known as IBM Query/400). IBM DB2 Web Query for i preserves investments in reports that were developed with Query/400 by offering a choice of importing definitions into the new technology or continuing to run existing Query/400 reports as is. It also offers significant productivity and performance enhancements by leveraging the latest DB2 for i query optimization technology.

The DB2 Web Query for i product is a web-based query and report writing product that offers enhanced capabilities over the IBM Query for iSeries product (also commonly known as Query/400). IBM DB2 Web Query for i includes Query for iSeries technology to assist customers in their transition to DB2 Web Query. It offers a modernized, Java-based solution for more robust, extensible, and productive reporting. DB2 Web Query provides the ability to query or build reports against data that is stored in DB2 for i (or Microsoft SQL Server) databases through browser-based user interface technologies:
• Build reports with ease through the web-based, ribbon-like InfoAssist tool, whose common look and feel can extend the number of personnel who can generate their own reports.
• Simplify the management of reports by significantly reducing the number of report definitions that are required through the use of parameter-driven reports.
• Deliver data to users in many different formats, including directly into spreadsheets, in boardroom-quality PDF format, or as HTML viewed in the browser.
• Leverage advanced reporting functions, such as matrix reporting, ranking, color coding, drill-down, and font customization, to enhance the visualization of DB2 data.

DB2 Web Query offers features to import Query/400 definitions and enhance their look and functions. By using it, you can add OLAP-like slicing and dicing to the reports or view reports in disconnected mode for users on the go.

This IBM Redbooks® publication provides a broad understanding of what can be done with the DB2 Web Query product. This publication is a companion to DB2 Web Query Tutorials, SG24-8378, which has a group of self-explanatory tutorials to help you get up to speed quickly.

Oracle on LinuxONE

Oracle Database 12c Release 1 running on Linux is available for deployment on IBM® LinuxONE. The enterprise-grade Linux on LinuxONE solution is designed to add value to Oracle Database solutions, including the new functions that are introduced in Oracle Database 12c. In this IBM Redbooks® publication, we explore the IBM and Oracle Alliance and describe how Oracle Database benefits from LinuxONE. We then explain how to set up Linux guests to install Oracle Database 12c. We also describe how to use the Oracle Enterprise Manager Cloud Control Agent to manage Oracle Database 12c Release 1, and we describe a successful consolidation project from sizing to migration, performance management topics, and high availability. Finally, we conclude with a chapter about surrounding Oracle with open source software. The audience for this publication includes database consultants, installers, administrators, and system programmers. This publication is not meant to replace Oracle documentation, but to supplement it with our experiences while installing and using Oracle products.

Geographic Information Systems in Action

Geographic Information Systems in Action, 1st Edition, offers content that not only teaches GIS techniques, the ideas behind them, and how they work, but also, through a series of graded, hands-on, content-oriented activities, challenges students to think through what they are doing and why before going on to practical ArcGIS exercises. This deeper understanding, and the superior problem-solving skills students gain from using the text, will also make them highly valuable employees in addition to well-informed students.

Exam Ref 70-768 Developing SQL Data Models, First Edition

Prepare for Microsoft Exam 70-768 and help demonstrate your real-world mastery of Business Intelligence (BI) solutions development with SQL Server 2016 Analysis Services (SSAS), including modeling and queries. Designed for experienced IT professionals ready to advance their status, Exam Ref focuses on the critical thinking and decision-making acumen needed for success at the MCSA level.

Focus on the expertise measured by these objectives:
• Design a multidimensional BI semantic model
• Design a tabular BI semantic model
• Develop queries using Multidimensional Expressions (MDX) and Data Analysis Expressions (DAX)
• Configure and maintain SSAS

This Microsoft Exam Ref:
• Organizes its coverage by exam objectives
• Features strategic, what-if scenarios to challenge you
• Assumes you are a database or BI professional with experience creating models, writing MDX or DAX queries, and using SSAS

Oracle Application Express: Build Powerful Data-Centric Web Apps with APEX

This Oracle Press guide shows how to build and deploy powerful Web applications with Oracle Application Express and features full coverage of the latest version, APEX 5.0.

This comprehensive volume from Oracle Press offers up-to-date coverage of Oracle Application Express (APEX), Oracle’s rapid development tool for the Web developer. APEX is an entirely Web-based framework that comes built into every edition of Oracle Database. Its backbone is Oracle’s powerful PL/SQL language, alongside modern Web development technologies such as HTML5, mobile development, and full support for CSS and JavaScript. APEX enables anyone, from novice user to seasoned developer, to create Web applications that are powerful, reliable, and highly scalable.

Oracle Application Express: Build Powerful Data-Centric Web Apps lays out basic APEX concepts before delving into the power of the platform and describing the new features in version 5.0. You will discover how to install and configure APEX, work with the Application Builder and Page Designer, use built-in wizards, and design custom Web apps.
• Teaches the cleanest and fastest builds for high-performance, secure web applications
• Shows how to effectively migrate legacy applications into a modern Web-based environment
• Authored by early adopters of APEX 5.0 who have been active in the APEX community for years

IBM z Systems Qualified DWDM Ciena 6500 Packet-Optical Platform Release 10.21

This IBM® Redpaper™ publication is one in a series that describes IBM z Systems® qualified dense wavelength division multiplexing (DWDM) vendor products for IBM Geographically Dispersed Parallel Sysplex™ (IBM GDPS®) solutions with Server Time Protocol (STP). The protocols that are described in this paper are used for IBM-supported solutions that require cross-site connectivity of a multisite Parallel Sysplex or remote copy technologies, which can include GDPS and non-GDPS applications. GDPS qualification testing is conducted at the IBM Vendor Solutions Connectivity (VSC) Lab in Poughkeepsie, NY. IBM and Ciena completed qualification testing of the Ciena 6500 Packet-Optical platform. This paper describes the applicable environments, protocols, and topologies that are qualified for and supported by z Systems for connecting through the Ciena 6500 Packet-Optical platform hardware and software, release level 10.21. This paper is intended for anyone who wants to learn more about the Ciena 6500 Packet-Optical platform, release level 10.21. This document is not meant to determine qualified products. To ensure that the products planned for implementation are qualified, registered users can consult IBM Resource Link® for current information about qualified DWDM vendor products. For more information about IBM Redbooks® publications for z Systems qualified DWDM vendor products, see the IBM Redbooks website.

IBM Geographically Dispersed Resiliency for IBM Power Systems

This IBM® Redbooks® publication introduces and provides a broad understanding of the new IBM Geographically Dispersed Resiliency for IBM Power Systems™ solution. The IBM Geographically Dispersed Resiliency for Power Systems solution is a set of software components that together provide a disaster recovery (DR) mechanism for virtual machines (VMs) running on an IBM POWER7® processor-based server or later. This document describes various components, subsystems, and tasks that are associated with the IBM Geographically Dispersed Resiliency for Power Systems solution. This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing high availability (HA) and DR solutions and support on IBM Power Systems servers.

IBM zPDT Guide and Reference: System z Personal Development Tool

This IBM® Redbooks® publication provides both introductory information and technical details about the IBM System z® Personal Development Tool (IBM zPDT®), which produces a small System z environment suitable for application development. zPDT is a PC Linux application. When zPDT is installed (on Linux), normal System z operating systems (such as IBM z/OS®) can be run on it. zPDT provides the basic System z architecture and emulated IBM 3390 disk drives, 3270 interfaces, OSA interfaces, and so on. The systems that are discussed in this document are complex. They have elements of Linux (for the underlying PC machine), IBM z/Architecture® (for the core zPDT elements), System z I/O functions (for emulated I/O devices), z/OS (the most common System z operating system), and various applications and subsystems under z/OS. The reader is assumed to be familiar with general concepts and terminology of System z hardware and software elements, and with basic PC Linux characteristics. This book provides the primary documentation for zPDT.