talk-data.com

Topic: data

5765 tagged

Activity Trend: 3 peak/qtr, 2020-Q1 to 2026-Q1

Activities

5765 activities · Newest first

Advanced Analytics with Spark

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.
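For readers who want a feel for the kind of pattern the book works through, here is a minimal sketch of training a classifier with Spark’s Python API (PySpark). It is illustrative only, not the authors’ code: the tiny in-memory dataset, application name, and parameter values are assumptions.

```python
# A minimal PySpark classification sketch (illustrative; not from the book).
# Assumes a local Spark installation with the pyspark package available.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("spark-classification-sketch").getOrCreate()

# Toy labeled data: (label, feature vector). Real analyses would load data
# from HDFS, S3, or another distributed store.
training = spark.createDataFrame(
    [
        (1.0, Vectors.dense(0.0, 1.1, 0.1)),
        (0.0, Vectors.dense(2.0, 1.0, -1.0)),
        (0.0, Vectors.dense(2.0, 1.3, 1.0)),
        (1.0, Vectors.dense(0.0, 1.2, -0.5)),
    ],
    ["label", "features"],
)

# Fit a logistic regression model and score the training data.
lr = LogisticRegression(maxIter=10, regParam=0.01)
model = lr.fit(training)
model.transform(training).select("label", "prediction").show()

spark.stop()
```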

IBM GDPS Active/Active Overview and Planning

IBM® Geographically Dispersed Parallel Sysplex™ (GDPS®) is a collection of several offerings, each addressing a different set of IT resiliency goals. It can be tailored to meet the recovery point objective (RPO), which is how much data you are willing to lose or recreate, and the recovery time objective (RTO), which is how long your business can afford to be without its systems, from the initial outage until your critical business processes are available to users again. Each offering uses a combination of server and storage hardware or software-based replication, and automation and clustering software technologies. This IBM Redbooks® publication presents an overview of the IBM GDPS active/active (GDPS/AA) offering and the role it plays in delivering a business IT resilience solution.

Real-World Hadoop

If you’re a business team leader, CIO, business analyst, or developer interested in how Apache Hadoop and Apache HBase-related technologies can address problems involving large-scale data in cost-effective ways, this book is for you. Using real-world stories and situations, authors Ted Dunning and Ellen Friedman show Hadoop newcomers and seasoned users alike how NoSQL databases and Hadoop can solve a variety of business and research issues. You’ll learn about early decisions and pre-planning that can make the process easier and more productive. If you’re already using these technologies, you’ll discover ways to gain the full range of benefits possible with Hadoop. While you don’t need a deep technical background to get started, this book does provide expert guidance to help managers, architects, and practitioners succeed with their Hadoop projects. Examine a day in the life of big data: India’s ambitious Aadhaar project. Review tools in the Hadoop ecosystem, such as Apache Spark, Storm, and Drill, to learn how they can help you. Pick up a collection of technical and strategic tips that have helped others succeed with Hadoop. Learn from several prototypical Hadoop use cases, based on how organizations have actually applied the technology. Explore real-world stories that reveal how MapR customers combine use cases when putting Hadoop and NoSQL to work, including in production. Ted Dunning is Chief Applications Architect at MapR Technologies, and a committer and PMC member of the Apache Drill, Storm, Mahout, and ZooKeeper projects. He is also a mentor for the Apache Datafu, Kylin, Zeppelin, Calcite, and Samoa projects. Ellen Friedman is a solutions consultant, speaker, and author, writing mainly about big data topics. She is a committer for the Apache Mahout project and a contributor to the Apache Drill project.

Machine Learning

This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches (the latter based on optimization techniques), together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing, and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as short courses on sparse modeling, deep learning, and probabilistic graphical models. All major classical techniques are covered: mean/least-squares regression and filtering, Kalman filtering, stochastic approximation and online learning, Bayesian classification, decision trees, logistic regression, and boosting methods. So are the latest trends: sparsity, convex analysis and optimization, online distributed algorithms, learning in RKH spaces, Bayesian inference, graphical and hidden Markov models, particle filtering, deep learning, dictionary learning, and latent variable modeling. Case studies (protein folding prediction, optical character recognition, text authorship identification, fMRI data analysis, change point detection, hyperspectral image unmixing, target localization, channel equalization, and echo cancellation) show how the theory can be applied. MATLAB code for all the main algorithms is available on an accompanying website, enabling the reader to experiment with the code.
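The book’s companion code is in MATLAB; as a small Python analogue of the least-squares regression listed among the classical techniques, the sketch below fits a line to synthetic data with NumPy. The data-generating parameters are invented for illustration.

```python
# Ordinary least-squares regression in NumPy (a Python analogue of the
# classical technique listed above; the book's companion code is in MATLAB).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2*x + 1 plus Gaussian noise (illustrative assumption).
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

# Design matrix with an intercept column; solve min ||A w - y||^2.
A = np.column_stack([x, np.ones_like(x)])
w, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)

print("estimated slope and intercept:", w)  # expected to be close to (2, 1)
```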

Bayesian Inference for Partially Identified Models

This book shows how the Bayesian approach to inference is applicable to partially identified models (PIMs) and examines the performance of Bayesian procedures in partially identified contexts. Drawing on his many years of research in this area, the author presents a thorough overview of the statistical theory, properties, and applications of PIMs. He covers a range of PIMs, including models for misclassified data and models involving instrumental variables. He also includes real data applications of PIMs that have recently appeared in the literature.

Exchanging Data between SAS and Microsoft Excel

Master simple-to-complex techniques for transporting and managing data between SAS and Excel. William Benjamin's Exchanging Data between SAS and Microsoft Excel: Tips and Techniques to Transfer and Manage Data More Efficiently describes many of the options and methods that enable a SAS programmer to transport data between SAS and Excel. The book includes examples that all levels of SAS and Excel users can apply to their everyday programming tasks. Because the book makes no assumptions about the skill levels of either SAS or Excel users, it has wide-ranging applicability, providing detailed instructions about how to apply the techniques shown. It contains sections that gather together instructional and syntactical information that is otherwise widely dispersed, and it provides detailed examples of how to apply the software to everyday applications. These examples give novice users and power developers alike the chance to expand their capabilities and enhance their skill sets. By moving from simple to complex applications and examples, the book's layout allows it to be used as both a training tool and a reference. Excel users and SAS programmers are presented with tools that will assist in the integration of SAS and Excel processes in order to automate reporting and programming interfaces. This enables programming staff to request their own reports or processes and, in turn, support a much larger community.

Hadoop: The Definitive Guide, 4th Edition

Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing. Learn fundamental components such as MapReduce, HDFS, and YARN. Explore MapReduce in depth, including steps for developing applications with it. Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN. Learn two data formats: Avro for data serialization and Parquet for nested data. Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer). Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop. Learn the HBase distributed database and the ZooKeeper distributed configuration service.
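To make the MapReduce model concrete, here is a minimal word-count sketch in Python that can be run with Hadoop Streaming rather than the book’s Java examples. The streaming jar name and the HDFS input/output paths in the comment are assumptions for a typical cluster.

```python
# wordcount.py: a minimal MapReduce word count for Hadoop Streaming
# (illustrative sketch, not the book's Java examples).
#
# Example invocation (jar name and paths are assumptions for your cluster):
#   hadoop jar hadoop-streaming.jar \
#     -input /data/in -output /data/out \
#     -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce"
import sys


def mapper():
    # Emit "word<TAB>1" for every word on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")


def reducer():
    # Hadoop sorts mapper output by key, so counts for a word arrive together.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")


if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```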

Learning MySQL and MariaDB

If you’re a programmer new to databases—or just new to MySQL and its community-driven variant, MariaDB—you’ve found the perfect introduction. This hands-on guide provides an easy, step-by-step approach to installing, using, and maintaining these popular relational database engines. Author Russell Dyer, Curriculum Manager at MariaDB and former editor of the MySQL Knowledge Base, takes you through database design and the basics of data management and manipulation, using real-world examples and many practical tips. Exercises and review questions help you practice what you’ve just learned. Create and alter MySQL tables and specify fields and columns within them. Learn how to insert, select, update, delete, join, and subquery data, using practical examples. Use built-in string functions to find, extract, format, and convert text from columns. Learn functions for mathematical or statistical calculations, and for formatting date and time values. Perform administrative duties such as managing user accounts, backing up databases, and importing large amounts of data. Use APIs to connect and query MySQL and MariaDB with PHP and other languages.
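As a taste of the insert/select/join workflow the book teaches, here is a minimal sketch using the mysql-connector-python driver; the book itself also covers PHP and other APIs. The credentials, database, and table names are placeholders, not material from the book.

```python
# Minimal MySQL sketch with mysql-connector-python
# (pip install mysql-connector-python). Credentials and names are placeholders.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app_user", password="secret", database="bookstore"
)
cur = conn.cursor()

cur.execute(
    """CREATE TABLE IF NOT EXISTS authors (
           id INT AUTO_INCREMENT PRIMARY KEY,
           name VARCHAR(100) NOT NULL
       )"""
)
cur.execute(
    """CREATE TABLE IF NOT EXISTS books (
           id INT AUTO_INCREMENT PRIMARY KEY,
           title VARCHAR(200) NOT NULL,
           author_id INT,
           FOREIGN KEY (author_id) REFERENCES authors (id)
       )"""
)

# Parameterized inserts guard against SQL injection.
cur.execute("INSERT INTO authors (name) VALUES (%s)", ("Russell Dyer",))
cur.execute(
    "INSERT INTO books (title, author_id) VALUES (%s, %s)",
    ("Learning MySQL and MariaDB", cur.lastrowid),
)
conn.commit()

# A simple join: list each book with its author.
cur.execute(
    """SELECT b.title, a.name
         FROM books AS b
         JOIN authors AS a ON a.id = b.author_id"""
)
for title, name in cur.fetchall():
    print(title, "by", name)

cur.close()
conn.close()
```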

Storm Applied

Storm Applied is a practical guide to using Apache Storm for the real-world tasks associated with processing and analyzing real-time data streams. This immediately useful book starts by building a solid foundation of Storm essentials so that you learn how to think about designing Storm solutions the right way from day one. But it quickly dives into real-world case studies that will bring the novice up to speed with productionizing Storm. About the Technology: It's hard to make sense of data when it's coming at you fast. Like Hadoop, Storm processes large amounts of data, but it does so reliably and in real time, guaranteeing that every message will be processed. Storm allows you to scale with your data as it grows, making it an excellent platform for solving your big data problems. About the Book: Storm Applied is an example-driven guide to processing and analyzing real-time data streams. This immediately useful book starts by teaching you how to design Storm solutions the right way. Then, it quickly dives into real-world case studies that show you how to scale a high-throughput stream processor, ensure smooth operation within a production cluster, and more. Along the way, you'll learn to use Trident for stateful stream processing, along with other tools from the Storm ecosystem. What's Inside: mapping real problems to Storm components; performance tuning and scaling; practical troubleshooting and debugging; exactly-once processing with Trident. About the Reader: This book moves through the basics quickly. While prior experience with Storm is not assumed, some experience with big data and real-time systems is helpful. About the Authors: Sean Allen, Matthew Jankowski, and Peter Pathirana lead the development team for a high-volume, search-intensive commercial web application at TheLadders. Quotes: "Will no doubt become the definitive practitioner’s guide for Storm users." (from the Foreword by Andrew Montalenti) "The book’s practical approach to Storm will save you a lot of hassle and a lot of time." (Tanguy Leroux, Elasticsearch) "Great introduction to distributed computing with lots of real-world examples." (Shay Elkin, Tangent Logic) "Go beyond the MapReduce way of thinking to solve big data problems." (Muthusamy Manigandan, OzoneMedia)

Financial Forecasting, Analysis and Modelling: A Framework for Long-Term Forecasting

Risk analysis has become critical to modern financial planning. Financial Forecasting, Analysis and Modelling provides a complete framework for long-term financial forecasts in a practical and accessible way, helping finance professionals include uncertainty in their planning and budgeting process. With thorough coverage of financial statement simulation models and clear, concise implementation instruction, this book guides readers step-by-step through the entire projection plan development process. Readers learn the tools, techniques, and special considerations that increase accuracy and smooth the workflow, and develop a more robust analysis process that improves financial strategy. The companion website provides a complete operational model that can be customised to develop financial projections or a range of other key financial measures, giving readers an immediately applicable tool to facilitate effective decision-making. In the aftermath of the recent financial crisis, the need for experienced financial modelling professionals has steadily increased as organisations rush to adjust to economic volatility and uncertainty. This book provides the deeper level of understanding needed to develop stronger financial planning, with techniques tailored to real-life situations. Develop long-term projection plans using Excel. Use appropriate models to develop a more proactive strategy. Apply risk and uncertainty projections more accurately. Master the Excel Scenario Manager, sensitivity analysis, Monte Carlo simulation, and more. Risk plays a larger role in financial planning than ever before, and possible outcomes must be measured before decisions are made. Uncertainty has become a critical component in financial planning, and accuracy demands it be used appropriately. With a special focus on uncertainty in modelling and planning, Financial Forecasting, Analysis and Modelling is a comprehensive guide to the mechanics of modern finance.
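The book builds its simulations in Excel; as a rough, assumption-laden analogue, the sketch below runs a Monte Carlo projection of revenue over a five-year horizon in Python with NumPy. The starting revenue and growth-rate distribution are invented purely for illustration.

```python
# Monte Carlo revenue projection in NumPy (an illustrative analogue of the
# Excel-based simulations described above; all figures are invented).
import numpy as np

rng = np.random.default_rng(42)

n_sims, n_years = 10_000, 5
start_revenue = 100.0                                     # e.g., 100 million (assumed)
growth = rng.normal(0.05, 0.08, size=(n_sims, n_years))   # mean 5%, sd 8% (assumed)

# Compound the yearly growth rates to get simulated revenue paths.
paths = start_revenue * np.cumprod(1.0 + growth, axis=1)
year5 = paths[:, -1]

# Summarize the uncertainty instead of reporting a single point forecast.
p5, p50, p95 = np.percentile(year5, [5, 50, 95])
print(f"year-5 revenue: median {p50:.1f}, 5th pct {p5:.1f}, 95th pct {p95:.1f}")
```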

From Big Data to Smart Data

A pragmatic approach to Big Data that takes the reader on a journey between Big Data (what it is) and Smart Data (what it is for). Today's decision making relies on information (related to the data), knowledge (related to people and processes), and timing (the capacity to decide, act, and react at the right time). The huge increase in the volume of data traffic, and in the unstructured formats (such as blogs, logs, and video) generated by the "digitalization" of our world, radically modifies our relationship to space and time and, by extension, the enterprise's vision of performance monitoring and optimization.

Modeling and Analysis of Compositional Data

Modeling and Analysis of Compositional Data presents a practical and comprehensive introduction to the analysis of compositional data along with numerous examples to illustrate both the theory and application of each method. Based upon short courses delivered by the authors, it provides a complete and current compendium of fundamental to advanced methodologies, along with exercises at the end of each chapter to improve understanding, as well as data and a solutions manual available on an accompanying website. Complementing Pawlowsky-Glahn's earlier collective text that provides an overview of the state of the art in this field, Modeling and Analysis of Compositional Data fills a gap in the literature for a much-needed manual for teaching, self-learning, or consulting.
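As a small illustration of the log-ratio thinking at the core of compositional analysis, here is a NumPy sketch of the closure operation and the centred log-ratio (clr) transform. The three-part composition is invented, and the code is not taken from the book or its companion site.

```python
# Closure and centred log-ratio (clr) transform for compositional data
# (illustrative sketch; the example composition is invented).
import numpy as np


def closure(x):
    """Rescale positive parts so each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centred log-ratio: log of each part over the geometric mean."""
    x = closure(x)
    gmean = np.exp(np.log(x).mean(axis=-1, keepdims=True))
    return np.log(x / gmean)


# A 3-part composition, e.g. sand/silt/clay percentages.
raw = np.array([60.0, 25.0, 15.0])
print("closed:", closure(raw))   # parts sum to 1
print("clr   :", clr(raw))       # clr coordinates sum to 0
```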

PostgreSQL for Data Architects

Dive into the world of scalable and maintainable PostgreSQL databases with 'PostgreSQL for Data Architects.' This book is your companion to mastering PostgreSQL and learning how to configure, optimize, and manage database systems effectively. Whether you are designing a new database or maintaining and improving an existing one, you'll find practical tips and techniques tailored for data-intensive applications. What this Book will help me do: Master PostgreSQL architecture, compilation, and configuration for custom setups. Optimize database performance with advanced indexing, query tuning, and parameter adjustments. Leverage replication to scale databases horizontally and ensure high availability. Set up robust backup and recovery processes to secure and manage data effectively. Troubleshoot effectively using PostgreSQL's tools and logging mechanisms to resolve issues promptly. Author(s): Jayadevan M is a seasoned data architect with years of experience working on database design and optimization for diverse applications. His expertise spans various database management systems with a focus on practical, performance-oriented solutions. Through his writing, Jayadevan aims to make sophisticated database concepts accessible to developers seeking to advance their skills and build resilient, scalable systems. Who is it for? This book is perfect for developers and data architects who already have a basic understanding of database structures, such as tables and security configurations, and are looking to deepen their PostgreSQL skills. If your goal is to design, manage, or optimize database applications with PostgreSQL effectively, this guide will act as a vital resource. Additionally, those involved in performance tuning or database scalability projects will find it invaluable.
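In the spirit of the configuration and query-tuning topics above, here is a minimal sketch using the psycopg2 driver to inspect a server setting and read a query plan. The connection string, table, and column names are placeholders; the book itself works mainly through psql and the server configuration files.

```python
# Minimal PostgreSQL inspection sketch with psycopg2 (pip install psycopg2-binary).
# Connection details and the table/column names are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=app_user password=secret host=localhost")
cur = conn.cursor()

# Check a memory-related setting that tuning guides usually start with.
cur.execute("SHOW shared_buffers")
print("shared_buffers =", cur.fetchone()[0])

# Ask the planner how it would execute a query (EXPLAIN ANALYZE also runs it).
cur.execute("EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = %s", (42,))
for (line,) in cur.fetchall():
    print(line)

cur.close()
conn.close()
```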

Implementing the IBM Storwize V7000 V7.4

Continuing its commitment to developing and delivering industry-leading storage technologies, IBM® introduces the IBM Storwize® V7000 solution, an innovative storage offering that delivers essential storage efficiency technologies and exceptional ease of use and performance, all integrated into a compact, modular design that is offered at a competitive, midrange price. The IBM Storwize V7000 solution incorporates some of the top IBM technologies typically found only in enterprise-class storage systems, raising the standard for storage efficiency in midrange disk systems. This cutting-edge storage system extends the comprehensive storage portfolio from IBM and can help change the way organizations address the ongoing information explosion. This IBM Redbooks® publication introduces the features and functions of the IBM Storwize V7000 system through several examples. This book is aimed at pre-sales and post-sales technical support, marketing, and storage administrators. It will help you understand the architecture of the Storwize V7000, how to implement it, and how to take advantage of its industry-leading functions and features.

Mastering Apache Cassandra - Second Edition

Mastering Apache Cassandra - Second Edition is your comprehensive guide to understanding and utilizing the power of Cassandra, an efficient and scalable NoSQL database. Throughout this book, you will learn how to design, deploy, and manage Cassandra databases effectively, tailored to your application's needs. What this Book will help me do: Understand the architecture of Apache Cassandra and how it ensures scalability and reliability. Learn to build, configure, and deploy a Cassandra database cluster for high performance. Develop skills in monitoring and tuning Cassandra clusters for optimal operation. Gain expertise in managing clusters through scaling, node repair, and backup strategies. Integrate Apache Cassandra with other tools and your application seamlessly. Author(s): Nishant Neeraj is an experienced software developer and database engineer with a focus on delivering high-performance solutions. They have extensive hands-on experience with NoSQL databases, especially Apache Cassandra, and bring their practical insights and in-depth technical knowledge to this book to help readers tackle real-world challenges. Who is it for? This book is ideal for intermediate developers aiming to enhance their expertise in NoSQL databases. If you have a foundational understanding of database concepts and want to bring your skills to a professional level by mastering Apache Cassandra for modern applications, this book is perfect for you. It provides actionable insights and guidance suitable for professionals tackling high concurrency and big data challenges. Whether you are a developer, database administrator, or architect, this book provides a targeted deep dive into Cassandra.
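To ground the cluster, keyspace, and table concepts above, here is a minimal sketch using the DataStax Python driver (cassandra-driver). The contact point, keyspace, and schema are illustrative assumptions rather than material from the book.

```python
# Minimal Apache Cassandra sketch with the DataStax Python driver
# (pip install cassandra-driver). Contact point and schema are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])          # assumes a local single-node cluster
session = cluster.connect()

# SimpleStrategy with RF=1 is only sensible for a local development cluster.
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.set_keyspace("demo")
session.execute(
    "CREATE TABLE IF NOT EXISTS users (user_id int PRIMARY KEY, name text)"
)

# Prepared statements are the idiomatic way to run repeated writes.
insert = session.prepare("INSERT INTO users (user_id, name) VALUES (?, ?)")
session.execute(insert, (1, "Ada"))

for row in session.execute("SELECT user_id, name FROM users"):
    print(row.user_id, row.name)

cluster.shutdown()
```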

R Packages

Turn your R code into packages that others can easily download and use. This practical book shows you how to bundle reusable R functions, sample data, and documentation together by applying author Hadley Wickham’s package development philosophy. In the process, you’ll work with devtools, roxygen, and testthat, a set of R packages that automate common development tasks. Devtools encapsulates best practices that Hadley has learned from years of working with this programming language. Ideal for developers, data scientists, and programmers with various backgrounds, this book starts you with the basics and shows you how to improve your package writing over time. You’ll learn to focus on what you want your package to do, rather than think about package structure. Learn about the most useful components of an R package, including vignettes and unit tests. Automate anything you can, taking advantage of the years of development experience embodied in devtools. Get tips on good style, such as organizing functions into files. Streamline your development process with devtools. Learn the best way to submit your package to the Comprehensive R Archive Network (CRAN). Learn from a well-respected member of the R community who created 30 R packages, including ggplot2, dplyr, and tidyr.

Indoor Wayfinding and Navigation

Outdoor wayfinding and navigation systems and services have become indispensable to people's mobility in unfamiliar environments. Advances in key technologies (e.g., positioning and mobile devices) have spurred interest in the research and development of indoor wayfinding and navigation systems and services in recent years. Indoor Wayfinding and Navigation provides both breadth and depth of knowledge in designing and building indoor wayfinding and navigation systems and services. It covers the types of sensors that are both feasible and practical for localization of users inside buildings. The book discusses current approaches, techniques, and technologies for addressing issues in indoor wayfinding and navigation systems and services. It includes coverage of the cognitive, positioning, mapping, and application perspectives, an unusual but useful combination of information. This mix of different perspectives helps you better understand the issues and challenges of building indoor wayfinding and navigation systems and services, how they are different from those used outdoors, and how they can be used efficiently and effectively in challenging applications. Written by well-known specialists in the field, the book addresses all aspects of indoor wayfinding and navigation. It includes the latest research developments on the topic, succinctly covers the fundamentals, and details the issues and challenges in building new systems and services. With this information, you can design indoor wayfinding and navigation systems and services for a variety of uses and users.

Social Big Data Mining

This book focuses on the basic concepts and the related technologies of data mining for social media. Topics include: big data and social data, data mining for making a hypothesis, multivariate analysis for verifying the hypothesis, web mining and media mining, natural language processing, social big data applications, and scalability. It explains analytical techniques such as modeling, data mining, and multivariate analysis for social big data. This book is different from other similar books in that it presents the overall picture of social big data, from fundamental concepts to applications, while standing on an academic foundation.

CMDB Systems

CMDB Systems: Making Change Work in the Age of Cloud and Agile shows you how an integrated database across all areas of an organization’s information system can help make organizations more efficient, reduce challenges during change management, and reduce total cost of ownership (TCO). In addition, this valuable reference provides guidelines that will enable you to avoid the pitfalls that cause CMDB projects to fail and actually shorten the time required to achieve an implementation of a CMDB. Drawing upon extensive experience and using illustrative real-world examples, Rick Sturm, Dennis Drogseth, and Dan Twing discuss: unique insights from extensive industry exposure, research, and consulting on the evolution of CMDB/CMS technology, along with ongoing dialog with the vendor community about current and future CMDB/CMS design and plans; proven and structured best practices for CMDB deployments; and clear and documented insights into the impacts of cloud computing and other advances on CMDB/CMS futures. The book offers unique insights from industry experts who consult on the evolution of CMDB/CMS technology and shows you the steps needed to successfully plan, design, and implement a CMDB. It covers related use cases from real-world CMDB deployments in the retail, manufacturing, and financial verticals; provides structured best practices for CMDB deployments; and discusses how CMDB adoption can lower total cost of ownership, increase efficiency, and optimize the IT enterprise.

IBM z13 Technical Introduction

This IBM® Redbooks® publication introduces the IBM z13™. IBM z13 delivers a data and transaction system reinvented as a system of insight for digital business. IBM z Systems™ leadership is extended with these features: improved ability to meet service level agreements with new processor chip technology that includes simultaneous multithreading, analytical vector processing, redesigned and larger cache, and enhanced accelerators for hardware compression and cryptography; better availability and more efficient use of critical data with up to 10 TB of available redundant array of independent memory (RAIM); validation of transactions, management, and assignment of business priority for SAN devices through updates to the I/O subsystem; and continued management of heterogeneous workloads with IBM z BladeCenter Extension (zBX) Model 004 and IBM z Unified Resource Manager. This Redbooks publication can help you become familiar with the z Systems platform, and understand how the platform can help integrate data, transactions, and insight for faster and more accurate business decisions. This book explains how, with innovations and traditional strengths, IBM z13 can play an essential role in today's IT environments, and satisfy the demands for cloud deployments, analytics, mobile, and social applications in a trustful, reliable, and secure environment with operations that lessen business risk.