talk-data.com

Topic: data (5765 tagged activities)

[Activity trend chart, 2020-Q1 to 2026-Q1; peak 3/qtr]

Activities

5765 activities · Newest first

Data Management Solutions Using SAS Hash Table Operations

Hash tables can do a lot more than you might think! Data Management Solutions Using SAS Hash Table Operations: A Business Intelligence Case Study concentrates on solving your challenging data management and analysis problems via the power of the SAS hash object, whose environment and tools make it possible to create complete dynamic solutions. To this end, this book provides an in-depth overview of the hash table as an in-memory database with the CRUD (Create, Retrieve, Update, Delete) cycle rendered by the hash object tools. By using this concept and focusing on real-world problems exemplified by sports data sets and statistics, this book seeks to help you take advantage of the hash object productively for, in particular but not limited to, the following tasks:

- Select proper hash tools to perform hash table operations
- Use proper hash table operations to support specific data management tasks
- Use the dynamic, run-time nature of hash object programming
- Understand the algorithmic principles behind hash table data look-up, retrieval, and aggregation
- Learn how to perform data aggregation, for which the hash object is exceptionally well suited
- Manage the hash table memory footprint, especially when processing big data
- Use hash object techniques for other data processing tasks, such as filtering, combining, splitting, sorting, and unduplicating

Using this book, you will be able to answer your toughest questions quickly and in the most efficient way possible!
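The CRUD cycle described above maps onto any in-memory key-indexed table. As a rough conceptual sketch (a Python dict analogy, not SAS code; the table name and data are invented for illustration), the four operations look like this:

```python
# Conceptual analogy to the SAS hash object's CRUD cycle, using a
# plain Python dict as the in-memory table. Illustrative only.

table = {}                                  # create an empty hash table

table["BOS"] = {"wins": 93, "losses": 69}   # Create: insert a key-item pair

row = table.get("BOS")                      # Retrieve: look up by key
print(row)

table["BOS"]["wins"] += 1                   # Update: modify the item for a key

del table["BOS"]                            # Delete: remove the key-item pair
```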

Getting Started with Kudu

Fast data ingestion, serving, and analytics in the Hadoop ecosystem have forced developers and architects to choose solutions built around the least common denominator: either fast analytics at the cost of slow data ingestion, or fast data ingestion at the cost of slow analytics. There is an answer to this problem. With the Apache Kudu column-oriented data store, you can easily perform fast analytics on fast data. This practical guide shows you how. Begun as an internal project at Cloudera, Kudu is an open source solution compatible with many data processing frameworks in the Hadoop environment. In this book, current and former solutions professionals from Cloudera provide use cases, examples, best practices, and sample code to help you get up to speed with Kudu.

- Explore Kudu's high-level design, including how it spreads data across servers
- Fully administer a Kudu cluster, enable security, and add or remove nodes
- Learn Kudu's client-side APIs, including how to integrate Apache Impala, Spark, and other frameworks for data manipulation
- Examine Kudu's schema design, including the basic concepts and primitives necessary to make your project successful
- Explore case studies on using Kudu for real-time IoT analytics, predictive modeling, and in combination with another storage engine
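For a taste of the client-side APIs mentioned above, here is a minimal sketch assuming the kudu-python client; the cluster address, table name, and schema are illustrative assumptions, not taken from the book:

```python
# Hedged sketch with the kudu-python client. Host, port, table name,
# and schema are assumptions for illustration.
import kudu
from kudu.client import Partitioning

client = kudu.connect(host="kudu-master", port=7051)

# Define a simple schema with a hash-partitioned primary key.
builder = kudu.schema_builder()
builder.add_column("key").type(kudu.int64).nullable(False).primary_key()
builder.add_column("value", type_=kudu.string)
schema = builder.build()

partitioning = Partitioning().add_hash_partitions(column_names=["key"], num_buckets=3)
if not client.table_exists("example"):
    client.create_table("example", schema, partitioning)

# Insert a row, then scan it back.
table = client.table("example")
session = client.new_session()
session.apply(table.new_insert({"key": 1, "value": "fast analytics on fast data"}))
session.flush()

print(table.scanner().open().read_all_tuples())
```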

Sparse Optimization Theory and Methods

This book presents the state of the art in theory and algorithms for signal recovery under the sparsity assumption. The uniqueness conditions for the sparsest solution of underdetermined linear systems are described, and results for sparse signal recovery under the range space property (RSP) are introduced. This framework is generalized to 1-bit compressed sensing, leading to a novel sign recovery theory in this area. Two efficient sparsity-seeking algorithms are presented, and the theoretical efficiency of these algorithms is rigorously analysed. Under the RSP assumption, the author also provides a unified stability analysis for several popular optimization methods for sparse signal recovery.
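For orientation, the central problem the book studies is commonly written as the following pair of optimization problems (standard notation assumed here, not quoted from the book):

```latex
% Sparsest solution of an underdetermined linear system Ax = b,
% with A an m-by-n matrix, m < n (notation assumed for illustration):
\min_{x \in \mathbb{R}^n} \|x\|_0 \quad \text{subject to} \quad Ax = b
% and its convex relaxation, the l1-minimization (basis pursuit) problem:
\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to} \quad Ax = b
```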

Learning SAS by Example

Learn to program SAS by example! Learning SAS by Example: A Programmer’s Guide, Second Edition, teaches SAS programming from very basic concepts to more advanced topics. Because most programmers prefer examples rather than reference-type syntax, this book uses short examples to explain each topic. The second edition has brought this classic book on SAS programming up to the latest SAS version, with new chapters that cover topics such as PROC SGPLOT and Perl regular expressions. This book belongs on the shelf (or e-book reader) of anyone who programs in SAS, from those with little programming experience who want to learn SAS to intermediate and even advanced SAS programmers who want to learn new techniques or identify new ways to accomplish existing tasks. In an instructive and conversational tone, author Ron Cody clearly explains each programming technique and then illustrates it with one or more real-life examples, followed by a detailed description of how the program works. The text is divided into four major sections: Getting Started, DATA Step Processing, Presenting and Summarizing Your Data, and Advanced Topics. Subjects addressed include:

- Reading data from external sources
- Learning details of DATA step programming
- Subsetting and combining SAS data sets
- Understanding SAS functions and working with arrays
- Creating reports with PROC REPORT and PROC TABULATE
- Getting started with the SAS macro language
- Leveraging PROC SQL
- Generating high-quality graphics
- Using advanced features of user-defined formats and informats
- Restructuring SAS data sets
- Working with multiple observations per subject
- Getting started with Perl regular expressions

You can test your knowledge and hone your skills by solving the problems at the end of each chapter.

Apache Hive Essentials - Second Edition

"Apache Hive Essentials" provides a focused guide to mastering the essential techniques of processing and analyzing big data with Apache Hive. What this Book will help me do Set up and configure a Hive environment for big data analysis. Compose effective queries using Hive's SQL-like language to extract insights. Optimize Hive performance to handle complex datasets efficiently. Implement data security and user-defined functions to extend capabilities. Integrate Hive with Hadoop tools for comprehensive data solutions. Author(s) Dayong Du, the author of "Apache Hive Essentials," has years of experience working with big data technologies and tools. With hands-on expertise in Hadoop and the entire ecosystem, he brings a practical and informed perspective to this complex field. His approach is to make these technologies accessible to developers and analysts of all levels. Who is it for? This book is perfect for data analysts, developers, or professionals familiar with SQL who are looking to start with Apache Hive for big data processing. It is suitable for those acquainted with Hadoop and its environment and want to expand their skills into efficient data querying and management. Readers should have an interest in how to leverage big data tools for real-world solutions.

Hands-On Data Analysis with NumPy and pandas

Dive into 'Hands-On Data Analysis with NumPy and pandas' to explore the world of Python for data analysis. This book guides you through using these powerful Python libraries to handle and manipulate data efficiently. You will learn hands-on techniques to read, sort, group, and visualize data for impactful analysis.

What this Book will help me do:
- Set up a Python environment for data analysis with tools like Jupyter notebooks.
- Master data handling using NumPy, focusing on array creation, slicing, and operations.
- Understand the functionalities of pandas for managing datasets, including DataFrame operations.
- Discover techniques for data preparation, such as handling missing data and hierarchical indexing.
- Explore data visualization using pandas and create impactful plots for data insights.

Author(s): The book is authored by Miller, a seasoned Python developer and data analyst. With a strong background in leveraging Python for data processing, the author focuses on creating content that is practical and accessible, with a teaching approach that emphasizes hands-on practice and understanding, making technical topics approachable and engaging.

Who is it for? This book is ideal for Python developers at a beginner to intermediate level looking to venture into data analysis. If you are transitioning from general programming to data-focused work or need to enhance your skills in data manipulation and processing, this book will give you a strong foundation. It requires no prior experience with data analysis, so it is accessible to many learners.
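The operations listed above look roughly like this in practice; a minimal sketch with invented data:

```python
# Illustrative only: toy data showing the kinds of NumPy/pandas
# operations the book covers (slicing, grouping, missing data).
import numpy as np
import pandas as pd

arr = np.arange(12).reshape(3, 4)   # array creation
print(arr[1:, :2])                  # slicing rows and columns

df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Bergen"],
    "temp": [21.0, np.nan, 18.5],
})
df["temp"] = df["temp"].fillna(df["temp"].mean())   # handle missing data
print(df.groupby("city")["temp"].mean())            # group and aggregate
```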

PySpark Cookbook

Dive into the world of big data processing and analytics with the "PySpark Cookbook". This book provides over 60 hands-on recipes for implementing efficient data-intensive solutions using Apache Spark and Python. By mastering these recipes, you'll be equipped to tackle challenges in large-scale data processing, machine learning, and stream analytics.

What this Book will help me do:
- Set up and configure PySpark environments effectively, including working with Jupyter for enhanced interactivity.
- Understand and utilize DataFrames for data manipulation, analysis, and transformation tasks.
- Develop end-to-end machine learning solutions using the ML and MLlib modules in PySpark.
- Implement structured streaming and graph-processing solutions to analyze and visualize data streams and relationships.
- Deploy PySpark applications to cloud infrastructure efficiently using best practices.

Author(s): This book is co-authored by Lee and Drabas, experienced professionals in data processing and analytics with Python and Apache Spark. With deep technical expertise and a passion for teaching through practical examples, they aim to make the complex concepts of PySpark accessible to developers of varied experience levels.

Who is it for? This book is ideal for Python developers who are keen to delve into the Apache Spark ecosystem. Whether you're just starting with big data or have some experience with Spark, this book provides practical recipes to enhance your skills. Readers looking to solve real-world data-intensive challenges with PySpark will find this resource invaluable.
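A minimal sketch of the DataFrame manipulation the recipes build on (data and column names invented for illustration):

```python
# Illustrative PySpark DataFrame operations: filter, transform, aggregate.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-cookbook-demo").getOrCreate()

df = spark.createDataFrame(
    [("alice", 34), ("bob", 29), ("carol", 41)],
    ["name", "age"],
)

(df.filter(F.col("age") > 30)               # keep rows matching a predicate
   .withColumn("age_in_months", F.col("age") * 12)   # derive a column
   .show())

print(df.agg(F.avg("age")).first()[0])      # aggregate over the DataFrame
```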

Streaming Change Data Capture

There are many benefits to becoming a data-driven organization, including the ability to accelerate and improve business decision accuracy through the real-time processing of transactions, social media streams, and IoT data. But those benefits require significant changes to your infrastructure: you need flexible architectures that can copy data to analytics platforms at near-zero latency while maintaining 100% production uptime. Fortunately, a solution already exists. This ebook demonstrates how change data capture (CDC) can meet the scalability, efficiency, real-time, and zero-impact requirements of modern data architectures. Kevin Petrie, Itamar Ankorion, and Dan Potter, technology marketing leaders at Attunity, explain how CDC enables faster and more accurate decisions based on current data and reduces or eliminates the full reloads that disrupt production and efficiency. The book examines:

- How CDC evolved from a niche feature of database replication software to a critical data architecture building block
- Architectures where data workflow and analysis take place, and their integration points with CDC
- How CDC identifies and captures source data updates to assist high-speed replication to one or more targets
- Case studies on cloud-based streaming, streaming to a data lake, and related architectures
- Guiding principles for effectively implementing CDC in cloud, data lake, and streaming environments
- The Attunity Replicate platform for efficiently loading data across all major database, data warehouse, cloud, streaming, and Hadoop platforms
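Query-based CDC, the simplest capture technique, can be sketched as a polling loop against a high-water mark; the SQLite source, table, and columns below are assumptions for illustration, and log-based tools such as Attunity Replicate read the database transaction log rather than polling:

```python
# Toy sketch of query-based CDC: poll a source table for rows changed
# since the last high-water mark. The sqlite source and the table and
# column names are assumptions for illustration only.
import sqlite3
import time

conn = sqlite3.connect("source.db")
last_seen = "1970-01-01 00:00:00"   # high-water mark

while True:
    rows = conn.execute(
        "SELECT id, payload, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    for row_id, payload, updated_at in rows:
        print("replicating change:", row_id, payload)   # ship to target
        last_seen = updated_at                           # advance the mark
    time.sleep(5)   # poll interval
```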

IBM Db2 11.1 Certification Guide

Delve into the IBM Db2 11.1 Certification Guide to prepare comprehensively for the IBM C2090-600 exam and master database programming and administration tasks in Db2 environments. Across its chapters, this guide provides practical steps, expert guidance, and over 150 practice questions aimed at ensuring your success.

What this Book will help me do:
- Master Db2 server management, including configuration and maintenance tasks, to ensure optimized performance.
- Implement advanced features such as BLU Acceleration and Db2 pureScale to enhance database functionality.
- Gain expertise in security protocols, including data encryption and integrity enforcement, for secure database environments.
- Troubleshoot common Db2 issues using diagnostic tools like db2pd and dsmtop, improving efficiency and uptime.
- Develop skills in creating and altering database objects, enabling robust database design and management.

Author(s): The authors, Collins and Saraswatipura, are seasoned database professionals with vast experience in administering and optimizing Db2 environments. Their expertise in guiding students and professionals shines through in the book's accessible language and practical approach. They blend theoretical and hands-on insights to ensure learners not only understand the material but can also apply it effectively.

Who is it for? This book is ideal for database administrators, architects, and application developers pursuing Db2 certification. It caters to readers with a basic understanding of Db2 who want to advance their skills. Whether you're aiming for professional growth or practical expertise, this guide covers the certification essentials while enriching your practical knowledge.

Mastering Numerical Computing with NumPy

"Mastering Numerical Computing with NumPy" is a comprehensive guide to becoming proficient in numerical computing using Python's NumPy library. This book will teach you how to perform advanced numerical operations, explore data statistically, and build predictive models effectively. By mastering the provided concepts and exercises, you'll be empowered in your scientific computing projects. What this Book will help me do Perform and optimize vector and matrix operations effectively using NumPy. Analyze data using exploratory data analysis techniques and predictive modeling. Implement unsupervised learning algorithms such as clustering with relevant datasets. Understand advanced benchmarks and select optimal configurations for performance. Write efficient and scalable programs utilizing advanced NumPy features. Author(s) The authors of "Mastering Numerical Computing with NumPy" include domain experts and educators with years of experience in Python programming, numerical computing, and data science. They bring a practical and detailed approach to teaching advanced topics and guide you through every step of mastering NumPy. Who is it for? This book is ideal for Python programmers, data analysts, and data science enthusiasts who aim to deepen their understanding of numerical computing. If you have basic mathematics skills and want to utilize NumPy to solve complex data problems, this book is an excellent resource. Whether you're a beginner or an intermediate user, you will find this content approachable and enriching. Advanced users will benefit from the highly specialized content and real-world examples.

Practical Enterprise Data Lake Insights: Handle Data-Driven Challenges in an Enterprise Big Data Lake

Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake, and learn industry best practices to resolve issues. When designing an enterprise data lake you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data. Starting from sourcing data into the Hadoop ecosystem, you will go through stages that can raise tough questions about data processing, data querying, and security. Concepts such as change data capture and data streaming are covered. The book takes an end-to-end solution approach in a data lake environment that includes data security, high availability, data processing, data streaming, and more. Each chapter applies a concept and includes code snippets and use case demonstrations to give you a practical grounding: for each concept you will learn its scope, application, and starting point.

What You'll Learn:
- Get to know data lake architecture and design principles
- Implement data capture and streaming strategies
- Implement data processing strategies in Hadoop
- Understand the data lake security framework and availability model

Who This Book Is For: Big data architects and solution architects

Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution

This IBM® Redpaper™ publication provides guidance on building an enterprise-grade data lake by using IBM Spectrum™ Scale and Hortonworks Data Platform for performing in-place Hadoop or Spark-based analytics. It covers the benefits of the integrated solution and gives guidance about the types of deployment models and the considerations that arise when implementing them. Hortonworks Data Platform (HDP) is a leading Hadoop and Spark distribution. HDP addresses the complete needs of data-at-rest, powers real-time customer applications, and delivers robust analytics that accelerate decision making and innovation. IBM Spectrum Scale™ is flexible and scalable software-defined file storage for analytics workloads. Enterprises around the globe have deployed IBM Spectrum Scale to build large data lakes and content repositories for high-performance computing (HPC) and analytics workloads. It can scale both performance and capacity without bottlenecks.

Domain-Specific Languages in R: Advanced Statistical Programming

Gain an accelerated introduction to domain-specific languages (DSLs) in R, including coverage of regular expressions. This compact, in-depth book shows how DSLs are programming languages specialized for a particular purpose, as opposed to general-purpose programming languages. Along the way, you'll learn to specify the tasks you want to perform in a precise way and achieve programming goals within a domain-specific context. Domain-Specific Languages in R includes examples of DSLs for large data sets and matrix multiplication, pattern-matching DSLs with applications in computer vision, and DSLs for continuous-time Markov chains and their applications in data science. After reading and using this book, you'll understand how to write DSLs in R and have skills you can extrapolate to other programming languages.

What You'll Learn:
- Program with domain-specific languages using R
- Discover the components of DSLs
- Carry out large matrix expressions and multiplications
- Implement metaprogramming with DSLs
- Parse and manipulate expressions

Who This Book Is For: Those with prior programming experience. R knowledge is helpful but not required.
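The book works in R, but the core metaprogramming idea (capture expressions as data, then inspect or evaluate them) carries over to other languages; here is a minimal Python analogy in which every class and name is invented for illustration:

```python
# Minimal Python analogy (not R) of an expression-capturing DSL:
# operator overloading builds a small AST that can be printed or
# evaluated later. All names here are invented for illustration.
class Expr:
    def __add__(self, other):
        return BinOp("+", self, other)
    def __mul__(self, other):
        return BinOp("*", self, other)

class Var(Expr):
    def __init__(self, name):
        self.name = name
    def eval(self, env):
        return env[self.name]
    def __repr__(self):
        return self.name

class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def eval(self, env):
        a, b = self.left.eval(env), self.right.eval(env)
        return a + b if self.op == "+" else a * b
    def __repr__(self):
        return f"({self.left} {self.op} {self.right})"

x, y = Var("x"), Var("y")
e = x * y + x                    # the expression is captured, not evaluated
print(e)                         # ((x * y) + x)
print(e.eval({"x": 2, "y": 5}))  # 12
```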

Big Data Architect's Handbook

Big Data Architect's Handbook is your comprehensive guide to mastering the art of building sophisticated big data solutions. As you delve into this book, you'll learn to design end-to-end big data pipelines and integrate data from various sources for insightful analysis.

What this Book will help me do:
- Understand the Hadoop ecosystem and familiarize yourself with the major Apache projects.
- Make informed decisions when designing cloud infrastructures for big data needs.
- Gain expertise in analyzing structured and unstructured data using machine learning.
- Develop skills to implement scalable and efficient big data pipelines.
- Enhance your ability to visualize and monitor data insights effectively.

Author(s): Akhtar has amassed a wealth of experience in big data architecture and related technologies. With years of hands-on involvement in the development, analysis, and implementation of big data systems, Akhtar brings a pragmatic and insightful perspective, and a passion for educating others about data-driven technologies shines through in a user-first approach to making complex topics accessible.

Who is it for? This book caters to aspiring data professionals, software developers, and tech enthusiasts aiming to enhance their expertise in big data. Readers with basic programming and data analysis skills will find the content approachable yet challenging enough to deepen their understanding. If your career goal involves managing, analyzing, and making decisions based on large datasets, this book will help bridge the gap between skill and application.

Introducing the MySQL 8 Document Store

Learn the new Document Store feature of MySQL 8 and build applications around a mix of the best features from the SQL and NoSQL database paradigms. Don’t allow yourself to be forced into one paradigm or the other, but combine both approaches by using the Document Store. MySQL 8 was designed from the beginning to bridge the gap between NoSQL and SQL. Oracle recognizes that many solutions need the capabilities of both. More specifically, developers need to store objects as loose collections of schema-less documents, but those same developers also need the ability to run structured queries on their data. With MySQL 8, you can do both! Introducing the MySQL 8 Document Store presents new tools and features that make creating a hybrid database solution far easier than ever before. This book covers the vitally important MySQL Document Store, the new X Protocol for developing applications, and a new client shell called the MySQL Shell. Also covered are supporting technologies and concepts such as JSON, schema-less documents, and more. The book gives insight into how the features work and how to apply them to get the most out of your MySQL experience. The book covers topics such as:

- The headline feature in MySQL 8
- MySQL's answer to NoSQL
- New APIs and client protocols

What You'll Learn:
- Create NoSQL-style applications by using the Document Store
- Mix the NoSQL and SQL approaches by using each to its best advantage in a hybrid solution
- Work with the new X Protocol for application connectivity in MySQL 8
- Master the new X Developer Application Programming Interfaces
- Combine SQL and JSON in the same database and application
- Migrate existing applications to the MySQL Document Store

Who This Book Is For: Developers and database professionals wanting to learn about the most profound paradigm-changing features of the MySQL 8 Document Store
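A minimal sketch of the hybrid NoSQL/SQL style via the X DevAPI (the mysqlx module shipped with MySQL Connector/Python); the connection details, schema, and collection names below are assumptions for illustration, and the collection is assumed to already exist:

```python
# Hedged sketch of the MySQL 8 Document Store via the X DevAPI.
# Connection details, schema, and collection names are assumptions.
import mysqlx

session = mysqlx.get_session(
    {"host": "localhost", "port": 33060, "user": "app", "password": "secret"}
)
schema = session.get_schema("test")
coll = schema.get_collection("products")   # assumed to exist

# NoSQL-style: add a schema-less JSON document.
coll.add({"name": "widget", "price": 9.99, "tags": ["new"]}).execute()

# Structured, parameterized query over the same documents.
result = coll.find("price < :p").bind("p", 20).execute()
for doc in result.fetch_all():
    print(doc["name"], doc["price"])

session.close()
```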

Python Graphics: A Reference for Creating 2D and 3D Images

This book will show you how to use Python to create graphic objects for technical illustrations and data visualization. Often, the function you need to produce the image you want cannot be found in a standard Python library. Knowing how to create your own graphics frees you from the chore of hunting for a function that may not exist or may be difficult to use; this book gives you the tools to create and customize your own graphics to satisfy your own unique requirements. Using basic geometry and trigonometry, you will learn how to create math models of 2D and 3D shapes. Using Python, you will then learn how to project these objects onto the screen of your monitor, translate and rotate them in 2D and 3D, remove hidden lines, add shading, view in perspective, view intersections between surfaces, and display shadows cast from one object onto another. You will also learn how to visualize and analyze 2D and 3D data sets, and to fit lines, splines, and functions. The final chapter includes demonstrations from quantum mechanics, astronomy, and climate science. The book includes Python programs written in a clear and open style, with detailed explanations of the code.

What You Will Learn:
- How to create math and Python models of 2D and 3D shapes
- How to rotate, view in perspective, shade, remove hidden lines, display projected shadows, and more
- How to analyze and display data sets as curves and surfaces, and fit lines and functions

Who This Book Is For: Python developers, scientists, engineers, and students using Python to produce technical illustrations and to display and analyze data sets. Assumes familiarity with vectors, matrices, geometry, and trigonometry.
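The projection idea at the heart of the book can be sketched compactly; the following illustrative code (not taken from the book) rotates a 3D point about the z-axis and projects it orthographically onto the 2D screen plane:

```python
# Illustrative geometry (not the book's code): rotate a 3D point about
# the z-axis, then drop the depth coordinate for an orthographic
# projection onto the screen plane.
import math

def rotate_z(point, theta):
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def project_ortho(point):
    x, y, _ = point   # discard depth
    return (x, y)

p = (1.0, 0.0, 2.0)
p_rot = rotate_z(p, math.pi / 4)   # 45-degree rotation
print(project_ortho(p_rot))        # 2D screen coordinates
```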

Designing Fast Data Application Architectures

Today’s digital companies demand real-time insights and immediate action for everything from purchase to fulfillment, recommendation, and more. As a result, many organizations are adopting fast data applications to accelerate the value they extract from data as it flows into the system. With this practical ebook, you’ll learn the common architectural patterns that form the foundation of successful fast data deployments. Engineers from Lightbend identify the key characteristics of fast data architectures, separate them into functional blocks, and show you how to implement those functions using components like those in the SMACK stack (Spark, Mesos, Akka, Cassandra, and Kafka), as well as others. Architects will learn how to choose, combine, and run SMACK stack technologies to build the resilient, scalable, and responsive systems their companies require. This ebook examines:

- The anatomy of fast data applications: the application model, streaming data sources, processing engines, and data sinks
- The functional composition of the SMACK stack and its extensions
- The event backbone that connects all the major components of a fast data platform
- Compute engines for transforming data into valuable insights
- Storage systems that form the transition between the fast data domain and client applications
- Patterns you can use in the data serving layer, including data-driven microservices
- Container orchestrators in the substrate layer that provide resources to services, frameworks, and applications
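The event backbone role in such architectures typically falls to Kafka; a minimal consumer sketch with the kafka-python client is shown below, where the broker address and topic name are assumptions for illustration:

```python
# Hedged sketch: consuming from the Kafka event backbone with the
# kafka-python client. Broker address and topic name are assumptions.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)

for message in consumer:
    # Hand each event to the processing engine (Spark, Akka, etc.).
    print(message.topic, message.offset, message.value)
```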

Hands-On Data Visualization with Bokeh

Dive into the world of interactive data visualization with the Python library Bokeh. In this book, you will learn to create dynamic, engaging visualizations that communicate your data insights effectively. Starting with the basics of installation and setup, you will be guided through progressively advanced techniques to build visually appealing and interactive plots, concluding with hosting your Bokeh applications.

What this Book will help me do:
- Install and configure the Bokeh Python library for interactive data visualization projects.
- Create visually appealing and informative plots using Bokeh's glyph model.
- Leverage data structures like pandas and NumPy to visualize data efficiently.
- Enhance the interactivity and functionality of plots using widgets and layouts in Bokeh.
- Build and deploy professional-grade data visualization applications using the Bokeh Server.

Author(s): Jolly is an experienced data visualization expert and Python programmer specializing in creating interactive and insightful visualizations. With a passion for teaching and a knack for simplifying complex concepts, they bring a practical, hands-on approach to technical education, empowering professionals to communicate complex data through visually intuitive designs.

Who is it for? This book is intended for data professionals such as analysts and scientists who want to add interactivity to their visualizations using Python. Ideal readers have basic Python knowledge but are new to Bokeh. It is also for anyone curious about building data visualization web applications, moving beyond static charts to impactful interactive tools, and extending their data storytelling skills.
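A first plot of the kind the book opens with might look like the following sketch; the data is invented, and the glyph API shown is the classic one from the Bokeh releases of the book's era:

```python
# Illustrative first Bokeh plot: glyphs drawn on a figure and saved
# to a standalone HTML file. Data is made up.
from bokeh.plotting import figure, output_file, show

output_file("demo.html")

p = figure(title="Demo", x_axis_label="x", y_axis_label="y")
p.line([1, 2, 3, 4], [3, 1, 4, 2], line_width=2)    # line glyph
p.circle([1, 2, 3, 4], [3, 1, 4, 2], size=8)        # circle glyphs

show(p)   # opens the rendered plot in a browser
```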

Python vs. R for Data Science

Python and R are two of the mainstream languages in data science. Fundamentally, Python is a language for programmers, whereas R is a language for statisticians. In a data science context, there is a significant degree of overlap between the two languages' capabilities in regression analysis and machine learning. Your choice of language will depend largely on the environment in which you are operating: in a production environment, Python integrates with other languages much more seamlessly and is therefore the usual choice, whereas R is much more common in research environments due to its more extensive selection of libraries for statistical analysis.
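For the regression overlap mentioned above, the Python side might be as simple as this scikit-learn sketch with toy data (the rough R equivalent is lm(y ~ x)):

```python
# Toy illustration of the regression overlap: ordinary least squares
# with scikit-learn. Data is invented; the R counterpart is lm(y ~ x).
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])

model = LinearRegression().fit(x, y)
print(model.coef_[0], model.intercept_)   # fitted slope and intercept
```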