talk-data.com

Topic: Big Data

Tags: data_processing, analytics, large_datasets

1217 tagged activities

Activity trend: peak of 28 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1217 activities · Newest first

Cassandra Design Patterns - Second Edition

Cassandra Design Patterns is your guide to harnessing the full potential of Apache Cassandra's distributed database capabilities through advanced design practices. Whether you're migrating from an RDBMS or implementing scalable storage for big data, this book provides clear strategies, practical examples, and real-world use cases demonstrating effective design patterns.

What this book will help me do:
• Integrate Cassandra with existing RDBMS solutions, enabling a hybrid data architecture.
• Understand and implement key design patterns for distributed, scalable databases.
• Master the transition from RDBMS or cache systems to Cassandra with minimal disruption.
• Dive into time-series and temporal data patterns that play to Cassandra's strengths.
• Apply the design patterns directly to real-world big data scenarios for analytics.

Author(s): Rajanarayanan Thottuvaikkatumana, the author of Cassandra Design Patterns, is an expert in distributed systems with extensive experience in designing and implementing big data solutions. His hands-on approach to Cassandra is evident throughout the book as he bridges theoretical knowledge with practical applications, and his approachable writing style aims to make complex concepts accessible.

Who is it for? This book is ideal for big data developers and system architects who are familiar with the basics of Cassandra and want to deepen their understanding of design patterns for robust applications. Readers should have experience with relational databases and a desire to migrate these concepts to, or integrate them with, NoSQL systems. Whether you're building for data scalability, high availability, or analytics, Cassandra Design Patterns positions itself as an essential resource.
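To make the time-series pattern the book highlights concrete, here is a minimal, hypothetical sketch using the DataStax Python driver (the keyspace, table, and column names are invented for illustration and are not taken from the book). The table partitions readings by sensor and day and clusters them by timestamp descending, which is the canonical Cassandra layout for recent-first temporal queries.

```python
# Hypothetical time-series sketch with the DataStax Python driver.
# Assumes a Cassandra node is reachable on localhost; all names are made up.
import datetime
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

# Partition by (sensor, day) so partitions stay bounded; cluster by timestamp
# descending so the newest readings come back first.
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.sensor_readings (
        sensor_id text,
        day date,
        reading_time timestamp,
        value double,
        PRIMARY KEY ((sensor_id, day), reading_time)
    ) WITH CLUSTERING ORDER BY (reading_time DESC)
""")

# Latest 10 readings for one sensor on one day: a single-partition query.
rows = session.execute(
    "SELECT reading_time, value FROM demo.sensor_readings "
    "WHERE sensor_id = %s AND day = %s LIMIT 10",
    ("sensor-42", datetime.date(2015, 9, 1)),
)
for row in rows:
    print(row.reading_time, row.value)
```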

Practical Google Analytics and Google Tag Manager for Developers

Whether you're a marketer with development skills or a full-on web developer/analyst, Practical Google Analytics and Google Tag Manager for Developers shows you how to implement Google Analytics using Google Tag Manager to jumpstart your web analytics measurement. There's a reason that so many organizations use Google Analytics. Effective collection of data with Google Analytics can reduce customer acquisition costs, provide priceless feedback on new product initiatives, and offer insights that will grow a customer or client base. So where does Google Tag Manager fit in? Google Tag Manager allows for unprecedented collaboration between marketing and technical teams, lightning-fast updates to your site, and standardization of the most common tags for on-site tracking and marketing efforts. To achieve the rich data you're really after to better serve your users' needs, you'll need the tools Google Tag Manager provides for a best-in-class implementation of Google Analytics measurement on your site. Written by data evangelist and Google Analytics expert Jonathan Weber and the team at LunaMetrics, this book offers foundational knowledge, a collection of practical Google Tag Manager recipes, well-tested best practices, and troubleshooting tips to get your implementation in tip-top condition. It covers topics including:
• Google Analytics implementation via Google Tag Manager
• How to customize Google Analytics for your unique situation
• Using Google Tag Manager to track and analyze interactions across multiple devices and touch points
• How to extract data from Google Analytics and use Google BigQuery to analyze Big Data questions
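As a rough sketch of the last point, here is what querying exported analytics data in BigQuery from Python can look like. The project, dataset, table, and column names below are placeholders invented for this example, not the book's own, and the schema is assumed.

```python
# Illustrative only: query a hypothetical analytics export table in BigQuery.
# Assumes application-default credentials are configured for the client.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
    SELECT page_path, COUNT(*) AS pageviews
    FROM `my-project.analytics_export.hits_20150901`
    GROUP BY page_path
    ORDER BY pageviews DESC
    LIMIT 10
"""

for row in client.query(sql).result():
    print(row.page_path, row.pageviews)
```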

Business Statistics Made Easy in SAS

Learn or refresh core statistical methods for business with SAS® and tackle real business analytics issues and techniques through a practical approach that avoids complex mathematics and instead employs easy-to-follow explanations.

Business Statistics Made Easy in SAS® is designed as a user-friendly, practice-oriented, introductory text to teach businesspeople, students, and others core statistical concepts and applications. It begins with absolute core principles and takes you through an overview of statistics, data and data collection, an introduction to SAS®, and basic statistics (descriptive statistics and basic associational statistics). The book also provides an overview of statistical modeling, effect size, statistical significance and power testing, basics of linear regression, introduction to comparison of means, basics of chi-square tests for categories, extrapolating statistics to business outcomes, and some topical issues in statistics, such as big data, simulation, machine learning, and data warehousing.

The book steers away from complex, mathematics-heavy explanations and avoids basing them on the traditional build-up of distributions, probability theory, and the like, which tends to lose the practice-oriented reader. Instead, it teaches the core ideas of statistics through careful, intuitive written explanations, easy-to-follow diagrams, step-by-step technique implementation, and interesting metaphors.

With no previous SAS experience necessary, Business Statistics Made Easy in SAS® is an ideal introduction for beginners. It is suitable for introductory undergraduate classes, postgraduate courses such as MBA refresher classes, and for the business practitioner. It is compatible with SAS® University Edition.
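The book itself works in SAS; since this listing contains no SAS code, here is a small Python analogue of one technique it covers, a chi-square test for categorical data, run on a made-up 2x2 contingency table (purchase vs. region). It is a sketch of the idea, not the book's example.

```python
# Chi-square test of independence on an invented contingency table.
from scipy.stats import chi2_contingency

observed = [[120,  80],   # region A: purchased, did not purchase
            [ 90, 110]]   # region B: purchased, did not purchase

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}, dof = {dof}")
# A small p-value suggests purchase behaviour differs between the regions.
```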

Advanced Data Management

Advanced data management has always been at the core of efficient database and information systems. Recent trends like big data and cloud computing have intensified the need for sophisticated and flexible data storage and processing solutions. This book provides comprehensive coverage of the principles of data management developed in recent decades, with a focus on data structures and query languages. It treats a wealth of different data models and surveys the foundations of structuring, processing, storing, and querying data according to these models. Starting off with the topic of database design, it discusses weaknesses of the relational data model, and then proceeds to convey the basics of graph data, tree-structured XML data, key-value pairs and nested, semi-structured JSON data, columnar and record-oriented data, as well as object-oriented data. The final chapters round the book off with an analysis of fragmentation, replication, and consistency strategies for data management in distributed databases, as well as recommendations for handling polyglot persistence in multi-model databases and multi-database architectures.

While primarily geared towards students of Master's-level courses in Computer Science and related areas, this book may also be of benefit to practitioners looking for a reference book on data modeling and query processing. It provides both theoretical depth and a concise treatment of open source technologies currently on the market.
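As a toy illustration (not taken from the book) of how the same small fact can be expressed under several of the data models the book surveys, the snippet below shows one user record and a "follows" relationship as a relational tuple, a key-value pair, a JSON-style document, and property-graph edges.

```python
# One record, four of the data models discussed: all names are invented.
relational_row = ("alice", "Alice Smith", "2014-03-01")   # users(id, name, joined)

key_value = {"user:alice": "Alice Smith"}                 # opaque value per key

json_document = {                                          # nested, semi-structured
    "id": "alice",
    "name": "Alice Smith",
    "follows": ["bob", "carol"],
}

graph_edges = [("alice", "FOLLOWS", "bob"),                # property-graph style
               ("alice", "FOLLOWS", "carol")]

print(relational_row, key_value, json_document, graph_edges, sep="\n")
```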

Learning Bayesian Models with R

Dive into the world of Bayesian Machine Learning with "Learning Bayesian Models with R." This comprehensive guide introduces the foundations of probability theory and Bayesian inference, teaches you how to implement these concepts with the R programming language, and progresses to practical techniques for supervised and unsupervised problems in data science.

What this book will help me do:
• Understand and set up an R environment for Bayesian modeling.
• Build Bayesian models, including linear regression and classification, for predictive analysis.
• Learn to apply Bayesian inference to real-world machine learning problems.
• Work with big data and high-performance computation frameworks like Hadoop and Spark.
• Master advanced Bayesian techniques and apply them to deep learning and AI challenges.

Author(s): Hari Manassery Koduvely is a proficient data scientist with extensive experience in leveraging Bayesian frameworks for real-world applications. His passion for Bayesian Machine Learning is evident in his approachable and detailed teaching methodology, aimed at making these complex topics accessible for practitioners.

Who is it for? This book is best suited for data scientists, analysts, and statisticians familiar with R and basic probability theory who aim to enhance their expertise in Bayesian approaches. It's ideal for professionals tackling machine learning challenges in applied data contexts. If you're looking to incorporate advanced probabilistic methods into your projects, this guide will show you how.
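The book works in R; as a language-neutral illustration of the Bayesian updating it teaches, here is a conjugate Beta-Binomial sketch in Python. The prior, data, and interpretation are invented for the example: a Beta(2, 2) prior on a conversion rate, updated after observing 30 successes in 100 trials.

```python
# Conjugate Beta-Binomial update: posterior = Beta(a + successes, b + failures).
from scipy.stats import beta

prior_a, prior_b = 2, 2
successes, trials = 30, 100

posterior = beta(prior_a + successes, prior_b + (trials - successes))

print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```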

Learning to Love Data Science

Until recently, many people thought big data was a passing fad. "Data science" was an enigmatic term. Today, big data is taken seriously, and data science is considered downright sexy. With this anthology of reports from award-winning journalist Mike Barlow, you'll appreciate how data science is fundamentally altering our world, for better and for worse. Barlow paints a picture of the emerging data space in broad strokes. From new techniques and tools to the use of data for social good, you'll find out how far data science reaches. With this anthology, you'll learn how:
• Analysts can now get results from their data queries in near real time
• Indie manufacturers are blurring the lines between hardware and software
• Companies try to balance their desire for rapid innovation with the need to tighten data security
• Advanced analytics and low-cost sensors are transforming equipment maintenance from a cost center to a profit center
• CIOs have gradually evolved from order takers to business innovators
• New analytics tools let businesses go beyond data analysis and straight to decision-making

Mike Barlow is an award-winning journalist, author, and communications strategy consultant. Since launching his own firm, Cumulus Partners, he has represented major organizations in a number of industries.

Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem

Get Started Fast with Apache Hadoop® 2, YARN, and Today's Hadoop Ecosystem. With Hadoop 2.x and YARN, Hadoop moves beyond MapReduce to become practical for virtually any type of data processing. Hadoop 2.x and the Data Lake concept represent a radical shift away from conventional approaches to data usage and storage. Hadoop 2.x installations offer unmatched scalability and breakthrough extensibility that supports new and existing Big Data analytics processing methods and models. Hadoop® 2 Quick-Start Guide is the first easy, accessible guide to Apache Hadoop 2.x, YARN, and the modern Hadoop ecosystem. Building on his unsurpassed experience teaching Hadoop and Big Data, author Douglas Eadline covers all the basics you need to know to install and use Hadoop 2 on personal computers or servers, and to navigate the powerful technologies that complement it. Eadline concisely introduces and explains every key Hadoop 2 concept, tool, and service, illustrating each with a simple "beginning-to-end" example and identifying trustworthy, up-to-date resources for learning more. This guide is ideal if you want to learn about Hadoop 2 without getting mired in technical details. Douglas Eadline will bring you up to speed quickly, whether you're a user, admin, devops specialist, programmer, architect, analyst, or data scientist.

Coverage includes:
• Understanding what Hadoop 2 and YARN do, and how they improve on Hadoop 1 with MapReduce
• Understanding Hadoop-based Data Lakes versus RDBMS Data Warehouses
• Installing Hadoop 2 and core services on Linux machines, virtualized sandboxes, or clusters
• Exploring the Hadoop Distributed File System (HDFS)
• Understanding the essentials of MapReduce and YARN application programming
• Simplifying programming and data movement with Apache Pig, Hive, Sqoop, Flume, Oozie, and HBase
• Observing application progress, controlling jobs, and managing workflows
• Managing Hadoop efficiently with Apache Ambari, including recipes for the HDFS-to-NFSv3 gateway, HDFS snapshots, and YARN configuration
• Learning basic Hadoop 2 troubleshooting, and installing Apache Hue and Apache Spark

Sams Teach Yourself: Big Data Analytics with Microsoft HDInsight in 24 Hours

In just 24 lessons of one hour or less, Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours helps you leverage Hadoop's power on a flexible, scalable cloud platform using Microsoft's newest business intelligence, visualization, and productivity tools. This book's straightforward, step-by-step approach shows you how to provision, configure, monitor, and troubleshoot HDInsight and use Hadoop cloud services to solve real analytics problems. You'll gain more of Hadoop's benefits, with less complexity, even if you're completely new to Big Data analytics. Every lesson builds on what you've already learned, giving you a rock-solid foundation for real-world success. Practical, hands-on examples show you how to apply what you learn; quizzes and exercises help you test your knowledge and stretch your skills; notes and tips point out shortcuts and solutions.

Learn how to:
• Master core Big Data and NoSQL concepts, value propositions, and use cases
• Work with key Hadoop features, such as HDFS2 and YARN
• Quickly install, configure, and monitor Hadoop (HDInsight) clusters in the cloud
• Automate provisioning, customize clusters, install additional Hadoop projects, and administer clusters
• Integrate, analyze, and report with Microsoft BI and Power BI
• Automate workflows for data transformation, integration, and other tasks
• Use Apache HBase on HDInsight
• Use Sqoop or SSIS to move data to or from HDInsight
• Perform R-based statistical computing on HDInsight datasets
• Accelerate analytics with Apache Spark
• Run real-time analytics on high-velocity data streams
• Write MapReduce, Hive, and Pig programs

Register your book at informit.com/register for convenient access to downloads, updates, and corrections as they become available.

Data Preparation in the Big Data Era

Preparing and cleaning data is notoriously expensive, prone to error, and time consuming: the process accounts for roughly 80% of the total time spent on analysis. As this O'Reilly report points out, enterprises have already invested billions of dollars in big data analytics, so there's great incentive to modernize methods for cleaning, combining, and transforming data. Author Federico Castanedo, Chief Data Scientist at WiseAthena.com, details best practices for reducing the time it takes to convert raw data into actionable insights. With these tools and techniques in mind, your organization will be well positioned to translate big data into big decisions. The report shows you how to:
• Explore the problems organizations face today with traditional prep and integration
• Define the business questions you want to address before selecting, prepping, and analyzing data
• Learn new methods for preparing raw data, including date-time and string data
• Understand how some cleaning actions (like replacing missing values) affect your analysis
• Examine data curation products: modern approaches that scale
• Consider your business audience when choosing ways to deliver your analysis
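A minimal sketch (not the report's own code) of the kinds of preparation it discusses: parsing date-time strings, cleaning string columns, and replacing missing values with pandas. The column names and sample values are invented for the example.

```python
# Small data-preparation sketch with pandas on invented records.
import pandas as pd
import numpy as np

raw = pd.DataFrame({
    "order_date": ["2015-01-03", "2015-01-04", "not a date"],
    "customer":   ["  Acme Corp", "acme corp ", "Widget Co"],
    "amount":     [120.0, np.nan, 75.5],
})

clean = raw.copy()
# Unparseable dates become NaT instead of raising.
clean["order_date"] = pd.to_datetime(clean["order_date"], errors="coerce")
# Normalise string columns: trim whitespace, standardise casing.
clean["customer"] = clean["customer"].str.strip().str.title()
# Replace missing amounts with the column median.
clean["amount"] = clean["amount"].fillna(clean["amount"].median())

print(clean)
```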

Private and Open Data in Asia: A Regional Guide

The rise of big data in recent years coincides with the economic and political rise of Asia, especially among the five countries that make up the bulk of the East Asian Internet-using population: China, Japan, South Korea, India, and Indonesia. If you're thinking of entering the Asian market, this O'Reilly report provides an overview of the current state of big data and open data in these countries, and helps you examine whether the benefits of doing business with them outweigh the costs. While Japan and South Korea are highly developed countries with lofty Internet penetration rates, China, India, and Indonesia have enormous populations, relatively low Internet penetration, and enormous growth potential. But access to open data from fields such as healthcare, education, agriculture, transportation, energy, and finance, data vital for building businesses and services, varies from country to country. Each of them has a distinctive character reflecting its national priorities. To help you assess risk vs. opportunity in the Asian market, author Franklin Lu reviews these five countries individually to reveal the nature of data privacy laws, open data initiatives, and existing businesses.

Big Data for Chimps

Finding patterns in massive event streams can be difficult, but learning how to find them doesn't have to be. This unique hands-on guide shows you how to solve this and many other problems in large-scale data processing with simple, fun, and elegant tools that leverage Apache Hadoop. You'll gain a practical, actionable view of big data by working with real data and real problems. Perfect for beginners, this book's approach will also appeal to experienced practitioners who want to brush up on their skills. Part I explains how Hadoop and MapReduce work, while Part II covers many analytic patterns you can use to process any data. As you work through several exercises, you'll also learn how to use Apache Pig to process data. With this book you will:
• Learn the necessary mechanics of working with Hadoop, including how data and computation move around the cluster
• Dive into map/reduce mechanics and build your first map/reduce job in Python (see the sketch after this entry)
• Understand how to run chains of map/reduce jobs in the form of Pig scripts
• Use a real-world dataset, baseball performance statistics, throughout the book
• Work with examples of several analytic patterns, and learn when and where you might use them
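Here is a hypothetical "first map/reduce job in Python" in the spirit the book describes: a word count written for Hadoop Streaming. The file name, usage, and commands are illustrative and not the book's own code; the reducer assumes its input is sorted by key, as Hadoop's shuffle guarantees.

```python
# wordcount.py -- word count as a Hadoop Streaming mapper/reducer pair.
import sys


def mapper():
    # Emit "<word>\t1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word.lower()}\t1")


def reducer():
    # Sum counts for consecutive identical keys (input must be key-sorted).
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word != current_word:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")


if __name__ == "__main__":
    # Local test:  cat input.txt | python wordcount.py map | sort | python wordcount.py reduce
    # On Hadoop:   pass "python wordcount.py map" and "python wordcount.py reduce"
    #              to the hadoop-streaming jar as the mapper and reducer commands.
    mapper() if sys.argv[1] == "map" else reducer()
```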

Data Analysis in the Cloud

Data Analysis in the Cloud introduces and discusses models, methods, techniques, and systems to analyze the large number of digital data sources available on the Internet using the computing and storage facilities of the cloud. Coverage includes scalable data mining and knowledge discovery techniques together with cloud computing concepts, models, and systems. Specific sections focus on map-reduce and NoSQL models. The book also includes techniques for conducting high-performance distributed analysis of large data on clouds. Finally, the book examines research trends such as Big Data pervasive computing, data-intensive exascale computing, and massive social network analysis. The book:
• Introduces data analysis techniques and cloud computing concepts
• Describes cloud-based models and systems for Big Data analytics
• Provides examples of the state of the art in cloud data analysis
• Explains how to develop large-scale data mining applications on clouds
• Outlines the main research trends in the area of scalable Big Data analysis

SAP in 24 Hours, Sams Teach Yourself, Fifth Edition

Thoroughly updated and expanded! Includes new coverage on HANA, the cloud, and using SAP's applications! In just 24 sessions of one hour or less, you'll get up and running with the latest SAP technologies, applications, and solutions. Using a straightforward, step-by-step approach, each lesson strengthens your understanding of SAP from both a business and technical perspective, helping you gain practical mastery from the ground up on topics such as security, governance, validations, release management, SLAs, and legal issues. Step-by-step instructions carefully walk you through the most common questions, issues, and tasks. Quizzes and exercises help you build and test your knowledge. Notes present interesting pieces of information. Tips offer advice or teach an easier way to do something. Cautions advise you about potential problems and help you steer clear of disaster.

Learn how to:
• Understand SAP terminology, concepts, and solutions
• Install SAP on premises or in the cloud
• Master SAP's revamped user interface
• Discover how and when to use in-memory HANA databases
• Integrate SAP Software as a Service (SaaS) solutions such as Ariba, SuccessFactors, Fieldglass, and hybris
• Find resources at SAP's Service Marketplace, Developer Network, and Help Portal
• Avoid pitfalls in SAP project implementation, migration, and upgrades
• Discover how SAP fits with mobile devices, social media, big data, and the Internet of Things
• Start or accelerate your career working with SAP technologies

Beginning Big Data with Power BI and Excel 2013

In Beginning Big Data with Power BI and Excel 2013, you will learn to solve business problems by tapping the power of Microsoft's Excel and Power BI to import data from NoSQL and SQL databases and other sources, create relational data models, and analyze business problems through sophisticated dashboards and data-driven maps. While Beginning Big Data with Power BI and Excel 2013 covers prominent tools such as Hadoop and the NoSQL databases, it recognizes that most small and medium-sized businesses don't have the Big Data processing needs of a Netflix, Target, or Facebook. Instead, it shows how to import data and use the self-service analytics available in Excel with Power BI. As you'll see through the book's numerous case examples, these tools, which you already know how to use, can perform many of the same functions as the higher-end Apache tools many people believe are required to carry out Big Data projects.

Through instruction, insight, advice, and case studies, Beginning Big Data with Power BI and Excel 2013 will show you how to:
• Import and mash up data from web pages, SQL and NoSQL databases, the Azure Marketplace, and other sources
• Tap into the analytical power of PivotTables and PivotCharts and develop relational data models to track trends and make predictions based on a wide range of data
• Understand basic statistics and use Excel with Power BI to do sophisticated statistical analysis, including identifying trends and correlations
• Use SQL within Excel to do sophisticated queries across multiple tables, including NoSQL databases
• Create complex formulas to solve real-world business problems using Data Analysis Expressions (DAX)
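The book's examples live in Excel and DAX; as a rough analogue of the PivotTable-style analysis it describes, here is a short pandas sketch that summarises invented sales records by region and product. The data and column names are made up for illustration.

```python
# Pandas analogue of a PivotTable: sum of revenue by region and product.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100, 150, 200, 120, 80],
})

pivot = pd.pivot_table(sales, values="revenue", index="region",
                       columns="product", aggfunc="sum", fill_value=0)
print(pivot)
```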

Getting Data Right

Over the last 20 years, companies have invested roughly $3-4 trillion in enterprise software. These investments have been primarily focused on the development and deployment of single systems, applications, functions, and geographies targeted at the automation and optimization of key business processes. Companies are now investing heavily in big data analytics ($44 billion in 2014 alone) in an effort to begin analyzing all of the data being generated by their process automation systems. But companies are quickly realizing that one of their key bottlenecks is Data Variety: the siloed nature of the data that is a natural result of internal and external source proliferation. The problem of big data variety has crept up from the bottom, and the cost of variety is only appreciated when companies attempt to ask simple questions across many business silos (divisions, geographies, functions, etc.). Current top-down, deterministic data unification approaches (such as ETL, ELT, and MDM) were simply not designed to scale to the variety of hundreds, thousands, or even tens of thousands of data silos.

Download this free eBook to learn about the fundamental challenges that Data Variety poses to enterprises looking to maximize the value of their existing investments, and how new approaches promise to help organizations embrace and leverage the fundamental diversity of data. Readers will also find best practices for designing bottom-up and probabilistic methods for finding and managing data; principles for doing data science at scale in the big data era; approaches to preparing and unifying data in ways that complement existing systems; guidance on optimizing data warehousing; and ways to use "data ops" to automate large-scale integration.
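As a toy illustration (not from the eBook) of the bottom-up, probabilistic matching it advocates over hand-written deterministic rules, the sketch below scores likely duplicate customer names drawn from two invented "silos" with a simple string-similarity ratio; the threshold is arbitrary.

```python
# Naive probabilistic record matching across two invented data silos.
from difflib import SequenceMatcher

silo_a = ["Acme Corporation", "Widget Co.", "Globex Inc"]
silo_b = ["ACME Corp", "Globex Incorporated", "Initech"]

def similarity(x: str, y: str) -> float:
    # Case-insensitive similarity between two strings, in [0, 1].
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

for a in silo_a:
    for b in silo_b:
        score = similarity(a, b)
        if score > 0.6:                      # threshold chosen arbitrarily
            print(f"possible match ({score:.2f}): {a!r} ~ {b!r}")
```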

Managing the Data Lake

Organizations across many industries have recently created fast-growing repositories to deal with an influx of new data from many sources and often in multiple formats. To manage these data lakes, companies have begun to leave the familiar confines of relational databases and data warehouses for Hadoop and various big data solutions. But adopting new technology alone won't solve the problem. Based on interviews with several experts in data management, author Andy Oram provides an in-depth look at common issues you're likely to encounter as you consider how to manage business data. You'll explore five key topic areas, including:
• Acquisition and ingestion: how to solve these problems with a degree of automation
• Metadata: how to keep track of when data came in and how it was formatted, and how to make it available at later stages of processing
• Data preparation and cleaning: what you need to know before you prepare and clean your data, and what needs to be cleaned up and how
• Organizing workflows: what you should do to combine your tasks (ingestion, cataloging, and data preparation) into an end-to-end workflow
• Access control: how to address security and access controls at all stages of data handling

Andy Oram, an editor at O'Reilly Media since 1992, currently specializes in programming. His work for O'Reilly includes the first books on Linux ever published commercially in the United States.

Mapping Big Data

To discover the shape and structure of the big data market, the San Francisco-based startup Relato took a unique approach to market research and created the first fully data-driven market report. Company CEO Russell Jurney and his team collected and analyzed raw data from a variety of sources to reveal a boatload of business insights about the big data space. This exceptional report is now available for free download. Using data analytic techniques such as social network analysis (SNA), Relato exposed the vast and complex partnership network that exists among tens of thousands of unique big data vendors. The dataset Relato collected is centered around Cloudera, Hortonworks, and MapR, the major platform vendors of Hadoop, the primary force behind this market. From this snowball sample, a 2-hop network, the Relato team was able to answer several questions, including: Who are the major players in the big data market? Which is the leading Hadoop vendor? What sectors are included in this market and how do they relate? Which among the thousands of partnerships are most important? Who’s doing business with whom? Metrics used in this report are also visible in Relato’s interactive web application, via a link in the report, which walks you through the insights step-by-step.
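To make the report's "snowball sample, a 2-hop network" idea concrete, here is an illustrative sketch on a tiny invented partnership graph using networkx. Apart from the three Hadoop platform vendors the report centres on, the vendor names are placeholders, and the metrics are only stand-ins for the ones the report computes.

```python
# 2-hop ego network and a simple centrality measure on an invented graph.
import networkx as nx

partnerships = [
    ("Cloudera", "VendorA"), ("Cloudera", "VendorB"),
    ("Hortonworks", "VendorB"), ("MapR", "VendorC"),
    ("VendorA", "VendorD"), ("VendorC", "VendorE"),
]
G = nx.Graph(partnerships)

# Everything reachable within two hops of one seed vendor (snowball sample).
two_hop = nx.ego_graph(G, "Cloudera", radius=2)
print("2-hop network around Cloudera:", sorted(two_hop.nodes()))

# Degree centrality as a rough proxy for "who are the major players".
top = sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1])[:3]
print("most connected vendors:", top)
```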

Sharing Big Data Safely

Many big data-driven companies today are moving to protect certain types of data against intrusion, leaks, or unauthorized eyes. But how do you lock down data while granting access to people who need to see it? In this practical book, authors Ted Dunning and Ellen Friedman offer two novel and practical solutions that you can implement right away.

Apache Spark Graph Processing

Dive into the world of large-scale graph data processing with Apache Spark's GraphX API. This book introduces you to the core concepts of graph analytics and teaches you how to leverage Spark for handling and analyzing massive graphs. From building to analyzing, you'll acquire a comprehensive skill set for working with graph data efficiently.

What this book will help me do:
• Utilize the Apache Spark GraphX API to process and analyze graph data.
• Transform raw datasets into sophisticated graph structures.
• Explore visualization and analysis techniques for understanding graphs.
• Understand and build custom graph operations tailored to your needs.
• Implement advanced graph algorithms such as clustering and iterative processing.

Author(s): Rindra Ramamonjison is a seasoned data engineer with vast experience in big data technologies and graph processing. With a passion for explaining complex concepts in simple terms, Rindra builds on his professional expertise to guide readers in mastering cutting-edge Spark tools.

Who is it for? This book is tailored for data scientists and software developers looking to delve into graph data processing at scale. Ideal for those with basic knowledge of Scala and Apache Spark, it equips readers with the tools and techniques to derive insights from complex network datasets. Whether you're diving deeper into big data or exploring graph-specific analytics, this book is your guide.

Statistics for Big Data For Dummies

The fast and easy way to make sense of statistics for big data. Does the subject of data analysis make you dizzy? You've come to the right place! Statistics For Big Data For Dummies breaks this often-overwhelming subject down into easily digestible parts, offering new and aspiring data analysts the foundation they need to be successful in the field. Inside, you'll find an easy-to-follow introduction to exploratory data analysis; the lowdown on collecting, cleaning, and organizing data; everything you need to know about interpreting data using common software and programming languages; plain-English explanations of how to make sense of data in the real world; and much more. Data has never been easier to come by, and the tools students and professionals need to enter the world of big data are based on applied statistics. While the word "statistics" alone can evoke feelings of anxiety in even the most confident student or professional, it doesn't have to. Written in the familiar and friendly tone that has defined the For Dummies brand for more than twenty years, Statistics For Big Data For Dummies takes the intimidation out of the subject, offering clear explanations and tons of step-by-step instruction to help you make sense of data mining, without losing your cool. The book:
• Helps you identify valid, useful, and understandable patterns in data
• Provides guidance on extracting previously unknown information from large databases
• Shows you how to discover patterns available in big data
• Gives you access to the latest tools and techniques for working in big data

If you're a student enrolled in a related Applied Statistics course or a professional looking to expand your skill set, Statistics For Big Data For Dummies gives you access to everything you need to succeed.