talk-data.com

Topic

Big Data

data_processing analytics large_datasets

Activity trend, 2020-Q1 to 2026-Q1: peak of 28 per quarter

Activities

1217 activities · Newest first

Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2

“This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm.” (From the Foreword by Raymie Stata, CEO of Altiscale.) The Insider’s Guide to Building Distributed, Big Data Applications with Apache Hadoop™ YARN. Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now, in Apache Hadoop™ YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances. YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment. You’ll find many examples drawn from the authors’ cutting-edge experience, first as Hadoop’s earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it. Coverage includes: YARN’s goals, design, architecture, and components, and how it expands the Apache Hadoop ecosystem; exploring YARN on a single node; administering YARN clusters and Capacity Scheduler; running existing MapReduce applications; developing a large-scale clustered YARN application; and discovering new open source frameworks that run under YARN.

Microsoft Big Data Solutions

Tap the power of Big Data with Microsoft technologies. Big Data is here, and Microsoft's new Big Data platform is a valuable tool to help your company get the very most out of it. This timely book shows you how to use HDInsight along with Hortonworks Data Platform for Windows to store, manage, analyze, and share Big Data throughout the enterprise. Focusing primarily on Microsoft and Hortonworks technologies but also covering open source tools, Microsoft Big Data Solutions explains best practices, covers on-premises and cloud-based solutions, and features valuable case studies. Best of all, it helps you integrate these new solutions with technologies you already know, such as SQL Server and Hadoop. The book: walks you through how to integrate Big Data solutions in your company using Microsoft's HDInsight Server, Hortonworks Data Platform for Windows, and open source tools; explores both on-premises and cloud-based solutions; shows how to store, manage, analyze, and share Big Data through the enterprise; covers topics such as Microsoft's approach to Big Data, installing and configuring Hortonworks Data Platform for Windows, integrating Big Data with SQL Server, visualizing data with Microsoft and Hortonworks BI tools, and more; helps you build and execute a Big Data plan; and includes contributions from the Microsoft and Hortonworks Big Data product teams. If you need a detailed roadmap for designing and implementing a fully deployed Big Data solution, you'll want Microsoft Big Data Solutions.

Optimizing Hadoop for MapReduce

"Optimizing Hadoop for MapReduce" is your comprehensive guide to getting the best performance out of your Hadoop-based big data processing jobs. With a focus on practical application rather than theory, this book delves into the nuances of MapReduce job design, execution, and optimization to help you harness the full power of this technology. What this book will help me do: understand the internal workings of Hadoop MapReduce and how it executes jobs; master key optimization techniques to improve Hadoop job efficiency and resource use; learn advanced MapReduce programming concepts to handle complex data processing tasks; analyze and monitor Hadoop job performance using practical tools and methods; and integrate best practices for scaling production workloads in a Hadoop cluster. Author(s): Khaled Tannir is a seasoned software engineer and an expert in distributed systems, big data, and cloud technologies. He has decades of experience designing and optimizing systems for high-performance data processing. Khaled's hands-on approach to explaining technical concepts ensures readers gain practical, applied knowledge that can be immediately implemented in real-world projects. Who is it for? This book is intended for developers, data engineers, and system architects who work with or are planning to work with Apache Hadoop. Ideal readers should have basic familiarity with Hadoop concepts and a foundational understanding of distributed systems. This book will benefit professionals looking to optimize their Hadoop-based applications or understand advanced usage of MapReduce. Whether you're aiming to improve your existing knowledge or implement high-performance data solutions, this book is tailored for you.

Big Data

Big Data is defined as "a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools. The challenges include capture, storage, search, sharing, analysis, and visualization." Big Data has always been a major challenge in geoinformatics, as geospatial databases are inherently very large. This book will integrate in a single volume techniques and technologies for storing and managing very large geospatial databases, and will help in developing new geoinformatics software and systems that involve very large databases.

2013 Data Science Salary Survey

What tools do successful data scientists and analysts use, and how much money do they make? We surveyed hundreds of attendees at the O'Reilly Strata Conferences in Santa Clara, California and New York to find out. Findings from the survey include: the average number of tools and median income for all respondents; the distribution of responses by age, location, industry, and position; a detailed analysis of the tools used by respondents and their correlation to salaries, including by tool clusters (Hadoop, SQL/Excel, and other); and the correlation between usage of specialized big data tools and salary. What tools should you be learning and using? Read this valuable report to gain insight from these potentially career-changing findings.

IBM InfoSphere Streams: Accelerating Deployments with Analytic Accelerators

This IBM® Redbooks® publication describes visual development, visualization, adapters, analytics, and accelerators for IBM InfoSphere® Streams (V3), a key component of the IBM Big Data platform. Streams was designed to analyze data in motion, and can perform analysis on incredibly high volumes with high velocity, using a wide variety of analytic functions and data types. The Visual Development environment extends Streams Studio with drag-and-drop development, provides round tripping with existing text editors, and is ideal for rapid prototyping. Adapters facilitate getting data in and out of Streams, and V3 supports WebSphere MQ, Apache Hadoop Distributed File System, and IBM InfoSphere DataStage. Significant analytics include the native Streams Processing Language, SPSS Modeler analytics, Complex Event Processing, TimeSeries Toolkit for machine learning and predictive analytics, Geospatial Toolkit for location-based applications, and Annotation Query Language for natural language processing applications. Accelerators for Social Media Analysis and Telecommunications Event Data Analysis sample programs can be modified to build production level applications. Want to learn how to analyze high volumes of streaming data or implement systems requiring high performance across nodes in a cluster? Then this book is for you. Please note that the additional material referenced in the text is not available from IBM.

Ask, Measure, Learn

You can measure practically anything in the age of social media, but if you don’t know what you’re looking for, collecting mountains of data won’t yield a grain of insight. This non-technical guide shows you how to extract significant business value from big data with Ask-Measure-Learn, a system that helps you ask the right questions, measure the right data, and then learn from the results. Authors Lutz Finger and Soumitra Dutta originally devised this system to help governments and NGOs sift through volumes of data. With this book, these two experts provide business managers and analysts with a high-level overview of the Ask-Measure-Learn system, and demonstrate specific ways to apply social media analytics to marketing, sales, public relations, and customer management, using examples and case studies.

Data Visualization For Dummies

A straightforward, full-color guide to showcasing data so your audience can see what you mean, not just read about it. Big data is big news! Every company, industry, not-for-profit, and government agency wants and needs to analyze and leverage datasets that can quickly become ponderously large. Data visualization software enables different industries to present information in ways that are memorable and relevant to their mission. This full-color guide introduces you to a variety of ways to handle and synthesize data in much more interesting ways than mere columns and rows of numbers. Learn meaningful ways to show trending and relationships, how to convey complex data in a clear, concise diagram, ways to create eye-catching visualizations, and much more! Effective data analysis involves learning how to synthesize data, especially big data, into a story and present that story in a way that resonates with the audience. This full-color guide shows you how to analyze large amounts of data, communicate complex data in a meaningful way, and quickly slice data into various views. It explains how to automate redundant reporting and analyses, create eye-catching visualizations, and use statistical graphics and thematic cartography, and it enables you to present vast amounts of data in ways that won't overwhelm your audience. Part technical manual and part analytical guidebook, Data Visualization For Dummies is the perfect tool for transforming dull tables and charts into high-impact visuals your audience will notice...and remember.

Business Analytics

This book explains how to use business analytics to sort through an ever-increasing amount of data and improve the decision-making capabilities of an organization. Covering the key areas of business analytics, the book explores the concepts, techniques, applications, and emerging trends that professionals across a wide range of industries need to be aware of. It also examines legal and privacy issues and explores social media in analytics. With this book, readers can develop the understanding required to use Big Data and high-performance computing in complex environments to improve strategic decision making.

Data Just Right: Introduction to Large-Scale Data & Analytics

Making Big Data Work: Real-World Use Cases and Examples, Practical Code, Detailed Solutions. Large-scale data analysis is now vitally important to virtually every business. Mobile and social technologies are generating massive datasets; distributed cloud computing offers the resources to store and analyze them; and professionals have radically new technologies at their command, including NoSQL databases. Until now, however, most books on “Big Data” have been little more than business polemics or product catalogs. Data Just Right is different: It’s a completely practical and indispensable guide for every Big Data decision-maker, implementer, and strategist. Michael Manoochehri, a former Google engineer and data hacker, writes for professionals who need practical solutions that can be implemented with limited resources and time. Drawing on his extensive experience, he helps you focus on building applications, rather than infrastructure, because that’s where you can derive the most value. Manoochehri shows how to address each of today’s key Big Data use cases in a cost-effective way by combining technologies in hybrid solutions. You’ll find expert approaches to managing massive datasets, visualizing data, building data pipelines and dashboards, choosing tools for statistical analysis, and more. Throughout, the author demonstrates techniques using many of today’s leading data analysis tools, including Hadoop, Hive, Shark, R, Apache Pig, Mahout, and Google BigQuery. Coverage includes: mastering the four guiding principles of Big Data success, and avoiding common pitfalls; emphasizing collaboration and avoiding problems with siloed data; hosting and sharing multi-terabyte datasets efficiently and economically; “building for infinity” to support rapid growth; developing a NoSQL Web app with Redis to collect crowd-sourced data; running distributed queries over massive datasets with Hadoop, Hive, and Shark; building a data dashboard with Google BigQuery; exploring large datasets with advanced visualization; implementing efficient pipelines for transforming immense amounts of data; automating complex processing with Apache Pig and the Cascading Java library; applying machine learning to classify, recommend, and predict incoming information; using R to perform statistical analysis on massive datasets; building highly efficient analytics workflows with Python and Pandas; establishing sensible purchasing strategies (when to build, buy, or outsource); and previewing emerging trends and convergences in scalable data technologies and the evolving role of the Data Scientist.

Big Data Application Architecture Q&A: A Problem - Solution Approach

Big Data Application Architecture Pattern Recipes provides an insight into heterogeneous infrastructures, databases, and visualization and analytics tools used for realizing the architectures of big data solutions. Its problem-solution approach helps in selecting the right architecture to solve the problem at hand. In the process of reading through these problems, you will learn to harness the power of new big data opportunities which various enterprises use to attain real-time profits. Big Data Application Architecture Pattern Recipes answers one of the most critical questions of this time: 'How do you select the best end-to-end architecture to solve your big data problem?' The book deals with various mission-critical problems encountered by solution architects, consultants, and software architects while dealing with the myriad options available for implementing a typical solution, trying to extract insight from huge volumes of data in real time and across multiple relational and non-relational data types for clients from industries like retail, telecommunication, banking, and insurance. The patterns in this book provide the strong architectural foundation required to launch your next big data application. The architectures for realizing these opportunities are based on relatively less expensive and heterogeneous infrastructures compared to the traditional monolithic and hugely expensive options that exist currently. This book describes and evaluates the benefits of heterogeneity, which brings with it multiple options for solving the same problem, evaluation of trade-offs, and validation of the 'fitness-for-purpose' of the solution. What you'll learn: major considerations in building a big data solution; big data application architecture problems for specific industries; what components one needs to build an end-to-end big data solution; whether one really needs a real-time big data solution or an off-line analytics batch solution; what the operations and support architectures for a big data solution are; and what the scalability considerations and options for a Hadoop installation are. Who this book is for: CIOs, CTOs, enterprise architects, and software architects; consultants, solution architects, and information management (IM) analysts who want to architect a big data solution for their enterprise.

Oracle NoSQL Database

Master Oracle NoSQL Database. Enable highly reliable, scalable, and available data. Oracle NoSQL Database: Real-Time Big Data Management for the Enterprise shows you how to take full advantage of this cost-effective solution for storing, retrieving, and updating high-volume, unstructured data. The book covers installation, configuration, application development, capacity planning and sizing, and integration with other enterprise data center products. Real-world examples illustrate the concepts presented in this Oracle Press guide. You will: understand Oracle NoSQL Database architecture and the underlying data storage engine, Oracle Berkeley DB; install and configure Oracle NoSQL Database for optimal performance; develop complex, distributed applications using a rich set of APIs; read and write data into the Oracle NoSQL Database key-value store; apply an Avro schema to the value portion of the key-value pair using Avro bindings; learn best practices for capacity planning and sizing an enterprise-level Oracle NoSQL Database deployment; and integrate Oracle NoSQL Database with Oracle Database, Oracle Event Processing, and Hadoop. Code examples from the book are available for download at www.OraclePressBooks.com.

Big Data Computing

Novel approaches and tools have emerged to tackle the challenges of Big Data. Moreover, the technology required for Big Data computing is developing at a satisfactory rate due to market forces and technological evolution. This book presents a mix of theory and industry cases that discuss the technical and practical issues related to Big Data in intelligent information management. It emphasizes the adoption and diffusion of Big Data tools and technologies in real practical applications.

Oracle® 12c For Dummies®

Demystifying the power of the Oracle 12c database. The Oracle database is the industry-leading relational database management system (RDBMS), used by organizations from small companies to the world's largest enterprises for their most critical business and analytical processing. Oracle 12c includes industry-leading enhancements to enable cloud computing and empowers users to manage both Big Data and traditional data structures faster and cheaper than ever before. Oracle 12c For Dummies is the perfect guide for a novice database administrator or an Oracle DBA who is new to Oracle 12c. The book covers what you need to know about Oracle 12c architecture, software tools, and how to successfully manage Oracle databases in the real world. It highlights the important features of Oracle 12c; explains how to create, populate, protect, tune, and troubleshoot a new Oracle database; and covers advanced Oracle 12c technologies, including Oracle Multitenant (the “pluggable database” concept), as well as several other key changes in this release. Make the most of Oracle 12c's improved efficiency, stronger security, and simplified management capabilities with Oracle 12c For Dummies.

Big Learning Data

In today’s wired world, we interact with millions of pieces of information every day. Capturing that information and making sense of it is the revolutionary impact of big data on business—and on learning. Thought leader Elliott Masie and Learning CONSORTIUM Members bring a powerful new book to the T&D profession. They provide a SWOT analysis of big data and implications for learning and development professionals. Big learning data is at your fingertips. You need to know why it matters. Find out where to start with big learning data. Think differently about the data you have. Understand the risks that come with big data. Solve problems using the new perspectives and measurement support that big learning data provides.

Securing Hadoop

"Securing Hadoop" provides a comprehensive guide to implementing and understanding security within a Hadoop-based Big Data ecosystem. The book explores key topics like authentication, authorization, and data encryption, ensuring you gain practical insights on how to protect sensitive information effectively and integrate security measures into your Hadoop platform. What this book will help me do: understand the key security challenges associated with Hadoop and Big Data platforms; learn how to implement Kerberos authentication and integrate it with Hadoop; master the configuration of authorization mechanisms for a secure Hadoop ecosystem; gain knowledge about security event auditing and monitoring techniques specifically for Hadoop; and get a detailed overview of tools and protocols to build and secure a Hadoop infrastructure effectively. Author(s): Sudheesh Narayan is an experienced professional in the fields of Hadoop and enterprise security. With years of expertise in designing and implementing secure distributed data platforms, Sudheesh brings practical insights and step-by-step solutions to Hadoop practitioners. His teaching approach is hands-on, ensuring readers can directly apply theoretical concepts to real-world scenarios. Who is it for? This book is ideal for Hadoop practitioners including solution architects, administrators, and developers seeking to enhance their understanding of security mechanisms for Hadoop. It assumes a foundational knowledge of Hadoop and requires familiarity with basic security concepts. Readers aiming to implement secure Hadoop systems for enterprise-level applications will find this book especially beneficial.

Successful Business Intelligence, Second Edition

Revised to cover new advances in business intelligence (big data, cloud, mobile, and more), this fully updated bestseller reveals the latest techniques to exploit BI for the highest ROI. “Cindi has created, with her typical attention to details that matter, a contemporary forward-looking guide that organizations could use to evaluate existing or create a foundation for evolving business intelligence / analytics programs. The book touches on strategy, value, people, process, and technology, all of which must be considered for program success. Among other topics, the data, data warehousing, and ROI comments were spot on. The ‘technobabble’ chapter was brilliant!” (Bill Frank, Business Intelligence and Data Warehousing Program Manager, Johnson & Johnson.) “If you want to be an analytical competitor, you’ve got to go well beyond business intelligence technology. Cindi Howson has wrapped up the needed advice on technology, organization, strategy, and even culture in a neat package. It’s required reading for quantitatively oriented strategists and the technologists who support them.” (Thomas H. Davenport, President’s Distinguished Professor, Babson College, and co-author, Competing on Analytics.) “Cindi has created an exceptional, authoritative description of the end-to-end business intelligence ecosystem. This is a great read for those who are just trying to better understand the business intelligence space, as well as for the seasoned BI practitioner.” (Sully McConnell, Vice President, Business Intelligence and Information Management, Time Warner Cable.) “Cindi’s book succinctly yet completely lays out what it takes to deliver BI successfully. IT and business leaders will benefit from Cindi’s deep BI experience, which she shares through helpful, real-world definitions, frameworks, examples, and stories. This is a must-read for companies engaged in, or considering, BI.” (Barbara Wixom, PhD, Principal Research Scientist, MIT Sloan Center for Information Systems Research.) Expanded to cover the latest advances in business intelligence such as big data, cloud, mobile, visual data discovery, and in-memory computing, this fully updated bestseller by BI guru Cindi Howson provides cutting-edge techniques to exploit BI for maximum value. Successful Business Intelligence: Unlock the Value of BI & Big Data, Second Edition describes best practices for an effective BI strategy. Find out how to: garner executive support to foster an analytic culture; align the BI strategy with business goals; develop an analytic ecosystem to exploit data warehousing, analytic appliances, and Hadoop for the right BI workload; continuously improve the quality, breadth, and timeliness of data; find the relevance of BI for everyone in the company; use agile development processes to deliver BI capabilities and improvements at the pace of business change; select the right BI tools to meet user and business needs; measure success in multiple ways; embrace innovation, promote successes and applications, and invest in training; and monitor your evolution and maturity across various factors for impact. Exclusive industry survey data and real-world case studies from Medtronic, Macy’s, 1-800 CONTACTS, The Dow Chemical Company, Netflix, Constant Contact, and other companies show successful BI initiatives in action. From Moneyball to Nate Silver, BI and big data have permeated our cultural, political, and economic landscape. This timely, up-to-date guide reveals how to plan and deploy an agile, state-of-the-art BI solution that links insight to action and delivers a sustained competitive advantage.

The Definitive Guide to MongoDB: A complete guide to dealing with Big Data using MongoDB, Second Edition

The Definitive Guide to MongoDB, Second Edition, is updated for the latest version and includes all of the latest MongoDB features, including the aggregation framework introduced in version 2.2 and hashed indexes in version 2.4. MongoDB is the most popular of the "Big Data" NoSQL database technologies, and it's still growing. David Hows from 10gen, along with experienced MongoDB authors Peter Membrey and Eelco Plugge, provide their expertise and experience in teaching you everything you need to know to become a MongoDB pro. The Definitive Guide to MongoDB, Second Edition, starts with the basics, including how to install on Windows, Linux, and OS X, and how MongoDB handles your data. Then you'll learn how to develop with MongoDB with both PHP and Python, including an example application using a PHP driver to create a blog application. Finally, you'll dig into more advanced but extremely important MongoDB features, including optimization, replication, and sharding, the load-balancing that makes MongoDB ideal for dealing with Big Data. If you're dealing with data, MongoDB should be on your must-learn list. The Definitive Guide to MongoDB, Second Edition, is just the book you need. What you'll learn: set up MongoDB on all major server platforms, including Windows, Linux, OS X, and cloud platforms like Rackspace, Azure, and Amazon EC2; work with GridFS and the new aggregation framework; work with your data using non-SQL commands; write applications using either PHP or Python; optimize MongoDB; and master MongoDB administration, including replication, replication tagging, and tag-aware sharding. Who this book is for: database admins and developers who need to get up to speed on MongoDB and its Big Data, NoSQL approach to dealing with data management.