talk-data.com talk-data.com

Topic

Big Data

data_processing analytics large_datasets

1217

tagged

Activity Trend

28 peak/qtr
2020-Q1 2026-Q1

Activities

1217 activities · Newest first

Real-World Hadoop

If you’re a business team leader, CIO, business analyst, or developer interested in how Apache Hadoop and Apache HBase-related technologies can address problems involving large-scale data in cost-effective ways, this book is for you. Using real-world stories and situations, authors Ted Dunning and Ellen Friedman show Hadoop newcomers and seasoned users alike how NoSQL databases and Hadoop can solve a variety of business and research issues. You’ll learn about early decisions and pre-planning that can make the process easier and more productive. If you’re already using these technologies, you’ll discover ways to gain the full range of benefits possible with Hadoop. While you don’t need a deep technical background to get started, this book does provide expert guidance to help managers, architects, and practitioners succeed with their Hadoop projects. Examine a day in the life of big data: India’s ambitious Aadhaar project Review tools in the Hadoop ecosystem such as Apache’s Spark, Storm, and Drill to learn how they can help you Pick up a collection of technical and strategic tips that have helped others succeed with Hadoop Learn from several prototypical Hadoop use cases, based on how organizations have actually applied the technology Explore real-world stories that reveal how MapR customers combine use cases when putting Hadoop and NoSQL to work, including in production Ted Dunning is Chief Applications Architect at MapR Technologies, and committer and PMC member of the Apache’s Drill, Storm, Mahout, and ZooKeeper projects. He is also mentor for Apache’s Datafu, Kylin, Zeppelin, Calcite, and Samoa projects. Ellen Friedman is a solutions consultant, speaker, and author, writing mainly about big data topics. She is a committer for the Apache Mahout project and a contributor to the Apache Drill project.

Storm Applied

Storm Applied is a practical guide to using Apache Storm for the real-world tasks associated with processing and analyzing real-time data streams. This immediately useful book starts by building a solid foundation of Storm essentials so that you learn how to think about designing Storm solutions the right way from day one. But it quickly dives into real-world case studies that will bring the novice up to speed with productionizing Storm. About the Technology It's hard to make sense out of data when it's coming at you fast. Like Hadoop, Storm processes large amounts of data but it does it reliably and in real time, guaranteeing that every message will be processed. Storm allows you to scale with your data as it grows, making it an excellent platform to solve your big data problems. About the Book Storm Applied is an example-driven guide to processing and analyzing real-time data streams. This immediately useful book starts by teaching you how to design Storm solutions the right way. Then, it quickly dives into real-world case studies that show you how to scale a high-throughput stream processor, ensure smooth operation within a production cluster, and more. Along the way, you'll learn to use Trident for stateful stream processing, along with other tools from the Storm ecosystem. What's Inside Mapping real problems to Storm components Performance tuning and scaling Practical troubleshooting and debugging Exactly-once processing with Trident About the Reader This book moves through the basics quickly. While prior experience with Storm is not assumed, some experience with big data and real-time systems is helpful. About the Authors Sean Allen, Matthew Jankowski, and Peter Pathirana lead the development team for a high-volume, search-intensive commercial web application at TheLadders. Quotes Will no doubt become the definitive practitioner’s guide for Storm users. - From the Foreword by Andrew Montalenti The book’s practical approach to Storm will save you a lot of hassle and a lot of time. - Tanguy Leroux, Elasticsearch Great introduction to distributed computing with lots of real-world examples. - Shay Elkin, Tangent Logic Go beyond the MapReduce way of thinking to solve big data problems. - Muthusamy Manigandan, OzoneMedia

From Big Data to Smart Data

A pragmatic approach to Big Data by taking the reader on a journey between Big Data (what it is) and the Smart Data (what it is for). Today's decision making can be reached via information (related to the data), knowledge (related to people and processes), and timing (the capacity to decide, act and react at the right time). The huge increase in volume of data traffic, and its format (unstructured data such as blogs, logs, and video) generated by the "digitalization" of our world modifies radically our relationship to the space (in motion) and time, dimension and by capillarity, the enterprise vision of performance monitoring and optimization.

Mastering Apache Cassandra - Second Edition

Mastering Apache Cassandra - Second Edition is your comprehensive guide to understanding and utilizing the power of Cassandra, an efficient and scalable NoSQL database. Throughout this book, you will learn how to design, deploy, and manage Cassandra databases effectively, tailored to your application's needs. What this Book will help me do Understand the architecture of Apache Cassandra and how it ensures scalability and reliability. Learn to build, configure, and deploy a Cassandra database cluster for high performance. Develop skills in monitoring and tuning Cassandra clusters for optimal operation. Gain expertise in managing clusters through scaling, node repair, and backup strategies. Integrate Apache Cassandra with other tools and your application seamlessly. Author(s) Nishant Neeraj is an experienced software developer and database engineer with a focus on delivering high-performance solutions. They have extensive hands-on experience with NoSQL databases, especially Apache Cassandra, and bring their practical insights and in-depth technical knowledge to this book to help readers tackle real-world challenges. Who is it for? This book is ideal for intermediate developers aiming to enhance their expertise in NoSQL databases. If you have a foundational understanding of database concepts and want to bring your skills to a professional level by mastering Apache Cassandra for modern applications, this book is perfect for you. It provides actionable insights and guidance suitable for professionals tackling high concurrency and big data challenges. Whether you are a developer, database administrator, or architect, this book provides a targeted deep dive into Cassandra.

Social Big Data Mining

This book focuses on the basic concepts and the related technologies of data mining for social medial. Topics include: big data and social data, data mining for making a hypothesis, multivariate analysis for verifying the hypothesis, web mining and media mining, natural language processing, social big data applications, and scalability. It explains analytical techniques such as modeling, data mining, and multivariate analysis for social big data. This book is different from other similar books in that presents the overall picture of social big data from fundamental concepts to applications while standing on academic bases.

Big Data

Convert the promise of big data into real world results There is so much buzz around big data. We all need to know what it is and how it works - that much is obvious. But is a basic understanding of the theory enough to hold your own in strategy meetings? Probably. But what will set you apart from the rest is actually knowing how to USE big data to get solid, real-world business results - and putting that in place to improve performance. Big Data will give you a clear understanding, blueprint, and step-by-step approach to building your own big data strategy. This is a well-needed practical introduction to actually putting the topic into practice. Illustrated with numerous real-world examples from a cross section of companies and organisations, Big Data will take you through the five steps of the SMART model: Start with Strategy, Measure Metrics and Data, Apply Analytics, Report Results, Transform. Discusses how companies need to clearly define what it is they need to know Outlines how companies can collect relevant data and measure the metrics that will help them answer their most important business questions Addresses how the results of big data analytics can be visualised and communicated to ensure key decisions-makers understand them Includes many high-profile case studies from the author's work with some of the world's best known brands

Data Science For Dummies

Discover how data science can help you gain in-depth insight into your business - the easy way! Jobs in data science abound, but few people have the data science skills needed to fill these increasingly important roles in organizations. Data Science For Dummies is the perfect starting point for IT professionals and students interested in making sense of their organization's massive data sets and applying their findings to real-world business scenarios. From uncovering rich data sources to managing large amounts of data within hardware and software limitations, ensuring consistency in reporting, merging various data sources, and beyond, you'll develop the know-how you need to effectively interpret data and tell a story that can be understood by anyone in your organization. Provides a background in data science fundamentals before moving on to working with relational databases and unstructured data and preparing your data for analysis Details different data visualization techniques that can be used to showcase and summarize your data Explains both supervised and unsupervised machine learning, including regression, model validation, and clustering techniques Includes coverage of big data processing tools like MapReduce, Hadoop, Dremel, Storm, and Spark It's a big, big data world out there - let Data Science For Dummies help you harness its power and gain a competitive edge for your organization.

podcast_episode
by Val Kroll , Julie Hoyer , Tim Wilson (Analytics Power Hour - Columbus (OH) , Moe Kiss (Canva) , Michael Helbling (Search Discovery)

The power of big data is a curious thing, Make a one man weep, make another man sing. Change a hawk to a little white dove. More than a feeling that's the power of big data. As always, Huey Lewis hits the nail on the head with this complex topic. What does the phrase actually mean? How can my company take advantage of it? Michael, Tim and Jim take on big data in episode five, and try to focus in on making this hard to pin down concept understandable and relevant. All this and more in one American hour, 46 Canadian minutes.

Big Data Revolution

Exploit the power and potential of Big Data to revolutionize business outcomes Big Data Revolution is a guide to improving performance, making better decisions, and transforming business through the effective use of Big Data. In this collaborative work by an IBM Vice President of Big Data Products and an Oxford Research Fellow, this book presents inside stories that demonstrate the power and potential of Big Data within the business realm. Readers are guided through tried-and-true methodologies for getting more out of data, and using it to the utmost advantage. This book describes the major trends emerging in the field, the pitfalls and triumphs being experienced, and the many considerations surrounding Big Data, all while guiding readers toward better decision making from the perspective of a data scientist. Companies are generating data faster than ever before, and managing that data has become a major challenge. With the right strategy, Big Data can be a powerful tool for creating effective business solutions – but deep understanding is key when applying it to individual business needs. Big Data Revolution provides the insight executives need to incorporate Big Data into a better business strategy, improving outcomes with innovation and efficient use of technology. Examine the major emerging patterns in Big Data Consider the debate surrounding the ethical use of data Recognize patterns and improve personal and organizational performance Make more informed decisions with quantifiable results In an information society, it is becoming increasingly important to make sense of data in an economically viable way. It can drive new revenue streams and give companies a competitive advantage, providing a way forward for businesses navigating an increasingly complex marketplace. Big Data Revolution provides expert insight on the tool that can revolutionize industries.

Field Guide to Hadoop

If your organization is about to enter the world of big data, you not only need to decide whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking down the Hadoop ecosystem into short, digestible sections. You’ll quickly understand how Hadoop’s projects, subprojects, and related technologies work together. Each chapter introduces a different topic—such as core technologies or data transfer—and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you’ll have a good grasp of the playing field. Topics include: Core technologies—Hadoop Distributed File System (HDFS), MapReduce, YARN, and Spark Database and data management—Cassandra, HBase, MongoDB, and Hive Serialization—Avro, JSON, and Parquet Management and monitoring—Puppet, Chef, Zookeeper, and Oozie Analytic helpers—Pig, Mahout, and MLLib Data transfer—Scoop, Flume, distcp, and Storm Security, access control, auditing—Sentry, Kerberos, and Knox Cloud computing and virtualization—Serengeti, Docker, and Whirr

Apache Hive Essentials

Apache Hive Essentials is the perfect guide for understanding and mastering Hive, the SQL-like big data query language built on top of Hadoop. With this book, you will gain the skills to effectively use Hive to analyze and manage large data sets. Whether you're a developer, data analyst, or just curious about big data, this hands-on guide will enhance your capabilities. What this Book will help me do Understand the core concepts of Hive and its relation to big data and Hadoop. Learn how to set up a Hive environment and integrate it with Hadoop. Master the SQL-like query functionalities of Hive to select, manipulate, and analyze data. Develop custom functions in Hive to extend its functionality for your own specific use cases. Discover best practices for optimizing Hive performance and ensuring data security. Author(s) Dayong Du is an expert in big data analytics with extensive experience in implementing and using tools like Hive in professional settings. Having worked on practical big data solutions, Dayong brings a wealth of knowledge and insights to his writing. His clear, approachable style makes complex topics accessible to readers. Who is it for? This book is ideal for developers, data analysts, and data engineers looking to leverage Hive for big data analysis. If you are familiar with SQL and Hadoop basics and aim to enhance your understanding of Hive, this book is for you. Beginners with some programming background eager to dive into big data technologies will also benefit. It's tailored for learners wanting actionable knowledge to advance their data processing skills.

Apache Flume: Distributed Log Collection for Hadoop - Second Edition

"Apache Flume: Distributed Log Collection for Hadoop - Second Edition" is your hands-on guide to learning how to use Apache Flume to reliably collect and move logs and data streams into your Hadoop ecosystem. Through practical examples and real-world scenarios, this book will help you master the setup, configuration, and optimization of Flume for various data ingestion use cases. What this Book will help me do Understand the key concepts and architecture behind Apache Flume to build reliable and scalable data ingestion systems. Set up Flume agents to collect and transfer data into the Hadoop File System (HDFS) or other storage solutions effectively. Learn stream data processing techniques, such as filtering, transforming, and enriching data during transit to improve data usability. Integrate Flume with other tools like Elasticsearch and Solr to enhance analytics and search capabilities. Implement monitoring and troubleshooting workflows to maintain healthy and optimized Flume data pipelines. Author(s) Steven Hoffman, a seasoned software developer and data engineer, brings years of practical experience working with big data technologies to this book. He has a strong background in distributed systems and big data solutions, having implemented enterprise-scale analytics projects. Through clear and approachable writing, he aims to empower readers to successfully deploy reliable data pipelines using Apache Flume. Who is it for? This book is written for Hadoop developers, data engineers, and IT professionals who seek to build robust pipelines for streaming data into Hadoop environments. It is ideal for readers who have a basic understanding of Hadoop and HDFS but are new to Apache Flume. If you are looking to enhance your analytics capabilities by efficiently ingesting, routing, and processing streaming data, this book is for you. Beginners as well as experienced engineers looking to dive deeper into Flume will find it insightful.

Hadoop MapReduce v2 Cookbook - Second Edition

Explore insights from vast datasets with "Hadoop MapReduce v2 Cookbook - Second Edition." This book serves as a practical guide for developers and system administrators who aim to master big data processing using Hadoop v2. By engaging with its step-by-step recipes, you will learn to harness the Hadoop MapReduce ecosystem for scalable and efficient data solutions. What this Book will help me do Master the configuration and management of Hadoop YARN, MapReduce v2, and HDFS clusters. Integrate big data tools such as Hive, HBase, Pig, Mahout, and Nutch with Hadoop v2. Develop analytics solutions for large-scale datasets using MapReduce-based applications. Address specific challenges like data classification, recommendations, and text analytics leveraging Hadoop MapReduce. Deploy and manage big data clusters effectively, including options for cloud environments. Author(s) The authors behind "Hadoop MapReduce v2 Cookbook - Second Edition" combine their deep expertise in big data technology and years of experience working directly with Hadoop. They have helped numerous organizations implement scalable data processing solutions and are passionate about teaching others. Their approach ensures readers gain both foundational knowledge and practical skills. Who is it for? This book is perfect for developers and system administrators who want to learn Hadoop MapReduce v2, including configuring and managing big data clusters. Beginners with basic Java knowledge can follow along to advance their skills in big data processing. Ideal for those transitioning to Hadoop v2 or requiring practical recipes for immediate application. Great for professionals aiming to deepen their expertise in scalable data technologies.

NoSQL For Dummies

Get up to speed on the nuances of NoSQL databases and what they mean for your organization This easy to read guide to NoSQL databases provides the type of no-nonsense overview and analysis that you need to learn, including what NoSQL is and which database is right for you. Featuring specific evaluation criteria for NoSQL databases, along with a look into the pros and cons of the most popular options, NoSQL For Dummies provides the fastest and easiest way to dive into the details of this incredible technology. You'll gain an understanding of how to use NoSQL databases for mission-critical enterprise architectures and projects, and real-world examples reinforce the primary points to create an action-oriented resource for IT pros. If you're planning a big data project or platform, you probably already know you need to select a NoSQL database to complete your architecture. But with options flooding the market and updates and add-ons coming at a rapid pace, determining what you require now, and in the future, can be a tall task. This is where NoSQL For Dummies comes in! Learn the basic tenets of NoSQL databases and why they have come to the forefront as data has outpaced the capabilities of relational databases Discover major players among NoSQL databases, including Cassandra, MongoDB, MarkLogic, Neo4J, and others Get an in-depth look at the benefits and disadvantages of the wide variety of NoSQL database options Explore the needs of your organization as they relate to the capabilities of specific NoSQL databases Big data and Hadoop get all the attention, but when it comes down to it, NoSQL databases are the engines that power many big data analytics initiatives. With NoSQL For Dummies, you'll go beyond relational databases to ramp up your enterprise's data architecture in no time.

Data: Emerging Trends and Technologies

What are the emerging trends and technologies that will transform the data landscape in coming months? In this report from Strata + Hadoop World co-chair Alistair Croll, you'll learn how the ubiquity of cheap sensors, fast networks, and distributed computing have given rise to several developments that will soon have a profound effect on individuals and society as a whole. Machine learning, for example, has quickly moved from lab tool to hosted, pay-as-you-go services in the cloud. Those services, in turn, are leading to predictive apps that will provide individuals with the right functionality and content at the right time by continuously learning about them and predicting what they'll need. Computational power can produce cognitive augmentation. Report topics include: The swing between centralized and distributed computing Machine learning as a service Personal digital assistants and cognitive augmentation Graph databases and analytics Regulating complex algorithms The pace of real-time data and automation Solving dire problems with big data Implications of having sensors everywhere This report contains many more examples of how big data is starting to reshape business and change behavior, and it's just a small sample of the in-depth information Strata + Hadoop World provides. Pick up this report and make plans to attend one of several Strata + Hadoop World conferences in the San Francisco Bay Area, London, and New York.

Learning Hadoop 2

Delve into the world of big data with 'Learning Hadoop 2', a comprehensive guide to leveraging the capabilities of Hadoop 2 for data processing and analysis. In this book, you will explore the tools and frameworks that integrate with Hadoop, discovering the best ways to design and deploy effective workflows for managing and analyzing large datasets. What this Book will help me do Understand the fundamentals of the MapReduce framework and its applications. Utilize advanced tools such as Samza and Spark for real-time and iterative data processing. Manage large datasets with data mining techniques tailored for Hadoop environments. Deploy Hadoop applications across various infrastructures, including local clusters and cloud services. Create and orchestrate sophisticated data workflows and pipelines with Apache Pig and Oozie. Author(s) Gabriele Modena is an experienced developer and trained data specialist with a keen focus on distributed data processing frameworks. Having worked extensively with big data platforms, Gabriele brings practical insights and a hands-on perspective to technical subjects. His writing is concise and engaging, aiming to render complex concepts accessible. Who is it for? This book is ideal for system and application developers eager to learn practical implementations of the Hadoop framework. Readers should be familiar with the Unix/Linux command-line interface and Java programming. Prior experience with Hadoop will be advantageous, but not necessary.

Big Data Analytics

With this book, managers and decision makers are given the tools to make more informed decisions about big data purchasing initiatives. Big Data Analytics: A Practical Guide for Managers not only supplies descriptions of common tools, but also surveys the various products and vendors that supply the big data market. Comparing and contrasting the different types of analysis commonly conducted with big data, this accessible reference presents clear-cut explanations of the general workings of big data tools. Instead of spending time on HOW to install specific packages, it focuses on the reasons WHY readers would install a given package. The book provides authoritative guidance on a range of tools, including open source and proprietary systems. It details the strengths and weaknesses of incorporating big data analysis into decision-making and explains how to leverage the strengths while mitigating the weaknesses. Describes the benefits of distributed computing in simple terms Includes substantial vendor/tool material, especially for open source decisions Covers prominent software packages, including Hadoop and Oracle Endeca Examines GIS and machine learning applications Considers privacy and surveillance issues The book further explores basic statistical concepts that, when misapplied, can be the source of errors. Time and again, big data is treated as an oracle that discovers results nobody would have imagined. While big data can serve this valuable function, all too often these results are incorrect, yet are still reported unquestioningly. The probability of having erroneous results increases as a larger number of variables are compared unless preventative measures are taken. The approach taken by the authors is to explain these concepts so managers can ask better questions of their analysts and vendors as to the appropriateness of the methods used to arrive at a conclusion. Because the world of science and medicine has been grappling with similar issues in the publication of studies, the authors draw on their efforts and apply them to big data.

Practical Business Analytics Using SAS: A Hands-on Guide

Practical Business Analytics Using SAS: A Hands-on Guide shows SAS users and businesspeople how to analyze data effectively in real-life business scenarios. The book begins with an introduction to analytics, analytical tools, and SAS programming. The authors—both SAS, statistics, analytics, and big data experts—first show how SAS is used in business, and then how to get started programming in SAS by importing data and learning how to manipulate it. Besides illustrating SAS basic functions, you will see how each function can be used to get the information you need to improve business performance. Each chapter offers hands-on exercises drawn from real business situations. The book then provides an overview of statistics, as well as instruction on exploring data, preparing it for analysis, and testing hypotheses. You will learn how to use SAS to perform analytics and model using both basic and advanced techniques like multiple regression, logistic regression, and time series analysis, among other topics. The book concludes with a chapter on analyzing big data. Illustrations from banking and other industries make the principles and methods come to life. Readers will find just enough theory to understand the practical examples and case studies, which cover all industries. Written for a corporate IT and programming audience that wants to upgrade skills or enter the analytics field, this book includes: More than 200 examples and exercises, including code and datasets for practice. Relevant examples for all industries. Case studies that show how to use SAS analytics to identify opportunities, solve complicated problems, and chart a course. Practical Business Analytics Using SAS: A Hands-on Guide gives you the tools you need to gain insight into the data at your fingertips, predict business conditions for better planning, and make excellent decisions. Whether you are in retail, finance, healthcare, manufacturing, government, or any other industry, this book will help your organization increase revenue, drive down costs, improve marketing, and satisfy customers better than ever before.

In all the excitement around Big Data and Analytics, even savvy users of business intelligence can get a bit confused about how and when to use A/B Testing, Predictive Analytics, and Personalization to optimize. But optimizing isn’t about choosing which tool to use: Optimizing is about making decisions. The digital environment gives us an opportunity to make these marketing decisions at scale. In this session we’ll discuss how to bring these tools together to make better decisions, we’ll also touch on how machine learning can help us automate the process to free up analytics teams to focus on the higher value problems.

ElasticSearch Cookbook - Second Edition

The "ElasticSearch Cookbook - Second Edition" is a hands-on guide featuring over 130 advanced recipes to help you harness the power of ElasticSearch, a leading search and analytics engine. Through insightful examples and practical guidance, you'll learn to implement efficient search solutions, optimize queries, and manage ElasticSearch clusters effectively. What this Book will help me do Design and configure ElasticSearch topologies optimized for your specific deployment needs. Develop and utilize custom mappings to optimize your data indexes. Execute advanced queries and filters to refine and retrieve search results effectively. Set up and monitor ElasticSearch clusters for optimal performance. Extend ElasticSearch capabilities through plugin development and integrations using Java and Python. Author(s) Alberto Paro is a technology expert with years of experience working with ElasticSearch, Big Data solutions, and scalable cloud architecture. He has authored multiple books and technical articles on ElasticSearch, leveraging his extensive knowledge to provide practical insights. His approachable and detail-oriented style makes complex concepts accessible to technical professionals. Who is it for? This book is best suited for software developers and IT professionals looking to use ElasticSearch in their projects. Readers should be familiar with JSON, as well as basic programming skills in Java. It is ideal for those who have an understanding of search applications and want to deepen their expertise. Whether you're integrating ElasticSearch into a web application or optimizing your system's search capabilities, this book will provide the skills and knowledge you need.