talk-data.com

Topic

Big Data

data_processing analytics large_datasets

Activity Trend: peak of 28 per quarter, 2020-Q1 to 2026-Q1

Activities

1217 activities · Newest first

Introducing Data Science

Introducing Data Science teaches you how to accomplish the fundamental tasks that occupy data scientists. Using the Python language and common Python libraries, you'll experience firsthand the challenges of dealing with data at scale and gain a solid foundation in data science. About the Technology Many companies need developers with data science skills to work on projects ranging from social media marketing to machine learning. Discovering what you need to learn to begin a career as a data scientist can seem bewildering. This book is designed to help you get started. About the Book Introducing Data Science explains vital data science concepts and teaches you how to accomplish the fundamental tasks that occupy data scientists. You'll explore data visualization, graph databases, the use of NoSQL, and the data science process. You'll use the Python language and common Python libraries as you experience firsthand the challenges of dealing with data at scale. Discover how Python allows you to gain insights from data sets so big that they need to be stored on multiple machines, or from data moving so quickly that no single machine can handle it. This book gives you hands-on experience with the most popular Python data science libraries, Scikit-learn and StatsModels. After reading this book, you'll have the solid foundation you need to start a career in data science. What's Inside Handling large data Introduction to machine learning Using Python to work with data Writing data science algorithms About the Reader This book assumes you're comfortable reading code in Python or a similar language, such as C, Ruby, or JavaScript. No prior experience with data science is required. About the Authors Davy Cielen, Arno D. B. Meysman, and Mohamed Ali are the founders and managing partners of Optimately and Maiton, where they focus on developing data science projects and solutions in various sectors. 
Quotes Read this book if you want to get a quick overview of data science, with lots of examples to get you started! - Alvin Raj, Oracle The map that will help you navigate the data science oceans. - Marius Butuc, Shopify Covers the processes involved in data science from end to end… A complete overview. - Heather Campbell, Kainos A must-read for anyone who wants to get into the data science world. - Hector Cuesta, Big Data Bootcamp

Big Data in Practice

The best-selling author of Big Data is back, this time with a unique and in-depth insight into how specific companies use big data. Big data is on the tip of everyone's tongue. Everyone understands its power and importance, but many fail to grasp the actionable steps and resources required to utilise it effectively. This book fills the knowledge gap by showing how major companies are using big data every day, from an up-close, on-the-ground perspective. From technology, media and retail, to sport teams, government agencies and financial institutions, learn the actual strategies and processes being used to learn about customers, improve manufacturing, spur innovation, improve safety and so much more. Organised for easy dip-in navigation, each chapter follows the same structure to give you the information you need quickly. For each company profiled, learn what data was used, what problem it solved and the processes put in place to make it practical, as well as the technical details, challenges and lessons learned from each unique scenario. Learn how predictive analytics helps Amazon, Target, John Deere and Apple understand their customers. Discover how big data is behind the success of Walmart, LinkedIn, Microsoft and more. Learn how big data is changing medicine, law enforcement, hospitality, fashion, science and banking. Develop your own big data strategy by accessing additional reading materials at the end of each chapter.

Apache Hive Cookbook

Apache Hive Cookbook is a comprehensive resource for mastering Apache Hive, a tool that bridges the gap between SQL and Big Data processing. Through guided recipes, you'll acquire essential skills in Hive query development, optimization, and integration with modern big data frameworks. What this Book will help me do Design efficient Hive query structures for big data analytics. Optimize data storage and query execution using partitions and buckets. Integrate Hive seamlessly with frameworks like Spark and Hadoop. Understand and utilize the HiveQL syntax to perform advanced analytical processing. Implement practical solutions to secure, maintain, and scale Hive environments. Author(s) Hanish Bansal, Saurabh Chauhan, and Shrey Mehrotra bring their extensive expertise in big data technologies and Hive to this cookbook. With years of practical experience and deep technical knowledge, they offer a collection of solutions and best practices that reflect real-world use cases. Their commitment to clarity and depth makes this book an invaluable resource for exploring Hive to its fullest potential. Who is it for? This book is perfect for data professionals, engineers, and developers looking to enhance their capabilities in big data analytics using Hive. It caters to those with a foundational understanding of big data frameworks and some familiarity with SQL. Whether you're planning to optimize data handling or integrate Hive with other data tools, this guide helps you achieve your goals. Step into the world of efficient data analytics with Apache Hive through structured learning paths.
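The idea behind partitions and buckets mentioned in this blurb can be sketched in plain Python. This is an illustration only, not Hive code and not taken from the book: Hive defines partitions and buckets in table definitions and uses its own hash function, whereas the sketch below assumes CRC32 as a stand-in, and the `rows`, `layout`, `bucket_of`, and `lookup` names are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # assumed bucket count, chosen only for the illustration


def bucket_of(key: str) -> int:
    """Map a key to a bucket with a deterministic hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


rows = [
    {"country": "US", "user_id": "u1", "amount": 10},
    {"country": "US", "user_id": "u2", "amount": 25},
    {"country": "DE", "user_id": "u3", "amount": 7},
]

# Storage layout: one group of rows per (partition value, bucket number).
layout = defaultdict(list)
for row in rows:
    layout[(row["country"], bucket_of(row["user_id"]))].append(row)


def lookup(country: str, user_id: str) -> list:
    """Read only the one (partition, bucket) group that can hold the key."""
    candidates = layout.get((country, bucket_of(user_id)), [])
    return [r for r in candidates if r["user_id"] == user_id]


print(lookup("US", "u1"))
```

A query that filters on the partition column and the bucketed key touches a single group instead of scanning every row, which is the kind of pruning the blurb alludes to.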

Big Data

Big Data: Storage, Sharing, and Security examines Big Data management from an R&D perspective. It covers the 3S designs (storage, sharing, and security) through detailed descriptions of Big Data concepts and implementations. Presenting the contributions of recognized Big Data experts from around the world, the book contains more than 450 pages of technical details on the most important implementation aspects regarding Big Data.

Big Data and Business Analytics

With the increasing barrage of big data, it becomes vital for organizations to make sense of this data in a timely and effective way to improve their decision making and competitive advantage. That's where business analytics come into play. This book explores case studies from industry leaders in big data domains such as cybersecurity, marketing, finance, emergency management, healthcare, and transportation. It offers a concise guide for CEOs and senior managers, as well as for business, management, and technology students interested in this emerging field.

Getting Analytics Right

Ask vital questions before you dive into data. Are your big data and analytics capabilities up to par? Nearly half of the global company executives in a recent Forbes Insight/Teradata survey certainly don’t think theirs are. This new book from O’Reilly examines how things typically go wrong in the data analytics process, and introduces a question-first, data-second strategy that can help your company close the gap between being analytics-invested and truly data-driven. Authors from Tamr, Inc. share insights into why analytics projects often fail, and offer solutions based on their combined experience in engineering, architecture, product strategizing, and marketing. You’ll learn how projects often start from the wrong place, take too long, and don’t go far enough, missteps that lead to incomplete, late, or useless answers to critical business questions. Find out how their question-first, data-second approach, fueled by vastly improved data preparation platforms and cataloging software, can help you create human-machine analytics solutions designed specifically to produce better answers, faster. Getting Analytics Right was written and presented by people at Tamr, Inc., including Nidhi Aggarwal, Product and Strategy Lead; Byron Berk, Customer Success Lead; Gideon Goldin, Senior UX Architect; Matt Holzapfel, Product Marketing; and Eliot Knudsen, Field Engineer. Tamr, a Cambridge, Massachusetts-based startup, helps companies understand and unify their disparate databases.

Relational Database Design and Implementation, 4th Edition

Relational Database Design and Implementation: Clearly Explained, Fourth Edition, provides the conceptual and practical information necessary to develop a database design and management scheme that ensures data accuracy and user satisfaction while optimizing performance. Database systems underlie the large majority of business information systems. Most of those in use today are based on the relational data model, a way of representing data and data relationships using only two-dimensional tables. This book covers relational database theory as well as providing a solid introduction to SQL, the international standard for the relational database data manipulation language. The book begins by reviewing basic concepts of databases and database design, then turns to creating, populating, and retrieving data using SQL. Topics such as the relational data model, normalization, data entities, and Codd's Rules (and why they are important) are covered clearly and concisely. In addition, the book looks at the impact of big data on relational databases and the option of using NoSQL databases for that purpose. Features updated and expanded coverage of SQL and new material on big data, cloud computing, and object-relational databases Presents design approaches that ensure data accuracy and consistency and help boost performance Includes three case studies, each illustrating a different database design challenge Reviews the basic concepts of databases and database design, then turns to creating, populating, and retrieving data using SQL

The Hadoop Performance Myth

The wish lists of many data-driven organizations seem reasonable enough. They’d like to capitalize on real-time data analysis, move beyond batch processing for time-critical insights, allow multiple users to share cluster resources, and provide predictable service levels. However, fundamental performance limitations of complex distributed systems such as Hadoop prevent much of this from happening. In this report, Courtney Webster examines the root cause of these performance problems and explains why best practices for mitigating them—cluster tuning, provisioning, and even cluster isolation for mission critical jobs—don’t provide viable, scalable, or long-term solutions. Organizations have been pushing Hadoop and other distributed systems to their performance breaking points as they seek to use clusters as shared resources across multiple business units and individual users. Once they hit this performance wall, companies will find it difficult to deliver on the big data promise at scale. Read this report to find out what the implications are for your organization.

Business Intelligence Strategy and Big Data Analytics

Business Intelligence Strategy and Big Data Analytics is written for business leaders, managers, and analysts - people who are involved with advancing the use of BI at their companies or who need to better understand what BI is and how it can be used to improve profitability. It is written from a general management perspective, and it draws on observations at 12 companies whose annual revenues range between $500 million and $20 billion. Over the past 15 years, my company has formulated vendor-neutral business-focused BI strategies and program execution plans in collaboration with manufacturers, distributors, retailers, logistics companies, insurers, investment companies, credit unions, and utilities, among others. It is through these experiences that we have validated business-driven BI strategy formulation methods and identified common enterprise BI program execution challenges. In recent years, terms like “big data” and “big data analytics” have been introduced into the business and technical lexicon. Upon close examination, the newer terminology is about the same thing that BI has always been about: analyzing the vast amounts of data that companies generate and/or purchase in the course of business as a means of improving profitability and competitiveness. Accordingly, we will use the terms BI and business intelligence throughout the book, and we will discuss the newer concepts like big data as appropriate. More broadly, the goal of this book is to share methods and observations that will help companies achieve BI success and thereby increase revenues, reduce costs, or both. Provides ideas for improving the business performance of one’s company or business functions Emphasizes proven, practical, step-by-step methods that readers can readily apply in their companies Includes exercises and case studies with road-tested advice about formulating BI strategies and program plans

Global Business Analytics Models: Concepts and Applications in Predictive, Healthcare, Supply Chain, and Finance Analytics

THE COMPLETE GUIDE TO USING ANALYTICS TO MANAGE RISK AND UNCERTAINTY IN COMPLEX GLOBAL BUSINESS ENVIRONMENTS. Practical techniques for developing reliable, actionable intelligence–and using it to craft strategy. Analytical opportunities to solve key managerial problems in global enterprises. Written for working managers: packed with realistic, useful examples. This guide helps global managers use modern analytics to gain reliable, actionable, and timely business intelligence–and use it to manage risk, build winning strategies, and solve urgent problems. Dr. Hokey Min offers a practical, easy-to-understand overview of business analytics in a global context, focusing especially on managerial and strategic implications. After demystifying today’s core quantitative tools, he demonstrates them at work in a wide spectrum of global applications. You’ll build models to help segment global markets, forecast demand, assess risk, plan financing, optimize supply chains, and more. Along the way, you’ll find practical guidance for developing analytic thinking, operationalizing Big Data in global environments, and preparing for future analytical innovations. Whether you’re a global executive, strategist, analyst, marketer, supply chain professional, student or researcher, this book will help you drive real value from analytics–in smarter decisions, improved strategy, and better management. In today’s global business environments characterized by growing complexity, volatility, and uncertainty, business analytics has become an indispensable tool for managing these challenges. Specifically, global managers need analytics expertise to solve problems, identify opportunities, shape strategy, mitigate risk, and improve their day-to-day operational efficiency. Now, for the first time, there’s an analytics guide designed specifically for decision-makers in global organizations. Leveraging his experience teaching a number of students and training hundreds of managers and executives, Dr. Hokey Min demystifies the principles and tools of modern business analytics, and demonstrates their real-world use in global business. First, Dr. Min identifies key success factors and mindsets, helping you establish the preconditions for effective analysis. Next, he walks you through the practicalities of collecting, organizing, and analyzing Big Data, and developing models to transform them into actionable insight. Building on these foundations, he illustrates core analytical applications in finance, healthcare, and global supply chains. He concludes by previewing emerging trends in analytics, including the newest tools for automated decision-making. Compare today’s key quantitative tools: stats, data mining, OR, and simulation; how they work, when to use them. Get the right data… and get the data right. Predict the future… and sense its arrival sooner than others can. Implement high-value analytics applications… in finance, supply chains, healthcare, and beyond.

Excel Power Pivot and Power Query For Dummies

A guide to PowerPivot and Power Query no data cruncher should be without! Want to familiarize yourself with the rich set of Microsoft Excel tools and reporting capabilities available from PowerPivot and Power Query? Look no further! Excel PowerPivot & Power Query For Dummies shows you how this powerful new set of tools can be leveraged to more effectively source and incorporate 'big data' Business Intelligence and Dashboard reports. You'll discover how PowerPivot and Power Query not only allow you to save time and simplify your processes, but also enable you to substantially enhance your data analysis and reporting capabilities. Gone are the days of relatively small amounts of data—today's data environment demands more from business analysts than ever before. Now, with the help of this friendly, hands-on guide, you'll learn to use PowerPivot and Power Query to expand your skill-set from the one-dimensional spreadsheet to new territories, like relational databases, data integration, and multi-dimensional reporting. Demonstrates how Power Query is used to discover, connect to, and import your data Shows you how to use PowerPivot to model data once it's been imported Offers guidance on using these tools to make analyzing data easier Written by a Microsoft MVP in the lighthearted, fun style you've come to expect from the For Dummies brand If you spend your days analyzing data, Excel PowerPivot & Power Query For Dummies will get you up and running with the rich set of Excel tools and reporting capabilities that will make your life—and work—easier.

Hadoop Real-World Solutions Cookbook - Second Edition

Master the full potential of big data processing using Hadoop with this comprehensive guide. Featuring over 90 practical recipes, this book helps you streamline data workflows and implement machine learning models with tools like Spark, Hive, and Pig. By the end, you'll confidently handle complex data problems and optimize big data solutions effectively. What this Book will help me do Install and manage a Hadoop 2.x cluster efficiently to suit your data processing needs. Explore and utilize advanced tools like Hive, Pig, and Flume for seamless big data analysis. Master data import/export processes with Sqoop and workflow automation using Oozie. Implement machine learning and analytics tasks using Mahout and Apache Spark. Store and process data flexibly across formats like Parquet, ORC, RC, and more. Author(s) Deshpande is an expert in big data processing and analytics with years of hands-on experience in implementing Hadoop-based solutions for real-world problems. Known for a clear and pragmatic writing style, Deshpande brings actionable wisdom and best practices to the forefront, helping readers excel in managing and utilizing big data systems. Who is it for? Designed for technical enthusiasts and professionals, this book is ideal for those familiar with basic big data concepts. If you are looking to expand your expertise in Hadoop's ecosystem and implement data-driven solutions, this book will guide you through essential skills and advanced techniques to efficiently manage complex big data projects.

MongoDB in Action, Second Edition

MongoDB in Action, Second Edition is a completely revised and updated version. It introduces MongoDB 3.0 and the document-oriented database model. This perfectly paced book gives you both the big picture you'll need as a developer and enough low-level detail to satisfy system engineers. About the Technology This document-oriented database was built for high availability, supports rich, dynamic schemas, and lets you easily distribute data across multiple servers. MongoDB 3.0 is flexible, scalable, and very fast, even with big data loads. About the Book Lots of examples will help you develop confidence in the crucial area of data modeling. You'll also love the deep explanations of each feature, including replication, auto-sharding, and deployment. What's Inside Indexes, queries, and standard DB operations Aggregation and text searching Map-reduce for custom aggregations and reporting Deploying for scale and high availability Updated for Mongo 3.0 About the Reader Written for developers. No previous MongoDB or NoSQL experience is assumed. About the Authors After working at MongoDB, Kyle Banker is now at a startup. Peter Bakkum is a developer with MongoDB expertise. Shaun Verch has worked on the core server team at MongoDB. A Genentech engineer, Doug Garrett is one of the winners of the MongoDB Innovation Award for Analytics. A software architect, Tim Hawkins has led search engineering at Yahoo Europe. 
Technical Contributor: Wouter Thielen Technical Editor: Mihalis Tsoukalos Quotes A thorough manual for learning, practicing, and implementing MongoDB - Jeet Marwah, Acer Inc. A must-read to properly use MongoDB and model your data in the best possible way. - Hernan Garcia, Betterez Inc. Provides all the necessary details to get you jump-started with MongoDB. - Gregor Zurowski, Independent Software Development Consultant Awesome! MongoDB in a nutshell. - Hardy Ferentschik, Red Hat

Big Data, Open Data and Data Development

The world has become digital and technological advances have multiplied circuits with access to data, their processing and their diffusion. New technologies have now reached a certain maturity. Data are available to everyone, anywhere on the planet. The number of Internet users in 2014 was 2.9 billion or 41% of the world population. The need for knowledge is becoming apparent in order to understand this multitude of data. We must educate, inform and train the masses. The development of related technologies, such as the advent of the Internet, social networks, "cloud-computing" (digital factories), has increased the available volumes of data. Currently, each individual creates, consumes, uses digital information: more than 3.4 million e-mails are sent worldwide every second, or 107,000 billion annually with 14,600 e-mails per year per person, but more than 70% are spam. Billions of pieces of content are shared on social networks such as Facebook, more than 2.46 million every minute. We spend more than 4.8 hours a day on the Internet using a computer, and 2.1 hours using a mobile. Data, this new ethereal manna from heaven, is produced in real time. It comes in a continuous stream from a multitude of sources which are generally heterogeneous. This accumulation of data of all types (audio, video, files, photos, etc.) generates new activities, the aim of which is to analyze this enormous mass of information. It is then necessary to adapt and try new approaches, new methods, new knowledge and new ways of working, resulting in new properties and new challenges since SEO logic must be created and implemented. At company level, this mass of data is difficult to manage. Its interpretation is primarily a challenge. This impacts those who are there to "manipulate" the mass and requires a specific infrastructure for creation, storage, processing, analysis and recovery. The biggest challenge lies in "the valuing of data" available in quantity, diversity and access speed.
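The figures quoted in this blurb can be cross-checked with a few lines of arithmetic. The sketch below is not from the book; it uses only the numbers stated above and assumes a 365-day year. All variable names are illustrative.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000 seconds, ignoring leap years

emails_per_second = 3.4e6  # "more than 3.4 million e-mails ... every second"
emails_per_year = emails_per_second * SECONDS_PER_YEAR
emails_per_year_billions = emails_per_year / 1e9

# Population implied by "14,600 e-mails per year per person".
implied_population = emails_per_year / 14_600

# World population implied by "2.9 billion or 41% of the world population".
implied_world_population = 2.9e9 / 0.41

print(f"{emails_per_year_billions:,.0f} billion e-mails per year")
print(f"{implied_population / 1e9:.2f} billion people (from the per-person figure)")
print(f"{implied_world_population / 1e9:.2f} billion people (from the user share)")
```

Under these assumptions the per-second rate works out to roughly 107,000 billion e-mails a year, in line with the quoted annual figure, and both population estimates land a little above 7 billion, so the blurb's numbers are mutually consistent.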

Emerging Trends in Applications and Infrastructures for Computational Biology, Bioinformatics, and Systems Biology

Emerging Trends in Applications and Infrastructures for Computational Biology, Bioinformatics, and Systems Biology: Systems and Applications covers the latest trends in the field with special emphasis on their applications. The first part covers the major areas of computational biology, development and application of data-analytical and theoretical methods, mathematical modeling, and computational simulation techniques for the study of biological and behavioral systems. The second part covers bioinformatics, an interdisciplinary field concerned with methods for storing, retrieving, organizing, and analyzing biological data. The book also explores the software tools used to generate useful biological knowledge. The third part, on systems biology, explores how to obtain, integrate, and analyze complex datasets from multiple experimental sources using interdisciplinary tools and techniques, with the final section focusing on big data and the collection of datasets so large and complex that it becomes difficult to process using conventional database management systems or traditional data processing applications. Explores all the latest advances in this fast-developing field from an applied perspective Provides the only coherent and comprehensive treatment of the subject available Covers the algorithm development, software design, and database applications that have been developed to foster research

Spark

Production-targeted Spark guidance with real-world use cases. Spark: Big Data Cluster Computing in Production goes beyond general Spark overviews to provide targeted guidance toward using lightning-fast big-data clustering in production. Written by an expert team well-known in the big data community, this book walks you through the challenges in moving from proof-of-concept or demo Spark applications to live Spark in production. Real use cases provide deep insight into common problems, limitations, challenges, and opportunities, while expert tips and tricks help you get the most out of Spark performance. Coverage includes Spark SQL, Tachyon, Kerberos, MLlib, YARN, and Mesos, with clear, actionable guidance on resource scheduling, db connectors, streaming, security, and much more. Spark has become the tool of choice for many Big Data problems, with more active contributors than any other Apache Software project. General introductory books abound, but this book is the first to provide deep insight and real-world advice on using Spark in production. Specific guidance, expert tips, and invaluable foresight make this guide an incredibly useful resource for real production settings. Review Spark hardware requirements and estimate cluster size. Gain insight from real-world production use cases. Tighten security, schedule resources, and fine-tune performance. Overcome common problems encountered using Spark in production. Spark works with other big data tools including MapReduce and Hadoop, and uses languages you already know like Java, Scala, Python, and R. Lightning speed makes Spark too good to pass up, but understanding limitations and challenges in advance goes a long way toward easing actual production implementation. Spark: Big Data Cluster Computing in Production tells you everything you need to know, with real-world production insight and expert guidance, tips, and tricks.

The DS2 Procedure: SAS Programming Methods at Work

The issue facing most SAS programmers today is not that data space has become bigger ("Big Data"), but that our programming problem space has become bigger. Through the power of DS2, this book shows programmers how easily they can manage complex problems using modular coding techniques.

The DS2 Procedure: SAS Programming Methods at Work outlines the basic structure of a DS2 program and teaches you how each component can help you address problems. The DS2 programming language in SAS 9.4 simplifies and speeds data preparation with user-defined methods, storing methods and attributes in shareable packages, and threaded execution on multicore symmetric multiprocessing (SMP) and massively parallel processing (MPP) machines. This book is intended for all BASE SAS programmers looking to learn about DS2; readers need only an introductory level of SAS to get started. Topics covered include introductions to Object Oriented Programming methods, DATA step programs, user-defined methods, predefined packages, and threaded processing.

Data and Electric Power

Traditional engineering is built upon a world of knowledge and scientific laws, with components and systems that operate predictably. But what happens when a large number of these devices are interconnected? You get a complex system that’s no longer deterministic, but probabilistic. That’s happening today in many industries, including manufacturing, petroleum, transportation, and energy. In this O’Reilly report, Sean Patrick Murphy, Chief Data Scientist at PingThings, describes how data science is helping electric utilities make sense of a stochastic world filled with increasing uncertainty, including fundamental changes to the energy market and random phenomena such as weather and solar activity. Murphy also reviews several cutting-edge tools for storing and processing big data that he’s used in his work with electric utilities, tools that can help traditional engineers pursue a data-driven approach in many industries. Topics in this report include: key drivers that have changed the electric grid from a deterministic machine into a probabilistic system; fundamental differences that put traditional engineering and data science at odds with one another; why the time is right for engineering organizations to adopt a complete data-driven approach; contemporary tools that traditional engineers can use to store and process big data; and a PingThings case study for dealing with random geomagnetic disturbances to the energy grid.

Real-Time Big Data Analytics

This book delves into the techniques and tools essential for designing, processing, and analyzing complex datasets in real-time using advanced frameworks like Apache Spark, Storm, and Amazon Kinesis. By engaging with this thorough guide, you'll build proficiency in creating robust, efficient, and scalable real-time data processing architectures tailored to real-world scenarios. What this Book will help me do Learn the fundamentals of real-time data processing and how it differs from batch processing. Gain hands-on experience with Apache Storm for creating robust data-driven solutions. Develop real-world applications using Amazon Kinesis for cloud-based analytics. Perform complex data queries and transformations with Spark SQL and understand Spark RDDs. Master the Lambda Architecture to combine batch and real-time analytics effectively. Author(s) Shilpi Saxena is a renowned expert in big data technologies, holding extensive experience in real-time data analytics. With a career spanning years in the industry, Shilpi has provided innovative solutions for big data challenges in top-tier organizations. Her teaching approach emphasizes practical applicability, making her writings accessible and impactful for developers and architects alike. Who is it for? This book is for software professionals such as Big Data architects, developers, or programmers looking to enhance their skills in real-time big data analytics. If you are familiar with basic programming principles and seek to build solutions for processing large data streams in real-time environments, this book caters to your needs. It is also suitable for those seeking to familiarize themselves with using state-of-the-art tools like Spark SQL, Apache Storm, and Amazon Kinesis. Whether you're extending current expertise or transitioning into this field, this resource helps you achieve your objectives.
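The Lambda Architecture named in this blurb combines a batch view with a real-time view at query time. The following is a minimal sketch of that idea in plain Python, not Spark, Storm, or Kinesis code and not taken from the book; the page-view events and the `batch_view`, `speed_view`, and `query` names are hypothetical.

```python
from collections import Counter


def batch_view(master_events):
    """Batch layer: recompute totals from the full, append-only master dataset."""
    return Counter(event["page"] for event in master_events)


def speed_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(event["page"] for event in recent_events)


def query(batch, speed, page):
    """Serving layer: merge the batch and real-time views when answering."""
    return batch[page] + speed[page]  # Counter returns 0 for unseen pages


# Events already covered by the last batch run (assumed sample data).
master = [{"page": "/home"}, {"page": "/home"}, {"page": "/docs"}]
# Events that arrived since then and are only visible to the speed layer.
recent = [{"page": "/home"}, {"page": "/pricing"}]

batch = batch_view(master)
speed = speed_view(recent)
home_total = query(batch, speed, "/home")
pricing_total = query(batch, speed, "/pricing")
print(home_total, pricing_total)
```

When the next batch run absorbs the recent events, the speed view can be discarded and rebuilt from empty, which keeps the real-time layer small while the batch layer remains the source of truth.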

Handbook of Big Data

This handbook provides a state-of-the-art overview of the analysis of large-scale datasets. Featuring contributions from statistics and computer science experts in industry and academia, the text instills a working understanding of key statistical and computing ideas that can be readily applied in research and practice. Offering balanced coverage of methodology, theory, and applications, the text describes modern, scalable approaches for analyzing large datasets. It details advances in statistics and machine learning and defines the underlying concepts of the available analytical tools and techniques.