talk-data.com

Topic

Big Data

data_processing analytics large_datasets

1217

tagged

Activity Trend: peak of 28 per quarter, 2020-Q1 to 2026-Q1

Activities

1217 activities · Newest first

Educating Data

While big data has already made significant advances in business and government, data analytics is also beginning to transform education. This O’Reilly report explores how the use of analytics has already helped several educational programs, such as personalized learning and massive open online courses (MOOCs), for students of all ages. Of course, that’s only part of the story. As author Taylor Martin explains, researchers, educators, and private practitioners in the field have also run into several challenges in bringing the education field up to speed. Issues such as building data infrastructures, integrating data sources, and assuring student privacy still need to be resolved—as does the problem of teaching a new generation of data scientists about the challenges and opportunities unique to education. Download this report and find out what educators and analysts have accomplished so far, and how they hope data analytics will help improve outcomes for students, parents, schools, and teachers in the near future. Taylor Martin is a professor of Instructional Technology and Learning Sciences at Utah State University. She researches how people learn from active participation, both physical and social. Currently on rotation at the National Science Foundation, Dr. Martin focuses on a variety of efforts to understand how big data is impacting research in education and across the STEM disciplines.

Fast Data Front Ends for Hadoop

Organizations striving to build applications for streaming data have a new possibility to ponder: the use of ingestion engines at the front end of their Hadoop systems. With this O’Reilly report, you’ll learn how these fast data front ends process data before it reaches the Hadoop Distributed File System (HDFS), and provide intelligence and context in real time. This helps you reduce response times from hours to minutes, or even minutes to seconds. Author and independent consultant Akmal Chaudhri looks at several popular ingestion engines, including Apache Spark, Apache Storm, and the VoltDB in-memory database. Among them, VoltDB stands out by providing full Atomicity, Consistency, Isolation, and Durability (ACID) support. VoltDB also lets you build a fast data front end that uses the familiar SQL language and standards. Learn the advantages of ingestion engines as well as the theoretical and practical problems that can come up in an implementation. You’ll discover how this option can handle streaming data, provide state, ensure durability, and support transactions and real-time decisions. Akmal B. Chaudhri is an independent consultant specializing in big data, NoSQL, and NewSQL database technologies. He has previously held roles as a developer, consultant, product strategist, and technical trainer with several blue-chip companies and big data startups. Akmal regularly presents at international conferences and serves on program committees for several major conferences and workshops.

Elasticsearch Essentials

"Elasticsearch Essentials" provides a comprehensive introduction to Elasticsearch, the powerful search and analytics engine. This book delivers a fast-paced, practical guide to harnessing Elasticsearch for creating scalable search and analytics applications. What this book will help me do: Learn to effectively use Elasticsearch REST APIs for search and analytics. Understand and design schema and mappings with best practices. Master data modeling concepts for efficient data queries. Develop skills to create and manage Elasticsearch clusters in production. Learn techniques for ensuring high availability and handling large datasets. Author(s): Bharvi Dixit is a seasoned developer and expert in search technologies with hands-on experience in Elasticsearch and other search solutions. With extensive knowledge in data analytics and large-scale systems, Bharvi ensures readers gain practical skills and insights through well-structured examples and explanations. Who is it for? This book is perfect for developers looking to enhance their skills in building search and analytics solutions with Elasticsearch. It's particularly suited for those familiar with search technologies like Apache Lucene or Solr but new to Elasticsearch. Beginners to intermediate learners in big data and analytics will find the structured approach beneficial. It's ideal for professionals aspiring to develop advanced search implementations with modern tools.

Big Data Now: 2015 Edition

Now in its fifth year, O’Reilly’s annual Big Data Now report recaps the trends, tools, applications, and forecasts we’ve talked about over the past year. For 2015, we’ve included a collection of blog posts, authored by leading thinkers and experts in the field, that reflect a unique set of themes we’ve identified as gaining significant attention and traction. Our list of 2015 topics includes: data-driven cultures; data science; data pipelines; big data architecture and infrastructure; the Internet of Things and real time; applications of big data; and security, ethics, and governance. Is your organization on the right track? Get a hold of this free report now and stay in tune with the latest significant developments in big data.

Fundamentals of Big Data Network Analysis for Research and Industry

By Hyunjoung Lee (Institute of Green Technology, Yonsei University, Republic of Korea) and Il Sohn (Material Science and Engineering, Yonsei University, Republic of Korea). Presents the methodology of big data analysis using examples from research and industry. There are large amounts of data everywhere, and the ability to pick out crucial information is increasingly important. Contrary to popular belief, not all information is useful; big data network analysis assumes that data is not only large, but also meaningful, and this book focuses on the fundamental techniques required to extract essential information from vast datasets. Featuring case studies drawn largely from the iron and steel industries, this book offers practical guidance which will enable readers to easily understand big data network analysis. Particular attention is paid to the methodology of network analysis, offering information on the method of data collection, on research design and analysis, and on the interpretation of results. A variety of programs for network analysis, including UCINET, NetMiner, R, NodeXL, and Gephi, are covered in detail. Fundamentals of Big Data Network Analysis for Research and Industry looks at big data from a fresh perspective, and provides a new approach to data analysis.
This book: explains the basic concepts in understanding big data and filtering meaningful data; presents big data analysis within the networking perspective; features methodology applicable to research and industry; describes in detail the social relationship between big data and its implications; and provides insight into identifying patterns and relationships between seemingly unrelated big data. Fundamentals of Big Data Network Analysis for Research and Industry will prove a valuable resource for analysts, research engineers, industrial engineers, marketing professionals, and any individuals dealing with accumulated large data whose interest is to analyze and identify potential relationships among data sets.

Predictive Analytics, Revised and Updated

"Mesmerizing & fascinating..." (The Seattle Post-Intelligencer) "The Freakonomics of big data." (Stein Kretsinger, founding executive of Advertising.com) Award-winning | Used by over 30 universities | Translated into 9 languages. An introduction for everyone. In this rich, fascinating, and surprisingly accessible introduction, leading expert Eric Siegel reveals how predictive analytics works, and how it affects everyone every day. Rather than a “how to” for hands-on techies, the book serves lay readers and experts alike by covering new case studies and the latest state-of-the-art techniques. Prediction is booming. It reinvents industries and runs the world. Companies, governments, law enforcement, hospitals, and universities are seizing upon the power. These institutions predict whether you're going to click, buy, lie, or die. Why? For good reason: predicting human behavior combats risk, boosts sales, fortifies healthcare, streamlines manufacturing, conquers spam, optimizes social networks, toughens crime fighting, and wins elections. How? Prediction is powered by the world's most potent, flourishing unnatural resource: data. Accumulated in large part as the by-product of routine tasks, data is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is a gold mine. Big data embodies an extraordinary wealth of experience from which to learn. Predictive analytics unleashes the power of data. With this technology, the computer literally learns from data how to predict the future behavior of individuals. Perfect prediction is not possible, but putting odds on the future drives millions of decisions more effectively, determining whom to call, mail, investigate, incarcerate, set up on a date, or medicate.
In this lucid, captivating introduction — now in its Revised and Updated edition — former Columbia University professor and Predictive Analytics World founder Eric Siegel reveals the power and perils of prediction: What type of mortgage risk Chase Bank predicted before the recession. Predicting which people will drop out of school, cancel a subscription, or get divorced before they even know it themselves. Why early retirement predicts a shorter life expectancy and vegetarians miss fewer flights. Five reasons why organizations predict death — including one health insurance company. How U.S. Bank and Obama for America calculated — and Hillary for America 2016 plans to calculate — the way to most strongly persuade each individual. Why the NSA wants all your data: machine learning supercomputers to fight terrorism. How IBM's Watson computer used predictive modeling to answer questions and beat the human champs on TV's Jeopardy! How companies ascertain untold, private truths — how Target figures out you're pregnant and Hewlett-Packard deduces you're about to quit your job. How judges and parole boards rely on crime-predicting computers to decide how long convicts remain in prison. 182 examples from Airbnb, the BBC, Citibank, ConEd, Facebook, Ford, Google, the IRS, LinkedIn, Match.com, MTV, Netflix, PayPal, Pfizer, Spotify, Uber, UPS, Wikipedia, and more. How does predictive analytics work? This jam-packed book satisfies by demystifying the intriguing science under the hood. For future hands-on practitioners pursuing a career in the field, it sets a strong foundation, delivers the prerequisite knowledge, and whets your appetite for more. A truly omnipresent science, predictive analytics constantly affects our daily lives. Whether

R for Programmers

Unlike other books about R, written from the perspective of statistics, this book is written from the perspective of programmers, providing a channel for programmers with expertise in other programming languages to quickly understand R. The contents are divided into four parts: the basics of R, the server of R, databases and big data, and the appendices, which introduce the installation of Java, various databases, and Hadoop. Because this is a reference book, there is no special sequence for reading all the chapters. Anyone new to the subject who wishes to master R comprehensively can simply follow the chapters in sequence.

Scalable Big Data Architecture: A Practitioner’s Guide to Choosing Relevant Big Data Architecture

This book highlights the different types of data architecture and illustrates the many possibilities hidden behind the term "Big Data", from the usage of NoSQL databases to the deployment of stream analytics architecture, machine learning, and governance. Scalable Big Data Architecture covers real-world, concrete industry use cases that leverage complex distributed applications, which involve web applications, RESTful APIs, and high throughput of large amounts of data stored in highly scalable NoSQL data stores such as Couchbase and Elasticsearch. This book demonstrates how data processing can be done at scale, from the usage of NoSQL datastores to the combination of Big Data distributions. When the data processing is too complex and involves different processing topologies such as long-running jobs, stream processing, correlation of multiple data sources, and machine learning, it’s often necessary to delegate the load to Hadoop or Spark and use the NoSQL store to serve processed data in real time. This book shows you how to choose a relevant combination of big data technologies available within the Hadoop ecosystem. It focuses on processing long jobs, architecture, stream data patterns, log analysis, and real-time analytics. Every pattern is illustrated with practical examples, which use different open source projects such as Logstash, Spark, Kafka, and so on. Traditional data infrastructures are built for digesting and rendering data synthesis and analytics from large amounts of data. This book helps you to understand why you should consider using machine learning algorithms early on in the project, before being overwhelmed by constraints imposed by dealing with the high throughput of Big Data. Scalable Big Data Architecture is for developers, data architects, and data scientists looking for a better understanding of how to choose the most relevant pattern for a Big Data project and which tools to integrate into that pattern.

Big Data For Small Business For Dummies

Capitalise on big data to add value to your small business. Written by bestselling author and big data expert Bernard Marr, Big Data For Small Business For Dummies helps you understand what big data actually is, and how you can analyse and use it to improve your business. Free of confusing jargon and complemented with lots of step-by-step guidance and helpful advice, it quickly and painlessly helps you get the most from using big data in a small business. Business data has been around for a long time. Unfortunately, it was trapped away in overcrowded filing cabinets and on archaic floppy disks. Now, thanks to technology and new tools that display complex databases in a much simpler manner, small businesses can benefit from the big data that's been hiding right under their noses. With the help of this friendly guide, you'll discover how to get your hands on big data to develop new offerings, products and services; understand technological change; create an infrastructure; develop strategies; and make smarter business decisions. The book shows you how to use big data to make sense of user activity on social networks and customer transactions; demonstrates how to capture, store, search, share, analyse and visualise analytics; helps you turn your data into actionable insights; and explains how to use big data to your advantage in order to transform your small business. If you're a small business owner or employee, Big Data For Small Business For Dummies helps you harness the hottest commodity on the market today in order to take your company to new heights.

Big Data Analytics with Spark: A Practitioner’s Guide to Using Spark for Large-Scale Data Processing, Machine Learning, and Graph Analytics, and High-Velocity Data Stream Processing

This book is a step-by-step guide for learning how to use Spark for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, MLlib, and Spark ML. Big Data Analytics with Spark shows you how to use Spark and leverage its easy-to-use features to increase your productivity. You learn to perform fast data analysis using its in-memory caching and advanced execution engine, employ in-memory computing capabilities for building high-performance machine learning and low-latency interactive analytics applications, and much more. Moreover, the book shows you how to use Spark as a single integrated platform for a variety of data processing tasks, including ETL pipelines, BI, live data stream processing, graph analytics, and machine learning. The book also includes a chapter on Scala, the hottest functional programming language, and the language that underlies Spark. You’ll learn the basics of functional programming in Scala, so that you can write Spark applications in it. What's more, Big Data Analytics with Spark provides an introduction to other big data technologies that are commonly used along with Spark, such as HDFS, Avro, Parquet, Kafka, Cassandra, HBase, Mesos, and so on. It also provides an introduction to machine learning and graph concepts. So the book is self-sufficient; all the technologies that you need to know to use Spark are covered. The only thing that you are expected to have is some programming knowledge in any language.

Next Generation Databases: NoSQL, NewSQL, and Big Data

This is a book for enterprise architects, database administrators, and developers who need to understand the latest developments in database technologies. It is the book to help you choose the correct database technology at a time when concepts such as Big Data, NoSQL and NewSQL are making what used to be an easy choice into a complex decision with significant implications. The relational database (RDBMS) model completely dominated database technology for over 20 years. Today this "one size fits all" stability has been disrupted by a relatively recent explosion of new database technologies. These paradigm-busting technologies are powering the "Big Data" and "NoSQL" revolutions, as well as forcing fundamental changes in databases across the board. Deciding to use a relational database was once truly a no-brainer, and the various commercial relational databases competed on price, performance, reliability, and ease of use rather than on fundamental architectures. Today we are faced with choices between radically different database technologies. Choosing the right database today is a complex undertaking, with serious economic and technological consequences. Next Generation Databases demystifies today’s new database technologies. The book describes what each technology was designed to solve. It shows how each technology can be used to solve real-world application and business problems. Most importantly, this book highlights the architectural differences between technologies that are the critical factors to consider when choosing a database platform for new and upcoming projects. The book introduces the new technologies that have revolutionized the database landscape; describes how each technology can be used to solve specific application or business challenges; and reviews the most popular new wave databases and how they use these new database technologies.

The Patient Revolution

In The Patient Revolution, author Krisa Tailor—a noted expert in health care innovation and management—explores, through the lens of design thinking, how information technology will take health care into the experience economy. In the experience economy, patients will shift to being empowered consumers who are active participants in their own care. Tailor explores this shift by creating a vision for a newly designed health care system that's focused on both sickness and wellness, and is driven by data and analytics. The new system seamlessly integrates health into our daily lives, and delivers care so uniquely personalized that no two people are provided identical treatments. Connected through data, everyone across the health care ecosystem, including clinicians, insurers, and researchers, will be able to meet individuals wherever they are in their health journey to reach the ultimate goal of keeping people healthy. The patient revolution has just begun and an exciting journey awaits us. Praise for the patient revolution "A full 50% of the US population has at least one chronic disease that requires ongoing monitoring and treatment. Our current health care system is woefully inadequate in providing these individuals with the treatment and support they need. This disparity can only be addressed through empowering patients to better care for themselves and giving providers better tools to care for their patients. Both of those solutions will require the development and application of novel technologies. In Krisa Tailor's book The Patient Revolution, a blueprint is articulated for how this could be achieved, culminating in a vision for a learning health system within 10 years." —Ricky Bloomfield, MD, Director, Mobile Technology Strategy; Assistant Professor, Duke Medicine "In The Patient Revolution, Krisa Tailor astutely points out that 80% of health is impacted by factors outside of the health care system. 
Amazon unfortunately knows more about our patients than we do. The prescriptive analytics she describes will allow health care providers to use big data to optimize interventions at the level of the individual patient. The use of analytics will allow providers to improve quality, shape care coordination, and contain costs. Advanced analytics will lead to personalized care and ultimately empowered patients!" —Linda Butler, MD, Vice President of Medical Affairs/Chief Medical Officer/Chief Medical Information Officer, Rex Healthcare " The Patient Revolution provides a practical roadmap on how the industry can capture value by making health and care more personalized, anticipatory, and intuitive to patient needs." —Ash Damle, CEO, Lumiata "Excellent read. For me, health care represents a unique economy—one focused on technology, but requiring a deep understanding of humanity. Ms. Tailor begins the exploration of how we provide care via the concepts of design thinking, asking how we might redesign care with an eye toward changing the experience. She does an excellent job deconstructing this from the patient experience. I look forward to a hopeful follow-up directed at changing the provider culture." —Alan Pitt, MD, Chief Medical Officer, Avizia "Whether you're a health care provider looking to gain an understanding of the health care landscape, a health data scientist, or a seasoned business pro, you'll come away with a deeper, nuanced understanding of today's evolving health care system with this book. Krisa Tailor ties together—in a comprehensive, unique way—the worlds of health care administration, clinical practice, design thinking, and business strategy and innovation." —Steven Chan, MD, MBA, University of California, Davis

Big Data MBA

Integrate big data into business to drive competitive advantage and sustainable success. Big Data MBA brings insight and expertise to leveraging big data in business so you can harness the power of analytics and gain a true business advantage. Based on a practical framework with supporting methodology and hands-on exercises, this book helps identify where and how big data can help you transform your business. You'll learn how to exploit new sources of customer, product, and operational data, coupled with advanced analytics and data science, to optimize key processes, uncover monetization opportunities, and create new sources of competitive differentiation. The discussion includes guidelines for operationalizing analytics, optimal organizational structure, and using analytic insights throughout your organization's user experience to customers and front-end employees alike. You'll learn to “think like a data scientist” as you build upon the decisions your business is trying to make, the hypotheses you need to test, and the predictions you need to produce. Business stakeholders no longer need to relinquish control of data and analytics to IT. In fact, they must champion the organization's data collection and analysis efforts. This book is a primer on the business approach to analytics, providing the practical understanding you need to convert data into opportunity. Understand where and how to leverage big data. Integrate analytics into everyday operations. Structure your organization to drive analytic insights. Optimize processes, uncover opportunities, and stand out from the rest. Help business stakeholders to “think like a data scientist”. Understand appropriate business application of different analytic techniques. If you want data to transform your business, you need to know how to put it to use. Big Data MBA shows you how to implement big data and analytics to make better decisions.

Obtaining Value from Big Data for Service Delivery

Big data is an emerging phenomenon that has enormous implications for business strategy, profitability, and process improvement. All service systems generate big data these days, especially human-centered service systems. Big data has been described as the collection, analysis, and use of data characterized by the five Vs: volume, velocity, variety, veracity, and value (of data). This booklet will help middle, senior, and executive managers to understand what big data is; how to recognize, collect, process, and analyze it; how to store and manage it; how to obtain useful information from it; and how to assess its contribution to operational, tactical, and strategic decision-making in service-oriented organizations.

The Definitive Guide to MongoDB: A complete guide to dealing with Big Data using MongoDB, Third Edition

The Definitive Guide to MongoDB, Third Edition, is updated for MongoDB 3 and includes all of the latest MongoDB features, including the aggregation framework introduced in version 2.2 and hashed indexes in version 2.4. The Third Edition also now includes Node.js along with Python. MongoDB is the most popular of the "Big Data" NoSQL database technologies, and it's still growing. David Hows from 10gen, along with experienced MongoDB authors Peter Membrey and Eelco Plugge, provide their expertise and experience in teaching you everything you need to know to become a MongoDB pro.

Apache Oozie Essentials

Apache Oozie Essentials serves as your guide to mastering Apache Oozie, a powerful workflow scheduler for Hadoop environments. Through lucid explanations and practical examples, you will learn how to create, schedule, and enhance workflows for data ingestion, processing, and machine learning tasks using Oozie. What this book will help me do: Install and configure Apache Oozie in your Hadoop environment to start managing workflows. Develop seamless workflows that integrate tools like Hive, Pig, and Sqoop to automate data operations. Set up coordinators to handle timed and dependent job executions efficiently. Deploy Spark jobs within your workflows for machine learning on large datasets. Harness Oozie security features to improve your system's reliability and trustworthiness. Author(s): Authored by Singh, a seasoned developer with a deep understanding of big data processing and Apache Oozie. Drawing on the author's practical experience, the book intersperses technical detail with real-world examples for an effective learning experience. The author's goal is to make Oozie accessible and useful to professionals. Who is it for? This book is ideal for data engineers and Hadoop professionals looking to streamline their workflow management using Apache Oozie. Whether you're a novice to Oozie or aiming to implement complex data and ML pipelines, the book offers comprehensive guidance tailored to your needs.

Data Lake Development with Big Data

In "Data Lake Development with Big Data," you will explore the fundamental principles and techniques for constructing and managing a Data Lake tailored for your organization's big data challenges. This book provides practical advice and architectural strategies for ingesting, managing, and analyzing large-scale data efficiently and effectively. What this book will help me do: Learn how to architect a Data Lake from scratch tailored to your organizational needs. Master techniques for ingesting data using real-time and batch processing frameworks efficiently. Understand data governance, quality, and security considerations essential for scalable Data Lakes. Discover strategies for enabling users to explore data within the Data Lake effectively. Gain insights into integrating Data Lakes with Big Data analytic applications for high performance. Author(s): Pasupuleti and Beulah Salome Purra bring their extensive expertise in big data and enterprise data management to this book. With years of hands-on experience designing and managing large-scale data architectures, their insights are rooted in practical knowledge and proven techniques. Who is it for? This book is ideal for data architects and senior managers tasked with adapting or creating scalable data solutions in enterprise contexts. Readers should have foundational knowledge of master data management and be familiar with Big Data technologies to derive maximum value from the content presented.

Streaming Analytics with IBM Streams: Analyze More, Act Faster, and Get Continuous Insights

Gain a competitive edge with IBM Streams. Turn data-in-motion into solid business opportunities with IBM Streams and let Streaming Analytics with IBM Streams show you how. This comprehensive guide starts out with a brief overview of different technologies used for big data processing and explanations of how data-in-motion can be utilized for business advantage. You will learn how to apply big data analytics and how they benefit from data-in-motion. Discover all about Streams, starting with the main components, then dive further into Streams installation, upgrade, and management capabilities, including tools used for production. Through a solid understanding of big data in motion, detailed illustrations, endnotes that provide additional learning resources, and end-of-chapter summaries with helpful insight, data analysts and professionals looking to get more from their data will benefit from expert insight on: data-in-motion processing and how it can be applied to generate new business opportunities; the three approaches to processing data in motion and the pros and cons of each; the main components of Streams, from runtime to installation and administration; multiple purposes of the Text Analytics toolkit; the evolving Streams ecosystem; and a detailed roadmap for programmers to quickly become fluent with Streams. Data-in-motion is rapidly becoming a business tool used to discover more about customers and opportunities; however, it is only valuable if you have the tools and knowledge to analyze and apply it. This is an expert guide to IBM Streams and how you can harness this powerful tool to gain a competitive business edge.

podcast_episode
by Kyle Polich, Slater Victoroff (indico Data Solutions)

The recent TechCrunch opinion piece "Big Data Doesn't Exist" by Slater Victoroff is an interesting discussion of the usefulness of data both big and small. Slater joins me this episode to expand on that piece. Slater Victoroff is CEO of indico Data Solutions, a company whose services turn raw text and image data into human insight. He and his co-founders studied at Olin College of Engineering, where indico was born. indico was then accepted into the "Techstars Accelerator Program" in the fall of 2014 and went on to raise $3M in seed funding. His essay received a lot of traction on TechCrunch, and I have invited Slater to join me today to discuss his perspective and touch on a few topics in the machine learning space as well.

Oracle Data Integration: Tools for Harnessing Data

Deliver continuous access to timely and accurate BI across your enterprise using the detailed information in this Oracle Press guide. Through clear explanations and practical examples, a team of Oracle experts shows how to assimilate data from disparate sources into a single, unified view. Find out how to transform data in real time, handle replication and migration, and deploy Oracle Data Integrator and Oracle GoldenGate. Oracle Data Integration: Tools for Harnessing Data offers complete coverage of the latest “big data” hardware and software solutions. · Efficiently move data both inside and outside an Oracle environment · Map sources to database fields using Data Merge and ETL · Export schema through transportable tablespaces and Oracle Data Pump · Capture and apply changes across heterogeneous systems with Oracle GoldenGate · Seamlessly exchange information between databases using Oracle Data Integrator · Correct errors and maximize quality through data cleansing and validation · Plan and execute successful Oracle Database migrations and replications · Handle high-volume transactions with Oracle Big Data Appliance, Oracle NoSQL, and third-party utilities