talk-data.com

Topic: Big Data

Tags: data_processing, analytics, large_datasets

1217 activities tagged

Activity Trend: peak of 28 activities per quarter (2020-Q1 through 2026-Q1)

Activities

1217 activities · Newest first

Machine Learning with Spark - Second Edition

Dive into the world of distributed machine learning with Apache Spark, a powerful framework for handling, processing, and analyzing big data. This book will take you through implementing popular machine learning algorithms using Spark ML, covering end-to-end workflows such as data preparation, model building, predictive analysis, and text processing.

What this Book will help me do

- Learn to implement scalable machine learning solutions using Spark ML.
- Develop the skills to set up and configure Apache Spark environments.
- Master the application of machine learning techniques such as clustering, classification, and regression with Spark.
- Efficiently handle and process large-scale datasets using Spark tools.
- Put Spark's capabilities to work in building real-world distributed data processing solutions.

Author(s)

Dua and Ghotra bring a wealth of experience in big data and machine learning to this book. They have been involved in building scalable data systems and implementing machine learning solutions in various industry scenarios. Their approach is hands-on and focused on teaching practical, actionable knowledge.

Who is it for?

This book is perfect for data enthusiasts, data engineers, and machine learning practitioners who are familiar with Python and Scala and eager to apply machine learning concepts in distributed environments. It is aimed at professionals looking to develop their skills in building scalable data systems and implementing advanced machine learning workflows in Spark.
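To make the workflow concrete, here is a minimal sketch (not from the book) of a Spark ML pipeline in Python: raw columns are assembled into a feature vector and a classifier is fit. The column names and toy data are invented for illustration.

```python
# A minimal Spark ML pipeline sketch; data and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("spark-ml-sketch").getOrCreate()

# Toy dataset standing in for a prepared feature table.
df = spark.createDataFrame(
    [(0.0, 1.2, 0.7), (1.0, 3.1, 2.2), (0.0, 0.4, 0.1), (1.0, 2.8, 1.9)],
    ["label", "f1", "f2"],
)

# Spark ML expects a single vector column of features.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("label", "prediction").show()
```

The same pattern extends to the clustering and regression estimators mentioned above: swap the final stage and keep the data-preparation stages intact.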

Data & Analytics Bi-Weekly Newsletter Cast, April 27, 2017

About #Podcast:

The FutureOfData podcast is a conversation starter that brings leaders, influencers, and leading practitioners onto the show to discuss their journeys in creating the data-driven future.

Want to join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords: #FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Sams Teach Yourself Hadoop in 24 Hours

Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more:

- Understanding Hadoop and the Hadoop Distributed File System (HDFS)
- Importing data into Hadoop and processing it there
- Mastering basic MapReduce Java programming and using advanced MapReduce API concepts
- Making the most of Apache Pig and Apache Hive
- Implementing and administering YARN
- Taking advantage of the full Hadoop ecosystem
- Managing Hadoop clusters with Apache Ambari
- Working with the Hadoop User Environment (HUE)
- Scaling, securing, and troubleshooting Hadoop environments
- Integrating Hadoop into the enterprise
- Deploying Hadoop in the cloud
- Getting started with Apache Spark

Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, quizzes, and exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.
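The book teaches MapReduce in Java; as a rough Python sketch of the same pattern, here is the classic word count written as Hadoop Streaming scripts. The file names are assumptions, and submission flags vary by distribution.

```python
# --- mapper.py: emit "word<TAB>1" for every word on stdin ---
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# --- reducer.py: sum counts per word; Hadoop Streaming sorts mapper
# output by key, so all counts for a word arrive contiguously ---
import sys

current, count = None, 0
for line in sys.stdin:
    word, _, n = line.rstrip("\n").partition("\t")
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(n)
if current is not None:
    print(f"{current}\t{count}")
```

A job would then be submitted along the lines of "hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input in/ -output out/"; the exact streaming jar path depends on the installation.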

Usage-Driven Database Design: From Logical Data Modeling through Physical Schema Definition

Design great databases, from logical data modeling through physical schema definition. You will learn a framework that finally cracks the problem of merging data and process models into a meaningful and unified design that accounts for how data is actually used in production systems. Key to the framework is a method for taking the logical data model, which is a static look at the definition of the data, and merging that static view with the process models describing how the data will be used in actual practice once a given system is implemented. The approach resolves the disconnect between the static definition of data in the logical data model and the dynamic flow of the data in the logical process models. The design framework in this book can be used to create operational databases for transaction processing systems, or data warehouses in support of decision support systems. The information manager can be a flat file, Oracle Database, IMS, NoSQL, Cassandra, Hadoop, or any other DBMS. Usage-Driven Database Design emphasizes practical aspects of design, and speaks to what works, what doesn't work, and what to avoid at all costs. Included in the book are lessons learned by the author over his 30+ years in the corporate trenches. Everything in the book is grounded in good theory, yet demonstrates a professional and pragmatic approach to design that can come only from decades of experience. The book:

- Presents an end-to-end framework from logical data modeling through physical schema definition.
- Includes lessons learned, techniques, and tricks that can turn a database disaster into a success.
- Applies to all types of database management systems, including NoSQL such as Cassandra and Hadoop, and mainstream SQL databases such as Oracle and SQL Server.

What You'll Learn

- Create logical data models that accurately reflect the real world of the user
- Create usage scenarios reflecting how applications will use a new database
- Merge static data models with dynamic process models to create resilient yet flexible database designs
- Support application requirements by creating responsive database schemas in any database architecture
- Cope with big data and unstructured data for transaction processing and decision support systems
- Recognize when relational approaches won't work, and when to turn to NoSQL solutions such as Cassandra or Hadoop

Who This Book Is For

System developers, including business analysts, database designers, database administrators, and application designers and developers who must design or interact with database systems
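As a toy illustration of the usage-driven idea (not the author's framework itself), a usage scenario such as "orders are looked up by customer far more often than they are written" can drive a physical choice, here a secondary index, on top of the logical model. The schema below is invented.

```python
# Sketch: a usage scenario drives a physical decision over a logical model.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical model: customers place orders.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        placed_on   TEXT
    );
    -- Physical decision from the usage scenario: frequent lookups
    -- by customer justify a secondary index on orders.customer_id.
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")
```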

Mastering Spark for Data Science

Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products.

About This Book

- Develop and apply advanced analytical techniques with Spark
- Learn how to tell a compelling story with data science using Spark's ecosystem
- Explore data at scale and work with cutting-edge data science methods

Who This Book Is For

This book is for those who have beginner-level familiarity with the Spark architecture and data science applications, especially those who are looking for a challenge and want to learn cutting-edge techniques. It assumes working knowledge of data science, common machine learning methods, and popular data science tools, and assumes you have previously run proof-of-concept studies and built prototypes.

What You Will Learn

- Learn the design patterns that integrate Spark into industrialized data science pipelines
- See how commercial data scientists design scalable, reusable code for data science services
- Explore cutting-edge data science methods so that you can study trends and causality
- Discover advanced programming techniques using RDDs and the DataFrame and Dataset APIs
- Find out how Spark can be used as a universal ingestion engine and as a web scraper
- Practice the implementation of advanced topics in graph processing, such as community detection and contact chaining
- Get to know the best practices for extended exploratory data analysis, commonly used in commercial data science teams
- Study advanced Spark concepts, solution design patterns, and integration architectures
- Demonstrate powerful data science pipelines

In Detail

Data science seeks to transform the world using data, and this is typically achieved through disrupting and changing real processes in real industries. To operate at this level you need to build data science solutions of substance: solutions that solve real problems. Spark has emerged as the big data platform of choice for data scientists due to its speed, scalability, and easy-to-use APIs. This book dives deep into using Spark to deliver production-grade data science solutions. The process is demonstrated by exploring the construction of a sophisticated global news analysis service that uses Spark to generate continuous geopolitical and current affairs insights. You will learn all about the core Spark APIs and take a comprehensive tour of advanced libraries, including Spark SQL, Spark Streaming, MLlib, and more. You will be introduced to advanced techniques and methods that will help you to construct commercial-grade data products. Focusing on a sequence of tutorials that deliver a working news intelligence service, you will learn about advanced Spark architectures, how to work with geographic data in Spark, and how to tune Spark algorithms so they scale linearly.

Style and approach

This is an advanced guide for those with beginner-level familiarity with the Spark architecture who work with data science applications. Mastering Spark for Data Science is a practical tutorial that uses core Spark APIs and takes a deep dive into advanced libraries including Spark SQL, Spark Streaming, and MLlib. This book expands on titles like Machine Learning with Spark and Learning Spark. It is the next learning curve for those comfortable with Spark and looking to improve their skills.
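For a flavor of the DataFrame-based exploration such a news analysis service builds on, here is a hedged PySpark sketch; the article schema is invented for illustration.

```python
# Trend-style rollup over a hypothetical news-article table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("news-insights-sketch").getOrCreate()

articles = spark.createDataFrame(
    [("2016-10-01", "EU", "trade"), ("2016-10-01", "US", "election"),
     ("2016-10-02", "EU", "trade")],
    ["published", "region", "theme"],
)

# Count articles per day, region, and theme to surface emerging themes.
(articles.groupBy("published", "region", "theme")
         .agg(F.count("*").alias("n"))
         .orderBy("published")
         .show())
```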

Learning Apache Spark 2

Dive into the world of Big Data with "Learning Apache Spark 2". This book introduces you to the powerful Apache Spark framework, tailored for real-time data analytics and machine learning. Through practical examples and real-world use cases, you'll gain hands-on experience in leveraging Spark's capabilities for your data processing needs.

What this Book will help me do

- Master the fundamentals of Apache Spark 2 and its new features.
- Effectively use Spark SQL, MLlib, RDDs, GraphX, and Spark Streaming to tackle real-world challenges.
- Gain skills in data processing, transformation, and analysis with Spark.
- Deploy and operate your Spark applications in clustered environments.
- Develop your own recommendation engines and predictive analytics models with Spark.

Author(s)

Abbasi brings a wealth of expertise in Big Data technologies, with a keen focus on simplifying complex concepts for learners. With substantial experience working on data processing frameworks, the author's approach to teaching creates an engaging and practical learning experience that empowers readers to confidently tackle challenges in Big Data processing and analytics.

Who is it for?

This book is ideal for aspiring Big Data professionals seeking an accessible introduction to Apache Spark. Beginners in Spark will find step-by-step guidance, while those familiar with earlier versions will appreciate the insights into Spark 2's new features. Familiarity with Big Data concepts and Scala programming is recommended for optimal understanding.
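For a taste of the Spark SQL surface the book covers, here is a minimal sketch that registers a DataFrame as a temporary view and queries it with plain SQL; the ratings data is invented.

```python
# Spark SQL sketch: DataFrame -> temp view -> SQL query.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-sketch").getOrCreate()

ratings = spark.createDataFrame(
    [(1, 101, 4.0), (1, 102, 3.5), (2, 101, 5.0)],
    ["user_id", "item_id", "rating"],
)
ratings.createOrReplaceTempView("ratings")

# Average rating per item: the kind of aggregate a simple
# recommendation engine might start from.
spark.sql("""
    SELECT item_id, AVG(rating) AS avg_rating
    FROM ratings
    GROUP BY item_id
    ORDER BY avg_rating DESC
""").show()
```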

Creating a Data-Driven Enterprise with DataOps

Many companies are busy collecting massive amounts of data, but few are taking advantage of this treasure trove to build a truly insights-driven organization. To do so, the data team must democratize both data and insights in a way that provides real-time access to all employees in the organization. This report explores DataOps: the process, culture, tools, and people required to scale big data pervasively across the enterprise. Just as DevOps has enabled organizations to improve coordination between developers and the operations team, DataOps closely connects everyone who handles data, including engineers, data scientists, analysts, and business users. Democratizing data with this approach requires removing the barriers typical of siloed data, teams, and systems. In this report, Apache Hive creators Ashish Thusoo and Joydeep Sen Sarma examine the characteristics of a data-driven organization that supports a self-service model.

- Explore related topics such as data lakes, metadata, cloud architecture, and data-infrastructure-as-a-service
- Examine conclusions from a survey of more than 400 senior executives whose companies are in various stages of data maturity
- Learn how data pioneers at Facebook, Uber, LinkedIn, Twitter, and eBay created data-driven cultures and self-service data infrastructures for their organizations

Understanding Metadata

One viable option for organizations looking to harness massive amounts of data is the data lake: a single repository for storing all the raw data, both structured and unstructured, that floods into the company. But that isn't the end of the story. The key to making a data lake work is data governance, using metadata to provide valuable context through tagging and cataloging. This practical report examines why metadata is essential for managing, migrating, accessing, and deploying any big data solution. Authors Federico Castanedo and Scott Gidley dive into the specifics of analyzing metadata for keeping track of your data: where it comes from, where it's located, and how it's being used, so you can provide safeguards and reduce risk. In the process, you'll learn about methods for automating metadata capture. This report also explains the main features of a data lake architecture, and discusses the pros and cons of several data lake management solutions that support metadata. These solutions include:

- Traditional data integration/management vendors, such as the IBM Research Accelerated Discovery Lab
- Tooling from open source projects, including Teradata Kylo
- Startups such as Trifacta and Zaloni that provide best-of-breed technology
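As a toy illustration of the tagging-and-cataloging idea (not code from the report), a catalog entry can record where each dataset came from, where it lives in the lake, and who consumes it, so the metadata is captured at ingest time rather than reconstructed later. All names and paths below are hypothetical.

```python
# Minimal metadata-catalog sketch; all names and paths are hypothetical.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    source: str        # where the data comes from
    location: str      # where it is stored in the lake
    tags: list = field(default_factory=list)
    consumers: list = field(default_factory=list)  # downstream users/jobs

catalog: dict = {}

def register(entry: CatalogEntry) -> None:
    """Capture metadata at ingest time instead of after the fact."""
    catalog[entry.name] = entry

register(CatalogEntry(
    name="clickstream_raw",
    source="web frontend logs",
    location="s3://lake/raw/clickstream/",
    tags=["format:json", "pii:none"],
    consumers=["sessionization job"],
))
```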

Effective Business Intelligence with QuickSight

Effective Business Intelligence with QuickSight introduces you to Amazon QuickSight, a modern BI tool that enables interactive visualizations powered by the cloud. With comprehensive tutorials, you'll master how to load, prepare, and visualize your data for actionable insights. This book provides real-world examples to showcase how QuickSight integrates into the AWS ecosystem.

What this Book will help me do

- Understand how to effectively use Amazon QuickSight for business intelligence.
- Learn how to connect QuickSight to data sources like S3, RDS, and more.
- Create interactive dashboards and visualizations with QuickSight tools.
- Gain expertise in managing users, permissions, and data security in QuickSight.
- Execute a real-world big data project using AWS data lakes and QuickSight.

Author(s)

Nadipalli is a seasoned data architect with extensive experience in cloud computing and business intelligence. With expertise in the AWS ecosystem, the author has worked on numerous large-scale data analytics projects, and writes with a focus on practical knowledge through easy-to-follow examples and actionable insights.

Who is it for?

This book is ideal for business intelligence architects, developers, and IT executives seeking to leverage Amazon QuickSight. It is suited to readers with foundational knowledge of AWS who want to enhance their capabilities in BI and data visualization. If your goal is to modernize your business intelligence systems and explore advanced analytics, this book is for you.
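QuickSight can also be driven programmatically. As a hedged sketch using the boto3 QuickSight client (the account ID is a placeholder, and appropriate IAM permissions are assumed), you might enumerate existing dashboards like this:

```python
# List QuickSight dashboards visible to an AWS account.
import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

resp = quicksight.list_dashboards(AwsAccountId="123456789012")  # placeholder
for dash in resp.get("DashboardSummaryList", []):
    print(dash["Name"], dash["DashboardId"])
```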

Beginning Data Science in R: Data Analysis, Visualization, and Modelling for the Data Scientist

Discover best practices for data analysis and software development in R, and start on the path to becoming a fully fledged data scientist. This book teaches you techniques for both data manipulation and visualization, and shows you the best way to develop new software packages for R. Beginning Data Science in R details how data science is a combination of statistics, computational science, and machine learning. You'll see how to efficiently structure and mine data to extract useful patterns and build mathematical models. This requires computational methods and programming, and R is an ideal language for the job. The book is based on lecture notes from classes the author has taught on data science and statistical programming using the R programming language. Modern data analysis requires computational skills and usually a minimum of programming.

What You Will Learn

- Perform data science and analytics using statistics and the R programming language
- Visualize and explore data, including working with the large data sets found in big data
- Build an R package
- Test and check your code
- Practice version control
- Profile and optimize your code

Who This Book Is For

Those with some data science or analytics background, but not necessarily experience with the R programming language.

In this session, Sean Naismith, Head of Analytics Services at Enova Decisions, sat down with Vishal Kumar, CEO of AnalyticsWeek, and shared his thoughts on analytics-as-a-service platforms. Sean discussed how a business can decide when to look for alternative solutions to get ahead in the analytics game in a rapidly evolving competitive environment.

Timeline:
- 0:29 Introduction
- 1:05 Sean's journey
- 2:29 Introducing Enova
- 3:30 Enova's clientele
- 4:32 Decision management system
- 6:40 Structuring a decision management system
- 9:28 Analytics as a service
- 11:01 Analytics as a competitive edge
- 13:20 The art of doing business and the science of doing business
- 16:28 How is the science of doing business impacting companies?
- 18:59 The right time to invest in a DMS
- 20:10 Analytics-as-a-service use cases
- 22:58 Decision life cycle
- 27:57 DMS working with a new customer landscape
- 30:49 DMS deployment for translation companies
- 33:05 Adaptability of DMS
- 35:15 DMS aiding in analysis paralysis
- 37:31 DMS working with AI
- 39:41 Challenges and opportunities in DMS
- 41:22 Analyzing non-repeatable decisions
- 43:33 DMS and the future of data

Podcast link: https://futureofdata.org/futureofdata-podcast-conversation-sean-naismith-enova-decisions/

Here's Sean's Bio: Sean joined Enova as Head of Analytics Services for Enova Decisions in 2016. Prior to working at Enova, Sean served as an Advanced Analytics Consultant and Senior Director of Business Analytics for Leapfrog, where he led the development of the company’s predictive analytics capabilities. Before Leapfrog, Sean served as Director of Strategic Intelligence for TrendPointers, LLC, and Director of Research for Global Currency Group. He also currently serves as Managing Director and Chief Compliance Officer of Naismith Wealth Management, LLC, which he founded in 2015. Sean is a CFP® certificate and holds the CMT designation. He received his B.A. in finance from the University of Illinois at Chicago.

The podcast is sponsored by TAO.ai (https://tao.ai), an artificial-intelligence-driven career coach.

About #Podcast:

The FutureOfData podcast is a conversation starter that brings leaders, influencers, and leading practitioners together to discuss their journeys in creating the data-driven future.

Want to join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Data Science For Dummies, 2nd Edition

Your ticket to breaking into the field of data science! Jobs in data science are projected to outpace the number of people with data science skills, making those with the knowledge to fill a data science position a hot commodity in the coming years. Data Science For Dummies is the perfect starting point for IT professionals and students interested in making sense of an organization's massive data sets and applying their findings to real-world business scenarios. From uncovering rich data sources to managing large amounts of data within hardware and software limitations, ensuring consistency in reporting, merging various data sources, and beyond, you'll develop the know-how you need to effectively interpret data and tell a story that can be understood by anyone in your organization. The book:

- Provides a background in data science fundamentals and preparing your data for analysis
- Details different data visualization techniques that can be used to showcase and summarize your data
- Explains both supervised and unsupervised machine learning, including regression, model validation, and clustering techniques
- Includes coverage of big data processing tools like MapReduce, Hadoop, Dremel, Storm, and Spark

It's a big, big data world out there. Let Data Science For Dummies help you harness its power and gain a competitive edge for your organization.
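As a small, self-contained illustration (not from the book) of the supervised and unsupervised techniques listed above, here is a scikit-learn sketch covering clustering, regression, and model validation on synthetic data.

```python
# Unsupervised clustering plus a cross-validated regression on toy data.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Unsupervised: group the points into three clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Supervised: fit a regression, and validate it instead of
# trusting the training fit alone.
y = 2.0 * X[:, 0] + np.random.default_rng(0).normal(size=len(X))
scores = cross_val_score(LinearRegression(), X, y, cv=5)

print("cluster sizes:", np.bincount(labels))
print("cross-validated R^2:", round(scores.mean(), 3))
```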

Big Data Visualization

Dive into 'Big Data Visualization' and uncover how to tackle the challenges of visualizing vast quantities of complex data. With a focus on scalable and dynamic techniques, this guide explores the nuances of effective data analysis. You'll master tools and approaches to display, interpret, and communicate data in impactful ways.

What this Book will help me do

- Understand the fundamentals of big data visualization, including unique challenges and solutions.
- Explore practical techniques for using D3 and Python to visualize and detect anomalies in big data.
- Learn to leverage dashboards like Tableau to present data insights effectively.
- Address and improve data quality issues to enhance analysis accuracy.
- Gain hands-on experience with real-world use cases for tools such as Hadoop and Splunk.

Author(s)

James D. Miller is an IBM-certified expert specializing in data analytics and visualization. With years of experience handling massive datasets and extracting actionable insights, he is dedicated to sharing his expertise. His practical approach is evident in how he combines tool mastery with a clear understanding of data complexities.

Who is it for?

This book is designed for data analysts, data scientists, and others involved in interpreting and presenting big datasets. Whether you are a beginner looking to understand big data visualization or an experienced professional seeking advanced tools and techniques, this guide suits your needs. Foundational knowledge of programming languages like R and big data platforms such as Hadoop is recommended to maximize your learning.
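As a minimal Python sketch (not from the book) of one anomaly-visualization approach, flag points that sit far from the series mean and highlight them on a plot; the data is synthetic.

```python
# Flag and plot outliers in a synthetic time series.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
series = np.sin(np.linspace(0, 20, 2000)) + rng.normal(0, 0.1, 2000)
series[[250, 900, 1500]] += 3.0          # inject known anomalies

mean, std = series.mean(), series.std()
outliers = np.abs(series - mean) > 3 * std   # simple 3-sigma rule

plt.plot(series, lw=0.5, label="signal")
plt.scatter(np.where(outliers)[0], series[outliers],
            color="red", zorder=3, label="anomaly")
plt.legend()
plt.savefig("anomalies.png")   # write to file; no display assumed
```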

The Data Science Handbook

A comprehensive overview of data science covering the analytics, programming, and business skills necessary to master the discipline. Finding a good data scientist has been likened to hunting for a unicorn: the required combination of technical skills is simply very hard to find in one person. In addition, good data science is not just rote application of trainable skill sets; it requires the ability to think flexibly about all these areas and understand the connections between them. This book provides a crash course in data science, combining all the necessary skills into a unified discipline. Unlike many analytics books, computer science and software engineering are given extensive coverage, since they play such a central role in the daily work of a data scientist. The author also describes classic machine learning algorithms, from their mathematical foundations to real-world applications. Visualization tools are reviewed, and their central importance in data science is highlighted. Classical statistics is addressed to help readers think critically about the interpretation of data and its common pitfalls. The clear communication of technical results, which is perhaps the most undertrained of data science skills, is given its own chapter, and all topics are explained in the context of solving real-world data problems. The book also features:

- Extensive sample code and tutorials using Python along with its technical libraries
- Core technologies of "Big Data," including their strengths and limitations and how they can be used to solve real-world problems
- Coverage of the practical realities of the tools, keeping theory to a minimum; where theory is presented, it is done in an intuitive way to encourage critical thinking and creativity
- A wide variety of case studies from industry
- Practical advice on the realities of being a data scientist today, including the overall workflow, where time is spent, the types of datasets worked on, and the skill sets needed

The Data Science Handbook is an ideal resource for data analysis methodology and big data software tools. The book is appropriate for people who want to practice data science but lack the required skill sets, including software professionals who need to better understand analytics and statisticians who need to understand software. Modern data science is a unified discipline, and it is presented as such. This book is also an appropriate reference for researchers and entry-level graduate students who need to learn real-world analytics and expand their skill set.

About the author: Field Cady is the data scientist at the Allen Institute for Artificial Intelligence, where he develops tools that use machine learning to mine scientific literature. He has also worked at Google and several Big Data startups. He has a BS in physics and math from Stanford University, and an MS in computer science from Carnegie Mellon.

Learning PySpark

"Learning PySpark" guides you through mastering the integration of Python with Apache Spark to build scalable and efficient data applications. You'll delve into Spark 2.0's architecture, efficiently process data, and explore PySpark's capabilities ranging from machine learning to structured streaming. By the end, you'll be equipped to craft and deploy robust data pipelines and applications. What this Book will help me do Master the Spark 2.0 architecture and its Python integration with PySpark. Leverage PySpark DataFrames and RDDs for effective data manipulation and analysis. Develop scalable machine learning models using PySpark's ML and MLlib libraries. Understand advanced PySpark features such as GraphFrames for graph processing and TensorFrames for deep learning models. Gain expertise in deploying PySpark applications locally and on the cloud for production-ready solutions. Author(s) Authors None Drabas and None Lee bring extensive experience in data engineering and Python programming. They combine a practical, example-driven approach with deep insights into Apache Spark's ecosystem. Their expertise and clarity in writing make this book accessible for individuals aiming to excel in big data technologies with Python. Who is it for? This book is best suited for Python developers who want to integrate Apache Spark 2.0 into their workflow to process large-scale data. Ideal readers will have foundational knowledge of Python and seek to build scalable data-intensive applications using Spark, regardless of prior experience with Spark itself.

Mastering Elasticsearch 5.x - Third Edition

This comprehensive guide dives deep into the functionality of Elasticsearch 5, the widely used search and analytics engine. Leveraging the power of Apache Lucene, this book will help you understand advanced concepts like querying, indexing, and cluster management to build efficient and scalable search solutions.

What this Book will help me do

- Master advanced features of Elasticsearch such as text scoring, sharding, and aggregation.
- Understand how to handle big data efficiently using Elasticsearch's architecture.
- Learn practical implementation techniques for Elasticsearch features through hands-on examples.
- Develop custom plugins for Elasticsearch to tailor its functionality to specific needs.
- Scale and optimize Elasticsearch clusters for high performance in production environments.

Author(s)

Bharvi Dixit is an experienced software engineer and a recognized expert in implementing Elasticsearch solutions. With a strong background in distributed systems and database management, Bharvi's writing is informed by real-world experience and a focus on practical applications.

Who is it for?

This book is ideal for developers and data engineers with existing Elasticsearch experience who wish to deepen their knowledge. It serves as a valuable resource for professionals tasked with creating scalable search applications. A working understanding of Elasticsearch basics and the query DSL is recommended to fully benefit from this guide.
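As a hedged sketch of the query DSL in action through the official Python client (the index name and document are invented, a local node is assumed, and the doc_type parameter reflects the Elasticsearch 5.x-era API):

```python
# Index a document, then combine a full-text query with an aggregation.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

es.index(index="articles", doc_type="doc", id=1,
         body={"title": "Scaling search", "views": 120})
es.indices.refresh(index="articles")   # make the document searchable

resp = es.search(index="articles", body={
    "query": {"match": {"title": "search"}},
    "aggs": {"avg_views": {"avg": {"field": "views"}}},
})
print(resp["hits"]["total"], resp["aggregations"]["avg_views"])
```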


Update: Talk is available @ https://www.voiceamerica.com/episode/97300/the-quantum-disruption-in-global-business-driven-by-the-big-analytics

The Quantum Disruption in Global Business Driven by The Big Analytics

Listen to Vishal Kumar, an author, innovator, and mentor, in a discussion of one of the most important and relevant subjects of modern times: Big Analytics, and how it is changing the landscape of global business.

Wednesday at 9 AM Pacific Time on VoiceAmerica Business Channel

Featured Guest

Vishal Kumar

Vishal Kumar is CEO and President of AnalyticsWeek and a leading advocate for data-driven decision making. He has been rated among the top 100 global influencers to follow in data analytics by leading research organizations, and he has published two books on analytics. His current work involves using artificial intelligence to prepare the workforce for the future. Vishal has been a keynote speaker at various international conferences and serves as an advisor to several analytics startups.

Originally Posted @ VoiceAmerica

About #Podcast:

The FutureOfData podcast is a conversation starter that brings leaders, influencers, and leading practitioners onto the show to discuss their journeys in creating the data-driven future.

Want to join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords: #FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Big Data Now: 2016 Edition

Now in its sixth edition, O'Reilly's annual Big Data Now report recaps the trends, tools, applications, and forecasts we've examined throughout 2016. This collection of blog posts, authored by leading thinkers and experts in the field, reflects a unique set of themes we've identified as gaining significant attention and traction. Our list of topics for 2016 includes:

- Careers in data
- Tools and architecture for big data
- Intelligent real-time applications
- Cloud infrastructure
- Machine learning: models and training
- Deep learning and artificial intelligence

Geospatial Data and Analysis

Geospatial data, or data with location information, is generated in huge volumes every day by billions of mobile phones, IoT sensors, drones, nanosatellites, and many other sources in an unending stream. This practical ebook introduces you to the landscape of tools and methods for making sense of all that data, and shows you how to apply geospatial analytics to a variety of issues, large and small. Authors Aurelia Moser, Jon Bruner, and Bill Day provide a complete picture of the geospatial analysis options available, including low-scale commercial desktop GIS tools, medium-scale options such as PostGIS and Lucene-based searching, and true big data solutions built on technologies such as Hadoop. You'll learn when it makes sense to move from one type of solution to the next, taking increased costs and complexity into account.

- Explore the structure of basic webmaps, and the challenges and constraints involved when working with geo data
- Dive into low- to medium-scale mapping tools for use in backend and frontend web development
- Focus on tools for robust medium-scale geospatial projects that don't quite justify a big data solution
- Learn about innovative platforms and software packages for solving issues of processing and storage of large-scale data
- Examine geodata analysis use cases, including disaster relief, urban planning, and agriculture and environmental monitoring
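As a desktop-scale sketch of the most basic geospatial operation, here is a point-in-polygon test using the shapely package; the coordinates are invented.

```python
# Point-in-polygon: which sensors fall inside an area of interest?
from shapely.geometry import Point, Polygon

# Hypothetical city block as a polygon of (lon, lat) corners.
block = Polygon([(-122.42, 37.77), (-122.41, 37.77),
                 (-122.41, 37.78), (-122.42, 37.78)])

sensors = {"a": Point(-122.415, 37.775), "b": Point(-122.40, 37.76)}

for name, pt in sensors.items():
    print(name, "inside" if block.contains(pt) else "outside")
```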