talk-data.com

Topic: Big Data

Tags: data_processing, analytics, large_datasets

Tagged items: 1217

Activity trend: peak of 28 per quarter (2020-Q1 to 2026-Q1)

Activities: 1217 activities · Newest first

Oracle R Enterprise: Harnessing the Power of R in Oracle Database

Master the big data capabilities of Oracle R Enterprise. Effectively manage your enterprise's big data and keep complex processes running smoothly using the hands-on information contained in this Oracle Press guide. Oracle R Enterprise: Harnessing the Power of R in Oracle Database shows, step by step, how to create and execute large-scale predictive analytics and maintain superior performance. Discover how to explore and prepare your data, accurately model business processes, generate sophisticated graphics, and write and deploy powerful scripts. You will also find out how to effectively incorporate Oracle R Enterprise features in APEX applications, OBIEE dashboards, and Apache Hadoop systems.

Learn to:
• Install, configure, and administer Oracle R Enterprise
• Establish connections and move data to the database
• Create Oracle R Enterprise packages and functions
• Use the R language to work with data in Oracle Database
• Build models using ODM, ORE, and other algorithms
• Develop and deploy R scripts and use the R script repository
• Execute embedded R scripts and employ ORE SQL API functions
• Map and manipulate data using Oracle R Advanced Analytics for Hadoop
• Use ORE in Oracle Data Miner, OBIEE, and other applications

Spark in Action

Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0.

About the Technology: Big data systems distribute datasets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a processing system designed specifically for distributed data. It provides easy-to-use interfaces, along with the performance you need for production-quality analytics and machine learning. Spark 2 also adds improved programming APIs, better performance, and countless other upgrades.

About the Book: You'll get comfortable with the Spark CLI as you work through a few introductory examples. Then, you'll start programming Spark using its core APIs. Along the way, you'll work with structured data using Spark SQL, process near-real-time streaming data, apply machine learning algorithms, and munge graph data using Spark GraphX. For a zero-effort startup, you can download the preconfigured virtual machine ready for you to try the book's code.

What's Inside:
• Updated for Spark 2.0
• Real-life case studies
• Spark DevOps with Docker
• Examples in Scala, and online in Java and Python

About the Reader: Written for experienced programmers with some background in big data or machine learning.

About the Authors: Petar Zečević and Marko Bonaći are seasoned developers heavily involved in the Spark community.

Quotes:
• "Dig in and get your hands dirty with one of the hottest data processing engines today. A great guide." - Jonathan Sharley, Pandora Media
• "Must-have! Speed up your learning of Spark as a distributed computing framework." - Robert Ormandi, Yahoo!
• "An easy-to-follow, step-by-step guide." - Gaurav Bhardwaj, 3Pillar Global
• "An ambitiously comprehensive overview of Spark and its diverse ecosystem." - Jonathan Miller, Optensity
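
For readers who want a feel for the Spark SQL workflow the blurb describes (load structured data, register it as a view, and query it with SQL), here is a minimal PySpark sketch. It is not taken from the book; the file name and column names are illustrative assumptions.

```python
# Minimal PySpark sketch of a Spark SQL workflow: load structured data,
# register it as a view, and query it with SQL.
# The file name and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-sketch").getOrCreate()

# Read a CSV file into a DataFrame, inferring the schema from the data.
orders = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("orders.csv"))

# Register the DataFrame as a temporary view so it can be queried with SQL.
orders.createOrReplaceTempView("orders")

# Run an aggregate query with Spark SQL.
top_customers = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
    ORDER BY total_spent DESC
    LIMIT 10
""")
top_customers.show()

spark.stop()
```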

Fast Data Processing with Spark 2 - Third Edition

Fast Data Processing with Spark 2 takes you through the essentials of leveraging Spark for big data analysis. You will learn how to install and set up Spark, handle data using its APIs, and apply advanced functionality like machine learning and graph processing. By the end of the book, you will be well equipped to use Spark in real-world data processing tasks.

What this book will help me do:
• Install and configure Apache Spark for optimal performance.
• Interact with distributed datasets using the resilient distributed dataset (RDD) API.
• Leverage the flexibility of the DataFrame API for efficient big data analytics.
• Apply machine learning models using Spark MLlib to solve complex problems.
• Perform graph analysis using GraphX to uncover structural insights in data.

Author(s): Krishna Sankar is an experienced data scientist and thought leader in big data technologies. With a deep understanding of machine learning, distributed systems, and Apache Spark, Krishna has guided numerous projects in data engineering and big data processing. Matei Zaharia, the co-author, is also widely recognized in the field of distributed systems and cloud computing, contributing to Apache Spark development.

Who is it for? This book is catered to software developers and data engineers with a foundational understanding of Scala or Java programming. A beginner- to intermediate-level understanding of big data processing concepts is recommended. If you are aspiring to solve big data problems using scalable distributed computing frameworks, this book is perfect for you. By the end, you will be confident in building Spark-powered applications and analyzing data efficiently.
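
To illustrate the difference between the RDD API and the DataFrame API mentioned above, here is a small PySpark sketch that computes the same word count both ways. The input path is an assumption, and the code is a sketch rather than an excerpt from the book.

```python
# Contrast of the two APIs: the low-level RDD API and the higher-level
# DataFrame API. The input file path is an assumption.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("rdd-vs-dataframe").getOrCreate()
sc = spark.sparkContext

# RDD API: functional transformations on a distributed collection of lines.
counts_rdd = (sc.textFile("logs.txt")
              .flatMap(lambda line: line.split())
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b))
print(counts_rdd.take(5))

# DataFrame API: the same word count expressed declaratively, letting the
# Catalyst optimizer plan the execution.
lines = spark.read.text("logs.txt")
counts_df = (lines
             .select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
             .groupBy("word")
             .count())
counts_df.show(5)

spark.stop()
```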

Fast Data Architectures for Streaming Applications

Why have stream-oriented data systems become so popular, when batch-oriented systems have served big data needs for many years? In this report, author Dean Wampler examines the rise of streaming systems for handling time-sensitive problems, such as detecting fraudulent financial activity as it happens. You'll explore the characteristics of fast data architectures, along with several open source tools for implementing them. Batch-mode processing isn't going away, but exclusive use of these systems is now a competitive disadvantage. You'll learn that, while fast data architectures are much harder to build, they represent the state of the art for dealing with mountains of data that require immediate attention.

• Learn step by step how a basic fast data architecture works
• Understand why event logs are the core abstraction for streaming architectures, while message queues are the core integration tool
• Use methods for analyzing infinite data sets, where you don't have all the data and never will
• Take a tour of open source streaming engines, and discover which ones work best for different use cases
• Get recommendations for making real-world streaming systems responsive, resilient, elastic, and message driven
• Explore an example streaming application for the IoT: telemetry ingestion and anomaly detection for home automation systems
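
As a rough illustration of the event-log abstraction the report centers on, the sketch below appends events to a Kafka topic and reads them back with an independent consumer, using the kafka-python client. The broker address, topic name, and telemetry payload are assumptions, not material from the report.

```python
# Minimal sketch of an event log: producers append immutable events,
# consumers read the log independently at their own offsets.
# Broker, topic, and payload are illustrative assumptions.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"
TOPIC = "home-telemetry"   # hypothetical IoT telemetry topic

# Producers append events to the log.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"sensor": "thermostat-1", "temp_c": 21.5})
producer.flush()

# Consumers keep their own offsets, which is what lets multiple downstream
# systems process the same stream of events independently.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    consumer_timeout_ms=5000,
)
for record in consumer:
    print(record.offset, record.value)
```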

In this session, Dr. Nipa Basu, Chief Analytics Officer at Dun & Bradstreet, sat down with Vishal Kumar, CEO of AnalyticsWeek, and shared her journey as Chief Analytics Officer, life at D&B, the future of credit scoring, and some of the challenges and opportunities she is observing as an industry observer, executive, and practitioner.

Timeline: 0:29 Nipa's background. 4:14 What is D&B? 7:40 Depth and breadth of decision making at D&B. 9:36 Matching security with technological evolution. 13:42 Anticipatory analytics. 16:00 CAO's role at D&B: in-facing or out-facing? 18:32 Future of credit scoring. 21:36 Challenges in dealing with clients. 24:08 Cultural challenges. 28:42 Good use cases in security data. 31:51 CDO, CAO, and CTO. 33:56 Optimistic trends in data analytics businesses. 36:44 Social data monitoring. 39:18 Creating a holistic model for data monitoring. 41:02 Overused terms in data analytics. 42:10 Best practices for small businesses to get started with data analytics. 44:33 Indicators that signal a business's need for analytics. 47:06 Advice for data-driven leaders. 49:30 Art of doing business and science of doing business.

Podcast link: https://futureofdata.org/analyticsweek-leadership-podcast-with-dr-nipa-basu-dun-bradstreet/

Here's Nipa's Bio: Dr. Nipa Basu is the Chief Analytics Officer at Dun & Bradstreet. Nipa is the main source of inspiration and leadership for Dun & Bradstreet's extensive team of data modelers and scientists who partner with the world's leading Fortune 500 companies to create innovative analytic solutions that drive business growth and results. The team is highly skilled in solving a wide range of business challenges with unique, basic, and advanced analytic applications.

Nipa joined Dun & Bradstreet in 2000 and since then has held key leadership roles focused on driving the success of Dun & Bradstreet’s Analytics practice. In 2012, Nipa was named Leader, Analytic Development, and in March 2015, Nipa was named Chief Analytics Officer and appointed to Dun & Bradstreet’s executive team.

Nipa began her professional career as an Economist with the New York State Legislative Tax Study Commission. She then joined Sandia National Laboratories, a national defense laboratory where she built a Microsimulation Model of the U.S. Economy. Prior to joining Dun & Bradstreet, Nipa was a database marketing statistician for AT&T with responsibility for building predictive marketing models.

Nipa received her Ph.D. in Economics from the State University of New York at Albany, specializing in Econometrics.

Follow @nipabasu

The podcast is sponsored by TAO.ai (https://tao.ai), an Artificial Intelligence-driven career coach.

About #Podcast:

The FutureOfData podcast is a conversation starter that brings together leaders, influencers, and leading practitioners to discuss their journeys in creating a data-driven future.

Want to join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

In this session, Joe DeCosmo, Chief Analytics Officer at Enova International, sat down with Vishal Kumar, CEO of AnalyticsWeek, and shared his journey to Chief Analytics Officer, life at Enova, and some of the challenges and opportunities he is observing as an executive, industry observer, and Chief Analytics Officer.

Timeline: 0:29 Joe's journey. 5:05 Credit risk and fraud prevention models. 6:27 Enova: in-facing or out-facing? 9:12 Enova's areas of expertise. 10:47 Enova decisions: Center of Excellence? 12:36 Depth and breadth of decision making at Enova. 14:51 CDO, CAO, and CTO. 17:24 Who owns the data at Enova? 19:55 Challenges in building a data culture. 25:52 Convincing leaders towards data science. 31:24 Business challenges that analytics is solving. 34:15 Getting started with data analytics as a business. 38:11 Exciting trends in data analytics. 41:09 Art of doing business and science of doing business. 44:00 Advice for budding CAOs.

Podcast link: https://futureofdata.org/analyticsweek-leadership-podcast-with-joe-decosmo-enova-international/

Here's Joe's Bio: Joe DeCosmo is the CAO of Enova International, where he leads a multi-disciplinary analytics team, providing end-to-end data and analytic services to Enova’s global online financial service brands and delivering real-time predictive analytics services to clients through Enova Decisions. Prior to Enova, Joe served as Director and Practice Leader of Advanced Analytics for West Monroe Partners and held a number of executive positions at HAVI Global Solutions and the Allant Group. He is also Immediate Past-President of the Chicago Chapter of the American Statistical Association and serves on the Advisory Board of the University of Illinois at Chicago's College of Business.

The podcast is sponsored by TAO.ai (https://tao.ai), an Artificial Intelligence-driven career coach.

About #Podcast:

The FutureOfData podcast is a conversation starter that brings together leaders, influencers, and leading practitioners to discuss their journeys in creating a data-driven future.

Want to join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools

Learn how to use the Apache Hadoop projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout, and Apache Solr. From setting up the environment to running sample applications, each chapter in this book is a practical tutorial on using an Apache Hadoop ecosystem project. While several books on Apache Hadoop are available, most are based on the main projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects and how they all work together as a cohesive big data development platform.

What You Will Learn:
• Set up the environment in Linux for Hadoop projects using Cloudera Hadoop Distribution CDH 5
• Run a MapReduce job
• Store data with Apache Hive and Apache HBase
• Index data in HDFS with Apache Solr
• Develop a Kafka messaging system
• Stream logs to HDFS with Apache Flume
• Transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop
• Create a Hive table over Apache Solr
• Develop a Mahout user recommender system

Who This Book Is For: Apache Hadoop developers. Prerequisite knowledge of Linux and some knowledge of Hadoop is required.
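
The "Run a MapReduce job" item above can be illustrated with a Hadoop Streaming word count. The sketch below is written in Python purely for illustration rather than in the Java a Hadoop tutorial would typically use; the script name and mode flag are our own.

```python
# Hadoop Streaming word count sketch. Save as wordcount.py and run it once
# as the mapper ("map" argument) and once as the reducer ("reduce" argument)
# via the hadoop-streaming jar. The mode argument is our own convention.
import sys

def mapper():
    # Emit (word, 1) pairs, one per line, tab-separated.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Input arrives sorted by key, so counts for a word are contiguous.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else "map"
    mapper() if mode == "map" else reducer()
```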

Hadoop Blueprints

"Hadoop Blueprints" guides you through using Hadoop and its ecosystem to solve real-life business problems. You will explore six case studies covering areas like fraud detection, marketing analysis, and data lakes, providing a thorough and practical understanding of Hadoop applications. What this Book will help me do Understand how to use Hadoop to solve real-life business scenarios effectively. Learn to build a 360-degree customer view integrating different data types. Develop and deploy a fraud detection system leveraging Hadoop technologies. Explore marketing campaign analysis and improvement using data-driven workflows on Hadoop. Gain hands-on experience with creating and maintaining efficient data lakes. Author(s) Sudheesh Narayan, along with his co-authors Anurag Shrivastava and Nod Deshpande, brings extensive experience in Big Data technologies. They have been involved in developing solutions utilizing Hadoop, Apache Spark, and other ecosystem components. Their practical approach to presenting complex technical topics ensures readers can apply their knowledge to real-world scenarios. Who is it for? This book is ideal for software developers, data engineers, and IT professionals who have a foundational understanding of Hadoop and seek to expand their practical skills. Readers should be familiar with Java or other scripting languages. It's perfect for those aiming to build actionable solutions for business problems using Big Data technologies.

Practical Data Analysis - Second Edition

Practical Data Analysis provides a hands-on guide to mastering essential data analysis techniques using tools like Pandas, MongoDB, and Apache Spark. With step-by-step instructions, you'll explore how to process diverse data types, apply machine learning methods, and uncover actionable insights that can drive innovative projects and business solutions.

What this book will help me do:
• Master data acquisition, formatting, and visualization techniques to prepare your data for analysis.
• Understand and apply machine learning algorithms for tasks like classification and forecasting.
• Learn to analyze textual data, such as performing sentiment analysis and text classification.
• Effectively work with databases using tools like MongoDB and handle big data with Apache Spark.
• Develop data-driven applications using real-world examples like image similarity searches and social network graph analysis.

Author(s): Hector Cuesta and Dr. Sampath Kumar are experienced data scientists and educators. They have considerable experience applying data analysis techniques in various domains and a passion for teaching these skills. Their practical approach to data analysis ensures an engaging learning experience for readers.

Who is it for? This book is ideal for developers and data enthusiasts aiming to incorporate practical data analysis into their projects. It is perfectly suited for readers with basic programming, statistics, and linear algebra knowledge. Even if you're new to professional data analysis, you'll find the step-by-step examples approachable. This book guides you in transforming raw data into valuable insights.
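
As a taste of the acquisition, formatting, and analysis workflow described above, here is a tiny pandas sketch. The CSV path and column names are assumptions, and the final plot requires matplotlib.

```python
# Tiny pandas sketch: acquire, clean, and summarise a dataset.
# The CSV path and column names are illustrative assumptions.
import pandas as pd

# Acquire: load raw data with a parsed date column.
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Format: drop incomplete rows and normalise a text column.
df = df.dropna(subset=["region", "revenue"])
df["region"] = df["region"].str.strip().str.title()

# Analyse: monthly revenue per region.
monthly = (df
           .groupby(["region", df["date"].dt.to_period("M")])["revenue"]
           .sum()
           .reset_index())
print(monthly.head())

# Visualise: one revenue line per region (requires matplotlib).
monthly.pivot(index="date", columns="region", values="revenue").plot()
```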

Practical Oracle E-Business Suite: An Implementation and Management Guide

Learn to build and implement a robust Oracle E-Business Suite system using the new release, EBS 12.2. This hands-on, real-world guide explains the rationale for using an Oracle E-Business Suite environment in a business enterprise and covers the major technology stack changes from EBS version 11i through R12.2. You will learn to build up an EBS environment from a simple single-node installation to a complex multi-node highly available setup. Practical Oracle E-Business Suite focuses on release R12.2, but key areas in R12.1 are also covered wherever necessary. Detailed instructions are provided for the installation of EBS R12.2 in single- and multi-node configurations, the logic and methodology used in EBS patching, and cloning of EBS single-node and complex multi-node environments configured with RAC. This book also provides information on FMW used in EBS 12.2, as well as performance tuning and EBS 12.2 on engineered system implementations.

• Understand Oracle EBS software and the underlying technology stack components
• Install/configure Oracle E-Business Suite R12.2 in simple and HA complex setups
• Manage Oracle EBS 12.2
• Use online patching (adop) for installation of Oracle EBS patches
• Clone an EBS environment in simple and complex configurations
• Perform and tune Oracle EBS in all layers (Application/DB/OS/NW)
• Secure E-Business Suite R12.2

Spark for Data Science

Explore how to leverage Apache Spark for efficient big data analytics and machine learning solutions in "Spark for Data Science". This detailed guide provides you with the skills to process massive datasets, perform data analytics, and build predictive models using Spark's powerful tools like RDDs, DataFrames, and Datasets.

What this book will help me do:
• Gain expertise in data processing and transformation with Spark.
• Perform advanced statistical analysis to uncover insights.
• Master machine learning techniques to create predictive models using Spark.
• Utilize Spark's APIs to process and visualize big data.
• Build scalable and efficient data science solutions.

Author(s): This book is co-authored by Singhal and Duvvuri, both accomplished data scientists with extensive experience in Apache Spark and big data technologies. They bring their practical industry expertise to explain complex topics in a straightforward manner. Their writing emphasizes real-world applications and step-by-step procedural guidance, making this a valuable resource for learners.

Who is it for? This book is ideally suited for technologists seeking to incorporate data science capabilities into their work with Apache Spark, data scientists interested in machine learning algorithms implemented in Spark, and beginners aiming to step into the field of big data analytics. Whether you are familiar with Spark or completely new to it, this book offers valuable insights and practical knowledge.
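
The predictive-modelling theme above can be sketched as a small Spark ML pipeline. The dataset path, feature columns, and label below are assumptions rather than examples from the book.

```python
# Minimal Spark ML pipeline sketch: assemble features, fit a classifier,
# and evaluate it. Dataset path, columns, and label are assumptions; the
# label column is assumed to hold 0/1 values.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("spark-ds-sketch").getOrCreate()

df = spark.read.parquet("customers.parquet")   # assumed dataset
train, test = df.randomSplit([0.8, 0.2], seed=42)

# Assemble numeric feature columns into a single vector column.
assembler = VectorAssembler(
    inputCols=["age", "tenure_months", "monthly_spend"],  # assumed features
    outputCol="features",
)
lr = LogisticRegression(featuresCol="features", labelCol="churned")

model = Pipeline(stages=[assembler, lr]).fit(train)
predictions = model.transform(test)

auc = BinaryClassificationEvaluator(labelCol="churned").evaluate(predictions)
print(f"Test AUC: {auc:.3f}")

spark.stop()
```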

Big Data SMACK: A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka

Learn how to integrate the full-stack open source big data architecture and to choose the correct technology (Scala/Spark, Mesos, Akka, Cassandra, and Kafka) in every layer. Big data architecture is becoming a requirement for many different enterprises. So far, however, the focus has largely been on collecting, aggregating, and crunching large data sets in a timely manner. In many cases now, organizations need more than one paradigm to perform efficient analyses. Big Data SMACK explains each of the full-stack technologies and, more importantly, how to best integrate them. It provides detailed coverage of the practical benefits of these technologies and incorporates real-world examples in every situation. This book focuses on the problems and scenarios solved by the architecture, as well as the solutions provided by every technology. It covers the six main concepts of big data architecture and how to integrate, replace, and reinforce every layer:

• The language: Scala
• The engine: Spark (SQL, MLlib, Streaming, GraphX)
• The container: Mesos, Docker
• The view: Akka
• The storage: Cassandra
• The message broker: Kafka

What You Will Learn:
• Build a big data architecture without using complex Greek-letter architectures
• Build a cheap but effective cluster infrastructure
• Make the queries, reports, and graphs that the business demands
• Manage and exploit unstructured and NoSQL data sources
• Use tools to monitor the performance of your architecture
• Integrate all the technologies and decide which ones to replace and which ones to reinforce

Who This Book Is For: Developers, data architects, and data scientists looking to integrate the most successful big data open stack architecture and to choose the correct technology in every layer.
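
The SMACK stack's examples are Scala-centric; purely to illustrate how two of its layers meet, the Python sketch below consumes events from the message broker (Kafka) and persists them to the storage layer (Cassandra), using the kafka-python and DataStax cassandra-driver clients. The broker address, keyspace, table, and message shape are all assumptions.

```python
# Sketch of broker-to-storage integration: read events from Kafka and write
# them to Cassandra. Topic, keyspace, table, and event fields are assumptions
# (the events table is assumed to have text id, bigint ts, text payload).
import json
from kafka import KafkaConsumer
from cassandra.cluster import Cluster

consumer = KafkaConsumer(
    "events",                            # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

session = Cluster(["127.0.0.1"]).connect("smack")   # assumed keyspace
insert = session.prepare(
    "INSERT INTO events (id, ts, payload) VALUES (?, ?, ?)"
)

for record in consumer:
    event = record.value
    # Write each event into Cassandra as it arrives from the broker.
    session.execute(insert, (event["id"], event["ts"], json.dumps(event)))
```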

Big Data Analytics

Dive into the world of big data with "Big Data Analytics: Real Time Analytics Using Apache Spark and Hadoop." This comprehensive guide introduces readers to the fundamentals and practical applications of Apache Spark and Hadoop, covering essential topics like Spark SQL, DataFrames, structured streaming, and more. Learn how to harness the power of real-time analytics and big data tools effectively.

What this book will help me do:
• Master the key components of the Apache Spark and Hadoop ecosystems, including Spark SQL and MapReduce.
• Gain an understanding of DataFrames, DataSets, and structured streaming for seamless data handling.
• Develop skills in real-time analytics using Spark Streaming and technologies like Kafka and HBase.
• Learn to implement machine learning models using Spark's MLlib and ML Pipelines.
• Explore graph analytics with GraphX and leverage data visualization tools like Jupyter and Zeppelin.

Author(s): Venkat Ankam, an expert in big data technologies, has years of experience working with Apache Hadoop and Spark. As an educator and technical consultant, Venkat has enabled numerous professionals to gain critical insights into big data ecosystems. With a pragmatic approach, his writings aim to guide readers through complex systems in a structured and easy-to-follow manner.

Who is it for? This book is perfect for data analysts, data scientists, software architects, and programmers aiming to expand their knowledge of big data analytics. Readers should ideally have a basic programming background in languages like Python, Scala, R, or SQL. Prior hands-on experience with big data environments is not necessary but is an added advantage. This guide is created to cater to a range of skill levels, from beginners to intermediate learners.
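
In the spirit of the real-time analytics coverage described above, here is a short Structured Streaming sketch that reads a Kafka topic and maintains running counts per key. It assumes the Spark Kafka connector package is available, and the broker, topic, and checkpoint path are illustrative assumptions.

```python
# Structured Streaming sketch: consume a Kafka topic and keep running counts
# per key. Requires the spark-sql-kafka connector package; broker, topic,
# and checkpoint path are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "clicks")
          .load())

# Kafka keys arrive as bytes; cast to string and count events per key.
counts = (events
          .select(F.col("key").cast("string").alias("page"))
          .groupBy("page")
          .count())

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .option("checkpointLocation", "/tmp/chk-clicks")
         .start())
query.awaitTermination()
```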

In this session, Beena Ammanath, Head of Data Science Products at General Electric, sat down with Vishal Kumar, CEO of AnalyticsWeek, and shared her journey as an analytics executive, life at GE, the future of analytics in the industrial sector, how Predix is helping other industrial companies cope with growing data, and some of the challenges and opportunities she's observing as an analytics executive.

Timeline: 0:29 Beena's journey. 5:19 Data science in the manufacturing sector. 7:03 Driving data science in the manufacturing sector. 9:39 Bringing a data culture into the manufacturing sector. 11:35 Upskilling and staying relevant as a data scientist. 13:27 Hacks for managing data teams well. 16:08 What's Predix? 19:06 Investment opportunities for data science in manufacturing. 21:07 Challenges manufacturing businesses face with data. 24:46 IoT and manufacturing. 25:18 Dealing with IoT vendors at Predix. 26:24 Ontology of data at Predix. 29:43 Dealing with new rules and laws in the IoT sector. 31:30 Interesting use cases in the manufacturing industry. 34:37 Open source vs. enterprise. 35:35 Getting recruited as a data scientist in manufacturing. 40:07 Pitching your product to a manufacturing company.

Podcast link: https://futureofdata.org/leadership-playbook-with-beena-ammanath-ge/

Here's Beena's Bio: Beena Ammanath is a Board Director at ChickTech and Head of Data Science Products at General Electric. She is a seasoned technology leader with over 24 years of experience and a proven track record of building and leading high-performance teams from the ground up, focused on strategy and the successful execution of industrial-scale technology products and services. She has worked at recognized international organizations such as British Telecom, E*TRADE, Thomson Reuters, and Bank of America, as well as Silicon Valley startups, in engineering and management positions.

She is also helping build the next generation of computer scientists through her role on the Industry Advisory Board for Cal Poly. She holds a Master's in Computer Science and an MBA in Finance. She has been a featured speaker on the topics of data science, big data, technology transformation, and women in leadership at numerous industry conferences.

Throughout her career in technology, Beena has been a strong advocate for women in positions of technology leadership and has established herself as a voice for resolving gender disparities.

Follow @beena_ammanath

The podcast is sponsored by TAO.ai (https://tao.ai), an Artificial Intelligence-driven career coach.

About #Podcast:

The FutureOfData podcast is a conversation starter that brings together leaders, influencers, and leading practitioners to discuss their journeys in creating a data-driven future.

Want to join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

The Analytic Hospitality Executive

Targeted analytics to address the unique opportunities in hospitality and gaming. The Analytic Hospitality Executive helps decision makers understand big data and how it can drive value in the industry. Written by a leading business analytics expert who specializes in hospitality and travel, this book draws a direct link between big data and hospitality, and shows you how to incorporate analytics into your strategic management initiative. You'll learn which data types are critical, how to identify productive data sources, and how to integrate analytics into multiple business processes to create an overall analytic culture that turns information into insight. The discussion includes the tools and tips that help make it happen, and points you toward the specific places in your business that could benefit from advanced analytics. The hospitality and gaming industry has unique needs and opportunities, and this book's targeted guidance provides a roadmap to big data benefits. Like most industries, the hospitality and gaming industry is experiencing a rapid increase in data volume, variety, and velocity. This book shows you how to corral this growing current and channel it into productive avenues that drive better business.

• Understand big data and analytics
• Incorporate analytics into existing business processes
• Identify the most valuable data sources
• Create a strategic analytic culture that drives value

Although the industry is just beginning to recognize the value of big data, it's important to get up to speed quickly or risk losing out on benefits that could drive business to greater heights. The Analytic Hospitality Executive provides a targeted game plan from an expert on the inside, so you can start making your data work for you.

Hadoop: Data Processing and Modelling

Unlock the power of your data with the Hadoop 2.X ecosystem and its data warehousing techniques across large data sets.

About This Book:
• Conquer the mountain of data using Hadoop 2.X tools
• The authors succeed in creating a context for Hadoop and its ecosystem
• Hands-on examples and recipes giving the bigger picture and helping you to master Hadoop 2.X data processing platforms
• Overcome challenging data processing problems using this exhaustive course on Hadoop 2.X

Who This Book Is For: This course is for Java developers who know scripting and want a career shift into the Hadoop and Big Data segment of the IT industry. Whether you are a novice or an expert in Hadoop, this book will take you to the most advanced level of Hadoop 2.X.

What You Will Learn:
• Best practices for setup and configuration of Hadoop clusters, tailoring the system to the problem at hand
• Integration with relational databases, using Hive for SQL queries and Sqoop for data transfer
• Installing and maintaining a Hadoop 2.X cluster and its ecosystem
• Advanced data analysis using Hive, Pig, and MapReduce programs
• Machine learning principles with libraries such as Mahout, and batch and stream data processing using Apache Spark
• Understand the changes involved in the move from Hadoop 1.0 to Hadoop 2.0
• Dive into YARN and Storm, and use YARN to integrate Storm with Hadoop
• Deploy Hadoop on Amazon Elastic MapReduce, discover HDFS replacements, and learn about HDFS Federation

In Detail: As Marc Andreessen has said, "Data is eating the world." Today, in the age of Big Data, businesses are producing data in huge volumes every day, and this rising tide of data needs to be organized and analyzed in a more secure way. With proper and effective use of Hadoop, you can build new, improved models, and based on that you will be able to make the right decisions. The first module, Hadoop Beginner's Guide, walks you through understanding Hadoop with very detailed instructions and shows you how to go about using it. Commands are explained using sections called "What just happened" for more clarity and understanding. The second module, Hadoop Real-World Solutions Cookbook, 2nd Edition, is an essential tutorial to effectively implement a big data warehouse in your business, where you get detailed practice with the latest technologies such as YARN and Spark. Big data has become a key basis of competition and the new waves of productivity growth. Hence, once you get familiar with the basics and implement the end-to-end big data use cases, you will start exploring the third module, Mastering Hadoop. So, if you need to broaden your Hadoop skill set to the next level after you nail the basics and the advanced concepts, this course is indispensable. When you finish this course, you will be able to tackle real-world scenarios and become a big data expert using the tools and knowledge gained from the various step-by-step tutorials and recipes.

Style and Approach: This course covers everything from the basic concepts of Hadoop to the advanced mechanisms you need to master to become a big data expert. The goal is to help you learn the essentials through step-by-step tutorials and then move on to recipes with various real-world solutions. It covers all the important aspects of Hadoop, from system design and configuration to machine learning principles with various libraries, with chapters illustrated with code fragments and schematic diagrams. This is a compendious course that explores Hadoop from the basics to the most advanced techniques available in Hadoop 2.X.

Practical Hive: A Guide to Hadoop's Data Warehouse System

Dive into the world of SQL on Hadoop and get the most out of your Hive data warehouses. This book is your go-to resource for using Hive: authors Scott Shaw, Ankur Gupta, David Kjerrumgaard, and Andreas Francois Vermeulen take you through learning HiveQL, the SQL-like language specific to Hive, to analyze, export, and massage the data stored across your Hadoop environment. From deploying Hive on your hardware or virtual machine and setting up its initial configuration to learning how Hive interacts with Hadoop, MapReduce, Tez, and other big data technologies, Practical Hive gives you a detailed treatment of the software. In addition, this book discusses the value of open source software, Hive performance tuning, and how to leverage semi-structured and unstructured data.

What You Will Learn:
• Install and configure Hive for new and existing datasets
• Perform DDL operations
• Execute efficient DML operations
• Use tables, partitions, buckets, and user-defined functions
• Discover performance tuning tips and Hive best practices

Who This Book Is For: Developers, companies, and professionals who deal with large amounts of data and could use software that can efficiently manage large volumes of input. It is assumed that readers have the ability to work with SQL.
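
The DDL and DML items above can be sketched with a few HiveQL statements; the snippet below issues them through Spark's Hive support purely for illustration, while the book itself works in HiveQL directly. The table name, columns, and partition column are assumptions.

```python
# Hive-style DDL/DML sketch executed through Spark's Hive support.
# Table name, columns, and the partition column are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-sketch")
         .enableHiveSupport()
         .getOrCreate())

# DDL: create a partitioned table, as in the partitioning/bucketing chapters.
spark.sql("""
    CREATE TABLE IF NOT EXISTS web_logs (
        user_id STRING,
        url     STRING,
        bytes   BIGINT
    )
    PARTITIONED BY (log_date STRING)
    STORED AS PARQUET
""")

# DML: load one partition, then query it, pruning by the partition column.
spark.sql("""
    INSERT INTO web_logs PARTITION (log_date = '2017-01-01')
    VALUES ('u1', '/home', 1024), ('u2', '/cart', 2048)
""")
spark.sql("""
    SELECT user_id, SUM(bytes) AS total_bytes
    FROM web_logs
    WHERE log_date = '2017-01-01'
    GROUP BY user_id
""").show()
```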

Big Data War

This book focuses on why data analytics fails in business. It provides an objective analysis of the phenomenon and its root causes, rather than abstract criticism of the utility of data analytics. The author then explains in detail how companies can survive and win the global big data competition, based on actual company cases. Having established an execution- and performance-oriented big data methodology over more than 10 years of experience in the field as an authority on big data strategy, the author identifies core principles of data analytics through case analyses of the failures and successes of actual companies. Moreover, he endeavors to share with readers the principles behind how innovative global companies became successful through the utilization of big data. This book is a quintessential guide to big data analytics, in which the author's know-how from direct and indirect experience is condensed. How do we survive in this big data war, in which Facebook in social networking, Amazon in e-commerce, and Google in search expand their platforms into other areas from their respective distinct markets? The answer can be found in this book.

In this session, David Rose, CEO of Ditto Labs, sat down with Vishal Kumar, CEO of AnalyticsWeek, and shared his journey as a data-driven executive, best practices, and some thought leadership on visualization and usability, as well as some of the challenges and opportunities he's observing while leading an analytics-driven startup.

Timeline: 0:29 David's journey. 4:50 Bringing technology to everyday objects. 9:37 Sensors and photosensors. 13:02 Choosing the right use cases. 16:54 On deep learning. 21:49 Working on new use cases in image processing. 26:05 Ditto Labs' allure classifiers. 28:15 Challenges as an entrepreneur in an image processing company. 32:50 Technical challenges Ditto faces. 36:58 Privacy and IoT. 40:17 Different countries, different legal norms on privacy. 42:55 Data culture at an image processing company. 44:46 Opportunities in the image processing stack.

Podcast Link: https://futureofdata.org/analyticsweek-leadership-podcast-with-david-rose-ditto-labs/

If you're interested in the vision catalog (as discussed in the video): http://www.slideshare.net/davidloring...

David's website: enchantedobjects.com

Here's David's Bio: David is the CEO at Ditto Labs, an image-recognition software platform that scours social media photos to find brands and products.

His new book, Enchanted Objects, focuses on the future of the internet of things and how these technologies will impact how we live and work.

Prior to Ditto, David founded and was CEO at Vitality, a company that reinvented medication packaging now distributed by CVS, Walgreens, and Express Scripts.

He founded Ambient Devices, which pioneered glanceable technology: embedding internet information in everyday objects like lamps, mirrors, and umbrellas.

David holds patents for photo sharing, interactive TV, ambient information displays, and medical devices. His work has been featured at the MoMA, covered in the New York Times, WIRED, and The Economist, and parodied on the Colbert Report.

David co-teaches a popular course in tangible user interfaces at the MIT Media Lab with Hiroshi Ishii. He is a frequent speaker to corporations and design and technology conferences.

He received his BA in Physics from St. Olaf College, studied Interactive Cinema at the MIT Media Lab, and earned a Master's degree at Harvard.

Follow @davidrose

The podcast is sponsored by TAO.ai (https://tao.ai), an Artificial Intelligence-driven career coach.

About #Podcast:

The FutureOfData podcast is a conversation starter that brings together leaders, influencers, and leading practitioners to discuss their journeys in creating a data-driven future.

Want to join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

IBM Data Engine for Hadoop and Spark

This IBM® Redbooks® publication provides topics to help the technical community take advantage of the resilience, scalability, and performance of the IBM Power Systems™ platform to implement or integrate an IBM Data Engine for Hadoop and Spark solution for analytics solutions to access, manage, and analyze data sets to improve business outcomes. This book documents topics to demonstrate and take advantage of the analytics strengths of the IBM POWER8® platform, the IBM analytics software portfolio, and selected third-party tools to help solve customer's data analytic workload requirements. This book describes how to plan, prepare, install, integrate, manage, and show how to use the IBM Data Engine for Hadoop and Spark solution to run analytic workloads on IBM POWER8. In addition, this publication delivers documentation to complement available IBM analytics solutions to help your data analytic needs. This publication strengthens the position of IBM analytics and big data solutions with a well-defined and documented deployment model within an IBM POWER8 virtualized environment so that customers have a planned foundation for security, scaling, capacity, resilience, and optimization for analytics workloads. This book is targeted at technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) that are responsible for delivering analytics solutions and support on IBM Power Systems.