talk-data.com

Topic: data-engineering · 3395 tagged activities

Activity Trend: 2020-Q1 to 2026-Q1 (quarterly)

Activities: 3395 activities · Newest first

Remanufacturing Modeling and Analysis

Providing a solid foundation of knowledge in modeling remanufacturing systems, this book addresses the design, planning, and processing issues faced by decision-makers in the field. With easy-to-use mathematical or simulation modeling to demonstrate solutions for each remanufacturing issue, it helps practitioners understand how a particular issue can be effectively modeled and how to choose the appropriate solution methodology. The book also discusses how increasingly stringent environmental regulations and decreasing natural resources influence manufacturers toward more environmentally conscious manufacturing and product recovery initiatives.

Oracle Database Problem Solving and Troubleshooting Handbook

An Expert Guide for Solving Complex Oracle Database Problems delivers comprehensive, practical, and up-to-date advice for running the Oracle Database reliably and efficiently in complex production environments. Seven leading Oracle experts have brought together an unmatched collection of proven solutions, hands-on examples, and step-by-step tips for Oracle Database 12c, 11g, and other recent versions of Oracle Database. Every solution is crafted to help experienced Oracle DBAs and DMAs understand and fix serious problems as rapidly as possible. The authors cover LOB segments, UNDO tablespaces, high GC buffer wait events, poor query response times, latch contention, indexing, XA distributed transactions, RMAN backup/recovery, and much more. They also offer in-depth coverage of a wide range of topics, including DDL optimization, VLDB tuning, database forensics, adaptive cursor sharing, data pumps, data migration, SSDs, indexes, and how to fix Oracle RAC problems. Learn how to:
- Choose the quickest path to solve high-impact problems
- Use modern best practices to make your day more efficient and predictable
- Construct your "Call 9-1-1 plan" for future database emergencies
- Proactively perform maintenance to improve your environment's stability
- Save time with industry-standard tools and scripts

Register your product at informit.com/register for convenient access to downloads, updates, and corrections as they become available.

Architecting Data Lakes

Many organizations use Hadoop-driven data lakes as an adjunct staging area for their enterprise data warehouses (EDW). But for those companies ready to take the plunge, a data lake is far more useful as a one-stop shop for extracting insights from their vast collection of data. With this ebook, you'll learn best practices for building, maintaining, and deriving value from a Hadoop data lake in production environments. Authors Alice LaPlante and Ben Sharma explain how a data lake will enable your organization to manage an increasing volume of datasets—from blog postings and product reviews to streaming data—and to discover important relationships between them. Whether you want to control administrative costs in healthcare or reduce risk in financial services, this ebook addresses the architectural considerations and required capabilities you need to build your own data lake. With this report, you'll learn:
- The key attributes of a data lake, including its ability to store information in native formats for later processing
- Why implementing data management and governance in your data lake is crucial
- How to address various challenges for building and managing a data lake
- Self-service options that enable different users to access the data lake without help from IT
- Emerging trends that will shape the future of data lakes
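
The "native formats now, processing later" attribute lends itself to a short illustration. Below is a minimal PySpark sketch of the raw-to-curated pattern; the lake paths, column names, and S3 layout are hypothetical, and this is a sketch of the general approach rather than code from the report.

```python
# Schema-on-read: raw data stays in its native JSON; structure is applied
# at query time, and a curated columnar copy is written for analytics.
# Paths and columns are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-demo").getOrCreate()

raw = spark.read.json("s3a://example-lake/raw/product_reviews/")
curated = (raw.select("review_id", "product_id", "rating", "review_text")
              .where(raw.rating.isNotNull()))
curated.write.mode("overwrite").parquet("s3a://example-lake/curated/product_reviews/")
```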

Mapping Workflows and Managing Knowledge

This book is Volume II of simple but powerful tools for performance improvement. It is written for managers, analysts, and consultants who realize the value that system dynamics modeling can bring to companies and organizations, and would like to have that capability without a degree in math or computer science. It features the iThink modeling program, which requires no extensive knowledge of math; instead, iThink uses a small set of symbols and rules to allow any keen observer of a system to create models graphically—the user literally draws a graphic of the system within the program and works from that. In Chapter 1, the author describes his own experiences with modeling, the growth and development of modeling software, and makes the case for its value. Chapter 2 is an overview of iThink symbols and rules, sufficient to enable the reader to interpret and understand iThink models; while the program has many advanced features, a great many models are based on the fundamentals in this chapter. Chapter 3 provides guidelines for converting workflow-mapping models into iThink dynamic models, and discusses approaches to building models from scratch. This approach to modeling is consistent with the author's approach to workflow mapping and analysis, which uses a small symbol set and related discipline to map workflows in any company or organization, without the need for expensive software or extended training. That process is described in this volume of the series, and these maps are often the foundation for modeling the system as a dynamic entity.
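
The stock-and-flow idea behind iThink can be shown without the tool. Below is a hypothetical analogue in plain Python: one stock (inventory) and two flows (production in, shipments out) with a simple feedback rule, integrated step by step. iThink expresses the same structure graphically; none of these numbers come from the book.

```python
# A toy system-dynamics run: a stock updated by flows over discrete steps.
dt = 0.25             # time step, in weeks
inventory = 100.0     # stock: units on hand
production = 30.0     # inflow: units per week
target = 120.0        # goal used by the feedback rule

for step in range(41):
    shipments = 35.0                          # outflow: units per week
    adjustment = 0.2 * (target - inventory)   # feedback: correct toward target
    inventory += (production + adjustment - shipments) * dt
    if step % 10 == 0:
        print(f"week {step * dt:5.2f}: inventory = {inventory:6.1f}")
```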

Relational Database Design and Implementation, 4th Edition

Relational Database Design and Implementation: Clearly Explained, Fourth Edition, provides the conceptual and practical information necessary to develop a database design and management scheme that ensures data accuracy and user satisfaction while optimizing performance. Database systems underlie the large majority of business information systems. Most of those in use today are based on the relational data model, a way of representing data and data relationships using only two-dimensional tables. This book covers relational database theory as well as a solid introduction to SQL, the international standard for the relational database data manipulation language. The book begins by reviewing basic concepts of databases and database design, then turns to creating, populating, and retrieving data using SQL. Topics such as the relational data model, normalization, data entities, and Codd's Rules (and why they are important) are covered clearly and concisely. In addition, the book looks at the impact of big data on relational databases and the option of using NoSQL databases for that purpose.
- Features updated and expanded coverage of SQL and new material on big data, cloud computing, and object-relational databases
- Presents design approaches that ensure data accuracy and consistency and help boost performance
- Includes three case studies, each illustrating a different database design challenge
- Reviews the basic concepts of databases and database design, then turns to creating, populating, and retrieving data using SQL
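
Since the book's working language is SQL, a tiny end-to-end example may help fix the create-populate-retrieve cycle it describes. This sketch uses Python's built-in sqlite3 module; the table and rows are invented for illustration.

```python
# Create, populate, and retrieve: the three SQL activities the book
# walks through, run against an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer (
                    customer_id INTEGER PRIMARY KEY,
                    name        TEXT NOT NULL,
                    city        TEXT)""")
conn.executemany("INSERT INTO customer (name, city) VALUES (?, ?)",
                 [("Ada", "London"), ("Grace", "Arlington")])
for name, city in conn.execute("SELECT name, city FROM customer ORDER BY name"):
    print(name, city)
```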

The Hadoop Performance Myth

The wish lists of many data-driven organizations seem reasonable enough. They'd like to capitalize on real-time data analysis, move beyond batch processing for time-critical insights, allow multiple users to share cluster resources, and provide predictable service levels. However, fundamental performance limitations of complex distributed systems such as Hadoop prevent much of this from happening. In this report, Courtney Webster examines the root cause of these performance problems and explains why best practices for mitigating them (cluster tuning, provisioning, and even cluster isolation for mission-critical jobs) don't provide viable, scalable, or long-term solutions. Organizations have been pushing Hadoop and other distributed systems to their performance breaking points as they seek to use clusters as shared resources across multiple business units and individual users. Once they hit this performance wall, companies will find it difficult to deliver on the big data promise at scale. Read this report to find out what the implications are for your organization.

Model-Based Testing Essentials - Guide to the ISTQB Certified Model-Based Tester

Provides a practical and comprehensive introduction to the key aspects of model-based testing as taught in the ISTQB® Model-Based Tester – Foundation Level Certification Syllabus. This book covers the essentials of Model-Based Testing (MBT) needed to pass the ISTQB® Foundation Level Model-Based Tester Certification. The text begins with an introduction to MBT, covering both the benefits and the limitations of MBT. The authors review the various approaches to model-based testing, explaining the fundamental processes in MBT, the different modeling languages used, common good modeling practices, and the typical mistakes and pitfalls. The book explains the specifics of MBT test implementation, the dependencies on modeling and test generation activities, and the steps required to automate the generated test cases. The text discusses the introduction of MBT in a company, presenting metrics to measure success and good practices to apply.
- Provides case studies illustrating different approaches to model-based testing
- Includes in-text exercises to encourage readers to practice modeling and test generation activities
- Contains appendices with solutions to the in-text exercises, a short quiz for self-testing, and additional information

Model-Based Testing Essentials – Guide to the ISTQB® Certified Model-Based Tester – Foundation Level is written primarily for participants in the ISTQB® certification: software engineers, test engineers, software developers, and anybody else involved in software quality assurance. It can also be used by anyone who wants a deeper understanding of software testing and of the use of models for test generation.
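
To make "test generation from a model" concrete, here is a toy sketch, not taken from the syllabus: the system under test is modeled as a finite state machine, and abstract test cases are derived by enumerating every transition (all-transitions coverage, one common MBT criterion). The login workflow is hypothetical.

```python
# Model: each state maps events to successor states.
model = {
    "logged_out": {"login_ok": "logged_in", "login_bad": "logged_out"},
    "logged_in":  {"view_profile": "logged_in", "logout": "logged_out"},
}

def generate_tests(model):
    """One abstract test per transition: reach state, fire event, check target."""
    return [(state, event, target)
            for state, transitions in model.items()
            for event, target in transitions.items()]

for state, event, expected in generate_tests(model):
    print(f"in {state!r}, apply {event!r}, expect {expected!r}")
```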

IBM System Storage DS8000 Performance Monitoring and Tuning

This IBM® Redbooks® publication provides guidance about how to configure, monitor, and manage your IBM DS8880 storage systems to achieve optimum performance, and it also covers the IBM DS8870 storage system. It describes the DS8880 performance features and characteristics, including hardware-related performance features, synergy items for certain operating systems, and other functions, such as IBM Easy Tier® and the DS8000® I/O Priority Manager. The book also describes specific performance considerations that apply to particular host environments, including database applications. This book also outlines the various tools that are available for monitoring and measuring I/O performance for different server environments, and it describes how to monitor the performance of the entire DS8000 storage system. This book is intended for individuals who want to maximize the performance of their DS8880 and DS8870 storage systems and investigate the planning and monitoring tools that are available. The IBM DS8880 storage system features, as described in this book, are available for the DS8880 model family with R8.0 release bundles (Licensed Machine Code (LMC) level 7.8.0).

IBM Reference Architecture for Genomics, Power Systems Edition

This IBM® Redbooks® publication introduces the IBM Reference Architecture for Genomics, IBM Power Systems™ edition, on IBM POWER8®. It addresses why you would implement Life Sciences workloads on IBM POWER8 and shows how to use such a solution to run Life Sciences workloads, with IBM Platform™ Computing software to help set up the workloads. It also provides technical content that introduces the IBM POWER8 clustered solution for Life Sciences workloads. This book customizes and tests Life Sciences workloads with a combination of an IBM Platform Computing software solution stack, OpenStack, and third-party applications. All of these applications use IBM POWER8, with IBM Spectrum Scale™ providing a high-performance file system. This book helps strengthen IBM Life Sciences solutions on IBM POWER8 with a well-defined and documented deployment model within an IBM Platform Computing and IBM POWER8 clustered environment, giving clients in need of a modular, cost-effective, and robust solution a planned foundation for future growth. The book highlights IBM POWER8 as a flexible infrastructure for clients looking to deploy Life Sciences workloads while reducing capital and operational expenditures and optimizing resources. It helps answer clients' workload challenges, in particular with Life Sciences applications, and provides expert-level documentation and how-to skills to worldwide teams that deliver Life Sciences solutions and support, giving them a broad understanding of the new architecture.

IT Modernization using Catalogic ECX Copy Data Management and IBM Spectrum Storage

Data is the currency of the new economy, and organizations are increasingly tasked with finding better ways to protect, recover, access, share, and use data. Traditional storage technologies are being stretched to the breaking point. This challenge is not because of storage hardware performance, but because management tools and techniques have not kept pace with new requirements. Primary data growth rates of 35% to 50% annually only amplify the problem. Organizations of all sizes find themselves needing to modernize their IT processes to enable critical new use cases such as storage self-service, Development and Operations (DevOps), and integration of data centers with the Cloud. They are equally challenged with improving management efficiencies for long established IT processes such as data protection, disaster recovery, reporting, and business analytics. Access to copies of data is the one common feature of all these use cases. However, the slow, manual processes common to IT organizations, including a heavy reliance on labor-intensive scripting and disparate tool sets, are no longer able to deliver the speed and agility required in today's fast-paced world. Copy Data Management (CDM) is an IT modernization technology that focuses on using existing data in a manner that is efficient, automated, scalable, and easy to use, delivering the data access that is urgently needed to meet the new use cases. Catalogic ECX, with IBM® storage, provides in-place copy data management that modernizes IT processes, enables key use cases, and does it all within existing infrastructure. This IBM Redbooks® publication shows how Catalogic Software and IBM have partnered together to create an integrated solution that addresses today's IT environment.

Oracle Database 12c Oracle RMAN Backup & Recovery

This authoritative Oracle Press resource on RMAN has been thoroughly revised to cover every new feature, offering the most up-to-date information. This fully updated volume lays out the easiest, fastest, and most effective methods of deploying RMAN in Oracle Database environments of any size. In keeping with previous editions, this book teaches computing professionals at all skill levels how to fully leverage every powerful RMAN tool and protect mission-critical data. Oracle Database 12c RMAN Backup and Recovery explains how to generate reliable archives and carry out successful system restores. You will learn to work from the command line or GUI, automate the database backup process, perform Oracle Flashback recoveries, and deploy third-party administration utilities. The book features full details on cloud computing, report generation, performance tuning, and security.
- Offers up-to-date coverage of Oracle Database 12c new features
- Examples and workshops throughout walk you through important RMAN operations
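
Automating backups, one of the tasks the book covers, often comes down to scripting the RMAN command-line client. The sketch below drives RMAN from Python; it assumes OS authentication (a local "/" connection), ORACLE_SID set in the environment, and rman on the PATH, and it is an illustration rather than the book's own scripts.

```python
# Write an RMAN command file and run it non-interactively.
import subprocess
import tempfile

RMAN_SCRIPT = """
RUN {
  BACKUP DATABASE PLUS ARCHIVELOG;
  DELETE NOPROMPT OBSOLETE;
}
"""

with tempfile.NamedTemporaryFile("w", suffix=".rman", delete=False) as f:
    f.write(RMAN_SCRIPT)
    cmdfile = f.name

# Equivalent to running: rman target / cmdfile=<file>
subprocess.run(["rman", "target", "/", f"cmdfile={cmdfile}"], check=True)
```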

Hadoop Real-World Solutions Cookbook - Second Edition

Master the full potential of big data processing using Hadoop with this comprehensive guide. Featuring over 90 practical recipes, this book helps you streamline data workflows and implement machine learning models with tools like Spark, Hive, and Pig. By the end, you'll confidently handle complex data problems and optimize big data solutions effectively.

What this book will help me do:
- Install and manage a Hadoop 2.x cluster efficiently to suit your data processing needs.
- Explore and utilize advanced tools like Hive, Pig, and Flume for seamless big data analysis.
- Master data import/export with Sqoop and workflow automation with Oozie.
- Implement machine learning and analytics tasks using Mahout and Apache Spark.
- Store and process data flexibly across formats like Parquet, ORC, RC, and more.

Author(s): Deshpande is an expert in big data processing and analytics with years of hands-on experience implementing Hadoop-based solutions for real-world problems. Known for a clear and pragmatic writing style, the author brings actionable wisdom and best practices to the forefront, helping readers excel at managing and utilizing big data systems.

Who is it for? Designed for technical enthusiasts and professionals, this book is ideal for those familiar with basic big data concepts. If you are looking to expand your expertise in Hadoop's ecosystem and implement data-driven solutions, this book will guide you through essential skills and advanced techniques to efficiently manage complex big data projects.
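
As a taste of the recipes' territory, the sketch below uses PySpark (2.x-style API) to query a dataset with Hive-style SQL and store the result in two of the columnar formats the book covers. Paths and column names are invented.

```python
# CSV in, SQL aggregation, Parquet and ORC out.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-demo").getOrCreate()

clicks = spark.read.option("header", True).csv("hdfs:///data/clicks.csv")
clicks.createOrReplaceTempView("clicks")

top_pages = spark.sql(
    "SELECT page, COUNT(*) AS hits FROM clicks GROUP BY page ORDER BY hits DESC")

top_pages.write.mode("overwrite").parquet("hdfs:///data/clicks_by_page.parquet")
top_pages.write.mode("overwrite").orc("hdfs:///data/clicks_by_page.orc")
```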

MongoDB in Action, Second Edition

GET MORE WITH MANNING: an eBook copy of the previous edition, MongoDB in Action (First Edition), is included at no additional cost. It will be automatically added to your Manning Bookshelf within 24 hours of purchase.

MongoDB in Action, Second Edition is a completely revised and updated version. It introduces MongoDB 3.0 and the document-oriented database model. This perfectly paced book gives you both the big picture you'll need as a developer and enough low-level detail to satisfy system engineers.

About the Technology: This document-oriented database was built for high availability, supports rich, dynamic schemas, and lets you easily distribute data across multiple servers. MongoDB 3.0 is flexible, scalable, and very fast, even with big data loads.

About the Book: Lots of examples will help you develop confidence in the crucial area of data modeling. You'll also love the deep explanations of each feature, including replication, auto-sharding, and deployment.

What's Inside:
- Indexes, queries, and standard DB operations
- Aggregation and text searching
- Map-reduce for custom aggregations and reporting
- Deploying for scale and high availability
- Updated for MongoDB 3.0

About the Reader: Written for developers. No previous MongoDB or NoSQL experience is assumed.

About the Authors: After working at MongoDB, Kyle Banker is now at a startup. Peter Bakkum is a developer with MongoDB expertise. Shaun Verch has worked on the core server team at MongoDB. A Genentech engineer, Doug Garrett is one of the winners of the MongoDB Innovation Award for Analytics. A software architect, Tim Hawkins has led search engineering at Yahoo Europe. Technical contributor: Wouter Thielen. Technical editor: Mihalis Tsoukalos.

Quotes:
"A thorough manual for learning, practicing, and implementing MongoDB." - Jeet Marwah, Acer Inc.
"A must-read to properly use MongoDB and model your data in the best possible way." - Hernan Garcia, Betterez Inc.
"Provides all the necessary details to get you jump-started with MongoDB." - Gregor Zurowski, Independent Software Development Consultant
"Awesome! MongoDB in a nutshell." - Hardy Ferentschik, Red Hat
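
The "What's Inside" items map onto a few lines of driver code. This PyMongo sketch shows an insert, an index, and an aggregation against a local server; the collection and documents are invented, and it is not an excerpt from the book.

```python
# Core MongoDB operations: insert documents, build an index, aggregate.
from pymongo import MongoClient, ASCENDING

db = MongoClient("mongodb://localhost:27017")["bookstore"]

db.orders.insert_many([
    {"sku": "widget", "qty": 3, "status": "shipped"},
    {"sku": "gadget", "qty": 1, "status": "pending"},
])
db.orders.create_index([("status", ASCENDING)])

pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$sku", "total": {"$sum": "$qty"}}},
]
for doc in db.orders.aggregate(pipeline):
    print(doc)
```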

Big Data, Open Data and Data Development

The world has become digital, and technological advances have multiplied the channels for accessing, processing, and disseminating data. New technologies have now reached a certain maturity. Data are available to everyone, anywhere on the planet. The number of Internet users in 2014 was 2.9 billion, or 41% of the world population. A need for knowledge is becoming apparent in order to make sense of this multitude of data. We must educate, inform, and train the masses. The development of related technologies, such as the advent of the Internet, social networks, and cloud computing (digital factories), has increased the available volumes of data. Currently, each individual creates, consumes, and uses digital information: more than 3.4 million e-mails are sent worldwide every second, or 107,000 billion annually, about 14,600 e-mails per year per person, of which more than 70% are spam. Billions of pieces of content are shared on social networks such as Facebook, more than 2.46 million every minute. We spend more than 4.8 hours a day on the Internet using a computer, and 2.1 hours using a mobile device. Data, this new ethereal manna from heaven, is produced in real time and arrives in a continuous stream from a multitude of generally heterogeneous sources. This accumulation of data of all types (audio, video, files, photos, etc.) generates new activities whose aim is to analyze this enormous mass of information. It then becomes necessary to adapt and to try new approaches, new methods, new knowledge, and new ways of working, which gives rise to new properties and new challenges, since SEO logic must be created and implemented. At company level, this mass of data is difficult to manage. Its interpretation is, above all, a challenge. This affects those whose job is to "manipulate" the mass, and it requires a specific infrastructure for creation, storage, processing, analysis, and recovery. The biggest challenge lies in "valuing the data" that is available in such quantity, diversity, and speed of access.
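
The e-mail figures hold up to a quick arithmetic check, assuming (our assumption, not the book's) a 2014 world population of roughly 7.3 billion:

```python
# Sanity-check the blurb's e-mail arithmetic.
per_second = 3.4e6
per_year = per_second * 60 * 60 * 24 * 365     # ~1.07e14, i.e. ~107,000 billion
per_person = per_year / 7.3e9                  # ~14,700, close to the 14,600 cited
print(f"{per_year:.2e} e-mails/year, {per_person:,.0f} per person")
```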

World-Class Warehousing and Material Handling, 2nd Edition

The classic guide to warehouse operations—now fully revised and updated with the latest strategies, best practices, and case studies. Under the influence of e-commerce, supply chain collaboration, globalization, and quick response, warehouses today are being asked to do more with less. The expectation now is that warehouses execute an increase in smaller transactions, handle and store more items, provide more product and service customization, process more returns, offer more value-added services, and receive and ship more international orders. Compounding the difficulty of meeting this increased demand is the fact that warehouses now have less time to process an order, less margin for error, and fewer skilled personnel. How can a warehouse not only stay afloat but thrive in today's marketplace? Efficiency and accuracy are the keys to success in warehousing. Despite today's just-in-time production mentality and efforts to eliminate warehouses and their inventory carrying costs, effective warehousing continues to play a critical bottom-line role for companies worldwide. World-Class Warehousing and Material Handling, 2nd Edition is the first widely published methodology for warehouse problem solving across all areas of the supply chain, providing an organized set of principles that can be used to streamline all types of warehousing operations. Readers will discover state-of-the-art tools, metrics, and methodologies for dramatically increasing the effectiveness, accuracy, and overall productivity of warehousing operations. This comprehensive resource provides authoritative answers on such topics as:
· The seven principles of world-class warehousing
· Warehouse activity profiling
· Warehouse performance measures
· Warehouse automation and computerization
· Receiving, storage, and retrieval operations
· Picking and packing, and humanizing warehouse operations

Written by one of today's recognized logistics thought leaders, this fully updated comprehensive resource presents timeless insights for planning and managing 21st-century warehouse operations.

About the Author: Dr. Ed Frazelle is President and CEO of Logistics Resources International and Executive Director of The RightChain Institute. He is also the founding director of The Logistics Institute at Georgia Tech, the world's largest center for supply chain research and professional education.

Spark

Production-targeted Spark guidance with real-world use cases. Spark: Big Data Cluster Computing in Production goes beyond general Spark overviews to provide targeted guidance toward using lightning-fast big-data clustering in production. Written by an expert team well-known in the big data community, this book walks you through the challenges in moving from proof-of-concept or demo Spark applications to live Spark in production. Real use cases provide deep insight into common problems, limitations, challenges, and opportunities, while expert tips and tricks help you get the most out of Spark performance. Coverage includes Spark SQL, Tachyon, Kerberos, MLlib, YARN, and Mesos, with clear, actionable guidance on resource scheduling, database connectors, streaming, security, and much more. Spark has become the tool of choice for many big data problems, with more active contributors than any other Apache Software Foundation project. General introductory books abound, but this book is the first to provide deep insight and real-world advice on using Spark in production. Specific guidance, expert tips, and invaluable foresight make this guide an incredibly useful resource for real production settings.
- Review Spark hardware requirements and estimate cluster size
- Gain insight from real-world production use cases
- Tighten security, schedule resources, and fine-tune performance
- Overcome common problems encountered using Spark in production

Spark works with other big data tools, including MapReduce and Hadoop, and uses languages you already know, like Java, Scala, Python, and R. Lightning speed makes Spark too good to pass up, but understanding its limitations and challenges in advance goes a long way toward easing actual production implementation. Spark: Big Data Cluster Computing in Production tells you everything you need to know, with real-world production insight and expert guidance, tips, and tricks.
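
Much of the production tuning the book describes starts with explicit resource settings. The PySpark sketch below shows the kind of configuration involved; the values are illustrative assumptions, not the book's recommendations, and the right numbers depend on your cluster.

```python
# Explicit resource configuration for a production Spark job.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("prod-etl")
         .config("spark.executor.memory", "8g")          # per-executor heap
         .config("spark.executor.cores", "4")            # tasks per executor
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.sql.shuffle.partitions", "400")  # shuffle parallelism
         .getOrCreate())
```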

Getting Started with RethinkDB

Dive into the world of NoSQL databases with RethinkDB, a modern and powerful document-oriented database designed for developing real-time applications. Through this book, you'll explore essential RethinkDB features and learn how to integrate it seamlessly with Node.js, enabling you to build and deploy responsive web apps.

What this book will help me do:
- Master the basics of installing and configuring RethinkDB on your system.
- Learn how to use the intuitive ReQL language to perform complex queries.
- Set up and manage RethinkDB clusters by mastering sharding and replication.
- Optimize database performance using indexing and advanced query techniques.
- Develop interactive real-time applications by integrating RethinkDB with Node.js.

Author(s): Tiepolo is an experienced developer and educator specializing in real-time database technologies. With extensive expertise in NoSQL solutions and hands-on experience in software engineering, Tiepolo combines a teacher's clarity with a programmer's practicality to make complex topics accessible.

Who is it for? This book is tailored for developers eager to grasp RethinkDB, particularly those with an interest in building real-time applications. If you are new to database programming, you'll find accessible guidance here. Developers with basic experience in JavaScript or Node.js will gain further insights into real-world applications of these skills.
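
Although the book pairs RethinkDB with Node.js, ReQL reads much the same in any driver. Here is a minimal sketch using the Python driver (2.x-era API, where the module itself is the entry point); the database and table names are invented.

```python
# Insert a document, then listen on a changefeed: the primitive behind
# RethinkDB's real-time applications.
import rethinkdb as r

conn = r.connect("localhost", 28015)
r.db_create("app").run(conn)
r.db("app").table_create("messages").run(conn)

r.db("app").table("messages").insert({"user": "ada", "text": "hello"}).run(conn)

# Blocks and prints each subsequent insert/update as it happens.
for change in r.db("app").table("messages").changes().run(conn):
    print(change)
```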

Finding Profit in Your Organization's Data

Using log data to create value isn’t new to mechanized industries. But in today’s data-driven environment—particularly with the rise of the Internet of Things—this type of data exhaust can be converted from inactive, latent assets to critical-path components of an overall production ecosystem. In this report, Cameron Turner provides three real-world case studies in which his company, The Data Guild, served as a product co-development consultancy. You’ll learn how an energy efficiency firm, a tech company, and a healthcare organization combined their historical logs with newly generated sensor data from the IoT. By leveraging machine learning to proactively identify efficiency and opportunity through prediction and recommendation, each company was able to deploy an ROI-generating solution and gain a significant business advantage. This report also provides advice for successfully implementing IoT data, as well as key factors to consider when performing data analysis.

Hadoop: What You Need to Know

Hadoop has revolutionized data processing and enterprise data warehousing, but its explosive growth has come with a large amount of uncertainty, hype, and confusion. With this report, enterprise decision makers will receive a concise crash course on what Hadoop is and why it’s important. Hadoop represents a major shift from traditional enterprise data warehousing and data analytics, and its technology can be daunting at first. Donald Miner, founder of the data science firm Miner & Kasch, covers just enough ground so you can make intelligent decisions about Hadoop in your enterprise. By the end of this report, you’ll know the basics of technologies such as HDFS, MapReduce, and YARN, without becoming mired in the details. Not only will you learn the basics of how Hadoop works and why it’s such an important technology, you’ll get examples of how you should probably be using it.
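
For a feel of the MapReduce model the report introduces, here is the classic word count written for Hadoop Streaming, with mapper and reducer in one Python file for brevity. It is a generic sketch, not code from the report; the invocation in the comment assumes a standard streaming jar.

```python
# Run as: hadoop jar hadoop-streaming.jar -input /in -output /out \
#           -mapper "python wordcount.py map" \
#           -reducer "python wordcount.py reduce" -files wordcount.py
import sys

def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")                # emit (word, 1) pairs

def reducer():                                 # input arrives sorted by key
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```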

Self-Service Analytics

Organizations today are swimming in data, but most of them manage to analyze only a fraction of what they collect. To help build a stronger data-driven culture, many organizations are adopting a new approach called self-service analytics. This O'Reilly report examines how this approach provides data access to more people across a company, allowing business users to work with data themselves and create their own customized analyses. The result? More eyes looking at more data in more ways. Along with the perceived benefits, author Sandra Swanson also delves into the potential pitfalls of self-service analytics: balancing greater data access with concerns about security, data governance, and siloed data stores. Read this report and gain insights from enterprise tech (Yahoo), government (the City of Chicago), disruptive retail (Warby Parker), and data integration (Talend). Learn how these organizations are handling self-service analytics in practice. Sandra Swanson is a Chicago-based writer who's covered technology, science, and business for dozens of publications, including ScientificAmerican.com. Connect with her on Twitter (@saswanson) or at www.saswanson.com.