Data Management

Introduction to the New Mainframe: IBM z/VSE Basics

2016-03-02 · O'Reilly Data Engineering Books

book

by Martin Walbruehl , Wolfgang Bosch , Ingolf Salm , Wayne O'Brien , Jerry Johnston , Bill Ogden , Helmut Hellner , Hans Joachim Ebert , Marco Kroll , Wilhelm Mild , Mike Ebbers , Joerg Schmidbauer

IBM data data-engineering

This IBM® Redbooks® publication is based on the book Introduction to the New Mainframe: z/OS Basics, SG24-6366, which was produced by the International Technical Support Organization (ITSO), Poughkeepsie Center. It provides students of information systems technology with the background knowledge and skills necessary to begin using the basic facilities of a mainframe computer. For optimal learning, students are assumed to have successfully completed an introductory course in computer system concepts, such as computer organization and architecture, operating systems, data management, or data communications. They should also have successfully completed courses in one or more programming languages, and be PC literate. This textbook can also be used as a prerequisite for courses in advanced topics, or for internships and special studies. It is not intended to be a complete text covering all aspects of mainframe operation. It is also not a reference book that discusses every feature and option of the mainframe facilities. Others who can benefit from this course include experienced data processing professionals who have worked with non-mainframe platforms, or who are familiar with some aspects of the mainframe but want to become knowledgeable with other facilities and benefits of the mainframe environment. As we go through this course, we suggest that the instructor alternate between text, lecture, discussions, and hands-on exercises. Many of the exercises are cumulative, and are designed to show the student how to design and implement the topic presented. The instructor-led discussions and hands-on exercises are an integral part of the course, and can include topics not covered in this textbook. In this course, we use simplified examples and focus mainly on basic system functions. Hands-on exercises are provided throughout the course to help students explore the mainframe style of computing. 
At the end of this course, you will be familiar with the following information:
• Basic concepts of the mainframe, including its usage and architecture
• Fundamentals of IBM z/VSE® (VSE), an IBM z™ Systems entry mainframe operating system (OS)
• An understanding of mainframe workloads and the major middleware applications in use on mainframes today
• The basis for subsequent course work in more advanced, specialized areas of z/VSE, such as system administration or application programming

MongoDB Cookbook - Second Edition

2016-01-13 · O'Reilly Data Engineering Books

book

by Amol Nayak , Cyrus Dasadia

Cloud Computing Hadoop Java MongoDB NoSQL Python data data-engineering nosql-databases

Designed to help developers and administrators harness the full potential of MongoDB, this book provides clear instruction and practical guidance no matter your level. By exploring both fundamental aspects like installation and configuration, and advanced topics like using cloud services, this book serves as a comprehensive reference for anyone navigating the modern NoSQL database capabilities of MongoDB.
What this book will help me do:
• Understand how to install and configure MongoDB for different environments, enabling efficient setup and operation.
• Master database administration skills, including monitoring and backup strategies, which are essential for stability and performance.
• Develop applications with MongoDB using Java and Python, allowing integration into modern tech stacks.
• Leverage advanced querying and indexing techniques, improving data retrieval and operational efficiency.
• Integrate MongoDB with cloud platforms and tools like Hadoop, enhancing scalability and expanding use cases.
Author(s): Cyrus Dasadia and Amol Nayak are seasoned database professionals with extensive experience in MongoDB and NoSQL database systems. Their practical approach to technical writing focuses on real-world applications and providing solutions to complex challenges. With backgrounds in software development and data management, they ensure that readers have a hands-on learning experience. Their passion for spreading knowledge makes this book both instructional and engaging.
Who is it for? This book is ideal for database administrators and software developers interested in adopting or expanding their knowledge of MongoDB. Whether you're a complete novice or an experienced user seeking hands-on solutions and examples, this book offers value. It's particularly suited for professionals working with Java or Python, as the examples focus on these programming languages.
Whether you're enhancing your skills for personal projects or looking to implement MongoDB at work, this resource equips you with the know-how.

Data Lake Development with Big Data

2015-11-26 · O'Reilly Data Engineering Books

book

by Pradeep Pasupuleti , Beulah Salome Purra

Big Data Data Governance Data Lake Master Data Management Cyber Security data data-engineering data-lake storage-repositories

In "Data Lake Development with Big Data," you will explore the fundamental principles and techniques for constructing and managing a Data Lake tailored to your organization's big data challenges. This book provides practical advice and architectural strategies for ingesting, managing, and analyzing large-scale data efficiently and effectively.
What this book will help me do:
• Learn how to architect a Data Lake from scratch, tailored to your organizational needs.
• Master techniques for ingesting data efficiently using real-time and batch processing frameworks.
• Understand the data governance, quality, and security considerations essential for scalable Data Lakes.
• Discover strategies for enabling users to explore data within the Data Lake effectively.
• Gain insights into integrating Data Lakes with Big Data analytic applications for high performance.
Author(s): Pradeep Pasupuleti and Beulah Salome Purra bring their extensive expertise in big data and enterprise data management to this book. With years of hands-on experience designing and managing large-scale data architectures, their insights are rooted in practical knowledge and proven techniques.
Who is it for? This book is ideal for data architects and senior managers tasked with adapting or creating scalable data solutions in enterprise contexts. Readers should have foundational knowledge of master data management and be familiar with Big Data technologies to derive maximum value from the content presented.

Introducing and Implementing IBM FlashSystem V9000

2015-11-02 · O'Reilly Data Engineering Books

book

by Karen Orlando , Arne Lehfeldt , Christophe Fagiano , Jon Herd , Detlef Helmbrecht , Carsten Larsen , Corne Lottering , Jeffrey Irving , Alexander (Al) Watson , Brett Kerns

IBM Microsoft Cyber Security VMware data data-engineering

Storage capacity and performance requirements are growing faster than ever before, and the costs of managing this growth are depleting more of the information technology (IT) budget. The IBM® FlashSystem™ V9000 is the premier, fully integrated, Tier 1, all-flash offering from IBM. It has changed the economics of today's data center by eliminating storage bottlenecks. Its software-defined storage features simplify data management, improve data security, and preserve your investments in storage. IBM FlashSystem® V9000 includes IBM FlashCore™ technology and advanced software-defined storage available in one solution in a compact 6U form factor. FlashSystem V9000 improves business application availability. It delivers greater resource utilization so you can get the most from your storage resources, and achieve a simpler, more scalable, and cost-efficient IT Infrastructure. This IBM Redbooks® publication provides information about IBM FlashSystem V9000 Software V7.5 and its new functionality. It describes the product architecture, software, hardware, and implementation, and provides hints and tips. It illustrates use cases and independent software vendor (ISV) scenarios that demonstrate real-world solutions, and also provides examples of the benefits gained by integrating the FlashSystem storage into business environments. Using IBM FlashSystem V9000 software version 7.5 functions, management tools, and interoperability combines the performance of FlashSystem architecture with the advanced functions of software-defined storage to deliver performance, efficiency, and functions that meet the needs of enterprise workloads that demand IBM MicroLatency® response time. This book offers FlashSystem V9000 scalability concepts and guidelines for planning, installing, and configuring, which can help environments scale up and out to add more flash capacity and expand virtualized systems. 
Port utilization methodologies are provided to help you maximize the full potential of IBM FlashSystem V9000 performance and low latency in your scalable environment. In addition, all of the functions that FlashSystem V9000 software version 7.5 brings are explained, including IBM HyperSwap® capability, increased IBM FlashCopy® bitmap space, Microsoft Windows offloaded data transfer (ODX), and direct 16 gigabits per second (Gbps) Fibre Channel host attach support. This book also describes support for VMware 6, which enhances and improves scalability in a VMware environment. This book is intended for pre-sales and post-sales technical support professionals, storage administrators, and anyone who wants to understand how to implement this exciting technology.

Advanced Data Management

2015-10-29 · O'Reilly Data Engineering Books

book

by Lena Wiese

Big Data Cloud Computing Computer Science Data Modelling JSON XML data data-engineering

Advanced data management has always been at the core of efficient database and information systems. Recent trends like big data and cloud computing have aggravated the need for sophisticated and flexible data storage and processing solutions. This book provides comprehensive coverage of the principles of data management developed in recent decades, with a focus on data structures and query languages. It treats a wealth of different data models and surveys the foundations of structuring, processing, storing, and querying data according to these models. Starting off with the topic of database design, it discusses weaknesses of the relational data model, and then proceeds to convey the basics of graph data, tree-structured XML data, key-value pairs and nested, semi-structured JSON data, columnar and record-oriented data, as well as object-oriented data. The final chapters round the book off with an analysis of fragmentation, replication, and consistency strategies for data management in distributed databases, as well as recommendations for handling polyglot persistence in multi-model databases and multi-database architectures. While primarily geared towards students of Master-level courses in Computer Science and related areas, this book may also benefit practitioners looking for a reference book on data modeling and query processing. It provides both theoretical depth and a concise treatment of open source technologies currently on the market.

IBM Software for SAP Solutions

2015-09-29 · O'Reilly Data Engineering Books

book

by Khirallah Birkler , Navneet Goyal , Peter Bahrs , Nick Norris , Michel Laaroussi , Michael Love , Bernd Eberhardt , Jörg Stolzenberg , Andrew Stalnecker , Derek Jennings , Stefan Momma , Manfred Oevers , Yaro Dunchych , Joe Kaczmarek , Martin Oberhofer , James Hunter , Paul Pacholski , Pierre Valiquette

Analytics BI DevOps IBM Master Data Management SAP Cyber Security data data-engineering

SAP is a market leader in enterprise business application software. SAP solutions provide a rich set of composable application modules, and configurable functional capabilities that are expected from a comprehensive enterprise business application software suite. In most cases, companies that adopt SAP software remain heterogeneous enterprises running both SAP and non-SAP systems to support their business processes. Regardless of the specific scenario, in heterogeneous enterprises most SAP implementations must be integrated with a variety of non-SAP enterprise systems:
• Portals
• Messaging infrastructure
• Business process management (BPM) tools
• Enterprise Content Management (ECM) methods and tools
• Business analytics (BA) and business intelligence (BI) technologies
• Security
• Systems of record
• Systems of engagement
The tooling included with SAP software addresses many needs for creating SAP-centric environments. However, the classic approach to implementing SAP functionality generally leaves the business with a rigid solution that is difficult and expensive to change and enhance. When SAP software is used in a large, heterogeneous enterprise environment, SAP clients face the dilemma of selecting the correct set of tools and platforms to implement SAP functionality, and to integrate the SAP solutions with non-SAP systems. This IBM® Redbooks® publication explains the value of integrating IBM software with SAP solutions. It describes how to enhance and extend pre-built capabilities in SAP software with best-in-class IBM enterprise software, enabling clients to maximize the return on their SAP investment and achieve a balanced enterprise architecture approach. This book describes IBM Reference Architecture for SAP, a prescriptive blueprint for using IBM software in SAP solutions. The reference architecture is focused on defining the use of IBM software with SAP, and is not intended to address the internal aspects of SAP components.
The chapters of this book provide a specific reference architecture for many of the architectural domains that are each important for a large enterprise to establish common strategy, efficiency, and balance. Most of the important architectural domain topics, such as integration, process optimization, master data management, mobile access, Enterprise Content Management, business intelligence, DevOps, security, and systems monitoring, are covered in the book. However, several other architectural domains are not included. This is not to imply that these other architectural domains are not important or are less important, or that IBM does not offer a solution to address them. It only reflects time constraints, available resources, and the complexity of assembling a book on an extremely broad topic. Although more content could have been added, the authors feel confident that the scope of architectural material included should provide organizations with a fantastic head start in defining their own enterprise reference architecture for many of the important architectural domains, and it is hoped that this book provides great value to those reading it. This IBM Redbooks publication is targeted to the following audiences:
• Client decision makers and solution architects leading enterprise transformation projects and wanting to gain further insight so that they can benefit from the integration of IBM software in large-scale SAP projects.
• IT architects and consultants integrating IBM technology with SAP solutions.

Managing Ever-Increasing Amounts of Data with IBM DB2 for z/OS: Using Temporal Data Management, Archive Transparency, and the DB2 Analytics Accelerator

2015-09-25 · O'Reilly Data Engineering Books

book

by Craig McKellar , Mehmet Cuneyt Goksu , Xiao Hui Wang , Claire McFeely

Analytics IBM data data-engineering ibm-db2 relational-databases

IBM® DB2® Version 11.1 for z/OS® (DB2 11 for z/OS or just DB2 11 throughout this book) is the fifteenth release of DB2 for IBM MVS™. The DB2 11 environment is available either for new installations of DB2 or for migrations from DB2 10 for z/OS subsystems only. This IBM Redbooks® publication describes enhancements that are available with DB2 11 for z/OS. The contents help database administrators to understand the new extensions and performance enhancements, to plan for ways to use the key new capabilities, and to justify the investment in installing or migrating to DB2 11. Businesses are faced with a global and increasingly competitive business environment, and they need to collect and analyze ever-increasing amounts of data (Figure 1). Governments also need to collect and analyze large amounts of data. The main focus of this book is to introduce recent DB2 capabilities that can help organizations store and analyze exploding amounts of business or organizational data, while managing risk and trying to meet new regulatory and compliance requirements. This book describes recent extensions to DB2 for z/OS in V10 and V11 that can help organizations address these challenges.

Managing the Data Lake

2015-09-15 · O'Reilly Data Engineering Books

book

by Andy Oram

Big Data Data Lake Hadoop Linux RDBMS Cyber Security data data-engineering data-lake storage-repositories

Organizations across many industries have recently created fast-growing repositories to deal with an influx of new data from many sources and often in multiple formats. To manage these data lakes, companies have begun to leave the familiar confines of relational databases and data warehouses for Hadoop and various big data solutions. But adopting new technology alone won’t solve the problem. Based on interviews with several experts in data management, author Andy Oram provides an in-depth look at common issues you’re likely to encounter as you consider how to manage business data. You’ll explore five key topic areas:
• Acquisition and ingestion: how to solve these problems with a degree of automation.
• Metadata: how to keep track of when data came in and how it was formatted, and how to make it available at later stages of processing.
• Data preparation and cleaning: what you need to know before you prepare and clean your data, and what needs to be cleaned up and how.
• Organizing workflows: what you should do to combine your tasks (ingestion, cataloging, and data preparation) into an end-to-end workflow.
• Access control: how to address security and access controls at all stages of data handling.
Andy Oram, an editor at O’Reilly Media since 1992, currently specializes in programming. His work for O'Reilly includes the first books on Linux ever published commercially in the United States.

SAS Essentials: Mastering SAS for Data Analytics, 2nd Edition

2015-08-17 · O'Reilly Data Science Books

book

by Alan C. Elliott , Wayne A. Woodward

Analytics Data Analytics SAS analytics-platforms data data-science

A step-by-step introduction to using SAS statistical software as a foundational approach to data analysis and interpretation.
Presenting a straightforward introduction from the ground up, SAS Essentials: Mastering SAS for Data Analytics, Second Edition illustrates SAS using hands-on learning techniques and numerous real-world examples. Keeping different experience levels in mind, the highly qualified author team has developed the book over 20 years of teaching introductory SAS courses. Divided into two sections, the first part of the book provides an introduction to data manipulation, statistical techniques, and the SAS programming language. The second section is designed to introduce users to statistical analysis using SAS Procedures. Featuring self-contained chapters to enhance the learning process, the Second Edition also includes:
• Programming approaches for the most up-to-date version of the SAS platform, including information on how to use the SAS University Edition
• Discussions to illustrate the concepts and highlight key fundamental computational skills that are utilized by business, government, and organizations alike
• New chapters on reporting results in tables and factor analysis
• Additional information on the DATA step for data management, with an emphasis on importing data from other sources, combining data sets, and data cleaning
• Updated ANOVA and regression examples, as well as other data analysis techniques
• A companion website with the discussed data sets, additional code, and related PowerPoint slides
SAS Essentials: Mastering SAS for Data Analytics, Second Edition is an ideal textbook for upper-undergraduate and graduate-level courses in statistics, data analytics, applied SAS programming, and statistical computer applications, as well as an excellent supplement for statistical methodology courses.
The book is an appropriate reference for researchers and academicians who require a basic introduction to SAS for statistical analysis and for preparation for the Basic SAS Certification Exam.

Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in a Virtualized Architecture

2015-07-20 · O'Reilly Data Engineering Books

book

by George J. Trujillo Jr. , Charles Kim , Rommel Garcia , Justin Murray , Steven Jones

Big Data Cloud Computing Hadoop HDFS Linux Cyber Security SQL data data-engineering

Plan and Implement Hadoop Virtualization for Maximum Performance, Scalability, and Business Agility Enterprises running Hadoop must absorb rapid changes in big data ecosystems, frameworks, products, and workloads. Virtualized approaches can offer important advantages in speed, flexibility, and elasticity. Now, a world-class team of enterprise virtualization and big data experts guide you through the choices, considerations, and tradeoffs surrounding Hadoop virtualization. The authors help you decide whether to virtualize Hadoop, deploy Hadoop in the cloud, or integrate conventional and virtualized approaches in a blended solution. First, Virtualizing Hadoop reviews big data and Hadoop from the standpoint of the virtualization specialist. The authors demystify MapReduce, YARN, and HDFS and guide you through each stage of Hadoop data management. Next, they turn the tables, introducing big data experts to modern virtualization concepts and best practices. Finally, they bring Hadoop and virtualization together, guiding you through the decisions you’ll face in planning, deploying, provisioning, and managing virtualized Hadoop. From security to multitenancy to day-to-day management, you’ll find reliable answers for choosing your best Hadoop strategy and executing it. 
Coverage includes the following:
• Reviewing the frameworks, products, distributions, use cases, and roles associated with Hadoop
• Understanding YARN resource management, HDFS storage, and I/O
• Designing data ingestion, movement, and organization for modern enterprise data platforms
• Defining SQL engine strategies to meet strict SLAs
• Considering security, data isolation, and scheduling for multitenant environments
• Deploying Hadoop as a service in the cloud
• Reviewing the essential concepts, capabilities, and terminology of virtualization
• Applying current best practices, guidelines, and key metrics for Hadoop virtualization
• Managing multiple Hadoop frameworks and products as one unified system
• Virtualizing master and worker nodes to maximize availability and performance
• Installing and configuring Linux for a Hadoop environment

Hadoop Application Architectures

2015-07-15 · O'Reilly Data Engineering Books

book

by Mark Grover (Stemma) , Ted Malaska , Jonathan Seidman , Gwen Shapira

Hadoop data data-engineering

Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through architectural considerations necessary to tie those components together into a complete tailored application, based on your particular use case.

Infinispan Data Grid Platform Definitive Guide

2015-05-29 · O'Reilly Data Engineering Books

book

by Wagner Roberto dos Santos

API Java data data-engineering infinispan nosql-databases

Dive into creating highly scalable and performant applications with this comprehensive guide to the Infinispan data grid platform. Designed for Java enterprise developers, this book provides clear and approachable instructions for implementing sophisticated data management solutions using Infinispan.
What this book will help me do:
• Install and configure Infinispan for optimized development environments.
• Understand and implement data caching topologies for diverse access patterns.
• Leverage scalable distributed transactions with detailed Apache JGroups integrations.
• Monitor and manage Infinispan instances using tools like RHQ and JMX.
• Develop a real-world application using Infinispan's APIs for practical insights.
Author(s): Wagner Roberto dos Santos is a seasoned Java developer and an expert in distributed caching and data grid technologies. With years of industry experience, the author pairs theoretical insights with pragmatic application know-how. The approach emphasizes teaching through real-life use cases, practical applications, and clear explanations, making complex concepts accessible to all readers.
Who is it for? This book is perfect for Java enterprise developers who are looking to elevate their architecture skills by building applications that demand scalability and high performance. Readers should have a solid understanding of Java, though no prior experience using Infinispan is required. Whether you're transitioning from traditional databases or improving your grasp of distributed caching, this book suits your needs.

Implementation Best Practices for IBM DB2 BLU Acceleration with SAP BW on IBM Power Systems

2015-05-11 · O'Reilly Data Engineering Books

book

by Dino Quintero , Adriana Melges Quintanilha Weingart , Speitim Velic , Yukiko Itaya

Analytics Data Analytics IBM SAP XML data data-engineering ibm-db2 relational-databases

BLU Acceleration is a new technology that has been developed by IBM® and integrated directly into the IBM DB2® engine. BLU Acceleration is a new storage engine, along with an integrated runtime built directly into the core DB2 engine, that supports the storage and analysis of column-organized tables. BLU Acceleration processing runs in parallel to the regular, row-based table processing found in the DB2 engine. This is not a bolt-on technology, nor is it a separate analytic engine that sits outside of DB2. Much like when IBM added XML data as a first-class object within the database, along with all the storage and processing enhancements that came with XML, IBM has now added column-organized tables directly into the storage and processing engine of DB2. This IBM Redbooks® publication shows examples on an IBM Power Systems™ entry server as a starter configuration for small organizations, and builds larger configurations with larger IBM Power Systems servers. This publication takes you through how to build a BLU Acceleration solution on IBM POWER® with an integrated SAP landscape. It implements SAP NetWeaver Business Warehouse systems as part of the scenario, using another DB2 feature called Near-Line Storage (NLS) on IBM POWER virtualization features, to develop and document recommended best-practice scenarios. This publication is targeted towards technical professionals (DBAs, data architects, consultants, technical support staff, and IT specialists) responsible for delivering cost-effective data management solutions that provide the best system configuration for their clients' data analytics on Power Systems.

Hadoop Essentials

2015-04-29 · O'Reilly Data Engineering Books

book

by Shiva Achari

Analytics Big Data Data Analytics Hadoop HDFS Hive Spark data data-engineering

In 'Hadoop Essentials,' you'll embark on an engaging journey to master the Hadoop ecosystem. This book covers fundamental to advanced topics, from HDFS and MapReduce to real-time analytics with Spark, empowering you to handle modern data challenges efficiently.
What this book will help me do:
• Understand the core components of Hadoop, including HDFS, YARN, and MapReduce, for foundational knowledge.
• Learn to optimize Big Data architectures and improve application performance.
• Utilize tools like Hive and Pig for efficient data querying and processing.
• Master data ingestion technologies like Sqoop and Flume for seamless data management.
• Achieve fluency in real-time data analytics using modern tools like Apache Spark and Apache Storm.
Author(s): Shiva Achari is a seasoned expert in Big Data and distributed systems with in-depth knowledge of the Hadoop ecosystem. With years of experience in both development and teaching, Achari crafts content that bridges practical know-how with theoretical insights in a highly accessible style.
Who is it for? This book is perfect for system and application developers aiming to learn practical applications of Hadoop. It suits professionals seeking solutions to real-world Big Data challenges, as well as those familiar with distributed systems basics who are looking to deepen their expertise in advanced data analysis.

Statistical Programming in SAS

2015-04-17 · O'Reilly Data Science Books

book

by John Bailer

SAS data data-science data-science-tasks statistics

In Statistical Programming in SAS, author A. John Bailer integrates SAS tools with interesting statistical applications and uses SAS 9.2 as a platform to introduce programming ideas for statistical analysis, data management, and data display and simulation. Written using a reader-friendly and narrative style, the book includes extensive examples and case studies to present a well-structured introduction to programming issues.

Learning MySQL and MariaDB

2015-03-31 · O'Reilly Data Engineering Books

book

by Russell J.T. Dyer

API MariaDB MySQL data data-engineering relational-databases

If you’re a programmer new to databases, or just new to MySQL and its community-driven variant MariaDB, you’ve found the perfect introduction. This hands-on guide provides an easy, step-by-step approach to installing, using, and maintaining these popular relational database engines. Author Russell Dyer, Curriculum Manager at MariaDB and former editor of the MySQL Knowledge Base, takes you through database design and the basics of data management and manipulation, using real-world examples and many practical tips. Exercises and review questions help you practice what you’ve just learned.
• Create and alter MySQL tables and specify fields and columns within them
• Learn how to insert, select, update, delete, join, and subquery data, using practical examples
• Use built-in string functions to find, extract, format, and convert text from columns
• Learn functions for mathematical or statistical calculations, and for formatting date and time values
• Perform administrative duties such as managing user accounts, backing up databases, and importing large amounts of data
• Use APIs to connect to and query MySQL and MariaDB with PHP and other languages

Using R and RStudio for Data Management, Statistical Analysis, and Graphics, 2nd Edition

2015-03-10 · O'Reilly Data Science Books

book

by Nicholas J. Horton , Ken Kleinman

data data-science data-science-tools r

This book covers the aspects of R most often used by statistical analysts. Incorporating the use of RStudio and the latest R packages, this second edition offers new chapters on simulation, special topics, and case studies.

Field Guide to Hadoop

2015-03-02 · O'Reilly Data Engineering Books

book

by Marshall Presser , Kevin Sitto

Avro Big Data Cassandra Chef Cloud Computing Docker Hadoop Apache HBase HDFS Hive JSON MongoDB

If your organization is about to enter the world of big data, you not only need to decide whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking down the Hadoop ecosystem into short, digestible sections. You’ll quickly understand how Hadoop’s projects, subprojects, and related technologies work together. Each chapter introduces a different topic, such as core technologies or data transfer, and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you’ll have a good grasp of the playing field. Topics include:
• Core technologies: Hadoop Distributed File System (HDFS), MapReduce, YARN, and Spark
• Database and data management: Cassandra, HBase, MongoDB, and Hive
• Serialization: Avro, JSON, and Parquet
• Management and monitoring: Puppet, Chef, ZooKeeper, and Oozie
• Analytic helpers: Pig, Mahout, and MLlib
• Data transfer: Sqoop, Flume, distcp, and Storm
• Security, access control, auditing: Sentry, Kerberos, and Knox
• Cloud computing and virtualization: Serengeti, Docker, and Whirr

Big Data and Health Analytics

2014-12-20 · O'Reilly Data Science Books O'Reilly Amazon

book

by Katherine Marconi, Harold Lehmann

Analytics Big Data Data Analytics Cyber Security data data-science healthcare-analytics

Data availability is surpassing existing paradigms for governing, managing, analyzing, and interpreting health data. Big Data and Health Analytics provides frameworks, use cases, and examples that illustrate the role of big data and analytics in modern health care, including how public health information can inform health delivery. Written for health care professionals and executives, this is not a technical book on the use of statistics and machine-learning algorithms for extracting knowledge out of data, nor a book on the intricacies of database design. Instead, this book presents the current thinking of academic and industry researchers and leaders from around the world. Using non-technical language, this book is accessible to health care professionals who might not have an IT and analytics background. It includes case studies that illustrate the business processes underlying the use of big data and health analytics to improve health care delivery. Highlighting lessons learned from the case studies, the book supplies readers with the foundation required for further specialized study in health analytics and data management. Coverage includes community health information; information visualization, which offers interactive environments and analytic processes that support exploration of EHR data; the governance structure required to enable data analytics and use; federal regulations and the constraints they place on analytics; and information security. Links to websites, videos, articles, and other online content that expand and support the primary learning objectives for each major section of the book are also included to help you develop the skills you will need to achieve quality improvements in health care delivery through the effective use of data and analytics.

IBM Tivoli Storage Productivity Center Beyond the Basics

2014-11-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dave Remkus, Karen Orlando, Tiberiu Hajas, Andy Benbenek, Robin Badart, Christian Sonder

Data Collection IBM Microsoft Cyber Security data data-engineering ibm-tivoli

You have installed and performed the basic customization of IBM® Tivoli® Storage Productivity Center. You have collected performance data and generated reports. Now it’s time to learn the best ways to use the software to manage your storage infrastructure. This IBM Redbooks® publication shows the best way to set up the software, based on your storage environment, and then how to use it to manage your infrastructure. It includes experiences from IBM clients and staff and covers the following topics:

Architectural design techniques (sizing your environment, single versus multiple installations, physical versus virtual servers, deployment in a large, existing storage infrastructure)
Database and server considerations (database backup and restoration methods and scripts, using IBM Data Studio Client for database administration, database placement and relocation, repository sizing and tuning, moving and migrating the server)
Alerting, monitoring, and reporting (monitoring thresholds and alerts, performance management and analysis of reports, real-time performance monitoring for IBM SAN Volume Controller)
Security considerations (Tivoli Storage Productivity Center internal user IDs, user authentication configuration methods, how and why to set up and change passwords, configuring, querying, and testing LDAP and Microsoft Active Directory)
Health checks (server health and logs, health and recoverability of IBM DB2® databases, using the Database Maintenance tool)
Data management techniques (how to spot unusual growth incidents, scripted actions for Tivoli Storage Manager and hierarchical storage management)

This book is for storage administrators who are responsible for the performance and growth of the IT storage infrastructure.

