O'Reilly Data Engineering Books

Integrating the IBM MQ Appliance into your IBM MQ Infrastructure

2015-11-02 O'Reilly Amazon

book

Neil Casey , Rufus Russell , Andy Emmett

data data-engineering IBM

This IBM® Redbooks® publication describes the IBM MQ Appliance M2000, an application connectivity option that combines secure, reliable IBM MQ messaging with the simplicity and low overall costs of a hardware appliance. This book presents underlying concepts and practical advice for integrating the IBM MQ Appliance M2000 into an IBM MQ infrastructure. It is aimed at enterprises that are considering a possible first use of IBM MQ and the IBM MQ Appliance M2000 and those that have already identified the appliance as a logical addition to their messaging environment. Details about new functionality and changes in approaches to application messaging are also described. The authors' goal is to help readers make informed design and implementation decisions so that they can successfully integrate the IBM MQ Appliance M2000 into their environments. A broad understanding of enterprise messaging is required to fully comprehend the details that are provided in this book. Readers are assumed to have at least some familiarity and experience with complementary IBM messaging products.

Introducing and Implementing IBM FlashSystem V9000

2015-11-02 O'Reilly Amazon

book

Karen Orlando , Arne Lehfeldt , Christophe Fagiano , Jon Herd , Detlef Helmbrecht , Carsten Larsen , Alexander Watson , Corne Lottering , Jeffrey Irving , Brett Kerns

data data-engineering IBM Data Management Microsoft Cyber Security

Storage capacity and performance requirements are growing faster than ever before, and the costs of managing this growth are depleting more of the information technology (IT) budget. The IBM® FlashSystem™ V9000 is the premier, fully integrated, Tier 1, all-flash offering from IBM. It has changed the economics of today's data center by eliminating storage bottlenecks. Its software-defined storage features simplify data management, improve data security, and preserve your investments in storage. IBM FlashSystem® V9000 includes IBM FlashCore™ technology and advanced software-defined storage available in one solution in a compact 6U form factor. FlashSystem V9000 improves business application availability. It delivers greater resource utilization so you can get the most from your storage resources, and achieve a simpler, more scalable, and cost-efficient IT Infrastructure. This IBM Redbooks® publication provides information about IBM FlashSystem V9000 Software V7.5 and its new functionality. It describes the product architecture, software, hardware, and implementation, and provides hints and tips. It illustrates use cases and independent software vendor (ISV) scenarios that demonstrate real-world solutions, and also provides examples of the benefits gained by integrating the FlashSystem storage into business environments. Using IBM FlashSystem V9000 software version 7.5 functions, management tools, and interoperability combines the performance of FlashSystem architecture with the advanced functions of software-defined storage to deliver performance, efficiency, and functions that meet the needs of enterprise workloads that demand IBM MicroLatency® response time. This book offers FlashSystem V9000 scalability concepts and guidelines for planning, installing, and configuring, which can help environments scale up and out to add more flash capacity and expand virtualized systems. 
Port utilization methodologies are provided to help you maximize the full potential of IBM FlashSystem V9000 performance and low latency in your scalable environment. In addition, all of the functions that FlashSystem V9000 software version 7.5 brings are explained, including IBM HyperSwap® capability, increased IBM FlashCopy® bitmap space, Microsoft Windows offloaded data transfer (ODX), and direct 16 gigabits per second (Gbps) Fibre Channel host attach support. This book also describes support for VMware 6, which enhances and improves scalability in a VMware environment. This book is intended for pre-sales and post-sales technical support professionals, storage administrators, and anyone who wants to understand how to implement this exciting technology.

Storytelling with Data: A Data Visualization Guide for Business Professionals

2015-11-02 O'Reilly Amazon

book

Cole Nussbaumer Knaflic

data data-science data-science-tasks data-visualization DataViz

Don't simply show your data: tell a story with it! Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You'll discover the power of storytelling and the way to make data a pivotal point in your story. The lessons in this illuminative text are grounded in theory but made accessible through numerous real-world examples, ready for immediate application to your next graph or presentation. Storytelling is not an inherent skill, especially when it comes to data visualization, and the tools at our disposal don't make it any easier. This book demonstrates how to go beyond conventional tools to reach the root of your data, and how to use your data to create an engaging, informative, compelling story. Specifically, you'll learn how to: understand the importance of context and audience; determine the appropriate type of graph for your situation; recognize and eliminate the clutter clouding your information; direct your audience's attention to the most important parts of your data; think like a designer and utilize concepts of design in data visualization; and leverage the power of storytelling to help your message resonate with your audience. Together, the lessons in this book will help you turn your data into high-impact visual stories that stick with your audience. Rid your world of ineffective graphs, one exploding 3D pie chart at a time. There is a story in your data, and Storytelling with Data will give you the skills and power to tell it!

WHOIS Running the Internet: Protocol, Policy, and Privacy

2015-11-02 O'Reilly Amazon

book

Garth O. Bruen

data data-engineering data-security-privacy data security & privacy Cyber Security

Discusses the evolution of WHOIS and how policy changes will affect WHOIS' place in IT today and in the future. This book provides a comprehensive overview of WHOIS. The text begins with an introduction to WHOIS and in-depth coverage of its forty-year history. Afterwards it examines how to use WHOIS and how WHOIS fits in the overall structure of the Domain Name System (DNS). Other technical topics covered include WHOIS query code and WHOIS server details. The book also discusses current policy developments and implementations, reviews critical policy documents, and explains how they will affect the future of the Internet and WHOIS. Additional resources and content updates will be provided through a supplementary website. The book includes an appendix with information on current and authoritative WHOIS services around the world; provides illustrations of actual WHOIS records and screenshots of web-based WHOIS query interfaces with instructions for navigating them; explains network dependencies and processes related to WHOIS utilizing flowcharts; and contains advanced coding for programmers. WHOIS Running the Internet: Protocol, Policy, and Privacy is written primarily for internet developers, policy developers, industry professionals in law enforcement, digital forensic investigators, and intellectual property attorneys. Garth O. Bruen is an Internet policy and security researcher whose work has been published in the Wall Street Journal and the Washington Post. Since 2012 Garth Bruen has served as the North American At-Large Chair to the Internet Corporation for Assigned Names and Numbers (ICANN). In 2003 Bruen created KnujOn.com with his late father, Dr. Robert Bruen, to process and investigate Internet abuse complaints (SPAM) from consumers. Bruen has trained and advised law enforcement at the federal and local levels on malicious use of the Domain Name System in the way it relates to the WHOIS record system. 
He has presented multiple times to the High Technology Crime Investigation Association (HTCIA) as well as other cybercrime venues including the Anti-Phishing Working Group (APWG) and the National Center for Justice and the Rule of Law at The University of Mississippi School of Law. Bruen also teaches at the Fisher College Criminal Justice School in Boston, where he develops new approaches to digital crime.

Real Time Analytics with SAP Hana

2015-10-30 O'Reilly Amazon

book

Vinay Singh

data data-engineering relational-databases Analytics Data Modelling SAP

"Real Time Analytics with SAP HANA" offers a comprehensive, step-by-step guide to mastering analytics and data modeling in the powerful SAP HANA environment. This book covers everything from basic data modeling concepts to more advanced techniques like creating calculation views and leveraging SAP HANA artifacts. What this Book will help me do Understand and build analytics/data models in the SAP HANA environment. Create schemas, packages, and delivery units in SAP HANA Studio. Master real-time data replication using SLT and SAP HANA Studio. Learn about full-text search, fuzzy search, and other analytical capabilities in SAP HANA. Develop comprehensive use cases combining SAP HANA concepts and tools. Author(s) Vinay Singh, the author of this book, is a seasoned SAP HANA expert with extensive experience in analytics and data modeling. He has worked on multiple SAP HANA implementation and migration projects and brings this expertise into his writing. His practical examples and hands-on approach make SAP HANA concepts accessible to learners at all levels. Who is it for? This book is ideal for SAP HANA data modelers, developers, implementation or migration consultants, project managers, and architects. It is designed for individuals aiming to enhance their skill set in SAP HANA and master real-time analytics. Whether you are actively working with SAP HANA or just starting, this book will serve as a valuable guide.

Web Development with MongoDB and NodeJS - Second Edition

2015-10-30 O'Reilly Amazon

book

Bruno Joseph D'mello , Jason Krol , Mithun Satheesh

data data-engineering nosql-databases MongoDB API AWS

Discover how to build a full-featured, interactive web application from scratch using Node.js and MongoDB in this comprehensive guide. You will learn to set up your development environment, create a web server with Express.js, and integrate MongoDB for data persistence. By the end of this book, you will have the knowledge and skills to develop and deploy robust web applications ready for the cloud. What this Book will help me do: Set up a Node.js development environment and connect it to MongoDB. Develop a web server using Express.js and write integrated APIs. Implement dynamic HTML pages leveraging the Handlebars template engine. Build efficient and scalable data-driven features using Mongoose ODM. Deploy web applications seamlessly to cloud platforms like Heroku, AWS, or Azure. Author(s): This book was co-authored by Mithun Satheesh, Bruno Joseph D'mello, and Jason Krol, who bring years of experience in software development and expertise in modern web technologies. With a focus on practical application and best practices, the authors aim to empower readers to succeed in real-world development projects using the Node.js and MongoDB stack. Who is it for? This book is tailored for developers who have a basic understanding of JavaScript and HTML and wish to advance their web development skills. If you are motivated to learn how to leverage Node.js and MongoDB for full-stack development or are curious about building and deploying complete web applications, this book is ideal for you. It addresses learners from early career to experienced developers looking to strengthen their skills in modern development stacks.

Advanced Data Management

2015-10-29 O'Reilly Amazon

book

Lena Wiese

data data-engineering Big Data Cloud Computing Computer Science Data Management

Advanced data management has always been at the core of efficient database and information systems. Recent trends like big data and cloud computing have intensified the need for sophisticated and flexible data storage and processing solutions. This book provides comprehensive coverage of the principles of data management developed in the last decades, with a focus on data structures and query languages. It treats a wealth of different data models and surveys the foundations of structuring, processing, storing, and querying data according to these models. Starting off with the topic of database design, it further discusses weaknesses of the relational data model, and then proceeds to convey the basics of graph data, tree-structured XML data, key-value pairs and nested, semi-structured JSON data, columnar and record-oriented data, as well as object-oriented data. The final chapters round the book off with an analysis of fragmentation, replication, and consistency strategies for data management in distributed databases, as well as recommendations for handling polyglot persistence in multi-model databases and multi-database architectures. While primarily geared towards students of Master-level courses in Computer Science and related areas, this book may also be of benefit to practitioners looking for a reference book on data modeling and query processing. It provides both theoretical depth and a concise treatment of open source technologies currently on the market.

Enterprise Search, 2nd Edition

2015-10-27 O'Reilly Amazon

book

Martin White

data data-engineering search

Is your organization rapidly accumulating more information than you know how to manage? This updated edition helps you create an enterprise search solution based on more than just technology. Author Martin White shows you how to plan and implement a managed search environment that meets the needs of your business and your employees. Learn why it’s vital to have a dedicated staff manage your search technology and support your users.

IBM Spectrum Virtualize and SAN Volume Controller Enhanced Stretched Cluster with VMware

2015-10-27 O'Reilly Amazon

book

Ole Rasmussen , Jon Tate , Angelo Bernasconi , Antonio Rainero

data data-engineering IBM Cloud Computing VMware

This IBM® Redbooks® publication describes the IBM storage area network (SAN) and IBM Spectrum™ Virtualize and SAN Volume Controller Enhanced Stretched Cluster configuration when combined with VMware. It describes guidelines, settings, and implementation steps necessary to achieve a satisfactory implementation. Business continuity and continuous availability of applications are among the top requirements for many organizations today. Advances in virtualization, storage, and networking make enhanced business continuity possible. Information technology solutions can now be designed to manage both planned and unplanned outages, and to take advantage of the flexibility, efficient use of resources, and cost savings that cloud computing offers. The IBM Enhanced Stretched Cluster design offers significant functions for maintaining business continuity in a VMware environment. You can dynamically move applications across data centers without interruption to those applications. The live application mobility across data centers relies on these products and technologies: the IBM Spectrum Virtualize and SAN Volume Controller Enhanced Stretched Cluster Solution; VMware Metro vMotion for live migration of virtual machines; a Layer 2 IP network and storage networking infrastructure for high-performance traffic management; and data center interconnection.

Java Persistence with Hibernate, Second Edition

2015-10-27 O'Reilly Amazon

book

Gary Gregory , Christian Bauer

data data-engineering database-management-tools object-relational-mapping hibernate Java

Java Persistence with Hibernate, Second Edition explores Hibernate by developing an application that ties together hundreds of individual examples. In this revised edition, authors Christian Bauer, Gavin King, and Gary Gregory cover Hibernate 5 in detail with the Java Persistence 2.1 standard (JSR 338). All examples have been updated for the latest Hibernate and Java EE specification versions. About the Technology: Persistence--the ability of data to outlive an instance of a program--is central to modern applications. Hibernate, the most popular Java persistence tool, offers automatic and transparent object/relational mapping, making it a snap to work with SQL databases in Java applications. About the Book: You'll immediately dig into the rich programming model of Hibernate, working through mappings, queries, fetching strategies, transactions, conversations, caching, and more. Along the way you'll find a well-illustrated discussion of best practices in database design and optimization techniques. What's Inside: object/relational mapping concepts; efficient database application design; comprehensive Hibernate and Java Persistence reference; integration of Java Persistence with EJB, CDI, JSF, and JAX-RS; unmatched breadth and depth. About the Reader: The book assumes a working knowledge of Java. About the Authors: Christian Bauer is a member of the Hibernate developer team and a trainer and consultant. Gavin King is the founder of the Hibernate project and a member of the Java Persistence expert group (JSR 220). 
Gary Gregory is a principal software engineer working on application servers and legacy integration. Quotes: "The most comprehensive book about Hibernate Persistence ... works well both as a tutorial and as a reference." - Sergio Fernandez Gonzalez, Accenture Software. "The essential guidebook for navigating the intricacies of Hibernate." - José Diaz, OptumHealth. "An excellent update to a classic and essential book." - Jerry Goodnough, Cognitive Medical Systems. "The must-have reference for every Hibernate user." - Stephan Heffner, SPIEGEL-Verlag Rudolf Augstein GmbH & Co. KG

MariaDB Essentials

2015-10-27 O'Reilly Amazon

book

Emilien Kenler

data data-engineering relational-databases MySQL MariaDB SQL

MariaDB Essentials is an approachable yet comprehensive guide to mastering SQL with MariaDB, an advanced open-source database server compatible with MySQL. Through this book, you'll gain a solid foundation in both the basics of database management and the unique features that make MariaDB stand out, all while using practical examples to enhance your learning experience. What this Book will help me do: Understand how to install, configure, and start using MariaDB efficiently. Learn how to structure and manage your data using databases, tables, indexes, and SQL queries. Master key advanced features of MariaDB such as virtual columns, dynamic columns, and full-text search. Explore practical operations including importing, exporting, and manipulating data using MariaDB. Gain the skills to work with MariaDB's community tools and innovative storage engines like CONNECT. Author(s): Emilien Kenler is a dedicated technology author and instructor who specializes in database systems and services. With a passion for making complex technical concepts approachable, Kenler has crafted a variety of learning materials that empower readers to fully utilize their tools. Kenler's experience with real-world database management enriches this book, providing readers with both theoretical and practical expertise. Who is it for? This book is perfect for database enthusiasts ranging from beginners unfamiliar with SQL to seasoned users looking to deepen their understanding of MariaDB. It's ideal for working professionals transitioning from MySQL who want to fully appreciate the unique capabilities of MariaDB. Whether you're a student, developer, or database administrator, this book will guide you in achieving your goals.

Microsoft Mapping: Geospatial Development in Windows 10 with Bing Maps and C#, Second Edition

2015-10-27 O'Reilly Amazon

book

Ray Rischpater , Carmen Au

data data-engineering location-data geographic-information-system-gis web-mapping Azure

This revised edition of Microsoft Mapping includes the latest details about SQL Server 2014 and the new 3D and Streetside-capable map control for Windows 10 applications. It contains updated chapters on Microsoft Azure and Power Map for Excel plus a new chapter on Bing Maps for Universal Windows. The book tells a story, from beginning to end, of planning and deploying a single geospatial application built entirely with Microsoft technologies. Readers are expected to have basic familiarity with the fundamentals of developing for Microsoft platforms (some understanding of basic SQL, C#, .NET, and WCF); as readers work through the book they will build on their existing skills so that they will be able to deploy geospatial applications for social networking, data collection, enterprise management, or other purposes.

Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem

2015-10-26 O'Reilly Amazon

book

Douglas Eadline

data data-engineering Hadoop Analytics Big Data Data Analytics

Get Started Fast with Apache Hadoop® 2, YARN, and Today’s Hadoop Ecosystem. With Hadoop 2.x and YARN, Hadoop moves beyond MapReduce to become practical for virtually any type of data processing. Hadoop 2.x and the Data Lake concept represent a radical shift away from conventional approaches to data usage and storage. Hadoop 2.x installations offer unmatched scalability and breakthrough extensibility that supports new and existing Big Data analytics processing methods and models. Hadoop® 2 Quick-Start Guide is the first easy, accessible guide to Apache Hadoop 2.x, YARN, and the modern Hadoop ecosystem. Building on his unsurpassed experience teaching Hadoop and Big Data, author Douglas Eadline covers all the basics you need to know to install and use Hadoop 2 on personal computers or servers, and to navigate the powerful technologies that complement it. Eadline concisely introduces and explains every key Hadoop 2 concept, tool, and service, illustrating each with a simple “beginning-to-end” example and identifying trustworthy, up-to-date resources for learning more. This guide is ideal if you want to learn about Hadoop 2 without getting mired in technical details. Douglas Eadline will bring you up to speed quickly, whether you’re a user, admin, devops specialist, programmer, architect, analyst, or data scientist. 
Coverage includes: understanding what Hadoop 2 and YARN do, and how they improve on Hadoop 1 with MapReduce; understanding Hadoop-based Data Lakes versus RDBMS Data Warehouses; installing Hadoop 2 and core services on Linux machines, virtualized sandboxes, or clusters; exploring the Hadoop Distributed File System (HDFS); understanding the essentials of MapReduce and YARN application programming; simplifying programming and data movement with Apache Pig, Hive, Sqoop, Flume, Oozie, and HBase; observing application progress, controlling jobs, and managing workflows; managing Hadoop efficiently with Apache Ambari, including recipes for HDFS to NFSv3 gateway, HDFS snapshots, and YARN configuration; and learning basic Hadoop 2 troubleshooting, and installing Apache Hue and Apache Spark.

VersaStack Solution by Cisco and IBM with SQL, Spectrum Control, and Spectrum Protect

2015-10-26 O'Reilly Amazon

book

Asher Pemberton , Sanjeev Naldurgkar , Vadi Bhatt , Jon Tate , Filip Van Den Neucker

data data-engineering IBM ibm-spectrum-control Agile/Scrum Analytics

Dynamic organizations want to accelerate growth while reducing costs. To do so, they must speed the deployment of business applications and adapt quickly to any changes in priorities. Organizations today require an IT infrastructure to be easy, efficient, and versatile. The VersaStack solution by Cisco and IBM® can help you accelerate the deployment of your data centers. It reduces costs by more efficiently managing information and resources while maintaining your ability to adapt to business change. The VersaStack solution combines the innovation of Cisco UCS Integrated Infrastructure with the efficiency of the IBM Storwize® storage system. The Cisco UCS Integrated Infrastructure includes the Cisco Unified Computing System (Cisco UCS), Cisco Nexus and Cisco MDS switches, and Cisco UCS Director. The IBM Storwize V7000 enhances virtual environments with its Data Virtualization, IBM Real-time Compression™, and IBM Easy Tier® features. These features deliver extraordinary levels of performance and efficiency. The VersaStack solution is Cisco Application Centric Infrastructure (ACI) ready. Your IT team can build, deploy, secure, and maintain applications through a more agile framework. Cisco Intercloud Fabric capabilities help enable the creation of open and highly secure solutions for the hybrid cloud. These solutions accelerate your IT transformation while delivering dramatic improvements in operational efficiency and simplicity. Cisco and IBM are global leaders in the IT industry. The VersaStack solution gives you the opportunity to take advantage of integrated infrastructure solutions that are targeted at enterprise applications, analytics, and cloud solutions. The VersaStack solution is backed by Cisco Validated Designs (CVD) to provide faster delivery of applications, greater IT efficiency, and less risk. 
This IBM Redbooks® publication is aimed at experienced storage administrators who are tasked with deploying a VersaStack solution with Microsoft SQL Server, IBM Spectrum™ Protect, and IBM Spectrum Control™.

Expert Performance Indexing in SQL Server, Second Edition

2015-10-22 O'Reilly Amazon

book

Grant Fritchey , Jason Strate

data data-engineering SQL

This book is a deep dive into perhaps the single most important facet of good performance: indexes, and how to best use them. The book begins in the shallow waters with explanations of the types of indexes and how they are stored in databases. Moving deeper into the topic, and further into the book, you will look at the statistics that are accumulated both by indexes and on indexes. You’ll better understand what indexes are doing in the database and what can be done to mitigate and improve their effect on performance. The final destination is a guided tour through a number of real-life scenarios showing approaches you can take to investigate, mitigate, and improve the performance of your database. • Defines the types of indexes and their implementation options • Provides use cases and common patterns in applying indexing • Describes and explains the index metadata and statistics • Provides a framework of strategies and approaches for indexing databases

IBM PowerVC Version 1.2.3: Introduction and Configuration

2015-10-21 O'Reilly Amazon

book

Benoit Creau , Scott Vetter , Marco Barboni , Liang Hou Xu , Guillermo Corti

data data-engineering IBM Linux Virtual Machine

IBM® Power Virtualization Center (PowerVC™) is an advanced enterprise virtualization management offering for IBM® Power Systems™, which is based on the OpenStack framework. This IBM Redbooks® publication introduces PowerVC and helps you understand its functions, planning, installation, and setup. Starting with PowerVC version 1.2.2, the Express Edition offering is no longer available and the Standard Edition is the only offering. PowerVC supports both large and small deployments, either by managing IBM PowerVM® that is controlled with the Hardware Management Console (HMC) or by managing PowerKVM directly. PowerVC can manage IBM AIX®, IBM i, and Linux workloads that run on POWER® hardware, including IBM PurePower systems. PowerVC editions include the following features and benefits: virtual image capture, deployment, and management; policy-based virtual machine (VM) placement to improve use; management of real-time optimization and VM resilience to increase productivity; VM mobility with placement policies to reduce the burden on IT staff in a simple-to-install and easy-to-use graphical user interface (GUI); an open and extensible PowerVM management system that you can adapt as you need and that runs in parallel with existing infrastructure, preserving your investment; and a management system for existing PowerVM deployments. You will also find all the details about how we set up the lab environment that is used in this book. This book is for experienced users of IBM PowerVM and other virtualization solutions who want to understand and implement the next generation of enterprise virtualization management for Power Systems. Unless stated otherwise, the content of this book refers to versions 1.2.2 and 1.2.3 of IBM PowerVC.

Sams Teach Yourself: Big Data Analytics with Microsoft HDInsight in 24 Hours

2015-10-21 O'Reilly Amazon

book

Manpreet Singh , Arshad Ali

data data-engineering Hadoop Analytics BI Big Data

In just 24 lessons of one hour or less, Sams Teach Yourself Big Data Analytics with Microsoft HDInsight in 24 Hours helps you leverage Hadoop’s power on a flexible, scalable cloud platform using Microsoft’s newest business intelligence, visualization, and productivity tools. This book’s straightforward, step-by-step approach shows you how to provision, configure, monitor, and troubleshoot HDInsight and use Hadoop cloud services to solve real analytics problems. You’ll gain more of Hadoop’s benefits, with less complexity–even if you’re completely new to Big Data analytics. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success. Practical, hands-on examples show you how to apply what you learn. Quizzes and exercises help you test your knowledge and stretch your skills. Notes and tips point out shortcuts and solutions. Learn how to: master core Big Data and NoSQL concepts, value propositions, and use cases; work with key Hadoop features, such as HDFS2 and YARN; quickly install, configure, and monitor Hadoop (HDInsight) clusters in the cloud; automate provisioning, customize clusters, install additional Hadoop projects, and administer clusters; integrate, analyze, and report with Microsoft BI and Power BI; automate workflows for data transformation, integration, and other tasks; use Apache HBase on HDInsight; use Sqoop or SSIS to move data to or from HDInsight; perform R-based statistical computing on HDInsight datasets; accelerate analytics with Apache Spark; run real-time analytics on high-velocity data streams; and write MapReduce, Hive, and Pig programs. Register your book at informit.com/register for convenient access to downloads, updates, and corrections as they become available.

Sams Teach Yourself T-SQL in One Hour a Day

2015-10-20 O'Reilly Amazon

book

Alison Balter

data data-engineering SQL

Master T-SQL database design, development, and administration the easy way–hands-on! In just one hour a day, you’ll build all the skills you need to create effective database applications with T-SQL and SQL Server. With this complete tutorial, you’ll quickly master the basics and then move on to more advanced features and concepts: learn the fundamentals of T-SQL from the ground up, one step at a time; succeed with the newest versions of T-SQL, SQL Server, and SQL Server Management Studio; use T-SQL effectively as both an application developer and DBA; master powerful stored procedures, triggers, transactions, and user-defined functions (UDFs); and systematically optimize and secure your SQL Server databases. Learn on your own time, at your own pace. No previous T-SQL or database programming experience required. Learn how to design efficient, reliable SQL Server databases; define efficient tables, table relationships, fields, and constraints; make the most of T-SQL’s SELECT and UPDATE statements; work effectively with simple and complex views and joins; master stored procedure techniques every developer should know; build and use powerful user-defined functions (UDFs); secure databases with authentication, roles, permissions, and principals; configure, maintain, and tune SQL Server for maximum reliability, performance, and value; back up, restore, and audit databases; optimize databases with the SQL Server Profiler, System Monitor, and Index Tuning Wizard; and leverage valuable insight and time-saving techniques from a world-renowned database expert. Register your book at informit.com/register for access to source code, example files, updates, and corrections as they become available.

Key Management Development Models, 3rd Edition

2015-10-19 O'Reilly Amazon

book

David Cotton

data data-engineering data-models

Key Management Development Models provides the crucial information you need to develop your skills as a manager. The book is divided into two parts (Part 1: Developing Yourself and Part 2: Working with Others). Each tool, model, or idea will help you:

· understand yourself better
· understand how others perceive you
· develop your credibility at work
· make better choices in your management of others
· become a more rounded professional, able to adapt your style to get the best out of yourself and others

IBM Content Manager OnDemand Guide

2015-10-16 O'Reilly Amazon

book

Hassan A. Shazly

data data-engineering IBM Cyber Security

This IBM® Redbooks® publication provides a practical guide to the design, installation, configuration, and maintenance of IBM Content Manager OnDemand Version 9.5. Content Manager OnDemand manages the high-volume storage and retrieval of electronic statements and provides efficient enterprise report management. Content Manager OnDemand transforms formatted computer output and printed reports, such as statements and invoices, into electronic information for easy report management. Content Manager OnDemand helps eliminate costly, high-volume print output by capturing, indexing, archiving, and presenting electronic information for improved customer service. This publication covers the key areas of Content Manager OnDemand, some of which might not be known to the Content Manager OnDemand community or are misunderstood. The book covers various topics, including basic information in administration, database structure, storage management, and security. In addition, the book covers data indexing, loading, conversion, and expiration. Other topics include user exits, performance, retention management, records management, and many more. Because many other resources are available that address subjects on different platforms, this publication is not intended as a comprehensive guide for Content Manager OnDemand. Rather, it is intended to complement the existing Content Manager OnDemand documentation and provide insight into the issues that might be encountered in the setup and use of Content Manager OnDemand. This book is intended for individuals who need to design, install, configure, and maintain Content Manager OnDemand.

Fast Data: Smart and at Scale

2015-10-15 O'Reilly Amazon

book

John Hugg , Ryan Betts

data data-engineering streaming-messaging real-time-analytics Analytics IoT

The need for fast data applications is growing rapidly, driven by the IoT, the surge in machine-to-machine (M2M) data, global mobile device proliferation, and the monetization of SaaS platforms. So how do you combine real-time, streaming analytics with real-time decisions in an architecture that’s reliable, scalable, and simple? In this O’Reilly report, Ryan Betts and John Hugg from VoltDB examine ways to develop apps for fast data, using pre-defined patterns. These patterns are general enough to suit both the do-it-yourself, hybrid batch/streaming approach and the simpler, proven in-memory approach available with certain fast database offerings. Their goal is to create a collection of fast data app development recipes. We welcome your contributions, which will be tested and included in future editions of this report.

Hadoop with Python

2015-10-15 O'Reilly Amazon

book

Donald Miner , Zach Radtka

data data-engineering Hadoop Analytics API Data Science

Hadoop is mostly written in Java, but that doesn't exclude the use of other programming languages with this distributed storage and processing framework, particularly Python. With this concise book, you’ll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework. Authors Zachary Radtka and Donald Miner from the data science firm Miner & Kasch take you through the basic concepts behind Hadoop, MapReduce, Pig, and Spark. Then, through multiple examples and use cases, you'll learn how to work with these technologies by applying various Python tools.

· Use the Python library Snakebite to access HDFS programmatically from within Python applications
· Write MapReduce jobs in Python with mrjob, the Python MapReduce library
· Extend Pig Latin with user-defined functions (UDFs) in Python
· Use the Spark Python API (PySpark) to write Spark programs with Python
· Learn how to use the Luigi Python workflow scheduler to manage MapReduce jobs and Pig scripts

Zachary Radtka, a platform engineer at Miner & Kasch, has extensive experience creating custom analytics that run on petabyte-scale data sets.

Implementing Mobile Document Capture with IBM Datacap Software

2015-10-02 O'Reilly Amazon

book

Whei-Jen Chen , Tom Stuart , Jan den Hartog , Kevin Bowe , Ben Antin , Ben Davies , Daniel Ouimet

data data-engineering IBM

Organizations face many challenges in managing the ever-increasing volume of documents that they need to conduct their business. IBM® content management and imaging solutions can capture, store, manage, integrate, and deliver various forms of content throughout an enterprise. These tools can help reduce costs associated with content management and help organizations deliver improved customer service. The advanced document capture capabilities are provided through IBM Datacap software. This IBM Redbooks® publication focuses on Datacap components, system architecture, functions, and capabilities. It explains how Datacap works, how to design a document image capture solution, and how to implement the solution using Datacap Developer Tools, such as Datacap FastDoc (Admin). FastDoc is the development tool that designers use to create rules and rule sets, configure a document hierarchy and task profiles, and set up a verification panel for image verification. A loan application example explains the advanced technologies of IBM Datacap Version 9. This scenario shows how to develop a versatile capture solution that is able to handle both structured and unstructured documents. Information about high availability, scalability, performance, backup and recovery options, preferable practices, and suggestions for designing and implementing an imaging solution is also included. This book is intended for IT architects and professionals who are responsible for creating, improving, designing, and implementing document imaging solutions for their organizations.

Introducing SQL Server

2015-10-02 O'Reilly Amazon

book

Mike McQuillan

data data-engineering SQL Microsoft RDBMS SQL Server

Introducing SQL Server is a fast and easy introduction to SQL Server and the world of relational databases. You’ll learn how databases work and how to use the T-SQL language by practicing on one of the most widely used and powerful database engines in the corporate world: Microsoft SQL Server. Do you quake at the sight of a SELECT statement? Start to shiver when people start talking about tables and rows? Fear not, Introducing SQL Server is here to rescue you. The book focuses on the knowledge and skills needed to begin your journey toward becoming a solid and competent SQL Server professional and database programmer. You’ll learn the core concepts of SQL Server, from installing the software to executing and profiling queries. Introducing SQL Server is aimed at SQL Server newcomers as well as at those wanting to improve their database skills. You’ll put a comprehensive database together as you work through the book. You will create tables and learn to use constraints; create reusable functions and stored procedures; and even learn how indexes work and what they bring in terms of increased performance. Introducing SQL Server shows you that databases don’t need to be difficult.

· Teaches you how to build a SQL Server database from scratch
· Takes a tutorial-based approach, with each chapter building on the last
· Covers what you need to know for common SQL Server development tasks

Learning Android Google Maps

2015-09-30 O'Reilly Amazon

book

Raj Amal

data data-engineering location-data geographic-information-system-gis web-mapping google-maps

Learning Android Google Maps is the ultimate guide to integrating Google Maps into your Android applications. This book takes you through the process of setting up, customizing, and leveraging this powerful feature. By the end, you'll be adept at creating engaging map functionalities applicable for any Android project.

What this Book will help me do

· Understand how to set up the Android development environment and obtain the Google API key to start using Maps.
· Gain the skills to add features to Google Maps, such as markers, overlays, and custom information windows.
· Learn how to work with various types of maps, enabling specific applications.
· Master the ability to connect your map with real-time GPS data, offering user location-based services.
· Discover how to implement Google Street View and other interactive geographic features into your apps.

Author(s)

This book is meticulously compiled by developers with extensive experience in building Android applications and implementing Google Maps. Their combined years of hands-on development ensure the instructions are clear, comprehensive, and practical. Their passion for teaching shines as they break down complex topics into easy-to-understand explanations.

Who is it for?

This book is ideal for Android developers looking to integrate map functionalities into their apps. Beginners can follow along due to its detailed, step-by-step approach, while intermediate developers will appreciate the customization techniques and advanced features covered. If you aim to master Google Maps API in Android development, this book is for you.
