talk-data.com

Topic: data-engineering

Activities

3395 activities · Newest first

Practical Hadoop Migration: How to Integrate Your RDBMS with the Hadoop Ecosystem and Re-Architect Relational Applications to NoSQL

Re-architect relational applications to NoSQL, integrate relational database management systems with the Hadoop ecosystem, and transform and migrate relational data to and from Hadoop components. This book covers best-practice design approaches to re-architecting your relational applications and transforming your relational data to optimize concurrency, security, denormalization, and performance. Author Bhushan Lakhe, winner of IBM's 2012 Gerstner Award for his implementation of big data and data warehouse initiatives and author of Practical Hadoop Security, walks you through the entire transition process. First, he lays out the criteria for deciding what blend of re-architecting, migration, and integration between RDBMS and HDFS best meets your transition objectives. Then he demonstrates how to design your transition model. Lakhe proceeds to cover the selection criteria for ETL tools, the implementation steps for migration with Sqoop- and Flume-based data transfers, and transition optimization techniques for tuning partitions, scheduling aggregations, and redesigning ETL. Finally, he assesses the pros and cons of data lakes and Lambda architecture as integrative solutions and illustrates their implementation with real-world case studies. By default, Hadoop/NoSQL solutions do not offer certain relational features such as role-based access control, locking for concurrent updates, and various tools for measuring and enhancing performance. Practical Hadoop Migration shows how to use open-source tools to emulate these relational functionalities in Hadoop ecosystem components.
What You'll Learn
- Decide whether you should migrate your relational applications to big data technologies or integrate them
- Transition your relational applications to Hadoop/NoSQL platforms in terms of logical design and physical implementation
- Discover RDBMS-to-HDFS integration, data transformation, and optimization techniques
- Consider when to use Lambda architecture and data lake solutions
- Select and implement Hadoop-based components and applications to speed transition, optimize integrated performance, and emulate relational functionalities

Who This Book Is For
Database developers, database administrators, enterprise architects, Hadoop/NoSQL developers, and IT leaders. Its secondary readership is project and program managers and advanced students of database and management information systems.

Enabling Real-time Analytics on IBM z Systems Platform

For online transaction processing (OLTP) workloads, the IBM® z Systems™ platform, with IBM DB2®, data sharing, Workload Manager (WLM), geoplex, and other high-end features, is widely acknowledged as the leader. Most customers now integrate business analytics with OLTP, for example by running scoring functions from transactional context for real-time analytics or by applying machine-learning algorithms to enterprise data that is kept on the mainframe. As a result, IBM is adding investment so that clients can keep the complete lifecycle of data analysis, modeling, and scoring under z Systems control in a cost-efficient way, while retaining the availability, security, and reliability that z Systems solutions offer. Because of the changed architecture and tighter integration, IBM has shown, in a customer proof of concept, that a particular client was able to achieve an orders-of-magnitude improvement in performance, allowing that client’s data scientist to investigate the data in a more interactive process. Open technologies, such as Predictive Model Markup Language (PMML), can help customers update single components instead of being forced to replace everything at once. As a result, you can combine your preferred tool for model generation (such as SAS Enterprise Miner or IBM SPSS® Modeler) with a different technology for model scoring (such as Zementis, a company focused on PMML scoring). IBM SPSS Modeler is a leading data mining workbench that can apply various algorithms in data preparation, cleansing, statistics, visualization, machine learning, and predictive analytics. It reflects over 20 years of experience and continued development, and is integrated with z Systems. With IBM DB2 Analytics Accelerator 5.1 and SPSS Modeler 17.1, you can perform the complete predictive model creation process, including data transformation, within DB2 Analytics Accelerator.
So, instead of moving the data to a distributed environment, algorithms can be pushed to the data, using the cost-efficient DB2 Accelerator for the required resource-intensive operations. This IBM Redbooks® publication explains the overall z Systems architecture, how the components can be installed and customized, how the new IBM DB2 Analytics Accelerator loader can help load z Systems data and external data efficiently, how in-database transformation, in-database modeling, and in-transactional real-time scoring can be used, and what other related technologies are available. This book is intended for technical specialists, architects, and data scientists who want to use the technology on the z Systems platform. Most of the technologies described in this book require IBM DB2 for z/OS®. DB2 Analytics Accelerator is required to accelerate data investigation, data transformation, and data modeling. The most value can be achieved when most of the data already resides on z Systems platforms, although adding external data (such as data from social sources) poses no problem.

Implementing or Migrating to an IBM Gen 5 b-type SAN

The IBM® b-type Gen 5 Fibre Channel directors and switches provide reliable, scalable, and secure high-performance foundations for high-density server virtualization, cloud architectures, and next-generation flash and SSD storage. They are designed to meet the demands of highly virtualized private cloud storage and data center environments. This IBM Redbooks® publication helps administrators learn how to implement or migrate to an IBM Gen 5 b-type SAN. It provides an overview of the key hardware and software products and explains how to install, monitor, tune, and troubleshoot your storage area network (SAN). Read this publication to learn about fabric design, managing and monitoring your network, key tools such as IBM Network Advisor and Fabric Vision, and troubleshooting.

T-SQL Fundamentals, Third Edition

Effectively query and modify data using Transact-SQL

Master T-SQL fundamentals and write robust code for Microsoft SQL Server and Azure SQL Database. Itzik Ben-Gan explains key T-SQL concepts and helps you apply your knowledge with hands-on exercises. The book first introduces T-SQL’s roots and underlying logic. Next, it walks you through core topics such as single-table queries, joins, subqueries, table expressions, and set operators. Then the book covers more advanced data-query topics such as window functions, pivoting, and grouping sets. The book also explains how to modify data, work with temporal tables, and handle transactions, and provides an overview of programmable objects.

Microsoft Data Platform MVP Itzik Ben-Gan shows you how to:
- Review core SQL concepts and its mathematical roots
- Create tables and enforce data integrity
- Perform effective single-table queries by using the SELECT statement
- Query multiple tables by using joins, subqueries, table expressions, and set operators
- Use advanced query techniques such as window functions, pivoting, and grouping sets
- Insert, update, delete, and merge data
- Use transactions in a concurrent environment
- Get started with programmable objects, from variables and batches to user-defined functions, stored procedures, triggers, and dynamic SQL

IBM GDPS Family: An Introduction to Concepts and Capabilities

This IBM® Redbooks® publication presents an overview of the IBM Geographically Dispersed Parallel Sysplex™ (IBM GDPS®) offerings and the roles they play in delivering a business IT resilience solution. The book begins with general concepts of business IT resilience and disaster recovery, along with issues related to high application availability, data integrity, and performance. These topics are considered within the framework of government regulation, increasing application and infrastructure complexity, and the competitive and rapidly changing modern business environment. Next, it describes the GDPS family of offerings with specific reference to how they can help you achieve your defined goals for disaster recovery and high availability. Also covered are the features that simplify and enhance data replication activities, the prerequisites for implementing each offering, and tips for planning for both immediate and future business requirements. Tables provide easy-to-use summaries and comparisons of the offerings, and the additional planning and implementation services available from IBM are explained. Then, several practical client scenarios and requirements are described, along with the most suitable GDPS solution for each case. The introductory chapters of this publication are intended for a broad technical audience, including IT System Architects, Availability Managers, Technical IT Managers, Operations Managers, System Programmers, and Disaster Recovery Planners. The subsequent chapters provide more technical details about the GDPS offerings, and each can be read independently by readers who are interested in specific topics. Therefore, if you do read all the chapters, be aware that some information is intentionally repeated.

Expert Scripting and Automation for SQL Server DBAs

Automate your workload and manage more databases and instances with greater ease and efficiency by combining metadata-driven automation with powerful tools like PowerShell and SQL Server Agent. Automate your new instance builds and use monitoring to drive ongoing automation, with the help of an inventory database and a management data warehouse.

The market has seen a trend toward a much lower ratio of DBAs to SQL Server instances. Automation is the key to responding to this challenge and continuing to run a reliable database platform service. Expert Scripting and Automation for SQL Server DBAs guides you through the process of automating the maintenance of your SQL Server enterprise. It shows how to automate the SQL Server build processes, monitor multiple instances from a single location, and automate routine maintenance tasks throughout your environment. You will also learn how to create automated responses to common or time-consuming break/fix scenarios. The book helps you become faster and better at what you do for a living, and thus more valuable in the job market.

Expert Scripting and Automation for SQL Server DBAs offers:
- Extensive coverage of automation using PowerShell and T-SQL
- Detailed discussion and examples on metadata-driven automation
- Comprehensive coverage of automated responses to break/fix scenarios

What You Will Learn
- Automate the SQL Server build process
- Create intelligent, metadata-driven routines
- Automate common maintenance tasks
- Create automated responses to common break/fix scenarios
- Monitor multiple instances from a central location
- Utilize T-SQL and PowerShell for administrative purposes

Who This Book Is For
Expert Scripting and Automation for SQL Server DBAs is a book for SQL Server database administrators responsible for managing increasingly large numbers of databases across their business enterprise. The book is also useful for any database administrator looking to ease their workload through automation. The book addresses the needs of these audiences by showing how to get more done with less effort by implementing an intelligent, automated-processes service model using tools such as T-SQL, PowerShell, SQL Server Agent, and the Management Data Warehouse.

Monitoring Elasticsearch

"Monitoring Elasticsearch" focuses on teaching readers how to manage and monitor the health and performance of Elasticsearch clusters. Through practical steps and real-world examples, this book helps users diagnose, resolve, and prevent common issues to optimize system reliability and performance.

What this book will help me do
- Obtain a clear understanding of Elasticsearch monitoring tools and their features.
- Learn how to diagnose and troubleshoot common Elasticsearch performance issues.
- Master the use of Elasticsearch APIs for monitoring and analysis.
- Explore the best practices for effectively maintaining cluster reliability.
- Understand the features of tools like Kibana, Marvel, and BigDesk for Elasticsearch monitoring.

Author(s)
The authors of "Monitoring Elasticsearch" are experts in distributed systems and database management, with extensive experience in Elasticsearch deployment and monitoring. They bring their practical knowledge, teaching readers clear and actionable techniques. Their approachable style makes complex systems accessible, helping professionals and aficionados alike.

Who is it for?
This book is ideal for developers and system administrators who work with Elasticsearch, regardless of their industry. Whether you're new to Elasticsearch or aiming to deepen your expertise, you will find practical solutions and helpful tools. The content suits a range of experience levels, from beginners curious about cluster monitoring to experts needing solutions for specific issues. If you use Elasticsearch or plan to, this book is for you.

IBM Netcool Operations Insight Version 1.4: Deployment Guide

IBM® Netcool® Operations Insight integrates infrastructure and operations management into a single coherent structure across business applications, virtualized servers, network devices and protocols, internet protocols, and security and storage devices. This IBM Redbooks® publication will help you install, tailor, and configure Netcool Operations Insight Version 1.4. Netcool Operations Insight consists of several products and components that can be installed on many servers in many combinations, so you must make many decisions, some critical and some a matter of personal preference. The purpose of this document is to accelerate the initial deployment of Netcool Operations Insight by making preferred practice choices. The target audience of this book is Netcool Operations Insight deployment specialists.

Implementing an IBM High-Performance Computing Solution on IBM Power System S822LC

This IBM® Redbooks® publication demonstrates and documents that IBM Power Systems™ high-performance computing and technical computing solutions deliver faster time to value. Configurable into highly scalable Linux clusters, Power Systems offer extreme performance for demanding workloads such as genomics, finance, computational chemistry, oil and gas exploration, and high-performance data analytics. This book describes a high-performance computing solution implemented on the IBM Power System S822LC. The solution delivers high application performance and throughput based on its built-for-big-data architecture, which incorporates IBM POWER8® processors, tightly coupled Field Programmable Gate Arrays (FPGAs) and accelerators, and faster I/O by using Coherent Accelerator Processor Interface (CAPI). This solution is ideal for clients that need more processing power while simultaneously increasing workload density and reducing datacenter floor space requirements. The Power S822LC offers a modular design to scale from a single rack to hundreds, simplicity of ordering, and a strong innovation roadmap for graphics processing units (GPUs). This publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) responsible for delivering cost-effective high-performance computing (HPC) solutions that help uncover insights from their data so they can optimize business results, product development, and scientific discoveries.

The Language of SQL, Second Edition

Many SQL texts attempt to serve as an encyclopedic reference on SQL syntax, an approach that is often counterproductive because that information is readily available in online references published by the major database vendors. For SQL beginners, it’s more important for a book to focus on general concepts and to offer clear explanations and examples of what various SQL statements can accomplish. This is that book. A number of features make The Language of SQL unique among introductory SQL books. First, you will not be required to download software or sit with a computer as you read the text. The intent of this book is to provide examples of SQL usage that can be understood simply by reading. Second, topics are organized in an intuitive and logical sequence. SQL keywords are introduced one at a time, allowing you to grow your understanding as you encounter new terms and concepts. Finally, this book covers the syntax of three widely used databases: Microsoft SQL Server, MySQL, and Oracle. Special “Database Differences” sidebars clearly show you any differences in syntax among these three databases, and instructions are included on how to obtain and install free versions of the databases. This is the only book you need to gain a quick working knowledge of SQL and relational databases.

Learn How To...
- Use SQL to retrieve data from relational databases
- Apply functions and calculations to data
- Group and summarize data in a variety of useful ways
- Use complex logic to retrieve only the data you need
- Update data and create new tables
- Design relational databases so that data retrieval is easy and intuitive
- Use spreadsheets to transform your data into meaningful displays
- Retrieve data from multiple tables via joins, subqueries, views, and set logic
- Create, modify, and execute stored procedures
- Install Microsoft SQL Server, MySQL, or Oracle

Contents at a Glance
1 Relational Databases and SQL
2 Basic Data Retrieval
3 Calculated Fields and Aliases
4 Using Functions
5 Sorting Data
6 Selection Criteria
7 Boolean Logic
8 Conditional Logic
9 Summarizing Data
10 Subtotals and Crosstabs
11 Inner Joins
12 Outer Joins
13 Self Joins and Views
14 Subqueries
15 Set Logic
16 Stored Procedures and Parameters
17 Modifying Data
18 Maintaining Tables
19 Principles of Database Design
20 Strategies for Displaying Data
A Getting Started with Microsoft SQL Server
B Getting Started with MySQL
C Getting Started with Oracle

IBM Netcool Operations Insight: A Scenarios Guide

IBM® Netcool® Operations Insight empowers your IT operations to use real-time and historical analytics to identify, isolate, and resolve problems before they affect your business. Powered by IBM Tivoli® Netcool/OMNIbus and the transformative capabilities of cognitive analytics, Netcool Operations Insight consolidates millions of alerts from across local, cloud, and hybrid environments into a few actionable problems. This IBM Redbooks® publication gives a broad understanding of Netcool Operations Insight and describes several scenarios that show the capabilities of this solution in a real-life environment. Each scenario features a different capability of Netcool Operations Insight. The scenarios are documented by using step-by-step figures with explanations to make them easier to implement in your own environment. The scenarios in this book are broken into the following categories:
- Network Management-related scenarios
- Network Event and cognitive-related scenarios
- Network Event-related scenarios

The target audience of this book is network specialists, network administrators, and network operators.

The Big Data Market

Which companies have adopted technologies such as Hadoop and Spark, as well as data science in general? And which industries are lagging behind? This O’Reilly report provides the results of a unique, data-driven analysis of the market for big data products and technologies. Using eye-catching charts and visualizations, Spiderbook cofounder Aman Naimat highlights some surprising results from the analysis, such as:
- The relatively small number of companies using big data in production
- Industries that have embraced big data the most, and the least
- The amount of money spent on various big data use cases
- How many companies actually use “fast data”

The results also reveal the geographical locations where companies have been quick to adopt big data, as well as the types of teams that use big data technology. In addition, Naimat takes you through the analysis process with Spiderbook’s graph-based machine-learning model. The company analyzed billions of publicly available documents, canvassed more than 500,000 companies, and searched the entire business internet to compile the most comprehensive results possible.

Architecting HBase Applications

HBase is a remarkable tool for indexing massive volumes of data, but getting started with this distributed database and its ecosystem can be daunting. With this hands-on guide, you’ll learn how to architect, design, and deploy your own HBase applications by examining real-world solutions. Along with HBase principles and cluster deployment guidelines, this book includes in-depth case studies that demonstrate how large companies solved specific use cases with HBase. Authors Jean-Marc Spaggiari and Kevin O’Dell also provide draft solutions and code examples to help you implement your own versions of those use cases, from master data management (MDM) and document storage to near real-time event processing. You’ll also learn troubleshooting techniques to help you avoid common deployment mistakes.
- Learn exactly what HBase does, what its ecosystem includes, and how to set up your environment
- Explore how real-world HBase instances were deployed and put into production
- Examine documented use cases for tracking healthcare claims, digital advertising, data management, and product quality
- Understand how HBase works with tools and techniques such as Spark, Kafka, MapReduce, and the Java API
- Learn how to identify the causes and understand the consequences of the most common HBase issues

IBM System Storage Solutions Handbook

The IBM® System Storage® Solutions Handbook helps you solve your current and future data storage business requirements. It helps you achieve enhanced storage efficiency by design, allowing managed cost, capacity for growth, greater mobility, and stronger control over storage performance and management. It describes the most current IBM storage products, including the IBM Spectrum™ family, IBM FlashSystem®, disk, and tape, as well as virtualized solutions such as IBM Storage Cloud. This IBM Redbooks® publication provides overviews and information about the most current IBM System Storage products. It shows how IBM delivers the right mix of products for nearly every aspect of business continuance and business efficiency. IBM storage products can help you store, safeguard, retrieve, and share your data. This book is intended as a reference for basic and comprehensive information about the IBM Storage products portfolio. It provides a starting point for establishing your own enterprise storage environment. This book describes the IBM Storage products as of March 2016.

iSCSI Implementation and Best Practices on IBM Storwize

This IBM® Redbooks® publication helps administrators and technical professionals understand Internet Small Computer System Interface (iSCSI) and how to implement it for use with IBM Storwize® storage systems. iSCSI can be used alone or with other technologies. This publication provides an overview of the iSCSI protocol and helps you understand how it is similar to and different from Fibre Channel (FC) technology. It helps you plan and design your network topology. It explains how to configure your IBM Storwize storage systems and hosts (including IBM AIX®, Linux, VMware, and Microsoft Windows hosts) so that they can interact over iSCSI. It also provides an overview of using IBM Storwize storage systems with OpenStack. This book describes iSCSI configuration for IBM Storwize and SAN Volume Controller storage systems at Version 7.6 or later. In addition to configuration, this publication provides information about performance and troubleshooting.

Cassandra: The Definitive Guide, 2nd Edition

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This expanded second edition, updated for Cassandra 3.0, provides the technical details and practical examples you need to put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s non-relational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility.
- Understand Cassandra’s distributed and decentralized structure
- Use the Cassandra Query Language (CQL) and cqlsh, the CQL shell
- Create a working data model and compare it with an equivalent relational model
- Develop sample applications using client drivers for languages including Java, Python, and Node.js
- Explore cluster topology and learn how nodes exchange data
- Maintain a high level of performance in your cluster
- Deploy Cassandra on site, in the cloud, or with Docker
- Integrate Cassandra with Spark, Hadoop, Elasticsearch, Solr, and Lucene

Global Dynamics

A world model: economies, trade, migration, security and development aid. This book provides the analytical capability to understand and explore the dynamics of globalisation. It is anchored in economic input-output models of over 200 countries and their relationships through trade, migration, security and development aid. The tools of complexity science are brought to bear, and mathematical and computer models are developed both for the elements and for an integrated whole. Models are developed at a variety of scales, ranging from global and international trade, through a European model of inter-sub-regional migration, to piracy in the Gulf and the London riots of 2011. The models embrace the changing technology of international shipping and the impacts of migration on economic development, along with changing patterns of military expenditure and development aid. A unique contribution is the level of spatial disaggregation, which presents each of 200+ countries and their mutual interdependencies, along with some finer-scale analyses of cities and regions. This is the first global model to offer this depth of detail with fully worked-out models, which provide tools for policy making at national, European and global scales.

Global Dynamics:
- Presents in-depth models of global dynamics.
- Provides a world economic model of 200+ countries and their interactions through trade, migration, security and development aid.
- Provides pointers to the deployment of analytical capability through modelling in policy development.
- Features a variety of models that constitute a formidable toolkit for analysis and policy development.
- Offers a demonstration of the practicalities of complexity science concepts.

This book is for practitioners and policy analysts, for those interested in mathematical model building and complexity science, and for advanced undergraduate and postgraduate students.

Beginning SQL Queries: From Novice to Professional, Second Edition

Get started on mastering the one language binding the entire database industry. That language is SQL, and how it works is must-have knowledge for anyone involved with relational databases and, surprisingly, for anyone involved with NoSQL databases. SQL is universally used in querying and reporting on large data sets to generate knowledge that drives business decisions. Good knowledge of SQL is crucial to anyone working with databases, because it is with SQL that you retrieve data, manipulate data, and generate business results. Every relational database supports SQL for its expressiveness in writing the queries underlying reports and business intelligence dashboards. Knowing how to write good queries is the foundation for all work done in SQL, and it is a foundation that Clare Churcher's book, Beginning SQL Queries, 2nd Edition, lays well.

What You Will Learn
- Write simple queries to extract data from a single table
- Combine data from many tables into one business result using set operations
- Translate natural language questions into database queries providing meaningful information to the business
- Avoid errors associated with duplicated and null values
- Summarize data with amazing ease using the newly added feature of window functions
- Tackle tricky queries with confidence that you are generating correct results
- Investigate and understand the effects of indexes on the efficiency of queries

Who This Book Is For
Beginning SQL Queries, 2nd Edition is aimed at intelligent laypeople who need to extract information from a database, and at developers and other IT professionals who are new to SQL. The book is especially useful for business intelligence analysts who must ask more complex questions of their database than their GUI-based reporting software supports.
Such people might be business owners wanting to target specific customers, scientists and students needing to extract subsets of their research data, or end users wanting to make the best use of databases for their clubs and societies.

IBM PowerHA SystemMirror V7.2 for IBM AIX Updates

This IBM® Redbooks® publication addresses topics that help answer customers' complex high availability requirements, helps maximize systems availability and resources, and provides documentation to transfer how-to skills to the worldwide sales and support teams. This publication helps strengthen the position of the IBM PowerHA® SystemMirror® solution with well-defined and documented deployment models within an IBM Power Systems™ virtualized environment, providing customers a planned foundation for business-resilient infrastructure solutions. This book describes documentation and other resources available to help the technical teams provide business resilience solutions and support with the IBM PowerHA SystemMirror Standard and Enterprise Editions on IBM Power Systems. This publication targets technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) responsible for providing high availability solutions and support with IBM PowerHA SystemMirror Standard and Enterprise Editions on IBM Power Systems.

Getting Started with KVM for IBM z Systems

This IBM® Redbooks® publication gives a broad explanation of the kernel-based virtual machine (KVM) for IBM z Systems™ (KVM for IBM z Systems) and how it uses the architecture of IBM z Systems platforms. It focuses on the planning of the environment and provides installation and configuration definitions that are necessary to build and manage KVM for IBM z Systems. This publication is useful to IT architects and system administrators who plan for and install KVM for IBM z Systems. The reader is expected to have a good understanding of IBM z Systems hardware, KVM for IBM z Systems, Linux on z Systems, and virtualization concepts.