talk-data.com

Topic: data-engineering (3395 tagged)

Pro Apache Phoenix: An SQL Driver for HBase, First Edition

Leverage Phoenix as an ANSI SQL engine built on top of the highly distributed and scalable NoSQL framework HBase. Learn the basics and best practices that are being adopted in Phoenix to enable high write and read throughput in a big data space. This book includes real-world cases such as Internet of Things devices that send continuous streams to Phoenix, and it explains how key features such as joins, indexes, transactions, and functions help you understand the simple, flexible, and powerful API that Phoenix provides. Examples are provided using real-time data and data-driven businesses that show you how to collect, analyze, and act in seconds. Pro Apache Phoenix covers the nuances of setting up a distributed HBase cluster with Phoenix libraries, running performance benchmarks, configuring parameters for production scenarios, and viewing the results. The book also shows how Phoenix plays well with other key frameworks in the Hadoop ecosystem such as Apache Spark, Pig, Flume, and Sqoop.
You will learn how to:
• Handle a petabyte data store by applying familiar SQL techniques
• Store, analyze, and manipulate data in a NoSQL Hadoop ecosystem with HBase
• Apply best practices while working with a scalable data store on Hadoop and HBase
• Integrate popular frameworks (Apache Spark, Pig, Flume) to simplify big data analysis
• Demonstrate real-time use cases and big data modeling techniques
Who This Book Is For: Data engineers, Big Data administrators, and architects

Weathering the Storm

Weathering the Storm explores the factors leading up to the recent global financial and economic crisis, how the crisis unfolded, and the response of European and national authorities. The book describes the rationale behind the measures undertaken to mitigate the consequences of the recession and to ensure that a similar situation does not happen again in the future. In the wake of the crisis, various major changes continue to significantly affect the life and social organization of Europeans. For instance, a new ESM with a size financially comparable to that of the IMF was created; similarly, the reforms in economic governance imply much more intrusive participation of European countries in each other's macroeconomic policies. Moreover, the organization, regulation, and supervision of the financial sector have been drastically revamped. The decisions taken by European and national authorities affect the daily lives of hundreds of millions of European citizens and countless more around the globe. An insightful read for anyone interested in understanding the topic and its effect on their lives, the book primarily addresses undergraduate students in their final year and graduate students in fields such as economics, finance, and political science. The main messages are explained through examples and charts.

Introducing and Implementing IBM FlashSystem V9000

The success or failure of businesses often depends on how well organizations use their data assets for competitive advantage. Deeper insights from data require better information technology. As organizations modernize their IT infrastructure to boost innovation rather than limit it, they need a data storage system that can keep pace with highly virtualized environments, cloud computing, mobile and social systems of engagement, and in-depth, real-time analytics. Making the correct decision on storage investment is critical. Organizations must have enough storage performance and agility to innovate as they need to implement cloud-based IT services, deploy virtual desktop infrastructure, enhance fraud detection, and use new analytics capabilities. At the same time, future storage investments must lower IT infrastructure costs while helping organizations to derive the greatest possible value from their data assets. The IBM® FlashSystem V9000 is the premier, fully integrated, Tier 1, all-flash offering from IBM. It has changed the economics of today’s data center by eliminating storage bottlenecks. Its software-defined storage features simplify data management, improve data security, and preserve your investments in storage. The IBM FlashSystem® V9000 SAS expansion enclosures provide new tiering options with read-intensive SSDs or nearline SAS HDDs. IBM FlashSystem V9000 includes IBM FlashCore® technology and advanced software-defined storage available in one solution in a compact 6U form factor. IBM FlashSystem V9000 improves business application availability. It delivers greater resource utilization so you can get the most from your storage resources, and achieve a simpler, more scalable, and cost-efficient IT Infrastructure. This IBM Redbooks® publication provides information about IBM FlashSystem V9000 Software V7.7 and introduces the recently announced V7.8. It describes the product architecture, software, hardware, and implementation, and provides hints and tips. 
It illustrates use cases and independent software vendor (ISV) scenarios that demonstrate real-world solutions, and also provides examples of the benefits gained by integrating the IBM FlashSystem storage into business environments. This book offers IBM FlashSystem V9000 scalability concepts and guidelines for planning, installing, and configuring, which can help environments scale up and out to add more flash capacity and expand virtualized systems. Port utilization methodologies are provided to help you maximize the full potential of IBM FlashSystem V9000 performance and low latency in your scalable environment. This book is intended for pre-sales and post-sales technical support professionals, storage administrators, and anyone who wants to understand how to implement this exciting technology.

Apache Spark for Data Science Cookbook

In "Apache Spark for Data Science Cookbook," you'll delve into solving real-world analytical challenges using the robust Apache Spark framework. This book features hands-on recipes that cover data analysis, distributed machine learning, and real-time data processing. You'll gain practical skills to process, visualize, and extract insights from large datasets efficiently.
What this Book will help me do
• Master using Apache Spark for processing and analyzing large-scale datasets effectively.
• Harness Spark's MLlib for implementing machine learning algorithms like classification and clustering.
• Utilize libraries such as NumPy, SciPy, and Pandas in conjunction with Spark for numerical computations.
• Apply techniques like Natural Language Processing and text mining using Spark-integrated tools.
• Perform end-to-end data science workflows, including data exploration, modeling, and visualization.
Author(s)
Nagamallikarjuna Inelu and Chitturi bring their extensive experience working with data science and distributed computing frameworks like Apache Spark. Nagamallikarjuna specializes in applying machine learning algorithms to big data problems, while Chitturi has contributed to various big data system implementations. Together, they focus on providing practitioners with practical and efficient solutions.
Who is it for?
This book is primarily intended for novice and intermediate data scientists and analysts who are curious about using Apache Spark to tackle data science problems. Readers are expected to have some familiarity with basic data science tasks. If you want to learn practical applications of Spark in data analysis and enhance your big data analytics skills, this resource is for you.

Fast Data Processing Systems with SMACK Stack

Fast Data Processing Systems with SMACK Stack introduces you to the SMACK stack, a combination of Spark, Mesos, Akka, Cassandra, and Kafka. You will learn to integrate these technologies to build scalable, efficient, real-time data processing platforms tailored for solving critical business challenges.
What this Book will help me do
• Understand the concepts of fast data pipelines and design scalable architectures using the SMACK stack
• Gain expertise in functional programming with Scala and leverage its power in data processing tasks
• Build and optimize distributed databases using Apache Cassandra for scaling extensively
• Deploy and manage real-time data streams using Apache Kafka to handle massive messaging workloads
• Implement cost-effective cluster infrastructures with Apache Mesos for efficient resource utilization
Author(s)
Estrada is an expert in distributed systems and big data technologies. With years of experience implementing SMACK-based solutions across industries, Estrada offers a practical viewpoint on designing scalable systems. Their blend of theoretical knowledge and applied practices ensures readers receive actionable guidance.
Who is it for?
This book is perfect for software developers, data engineers, or data scientists looking to deepen their understanding of real-time data processing systems. If you have a foundational knowledge of the technologies in the SMACK stack or wish to learn how to combine these tools to solve complex problems, this is for you. Readers with an interest in building efficient big data solutions will find tremendous value here.

MOS 2016 Study Guide for Microsoft Access

Advance your everyday proficiency with Access 2016. And earn the credential that proves it! Demonstrate your expertise with Microsoft Access! Designed to help you practice and prepare for Microsoft Office Specialist (MOS): Access 2016 certification, this official Study Guide delivers:
• In-depth preparation for each MOS objective
• Detailed procedures to help build the skills measured by the exam
• Hands-on tasks to practice what you’ve learned
• Practice files and sample solutions
Sharpen the skills measured by these objectives:
• Create and manage databases
• Build tables
• Create queries
• Create forms
• Create reports

IBM Business Process Manager Operations Guide

This IBM® Redbooks® publication provides operations teams with architectural design patterns and guidelines for the day-to-day challenges that they face when managing their IBM Business Process Manager (BPM) infrastructure. Today, IBM BPM L2 and L3 Support and SWAT teams are constantly advising customers how to deal with the following common challenges:
• Deployment options (on-premises, patterns, cloud, and so on)
• Administration
• DevOps
• Automation
• Performance monitoring and tuning
• Infrastructure management
• Scalability
• High Availability and Data Recovery
• Federation
This publication enables customers to become self-sufficient, promote consistency, and accelerate IBM BPM Support engagements. This IBM Redbooks publication is targeted toward technical professionals (technical support staff, IT Architects, and IT Specialists) who are responsible for meeting the day-to-day challenges of managing an IBM BPM infrastructure.

Mastering RethinkDB

Mastering RethinkDB offers a comprehensive guide to using the open-source, scalable database RethinkDB for real-time application development. Throughout this book, you'll gain practical knowledge on query management with ReQL, build dynamic web apps, and perform advanced database administration tasks.
What this Book will help me do
• Gain expertise in managing and configuring RethinkDB clusters for optimal performance in real-time applications.
• Develop robust web applications using RethinkDB and integrate them seamlessly with Node.js.
• Leverage advanced querying features of ReQL, including geospatial and time-series queries.
• Enhance RethinkDB's capabilities with integration techniques for third-party libraries like ElasticSearch.
• Master deployment practices using platforms such as Docker and PaaS for production-grade applications.
Author(s)
Shaikh, an expert in database technologies and real-time system design, brings years of hands-on experience working with open-source databases like RethinkDB. Known for writing practical technical books, Shaikh emphasizes real-world applications and clarity to help both novice and seasoned developers excel.
Who is it for?
This book is ideal for developers who are building real-time applications and want to adopt RethinkDB for their solutions. Readers should have a basic understanding of RethinkDB and Node.js to get the most benefit. It's particularly suited for programmers looking to deepen their database administration skills and enhance their real-time data handling expertise.

Building Web Apps that Respect a User's Privacy and Security

A recent survey from the Pew Research Center found that few Americans are confident about the security or privacy of their data, particularly when it comes to the use of online tools. As a web developer, you represent the first line of defense in protecting your users’ data and privacy. This report explores several techniques, tools, and best practices for developing and maintaining web apps that provide the privacy and security that every user needs and deserves. Each individual now produces more data every day than people in earlier generations did throughout their lifetimes. Every time we click, tweet, or visit a site, we leave a digital trace. As web developers, we’re responsible for shaping the experiences of users’ online lives. By making ethical, user-centered choices, we can create a better Web for everyone.
• Learn how web tracking works, and how you can provide users with greater privacy controls
• Explore HTTPS and learn how to use this protocol to encrypt user connections
• Use web development frameworks that provide baked-in security support for protecting user data
• Learn methods for securing user authentication, and for sanitizing and validating user input
• Provide exports that allow users to reclaim their data if and when you close your service
This is the third report in the Ethical Web Development series from author Adam Scott. Previous reports in this series include Building Web Apps for Everyone and Building Web Apps That Work Everywhere.

Data modeling with Cassandra

In this lesson, you’ll learn how to design data models for Cassandra, including a data modeling process and notation. To apply this knowledge, we’ll design the data model for a sample application. This will help show how all the parts fit together. Along the way, we’ll use a tool to help us manage our CQL (Cassandra Query Language) scripts.
What you’ll learn, and how you can apply it
You will learn common patterns and antipatterns for data modeling in Cassandra. This lesson will cover the concepts around data modeling and will compare a Cassandra data model with an equivalent relational database model. You’ll learn about defining queries and about logical and physical database modeling. You’ll learn how to optimize your model for performance, and finally you’ll learn how to implement your model schema using CQL.
This lesson is for you because…
• You are an application developer or architect who wants to learn how data is stored and processed in Cassandra.
• You are a database administrator who wants to learn about Cassandra.
Prerequisites
• Helpful but not essential: a basic understanding of relational vs. distributed databases.
• Helpful but not essential: an understanding of Cassandra Query Language (CQL).
Materials or downloads needed in advance
None

Determining the right model for your experience

Adding a social layer to your experience inherently involves some form of relationship between people. There are different models, each of which creates different kinds of social interactions and outcomes within an experience.
What you'll learn, and how you can apply it
This lesson reviews the different types of relationship models and shows you how to assess your specific goals to determine which model might be the right fit for your product or needs and what supporting tools are appropriate to create a rich relationship framework.
Prerequisites
You want to create or enhance a product with a social layer.
This lesson is taken from Designing Social Interfaces, 2nd Edition, by Erin Malone and Christian Crumlish.

Optimizing Cassandra performance

In this lesson, we look at how to tune Cassandra to improve performance. There are a variety of settings in the configuration file and on individual tables. Although the default settings are appropriate for many use cases, there might be circumstances in which you need to change them. We’ll look at how and why to make these changes. We also see how to use the cassandra-stress tool that ships with Cassandra to generate load against a cluster and quickly see how it behaves under stress. We can then tune Cassandra appropriately and feel confident that we’re ready to deploy to a production environment.
What you’ll learn, and how you can apply it
You’ll learn how to monitor and analyze Cassandra performance. You’ll learn about Cassandra features such as caching, memtables, commit logs, SSTables, hinted handoff, compaction, and threading to improve responsiveness, consistency, and speed, and to reduce data loss. We’ll also look at timeout properties and JVM settings.
This lesson is for you because…
You are a developer, database administrator, or architect who wants to learn how to tune Cassandra.
Prerequisites
• Understanding of Cassandra architecture and data model.
• If you want to run cassandra-stress: Cassandra installed, with a running Cassandra cluster.
Materials or downloads needed
A Cassandra cluster, if you want to run cassandra-stress.

IBM DB2 12 for z/OS Technical Overview

IBM® DB2® 12 for z/OS® delivers key innovations that increase availability, reliability, scalability, and security for your business-critical information. In addition, DB2 12 for z/OS offers performance and functional improvements for both transactional and analytical workloads and makes installation and migration simpler and faster. DB2 12 for z/OS also allows you to develop applications for the cloud and mobile devices by providing self-provisioning, multitenancy, and self-managing capabilities in an agile development environment. DB2 12 for z/OS is also the first version of DB2 built for continuous delivery. This IBM Redbooks® publication introduces the enhancements made available with DB2 12 for z/OS. The contents help database administrators to understand the new functions and performance enhancements, to plan for ways to use the key new capabilities, and to justify the investment in installing or migrating to DB2 12.

Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale

The Complete Guide to Data Science with Hadoop: For Technical Professionals, Businesspeople, and Students
Demand is soaring for professionals who can solve real data science problems with Hadoop and Spark. Practical Data Science with Hadoop® and Spark is your complete guide to doing just that. Drawing on immense experience with Hadoop and big data, three leading experts bring together everything you need: high-level concepts, deep-dive techniques, real-world use cases, practical applications, and hands-on tutorials. The authors introduce the essentials of data science and the modern Hadoop ecosystem, explaining how Hadoop and Spark have evolved into an effective platform for solving data science problems at scale. In addition to comprehensive application coverage, the authors also provide useful guidance on the important steps of data ingestion, data munging, and visualization. Once the groundwork is in place, the authors focus on specific applications, including machine learning, predictive modeling for sentiment analysis, clustering for document analysis, anomaly detection, and natural language processing (NLP). This guide provides a strong technical foundation for those who want to do practical data science, and also presents business-driven guidance on how to apply Hadoop and Spark to optimize ROI of data science initiatives.
Learn
• What data science is, how it has evolved, and how to plan a data science career
• How data volume, variety, and velocity shape data science use cases
• Hadoop and its ecosystem, including HDFS, MapReduce, YARN, and Spark
• Data importation with Hive and Spark
• Data quality, preprocessing, preparation, and modeling
• Visualization: surfacing insights from huge data sets
• Machine learning: classification, regression, clustering, and anomaly detection
• Algorithms and Hadoop tools for predictive modeling
• Cluster analysis and similarity functions
• Large-scale anomaly detection
• NLP: applying data science to human language

Beginning Elastic Stack

Learn how to install, configure, and implement the Elastic Stack (Elasticsearch, Logstash, and Kibana) – the invaluable tool for anyone deploying a centralized log management solution for servers and apps. You will see how to use and configure Elastic Stack independently and alongside Puppet. Each chapter includes real-world examples and practical troubleshooting tips, enabling you to get up and running with Elastic Stack in record time. Fully customizable and easy to use, Elastic Stack enables you to stay on top of your servers at all times and resolve problems for your clients as fast as possible. It is supported by Puppet and available with various plugins. Get started with Beginning Elastic Stack today and see why many consider Elastic Stack the best option for server log management.
What You Will Learn:
• Install and configure Logstash
• Use Logstash with Elasticsearch and Kibana
• Use Logstash with Puppet and Foreman
• Centralize data processing
Who This Book Is For: Anyone working on multiple servers who needs to search their logs using a web interface. It is ideal for server administrators who have just started their job and need to look after multiple servers efficiently.

Expert Hadoop® Administration

The Comprehensive, Up-to-Date Apache Hadoop Administration Handbook and Reference
“Sam Alapati has worked with production Hadoop clusters for six years. His unique depth of experience has enabled him to write the go-to resource for all administrators looking to spec, size, expand, and secure production Hadoop clusters of any size.” –Paul Dix, Series Editor
In Expert Hadoop® Administration, leading Hadoop administrator Sam R. Alapati brings together authoritative knowledge for creating, configuring, securing, managing, and optimizing production Hadoop clusters in any environment. Drawing on his experience with large-scale Hadoop administration, Alapati integrates action-oriented advice with carefully researched explanations of both problems and solutions. He covers an unmatched range of topics and offers an unparalleled collection of realistic examples. Alapati demystifies complex Hadoop environments, helping you understand exactly what happens behind the scenes when you administer your cluster. You’ll gain unprecedented insight as you walk through building clusters from scratch and configuring high availability, performance, security, encryption, and other key attributes. The high-value administration skills you learn here will be indispensable no matter what Hadoop distribution you use or what Hadoop applications you run.
• Understand Hadoop’s architecture from an administrator’s standpoint
• Create simple and fully distributed clusters
• Run MapReduce and Spark applications in a Hadoop cluster
• Manage and protect Hadoop data and high availability
• Work with HDFS commands, file permissions, and storage management
• Move data, and use YARN to allocate resources and schedule jobs
• Manage job workflows with Oozie and Hue
• Secure, monitor, log, and optimize Hadoop
• Benchmark and troubleshoot Hadoop

Who Knew You Could Do That with RPG IV? Modern RPG for the Modern Programmer

Application development is a key part of IBM® i businesses. The IBM i operating system is a modern, robust platform to create and develop applications. The RPG language has been around for a long time, but is still being transformed into a modern business language. This IBM Redbooks® publication is focused on helping the IBM i development community understand the modern RPG language. The world of application development has been rapidly changing over the past years. The good news is that IBM i has been changing right along with it, and has made significant changes to the RPG language. This book is intended to help developers understand what modern RPG looks like and how to move from older versions of RPG to a newer, modern version. Additionally, it covers the basics of Integrated Language Environment® (ILE), interfacing with many other languages, and the best tools for doing development on IBM i. Using modern tools, methodologies, and languages is key to staying relevant in today's world. Being able to find the right talent for your company is key to your continued success. Using the guidelines and principles in this book can help set you up to find that talent today and into the future. This publication is the result of work that was done by IBM, industry experts, business partners, and some of the original authors of the first edition of this book. This information is important not only for developers, but also for business decision makers (CIOs, for example), who need to understand that IBM i is not an 'old' system. IBM i has modern languages and tools. It is a matter of what you choose to do with IBM i that defines its age.

MDX with Microsoft SQL Server 2016 Analysis Services Cookbook - Third Edition

Dive into the world of multidimensional data analysis with "MDX with Microsoft SQL Server 2016 Analysis Services Cookbook." This book provides over 70 practical recipes to help you understand and utilize MDX queries and calculations effectively.
What this Book will help me do
• Master the fundamentals of MDX concepts and their applications.
• Learn to create time-aware calculations using the Time dimension.
• Develop skills to write efficient and flexible MDX queries.
• Gain insights into creating compact and efficient analytical reports.
• Understand advanced techniques for capturing MDX queries and metadata-driven calculations.
Author(s)
Li and Tomislav Piasevoli are accomplished experts in multidimensional data analysis and business intelligence. Drawing from extensive experience, they offer readers a well-structured and comprehensive approach to mastering MDX. Their teaching emphasizes practical, real-world examples that promote clear understanding.
Who is it for?
This volume is designed for database administrators, multidimensional cube developers, and report writers looking to strengthen their MDX skills. Readers with intermediate exposure to multidimensional databases will particularly benefit. It also serves as a valuable resource for business analysts and power users aiming to boost data analysis capabilities.