talk-data.com

Topic: data-engineering (3377 tagged)

Activity Trend: 2020-Q1 through 2026-Q1 (peak of 1 per quarter)

Activities

Showing filtered results

Filtering by: O'Reilly Data Engineering Books
Fast Data Processing Systems with SMACK Stack

Fast Data Processing Systems with SMACK Stack introduces you to the SMACK stack: a combination of Spark, Mesos, Akka, Cassandra, and Kafka. You will learn to integrate these technologies to build scalable, efficient, real-time data processing platforms tailored for solving critical business challenges.

What this Book will help me do:
• Understand the concepts of fast data pipelines and design scalable architectures using the SMACK stack
• Gain expertise in functional programming with Scala and leverage its power in data processing tasks
• Build and optimize distributed databases using Apache Cassandra for extensive scaling
• Deploy and manage real-time data streams using Apache Kafka to handle massive messaging workloads
• Implement cost-effective cluster infrastructures with Apache Mesos for efficient resource utilization

Author(s): Estrada is an expert in distributed systems and big data technologies. With years of experience implementing SMACK-based solutions across industries, Estrada offers a practical viewpoint on designing scalable systems. This blend of theoretical knowledge and applied practice gives readers actionable guidance.

Who is it for? This book is for software developers, data engineers, and data scientists looking to deepen their understanding of real-time data processing systems. If you have a foundational knowledge of the technologies in the SMACK stack, or wish to learn how to combine these cutting-edge tools to solve complex problems, this is for you. Readers with an interest in building efficient big data solutions will find tremendous value here.
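To make the messaging layer of such a stack concrete, here is a minimal, hypothetical sketch (not taken from the book) of producing and consuming JSON events with Apache Kafka via the kafka-python client; the broker address and the "clickstream" topic are placeholder assumptions.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Produce a JSON-encoded event to a hypothetical "clickstream" topic
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("clickstream", {"user_id": 42, "action": "page_view"})
producer.flush()

# Consume it back; in a SMACK pipeline, a Spark or Akka consumer would sit here
consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    consumer_timeout_ms=5000,
)
for message in consumer:
    print(message.value)
    break
```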

MOS 2016 Study Guide for Microsoft Access

Advance your everyday proficiency with Access 2016. And earn the credential that proves it! Demonstrate your expertise with Microsoft Access! Designed to help you practice and prepare for Microsoft Office Specialist (MOS): Access 2016 certification, this official Study Guide delivers:
• In-depth preparation for each MOS objective
• Detailed procedures to help build the skills measured by the exam
• Hands-on tasks to practice what you've learned
• Practice files and sample solutions

Sharpen the skills measured by these objectives:
• Create and manage databases
• Build tables
• Create queries
• Create forms
• Create reports

IBM Business Process Manager Operations Guide

This IBM® Redbooks® publication provides operations teams with architectural design patterns and guidelines for the day-to-day challenges they face when managing their IBM Business Process Manager (BPM) infrastructure. Today, IBM BPM L2 and L3 Support and SWAT teams are constantly advising customers on how to deal with the following common challenges:
• Deployment options (on-premises, patterns, cloud, and so on)
• Administration
• DevOps
• Automation
• Performance monitoring and tuning
• Infrastructure management
• Scalability
• High Availability and Data Recovery
• Federation

This publication enables customers to become self-sufficient, promotes consistency, and accelerates IBM BPM Support engagements. It is targeted toward technical professionals (technical support staff, IT Architects, and IT Specialists) who are responsible for the day-to-day management of an IBM BPM infrastructure.

Mastering RethinkDB

Mastering RethinkDB offers a comprehensive guide to using the open-source, scalable database RethinkDB for real-time application development. Throughout this book, you'll gain practical knowledge of query management with ReQL, build dynamic web apps, and perform advanced database administration tasks.

What this Book will help me do:
• Gain expertise in managing and configuring RethinkDB clusters for optimal performance in real-time applications.
• Develop robust web applications using RethinkDB and integrate them seamlessly with Node.js.
• Leverage advanced querying features of ReQL, including geospatial and time-series queries.
• Enhance RethinkDB's capabilities with integration techniques for third-party tools such as Elasticsearch.
• Master deployment practices using platforms such as Docker and PaaS for production-grade applications.

Author(s): Shaikh, an expert in database technologies and real-time system design, brings years of hands-on experience working with open-source databases like RethinkDB. Known for writing practical technical books, Shaikh emphasizes real-world applications and clarity to help both novice and seasoned developers excel.

Who is it for? This book is ideal for developers who are building real-time applications and want to adopt RethinkDB for their solutions. Readers should have a basic understanding of RethinkDB and Node.js to get the most benefit. It's particularly suited to programmers looking to deepen their database administration skills and enhance their real-time data handling expertise.
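As a taste of ReQL, here is a minimal sketch using the official RethinkDB Python driver (the book itself focuses on Node.js); the database, table, and field names are hypothetical, and the changefeed at the end illustrates RethinkDB's real-time push model.

```python
from rethinkdb import RethinkDB

r = RethinkDB()
conn = r.connect(host="localhost", port=28015, db="test")

# Create a hypothetical "sensors" table if it is not already there
if "sensors" not in r.table_list().run(conn):
    r.table_create("sensors").run(conn)

# Insert a document and run a simple filter query
r.table("sensors").insert({"id": "s1", "temp_c": 21.5}).run(conn)
warm = r.table("sensors").filter(r.row["temp_c"] > 20).run(conn)
print(list(warm))

# Changefeeds push changes to the client as they happen; include_initial
# emits the current rows first so this loop has something to print
for change in r.table("sensors").changes(include_initial=True).run(conn):
    print(change)
    break
```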

Building Web Apps that Respect a User's Privacy and Security

A recent survey from the Pew Research Center found that few Americans are confident about the security or privacy of their data—particularly when it comes to the use of online tools. As a web developer, you represent the first line of defense in protecting your users' data and privacy. This report explores several techniques, tools, and best practices for developing and maintaining web apps that provide the privacy and security that every user needs—and deserves. Each individual now produces more data every day than people in earlier generations did throughout their lifetimes. Every time we click, tweet, or visit a site, we leave a digital trace. As web developers, we're responsible for shaping the experiences of users' online lives. By making ethical, user-centered choices, we can create a better Web for everyone.

• Learn how web tracking works, and how you can provide users with greater privacy controls
• Explore HTTPS and learn how to use this protocol to encrypt user connections
• Use web development frameworks that provide baked-in security support for protecting user data
• Learn methods for securing user authentication, and for sanitizing and validating user input
• Provide exports that allow users to reclaim their data if and when you close your service

This is the third report in the Ethical Web Development series from author Adam Scott. Previous reports in this series include Building Web Apps for Everyone and Building Web Apps That Work Everywhere.
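As one small, generic illustration of the authentication-hardening theme (an assumption on our part, not code from the report), the sketch below stores salted, iterated password hashes instead of plaintext using Python's standard library; the iteration count and salt size are illustrative only.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a salted PBKDF2-SHA256 digest; returns (salt, digest)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Compare against the stored digest in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```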

Data modeling with Cassandra

In this lesson, you'll learn how to design data models for Cassandra, including a data modeling process and notation. To apply this knowledge, we'll design the data model for a sample application. This will help show how all the parts fit together. Along the way, we'll use a tool to help us manage our CQL (Cassandra Query Language) scripts.

What you'll learn—and how you can apply it
You will learn common patterns and antipatterns for data modeling in Cassandra. This lesson will cover the concepts around data modeling and will compare a Cassandra data model with an equivalent relational database model. You'll learn about defining queries and about logical and physical database modeling. You'll learn how to optimize your model for performance, and finally you'll learn how to implement your model schema using CQL.

This lesson is for you because…
You are an application developer or architect who wants to learn how data is stored and processed in Cassandra. You are a database administrator who wants to learn about Cassandra.

Prerequisites
Helpful but not essential to have a basic understanding of relational vs. distributed databases. Helpful but not essential to understand Cassandra Query Language, CQL.

Materials or downloads needed in advance
None
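For a flavor of the query-first modeling style the lesson describes, here is a minimal sketch using the DataStax Python driver; the hotel keyspace, table, and query are hypothetical examples chosen for illustration, not the lesson's own schema.

```python
import datetime
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS hotel
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

# Query-first design: the query "find reservations by hotel and date"
# determines the partition key and clustering column
session.execute("""
    CREATE TABLE IF NOT EXISTS hotel.reservations_by_hotel_date (
        hotel_id       text,
        start_date     date,
        confirm_number text,
        guest_name     text,
        PRIMARY KEY ((hotel_id, start_date), confirm_number)
    )
""")

rows = session.execute(
    "SELECT confirm_number, guest_name FROM hotel.reservations_by_hotel_date "
    "WHERE hotel_id = %s AND start_date = %s",
    ("AZ123", datetime.date(2024, 7, 1)),
)
for row in rows:
    print(row.confirm_number, row.guest_name)
```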

Determining the right model for your experience

Inherent in adding a social layer to your experience is some form of relationship between people. There are different models, each of which creates different kinds of social interactions and outcomes within an experience.

What you'll learn—and how you can apply it
This lesson reviews the different types of relationship models and shows you how to assess your specific goals to determine which model might be the right fit for your product or needs, and what supporting tools are appropriate to create a rich relationship framework.

Prerequisites
You want to create or enhance a product with a social layer.

This lesson is taken from Designing Social Interfaces, 2nd Edition, by Erin Malone and Christian Crumlish.

Optimizing Cassandra performance

In this lesson, we look at how to tune Cassandra to improve performance. There are a variety of settings in the configuration file and on individual tables. Although the default settings are appropriate for many use cases, there might be circumstances in which you need to change them. We'll look at how and why to make these changes. We also see how to use the cassandra-stress test tool that ships with Cassandra to generate load against a cluster and quickly see how it behaves under stress. We can then tune Cassandra appropriately and feel confident that we're ready to deploy to a production environment.

What you'll learn—and how you can apply it
You'll learn how to monitor and analyze Cassandra performance. You'll learn about Cassandra features such as caching, memtables, commit logs, SSTables, hinted handoff, compaction, and threading to improve responsiveness, consistency, and speed, and to reduce data loss. We'll also look at timeout properties and JVM settings.

This lesson is for you because…
You are a developer, database administrator, or architect who wants to learn how to tune Cassandra.

Prerequisites
An understanding of Cassandra architecture and the Cassandra data model.

Materials or downloads needed in advance
A running Cassandra cluster, if you want to run cassandra-stress.

IBM DB2 12 for z/OS Technical Overview

IBM® DB2® 12 for z/OS® delivers key innovations that increase availability, reliability, scalability, and security for your business-critical information. In addition, DB2 12 for z/OS offers performance and functional improvements for both transactional and analytical workloads and makes installation and migration simpler and faster. DB2 12 for z/OS also allows you to develop applications for the cloud and mobile devices by providing self-provisioning, multitenancy, and self-managing capabilities in an agile development environment. DB2 12 for z/OS is also the first version of DB2 built for continuous delivery. This IBM Redbooks® publication introduces the enhancements made available with DB2 12 for z/OS. The contents help database administrators to understand the new functions and performance enhancements, to plan for ways to use the key new capabilities, and to justify the investment in installing or migrating to DB2 12.

Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale

The Complete Guide to Data Science with Hadoop—For Technical Professionals, Businesspeople, and Students. Demand is soaring for professionals who can solve real data science problems with Hadoop and Spark. Practical Data Science with Hadoop® and Spark is your complete guide to doing just that. Drawing on immense experience with Hadoop and big data, three leading experts bring together everything you need: high-level concepts, deep-dive techniques, real-world use cases, practical applications, and hands-on tutorials.

The authors introduce the essentials of data science and the modern Hadoop ecosystem, explaining how Hadoop and Spark have evolved into an effective platform for solving data science problems at scale. In addition to comprehensive application coverage, the authors also provide useful guidance on the important steps of data ingestion, data munging, and visualization. Once the groundwork is in place, the authors focus on specific applications, including machine learning, predictive modeling for sentiment analysis, clustering for document analysis, anomaly detection, and natural language processing (NLP). This guide provides a strong technical foundation for those who want to do practical data science, and also presents business-driven guidance on how to apply Hadoop and Spark to optimize the ROI of data science initiatives.

Learn:
• What data science is, how it has evolved, and how to plan a data science career
• How data volume, variety, and velocity shape data science use cases
• Hadoop and its ecosystem, including HDFS, MapReduce, YARN, and Spark
• Data importation with Hive and Spark
• Data quality, preprocessing, preparation, and modeling
• Visualization: surfacing insights from huge data sets
• Machine learning: classification, regression, clustering, and anomaly detection
• Algorithms and Hadoop tools for predictive modeling
• Cluster analysis and similarity functions
• Large-scale anomaly detection
• NLP: applying data science to human language
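As a rough illustration of the ingest-prepare-model loop described above, here is a minimal PySpark sketch; the file path, column names, and choice of k are placeholder assumptions rather than examples from the book.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("practical-ds-sketch").getOrCreate()

# Ingestion and light munging: read a CSV from HDFS and drop incomplete rows
df = spark.read.csv("hdfs:///data/events.csv", header=True, inferSchema=True)
df = df.dropna(subset=["duration", "bytes"])

# Feature preparation followed by a simple clustering model
assembler = VectorAssembler(inputCols=["duration", "bytes"], outputCol="features")
model = KMeans(k=5, featuresCol="features").fit(assembler.transform(df))
print(model.clusterCenters())

spark.stop()
```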

Beginning Elastic Stack

Learn how to install, configure, and implement the Elastic Stack (Elasticsearch, Logstash, and Kibana) – the invaluable tool for anyone deploying a centralized log management solution for servers and apps. You will see how to use and configure the Elastic Stack independently and alongside Puppet. Each chapter includes real-world examples and practical troubleshooting tips, enabling you to get up and running with the Elastic Stack in record time. Fully customizable and easy to use, the Elastic Stack enables you to be on top of your servers all the time and resolve problems for your clients as fast as possible. It is supported by Puppet and available with various plugins. Get started with Beginning Elastic Stack today and see why many consider the Elastic Stack the best option for server log management.

What You Will Learn:
• Install and configure Logstash
• Use Logstash with Elasticsearch and Kibana
• Use Logstash with Puppet and Foreman
• Centralize data processing

Who This Book Is For:
Anyone working on multiple servers who needs to search their logs using a web interface. It is ideal for server administrators who have just started their job and need to look after multiple servers efficiently.
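Once Logstash is shipping logs into Elasticsearch, you typically query them back out; the sketch below does so with the official Elasticsearch Python client (8.x-style API). The index pattern and field names follow common Logstash defaults but are assumptions here, not configuration from the book.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Find recent error lines across Logstash-managed indices
resp = es.search(
    index="logstash-*",
    query={
        "bool": {
            "must": [{"match": {"message": "error"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
        }
    },
    size=10,
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"].get("host"), hit["_source"].get("message"))
```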

Expert Hadoop® Administration

The Comprehensive, Up-to-Date Apache Hadoop Administration Handbook and Reference. "Sam Alapati has worked with production Hadoop clusters for six years. His unique depth of experience has enabled him to write the go-to resource for all administrators looking to spec, size, expand, and secure production Hadoop clusters of any size." –Paul Dix, Series Editor

In Expert Hadoop® Administration, leading Hadoop administrator Sam R. Alapati brings together authoritative knowledge for creating, configuring, securing, managing, and optimizing production Hadoop clusters in any environment. Drawing on his experience with large-scale Hadoop administration, Alapati integrates action-oriented advice with carefully researched explanations of both problems and solutions. He covers an unmatched range of topics and offers an unparalleled collection of realistic examples. Alapati demystifies complex Hadoop environments, helping you understand exactly what happens behind the scenes when you administer your cluster. You'll gain unprecedented insight as you walk through building clusters from scratch and configuring high availability, performance, security, encryption, and other key attributes. The high-value administration skills you learn here will be indispensable no matter what Hadoop distribution you use or what Hadoop applications you run.

• Understand Hadoop's architecture from an administrator's standpoint
• Create simple and fully distributed clusters
• Run MapReduce and Spark applications in a Hadoop cluster
• Manage and protect Hadoop data and high availability
• Work with HDFS commands, file permissions, and storage management
• Move data, and use YARN to allocate resources and schedule jobs
• Manage job workflows with Oozie and Hue
• Secure, monitor, log, and optimize Hadoop
• Benchmark and troubleshoot Hadoop
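Much of this day-to-day administration can be scripted; as a small, hypothetical example, the sketch below uses the third-party hdfs (WebHDFS) Python client to inspect permissions and replication for files in a directory. The NameNode URL, user, and paths are placeholders.

```python
from hdfs import InsecureClient

# Connect to the WebHDFS endpoint of a hypothetical NameNode
client = InsecureClient("http://namenode:9870", user="hdfs")

# List a directory and report each file's permissions, replication factor, and size
for name in client.list("/data"):
    status = client.status("/data/" + name)
    print(name, status["permission"], status["replication"], status["length"])
```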

Who Knew You Could Do That with RPG IV? Modern RPG for the Modern Programmer

Application development is a key part of IBM® i businesses. The IBM i operating system is a modern, robust platform for creating and developing applications. The RPG language has been around for a long time, but it is still being transformed into a modern business language. This IBM Redbooks® publication is focused on helping the IBM i development community understand the modern RPG language. The world of application development has been changing rapidly over the past years. The good news is that IBM i has been changing right along with it, and has made significant changes to the RPG language. This book is intended to help developers understand what modern RPG looks like and how to move from older versions of RPG to a newer, modern version. Additionally, it covers the basics of Integrated Language Environment® (ILE), interfacing with many other languages, and the best tools for doing development on IBM i. Using modern tools, methodologies, and languages is key to staying relevant in today's world. Being able to find the right talent for your company is key to your continued success. Using the guidelines and principles in this book can help set you up to find that talent today and into the future. This publication is the result of work that was done by IBM, industry experts, business partners, and some of the original authors of the first edition of this book. This information is important not only for developers, but also for business decision makers (CIOs, for example) to understand that the IBM i is not an 'old' system. IBM i has modern languages and tools. It is a matter of what you choose to do with the IBM i that defines its age.

MDX with Microsoft SQL Server 2016 Analysis Services Cookbook - Third Edition

Dive into the world of multidimensional data analysis with "MDX with Microsoft SQL Server 2016 Analysis Services Cookbook." This book provides over 70 practical recipes to help you understand and utilize MDX queries and calculations effectively.

What this Book will help me do:
• Master the fundamentals of MDX concepts and their applications.
• Learn to create time-aware calculations using the Time dimension.
• Develop skills to write efficient and flexible MDX queries.
• Gain insights into creating compact and efficient analytical reports.
• Understand advanced techniques for capturing MDX queries and metadata-driven calculations.

Author(s): Li and Tomislav Piasevoli are accomplished experts in multidimensional data analysis and business intelligence. Drawing from extensive experience, they offer readers a well-structured and comprehensive approach to mastering MDX. Their pedagogy emphasizes practical, real-world examples that promote clear understanding.

Who is it for? This volume is designed for database administrators, multidimensional cube developers, and report writers looking to strengthen their MDX skills. Readers with intermediate exposure to multidimensional databases will particularly benefit. It also serves as a valuable resource for business analysts and power users aiming to boost their data analysis capabilities.

SQL Server 2016 Reporting Services Cookbook

Dive into the world of Microsoft SQL Server 2016 Reporting Services with this cookbook-style guide that covers operational reporting and mobile dashboards. By following clear, task-oriented recipes, you'll quickly learn how to leverage SSRS 2016 for creating advanced, visually appealing, and functional reports to improve your reporting workflows and decision-making processes.

What this Book will help me do:
• Understand the architectural components and key features of SQL Server 2016 Reporting Services.
• Create advanced reporting solutions tailored to your organization's needs using step-by-step recipes.
• Utilize Power BI and mobile reporting capabilities for more interactive and accessible data insights.
• Master administration, security, and performance optimization of reporting environments.
• Integrate reporting solutions into .NET applications for custom business intelligence enhancements.

Author(s): Priyankara is an industry expert with years of experience in data warehousing and reporting solutions, bringing practical insights to the complex world of SQL Server Reporting Services. Co-author Robert Cain is a seasoned technology trainer and consultant specializing in SQL Server and Power BI. Together, they provide a comprehensive, hands-on guide rooted in real-world applications and best practices.

Who is it for? This book is designed for software professionals who are involved in reporting and business intelligence, such as software engineers, architects, and DW/BI experts. If you're responsible for designing, implementing, or managing reporting platforms and want to explore SSRS 2016's capabilities, this is the perfect guide for you.

High Performance SQL Server: The Go Faster Book

Design and configure SQL Server instances and databases in support of high-throughput applications that are mission-critical and provide consistent response times in the face of variations in user numbers and query volumes. Learn to configure SQL Server and design your databases to support a given instance and workload. You'll learn advanced configuration options, in-memory technologies, storage and disk configuration, and more, all toward enabling your desired application performance and throughput. Configuration doesn't stop with implementation. Workloads change over time, and other impediments can arise to thwart desired performance. High Performance SQL Server covers monitoring and troubleshooting to aid in detecting and fixing production performance problems and minimizing application outages. You'll learn a variety of tools, ranging from the traditional wait analysis methodology to the new query store, and you'll learn how improving performance is really an iterative process. High Performance SQL Server is based on SQL Server 2016, although most of its content can be applied to prior versions of the product. This book is an excellent complement to performance tuning books focusing on SQL queries, and provides the other half of what you need to know by focusing on configuring the instances on which mission-critical queries are executed.

High Performance SQL Server:
• Covers SQL Server instance configuration for optimal performance
• Helps in implementing SQL Server in-memory technologies
• Provides guidance toward monitoring and ongoing diagnostics

What You Will Learn:
• Understand SQL Server's database engine and how it processes queries
• Configure instances in support of high-throughput applications
• Provide consistent response times to varying user numbers and query volumes
• Design databases for high-throughput applications with a focus on performance
• Record performance baselines and monitor SQL Server instances against them
• Troubleshoot and fix performance problems

Who This Book Is For:
SQL Server database administrators, developers, and data architects. The book is also of use to system administrators who manage and are responsible for the physical servers on which SQL Server instances run.
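In the spirit of the wait-analysis methodology mentioned above, here is a minimal sketch (an assumption, not the book's code) that pulls the top waits from sys.dm_os_wait_stats using pyodbc; the connection string is a placeholder.

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=master;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Top waits accumulated since the last server restart (or stats clear)
cursor.execute("""
    SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC
""")
for wait_type, wait_time_ms, waiting_tasks_count in cursor.fetchall():
    print(wait_type, wait_time_ms, waiting_tasks_count)
```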

Implementing IBM FlashSystem 900

Today's global organizations depend on being able to unlock business insights from massive volumes of data. Now, with IBM® FlashSystem 900, powered by IBM FlashCore™ technology, they can make faster decisions based on real-time insights and unleash the power of the most demanding applications, including online transaction processing (OLTP) and analytics databases, virtual desktop infrastructures (VDIs), technical computing applications, and cloud environments. This IBM Redbooks® publication introduces clients to the IBM FlashSystem® 900. It provides in-depth knowledge of the product architecture, software and hardware, and implementation, along with hints and tips. Also illustrated are use cases that show real-world solutions for tiering, flash-only, and preferred-read configurations, as well as examples of the benefits gained by integrating FlashSystem storage into business environments. This book is intended for pre-sales and post-sales technical support professionals and storage administrators, and for anyone who wants to understand how to implement this new and exciting technology.

This book describes the following offerings of the IBM Spectrum™ Storage family:
• IBM Spectrum Storage™
• IBM Spectrum Control™
• IBM Spectrum Virtualize™
• IBM Spectrum Scale™
• IBM Spectrum Accelerate™

Apache HBase Primer

Learn the foundations and core concepts of the Apache HBase (NoSQL) open source database. This book covers the HBase data model, architecture, schema design, API, and administration. Apache HBase is the database for the Apache Hadoop framework. HBase is a column-family-based NoSQL database that provides a flexible schema model.

What You'll Learn:
• Work with the core concepts of HBase
• Discover the HBase data model, schema design, and architecture
• Use the HBase API and perform administration tasks

Who This Book Is For:
Apache HBase (NoSQL) database users, designers, developers, and admins.
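To illustrate the column-family data model, here is a minimal, hypothetical sketch using the happybase Python client, which talks to HBase through its Thrift gateway; the table name, row key layout, and column names are assumptions for illustration.

```python
import happybase

# Assumes an HBase Thrift server is reachable on localhost
connection = happybase.Connection("localhost")

# Create a table with a single column family "d" (skipped if it already exists)
if b"metrics" not in connection.tables():
    connection.create_table("metrics", {"d": dict()})

table = connection.table("metrics")

# The row key encodes the main access pattern (host + day);
# individual columns live inside the "d" column family
table.put(b"host1#2024-07-01", {b"d:cpu": b"0.73", b"d:mem": b"0.41"})

row = table.row(b"host1#2024-07-01")
print(row[b"d:cpu"])
```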

The Big Data Transformation

Business executives today are well aware of the power of data, especially for gaining actionable insight into products and services. But how do you jump into the big data analytics game without spending millions on data warehouse solutions you don't need? This 40-page report focuses on massively parallel processing (MPP) analytical databases that enable you to run queries and dashboards on a variety of business metrics at extreme speed and exabyte scale. Because they leverage the full computational power of a cluster, MPP analytical databases can analyze massive volumes of data—both structured and semi-structured—at unprecedented speeds. This report presents five real-world case studies from Etsy, Cerner Corporation, Criteo, and other global enterprises to focus on one big data analytics platform in particular, HPE Vertica.

You'll discover:
• How one prominent data storage company convinced both business and tech stakeholders to adopt an MPP analytical database
• Why performance marketing technology company Criteo used a Center of Excellence (CoE) model to ensure the success of its big data analytics endeavors
• How YPSM uses Vertica to speed up its Hadoop-based data processing environment
• Why Cerner adopted an analytical database to scale its highly successful health information technology platform
• How Etsy drives success with the company's big data initiative by avoiding common technical and organizational mistakes