Docker

Hands-On Software Engineering with Python - Second Edition

2025-12-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Brian Allbee

Agile/Scrum CI/CD Cloud Computing Data Modelling GitHub Pydantic Python programming-languages software-development

Grow your software engineering discipline, incorporating and mastering design, development, testing, and deployment best practices examples in a realistic Python project structure. Key Features Understand what makes Software Engineering a discipline, distinct from basic programming Gain practical insight into updating, refactoring, and scaling an existing Python system Implement robust testing, CI/CD pipelines, and cloud-ready architecture decisions Book Description Software engineering is more than coding; it’s the strategic design and continuous improvement of systems that serve real-world needs. This newly updated second edition of Hands-On Software Engineering with Python expands on its foundational approach to help you grow into a senior or staff-level engineering role. Fully revised for today’s Python ecosystem, this edition includes updated tooling, practices, and architectural patterns. You’ll explore key changes across five minor Python versions, examine new features like dataclasses and type hinting, and evaluate modern tools such as Poetry, pytest, and GitHub Actions. A new chapter introduces high-performance computing in Python, and the entire development process is enhanced with cloud-readiness in mind. You’ll follow a complete redesign and refactor of a multi-tier system from the first edition, gaining insight into how software evolves—and what it takes to do that responsibly. From system modeling and SDLC phases to data persistence, testing, and CI/CD automation, each chapter builds your engineering mindset while updating your hands-on skills. By the end of this book, you'll have mastered modern Python software engineering practices and be equipped to revise and future-proof complex systems with confidence. 
What you will learn Distinguish software engineering from general programming Break down and apply each phase of the SDLC to Python systems Create system models to plan architecture before writing code Apply Agile, Scrum, and other modern development methodologies Use dataclasses, pydantic, and schemas for robust data modeling Set up CI/CD pipelines with GitHub Actions and cloud build tools Write and structure unit, integration, and end-to-end tests Evaluate and integrate tools like Poetry, pytest, and Docker Who this book is for This book is for Python developers with a basic grasp of software development who want to grow into senior or staff-level engineering roles. It’s ideal for professionals looking to deepen their understanding of software architecture, system modeling, testing strategies, and cloud-aware development. Familiarity with core Python programming is required, as the book focuses on applying engineering principles to maintain, extend, and modernize real-world systems.

PostgreSQL Skills Development on Cloud: A Practical Guide to Database Management with AWS and Azure

2024-12-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Venkateswara Vadlamani

AWS Azure Cloud Computing Linux Redshift S3 data data-engineering postgresql relational-databases

This book provides a comprehensive approach to managing PostgreSQL cluster databases on Amazon Web Services and Azure Web Services on the cloud, as well as in Docker and container environments on a Red Hat operating system. Furthermore, detailed references for managing PostgreSQL on both Windows and Mac are provided. This book condenses all the fundamental and essential concepts you need to manage a PostgreSQL cluster into a one-stop guide that is perfect for newcomers to Postgres database administration. Each chapter of the book provides historical context and documents version changes of the PostgreSQL cluster, elucidates practical "how-to" methods, and includes illustrations and key word definitions, practices for application, a summary of key learnings, and questions to reinforce understanding. The book also outlines a clear study objective with a weekly learning schedule and hundreds of practice exercises, along with questions and answers. With its comprehensive and practical approach, this book will help you gain the confidence to manage all aspects of a PostgreSQL cluster in critical production environments so you can better support your organization's database infrastructure on the cloud and in containers.
What You Will Learn Install and configure Postgres clusters on the cloud and in containers, monitor database logs, start and stop databases, troubleshoot, tune performance, backup and recover, and integrate with Amazon S3 and Azure Data Blob Manage Postgres databases on Amazon Web Services and Azure Web Services on the cloud, as well as in Docker and container environments on a Red Hat operating system Access sample references to scripting solutions and database management tools for working with Postgres, Redshift (based on Postgres 8.2), and Docker Create Amazon Machine Images (AMI) and Azure Images for managing a fleet of Postgres clusters on the cloud Reinforce knowledge with a weekly learning schedule and hundreds of practice exercises, along with questions and answers Progress from simple concepts, such as how to choose the correct instance type, to creating complex machine images Gain access to an Amazon AMI with a DBA admin tool, allowing you to learn Postgres, Redshift, and Docker in a cloud environment Refer to a comprehensive summary of documentations of Postgres, Amazon Web services, Azure Web services, and Red Hat Linux for managing all aspects of Postgres cluster management on the cloud Who This Book Is For Newcomers to PostgreSQL database administration and cross-platform support DBAs looking to master PostgreSQL on the cloud.

Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle

2024-12-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Venkata Gunnu, Balaji Dhamodharan, Ramcharan Kakarla, Sundar Krishnan

AI/ML API Data Science PySpark Spark apache-spark data data-engineering

This comprehensive guide, featuring hand-picked examples of daily use cases, will walk you through the end-to-end predictive model-building cycle using the latest techniques and industry tricks. In Chapters 1, 2, and 3, we will begin by setting up the environment and covering the basics of PySpark, focusing on data manipulation. Chapter 4 delves into the art of variable selection, demonstrating various techniques available in PySpark. In Chapters 5, 6, and 7, we explore machine learning algorithms, their implementations, and fine-tuning techniques. Chapters 8 and 9 will guide you through machine learning pipelines and various methods to operationalize and serve models using Docker/API. Chapter 10 will demonstrate how to unlock the power of predictive models to create a meaningful impact on your business. Chapter 11 introduces some of the most widely used and powerful modeling frameworks to unlock real value from data. In this new edition, you will learn predictive modeling frameworks that can quantify customer lifetime values and estimate the return on your predictive modeling investments. This edition also includes methods to measure engagement and identify actionable populations for effective churn treatments. Additionally, a dedicated chapter on experimentation design has been added, covering steps to efficiently design, conduct, test, and measure the results of your models. All code examples have been updated to reflect the latest stable version of Spark. You will: Gain an overview of end-to-end predictive model building Understand multiple variable selection techniques and their implementations Learn how to operationalize models Perform data science experiments and learn useful tips

Big Data on Kubernetes

2024-07-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Neylson Crepalde

Airflow BI Big Data Kafka Kubernetes Python Spark SQL YAML data data-engineering streaming-messaging

Big Data on Kubernetes is your comprehensive guide to leveraging Kubernetes for scalable and efficient big data solutions. You will learn key concepts of Kubernetes architecture and explore tools like Apache Spark, Airflow, and Kafka. Gain hands-on experience building complete data pipelines to tackle real-world data challenges. What this Book will help me do Understand Kubernetes architecture and learn to deploy and manage clusters. Build and orchestrate big data pipelines using Spark, Airflow, and Kafka. Develop scalable and resilient data solutions with Docker and Kubernetes. Integrate and optimize data tools for real-time ingestion and processing. Apply concepts to hands-on projects addressing actual big data scenarios. Author(s) Neylson Crepalde is an experienced data specialist with extensive knowledge of Kubernetes and big data solutions. With deep practical experience, Neylson brings real-world insights to his writing. His approach emphasizes actionable guidance and relatable problem-solving with a strong foundation in scalable architecture. Who is it for? This book is ideal for data engineers, BI analysts, data team leaders, and tech managers familiar with Python, SQL, and YAML. Targeted at professionals seeking to develop or expand their expertise in scalable big data solutions, it provides practical insights into Docker, Kubernetes, and prominent big data tools.

High Performance PostgreSQL for Rails

2024-06-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Andrew Atkinson

Linux SaaS SQL data data-engineering postgresql relational-databases

Build faster, more reliable Rails apps by taking the best advanced PostgreSQL and Active Record capabilities, and using them to solve your application scale and growth challenges. Gain the skills needed to comfortably work with multi-terabyte databases, and with complex Active Record, SQL, and specialized Indexes. Develop your skills with PostgreSQL on your laptop, then take them into production, while keeping everything in sync. Make slow queries fast, perform any schema or data migration without errors, use scaling techniques like read/write splitting, partitioning, and sharding, to meet demanding workload requirements from Internet scale consumer apps to enterprise SaaS. Deepen your firsthand knowledge of high-scale PostgreSQL databases and Ruby on Rails applications with dozens of practical and hands-on exercises. Unlock the mysteries surrounding complex Active Record. Make any schema or data migration change confidently, without downtime. Grow your experience with modern and exclusive PostgreSQL features like SQL Merge, Returning, and Exclusion constraints. Put advanced capabilities like Full Text Search and Publish Subscribe mechanisms built into PostgreSQL to work in your Rails apps. Improve the quality of the data in your database, using the advanced and extensible system of types and constraints to reduce and eliminate application bugs. Tackle complex topics like how to improve query performance using specialized indexes. Discover how to effectively use built-in database functions and write your own, administer replication, and make the most of partitioning and foreign data wrappers. Use more than 40 well-supported open source tools to extend and enhance PostgreSQL and Ruby on Rails. Gain invaluable insights into database administration by conducting advanced optimizations - including high-impact database maintenance - all while solving real-world operational challenges. 
Take your new skills into production today and then take your PostgreSQL and Rails applications to a whole new level of reliability and performance. What You Need: a computer running macOS, Linux, or Windows and WSL2; PostgreSQL version 16, installed by package manager, compiled, or running with Docker; and an Internet connection.

The Complete Developer

2024-03-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Martin Krause

API GitHub JavaScript MongoDB NoSQL React TypeScript data data-engineering nosql-databases

Whether you’ve been in the developer kitchen for decades or are just taking the plunge to do it yourself, The Complete Developer will show you how to build and implement every component of a modern stack—from scratch. You’ll go from a React-driven frontend to a fully fleshed-out backend with Mongoose, MongoDB, and a complete set of REST and GraphQL APIs, and back again through the whole Next.js stack. The book’s easy-to-follow, step-by-step recipes will teach you how to build a web server with Express.js, create custom API routes, deploy applications via self-contained microservices, and add a reactive, component-based UI. You’ll leverage command line tools and full-stack frameworks to build an application whose no-effort user management rides on GitHub logins. You’ll also learn how to: Work with modern JavaScript syntax, TypeScript, and the Next.js framework Simplify UI development with the React library Extend your application with REST and GraphQL APIs Manage your data with the MongoDB NoSQL database Use OAuth to simplify user management, authentication, and authorization Automate testing with Jest, test-driven development, stubs, mocks, and fakes Whether you’re an experienced software engineer or new to DIY web development, The Complete Developer will teach you to succeed with the modern full stack. After all, control matters. Covers: Docker, Express.js, JavaScript, Jest, MongoDB, Mongoose, Next.js, Node.js, OAuth, React, REST and GraphQL APIs, and TypeScript

Red Hat OpenShift Container Platform for IBM zCX

2022-10-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ravi Kumar, Andy Armstrong, Vic Cross, Redelf Janssen, Pablo Paniagua, Lydia Parziale, Maike Havemann

IBM Linux data data-engineering

Application modernization is essential for continuous improvements to your business value. Modernizing your applications includes improvements to your software architecture, application infrastructure, development techniques, and business strategies, all of which allow you to gain increased business value from existing application code. IBM® z/OS® Container Extensions (IBM zCX) is a part of the IBM z/OS operating system. It makes it possible to run Linux on IBM Z® applications that are packaged as Docker container images on z/OS. Application developers can develop, and data centers can operate, popular open source packages, Linux applications, IBM software, and third-party software together with z/OS applications and data. This IBM Redbooks® publication presents the capabilities of IBM zCX along with several use cases that demonstrate Red Hat OpenShift Container Platform for IBM zCX and the application modernization benefits your business can realize.

Logging in Action

2022-04-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Phil Wilkins

Cloud Computing IoT JSON Kubernetes MongoDB data data-engineering elastic-stack-elk-stack elastic stack (elk stack) elasticsearch search

Make log processing a real asset to your organization with powerful and free open source tools. In Logging in Action you will learn how to: Deploy Fluentd and Fluent Bit into traditional on-premises, IoT, hybrid, cloud, and multi-cloud environments, both small and hyperscaled Configure Fluentd and Fluent Bit to solve common log management problems Use Fluentd within Kubernetes and Docker services Connect a custom log source or destination with Fluentd’s extensible plugin framework Logging best practices and common pitfalls Logging in Action is a guide to optimize and organize logging using the CNCF Fluentd and Fluent Bit projects. You’ll use the powerful log management tool Fluentd to solve common log management problems, and learn how proper log management can improve performance and make management of software and infrastructure solutions easier. Through useful examples like sending log-driven events to Slack, you’ll get hands-on experience applying structure to your unstructured data. About the Technology Don’t fly blind! An effective logging system can help you see and correct problems before they cripple your software. With the Fluentd log management tool, it’s a snap to monitor the behavior and health of your software and infrastructure in real time. Designed to collect and process log data from multiple sources using the industry-standard JSON format, Fluentd delivers a truly unified logging layer across all your systems. About the Book Logging in Action teaches you to record and analyze application and infrastructure data using Fluentd. Using clear, relevant examples, it shows you exactly how to transform raw system data into a unified stream of actionable information. You’ll discover how logging configuration impacts the way your system functions and set up Fluentd to handle data from legacy IT environments, local data centers, and massive Kubernetes-driven distributed systems.
You’ll even learn how to implement complex log parsing with RegEx and output events to MongoDB and Slack. What's Inside Capture log events from a wide range of systems and software, including Kubernetes and Docker Connect to custom log sources and destinations Employ Fluentd’s extensible plugin framework Create a custom plugin for niche problems About the Reader For developers, architects, and operations professionals familiar with the basics of monitoring and logging. About the Author Phil Wilkins has spent over 30 years in the software industry. He has worked for small startups through to international brands. Quotes I highly recommend using Logging in Action as a getting-started guide, a refresher, or as a way to optimize your logging journey. - From the Foreword by Anurag Gupta, Fluent maintainer and Cofounder, Calyptia Covers everything you need if you want to implement a logging system using open source technology such as Fluentd and Kubernetes. - Alex Saez, Naranja X A great exploration of the features and capabilities of Fluentd, along with very useful hands-on exercises. - George Thomas, Manhattan Associates A practical holistic guide to integrating logging into your enterprise architecture. - Satej Sahu, Honeywell

Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications

2022-03-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Scott Haines (Databricks)

AI/ML Airflow Data Contracts Data Engineering Kafka Kubernetes MySQL Redis S3 Spark SQL Data Streaming +3 more

Leverage Apache Spark within a modern data engineering ecosystem. This hands-on guide will teach you how to write fully functional applications, follow industry best practices, and learn the rationale behind these decisions. With Apache Spark as the foundation, you will follow a step-by-step journey beginning with the basics of data ingestion, processing, and transformation, and ending up with an entire local data platform running Apache Spark, Apache Zeppelin, Apache Kafka, Redis, MySQL, Minio (S3), and Apache Airflow. Apache Spark applications solve a wide range of data problems from traditional data loading and processing to rich SQL-based analysis as well as complex machine learning workloads and even near real-time processing of streaming data. Spark fits well as a central foundation for any data engineering workload. This book will teach you to write interactive Spark applications using Apache Zeppelin notebooks, write and compile reusable applications and modules, and fully test both batch and streaming. You will also learn to containerize your applications using Docker and run and deploy your Spark applications using a variety of tools such as Apache Airflow, Docker and Kubernetes. Reading this book will empower you to take advantage of Apache Spark to optimize your data pipelines and teach you to craft modular and testable Spark applications. You will create and deploy mission-critical streaming Spark applications in a low-stress environment that paves the way for your own path to production.
What You Will Learn Simplify data transformation with Spark Pipelines and Spark SQL Bridge data engineering with machine learning Architect modular data pipeline applications Build reusable application components and libraries Containerize your Spark applications for consistency and reliability Use Docker and Kubernetes to deploy your Spark applications Speed up application experimentation using Apache Zeppelin and Docker Understand serializable structured data and data contracts Harness effective strategies for optimizing data in your data lakes Build end-to-end Spark structured streaming applications using Redis and Apache Kafka Embrace testing for your batch and streaming applications Deploy and monitor your Spark applications Who This Book Is For Professional software engineers who want to take their current skills and apply them to new and exciting opportunities within the data ecosystem, practicing data engineers who are looking for a guiding light while traversing the many challenges of moving from batch to streaming modes, data architects who wish to provide clear and concise direction for how best to harness and use Apache Spark within their organization, and those interested in the ins and outs of becoming a modern data engineer in today's fast-paced and data-hungry world.

Cassandra: The Definitive Guide, (Revised) Third Edition, 3rd Edition

2022-01-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eben Hewitt, Jeff Carpenter

Cassandra Cloud Computing Data Modelling ELK Kafka Kubernetes Spark data data-engineering nosql-databases

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This revised third edition--updated for Cassandra 4.0 and new developments in the Cassandra ecosystem, including deployments in Kubernetes with K8ssandra--provides technical details and practical examples to help you put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra's nonrelational design, with special attention to data modeling. Developers, DBAs, and application architects looking to solve a database scaling issue or future-proof an application will learn how to harness Cassandra's speed and flexibility. Understand Cassandra's distributed and decentralized structure Use the Cassandra Query Language (CQL) and cqlsh (the CQL shell) Create a working data model and compare it with an equivalent relational model Design and develop applications using client drivers Explore cluster topology and learn how nodes exchange data Maintain a high level of performance in your cluster Deploy Cassandra onsite, in the cloud, or with Docker and Kubernetes Integrate Cassandra with Spark, Kafka, Elasticsearch, Solr, and Lucene

Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle

2020-12-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ramcharan Kakarla, Sridhar Alla, Sundar Krishnan

AI/ML API Data Science PySpark apache-spark data data-engineering

Discover the capabilities of PySpark and its application in the realm of data science. This comprehensive guide with hand-picked examples of daily use cases will walk you through the end-to-end predictive model-building cycle with the latest techniques and tricks of the trade. Applied Data Science Using PySpark is divided into six sections which walk you through the book. In section 1, you start with the basics of PySpark focusing on data manipulation. We make you comfortable with the language and then build upon it to introduce you to the mathematical functions available off the shelf. In section 2, you will dive into the art of variable selection where we demonstrate various selection techniques available in PySpark. In section 3, we take you on a journey through machine learning algorithms, implementations, and fine-tuning techniques. We will also talk about different validation metrics and how to use them for picking the best models. Sections 4 and 5 go through machine learning pipelines and various methods available to operationalize the model and serve it through Docker/an API. In the final section, you will cover reusable objects for easy experimentation and learn some tricks that can help you optimize your programs and machine learning pipelines. By the end of this book, you will have seen the flexibility and advantages of PySpark in data science applications. This book is recommended to those who want to unleash the power of parallel computing by simultaneously working with big datasets. What You Will Learn Build an end-to-end predictive model Implement multiple variable selection techniques Operationalize models Master multiple algorithms and implementations Who This Book is For Data scientists and machine learning and deep learning engineers who want to learn and use PySpark for real-time analysis of streaming data.

MongoDB Topology Design: Scalability, Security, and Compliance on a Global Scale

2020-09-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Nicholas Cottrell

Cloud Computing DevOps GDPR/CCPA Kubernetes MongoDB Cyber Security data data-engineering nosql-databases

Create a world-class MongoDB cluster that is scalable, reliable, and secure. Comply with mission-critical regulatory regimes such as the European Union’s General Data Protection Regulation (GDPR). Whether you are thinking of migrating to MongoDB or need to meet legal requirements for an existing self-managed cluster, this book has you covered. It begins with the basics of replication and sharding, and quickly scales up to cover everything you need to know to control your data and keep it safe from unexpected data loss or downtime. This book covers best practices for stable MongoDB deployments. For example, a well-designed MongoDB cluster should have no single point of failure. The book covers common use cases when only one or two data centers are available. It goes into detail about creating geopolitical sharding configurations to cover the most stringent data protection regulation compliance. The book also covers different tools and approaches for automating and monitoring a cluster with Kubernetes, Docker, and popular cloud provider containers. What You Will Learn Get started with the basics of MongoDB clusters Protect and monitor a MongoDB deployment Deepen your expertise around replication and sharding Keep effective backups and plan ahead for disaster recovery Recognize and avoid problems that can occur in distributed databases Build optimal MongoDB deployments within hardware and data center limitations Who This Book Is For Solutions architects, DevOps architects and engineers, automation and cloud engineers, and database administrators who are new to MongoDB and distributed databases or who need to scale up simple deployments. This book is a complete guide to planning a deployment for optimal resilience, performance, and scaling, and covers all the details required to meet the new set of data protection regulations such as the GDPR. 
This book is particularly relevant for large global organizations such as financial and medical institutions, as well as government departments that need to control data in the whole stack and are prohibited from using managed cloud services.

Mastering SQL Server 2017

2019-08-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christian Cote, Milos Radivojevic, William Durkin, Dejan Sarka, Matija Lah

AI/ML Azure BI DWH ETL/ELT JSON Linux Microsoft Python SQL SQL Server SSIS +4 more

Leverage the power of SQL Server 2017 Integration Services to build data integration solutions with ease Key Features Work with temporal tables to access information stored in a table at any time Get familiar with the latest features in SQL Server 2017 Integration Services Program and extend your packages to enhance their functionality Book Description Microsoft SQL Server 2017 uses the power of R and Python for machine learning and containerization-based deployment on Windows and Linux. By learning how to use the features of SQL Server 2017 effectively, you can build scalable apps and easily perform data integration and transformation. You'll start by brushing up on the features of SQL Server 2017. This Learning Path will then demonstrate how you can use Query Store, columnstore indexes, and In-Memory OLTP in your apps. You'll also learn to integrate Python code in SQL Server and graph database implementations for development and testing. Next, you'll get up to speed with designing and building SQL Server Integration Services (SSIS) data warehouse packages using SQL Server Data Tools. Toward the concluding chapters, you'll discover how to develop SSIS packages designed to maintain a data warehouse using the data flow and other control flow tasks. By the end of this Learning Path, you'll be equipped with the skills you need to design efficient, high-performance database applications with confidence. This Learning Path includes content from the following Packt books: SQL Server 2017 Developer's Guide by Milos Radivojevic, Dejan Sarka, et al. SQL Server 2017 Integration Services Cookbook by Christian Cote, Dejan Sarka, et al.
What you will learn Use columnstore indexes to make storage and performance improvements Extend database design solutions using temporal tables Exchange JSON data between applications and SQL Server Migrate historical data to Microsoft Azure by using Stretch Database Design the architecture of a modern Extract, Transform, and Load (ETL) solution Implement ETL solutions using Integration Services for both on-premise and Azure data Who this book is for This Learning Path is for database developers and solution architects looking to develop ETL solutions with SSIS, and explore the new features in SSIS 2017. Advanced analysis practitioners, business intelligence developers, and database consultants dealing with performance tuning will also find this book useful. Basic understanding of database concepts and T-SQL is required to get the best out of this Learning Path.

Deploying a Database Instance in an IBM Cloud Private Cluster on IBM Z

2019-07-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christian May

Cloud Computing IBM Kubernetes MariaDB Virtual Machine data data-engineering

This IBM® Redpaper™ publication shows you how to deploy a database instance within a container using an IBM Cloud™ Private cluster on IBM Z®. A preinstalled IBM Spectrum™ Scale 5.0.3 cluster file system provides back-end storage for the persistent volumes bound to the database. A container is a standard unit of software that packages code and all its dependencies, so the application runs quickly and reliably from one computing environment to another. By default, containers are ephemeral. However, stateful applications, such as databases, require some type of persistent storage that can survive service restarts or container crashes. IBM provides several products helping organizations build an environment on an IBM Z infrastructure to develop and manage containerized applications, including dynamic provisioning of persistent volumes. As an example of a stateful application, this paper describes how to deploy the relational database MariaDB using a Helm chart. The IBM Spectrum Scale V5.0.3 cluster file system provides back-end storage for the persistent volumes. This document provides step-by-step guidance regarding how to install and configure the following components: IBM Cloud Private 3.1.2 (including Kubernetes), Docker 18.03.1-ce, and IBM Storage Enabler for Containers 2.0.0 and 2.1.0. This Redpaper demonstrates how we set up the example for a stateful application in our lab. The paper gives you insights about planning for your implementation. IBM Z server hardware, the IBM Z hypervisor z/VM®, and the IBM Spectrum Scale cluster file system are prerequisites to set up the example environment. The Redpaper is written with the assumption that you have familiarity with and basic knowledge of the software products used in setting up the environment. The intended audience includes the following roles: storage administrators, IT/Cloud administrators, technologists, and IT specialists.

Pro SQL Server on Linux: Including Container-Based Deployment with Docker and Kubernetes

2018-10-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bob Ward (Azure Data)

Kubernetes Linux Microsoft Oracle Cyber Security SQL SQL Server data data-engineering microsoft-sql-server postgresql relational-databases

Get SQL Server up and running on the Linux operating system and containers. No database professional managing or developing SQL Server on Linux will want to be without this deep and authoritative guide by one of the most respected experts on SQL Server in the industry. Get an inside look at how SQL Server for Linux works through the eyes of an engineer on the team that made it possible. Microsoft SQL Server is one of the leading database platforms in the industry, and SQL Server 2017 offers developers and administrators the ability to run a database management system on Linux, with proven support for enterprise-level features and without onerous licensing terms. Organizations invested in Microsoft and open source technologies are now able to run a unified database platform across all their operating system investments. Organizations are further able to take full advantage of containerization through popular platforms such as Docker and Kubernetes. Pro SQL Server on Linux walks you through installing and configuring SQL Server on the Linux platform. The author is one of the principal architects of SQL Server for Linux, and brings a corresponding depth of knowledge that no database professional or developer on Linux will want to be without. Throughout this book are internals of how SQL Server on Linux works, including an in-depth look at the innovative architecture. The book covers day-to-day management and troubleshooting, including diagnostics and monitoring, the use of containers to manage deployments, and the use of self-tuning and the in-memory capabilities. Also covered are performance capabilities, high availability, and disaster recovery along with security and encryption. The book covers the product-specific knowledge to bring SQL Server and its powerful features to life on the Linux platform, including coverage of containerization through Docker and Kubernetes.
What You'll Learn: Learn about the history and internals of the unique SQL Server on Linux architecture. Install and configure Microsoft’s flagship database product on the Linux platform. Manage your deployments using container technology through Docker and Kubernetes. Know the basics of building databases, the T-SQL language, and developing applications against SQL Server on Linux. Use tools and features to diagnose, manage, and monitor SQL Server on Linux. Scale your application by learning the performance capabilities of SQL Server. Deliver high availability and disaster recovery to ensure business continuity. Secure your database from attack, and protect sensitive data through encryption. Take advantage of powerful features such as Failover Clusters, Availability Groups, In-Memory Support, and SQL Server’s Self-Tuning Engine. Learn how to migrate your database from older releases of SQL Server and other database platforms such as Oracle and PostgreSQL. Build and maintain schemas, and perform management tasks from both GUI and command line. Who This Book Is For: Developers and IT professionals who are new to SQL Server and wish to configure it on the Linux operating system. This book is also useful to those familiar with SQL Server on Windows who want to learn the unique aspects of managing SQL Server on the Linux platform and Docker containers. Readers should have a grasp of relational database concepts and be comfortable with the SQL language.

Database Benchmarking and Stress Testing: An Evidence-Based Approach to Decisions on Architecture and Technology

2018-10-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bert Scalzo

Cloud Computing data data-engineering database-architecture

Provide evidence-based answers that can be measured and relied upon by your business. Database administrators will be able to make sound architectural decisions in a fast-changing landscape of virtualized servers and container-based solutions based on the empirical method presented in this book for answering “what if” questions about database performance. Today’s database administrators face numerous questions such as: What if we consolidate databases using multitenant features? What if we virtualize database servers as Docker containers? What if we deploy the latest in NVMe flash disks to speed up IO access? Do features such as compression, partitioning, and in-memory OLTP earn back their price? What if we move our databases to the cloud? As an administrator, do you know the answers or even how to test the assumptions? Database Benchmarking and Stress Testing introduces you to database benchmarking using industry-standard test suites such as the TPC series of benchmarks, which are the same benchmarks that vendors rely upon. You’ll learn to run these industry-standard benchmarks and collect results to use in answering questions about the performance impact of architectural changes, technology changes, and even down to the brand of database software. You’ll learn to measure performance and predict the specific impact of changes to your environment. You’ll know the limitations of the benchmarks and the crucial difference between benchmarking and workload capture/replay. This book teaches you how to create empirical evidence in support of business and technology decisions. It’s about not guessing when you should be measuring. Empirical testing is scientific testing that delivers measurable results. Begin with a hypothesis about the impact of a possible architecture or technology change. Then run the appropriate benchmarks to gather data and predict whether the change you’re exploring will be beneficial, and by what order of magnitude. Stop guessing. Start measuring. Let Database Benchmarking and Stress Testing show the way.
What You'll Learn: Understand the industry-standard database benchmarks, and when each is best used. Prepare for a database benchmarking effort so reliable results can be achieved. Perform database benchmarking for consolidation, virtualization, and cloud projects. Recognize and avoid common mistakes in benchmarking database performance. Measure and interpret results in a rational, concise manner for reliable comparisons. Choose and provide advice on benchmarking tools based on their pros and cons. Who This Book Is For: Database administrators and professionals responsible for advising on architectural decisions such as whether to use cloud-based services, whether to consolidate and containerize, and who must make recommendations on storage or any other technology that impacts database performance.

Microsoft SQL Server 2017 on Linux

2018-06-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Benjamin Nevarez

Linux Microsoft Cyber Security SQL SQL Server data data-engineering microsoft-sql-server relational-databases

Essential Microsoft® SQL Server® 2017 installation, configuration, and management techniques for Linux. Foreword by Kalen Delaney, Microsoft SQL Server MVP. This comprehensive guide shows, step-by-step, how to set up, configure, and administer SQL Server 2017 on Linux for high performance and high availability. Written by a SQL Server expert and respected author, Microsoft SQL Server 2017 on Linux teaches valuable Linux skills to Windows-based SQL Server professionals. You will get clear coverage of both Linux and SQL Server and complete explanations of the latest features, tools, and techniques. The book offers clear instruction on adaptive query processing, automatic tuning, disaster recovery, security, and much more.
• Understand how SQL Server 2017 on Linux works
• Install and configure SQL Server on Linux
• Run SQL Server on Docker containers
• Learn Linux Administration
• Troubleshoot and tune query performance in SQL Server
• Learn what is new in SQL Server 2017
• Work with adaptive query processing and automatic tuning techniques
• Implement high availability and disaster recovery for SQL Server on Linux
• Learn the security features available in SQL Server

Camel in Action, Second Edition

2018-02-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jonathan Anstey, Claus Ibsen

Cloud Computing Java Kubernetes Cyber Security XML camel data data-engineering streaming-messaging

Camel in Action, Second Edition is the most complete Camel book on the market. Written by core developers of Camel and the authors of the highly acclaimed first edition, this book distills their experience and practical insights so that you can tackle integration tasks like a pro. About the Technology: Apache Camel is a Java framework that implements enterprise integration patterns (EIPs) and comes with over 200 adapters to third-party systems. A concise DSL lets you build integration logic into your app with just a few lines of Java or XML. By using Camel, you benefit from the testing and experience of a large and vibrant open source community. About the Book: Camel in Action, Second Edition is the definitive guide to the Camel framework. It starts with core concepts like sending, receiving, routing, and transforming data. It then goes in depth on many topics such as how to develop, debug, test, deal with errors, secure, scale, cluster, deploy, and monitor your Camel applications. The book also discusses how to run Camel with microservices, reactive systems, containers, and in the cloud. What's Inside: Coverage of all relevant EIPs. Camel microservices with Spring Boot. Camel on Docker and Kubernetes. Error handling, testing, security, clustering, monitoring, and deployment. Hundreds of examples in Java and XML. About the Reader: Readers should be familiar with Java. This book is accessible to beginners and invaluable to experts. About the Authors: Claus Ibsen is a senior principal engineer working for Red Hat specializing in cloud and integration. He has worked on Apache Camel for the last nine years where he heads the project. Claus lives in Denmark. Jonathan Anstey is an engineering manager at Red Hat and a core Camel contributor. He lives in Newfoundland, Canada.
Quotes:
"I highly recommend this book to anyone with even a passing interest in Apache Camel. Do take Camel for a ride...and don't get the hump!" - From the Foreword by James Strachan, Creator of Apache Camel
"Claus and Jon are great writers, relying on figures and diagrams where needed and presenting lots of code snippets and worked examples." - From the Foreword by Dr. Mark Little, Technical Director of JBoss
"The second edition of this all-time classic is an indispensable companion for your Apache Camel rides." - Gregor Zurowski, Apache Camel Committer
"The absolute best way to learn and use Camel - top to bottom, front to back, and all the way through. Camel is a fantastic tool - every Java coder should have a copy of this book." - Rick Wagner, Red Hat
"An excellent book and the definite reference for experienced engineers." - Yan Guo, EventBrite

Expert Apache Cassandra Administration

2017-12-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sam R. Alapati

Amazon EC2 Big Data Cassandra Data Modelling ELK Spark data data-engineering nosql-databases

Follow this handbook to build, configure, tune, and secure Apache Cassandra databases. Start with the installation of Cassandra and move on to the creation of a single instance, and then a cluster of Cassandra databases. Cassandra is increasingly a key player in many big data environments, and this book shows you how to use Cassandra with Apache Spark, a popular big data processing framework. Also covered are day-to-day topics of importance such as the backup and recovery of Cassandra databases, using the right compression and compaction strategies, and loading and unloading data. Expert Apache Cassandra Administration provides numerous step-by-step examples starting with the basics of a Cassandra database, and going all the way through backup and recovery, performance optimization, and monitoring and securing the data. The book serves as an authoritative and comprehensive guide to the building and management of simple to complex Cassandra databases. The book: Takes you through building a Cassandra database from installation of the software and creation of a single database, through to complex clusters and data centers. Provides numerous examples of actual commands in a real-life Cassandra environment that show how to confidently configure, manage, troubleshoot, and tune Cassandra databases. Shows how to use the Cassandra configuration properties to build a highly stable, available, and secure Cassandra database that always operates at peak efficiency. What You'll Learn: Install the Cassandra software and create your first database. Understand the Cassandra data model, and the internal architecture of a Cassandra database. Create your own Cassandra cluster, step-by-step. Run a Cassandra cluster on Docker. Work with Apache Spark by connecting to a Cassandra database. Deploy Cassandra clusters in your data center, or on Amazon EC2 instances. Back up and restore mission-critical Cassandra databases. Monitor, troubleshoot, and tune production Cassandra databases, and cut your spending on resources such as memory, servers, and storage. Who This Book Is For: Database administrators, developers, and architects who are looking for an authoritative and comprehensive single volume for all their Cassandra administration needs. Also for administrators who are tasked with setting up and maintaining highly reliable and high-performing Cassandra databases. An excellent choice for big data administrators, database administrators, architects, and developers who use Cassandra as their key data store, to support high volume online transactions, or as a decentralized, elastic data store.

Mastering RethinkDB

2016-12-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Shahid Shaikh

ELK JavaScript data data-engineering nosql-databases rethinkdb

Mastering RethinkDB offers a comprehensive guide to using the open-source, scalable database RethinkDB for real-time application development. Throughout this book, you'll gain practical knowledge on query management with ReQL, build dynamic web apps, and perform advanced database administration tasks. What this Book will help me do: Gain expertise in managing and configuring RethinkDB clusters for optimal performance in real-time applications. Develop robust web applications using RethinkDB and integrate them seamlessly with Node.js. Leverage advanced querying features of ReQL, including geospatial and time-series queries. Enhance RethinkDB's capabilities with integration techniques for third-party libraries like ElasticSearch. Master deployment practices using platforms such as Docker and PaaS for production-grade applications. Author(s): Shahid Shaikh, an expert in database technologies and real-time system design, brings years of hands-on experience working with open-source databases like RethinkDB. Known for writing practical technical books, Shaikh emphasizes real-world applications and clarity to help both novice and seasoned developers excel. Who is it for? This book is ideal for developers who are building real-time applications and want to adopt RethinkDB for their solutions. Readers should have a basic understanding of RethinkDB and Node.js to get the most benefit. It's particularly suited for programmers looking to deepen their database administration skills and enhance their real-time data handling expertise.
