talk-data.com

Topic: Docker
Tags: containerization, devops, virtualization
30 tagged activities

Activity Trend: peak of 14 activities/quarter (2020-Q1 to 2026-Q1)

Activities

30 activities · Newest first

Hands-On Software Engineering with Python - Second Edition

Grow your software engineering discipline by incorporating and mastering design, development, testing, and deployment best practices through examples in a realistic Python project structure.

Key Features
- Understand what makes software engineering a discipline, distinct from basic programming
- Gain practical insight into updating, refactoring, and scaling an existing Python system
- Implement robust testing, CI/CD pipelines, and cloud-ready architecture decisions

Book Description
Software engineering is more than coding; it’s the strategic design and continuous improvement of systems that serve real-world needs. This newly updated second edition of Hands-On Software Engineering with Python expands on its foundational approach to help you grow into a senior or staff-level engineering role. Fully revised for today’s Python ecosystem, this edition includes updated tooling, practices, and architectural patterns. You’ll explore key changes across five minor Python versions, examine new features like dataclasses and type hinting, and evaluate modern tools such as Poetry, pytest, and GitHub Actions. A new chapter introduces high-performance computing in Python, and the entire development process is enhanced with cloud-readiness in mind. You’ll follow a complete redesign and refactor of a multi-tier system from the first edition, gaining insight into how software evolves, and what it takes to do that responsibly. From system modeling and SDLC phases to data persistence, testing, and CI/CD automation, each chapter builds your engineering mindset while updating your hands-on skills. By the end of this book, you'll have mastered modern Python software engineering practices and be equipped to revise and future-proof complex systems with confidence.

What you will learn
- Distinguish software engineering from general programming
- Break down and apply each phase of the SDLC to Python systems
- Create system models to plan architecture before writing code
- Apply Agile, Scrum, and other modern development methodologies
- Use dataclasses, pydantic, and schemas for robust data modeling
- Set up CI/CD pipelines with GitHub Actions and cloud build tools
- Write and structure unit, integration, and end-to-end tests
- Evaluate and integrate tools like Poetry, pytest, and Docker

Who this book is for
This book is for Python developers with a basic grasp of software development who want to grow into senior or staff-level engineering roles. It’s ideal for professionals looking to deepen their understanding of software architecture, system modeling, testing strategies, and cloud-aware development. Familiarity with core Python programming is required, as the book focuses on applying engineering principles to maintain, extend, and modernize real-world systems.
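
Since the blurb singles out dataclasses, type hinting, and pytest, a minimal sketch may make the combination concrete. The Customer model and test below are invented for illustration, not taken from the book:

```python
from dataclasses import dataclass, field

@dataclass
class Customer:
    """Typed data model; the class and its fields are illustrative only."""
    name: str
    email: str
    orders: list[str] = field(default_factory=list)  # per-instance list, never shared

def test_new_customer_has_no_orders() -> None:
    # pytest discovers test_* functions automatically; no test class needed
    assert Customer("Ada", "ada@example.com").orders == []
```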

Getting Started with Taipy

Share your machine learning models, create chatbots, and build and deploy insightful dashboards speedily using Taipy with this hands-on book featuring real-world application examples from multiple industries. Free with your book: DRM-free PDF version + access to Packt's next-gen Reader (email sign-up and proof of purchase required).

Key Features
- Create visually compelling, interactive data applications with Taipy
- Bring predictive models to end users and create data pipelines to compare scenarios with what-if analyses
- Go beyond prototypes to build and deploy production-ready applications using the cloud provider of your choice
- Purchase of the print or Kindle book includes a free PDF eBook in full color

Book Description
While data analysts, data scientists, and BI experts have the tools to analyze data, build models, and create compelling visuals, they often struggle to translate these insights into practical, user-friendly applications that help end users answer real-world questions, such as identifying revenue trends, predicting inventory needs, or detecting fraud, without wading through complex code. This book is a comprehensive guide to overcoming this challenge. It teaches you how to use Taipy, a powerful open-source Python library, to build intuitive, production-ready data apps quickly and efficiently. Instead of creating prototypes that nobody uses, you'll learn how to build faster applications that process large amounts of data for multiple users and deliver measurable business impact. Taipy does the heavy lifting to enable your users to visualize their KPIs, interact with charts and maps, and compare scenarios for better decision-making. You’ll learn to use Taipy to build apps that make your data accessible and actionable in production environments like the cloud or Docker. By the end of this book, you won’t just understand Taipy: you'll be able to transform your data skills into impactful solutions that address real-world needs and deliver valuable insights.

What you will learn
- Explore Taipy, its use cases, and how it's different from other projects
- Discover how to create visually appealing interactive apps that display KPIs, charts, and maps
- Understand how to compare scenarios to make better decisions
- Connect Taipy applications to several data sources and services
- Develop apps for diverse use cases, including chatbots, dashboards, ML apps, and maps
- Deploy Taipy applications on different types of servers and services
- Master advanced concepts for simplifying and accelerating your development workflow

Who this book is for
If you’re a data analyst, data scientist, or BI analyst looking to build production-ready data apps entirely in Python, this book is for you. If your scripts and models sit idle because non-technical stakeholders can’t use them, this book shows you how to turn them into full applications fast with Taipy, so your work delivers real business value. It’s also valuable for developers and engineers who want to streamline their data workflows and build UIs in pure Python.
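
As a taste of what "data apps entirely in Python" means here, this is a minimal, hypothetical sketch using Taipy's Gui and its Markdown-based visual elements; the dashboard content is invented, not from the book:

```python
from taipy.gui import Gui  # pip install taipy

threshold = 50  # state variable bound into the page below

# Taipy pages are Markdown strings with visual-element placeholders.
page = """
# Inventory alerts (illustrative)
Alert threshold: <|{threshold}|slider|min=0|max=100|>
Current threshold: <|{threshold}|text|>
"""

if __name__ == "__main__":
    Gui(page).run()  # serves the dashboard on a local web port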

PostgreSQL Skills Development on Cloud: A Practical Guide to Database Management with AWS and Azure

This book provides a comprehensive approach to managing PostgreSQL cluster databases on Amazon Web Services and Microsoft Azure, as well as in Docker and container environments on a Red Hat operating system. Detailed references for managing PostgreSQL on both Windows and Mac are also provided. The book condenses all the fundamental and essential concepts you need to manage a PostgreSQL cluster into a one-stop guide that is perfect for newcomers to Postgres database administration. Each chapter provides historical context and documents version changes of the PostgreSQL cluster, elucidates practical "how-to" methods, and includes illustrations and keyword definitions, practices for application, a summary of key learnings, and questions to reinforce understanding. The book also outlines a clear study objective with a weekly learning schedule and hundreds of practice exercises, along with questions and answers. With its comprehensive and practical approach, this book will help you gain the confidence to manage all aspects of a PostgreSQL cluster in critical production environments so you can better support your organization's database infrastructure on the cloud and in containers.

What You Will Learn
- Install and configure Postgres clusters on the cloud and in containers, monitor database logs, start and stop databases, troubleshoot, tune performance, back up and recover, and integrate with Amazon S3 and Azure Blob Storage
- Manage Postgres databases on Amazon Web Services and Microsoft Azure, as well as in Docker and container environments on a Red Hat operating system
- Access sample references to scripting solutions and database management tools for working with Postgres, Redshift (based on Postgres 8.2), and Docker
- Create Amazon Machine Images (AMIs) and Azure Images for managing a fleet of Postgres clusters on the cloud
- Reinforce knowledge with a weekly learning schedule and hundreds of practice exercises, along with questions and answers
- Progress from simple concepts, such as how to choose the correct instance type, to creating complex machine images
- Gain access to an Amazon AMI with a DBA admin tool, allowing you to learn Postgres, Redshift, and Docker in a cloud environment
- Refer to a comprehensive summary of documentation for Postgres, Amazon Web Services, Microsoft Azure, and Red Hat Linux covering all aspects of Postgres cluster management on the cloud

Who This Book Is For
Newcomers to PostgreSQL database administration and cross-platform support, and DBAs looking to master PostgreSQL on the cloud.
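
To make the Docker side of this workflow concrete, here is a minimal sketch using the Docker SDK for Python, roughly what the book's container chapters do from the command line. The image tag, container name, and password are placeholders, not the book's values:

```python
import docker  # the Docker SDK for Python: pip install docker

# Spin up a throwaway PostgreSQL container for practice.
client = docker.from_env()
pg = client.containers.run(
    "postgres:16",
    name="pg-practice",
    environment={"POSTGRES_PASSWORD": "practice-only"},
    ports={"5432/tcp": 5432},
    detach=True,
)
pg.reload()               # refresh container state from the daemon
print(pg.name, pg.status) # e.g. 'pg-practice running'
```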

Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle

This comprehensive guide, featuring hand-picked examples of daily use cases, will walk you through the end-to-end predictive model-building cycle using the latest techniques and industry tricks. In Chapters 1, 2, and 3, we begin by setting up the environment and covering the basics of PySpark, focusing on data manipulation. Chapter 4 delves into the art of variable selection, demonstrating various techniques available in PySpark. In Chapters 5, 6, and 7, we explore machine learning algorithms, their implementations, and fine-tuning techniques. Chapters 8 and 9 guide you through machine learning pipelines and various methods to operationalize and serve models using Docker or an API. Chapter 10 demonstrates how to unlock the power of predictive models to create a meaningful impact on your business. Chapter 11 introduces some of the most widely used and powerful modeling frameworks to unlock real value from data. In this new edition, you will learn predictive modeling frameworks that can quantify customer lifetime values and estimate the return on your predictive modeling investments. This edition also includes methods to measure engagement and identify actionable populations for effective churn treatments. Additionally, a dedicated chapter on experimentation design has been added, covering steps to efficiently design, conduct, test, and measure the results of your models. All code examples have been updated to reflect the latest stable version of Spark.

You will:
- Gain an overview of end-to-end predictive model building
- Understand multiple variable selection techniques and their implementations
- Learn how to operationalize models
- Perform data science experiments and learn useful tips
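
The model-building cycle the blurb describes centers on PySpark ML pipelines. Here is a minimal sketch of one, not taken from the book; the input file and column names are invented for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("churn-demo").getOrCreate()

# Hypothetical training data; file and columns are placeholders.
df = spark.read.csv("churn.csv", header=True, inferSchema=True)

assembler = VectorAssembler(
    inputCols=["tenure", "monthly_charges"], outputCol="features"
)
lr = LogisticRegression(featuresCol="features", labelCol="churned")
model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("churned", "prediction").show(5)
```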

Big Data on Kubernetes

Big Data on Kubernetes is your comprehensive guide to leveraging Kubernetes for scalable and efficient big data solutions. You will learn key concepts of Kubernetes architecture and explore tools like Apache Spark, Airflow, and Kafka. Gain hands-on experience building complete data pipelines to tackle real-world data challenges.

What this book will help me do
- Understand Kubernetes architecture and learn to deploy and manage clusters.
- Build and orchestrate big data pipelines using Spark, Airflow, and Kafka.
- Develop scalable and resilient data solutions with Docker and Kubernetes.
- Integrate and optimize data tools for real-time ingestion and processing.
- Apply concepts to hands-on projects addressing actual big data scenarios.

Author(s)
Neylson Crepalde is an experienced data specialist with extensive knowledge of Kubernetes and big data solutions. With deep practical experience, Neylson brings real-world insights to his writing. His approach emphasizes actionable guidance and relatable problem-solving with a strong foundation in scalable architecture.

Who is it for?
This book is ideal for data engineers, BI analysts, data team leaders, and tech managers familiar with Python, SQL, and YAML. Targeted at professionals seeking to develop or expand their expertise in scalable big data solutions, it provides practical insights into Docker, Kubernetes, and prominent big data tools.
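
For a flavor of working with Kubernetes from Python, here is a small sketch using the official kubernetes client to inspect the pods behind a pipeline. It assumes a local kubeconfig and a namespace named "data-pipelines", both placeholders for this illustration:

```python
from kubernetes import client, config  # pip install kubernetes

# Load credentials from ~/.kube/config and list pods in one namespace.
config.load_kube_config()
v1 = client.CoreV1Api()
for pod in v1.list_namespaced_pod(namespace="data-pipelines").items:
    print(f"{pod.metadata.name:40s} {pod.status.phase}")
```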

High Performance PostgreSQL for Rails

Build faster, more reliable Rails apps by taking the best advanced PostgreSQL and Active Record capabilities and using them to solve your application scale and growth challenges. Gain the skills needed to comfortably work with multi-terabyte databases, and with complex Active Record, SQL, and specialized indexes. Develop your skills with PostgreSQL on your laptop, then take them into production, while keeping everything in sync. Make slow queries fast, perform any schema or data migration without errors, and use scaling techniques like read/write splitting, partitioning, and sharding to meet demanding workload requirements, from Internet-scale consumer apps to enterprise SaaS.

Deepen your firsthand knowledge of high-scale PostgreSQL databases and Ruby on Rails applications with dozens of practical and hands-on exercises. Unlock the mysteries surrounding complex Active Record. Make any schema or data migration change confidently, without downtime. Grow your experience with modern and exclusive PostgreSQL features like SQL Merge, Returning, and Exclusion constraints. Put advanced capabilities like Full Text Search and Publish Subscribe mechanisms built into PostgreSQL to work in your Rails apps. Improve the quality of the data in your database, using the advanced and extensible system of types and constraints to reduce and eliminate application bugs. Tackle complex topics like how to improve query performance using specialized indexes. Discover how to effectively use built-in database functions and write your own, administer replication, and make the most of partitioning and foreign data wrappers. Use more than 40 well-supported open source tools to extend and enhance PostgreSQL and Ruby on Rails. Gain invaluable insights into database administration by conducting advanced optimizations, including high-impact database maintenance, all while solving real-world operational challenges. Take your new skills into production today and then take your PostgreSQL and Rails applications to a whole new level of reliability and performance.

What You Need:
- A computer running macOS, Linux, or Windows with WSL2
- PostgreSQL version 16, installed by package manager, compiled, or running with Docker
- An Internet connection

The Complete Developer

Whether you’ve been in the developer kitchen for decades or are just taking the plunge to do it yourself, The Complete Developer will show you how to build and implement every component of a modern stack, from scratch. You’ll go from a React-driven frontend to a fully fleshed-out backend with Mongoose, MongoDB, and a complete set of REST and GraphQL APIs, and back again through the whole Next.js stack. The book’s easy-to-follow, step-by-step recipes will teach you how to build a web server with Express.js, create custom API routes, deploy applications via self-contained microservices, and add a reactive, component-based UI. You’ll leverage command line tools and full-stack frameworks to build an application whose no-effort user management rides on GitHub logins.

You’ll also learn how to:
- Work with modern JavaScript syntax, TypeScript, and the Next.js framework
- Simplify UI development with the React library
- Extend your application with REST and GraphQL APIs
- Manage your data with the MongoDB NoSQL database
- Use OAuth to simplify user management, authentication, and authorization
- Automate testing with Jest, test-driven development, stubs, mocks, and fakes

Whether you’re an experienced software engineer or new to DIY web development, The Complete Developer will teach you to succeed with the modern full stack. After all, control matters.

Covers: Docker, Express.js, JavaScript, Jest, MongoDB, Mongoose, Next.js, Node.js, OAuth, React, REST and GraphQL APIs, and TypeScript

Red Hat OpenShift Container Platform for IBM zCX

Application modernization is essential for continuous improvements to your business value. Modernizing your applications includes improvements to your software architecture, application infrastructure, development techniques, and business strategies, all of which allow you to gain increased business value from existing application code. IBM® z/OS® Container Extensions (IBM zCX) is a part of the IBM z/OS operating system. It makes it possible to run Linux on IBM Z® applications, packaged as Docker container images, on z/OS. Application developers can develop, and data centers can operate, popular open source packages, Linux applications, IBM software, and third-party software together with z/OS applications and data. This IBM Redbooks® publication presents the capabilities of IBM zCX along with several use cases that demonstrate Red Hat OpenShift Container Platform for IBM zCX and the application modernization benefits your business can realize.

Effective Data Science Infrastructure

Simplify data science infrastructure to give data scientists an efficient path from prototype to production.

In Effective Data Science Infrastructure you will learn how to:
- Design data science infrastructure that boosts productivity
- Handle compute and orchestration in the cloud
- Deploy machine learning to production
- Monitor and manage performance and results
- Combine cloud-based tools into a cohesive data science environment
- Develop reproducible data science projects using Metaflow, Conda, and Docker
- Architect complex applications for multiple teams and large datasets
- Customize and grow data science infrastructure

Effective Data Science Infrastructure: How to make data scientists more productive is a hands-on guide to assembling infrastructure for data science and machine learning applications. It reveals the processes used at Netflix and other data-driven companies to manage their cutting-edge data infrastructure. In it, you’ll master scalable techniques for data storage, computation, experiment tracking, and orchestration that are relevant to companies of all shapes and sizes. You’ll learn how you can make data scientists more productive with your existing cloud infrastructure, a stack of open source software, and idiomatic Python. The author is donating proceeds from this book to charities that support women and underrepresented groups in data science.

About the Technology
Growing data science projects from prototype to production requires reliable infrastructure. Using the powerful new techniques and tooling in this book, you can stand up an infrastructure stack that will scale with any organization, from startups to the largest enterprises.

About the Book
Effective Data Science Infrastructure teaches you to build data pipelines and project workflows that will supercharge data scientists and their projects. Based on state-of-the-art tools and concepts that power data operations of Netflix, this book introduces a customizable cloud-based approach to model development and MLOps that you can easily adapt to your company’s specific needs. As you roll out these practical processes, your teams will produce better and faster results when applying data science and machine learning to a wide array of business problems.

What's Inside
- Handle compute and orchestration in the cloud
- Combine cloud-based tools into a cohesive data science environment
- Develop reproducible data science projects using Metaflow, AWS, and the Python data ecosystem
- Architect complex applications that require large datasets and models, and a team of data scientists

About the Reader
For infrastructure engineers and engineering-minded data scientists who are familiar with Python.

About the Author
At Netflix, Ville Tuulos designed and built Metaflow, a full-stack framework for data science. Currently, he is the CEO of a startup focusing on data science infrastructure.

Quotes
- "By reading and referring to this book, I’m confident you will learn how to make your machine learning operations much more efficient and productive." - From the Foreword by Travis Oliphant, author of NumPy, founder of Anaconda, PyData, and NumFOCUS
- "Effective Data Science Infrastructure is a brilliant book. It’s a must-have for every data science team." - Ninoslav Cerkez, Logit
- "More data science. Less headaches." - Dr. Abel Alejandro Coronado Iruegas, National Institute of Statistics and Geography of Mexico
- "Indispensable. A copy should be on every data engineer’s bookshelf." - Matthew Copple, Grand River Analytics
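
Metaflow, the framework at the center of this book, expresses workflows as Python classes. Here is a minimal sketch of a linear flow (the flow name and message are invented; real flows add branching, resources, and data dependencies):

```python
from metaflow import FlowSpec, step  # pip install metaflow

class HelloFlow(FlowSpec):
    """A tiny two-step workflow; artifacts set on self persist between steps."""

    @step
    def start(self):
        self.message = "hello from Metaflow"
        self.next(self.end)

    @step
    def end(self):
        print(self.message)

if __name__ == "__main__":
    HelloFlow()  # execute with: python hello_flow.py run
```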

Logging in Action

Make log processing a real asset to your organization with powerful and free open source tools.

In Logging in Action you will learn how to:
- Deploy Fluentd and Fluent Bit into traditional on-premises, IoT, hybrid, cloud, and multi-cloud environments, both small and hyperscaled
- Configure Fluentd and Fluent Bit to solve common log management problems
- Use Fluentd within Kubernetes and Docker services
- Connect a custom log source or destination with Fluentd’s extensible plugin framework
- Apply logging best practices and avoid common pitfalls

Logging in Action is a guide to optimizing and organizing logging using the CNCF Fluentd and Fluent Bit projects. You’ll use the powerful log management tool Fluentd to solve common log management problems, and learn how proper log management can improve performance and make management of software and infrastructure solutions easier. Through useful examples like sending log-driven events to Slack, you’ll get hands-on experience applying structure to your unstructured data.

About the Technology
Don’t fly blind! An effective logging system can help you see and correct problems before they cripple your software. With the Fluentd log management tool, it’s a snap to monitor the behavior and health of your software and infrastructure in real time. Designed to collect and process log data from multiple sources using the industry-standard JSON format, Fluentd delivers a truly unified logging layer across all your systems.

About the Book
Logging in Action teaches you to record and analyze application and infrastructure data using Fluentd. Using clear, relevant examples, it shows you exactly how to transform raw system data into a unified stream of actionable information. You’ll discover how logging configuration impacts the way your system functions and set up Fluentd to handle data from legacy IT environments, local data centers, and massive Kubernetes-driven distributed systems. You’ll even learn how to implement complex log parsing with RegEx and output events to MongoDB and Slack.

What's Inside
- Capture log events from a wide range of systems and software, including Kubernetes and Docker
- Connect to custom log sources and destinations
- Employ Fluentd’s extensible plugin framework
- Create a custom plugin for niche problems

About the Reader
For developers, architects, and operations professionals familiar with the basics of monitoring and logging.

About the Author
Phil Wilkins has spent over 30 years in the software industry and has worked for small startups through to international brands.

Quotes
- "I highly recommend using Logging in Action as a getting-started guide, a refresher, or as a way to optimize your logging journey." - From the Foreword by Anurag Gupta, Fluent maintainer and Cofounder, Calyptia
- "Covers everything you need if you want to implement a logging system using open source technology such as Fluentd and Kubernetes." - Alex Saez, Naranja X
- "A great exploration of the features and capabilities of Fluentd, along with very useful hands-on exercises." - George Thomas, Manhattan Associates
- "A practical holistic guide to integrating logging into your enterprise architecture." - Satej Sahu, Honeywell
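
As a quick illustration of the "unified logging layer" idea, here is a sketch that emits a structured event to a local Fluentd forward input from Python using the fluent-logger package; the tag and payload are invented, and 24224 is Fluentd's conventional forward port:

```python
from fluent import sender  # pip install fluent-logger

# Send one structured, JSON-friendly event to a local Fluentd instance.
logger = sender.FluentSender("app", host="localhost", port=24224)
logger.emit("login", {"user": "alice", "source": "web"})
logger.close()
```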

Modern Data Engineering with Apache Spark: A Hands-On Guide for Building Mission-Critical Streaming Applications

Leverage Apache Spark within a modern data engineering ecosystem. This hands-on guide will teach you how to write fully functional applications, follow industry best practices, and learn the rationale behind these decisions. With Apache Spark as the foundation, you will follow a step-by-step journey beginning with the basics of data ingestion, processing, and transformation, and ending up with an entire local data platform running Apache Spark, Apache Zeppelin, Apache Kafka, Redis, MySQL, Minio (S3), and Apache Airflow.

Apache Spark applications solve a wide range of data problems, from traditional data loading and processing to rich SQL-based analysis, complex machine learning workloads, and even near real-time processing of streaming data. Spark fits well as a central foundation for any data engineering workload. This book will teach you to write interactive Spark applications using Apache Zeppelin notebooks, write and compile reusable applications and modules, and fully test both batch and streaming applications. You will also learn to containerize your applications using Docker and to run and deploy your Spark applications using a variety of tools such as Apache Airflow, Docker, and Kubernetes.

Reading this book will empower you to take advantage of Apache Spark to optimize your data pipelines and teach you to craft modular and testable Spark applications. You will create and deploy mission-critical streaming Spark applications in a low-stress environment that paves the way for your own path to production.

What You Will Learn
- Simplify data transformation with Spark Pipelines and Spark SQL
- Bridge data engineering with machine learning
- Architect modular data pipeline applications
- Build reusable application components and libraries
- Containerize your Spark applications for consistency and reliability
- Use Docker and Kubernetes to deploy your Spark applications
- Speed up application experimentation using Apache Zeppelin and Docker
- Understand serializable structured data and data contracts
- Harness effective strategies for optimizing data in your data lakes
- Build end-to-end Spark structured streaming applications using Redis and Apache Kafka
- Embrace testing for your batch and streaming applications
- Deploy and monitor your Spark applications

Who This Book Is For
Professional software engineers who want to take their current skills and apply them to new and exciting opportunities within the data ecosystem, practicing data engineers looking for a guiding light while traversing the many challenges of moving from batch to streaming modes, data architects who wish to provide clear and concise direction for how best to harness and use Apache Spark within their organization, and anyone interested in the ins and outs of becoming a modern data engineer in today's fast-paced and data-hungry world.
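
The streaming workflow the book builds around Kafka can be sketched in a few lines of PySpark Structured Streaming. The broker address and topic name below are placeholders, and the Kafka connector package must be on the Spark classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Treat a Kafka topic as an unbounded table of key/value records.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
)

# Decode the message payload and echo each micro-batch to the console.
query = (
    events.selectExpr("CAST(value AS STRING) AS payload")
    .writeStream.format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```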

Cassandra: The Definitive Guide, (Revised) Third Edition

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This revised third edition, updated for Cassandra 4.0 and new developments in the Cassandra ecosystem, including deployments in Kubernetes with K8ssandra, provides technical details and practical examples to help you put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra's nonrelational design, with special attention to data modeling. Developers, DBAs, and application architects looking to solve a database scaling issue or future-proof an application will learn how to harness Cassandra's speed and flexibility.

- Understand Cassandra's distributed and decentralized structure
- Use the Cassandra Query Language (CQL) and cqlsh (the CQL shell)
- Create a working data model and compare it with an equivalent relational model
- Design and develop applications using client drivers
- Explore cluster topology and learn how nodes exchange data
- Maintain a high level of performance in your cluster
- Deploy Cassandra onsite, in the cloud, or with Docker and Kubernetes
- Integrate Cassandra with Spark, Kafka, Elasticsearch, Solr, and Lucene
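
Developing against a client driver, one of the bullets above, looks roughly like this with the Python driver; the keyspace and table are invented for illustration, and the node address assumes a local instance (for example one started with Docker):

```python
from cassandra.cluster import Cluster  # pip install cassandra-driver

# Connect to a single local node and run CQL statements.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.execute(
    "CREATE TABLE IF NOT EXISTS demo.users (id uuid PRIMARY KEY, name text)"
)
cluster.shutdown()
```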

Data Science at the Command Line, 2nd Edition

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools, useful whether you work with Windows, macOS, or Linux. You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.

- Obtain data from websites, APIs, databases, and spreadsheets
- Perform scrub operations on text, CSV, HTML, XML, and JSON files
- Explore data, compute descriptive statistics, and create visualizations
- Manage your data science workflow
- Create your own tools from one-liners and existing Python or R code
- Parallelize and distribute data-intensive pipelines
- Model data with dimensionality reduction, regression, and classification algorithms
- Leverage the command line from Python, Jupyter, R, RStudio, and Apache Spark
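
Leveraging the command line from Python, the final bullet above, can be as simple as shelling out to a Unix pipeline. A small sketch (the file name is a placeholder):

```python
import subprocess

# Count the most common values in column 2 of a CSV, shell-style.
result = subprocess.run(
    "cut -d, -f2 data.csv | sort | uniq -c | sort -rn | head",
    shell=True, capture_output=True, text=True, check=True,
)
print(result.stdout)
```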

Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle

Discover the capabilities of PySpark and its application in the realm of data science. This comprehensive guide with hand-picked examples of daily use cases will walk you through the end-to-end predictive model-building cycle with the latest techniques and tricks of the trade.

Applied Data Science Using PySpark is divided into six sections which walk you through the book. In section 1, you start with the basics of PySpark, focusing on data manipulation. We make you comfortable with the language and then build upon it to introduce you to the mathematical functions available off the shelf. In section 2, you will dive into the art of variable selection, where we demonstrate various selection techniques available in PySpark. In section 3, we take you on a journey through machine learning algorithms, implementations, and fine-tuning techniques. We will also talk about different validation metrics and how to use them for picking the best models. Sections 4 and 5 go through machine learning pipelines and the various methods available to operationalize the model and serve it through Docker or an API. In the final section, you will cover reusable objects for easy experimentation and learn some tricks that can help you optimize your programs and machine learning pipelines. By the end of this book, you will have seen the flexibility and advantages of PySpark in data science applications. This book is recommended to those who want to unleash the power of parallel computing by simultaneously working with big datasets.

What You Will Learn
- Build an end-to-end predictive model
- Implement multiple variable selection techniques
- Operationalize models
- Master multiple algorithms and implementations

Who This Book Is For
Data scientists and machine learning and deep learning engineers who want to learn and use PySpark for real-time analysis of streaming data.
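
Operationalizing a model, as sections 4 and 5 describe, typically starts from a persisted pipeline. Here is a minimal sketch of loading and scoring with one; the model path and input file are placeholders, not the book's artifacts:

```python
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("score").getOrCreate()

# Load a previously fitted pipeline and score fresh data with it.
model = PipelineModel.load("models/churn_pipeline")
new_customers = spark.read.parquet("new_customers.parquet")
model.transform(new_customers).select("prediction").show(5)
```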

MongoDB Topology Design: Scalability, Security, and Compliance on a Global Scale

Create a world-class MongoDB cluster that is scalable, reliable, and secure. Comply with mission-critical regulatory regimes such as the European Union’s General Data Protection Regulation (GDPR). Whether you are thinking of migrating to MongoDB or need to meet legal requirements for an existing self-managed cluster, this book has you covered. It begins with the basics of replication and sharding, and quickly scales up to cover everything you need to know to control your data and keep it safe from unexpected data loss or downtime.

This book covers best practices for stable MongoDB deployments. For example, a well-designed MongoDB cluster should have no single point of failure. The book covers common use cases when only one or two data centers are available. It goes into detail about creating geopolitical sharding configurations to cover the most stringent data protection regulation compliance. The book also covers different tools and approaches for automating and monitoring a cluster with Kubernetes, Docker, and popular cloud provider containers.

What You Will Learn
- Get started with the basics of MongoDB clusters
- Protect and monitor a MongoDB deployment
- Deepen your expertise around replication and sharding
- Keep effective backups and plan ahead for disaster recovery
- Recognize and avoid problems that can occur in distributed databases
- Build optimal MongoDB deployments within hardware and data center limitations

Who This Book Is For
Solutions architects, DevOps architects and engineers, automation and cloud engineers, and database administrators who are new to MongoDB and distributed databases or who need to scale up simple deployments. This book is a complete guide to planning a deployment for optimal resilience, performance, and scaling, and covers all the details required to meet the new set of data protection regulations such as the GDPR. This book is particularly relevant for large global organizations such as financial and medical institutions, as well as government departments that need to control data in the whole stack and are prohibited from using managed cloud services.
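
From an application's point of view, the replica-set topology the book designs is mostly hidden behind the connection string. A small sketch with PyMongo; the hostnames and replica-set name are placeholders:

```python
from pymongo import MongoClient  # pip install pymongo

# Connect to a three-member replica set. The driver discovers the
# current primary automatically and fails over when it changes.
client = MongoClient(
    "mongodb://db1.example.com,db2.example.com,db3.example.com/?replicaSet=rs0"
)
print(client.admin.command("ping"))
```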

Introducing Microsoft SQL Server 2019

Introducing Microsoft SQL Server 2019 is the must-have guide for database professionals eager to leverage the latest advancements in SQL Server 2019. This book covers the features and capabilities that make SQL Server 2019 a powerful tool for managing and analyzing data both on-premises and in the cloud.

What this book will help me do
- Understand the new features introduced in SQL Server 2019 and their practical applications.
- Confidently manage and analyze relational, NoSQL, and big data within SQL Server 2019.
- Implement containerization for SQL Server using Docker and Kubernetes.
- Migrate and integrate your databases effectively to use Power BI Report Server.
- Query data from the Hadoop Distributed File System with Azure Data Studio.

Author(s)
The authors of Introducing Microsoft SQL Server 2019 are subject matter experts including Kellyn Gorman, Allan Hirt, and others. With years of professional experience in database management and SQL Server, they bring a wealth of practical insight and knowledge to the book. Their experience spans roles as administrators, architects, and educators in the field.

Who is it for?
This book is aimed at database professionals such as DBAs, architects, and big data engineers who are currently using earlier versions of SQL Server or other database platforms. It is particularly well-suited for professionals aiming to understand and implement SQL Server 2019's new features. Readers should have basic familiarity with SQL Server and RDBMS concepts. If you're looking to explore SQL Server 2019 to improve data management and analytics in your organization, this book is for you.

Mastering SQL Server 2017

Leverage the power of SQL Server 2017 Integration Services to build data integration solutions with ease.

Key Features
- Work with temporal tables to access information stored in a table at any time
- Get familiar with the latest features in SQL Server 2017 Integration Services
- Program and extend your packages to enhance their functionality

Book Description
Microsoft SQL Server 2017 uses the power of R and Python for machine learning and containerization-based deployment on Windows and Linux. By learning how to use the features of SQL Server 2017 effectively, you can build scalable apps and easily perform data integration and transformation. You'll start by brushing up on the features of SQL Server 2017. This Learning Path will then demonstrate how you can use Query Store, columnstore indexes, and In-Memory OLTP in your apps. You'll also learn to integrate Python code in SQL Server and graph database implementations for development and testing. Next, you'll get up to speed with designing and building SQL Server Integration Services (SSIS) data warehouse packages using SQL Server Data Tools. Toward the concluding chapters, you'll discover how to develop SSIS packages designed to maintain a data warehouse using the data flow and other control flow tasks. By the end of this Learning Path, you'll be equipped with the skills you need to design efficient, high-performance database applications with confidence.

This Learning Path includes content from the following Packt books:
- SQL Server 2017 Developer's Guide by Milos Radivojevic, Dejan Sarka, et al.
- SQL Server 2017 Integration Services Cookbook by Christian Cote, Dejan Sarka, et al.

What you will learn
- Use columnstore indexes to make storage and performance improvements
- Extend database design solutions using temporal tables
- Exchange JSON data between applications and SQL Server
- Migrate historical data to Microsoft Azure by using Stretch Database
- Design the architecture of a modern Extract, Transform, and Load (ETL) solution
- Implement ETL solutions using Integration Services for both on-premises and Azure data

Who this book is for
This Learning Path is for database developers and solution architects looking to develop ETL solutions with SSIS, and explore the new features in SSIS 2017. Advanced analysis practitioners, business intelligence developers, and database consultants dealing with performance tuning will also find this book useful. Basic understanding of database concepts and T-SQL is required to get the best out of this Learning Path.

Deploying a Database Instance in an IBM Cloud Private Cluster on IBM Z

This IBM® Redpaper™ publication shows you how to deploy a database instance within a container using an IBM Cloud™ Private cluster on IBM Z®. A preinstalled IBM Spectrum™ Scale 5.0.3 cluster file system provides back-end storage for the persistent volumes bound to the database.

A container is a standard unit of software that packages code and all its dependencies, so the application runs quickly and reliably from one computing environment to another. By default, containers are ephemeral. However, stateful applications, such as databases, require some type of persistent storage that can survive service restarts or container crashes. IBM provides several products that help organizations build an environment on an IBM Z infrastructure to develop and manage containerized applications, including dynamic provisioning of persistent volumes. As an example of a stateful application, this paper describes how to deploy the relational database MariaDB using a Helm chart, with the IBM Spectrum Scale cluster file system providing back-end storage for the persistent volumes.

This document provides step-by-step guidance on how to install and configure the following components:
- IBM Cloud Private 3.1.2 (including Kubernetes)
- Docker 18.03.1-ce
- IBM Storage Enabler for Containers 2.0.0 and 2.1.0

This Redpaper demonstrates how we set up the example for a stateful application in our lab, and gives you insights about planning for your implementation. IBM Z server hardware, the IBM Z hypervisor z/VM®, and the IBM Spectrum Scale cluster file system are prerequisites for setting up the example environment. The Redpaper is written with the assumption that you have familiarity with and basic knowledge of the software products used in setting up the environment.

The intended audience includes the following roles:
- Storage administrators
- IT/Cloud administrators
- Technologists
- IT specialists

Data Science with Python and Dask

Dask is a native parallel analytics tool designed to integrate seamlessly with the libraries you’re already using, including Pandas, NumPy, and Scikit-Learn. With Dask you can crunch and work with huge datasets, using the tools you already have. And Data Science with Python and Dask is your guide to using Dask for your data projects without changing the way you work!

About the Technology
An efficient data pipeline means everything for the success of a data science project. Dask is a flexible library for parallel computing in Python that makes it easy to build intuitive workflows for ingesting and analyzing large, distributed datasets. Dask provides dynamic task scheduling and parallel collections that extend the functionality of NumPy, Pandas, and Scikit-learn, enabling users to scale their code from a single laptop to a cluster of hundreds of machines with ease.

About the Book
Data Science with Python and Dask teaches you to build scalable projects that can handle massive datasets. After meeting the Dask framework, you’ll analyze data in the NYC Parking Ticket database and use DataFrames to streamline your process. Then, you’ll create machine learning models using Dask-ML, build interactive visualizations, and build clusters using AWS and Docker.

What's Inside
- Working with large, structured and unstructured datasets
- Visualization with Seaborn and Datashader
- Implementing your own algorithms
- Building distributed apps with Dask Distributed
- Packaging and deploying Dask apps

About the Reader
For data scientists and developers with experience using Python and the PyData stack.

About the Author
Jesse Daniel is an experienced Python developer. He taught Python for Data Science at the University of Denver and leads a team of data scientists at a Denver-based media technology company.

Quotes
- "The most comprehensive coverage of Dask to date, with real-world examples that made a difference in my daily work." - Al Krinker, United States Patent and Trademark Office
- "An excellent alternative to PySpark for those who are not on a cloud platform. The author introduces Dask in a way that speaks directly to an analyst." - Jeremy Loscheider, Panera Bread
- "A greatly paced introduction to Dask with real-world datasets." - George Thomas, R&D Architecture, Manhattan Associates
- "The ultimate resource to quickly get up and running with Dask and parallel processing in Python." - Gustavo Patino, Oakland University William Beaumont School of Medicine
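
Dask's Pandas-like DataFrame is the heart of the workflow described above. Here is a minimal sketch; the file glob and column name stand in for the NYC Parking Ticket CSVs the book works with:

```python
import dask.dataframe as dd  # pip install "dask[dataframe]"

# Lazily define work over a multi-file dataset; nothing is read yet.
df = dd.read_csv("nyc-parking-tickets/*.csv", dtype=str)
top_codes = df.groupby("Violation Code").size().nlargest(10)

# .compute() triggers the actual parallel read and aggregation.
print(top_codes.compute())
```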

Pro SQL Server on Linux: Including Container-Based Deployment with Docker and Kubernetes

Get SQL Server up and running on the Linux operating system and containers. No database professional managing or developing SQL Server on Linux will want to be without this deep and authoritative guide by one of the most respected experts on SQL Server in the industry. Get an inside look at how SQL Server for Linux works through the eyes of an engineer on the team that made it possible.

Microsoft SQL Server is one of the leading database platforms in the industry, and SQL Server 2017 offers developers and administrators the ability to run a database management system on Linux, offering proven support for enterprise-level features and without onerous licensing terms. Organizations invested in Microsoft and open source technologies are now able to run a unified database platform across all their operating system investments, and to take full advantage of containerization through popular platforms such as Docker and Kubernetes.

Pro SQL Server on Linux walks you through installing and configuring SQL Server on the Linux platform. The author is one of the principal architects of SQL Server for Linux, and brings a corresponding depth of knowledge that no database professional or developer on Linux will want to be without. Throughout the book you'll find internals of how SQL Server on Linux works, including an in-depth look at the innovative architecture. The book covers day-to-day management and troubleshooting, including diagnostics and monitoring, the use of containers to manage deployments, and the use of self-tuning and the in-memory capabilities. Also covered are performance capabilities, high availability, and disaster recovery, along with security and encryption. The book covers the product-specific knowledge to bring SQL Server and its powerful features to life on the Linux platform, including coverage of containerization through Docker and Kubernetes.

What You'll Learn
- Learn about the history and internals of the unique SQL Server on Linux architecture
- Install and configure Microsoft’s flagship database product on the Linux platform
- Manage your deployments using container technology through Docker and Kubernetes
- Know the basics of building databases, the T-SQL language, and developing applications against SQL Server on Linux
- Use tools and features to diagnose, manage, and monitor SQL Server on Linux
- Scale your application by learning the performance capabilities of SQL Server
- Deliver high availability and disaster recovery to ensure business continuity
- Secure your database from attack, and protect sensitive data through encryption
- Take advantage of powerful features such as Failover Clusters, Availability Groups, In-Memory Support, and SQL Server’s Self-Tuning Engine
- Learn how to migrate your database from older releases of SQL Server and other database platforms such as Oracle and PostgreSQL
- Build and maintain schemas, and perform management tasks from both GUI and command line

Who This Book Is For
Developers and IT professionals who are new to SQL Server and wish to configure it on the Linux operating system. This book is also useful to those familiar with SQL Server on Windows who want to learn the unique aspects of managing SQL Server on the Linux platform and Docker containers. Readers should have a grasp of relational database concepts and be comfortable with the SQL language.
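
Once SQL Server is running on Linux or in a container, connecting from an application looks the same as on Windows. A minimal sketch with pyodbc; the server address, ODBC driver version, and password are placeholders:

```python
import pyodbc  # pip install pyodbc; requires the Microsoft ODBC driver for SQL Server

# Connect to a SQL Server instance listening on the default port 1433,
# e.g. one started in a Docker container on localhost.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=localhost,1433;UID=sa;PWD=YourStrong!Passw0rd;"
    "TrustServerCertificate=yes;"
)
row = conn.cursor().execute("SELECT @@VERSION").fetchone()
print(row[0])
```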