talk-data.com talk-data.com

Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 Oreilly Visit website ↗

Activities tracked

3432

Collection of O'Reilly books on Data Engineering.

Sessions & talks

Showing 201–225 of 3432 · Newest first

Search within this event →
Elasticsearch in Action, Second Edition

Build powerful, production-ready search applications using the incredible features of Elasticsearch. In Elasticsearch in Action, Second Edition you will discover: Architecture, concepts, and fundamentals of Elasticsearch Installing, configuring, and running Elasticsearch and Kibana Creating an index with custom settings Data types, mapping fundamentals, and templates Fundamentals of text analysis and working with text analyzers Indexing, deleting, and updating documents Indexing data in bulk, and reindexing and aliasing operations Learning search concepts, relevancy scores, and similarity algorithms Elasticsearch in Action, Second Edition teaches you to build scalable search applications using Elasticsearch. This completely new edition explores Elasticsearch fundamentals from the ground up. You’ll deep dive into design principles, search architectures, and Elasticsearch’s essential APIs. Every chapter is clearly illustrated with diagrams and hands-on examples. You’ll even explore real-world use cases for full text search, data visualizations, and machine learning. Plus, its comprehensive nature means you’ll keep coming back to the book as a handy reference! About the Technology Create fully professional-grade search engines with Elasticsearch and Kibana! Rewritten for the latest version of Elasticsearch, this practical book explores Elasticsearch’s high-level architecture, reveals infrastructure patterns, and walks through the search and analytics capabilities of numerous Elasticsearch APIs. About the Book Elasticsearch in Action, Second Edition teaches you how to add modern search features to websites and applications using Elasticsearch 8. In it, you’ll quickly progress from the basics of installation and configuring clusters, to indexing documents, advanced aggregations, and putting your servers into production. You’ll especially appreciate the mix of technical detail with techniques for designing great search experiences. What's Inside Understanding search architecture Full text and term-level search queries Analytics and aggregations High-level visualizations in Kibana Configure, scale, and tune clusters About the Reader For application developers comfortable with scripting and command-line applications. About the Author Madhusudhan Konda is a full-stack lead engineer, architect, mentor, and conference speaker. He delivers live online training on Elasticsearch and the Elastic Stack. Quotes Madhu’s passion comes across in the depth and breadth of this book, the enthusiastic tone, and the hands-on examples. I hope you will take what you have read and put it ‘in action’. - From the Foreword by Shay Banon, Founder of Elasticsearch Practical and well-written. A great starting point for beginners and a comprehensive guide for more experienced professionals. - Simona Russo, Serendipity The author’s excitement is evident from the first few paragraphs. Couple that with extensive experience and technical prowess, and you have an instant classic. - Herodotos Koukkides and Semi Koen, Global Japanese Financial Institution

IBM Z Server Time Protocol Guide

Server Time Protocol (STP) is a server-wide facility that is implemented in the Licensed Internal Code (LIC) of the IBM Z® platform. It provides improved time synchronization in a sysplex or non-sysplex configuration. This IBM Redbooks® publication is intended for infrastructure architects and system programmers who need to understand the STP functions. Readers are expected to be familiar with IBM Z technology and terminology. This book provides planning and implementation information for STP functions and associated software support for the IBM z16™, IBM z15®, and IBM z14® platforms.

PostgreSQL 16 Administration Cookbook

This cookbook is a comprehensive guide to mastering PostgreSQL 16 database administration. With over 180 practical recipes, this book covers everything from query performance and backup strategies to replication and high availability. You'll gain hands-on expertise in solving real-world challenges while leveraging the new and improved features of PostgreSQL 16. What this Book will help me do Perform efficient batch processing with Postgres' SQL MERGE statement. Implement parallel transaction processes using logical replication. Enhance database backups and recovery with advanced compression techniques. Monitor and fine-tune database performance for optimal operation. Apply new PostgreSQL 16 features for secure and reliable databases. Author(s) The team of authors, including Gianni Ciolli, Boriss Mejías, Jimmy Angelakos, Vibhor Kumar, and Simon Riggs, bring years of experience in PostgreSQL database management and development. Their expertise spans professional system administration, academic research, and contributions to PostgreSQL development. Their collaborative insights enrich this comprehensive guide. Who is it for? This book is ideal for PostgreSQL database administrators seeking advanced techniques, data architects managing PostgreSQL in production, and developers interested in mastering PostgreSQL 16. Whether you're an experienced DBA upgrading to PostgreSQL 16 or a newcomer looking for practical recipes, this book provides valuable strategies and solutions.

Vector Search for Practitioners with Elastic

The book "Vector Search for Practitioners with Elastic" provides a comprehensive guide to leveraging vector search technology within Elastic for applications in NLP, cybersecurity, and observability. By exploring practical examples and advanced techniques, this book teaches you how to optimize and implement vector search to address complex challenges in modern data management. What this Book will help me do Gain a deep understanding of implementing vector search with Elastic. Learn techniques to optimize vector data storage and retrieval for practical applications. Understand how to apply vector search for image similarity in Elastic. Discover methods for utilizing vector search for security and observability enhancements. Develop skills to integrate modern NLP tools with vector databases and Elastic. Author(s) Bahaaldine Azarmi, with his extensive experience in Elastic and NLP technologies, brings a practitioner's insight into the world of vector search. Co-author None Vestal contributes expertise in observability and system optimization. Together, they deliver practical and actionable knowledge in a clear and approachable manner. Who is it for? This book is designed for data professionals seeking to deepen their expertise in vector search and Elastic technologies. It is ideal for individuals in observability, search technology, or cybersecurity roles. If you have foundational knowledge in machine learning models, Python, and Elastic, this book will enable you to effectively utilize vector search in your projects.

Data Exploration and Preparation with BigQuery

In "Data Exploration and Preparation with BigQuery," Michael Kahn provides a hands-on guide to understanding and utilizing Google's powerful data warehouse solution, BigQuery. This comprehensive book equips you with the skills needed to clean, transform, and analyze large datasets for actionable business insights. What this Book will help me do Master the process of exploring and assessing the quality of datasets. Learn SQL for performing efficient and advanced data transformations in BigQuery. Optimize the performance of BigQuery queries for speed and cost-effectiveness. Discover best practices for setting up and managing BigQuery resources. Apply real-world case studies to analyze data and derive meaningful insights. Author(s) Michael Kahn is an experienced data engineer and author specializing in big data solutions and technologies. With years of hands-on experience working with Google Cloud Platform and BigQuery, he has assisted organizations in optimizing their data pipelines for effective decision-making. His accessible writing style ensures complex topics become approachable, enabling readers of various skill levels to succeed. Who is it for? This book is tailored for data analysts, data engineers, and data scientists who want to learn how to effectively use BigQuery for data exploration and preparation. Whether you're new to BigQuery or looking to deepen your expertise in working with large datasets, this book provides clear guidance and practical examples to achieve your goals.

Kafka Troubleshooting in Production: Stabilizing Kafka Clusters in the Cloud and On-premises

This book provides Kafka administrators, site reliability engineers, and DataOps and DevOps practitioners with a list of real production issues that can occur in Kafka clusters and how to solve them. The production issues covered are assembled into a comprehensive troubleshooting guide for those engineers who are responsible for the stability and performance of Kafka clusters in production, whether those clusters are deployed in the cloud or on-premises. This book teaches you how to detect and troubleshoot the issues, and eventually how to prevent them. Kafka stability is hard to achieve, especially in high throughput environments, and the purpose of this book is not only to make troubleshooting easier, but also to prevent production issues from occurring in the first place. The guidance in this book is drawn from the author's years of experience in helping clients and internal customers diagnose and resolve knotty production problems and stabilize their Kafka environments. The book is organized into recipe-style troubleshooting checklists that field engineers can easily follow when under pressure to fix an unstable cluster. This is the book you will want by your side when the stakes are high, and your job is on the line. What You Will Learn Monitor and resolve production issues in your Kafka clusters Provision Kafka clusters with the lowest costs and still handle the required loads Perform root cause analyses of issues affecting your Kafka clusters Know the ways in which your Kafka cluster can affect its consumers and producers Prevent or minimize data loss and delays in data streaming Forestall production issues through an understanding of common failure points Create checklists for troubleshooting your Kafka clusters when problems occur Who This Book Is For Site reliability engineers tasked with maintaining stability of Kafka clusters, Kafka administrators who troubleshoot production issues around Kafka, DevOps and DataOps experts who are involved with provisioning Kafka (whether on-premises or in the cloud), developers of Kafka consumers and producers who wish to learn more about Kafka

MySQL Crash Course, 2nd Edition

MySQL is one of the most popular database management systems available, powering everything from Internet powerhouses to individual corporate databases to simple end-user applications, and everything in between. This book will teach you all you need to know to be immediately productive with the latest version of MySQL. By working through 30 highly focused hands-on lessons, your MySQL Crash Course will be both easier and more effective than youd have thought possible. Learn How To Retrieve and Sort Data Filter Data Using Comparisons, Regular Expressions, Full Text Search, and Much More Join Relational Data Create and Alter Tables Insert, Update, and Delete Data Leverage the Power of Stored Procedures and Triggers Use Views and Cursors Manage Transactional Processing Create User Accounts and Manage Security via Access Control ...

SAP S/4HANA Asset Management: Configure, Equip, and Manage your Enterprise

S/4HANA empowers enterprises to take big steps towards digitalization, innovation, and being mobile-friendly. This book is a concise guide to SAP S/4HANA Asset Management and will help you begin leveraging the platform’s capabilities quickly and efficiently. SAP S/4HANA Asset Management begins with an overview of the platform and its structure. You will learn how it can help with data storage and analysis, business processes, and reporting and analytics. As the book progresses, you will gain insight into single, time-based, performance-based, and multiple counter-based strategy plans. Any project is incomplete without a budget, and this book will help you understand how to use SAP S/4HANA to create and manage yours. The book’s real-life examples of asset management from contemporary industries reinforce each concept you learn, and its coverage of newer technologies and offerings in S/4HANA Asset Management will give you a sense of the immense potential offered by the platform. When you have finished this book, you will be ready to begin using SAP/S4HANA Asset Management to improve operational planning, maintenance, and scheduling activities in your own business. What You Will Learn Position S/4HANA Asset Management within the overall Business Applications suite Explore essential functionalities for enterprise asset hierarchy mapping Efficiently map both unplanned and planned maintenance activities Seamlessly integrate asset management, finance, controlling, and budgeting Unleash reporting and analytics in Asset Management Configure Asset Management to meet your S/4HANA requirements Who This Book Is For Consultants, project managers, and SAP users who are looking for a complete reference guide on S/4HANA Asset Management.

Cracking the Data Engineering Interview

"Cracking the Data Engineering Interview" is your essential guide to mastering the data engineering interview process. This book offers practical insights and techniques to build your resume, refine your skills in Python, SQL, data modeling, and ETL, and confidently tackle over 100 mock interview questions. Gain the knowledge and confidence to land your dream role in data engineering. What this Book will help me do Craft a compelling data engineering portfolio to stand out to employers. Refresh and deepen understanding of essential topics like Python, SQL, and ETL. Master over 100 interview questions that cover both technical and behavioral aspects. Understand data engineering concepts such as data modeling, security, and CI/CD. Develop negotiation, networking, and personal branding skills crucial for job applications. Author(s) None Bryan and None Ransome are seasoned authors with a wealth of experience in data engineering and professional development. Drawing from their extensive industry backgrounds, they provide actionable strategies for aspiring data engineers. Their approachable writing style and real-world insights make complex topics accessible to readers. Who is it for? This book is ideal for aspiring data engineers looking to navigate the job application process effectively. Readers should be familiar with data engineering fundamentals, including Python, SQL, cloud data platforms, and ETL processes. It's tailored for professionals aiming to enhance their portfolios, tackle challenging interviews, and boost their chances of landing a data engineering role.

IBM TS7700 R5 DS8000 Object Store User's Guide

The IBM® TS7700 features a functional enhancement that allows for the TS7700 to act as an object store for transparent cloud tiering with IBM DS8000®, DFSMShsm (HSM), and native DFSMSdss (DSS). This function can be used to move data sets directly from DS8000 to TS7700. This IBM Redpaper publication provides a functional overview of the features, provides client value information, and walks through DFSMS, DS8000, and TS7700 set up steps.

Data Engineering with AWS - Second Edition

Learn data engineering and modern data pipeline design with AWS in this comprehensive guide! You will explore key AWS services like S3, Glue, Redshift, and QuickSight to ingest, transform, and analyze data, and you'll gain hands-on experience creating robust, scalable solutions. What this Book will help me do Understand and implement data ingestion and transformation processes using AWS tools. Optimize data for analytics with advanced AWS-powered workflows. Build end-to-end modern data pipelines leveraging cutting-edge AWS technologies. Design data governance strategies using AWS services for security and compliance. Visualize data and extract insights using Amazon QuickSight and other tools. Author(s) Gareth Eagar is a Senior Data Architect with over 25 years of experience in designing and implementing data solutions across various industries. He combines his deep technical expertise with a passion for teaching, aiming to make complex concepts approachable for learners at all levels. Who is it for? This book is intended for current or aspiring data engineers, data architects, and analysts seeking to leverage AWS for data engineering. It suits beginners with a basic understanding of data concepts who want to gain practical experience as well as intermediate professionals aiming to expand into AWS-based systems.

Learn PostgreSQL - Second Edition

Learn PostgreSQL, a comprehensive guide to mastering PostgreSQL 16, takes readers on a journey from the fundamentals to advanced concepts, such as replication and database optimization. With hands-on exercises and practical examples, this book provides all you need to confidently use, manage, and build secure and scalable databases. What this Book will help me do Master the essentials of PostgreSQL 16, including advanced SQL features and performance tuning. Understand database replication methods and manage a scalable architecture. Enhance database security through roles, schemas, and strict privilege management. Learn how to personalize your experience with custom extensions and functions. Acquire practical skills in backup, restoration, and disaster recovery planning. Author(s) Luca Ferrari and Enrico Pirozzi are experienced database engineers and PostgreSQL enthusiasts with years of experience using and teaching PostgreSQL technology. They specialize in creating learning content that is practical and focused on real-world situations. Their writing emphasizes clarity and systematically equips readers with professional skills. Who is it for? This book is perfect for database professionals, software developers, and system administrators looking to develop their PostgreSQL expertise. Beginners with an interest in databases will also find this book highly approachable. Ideal for readers seeking to improve their database scalability and robustness. If you aim to hone practical PostgreSQL skills, this guide is essential.

Procedural Programming with PostgreSQL PL/pgSQL: Design Complex Database-Centric Applications with PL/pgSQL

Learn the fundamentals of PL/PGSQL, the programming language of PostgreSQL which is most robust Open Source Relational Database. This book provides practical insights into developing database code objects such as functions and procedures, with a focus on effectively handling strings, numbers, and arrays to achieve desired outcomes, and transaction management. The unique approach to handling Triggers in PostgreSQL ensures that both functionality and performance are maintained without compromise. You'll gain proficiency in writing inline/anonymous server-side code within the limitations, along with learning essential debugging and profiling techniques. Additionally, the book delves into statistical analysis of PL/PGSQL code and offers valuable knowledge on managing exceptions while writing code blocks. Finally, you'll explore the installation and configuration of extensions to enhance the performance of stored procedures and functions. What You'll Learn Understand the PL/PGSQL concepts Learn to debug, profile, and optimize PL/PGSQL code Study linting PL/PGSQL code Review transaction management within PL/PGSQL code Work with developer friendly features like operators, casts, and aggregators Who Is This Book For App developers, database migration consultants, and database administrators.

Designing a Modern Application Data Stack

Today's massive datasets represent an unprecedented opportunity for organizations to build data-intensive applications. With this report, product leads, architects, and others who deal with applications and application development will explore why a cloud data platform is a great fit for data-intensive applications. You'll learn how to carefully consider scalability, data processing, and application distribution when making data app design decisions. Cloud data platforms are the modern infrastructure choice for data applications, as they offer improved scalability, elasticity, and cost efficiency. With a better understanding of data-intensive application architectures on cloud-based data platforms and the best practices outlined in this report, application teams can take full advantage of advances in data processing and app distribution to accelerate development, deployment, and adoption cycles. With this insightful report, you will: Learn why a modern cloud data platform is essential for building data-intensive applications Explore how scalability, data processing, and distribution models are key for today's data apps Implement best practices to improve application scalability and simplify data processing for efficiency gains Modernize application distribution plans to meet the needs of app providers and consumers About the authors: Adam Morton works with Intelligen Group, a Snowflake pure-play data and analytics consultancy. Kevin McGinley is technical director of the Snowflake customer acceleration team. Brad Culberson is a data platform architect specializing in data applications at Snowflake.

Cyber Resiliency with IBM Storage Sentinel and IBM Storage Safeguarded Copy

IBM Storage Sentinel is a cyber resiliency solution for SAP HANA, Oracle, and Epic healthcare systems, designed to help organizations enhance ransomware detection and incident recovery. IBM Storage Sentinel automates the creation of immutable backup copies of your data, then uses machine learning to detect signs of possible corruption and generate forensic reports that help you quickly diagnose and identify the source of the attack. Because IBM Storage Sentinel can intelligently isolate infected backups, your organization can identify the most recent verified and validated backup copies, greatly accelerating your time to recovery. This IBM Redbooks publication explains how to implement a cyber resiliency solution for SAP HANA, Oracle, and Epic healthcare systems using IBM Storage Sentinel and IBM Storage Safeguarded Copy. Target audience of this document is cyber security and storage specialists.

Delta Lake: Up and Running

With the surge in big data and AI, organizations can rapidly create data products. However, the effectiveness of their analytics and machine learning models depends on the data's quality. Delta Lake's open source format offers a robust lakehouse framework over platforms like Amazon S3, ADLS, and GCS. This practical book shows data engineers, data scientists, and data analysts how to get Delta Lake and its features up and running. The ultimate goal of building data pipelines and applications is to gain insights from data. You'll understand how your storage solution choice determines the robustness and performance of the data pipeline, from raw data to insights. You'll learn how to: Use modern data management and data engineering techniques Understand how ACID transactions bring reliability to data lakes at scale Run streaming and batch jobs against your data lake concurrently Execute update, delete, and merge commands against your data lake Use time travel to roll back and examine previous data versions Build a streaming data quality pipeline following the medallion architecture

IBM Storage Virtualize, IBM Storage FlashSystem, and IBM SAN Volume Controller Security Feature Checklist - For IBM Storage Virtualize 8.5.3

IBM® Storage Virtualize based storage systems are secure storage platforms that implement various security-related features, in terms of system-level access controls and data-level security features. This document outlines the available security features and options of IBM Storage Virtualize based storage systems. It is not intended as a "how to" or best practice document. Instead, it is a checklist of features that can be reviewed by a user security team to aid in the definition of a policy to be followed when implementing IBM FlashSystem®, IBM SAN Volume Controller, and IBM Storage Virtualize for Public Cloud. IBM Storage Virtualize features the following levels of security to protect against threats and to keep the attack surface as small as possible: The first line of defense is to offer strict verification features that stop unauthorized users from using login interfaces and gaining access to the system and its configuration. The second line of defense is to offer least privilege features that restrict the environment and limit any effect if a malicious actor does access the system configuration. The third line of defense is to run in a minimal, locked down, mode to prevent damage spreading to the kernel and rest of the operating system. The fourth line of defense is to protect the data at rest that is stored on the system from theft, loss, or corruption (malicious or accidental). The topics that are discussed in this paper can be broadly split into two categories: System security: This type of security encompasses the first three lines of defense that prevent unauthorized access to the system, protect the logical configuration of the storage system, and restrict what actions users can perform. It also ensures visibility and reporting of system level events that can be used by a Security Information and Event Management (SIEM) solution, such as IBM QRadar®. Data security: This type of security encompasses the fourth line of defense. It protects the data that is stored on the system against theft, loss, or attack. These data security features include Encryption of Data At Rest (EDAR) or IBM Safeguarded Copy (SGC). This document is correct as of IBM Storage Virtualize 8.5.3.

Amazon Redshift: The Definitive Guide

Amazon Redshift powers analytic cloud data warehouses worldwide, from startups to some of the largest enterprise data warehouses available today. This practical guide thoroughly examines this managed service and demonstrates how you can use it to extract value from your data immediately, rather than go through the heavy lifting required to run a typical data warehouse. Analytic specialists Rajesh Francis, Rajiv Gupta, and Milind Oke detail Amazon Redshift's underlying mechanisms and options to help you explore out-of-the box automation. Whether you're a data engineer who wants to learn the art of the possible or a DBA looking to take advantage of machine learning-based auto-tuning, this book helps you get the most value from Amazon Redshift. By understanding Amazon Redshift features, you'll achieve excellent analytic performance at the best price, with the least effort. This book helps you: Build a cloud data strategy around Amazon Redshift as foundational data warehouse Get started with Amazon Redshift with simple-to-use data models and design best practices Understand how and when to use Redshift Serverless and Redshift provisioned clusters Take advantage of auto-tuning options inherent in Amazon Redshift and understand manual tuning options Transform your data platform for predictive analytics using Redshift ML and break silos using data sharing Learn best practices for security, monitoring, resilience, and disaster recovery Leverage Amazon Redshift integration with other AWS services to unlock additional value

Geospatial Analysis with SQL

"Geospatial Analysis with SQL" is a practical guide that teaches you how to use SQL for geospatial data analysis. With direct, actionable guidance, you will learn to explore and analyze data using geospatial techniques without needing additional programming. This book equips you with the knowledge to solve location-based queries and perform advanced geospatial operations. What this Book will help me do Master the fundamentals of geospatial analysis and learn the importance of location-based data. Develop skills in creating and manipulating spatial database objects in SQL. Gain proficiency in using tools such as PostGIS and QGIS for geospatial data analysis. Learn techniques to visualize spatial data effectively and communicate results. Perform both single-layer and multi-layer spatial analysis for complex real-world scenarios. Author(s) Bonny P. McClain, the author of "Geospatial Analysis with SQL", brings extensive experience as a spatial data analyst and GIS expert. Bonny specializes in helping practitioners make data-driven insights through geospatial techniques. With a passion for teaching, Bonny's goal is to make complex concepts accessible and practical for analysts and developers alike. Who is it for? This book is ideal for GIS analysts, data analysts, and data scientists who have a basic understanding of SQL and geospatial concepts and want to expand their analytical capabilities. Readers looking to perform professional-grade geospatial analysis using SQL will find this book especially valuable. It caters to professionals wishing to use their SQL skills to understand and work with spatial datasets effectively.

IBM SAN Volume Controller Best Practices and Performance Guidelines

This IBM® Redbooks® publication describes several of the preferred practices and describes the performance gains that can be achieved by implementing the IBM SAN Volume Controller powered by IBM Spectrum® Virtualize V8.4. These practices are based on field experience. This book highlights configuration guidelines and preferred practices for the storage area network (SAN) topology, clustered system, back-end storage, storage pools, and managed disks, volumes, Remote Copy services, and hosts. Then, it provides performance guidelines for IBM SAN Volume Controller, back-end storage, and applications. It explains how you can optimize disk performance with the IBM System Storage Easy Tier® function. It also provides preferred practices for monitoring, maintaining, and troubleshooting IBM SAN Volume Controller. This book is intended for experienced storage, SAN, and IBM SAN Volume Controller administrators and technicians. Understanding this book requires advanced knowledge of the IBM SAN Volume Controller, IBM FlashSystem, and SAN environments.

IBM SAN Volume Controller Best Practices and Performance Guidelines for IBM Spectrum Virtualize Version 8.4.2

This IBM® Redbooks® publication captures several of the preferred practices and describes the performance gains that can be achieved by implementing the IBM SAN Volume Controller powered by IBM Spectrum® Virtualize Version 8.4.2. These practices are based on field experience. This book highlights configuration guidelines and preferred practices for the storage area network (SAN) topology, clustered system, back-end storage, storage pools and managed disks, volumes, Remote Copy services and hosts. It explains how you can optimize disk performance with the IBM System Storage Easy Tier® function. It also provides preferred practices for monitoring, maintaining, and troubleshooting. This book is intended for experienced storage, SAN, IBM FlashSystem®, IBM SAN Volume Controller, and IBM Storwize® administrators and technicians. Understanding this book requires advanced knowledge of these environments.

Practical Implementation of a Data Lake: Translating Customer Expectations into Tangible Technical Goals

This book explains how to implement a data lake strategy, covering the technical and business challenges architects commonly face. It also illustrates how and why client requirements should drive architectural decisions. Drawing upon a specific case from his own experience, author Nayanjyoti Paul begins with the consideration from which all subsequent decisions should flow: what does your customer need? He also describes the importance of identifying key stakeholders and the key points to focus on when starting a new project. Next, he takes you through the business and technical requirement-gathering process, and how to translate customer expectations into tangible technical goals. From there, you’ll gain insight into the security model that will allow you to establish security and legal guardrails, as well as different aspects of security from the end user’s perspective. You’ll learn which organizational roles need to be onboarded into the data lake, their responsibilities, the services they need access to, and how the hierarchy of escalations should work. Subsequent chapters explore how to divide your data lakes into zones, organize data for security and access, manage data sensitivity, and techniques used for data obfuscation. Audit and logging capabilities in the data lake are also covered before a deep dive into designing data lakes to handle multiple kinds and file formats and access patterns. The book concludes by focusing on production operationalization and solutions to implement a production setup. After completing this book, you will understand how to implement a data lake, the best practices to employ while doing so, and will be armed with practical tips to solve business problems. What You Will Learn Understand the challenges associated with implementing a data lake Explore the architectural patterns and processes used to design a new data lake Design and implement data lake capabilities Associate business requirements with technical deliverables to drive success Who This Book Is For Data Scientists and Architects, Machine Learning Engineers, and Software Engineers.

Data Engineering and Data Science

DATA ENGINEERING and DATA SCIENCE Written and edited by one of the most prolific and well-known experts in the field and his team, this exciting new volume is the “one-stop shop” for the concepts and applications of data science and engineering for data scientists across many industries. The field of data science is incredibly broad, encompassing everything from cleaning data to deploying predictive models. However, it is rare for any single data scientist to be working across the spectrum day to day. Data scientists usually focus on a few areas and are complemented by a team of other scientists and analysts. Data engineering is also a broad field, but any individual data engineer doesn’t need to know the whole spectrum of skills. Data engineering is the aspect of data science that focuses on practical applications of data collection and analysis. For all the work that data scientists do to answer questions using large sets of information, there have to be mechanisms for collecting and validating that information. In this exciting new volume, the team of editors and contributors sketch the broad outlines of data engineering, then walk through more specific descriptions that illustrate specific data engineering roles. Data-driven discovery is revolutionizing the modeling, prediction, and control of complex systems. This book brings together machine learning, engineering mathematics, and mathematical physics to integrate modeling and control of dynamical systems with modern methods in data science. It highlights many of the recent advances in scientific computing that enable data-driven methods to be applied to a diverse range of complex systems, such as turbulence, the brain, climate, epidemiology, finance, robotics, and autonomy. Whether for the veteran engineer or scientist working in the field or laboratory, or the student or academic, this is a must-have for any library.

Database-Driven Web Development: Learn to Operate at a Professional Level with PERL and MySQL

This book will teach you the essential knowledge required to be a successful and productive web developer with the ability to produce cutting-edge websites utilizing a database. This updated edition starts with the fundamentals of web development before delving into Perl and MySQL concepts such as script and database modelling, script-driven database interactions, content generation from a database, and information delivery from the server to the browser and vice versa. The only skills required to get the most from this book are basic knowledge of how the Internet works and a novice skill level with Perl and MySQL. The rest is intuitively presented code that most people can quickly and easily understand and employ. An extensive selection of practical, fully functional programming constructs in six different programming languages will give you the knowledge and tools required to create eye-catching, capable, and functionally impressive database-driven websites. Author Thomas Valentine has taken the concepts presented in the first edition of this book to new heights, offering in-depth discussions of each area of functionality required to develop fully formed database-driven web applications. He has expanded on the examples presented in the first edition and has included some very interesting and useful programming techniques for your consideration. Upon completing this book, you’ll have gained the benefit of the author’s decades worth of experience and will be able to apply your new knowledge and skills to your own projects. What You Will Learn Install, configure and use a trio of software packages (Apache Web Server, MySQL Database Server, and Perl Scripting Server) Create an effective web development workstation with databases in mind Use the PERL scripting language and MySQL databases effectively Maximize the Apache Web Server Who This Book Is For Those who already know web development basics and web developers who want to master database-driven web development. The skills required to understand the concepts put forth in this book are a working knowledge of PERL and basic MySQL.