
Event

O'Reilly Data Engineering Books

2001-10-19 – 2027-05-25 · O'Reilly

Activities tracked

780

Collection of O'Reilly books on Data Engineering.

Filtering by: SQL

Sessions & talks

Showing 26–50 of 780 · Newest first

In-Memory Analytics with Apache Arrow - Second Edition

Dive into efficient data handling with 'In-Memory Analytics with Apache Arrow.' This book explores Apache Arrow, a powerful open-source project that revolutionizes how tabular and hierarchical data are processed. You'll learn to streamline data pipelines, accelerate analysis, and utilize high-performance tools for data exchange.

What this Book will help me do:
• Understand and utilize the Apache Arrow in-memory data format for your data analysis needs.
• Implement efficient and high-speed data pipelines using Arrow subprojects like Flight SQL and Acero.
• Enhance integration and performance in analysis workflows by using tools like Parquet and Snowflake with Arrow.
• Master chaining and reusing computations across languages and environments with Arrow's cross-language support.
• Apply Arrow in real-world scenarios by integrating it with analytics systems like Dremio and DuckDB.

Author(s): Matthew Topol, the author of this book, brings 15 years of technical expertise in data processing and analysis. Having worked across various environments and languages, Matthew offers insights into optimizing workflows using Apache Arrow. His approachable writing style ensures that complex topics are comprehensible.

Who is it for? This book is tailored for developers, data engineers, and data scientists eager to enhance their analytic toolset. Whether you're a beginner or have experience in data analysis, you'll find the concepts actionable and transformative. If you are curious about improving the performance and capabilities of your analytic pipelines or tools, this book is for you.
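
The blurb centers on Arrow's columnar in-memory format and its interchange with formats like Parquet. As a minimal, hedged sketch (not drawn from the book itself), the following Python snippet uses the pyarrow library to build an in-memory table, run a vectorized computation, and round-trip it through a Parquet file; the column names and file path are illustrative.

    import pyarrow as pa
    import pyarrow.compute as pc
    import pyarrow.parquet as pq

    # Build a columnar, in-memory Arrow table (illustrative columns).
    table = pa.table({
        "ride_id": [1, 2, 3, 4],
        "fare": [12.5, 7.0, 23.4, 9.9],
    })

    # Vectorized compute kernels operate directly on the columnar buffers.
    total_fare = pc.sum(table["fare"]).as_py()
    print(f"total fare: {total_fare}")

    # Round-trip through Parquet, a common on-disk companion to Arrow.
    pq.write_table(table, "rides.parquet")
    restored = pq.read_table("rides.parquet")
    print(restored.num_rows, "rows restored")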

Big Data on Kubernetes

Big Data on Kubernetes is your comprehensive guide to leveraging Kubernetes for scalable and efficient big data solutions. You will learn key concepts of Kubernetes architecture and explore tools like Apache Spark, Airflow, and Kafka. Gain hands-on experience building complete data pipelines to tackle real-world data challenges.

What this Book will help me do:
• Understand Kubernetes architecture and learn to deploy and manage clusters.
• Build and orchestrate big data pipelines using Spark, Airflow, and Kafka.
• Develop scalable and resilient data solutions with Docker and Kubernetes.
• Integrate and optimize data tools for real-time ingestion and processing.
• Apply concepts to hands-on projects addressing actual big data scenarios.

Author(s): Neylson Crepalde is an experienced data specialist with extensive knowledge of Kubernetes and big data solutions. With deep practical experience, Neylson brings real-world insights to his writing. His approach emphasizes actionable guidance and relatable problem-solving with a strong foundation in scalable architecture.

Who is it for? This book is ideal for data engineers, BI analysts, data team leaders, and tech managers familiar with Python, SQL, and YAML. Targeted at professionals seeking to develop or expand their expertise in scalable big data solutions, it provides practical insights into Docker, Kubernetes, and prominent big data tools.
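
Since the blurb highlights orchestrating pipelines with Airflow alongside Spark and Kafka, here is a minimal, illustrative Apache Airflow DAG in Python; it is not taken from the book, it assumes a recent Airflow 2.x installation, and the DAG id, schedule, and spark-submit command are placeholder assumptions.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Illustrative DAG: submit a Spark job, then trigger a downstream load step.
    with DAG(
        dag_id="daily_sales_pipeline",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        run_spark_job = BashOperator(
            task_id="run_spark_job",
            # Placeholder spark-submit call; image and cluster details vary by setup.
            bash_command="spark-submit --master k8s://https://kubernetes.default.svc my_job.py",
        )
        load_results = BashOperator(
            task_id="load_results",
            bash_command="echo 'load results into the warehouse'",
        )
        run_spark_job >> load_results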

Information Modeling and Relational Databases, 3rd Edition

Information Modeling and Relational Databases, Third Edition, provides an introduction to ORM (Object-Role Modeling) and much more. In fact, it is the only book to go beyond introductory coverage and provide all of the in-depth instruction you need to transform knowledge from domain experts into a sound database design. This book is intended for anyone with a stake in the accuracy and efficacy of databases: systems analysts, information modelers, database designers and administrators, and programmers. Dr. Terry Halpin and Dr. Tony Morgan, pioneers in the development of ORM, blend conceptual information with practical instruction that will let you begin using ORM effectively as soon as possible. The all-new Third Edition includes coverage of advances and improvements in ORM and UML, nominalization, relational mapping, SQL, XML, data interchange, NoSQL databases, ontological modeling, and post-relational databases. Supported by examples, exercises, and useful background information, the authors' step-by-step approach teaches you to develop a natural-language-based ORM model and then, where needed, abstract ER and UML models from it. This book will quickly make you proficient in the modeling technique that is proving vital to the development of accurate and efficient databases that best meet real business objectives.

"This book is an excellent introduction to both information modeling in ORM and relational databases. The book is very clearly written in a step-by-step manner and contains an abundance of well-chosen examples illuminating practice and theory in information modeling. I strongly recommend this book to anyone interested in conceptual modeling and databases." — Dr. Herman Balsters, Director of the Faculty of Industrial Engineering, University of Groningen, The Netherlands

• Presents the most in-depth coverage of object-role modeling, including a thorough update of the book for the latest versions of ORM, ER, UML, OWL, and BPMN modeling.
• Includes clear coverage of relational database concepts as well as the latest developments in SQL, XML, information modeling, data exchange, and schema transformation.
• Case studies and a large number of class-tested exercises are provided for many topics.
• Includes all-new chapters on data file formats and NoSQL databases.

High Performance PostgreSQL for Rails

Build faster, more reliable Rails apps by taking the best advanced PostgreSQL and Active Record capabilities and using them to solve your application scale and growth challenges. Gain the skills needed to comfortably work with multi-terabyte databases and with complex Active Record, SQL, and specialized indexes. Develop your skills with PostgreSQL on your laptop, then take them into production while keeping everything in sync. Make slow queries fast, perform any schema or data migration without errors, and use scaling techniques like read/write splitting, partitioning, and sharding to meet demanding workload requirements, from Internet-scale consumer apps to enterprise SaaS. Deepen your firsthand knowledge of high-scale PostgreSQL databases and Ruby on Rails applications with dozens of practical, hands-on exercises.

• Unlock the mysteries surrounding complex Active Record.
• Make any schema or data migration change confidently, without downtime.
• Grow your experience with modern and exclusive PostgreSQL features like SQL MERGE, RETURNING, and exclusion constraints.
• Put advanced capabilities like full-text search and publish/subscribe mechanisms built into PostgreSQL to work in your Rails apps.
• Improve the quality of the data in your database, using the advanced and extensible system of types and constraints to reduce and eliminate application bugs.
• Tackle complex topics like how to improve query performance using specialized indexes.
• Discover how to effectively use built-in database functions and write your own, administer replication, and make the most of partitioning and foreign data wrappers.
• Use more than 40 well-supported open source tools to extend and enhance PostgreSQL and Ruby on Rails.
• Gain invaluable insights into database administration by conducting advanced optimizations, including high-impact database maintenance, all while solving real-world operational challenges.

Take your new skills into production today, and then take your PostgreSQL and Rails applications to a whole new level of reliability and performance.

What You Need:
• A computer running macOS, Linux, or Windows with WSL2
• PostgreSQL version 16, installed by package manager, compiled, or running with Docker
• An internet connection

Databricks Certified Associate Developer for Apache Spark Using Python

This book serves as the ultimate preparation for aspiring Databricks Certified Associate Developers specializing in Apache Spark. Dive deep into Spark's components, its applications, and exam techniques to achieve certification and expand your practical skills in big data processing and real-time analytics using Python.

What this Book will help me do:
• Deeply understand Apache Spark's core architecture for building big data applications.
• Write optimized SQL queries and leverage the Spark DataFrame API for efficient data manipulation.
• Apply advanced Spark functions, including UDFs, to solve complex data engineering tasks.
• Use Spark Streaming capabilities to implement real-time and near-real-time processing solutions.
• Get hands-on preparation for the certification exam with mock tests and practice questions.

Author(s): Saba Shah is a seasoned data engineer with extensive experience working at Databricks and leading data science teams. With her in-depth knowledge of big data applications and Spark, she delivers clear, actionable insights in this book. Her approach emphasizes practical learning and real-world applications.

Who is it for? This book is ideal for data professionals such as engineers and analysts aiming to achieve Databricks certification. It is particularly helpful for individuals with moderate Python proficiency who are keen to understand Spark from scratch. If you're transitioning into big data roles, this guide prepares you comprehensively.
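
Because the blurb emphasizes the Spark DataFrame API, Spark SQL, and user-defined functions (UDFs), here is a small, hedged PySpark sketch (not from the book); the column names and data are invented for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-example").getOrCreate()

    # Illustrative DataFrame.
    df = spark.createDataFrame(
        [(1, 120.0), (2, 35.5), (3, 80.0)],
        ["order_id", "amount"],
    )

    # A simple Python UDF that buckets order amounts.
    @udf(returnType=StringType())
    def amount_bucket(amount):
        return "large" if amount >= 100 else "small"

    # DataFrame API transformation plus an equivalent SQL query on a temp view.
    labeled = df.withColumn("bucket", amount_bucket(col("amount")))
    labeled.createOrReplaceTempView("orders")
    spark.sql("SELECT bucket, COUNT(*) AS n FROM orders GROUP BY bucket").show()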

Data Engineering with Databricks Cookbook

In "Data Engineering with Databricks Cookbook," you'll learn how to efficiently build and manage data pipelines using Apache Spark, Delta Lake, and Databricks. This recipe-based guide offers techniques to transform, optimize, and orchestrate your data workflows. What this Book will help me do Master Apache Spark for data ingestion, transformation, and analysis. Learn to optimize data processing and improve query performance with Delta Lake. Manage streaming data processing with Spark Structured Streaming capabilities. Implement DataOps and DevOps workflows tailored for Databricks. Enforce data governance policies using Unity Catalog for scalable solutions. Author(s) Pulkit Chadha, the author of this book, is a Senior Solutions Architect at Databricks. With extensive experience in data engineering and big data applications, he brings practical insights into implementing modern data solutions. His educational writings focus on empowering data professionals with actionable knowledge. Who is it for? This book is ideal for data engineers, data scientists, and analysts who want to deepen their knowledge in managing and transforming large datasets. Readers should have an intermediate understanding of SQL, Python programming, and basic data architecture concepts. It is especially well-suited for professionals working with Databricks or similar cloud-based data platforms.

The Ultimate Guide to Snowpark

The Ultimate Guide to Snowpark serves as a comprehensive resource to help you master the Snowflake Snowpark framework using Python. You'll learn how to manage data engineering, data science, and data applications in Snowpark, coupled with practical implementations and examples. By following this guide, you'll gain the skills needed to efficiently process and analyze data in the Snowflake Data Cloud.

What this Book will help me do:
• Master Snowpark with Python for data engineering, data science, and data application workloads.
• Develop and deploy robust data pipelines using Snowpark in Python.
• Design, implement, and produce machine learning models using Snowpark.
• Learn to monetize and operationalize Snowflake-native applications.
• Effectively adopt Snowpark in production for scalable, efficient data solutions.

Author(s): Shankar Narayanan SGS and Vivekanandan SS are experienced professionals in data engineering and Snowflake technologies. Shankar has extensive experience in utilizing Snowflake Snowpark to manage and enhance data solutions. Vivekanandan brings expertise in the intersection of Python programming and cloud-based data processing. Together, their combined knowledge and approachable writing style make this book an invaluable resource to readers.

Who is it for? This book is designed for data engineers, data scientists, developers, and seasoned data practitioners. Ideal candidates are those looking to expand their skills in implementing Snowpark solutions using Python. A prior understanding of SQL, Python programming, and familiarity with Snowflake is beneficial for readers to fully leverage the techniques presented.
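
To make the Snowpark-with-Python theme concrete, here is a minimal, hedged sketch (not from the book) using the snowflake-snowpark-python package; the connection parameters and table name are placeholders you would replace with your own Snowflake account details.

    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import col

    # Placeholder connection parameters for an existing Snowflake account.
    connection_parameters = {
        "account": "<account_identifier>",
        "user": "<user>",
        "password": "<password>",
        "warehouse": "<warehouse>",
        "database": "<database>",
        "schema": "<schema>",
    }

    session = Session.builder.configs(connection_parameters).create()

    # Lazily build a query against a hypothetical ORDERS table and pull results.
    orders = session.table("ORDERS")
    large_orders = orders.filter(col("AMOUNT") > 1000).select("ORDER_ID", "AMOUNT")
    large_orders.show()

    session.close()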

Concept Of Database Management System by Pearson

Concepts of Database Management System is designed to meet the syllabus requirements of undergraduate students of computer applications and computer science. It describes the concepts in easy-to-understand language with a sufficient number of examples. An overview of emerging trends in databases is thoroughly explained. A brief introduction to PL/SQL, MS Access, and Oracle is included to help students get a flavor of different types of database management systems.

Database Management Systems by Pearson

Express Learning is a series of books designed as quick reference guides to important undergraduate computer courses. The organized and accessible format of these books allows students to learn important concepts in an easy-to-understand, question-and-answer format. These portable learning tools have been designed as one-stop references for students to understand and master the subjects by themselves.

Features –

• Designed as a student-friendly self-learning guide. The book is written in a clear, concise, and lucid manner.
• Easy-to-understand question-and-answer format.
• Includes previously asked as well as new questions organized in chapters.
• All types of questions, including MCQs, short questions, and long questions, are covered.
• Solutions to numerical questions asked at examinations are provided.
• All ideas and concepts are presented with clear examples.
• Text is well structured and well supported with suitable diagrams.
• Inter-chapter dependencies are kept to a minimum.

Book Contents –

1: Database System
2: Conceptual Modelling
3: Relational Model
4: Relational Algebra and Calculus
5: Structured Query Language
6: Relational Database Design
7: Data Storage and Indexing
8: Query Processing and Optimization
9: Introduction to Transaction Processing
10: Concurrency Control Techniques
11: Database Recovery System
12: Database Security
13: Database System Architecture
14: Data Warehousing, OLAP, and Data Mining
15: Information Retrieval
16: Miscellaneous Questions

Learn SQL using MySQL in One Day and Learn It Well

"Learn SQL using MySQL in One Day and Learn It Well" is your hands-on guide to mastering SQL efficiently using MySQL. This book takes you from understanding basic database concepts to executing advanced queries and implementing essential features like triggers and routines. With a project-based approach, you will confidently manage databases and unlock the potential of data. What this Book will help me do Understand database concepts and relational data architecture. Design and define tables to organize and store data effectively. Perform advanced SQL queries to manipulate and analyze data efficiently. Implement database triggers, views, and routines for advanced management. Apply practical skills in SQL through a comprehensive hands-on project. Author(s) Jamie Chan is a professional instructor and technical writer with extensive experience in database management and software development. Known for a clear and engaging teaching style, Jamie has authored numerous books focusing on hands-on learning. Jamie approaches pedagogy with the goal of making technical subjects accessible and practical for all learners. Who is it for? This book is designed for beginners eager to learn SQL and MySQL from scratch. It is perfect for professionals or students who want relevant and actionable skills in database management. Whether you're looking to enhance career prospects or leverage database tools for personal projects, this book is your practical starting point. Basic computer literacy is all that's needed.

Azure Data Factory by Example: Practical Implementation for Data Engineers

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft's cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components.

This edition, updated for 2024, includes the latest developments to the Azure Data Factory service:
• Enhancements to existing pipeline activities such as Execute Pipeline, along with the introduction of new activities such as Script, and activities designed specifically to interact with Azure Synapse Analytics.
• Improvements to flow control provided by activity deactivation and the Fail activity.
• The introduction of reusable data flow components such as user-defined functions and flowlets.
• Extensions to integration runtime capabilities, including Managed VNet support.
• The ability to trigger pipelines in response to custom events.
• Tools for implementing boilerplate processes such as change data capture and metadata-driven data copying.

What You Will Learn:
• Create pipelines, activities, datasets, and linked services
• Build reusable components using variables, parameters, and expressions
• Move data into and around Azure services automatically
• Transform data natively using ADF data flows and Power Query data wrangling
• Master flow-of-control and triggers for tightly orchestrated pipeline execution
• Publish and monitor pipelines easily and with confidence

Who This Book Is For: Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft's Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations.

Learn T-SQL Querying - Second Edition

Troubleshoot query performance issues, identify anti-patterns in your code, and write efficient T-SQL queries with this guide for T-SQL developers.

Key Features:
• A definitive guide to mastering the techniques of writing efficient T-SQL code
• Learn query optimization fundamentals, query analysis, and how query structure impacts performance
• Discover insightful solutions to detect, analyze, and tune query performance issues
• Purchase of the print or Kindle book includes a free PDF eBook

Book Description: Data professionals seeking to excel in Transact-SQL for Microsoft SQL Server and Azure SQL Database often lack comprehensive resources. Learn T-SQL Querying, Second Edition focuses on indexing queries and crafting elegant T-SQL code, enabling data professionals to gain mastery of modern SQL Server versions (2022) and Azure SQL Database. The book covers new topics like logical statement processing flow, data access using indexes, and best practices for tuning T-SQL queries. Starting with query processing fundamentals, the book lays a foundation for writing performant T-SQL queries. You'll explore the mechanics of the Query Optimizer and Query Execution Plans, learning to analyze execution plans for insights into current performance and scalability. Using dynamic management views (DMVs) and dynamic management functions (DMFs), you'll build diagnostic queries. The book covers indexing and delves into SQL Server's built-in tools to expedite the resolution of T-SQL query performance and scalability issues. Hands-on examples will guide you to avoid UDF pitfalls and understand features like predicate SARGability, Query Store, and Query Tuning Assistant. By the end of this book, you'll have developed the ability to identify query performance bottlenecks, recognize anti-patterns, and avoid pitfalls.

What you will learn:
• Identify opportunities to write well-formed T-SQL statements
• Familiarize yourself with the Cardinality Estimator for query optimization
• Create efficient indexes for your existing workloads
• Implement best practices for T-SQL querying
• Explore Query Execution Dynamic Management Views
• Utilize the latest performance optimization features in SQL Server 2017, 2019, and 2022
• Safeguard query performance during upgrades to newer versions of SQL Server

Who this book is for: This book is for database administrators, database developers, data analysts, data scientists, and T-SQL practitioners who want to master the art of writing efficient T-SQL code and troubleshooting query performance issues through practical examples. A basic understanding of T-SQL syntax, writing queries in SQL Server, and using the SQL Server Management Studio tool will be helpful to get started.
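
Because the book leans heavily on dynamic management views (DMVs) for diagnosing query performance, here is a small, hedged Python sketch (not from the book) that uses pyodbc to pull the most CPU-hungry cached statements from sys.dm_exec_query_stats; the connection string is a placeholder for your own SQL Server instance.

    import pyodbc

    # Placeholder connection string; adjust driver, server, and credentials.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
        "DATABASE=master;Trusted_Connection=yes;TrustServerCertificate=yes;"
    )

    # Top cached statements by total CPU time, joined to their query text.
    diagnostic_sql = """
    SELECT TOP (5)
           qs.total_worker_time AS total_cpu_time,
           qs.execution_count,
           st.text AS query_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_worker_time DESC;
    """

    cursor = conn.cursor()
    for cpu_time, executions, text in cursor.execute(diagnostic_sql):
        print(cpu_time, executions, text[:80])

    conn.close()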

PostgreSQL Query Optimization: The Ultimate Guide to Building Efficient Queries

Write optimized queries. This book helps you write queries that perform fast and deliver results on time. You will learn that query optimization is not a dark art practiced by a small, secretive cabal of sorcerers. Any motivated professional can learn to write efficient queries from the get-go and capably optimize existing queries. You will learn to look at the process of writing a query from the database engine's point of view, and know how to think like the database optimizer.

The book begins with a discussion of what a performant system is and progresses to measuring performance and setting performance goals. It introduces different classes of queries and optimization techniques suitable to each, such as the use of indexes and specific join algorithms. You will learn to read and understand query execution plans, along with techniques for influencing those plans for better performance. The book also covers advanced topics such as the use of functions and procedures, dynamic SQL, and generated queries. All of these techniques are then used together to produce performant applications, avoiding the pitfalls of object-relational mappers.

This second edition includes new examples using Postgres 15 and the newest version of the PostgresAir database. It includes additional details and clarifications about advanced topics, and covers configuration parameters in greater depth. Finally, it makes use of advancements in NORM, using automatically generated functions.

What You Will Learn:
• Identify optimization goals in OLTP and OLAP systems
• Read and understand PostgreSQL execution plans
• Distinguish between short queries and long queries
• Choose the right optimization technique for each query type
• Identify indexes that will improve query performance
• Optimize full table scans
• Avoid the pitfalls of object-relational mapping systems
• Optimize the entire application rather than just database queries

Who This Book Is For: IT professionals working in PostgreSQL who want to develop performant and scalable applications, anyone whose job title contains the words "database developer" or "database administrator" or who is a backend developer charged with programming database calls, and system architects involved in the overall design of application systems running against a PostgreSQL database.
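
Since the book revolves around reading and influencing PostgreSQL execution plans, below is a brief, hedged Python sketch (not from the book) that uses psycopg2 to fetch a query's actual plan with EXPLAIN (ANALYZE, FORMAT JSON); the DSN and the table and column names are illustrative placeholders.

    import json

    import psycopg2

    # Placeholder connection; point this at your own database.
    conn = psycopg2.connect("dbname=postgres_air user=postgres host=localhost")
    cur = conn.cursor()

    # Ask PostgreSQL to execute the query and report the actual plan as JSON.
    cur.execute(
        "EXPLAIN (ANALYZE, FORMAT JSON) "
        "SELECT * FROM booking WHERE booking_id = %s",  # hypothetical table
        (123,),
    )
    (plan_raw,) = cur.fetchone()
    plan = plan_raw if isinstance(plan_raw, list) else json.loads(plan_raw)

    root = plan[0]["Plan"]
    print(root["Node Type"], root["Actual Total Time"])

    cur.close()
    conn.close()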

PostgreSQL 16 Administration Cookbook

This cookbook is a comprehensive guide to mastering PostgreSQL 16 database administration. With over 180 practical recipes, this book covers everything from query performance and backup strategies to replication and high availability. You'll gain hands-on expertise in solving real-world challenges while leveraging the new and improved features of PostgreSQL 16.

What this Book will help me do:
• Perform efficient batch processing with Postgres' SQL MERGE statement.
• Implement parallel transaction processes using logical replication.
• Enhance database backups and recovery with advanced compression techniques.
• Monitor and fine-tune database performance for optimal operation.
• Apply new PostgreSQL 16 features for secure and reliable databases.

Author(s): The team of authors, including Gianni Ciolli, Boriss Mejías, Jimmy Angelakos, Vibhor Kumar, and Simon Riggs, bring years of experience in PostgreSQL database management and development. Their expertise spans professional system administration, academic research, and contributions to PostgreSQL development. Their collaborative insights enrich this comprehensive guide.

Who is it for? This book is ideal for PostgreSQL database administrators seeking advanced techniques, data architects managing PostgreSQL in production, and developers interested in mastering PostgreSQL 16. Whether you're an experienced DBA upgrading to PostgreSQL 16 or a newcomer looking for practical recipes, this book provides valuable strategies and solutions.
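
The first bullet above mentions the SQL MERGE statement, available since PostgreSQL 15. As a hedged, minimal sketch that is not taken from the cookbook, the snippet below issues a MERGE upsert from Python with psycopg2; the staging and target table names are hypothetical.

    import psycopg2

    conn = psycopg2.connect("dbname=appdb user=postgres host=localhost")  # placeholder DSN
    cur = conn.cursor()

    # Merge a staging table into a target table: update matches, insert the rest.
    cur.execute("""
        MERGE INTO customer AS t                -- hypothetical target table
        USING customer_staging AS s             -- hypothetical staging table
        ON t.customer_id = s.customer_id
        WHEN MATCHED THEN
            UPDATE SET email = s.email
        WHEN NOT MATCHED THEN
            INSERT (customer_id, email) VALUES (s.customer_id, s.email);
    """)
    conn.commit()

    cur.close()
    conn.close()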

Data Exploration and Preparation with BigQuery

In "Data Exploration and Preparation with BigQuery," Michael Kahn provides a hands-on guide to understanding and utilizing Google's powerful data warehouse solution, BigQuery. This comprehensive book equips you with the skills needed to clean, transform, and analyze large datasets for actionable business insights. What this Book will help me do Master the process of exploring and assessing the quality of datasets. Learn SQL for performing efficient and advanced data transformations in BigQuery. Optimize the performance of BigQuery queries for speed and cost-effectiveness. Discover best practices for setting up and managing BigQuery resources. Apply real-world case studies to analyze data and derive meaningful insights. Author(s) Michael Kahn is an experienced data engineer and author specializing in big data solutions and technologies. With years of hands-on experience working with Google Cloud Platform and BigQuery, he has assisted organizations in optimizing their data pipelines for effective decision-making. His accessible writing style ensures complex topics become approachable, enabling readers of various skill levels to succeed. Who is it for? This book is tailored for data analysts, data engineers, and data scientists who want to learn how to effectively use BigQuery for data exploration and preparation. Whether you're new to BigQuery or looking to deepen your expertise in working with large datasets, this book provides clear guidance and practical examples to achieve your goals.

Cracking the Data Engineering Interview

"Cracking the Data Engineering Interview" is your essential guide to mastering the data engineering interview process. This book offers practical insights and techniques to build your resume, refine your skills in Python, SQL, data modeling, and ETL, and confidently tackle over 100 mock interview questions. Gain the knowledge and confidence to land your dream role in data engineering. What this Book will help me do Craft a compelling data engineering portfolio to stand out to employers. Refresh and deepen understanding of essential topics like Python, SQL, and ETL. Master over 100 interview questions that cover both technical and behavioral aspects. Understand data engineering concepts such as data modeling, security, and CI/CD. Develop negotiation, networking, and personal branding skills crucial for job applications. Author(s) None Bryan and None Ransome are seasoned authors with a wealth of experience in data engineering and professional development. Drawing from their extensive industry backgrounds, they provide actionable strategies for aspiring data engineers. Their approachable writing style and real-world insights make complex topics accessible to readers. Who is it for? This book is ideal for aspiring data engineers looking to navigate the job application process effectively. Readers should be familiar with data engineering fundamentals, including Python, SQL, cloud data platforms, and ETL processes. It's tailored for professionals aiming to enhance their portfolios, tackle challenging interviews, and boost their chances of landing a data engineering role.

Learn PostgreSQL - Second Edition

Learn PostgreSQL, a comprehensive guide to mastering PostgreSQL 16, takes readers on a journey from the fundamentals to advanced concepts, such as replication and database optimization. With hands-on exercises and practical examples, this book provides all you need to confidently use, manage, and build secure and scalable databases.

What this Book will help me do:
• Master the essentials of PostgreSQL 16, including advanced SQL features and performance tuning.
• Understand database replication methods and manage a scalable architecture.
• Enhance database security through roles, schemas, and strict privilege management.
• Learn how to personalize your experience with custom extensions and functions.
• Acquire practical skills in backup, restoration, and disaster recovery planning.

Author(s): Luca Ferrari and Enrico Pirozzi are experienced database engineers and PostgreSQL enthusiasts with years of experience using and teaching PostgreSQL technology. They specialize in creating learning content that is practical and focused on real-world situations. Their writing emphasizes clarity and systematically equips readers with professional skills.

Who is it for? This book is perfect for database professionals, software developers, and system administrators looking to develop their PostgreSQL expertise. Beginners with an interest in databases will also find this book highly approachable. Ideal for readers seeking to improve their database scalability and robustness. If you aim to hone practical PostgreSQL skills, this guide is essential.

Geospatial Analysis with SQL

"Geospatial Analysis with SQL" is a practical guide that teaches you how to use SQL for geospatial data analysis. With direct, actionable guidance, you will learn to explore and analyze data using geospatial techniques without needing additional programming. This book equips you with the knowledge to solve location-based queries and perform advanced geospatial operations. What this Book will help me do Master the fundamentals of geospatial analysis and learn the importance of location-based data. Develop skills in creating and manipulating spatial database objects in SQL. Gain proficiency in using tools such as PostGIS and QGIS for geospatial data analysis. Learn techniques to visualize spatial data effectively and communicate results. Perform both single-layer and multi-layer spatial analysis for complex real-world scenarios. Author(s) Bonny P. McClain, the author of "Geospatial Analysis with SQL", brings extensive experience as a spatial data analyst and GIS expert. Bonny specializes in helping practitioners make data-driven insights through geospatial techniques. With a passion for teaching, Bonny's goal is to make complex concepts accessible and practical for analysts and developers alike. Who is it for? This book is ideal for GIS analysts, data analysts, and data scientists who have a basic understanding of SQL and geospatial concepts and want to expand their analytical capabilities. Readers looking to perform professional-grade geospatial analysis using SQL will find this book especially valuable. It caters to professionals wishing to use their SQL skills to understand and work with spatial datasets effectively.

Learning and Operating Presto

The Presto community has mushroomed since its origins at Facebook in 2012, but ramping up this open source distributed SQL query engine can be challenging even for the most experienced engineers. With this practical book, data engineers and architects, platform engineers, cloud engineers, and software engineers will learn how to operate Presto at their organizations and derive insights on datasets wherever they reside. Authors Angelica Lo Duca, Tim Meehan, Vivek Bharathan, and Ying Su explain what Presto is, where it came from, and how it differs from other data warehousing solutions. You'll discover why Facebook, Uber, Alibaba Cloud, Hewlett Packard Enterprise, IBM, Intel, and many more use Presto, and how you can quickly deploy Presto in production.

With this book, you will:
• Learn how to install and configure Presto
• Use Presto with business intelligence tools
• Understand how to connect Presto to a variety of data sources
• Extend Presto for real-time business insight
• Learn how to apply best practices and tuning
• Get troubleshooting tips for logs, error messages, and more
• Explore Presto's architectural concepts and usage patterns
• Understand Presto security and administration
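
For a taste of querying Presto from Python, here is a minimal, hedged sketch (not from the book) using the presto-python-client package, imported as prestodb; the coordinator host, catalog, and schema are placeholders for your own deployment.

    import prestodb

    # Placeholder coordinator details; adjust host, port, catalog, and schema to your cluster.
    conn = prestodb.dbapi.connect(
        host="presto-coordinator.example.com",
        port=8080,
        user="analyst",
        catalog="hive",
        schema="default",
    )

    cur = conn.cursor()
    cur.execute("SELECT table_name FROM information_schema.tables LIMIT 10")
    for (table_name,) in cur.fetchall():
        print(table_name)

    conn.close()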

Leveling Up with SQL: Advanced Techniques for Transforming Data into Insights

Learn to write SQL queries to select and analyze data, and improve your ability to manipulate data. This book will help you take your existing skills to the next level. Author Mark Simon kicks things off with a quick review of basic SQL knowledge, followed by a demonstration of how efficient SQL databases are designed and how to extract just the right data from them. You'll then learn about each individual table's structure and how to work with the relationships between tables. As you progress through the book, you will learn more sophisticated techniques, such as using common table expressions and subqueries, analyzing your data using aggregate and windowing functions, and saving queries in the form of views and other methods. This book employs an accessible approach to work through a realistic sample, enabling you to learn concepts as they arise to improve parts of the database or to work with the data itself. After completing this book, you will have a more thorough understanding of database structure and how to use advanced techniques to extract, manage, and analyze data.

What You Will Learn:
• Gain a stronger understanding of database design principles, especially individual tables
• Understand the relationships between tables
• Utilize techniques such as views, subqueries, common table expressions, and windowing functions

Who Is This Book For: SQL database users who want to improve their knowledge and techniques.
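
Because common table expressions and window functions are central to the book, here is a self-contained, hedged Python sketch (not from the book) using the standard-library sqlite3 module; window functions require SQLite 3.25 or newer, and the sample table and data are invented.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sale (region TEXT, amount REAL);
        INSERT INTO sale VALUES
            ('north', 100), ('north', 250), ('south', 75), ('south', 300);
    """)

    # A CTE feeding a window function: rank each sale within its region.
    query = """
        WITH regional AS (
            SELECT region, amount FROM sale
        )
        SELECT region,
               amount,
               RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
        FROM regional
        ORDER BY region, rnk;
    """

    for row in conn.execute(query):
        print(row)

    conn.close()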

Serverless Machine Learning with Amazon Redshift ML

Serverless Machine Learning with Amazon Redshift ML provides a hands-on guide to using Amazon Redshift Serverless and Redshift ML for building and deploying machine learning models. Through SQL-focused examples and practical walkthroughs, you will learn efficient techniques for cloud data analytics and serverless machine learning.

What this Book will help me do:
• Grasp the workflow of building machine learning models with Redshift ML using SQL.
• Learn to handle supervised learning tasks like classification and regression.
• Apply unsupervised learning techniques, such as K-means clustering, in Redshift ML.
• Develop time-series forecasting models within Amazon Redshift.
• Understand how to operationalize machine learning in serverless cloud architecture.

Author(s): Debu Panda, Phil Bates, Bhanu Pittampally, and Sumeet Joshi are seasoned professionals in cloud computing and machine learning technologies. They combine deep technical knowledge with teaching expertise to guide learners through mastering Amazon Redshift ML. Their collaborative approach ensures that the content is accessible, engaging, and practically applicable.

Who is it for? This book is perfect for data scientists, machine learning engineers, and database administrators using or intending to use Amazon Redshift. It's tailored for professionals with basic knowledge of machine learning and SQL who aim to enhance their efficiency and specialize in serverless machine learning within cloud architectures.
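
To illustrate the SQL-first workflow the blurb describes, here is a hedged sketch (not from the book) that uses the redshift_connector Python package to issue a Redshift ML CREATE MODEL statement for a classification task; the endpoint, IAM role, S3 bucket, and table and column names are all placeholders.

    import redshift_connector

    # Placeholder connection details for a Redshift or Redshift Serverless endpoint.
    conn = redshift_connector.connect(
        host="my-workgroup.123456789012.us-east-1.redshift-serverless.amazonaws.com",
        database="dev",
        user="awsuser",
        password="<password>",
    )
    cur = conn.cursor()

    # Train a model with Redshift ML; by default SageMaker Autopilot picks the algorithm.
    cur.execute("""
        CREATE MODEL customer_churn_model
        FROM (SELECT age, tenure_months, monthly_spend, churned FROM customer_activity)
        TARGET churned
        FUNCTION predict_customer_churn
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
        SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');
    """)
    conn.commit()
    conn.close()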

High-Performance Data Architectures

By choosing the right database, you can maximize your business potential, improve performance, increase efficiency, and gain a competitive edge. This insightful report examines the benefits of using a simplified data architecture built on cloud-based HTAP (hybrid transactional and analytical processing) database capabilities. You'll learn how this data architecture can help data engineers and data decision makers focus on what matters most: growing your business. Authors Joe McKendrick and Ed Huang explain how cloud native infrastructure supports enterprise businesses and operations with a much more agile foundation. Just one layer up from the infrastructure, cloud-based databases are a crucial part of data management and analytics. Learn how distributed SQL databases with HTAP capabilities provide more efficient and streamlined data processing to improve cost efficiency and expedite business operations and decision making.

This report helps you:
• Explore industry trends in database development
• Learn the benefits of a simplified data architecture
• Comb through the complex and crowded database choices on the market
• Examine the process of selecting the right database for your business
• Learn about the latest database innovations for improving your company's efficiency and performance

Data Engineering with dbt

Data Engineering with dbt provides a comprehensive guide to building modern, reliable data platforms using dbt and SQL. You'll gain hands-on experience building automated ELT pipelines, using dbt Cloud with Snowflake, and embracing patterns for scalable and maintainable data solutions.

What this Book will help me do:
• Set up and manage a dbt Cloud environment and create reliable ELT pipelines.
• Integrate Snowflake with dbt to implement robust data engineering workflows.
• Transform raw data into analytics-ready data using dbt's features and SQL.
• Apply advanced dbt functionality such as macros and Jinja for efficient coding.
• Ensure data accuracy and platform reliability with built-in testing and monitoring.

Author(s): Zagni is a seasoned data engineering professional with a wealth of experience in designing scalable data platforms. Through practical insights and real-world applications, Zagni demystifies complex data engineering practices. Their approachable teaching style makes technical concepts accessible and actionable.

Who is it for? This book is perfect for data engineers, analysts, and analytics engineers looking to leverage dbt for data platform development. If you're a manager or decision maker interested in fostering efficient data workflows or a professional with basic SQL knowledge aiming to deepen your expertise, this resource will be invaluable.
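
dbt models are ordinarily SQL files templated with Jinja; to stay consistent with the Python examples in this listing, here is a hedged sketch of a dbt Python model (supported on warehouses such as Snowflake, Databricks, and BigQuery in recent dbt versions), not taken from the book; the referenced staging model and its columns are hypothetical.

    # models/orders_daily.py -- a hypothetical dbt Python model.
    def model(dbt, session):
        # Materialize the result as a table in the warehouse.
        dbt.config(materialized="table")

        # dbt.ref() returns the upstream model as a DataFrame
        # (Snowpark, PySpark, or pandas, depending on the adapter).
        orders = dbt.ref("stg_orders")

        # Count orders per day; the groupBy/count calls shown are PySpark-style
        # and would differ slightly on other adapters.
        daily = orders.groupBy("order_date").count()

        return daily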

Data Modeling with Snowflake

This comprehensive guide, "Data Modeling with Snowflake", is your go-to resource for mastering the art of efficient data modeling tailored to the capabilities of the Snowflake Data Cloud. In this book, you will learn how to design agile and scalable data solutions by effectively leveraging Snowflake's unique architecture and advanced features. What this Book will help me do Understand the core principles of data modeling and how they apply to Snowflake's cloud-native environment. Learn to use Snowflake's features, such as time travel and zero-copy cloning, to create efficient data solutions. Gain hands-on experience with SQL recipes that outline practical approaches to transforming and managing Snowflake data. Discover techniques for modeling structured and semi-structured data for real-world business needs. Learn to integrate universal modeling frameworks like Star Schema and Data Vault into Snowflake implementations for scalability and maintainability. Author(s) The author, Serge Gershkovich, is a seasoned expert in database design and Snowflake architecture. With years of experience in the data management field, Serge has dedicated himself to making complex technical subjects approachable to professionals at all levels. His insights in this book are informed by practical applications and real-world experience. Who is it for? This book is targeted at data professionals, ranging from newcomers to database design to seasoned SQL developers seeking to specialize in Snowflake. If you are looking to understand and apply data modeling practices effectively within Snowflake's architecture, this book is for you. Whether you're refining your modeling skills or getting started with Snowflake, it provides the practical knowledge you need to succeed.

MySQL Crash Course

MySQL Crash Course is a fast-paced, no-nonsense introduction to relational database development. It's filled with practical examples and expert advice that will have you up and running quickly. You'll learn the basics of SQL, how to create a database, craft SQL queries to extract data, and work with events, procedures, and functions. You'll see how to add constraints to tables to enforce rules about permitted data and use indexes to accelerate data retrieval. You'll even explore how to call MySQL from PHP, Python, and Java. Three final projects will show you how to build a weather database from scratch, use triggers to prevent errors in an election database, and use views to protect sensitive data in a salary database.

You'll also learn how to:
• Query database tables for specific information, order the results, comment SQL code, and deal with null values
• Define table columns to hold strings, integers, and dates, and determine what data types to use
• Join multiple database tables as well as use temporary tables, common table expressions, derived tables, and subqueries
• Add, change, and remove data from tables, create views based on specific queries, write reusable stored routines, and automate and schedule events

The perfect quick-start resource for database developers, MySQL Crash Course will arm you with the tools you need to build and manage fast, powerful, and secure MySQL-based data storage systems.
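
As a small, hedged companion to the views-and-queries material described above (not an example from the book), the Python sketch below uses the mysql-connector-python package to create a view that hides a sensitive salary column and then queries it; the connection details, table, and columns are placeholders.

    import mysql.connector

    # Placeholder connection details for a running MySQL server.
    conn = mysql.connector.connect(
        host="localhost", user="app", password="<password>", database="company"
    )
    cur = conn.cursor()

    # A view exposing only non-sensitive columns of a hypothetical employee table.
    cur.execute("""
        CREATE OR REPLACE VIEW employee_public AS
        SELECT employee_id, name, department
        FROM employee
    """)

    cur.execute("SELECT name, department FROM employee_public ORDER BY name")
    for name, department in cur.fetchall():
        print(name, department)

    cur.close()
    conn.close()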