SQL

Data Engineering with Apache Spark, Delta Lake, and Lakehouse

2021-10-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Manoj Kukreja

Analytics Big Data Data Engineering Data Lakehouse Data Science Delta Python Spark apache-spark data data-engineering

Data Engineering with Apache Spark, Delta Lake, and Lakehouse is a comprehensive guide packed with practical knowledge for building robust and scalable data pipelines. Throughout this book, you will explore the core concepts and applications of Apache Spark and Delta Lake, and learn how to design and implement efficient data engineering workflows using real-world examples.

What this book will help you do:
- Master the core concepts and components of Apache Spark and Delta Lake.
- Create scalable and secure data pipelines for efficient data processing.
- Learn best practices and patterns for building enterprise-grade data lakes.
- Discover how to operationalize data models into production-ready pipelines.
- Gain insights into deploying and monitoring data pipelines effectively.

Author(s): Manoj Kukreja is a seasoned data engineer with over a decade of experience working with big data platforms. He specializes in implementing efficient and scalable data solutions to meet the demands of modern analytics and data science. Writing with clarity and a practical approach, he aims to provide actionable insights that professionals can apply to their projects.

Who is it for? This book is tailored for aspiring data engineers and data analysts who wish to delve deeper into building scalable data platforms. It is suitable for readers with basic knowledge of Python, Spark, and SQL who are seeking to learn Delta Lake and advanced data engineering concepts. Readers should be eager to develop practical skills for tackling real-world data engineering challenges.

IBM Spectrum Protect Plus Protecting Database Applications

2021-10-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Julien Sauvanet , Kenneth Salerno , Markus Fehling

IBM Microsoft MongoDB Oracle SQL Server data data-engineering

IBM® Spectrum Protect Plus is a data protection solution that provides near-instant recovery, replication, retention management, and reuse for virtual machines, databases, and application backups in hybrid multicloud environments. This IBM Redpaper publication focuses on protecting database applications. IBM Spectrum® Protect Plus supports backup, restore, and data reuse for multiple databases, such as Oracle, IBM Db2®, MongoDB, Microsoft Exchange, and Microsoft SQL Server. Although other IBM Spectrum Protect Plus features focus on virtual environments, the database and application support of IBM Spectrum Protect Plus includes databases on both virtual and physical servers.

Azure Databricks Cookbook

2021-09-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Phani Raj , Vinod Jaiswal

Analytics Azure Big Data CI/CD Cosmos Databricks Delta Cyber Security Spark Data Streaming Synapse apache-spark

Azure Databricks is a robust analytics platform that leverages Apache Spark and integrates seamlessly with Azure services. In the Azure Databricks Cookbook, you'll find hands-on recipes to ingest data, build modern data pipelines, and perform real-time analytics while learning to optimize and secure your solutions.

What this book will help you do:
- Design advanced data workflows integrating Azure Synapse, Cosmos DB, and streaming sources with Databricks.
- Gain proficiency in using Delta tables and Spark for efficient data storage and analysis.
- Learn to create, deploy, and manage real-time dashboards with Databricks SQL.
- Master CI/CD pipelines for automating deployments of Databricks solutions.
- Understand security best practices for restricting access to and monitoring Azure Databricks.

Author(s): Phani Raj and Vinod Jaiswal are experienced professionals in the field of big data and analytics. They are well-versed in implementing Azure Databricks solutions for real-world problems. Their collaborative writing approach ensures clarity and a practical focus.

Who is it for? This book is tailored for data engineers, data scientists, and big data professionals who want to apply Azure Databricks and Apache Spark to their analytics workflows. Basic familiarity with Spark and Azure is recommended to make the best use of the recipes provided. If you're looking to scale and optimize your analytics pipelines, this book is for you.

PostGIS in Action, Third Edition

2021-09-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Leo S. Hsu , Regina Obe

GIS JSON RDBMS data data-engineering geographic-information-system-gis location-data postgis postgresql

In PostGIS in Action, Third Edition you will learn:
- An introduction to spatial databases
- Geometry, geography, raster, and topology spatial types, functions, and queries
- Applying PostGIS to real-world problems
- Extending PostGIS to web and desktop applications
- Querying data from external sources using PostgreSQL Foreign Data Wrappers
- Optimizing queries for maximum speed
- Simplifying geometries for greater efficiency

PostGIS in Action, Third Edition teaches readers of all levels to write spatial queries for PostgreSQL. You’ll start by exploring vector-, raster-, and topology-based GIS before quickly progressing to analyzing, viewing, and mapping data. This fully updated third edition covers key changes in PostGIS 3.1 and PostgreSQL 13, including parallelization support, partitioned tables, and new JSON functions that help in creating web mapping applications.

About the Technology: PostGIS is a spatial database extender for PostgreSQL. It offers the features and firepower you need to take on nearly any geodata task. PostGIS lets you create location-aware queries with a few lines of SQL code, then build the backend for a mapping, raster analysis, or routing application with minimal effort.

About the Book: PostGIS in Action, Third Edition shows you how to solve real-world geodata problems. You’ll go beyond basic mapping and explore custom functions for your applications. Inside this fully updated edition, you’ll find coverage of new PostGIS features such as PostGIS window functions, parallelization of queries, and outputting data for applications using JSON and vector tile functions.

What's Inside:
- Fully revised for PostGIS 3.1 and PostgreSQL 13
- Optimize queries for maximum speed
- Simplify geometries for greater efficiency
- Extend PostGIS to web and desktop applications

About the Reader: For readers familiar with relational databases and basic SQL. No prior geodata or GIS experience required.

About the Authors: Regina Obe and Leo Hsu are database consultants and authors. Regina is a member of the PostGIS core development team and the Project Steering Committee.

Quotes:
- "The best introduction I’ve seen for engineers who want to get ramped up quickly and build advanced GIS applications." - Ikechukwu Okonkwo, Orum.io
- "A wealth of information that showcases how powerful PostGIS is." - Luis Moux-Dominguez, EMO
- "An extraordinary book for the world of GIS. Truly learned a lot!" - DeUndre’ Rushon, DigiDiscover LLC
- "Gives you insight into how best to provide map services for a wide audience." - Marcus Brown, Enel Green Power

The Definitive Guide to Azure Data Engineering: Modern ELT, DevOps, and Analytics on the Azure Cloud Platform

2021-08-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ron C. L'Esteve

Analytics Azure ADF Azure DevOps CI/CD Cloud Computing Cosmos Data Engineering Data Governance Data Lake Databricks DevOps

Build efficient and scalable batch and real-time data ingestion pipelines, DevOps continuous integration and deployment pipelines, and advanced analytics solutions on the Azure Data Platform. This book teaches you to design and implement robust data engineering solutions using Data Factory, Databricks, Synapse Analytics, Snowflake, Azure SQL Database, Stream Analytics, Cosmos DB, and Data Lake Storage Gen2. You will learn how to engineer your use of these Azure Data Platform components for optimal performance and scalability. You will also learn to design self-service capabilities to maintain and drive the pipelines and your workloads. The approach in this book is to guide you through a hands-on, scenario-based learning process that will empower you to promote digital innovation best practices while you work through your organization’s projects, challenges, and needs. The clear examples enable you to use this book as a reference and guide for building data engineering solutions in Azure. After reading this book, you will have a far stronger skill set and more confidence in getting hands-on with the Azure Data Platform.

What You Will Learn:
- Build dynamic, parameterized ELT data ingestion orchestration pipelines in Azure Data Factory
- Create data ingestion pipelines that integrate control tables for self-service ELT
- Implement a reusable logging framework that can be applied to multiple pipelines
- Integrate Azure Data Factory pipelines with a variety of Azure data sources and tools
- Transform data with Mapping Data Flows in Azure Data Factory
- Apply Azure DevOps continuous integration and deployment practices to your Azure Data Factory pipelines and development SQL databases
- Design and implement real-time streaming and advanced analytics solutions using Databricks, Stream Analytics, and Synapse Analytics
- Get started with a variety of Azure data services through hands-on examples

Who This Book Is For: Data engineers and data architects who are interested in learning architectural and engineering best practices around ELT and ETL on the Azure Data Platform; those who are creating complex Azure data engineering projects and are searching for patterns of success; and aspiring cloud and data professionals involved in data engineering, data governance, DevOps continuous integration and deployment practices, and advanced analytics who want a full understanding of the many different tools and technologies that the Azure Data Platform provides.

Data Modeling for Azure Data Services

2021-07-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Peter ter Braake

Azure ADF BI Cloud Computing Cosmos Data Lake Data Management Data Modelling Data Vault ETL/ELT dimensional modeling Microsoft

Data Modeling for Azure Data Services is an essential guide to the intricacies of designing, provisioning, and implementing robust data solutions within the Azure ecosystem. Through practical examples and hands-on exercises, this book equips you with the knowledge to create scalable, performant, and adaptable database designs tailored to your business needs.

What this book will help you do:
- Understand and apply normalization, dimensional modeling, and data vault modeling for relational databases.
- Learn to provision and implement scalable solutions such as Azure SQL DB and Azure Synapse SQL Pool.
- Master how to design and model a data lake efficiently using Azure Storage.
- Gain expertise in NoSQL database modeling and implementing solutions using Azure Cosmos DB.
- Develop effective ETL/ELT processes using Azure Data Factory to support data integration workflows.

Author(s): Peter ter Braake brings a wealth of expertise as a data architect and cloud solutions builder specializing in Azure's data services. With hands-on experience in projects requiring sophisticated data modeling and optimization, he crafts detailed learning material to help professionals level up their database design and Azure deployment skills. Dedicated to explaining complex topics with clarity and approachable language, he ensures that learners gain not just knowledge but applied competence.

Who is it for? This book is a valuable resource for business intelligence developers, data architects, and consultants aiming to refine their skills in data modeling within modern cloud ecosystems, particularly Microsoft Azure. Whether you're a beginner with some foundational knowledge of cloud data management or an experienced professional seeking to deepen your proficiency with Azure data services, this book caters to your learning needs.

SQL Server on Kubernetes: Designing and Building a Modern Data Platform

2021-07-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ben Weissman , Anthony Nocentino (Pure Storage)

API Azure Big Data Cloud Computing Kubernetes data data-engineering microsoft-sql-server relational-databases

Build a modern data platform by deploying SQL Server in Kubernetes. Modern application deployment needs to be fast and consistent to keep up with business objectives, and Kubernetes is quickly becoming the standard for deploying container-based applications at speed. This book introduces Kubernetes and its core concepts. It then shows you how to build and interact with a Kubernetes cluster. Next, it goes deep into deploying and operationalizing SQL Server in Kubernetes, both on premises and in cloud environments such as Azure. You will begin with container-based application fundamentals and then move to an architectural overview of a Kubernetes container and how it manages application state. Then you will learn the hands-on skill of building a production-ready cluster. With your cluster up and running, you will learn how to interact with it and perform common administrative tasks. Once you can administer the cluster, you will learn how to deploy applications and SQL Server in Kubernetes. You will learn about high-availability options and about using Azure Arc-enabled Data Services. By the end of this book, you will know how to set up a Kubernetes cluster, manage it, deploy applications and databases, and keep everything up and running.

What You Will Learn:
- Understand Kubernetes architecture and cluster components
- Deploy your applications into Kubernetes clusters
- Manage your containers programmatically through API objects and controllers
- Deploy and operationalize SQL Server in Kubernetes
- Implement high-availability SQL Server scenarios on Kubernetes using Azure Arc-enabled Data Services
- Make use of Kubernetes deployments for Big Data Clusters

Who This Book Is For: DBAs and IT architects who are ready to begin planning their next-generation data platform and want to understand what it takes to run SQL Server in a container in Kubernetes. SQL Server on Kubernetes is an excellent choice for those who want to understand the big picture of why Kubernetes is the next-generation deployment method for SQL Server, but who also want to understand the internals, or the how, of deploying SQL Server in Kubernetes. When you finish this book, you will have the vision and skills to successfully architect, build, and maintain a modern data platform deploying SQL Server on Kubernetes.

Advanced Analytics with Transact-SQL: Exploring Hidden Patterns and Rules in Your Data

2021-07-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dejan Sarka

AI/ML Analytics Azure BI Data Quality Data Science Microsoft Python Synapse data data-engineering microsoft-sql-server

Learn about the business intelligence (BI) features in T-SQL and how they can help you with data science and analytics efforts without the need to bring in other languages such as R and Python. This book shows you how to compute statistical measures using your existing skills in T-SQL. You will learn how to calculate descriptive statistics, including the centers, spreads, skewness, and kurtosis of distributions. You will also learn to find associations between pairs of variables, including calculating linear regression formulas and confidence levels with definite integration. No analysis is good without data quality. Advanced Analytics with Transact-SQL introduces data quality issues and shows you how to check for completeness and accuracy and how to measure improvements in data quality over time. The book also explains how to optimize queries involving temporal data, such as searching for overlapping intervals. More advanced time-oriented topics in the book include hazard and survival analysis. Forecasting with exponential moving averages and autoregression is covered as well. Every web or retail shop wants to know which products customers tend to buy together. Predicting a discrete or continuous target variable from a few input variables is important for practically every type of business. This book helps you understand data science, the advanced algorithms used to analyze data, and terms such as data mining, machine learning, and text mining. Key to many of the solutions in this book are T-SQL window functions. Author Dejan Sarka demonstrates efficient statistical queries that are based on window functions and optimized through algorithms built using mathematical knowledge and creativity. The formulas and usage of those statistical procedures are explained so you can understand and modify the techniques presented. T-SQL is supported in SQL Server, Azure SQL Database, and Azure Synapse Analytics. There are so many BI features in T-SQL that it might become your primary analytic database language. If you want to learn how to get information from your data with the T-SQL language you are already familiar with, this is the book for you.

What You Will Learn:
- Describe the distribution of variables with statistical measures
- Find associations between pairs of variables
- Evaluate the quality of the data you are analyzing
- Perform time-series analysis on your data
- Forecast values of a continuous variable
- Perform market-basket analysis to predict customer purchasing patterns
- Predict target variable outcomes from one or more input variables
- Categorize passages of text by extracting and analyzing keywords

Who This Book Is For: Database developers and database administrators who want to translate their T-SQL skills into the world of business intelligence (BI) and data science; readers who want to analyze large amounts of data efficiently by using their existing knowledge of T-SQL and Microsoft’s various database platforms, such as SQL Server and Azure SQL Database; and readers who want to improve their querying by learning new and original optimization techniques.

Azure Data Factory by Example: Practical Implementation for Data Engineers

2021-06-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Richard Swinbank

Azure ADF Cloud Computing DWH ETL/ELT Microsoft SSIS data data-engineering microsoft-sql-server relational-databases

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components. The hands-on introduction to ADF found in this book is equally well suited to data engineers embracing their first ETL/ELT toolset and to seasoned veterans of Microsoft’s SQL Server Integration Services (SSIS). The example-driven approach leads you through ADF pipeline construction from the ground up, introducing important ideas and making learning natural and engaging. SSIS users will find concepts with familiar parallels, while ADF-first readers will quickly master those concepts through the book’s steady building up of knowledge in successive chapters. Summaries of key concepts at the end of each chapter provide a ready reference that you can return to again and again.

What You Will Learn:
- Create pipelines, activities, datasets, and linked services
- Build reusable components using variables, parameters, and expressions
- Move data into and around Azure services automatically
- Transform data natively using ADF data flows and Power Query data wrangling
- Master flow-of-control and triggers for tightly orchestrated pipeline execution
- Publish and monitor pipelines easily and with confidence

Who This Book Is For: Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations.

PostgreSQL Query Optimization: The Ultimate Guide to Building Efficient Queries

2021-04-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Anna Bailliekova , Henrietta Dombrovskaya , Boris Novikov

data data-engineering postgresql relational-databases

Write optimized queries. This book helps you write queries that perform fast and deliver results on time. You will learn that query optimization is not a dark art practiced by a small, secretive cabal of sorcerers. Any motivated professional can learn to write efficient queries from the get-go and capably optimize existing queries. You will learn to look at the process of writing a query from the database engine’s point of view and to think like the database optimizer. The book begins with a discussion of what a performant system is and progresses to measuring performance and setting performance goals. It introduces different classes of queries and the optimization techniques suitable to each, such as the use of indexes and specific join algorithms. You will learn to read and understand query execution plans, along with techniques for influencing those plans for better performance. The book also covers advanced topics such as the use of functions and procedures, dynamic SQL, and generated queries. All of these techniques are then used together to produce performant applications, avoiding the pitfalls of object-relational mappers.

What You Will Learn:
- Identify optimization goals in OLTP and OLAP systems
- Read and understand PostgreSQL execution plans
- Distinguish between short queries and long queries
- Choose the right optimization technique for each query type
- Identify indexes that will improve query performance
- Optimize full table scans
- Avoid the pitfalls of object-relational mapping systems
- Optimize the entire application rather than just database queries

Who This Book Is For: IT professionals working in PostgreSQL who want to develop performant and scalable applications; anyone whose job title contains the words “database developer” or “database administrator,” or who is a backend developer charged with programming database calls; and system architects involved in the overall design of application systems running against a PostgreSQL database.

Introducing .NET for Apache Spark: Distributed Processing for Massive Datasets

2021-04-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ed Elliott

AI/ML API Big Data Hive Linux Microsoft Python Scala Spark Data Streaming apache-spark data

Get started using Apache Spark via C# or F# and the .NET for Apache Spark bindings. This book is an introduction to both Apache Spark and the .NET bindings. Readers new to Apache Spark will get up to speed quickly using Spark for data processing tasks performed against large and very large datasets. You will learn how to combine your knowledge of .NET with Apache Spark to bring massive computing power to bear by distributing the processing of extremely large datasets across multiple servers. This book covers how to get a local instance of Apache Spark running on your developer machine and shows you how to create your first .NET program that uses the Microsoft .NET bindings for Apache Spark. Techniques shown in the book allow you to use Apache Spark to distribute your data processing tasks over multiple compute nodes. You will learn to process data in both batch mode and streaming mode, so you can make the right choice depending on whether you are processing an existing dataset or working against new records in micro-batches as they arrive. The goal of the book is to leave you comfortable bringing the power of Apache Spark to your favorite .NET language.

What You Will Learn:
- Install and configure Spark .NET on Windows, Linux, and macOS
- Write Apache Spark programs in C# and F# using the .NET bindings
- Access and invoke the Apache Spark APIs from .NET with the same high performance as Python, Scala, and R
- Encapsulate functionality in user-defined functions
- Transform and aggregate large datasets
- Execute SQL queries against files through Apache Hive
- Distribute processing of large datasets across multiple servers
- Create your own batch, streaming, and machine learning programs

Who This Book Is For: .NET developers who want to perform big data processing without having to migrate to Python, Scala, or R; and Apache Spark developers who want to run natively on .NET and take advantage of the C# and F# ecosystems.

R2DBC Revealed: Reactive Relational Database Connectivity for Java and JVM Programmers

2021-04-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Robert Hedgpeth

API Java MariaDB Oracle RDBMS data data-engineering relational-databases

Understand the newest trend in database programming for developers working in Java, Kotlin, Clojure, and other JVM-based languages. This book introduces Reactive Relational Database Connectivity (R2DBC), a modern way of connecting to and querying relational databases from Java and other JVM languages. The book begins by helping you understand not only what reactive programming is, but why it is necessary. Then, building on those fundamentals, the book takes you into the world of databases and the newly released Reactive Relational Database Connectivity (R2DBC) specification. Examples in the book are worked using the freely available MariaDB database along with MariaDB’s vendor implementation of the R2DBC service-provider interface (SPI). Following along with the examples and the provided example code helps prepare you to work with any of the growing number of R2DBC implementations for popular enterprise databases such as Oracle Database and SQL Server. You’ll be well prepared for what is becoming the future of database access from Java and other languages built on the JVM.

What You Will Learn:
- Understand why R2DBC was created and how it utilizes the Reactive Streams API
- Understand the components of the R2DBC service-provider interface
- Create and manage reactive database connections and connection pools using an R2DBC client
- Programmatically execute queries on a relational database using an R2DBC client
- Effectively utilize transactions using an R2DBC client
- Build relational database-driven applications that are event-driven and non-blocking

Who This Book Is For: Software developers building solutions using JVM languages and the JVM ecosystem, and developers who need an introduction to the R2DBC specification and reactive programming with relational databases and want to understand what Reactive Relational Database Connectivity is and why it came about. This book includes practical examples of using the R2DBC specification with Java and MariaDB that will provide developers with the knowledge they need to create their own solutions.

Professional Azure SQL Managed Database Administration - Third Edition

2021-03-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Shashikant Shakya , Ahmad Osama

Azure Cloud Computing Data Management ELK azure-sql-database data data-engineering relational-databases

Professional Azure SQL Managed Database Administration is a comprehensive guide to mastering data management with Azure's managed database services. Packed with real-world exercises and updated to cover the latest Azure features, this book provides actionable insights into migrating, tuning, scaling, and securing Azure SQL databases.

What this book will help you do:
- Master the configuration and pricing options for Azure SQL databases to make cost-effective choices.
- Learn the processes to provision new SQL databases or migrate existing on-premises SQL databases to Azure.
- Acquire the skills to implement high availability and disaster recovery to ensure data resilience.
- Understand strategies for monitoring, tuning, and optimizing the performance of Azure SQL databases.
- Discover techniques for scaling with elastic pools and for securing databases comprehensively.

Author(s): Ahmad Osama and Shashikant Shakya are experienced professionals in SQL Server and Azure SQL technologies. With decades of combined experience in database administration and cloud computing, they bring a depth of understanding to the content of this book. Their hands-on teaching approach is evident in the practical exercises and real-world scenarios included.

Who is it for? This book is specifically tailored for database administrators, developers, and application developers looking to leverage Azure SQL databases. If you are tasked with migrating applications to the cloud or ensuring top performance and resilience for cloud databases, you will find this book highly valuable. Prior experience with on-premises SQL services will help contextualize the content, making the book suitable for professionals with intermediate SQL experience. Readers aiming to deepen their Azure SQL expertise will also benefit greatly.

PostgreSQL 13 Cookbook

2021-02-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Vallarapu Naga Avinash Kumar

data data-engineering postgresql relational-databases

The "PostgreSQL 13 Cookbook" is your step-by-step resource for mastering PostgreSQL 13. Explore over 120 recipes that solve both common and advanced database management challenges, with a focus on high performance, fault tolerance, and cutting-edge features.

What this book will help you do:
- Master the implementation of backup and recovery strategies tailored for PostgreSQL 13.
- Set up robust high-availability clusters that ensure seamless failover with PostgreSQL replication features.
- Improve performance using optimization techniques specific to PostgreSQL 13 databases.
- Secure your databases with advanced authentication, encryption, and auditing measures.
- Analyze and monitor PostgreSQL servers to identify performance bottlenecks and maintain uptime efficiently.

Author(s): Vallarapu Naga Avinash Kumar is an experienced PostgreSQL architect and developer who brings years of expertise in designing and managing enterprise-level databases. He has authored resources that simplify complex technical concepts for readers. His meticulous and straightforward writing approach empowers readers to skillfully apply PostgreSQL concepts in real-world scenarios.

Who is it for? This book is perfect for database administrators, architects, and developers aiming to master PostgreSQL 13 capabilities. If you have prior experience with PostgreSQL and SQL, this cookbook will be a reliable reference for solving challenges and optimizing your database solutions. If you're designing or managing databases, you'll find practical insights and actionable recipes tailored to your needs.

Snowflake Cookbook

2021-02-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Hamid Mahmood Qureshi , Hammad Sharif

Analytics Cloud Computing DWH Snowflake Spark data data-engineering

The "Snowflake Cookbook" is your guide to mastering Snowflake's unique cloud-centric architecture. This book provides detailed recipes for building modern data pipelines, configuring efficient virtual warehouses, ensuring robust data protection, and balancing cost and performance, all while leveraging Snowflake's distinctive features such as data sharing and time travel.

What this book will help you do:
- Set up and configure Snowflake's architecture for optimized performance and cost efficiency.
- Design and implement robust data pipelines using SQL and Snowflake's specialized features.
- Secure, manage, and share data efficiently with built-in Snowflake capabilities.
- Apply performance tuning techniques to enhance your Snowflake implementations.
- Extend Snowflake's functionality with tools such as the Spark Connector for advanced workflows.

Author(s): Hamid Mahmood Qureshi and Hammad Sharif are both seasoned experts in data warehousing and cloud computing technologies. With extensive experience implementing analytics solutions, they bring a hands-on approach to teaching Snowflake. They are ardent proponents of empowering readers to create effective and scalable data solutions.

Who is it for? This book is perfect for data warehouse developers, data analysts, cloud architects, and anyone managing cloud data solutions. Whether you're familiar with basic database concepts or just stepping into Snowflake, you'll find practical guidance here to deepen your understanding of, and functional expertise in, cloud data warehousing.

Learn FileMaker Pro 19: The Comprehensive Guide to Building Custom Databases

2021-02-24 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Mark Conway Munro

JSON data data-engineering filemaker

Discover how easy it is to create multi-user, cross-platform custom solutions with FileMaker Pro, the relational database platform published by Apple subsidiary Claris International, Inc. Meticulously rewritten with clearer lessons and more real-world examples, and updated to include feature changes introduced in recent versions, this book makes it easier to get started planning, building, and deploying a custom database solution. The material is presented in an easy-to-follow manner, with each chapter building on the last. After an initial review of the user environment and application basics, it begins a deep exploration of the integrated development environment, which seamlessly combines the full stack of data table schema, business logic, and interface layers into one visual programming experience. This book includes everything needed to get started building custom databases and contains advanced material that seasoned professionals will appreciate. Written by a professional developer with decades of real-world experience, Learn FileMaker Pro 19 is your comprehensive learning and reference guide. Join millions of users and developers worldwide in achieving a new level of workflow efficiency with FileMaker Pro. What You’ll Learn Discover interface and feature changes in FileMaker 17-19. Create and maintain healthy files. Plan and create custom tables, fields, and relationships. Write calculations using built-in and custom functions. Build recursive and repeating formulas. Discover advanced features using cURL, JSON, SQL, ODBC, and FM URL. Manipulate data files in the computer directory with scripts. Deploy solutions to a server and share them with desktop, iOS, and web clients. Who This Book Is For Casual programmers, full-time consultants, and IT professionals.

Building Custom Tasks for SQL Server Integration Services: The Power of .NET for ETL for SQL Server 2019 and Beyond

2021-02-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Andy Leonard

Azure ADF Azure DevOps Cloud Computing DevOps ETL/ELT Microsoft SSIS data data-engineering etl

Build custom SQL Server Integration Services (SSIS) tasks using Visual Studio Community Edition and C#. Bring all the power of Microsoft .NET to bear on your data integration and ETL processes at no added cost beyond what you’ve already spent on SQL Server licensing. New in this edition is a demonstration of deploying a custom SSIS task to the Azure Data Factory (ADF) Azure-SSIS Integration Runtime (IR). All examples in this new edition are implemented in C#, so custom task developers learn to build custom tasks in the widely accepted default language for .NET development. Why are custom components necessary? Because even though the SSIS catalog of built-in tasks and components is a marvel of engineering, gaps remain in the available functionality. One such gap is a constraint of the built-in SSIS Execute Package Task, which does not allow SSIS developers to select SSIS packages from other projects in the SSIS Catalog. Examples in this book show how to create a custom Execute Catalog Package task that allows SSIS developers to execute packages from other projects in the SSIS Catalog. Building on the examples and patterns in this book, SSIS developers can create whatever tasks they require, custom-tailored to their specific data integration and ETL needs.
What You Will Learn Configure and execute Visual Studio in the way that best supports SSIS task development. Create a class library as the basis for an SSIS task, and reference the needed SSIS assemblies. Properly sign the assemblies you create so that they can be invoked from your task. Implement source code control via Azure DevOps or your own favorite tool set. Troubleshoot and execute custom tasks as part of your own projects. Create deployment projects (MSIs) for distributing code-complete tasks. Deploy custom tasks to Azure Data Factory Azure-SSIS IRs in the cloud. Create advanced editors for custom task parameters. Who This Book Is For Database administrators and developers who are involved in ETL projects built around SQL Server Integration Services (SSIS). Readers do not need a background in software development with C#. Most important is a desire to optimize ETL efforts by creating custom-tailored tasks for execution in SSIS packages, on-premises or in ADF Azure-SSIS IRs.

MySQL Concurrency: Locking and Transactions for MySQL Developers and DBAs

2021-01-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jesper Wisborg Krogh

MySQL data data-engineering relational-databases

Know how locks work in MySQL and how they relate to transactions. This book explains the major role that locks play in database systems, showing how locks are essential to allowing high-concurrency workloads. You will learn about lock access levels and lock granularities, from user-level and table locks down to record and gap locks. Most importantly, the book covers troubleshooting techniques for when locking becomes a pain point. Several of the lock types in MySQL are held for the duration of a transaction. For this reason, it is important to understand how transactions work. This book covers the basics of transactions as well as transaction isolation levels and how they affect locking. The book is meant to be your go-to resource for solving lock contention and similar problems in high-performance MySQL database applications. Detecting locking issues when they occur is the first key to resolving them. MySQL Concurrency provides techniques for detecting locking issues such as contention. The book shows how to analyze the locks that are causing contention to see why those locks are in place. A collection of six comprehensive case studies combines locking and transactional theory with realistic lock conflicts. The case studies walk you through the symptoms to look for in order to identify which issue you are facing, the cause of the conflict, its analysis and solution, and how to prevent the issue in the future. What You Will Learn Understand which lock types exist in MySQL and how they are used. Choose the best transaction isolation level for a given transaction. Detect and analyze lock contention when it occurs. Reduce locking issues in your applications. Resolve deadlocks between transactions. Resolve InnoDB record-level locking issues. Resolve issues from metadata and schema locks. Who This Book Is For Database administrators and SQL developers who are familiar with MySQL and want to gain a better understanding of locking and transactions, as well as how to work with them.
While some experience with MySQL is required, no prior knowledge of locks and transactions is needed.

High Performance SQL Server: Consistent Response for Mission-Critical Applications

2021-01-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Benjamin Nevarez

Linux data data-engineering microsoft-sql-server relational-databases

Design and configure SQL Server instances and databases in support of high-throughput, mission-critical applications that provide consistent response times in the face of varying numbers of users and query volumes. In this new edition, with over 100 pages of additional content, every original chapter has been updated for SQL Server 2019, and the book also includes two new chapters covering SQL Server on Linux and Intelligent Query Processing. This book shows you how to configure SQL Server and design your databases to support a given instance and workload. You will learn advanced configuration options, in-memory technologies, storage and disk configuration, and more, all aimed at enabling your desired application performance and throughput. Configuration doesn’t stop with implementation. Workloads change over time, and other impediments can arise to thwart desired performance. High Performance SQL Server covers monitoring and troubleshooting to help you detect and fix production performance problems and minimize application outages. You will learn about a variety of tools, ranging from the traditional wait analysis methodology to the Query Store and indexing, and you will learn why improving performance is an iterative process. This book is an excellent complement to query performance tuning books and provides the other half of what you need to know by focusing on configuring the instances on which mission-critical queries are executed. What You Will Learn Understand SQL Server's database engine and how it processes queries. Configure instances in support of high-throughput applications. Provide consistent response times to varying user numbers and query volumes. Design databases for high-throughput applications with a focus on performance. Record performance baselines and monitor SQL Server instances against them. Troubleshoot and fix performance problems. Who This Book Is For SQL Server database administrators, developers, and data architects.
The book is also of use to system administrators who manage, and are responsible for, the physical servers on which SQL Server instances run.

Pro SQL Server Relational Database Design and Implementation: Best Practices for Scalability and Performance

2020-12-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Louis Davidson

Cloud Computing Cyber Security data data-engineering relational-databases

Learn effective and scalable database design techniques in SQL Server 2019 and other recent SQL Server versions. This book is revised to cover additions to SQL Server that include SQL graph enhancements, in-memory online transaction processing, temporal data storage, row-level security, and other design-related features. This book will help you design OLTP databases that are high-quality, protect the integrity of your data, and perform well on-premises, in the cloud, or in hybrid configurations. Designing an effective and scalable database using SQL Server is a task requiring skills that have been around for well over 30 years, using technology that is constantly changing. This book covers everything from design logic that business users will understand to the physical implementation of a design in a SQL Server database. Grounded in best practices and a solid understanding of the underlying theory, author Louis Davidson shows you how to "get it right" in SQL Server database design and lay a solid groundwork for the future use of valuable business data. What You Will Learn Develop conceptual models of client data using interviews and client documentation. Implement designs that work on-premises, in the cloud, or in a hybrid approach. Recognize and apply common database design patterns. Normalize data models to enhance the integrity and scalability of your databases for the long-term use of valuable data. Translate conceptual models into high-performing SQL Server databases. Secure and protect data integrity as part of meeting regulatory requirements. Create effective indexing to speed query performance. Understand the concepts of concurrency. Who This Book Is For Programmers and database administrators of all types who want to use SQL Server to store transactional data. The book is especially useful to those wanting to learn the latest database design features in SQL Server 2019 (features that include graph objects, in-memory OLTP, temporal data support, and more).
Chapters on fundamental concepts, the language of database modeling, SQL implementation, and the normalization process lay a solid groundwork for readers who are just entering the field of database design. More advanced chapters serve the seasoned veteran by tackling the latest in physical implementation features that SQL Server has to offer. The book has been carefully revised to cover all the design-related features that are new in SQL Server 2019.

talk-data.com

Data Engineering with Apache Spark, Delta Lake, and Lakehouse

IBM Spectrum Protect Plus Protecting Database Applications

Azure Databricks Cookbook

PostGIS in Action, Third Edition

The Definitive Guide to Azure Data Engineering: Modern ELT, DevOps, and Analytics on the Azure Cloud Platform

Data Modeling for Azure Data Services

SQL Server on Kubernetes: Designing and Building a Modern Data Platform

Advanced Analytics with Transact-SQL: Exploring Hidden Patterns and Rules in Your Data

Azure Data Factory by Example: Practical Implementation for Data Engineers

PostgreSQL Query Optimization: The Ultimate Guide to Building Efficient Queries

Introducing .NET for Apache Spark: Distributed Processing for Massive Datasets

R2DBC Revealed: Reactive Relational Database Connectivity for Java and JVM Programmers

Professional Azure SQL Managed Database Administration - Third Edition

PostgreSQL 13 Cookbook

Snowflake Cookbook

Learn FileMaker Pro 19: The Comprehensive Guide to Building Custom Databases

Building Custom Tasks for SQL Server Integration Services: The Power of .NET for ETL for SQL Server 2019 and Beyond

MySQL Concurrency: Locking and Transactions for MySQL Developers and DBAs

High Performance SQL Server: Consistent Response for Mission-Critical Applications

Pro SQL Server Relational Database Design and Implementation: Best Practices for Scalability and Performance