Microsoft

Practical Database Auditing for Microsoft SQL Server and Azure SQL: Troubleshooting, Regulatory Compliance, and Governance

2022-09-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Josephine Bush

AWS Amazon RDS Azure Cloud Computing DevOps SQL SQL Server data data-engineering microsoft-sql-server relational-databases

Know how to track changes and key events in your SQL Server databases in support of application troubleshooting, regulatory compliance, and governance. This book shows how to use key features in SQL Server, such as SQL Server Audit and Extended Events, to track schema changes, permission changes, and changes to your data. You’ll even learn how to track queries run against specific tables in a database. Not all changes and events can be captured and tracked using SQL Server Audit and Extended Events, and the book goes beyond those features to also show what can be captured using common criteria compliance, change data capture, temporal tables, or querying the SQL Server log. You will learn how to audit just what you need to audit, and how to audit pretty much anything that happens on a SQL Server instance. This book will also help you set up cloud auditing with an emphasis on Azure SQL Database, Azure SQL Managed Instance, and AWS RDS SQL Server. You don’t need expensive, third-party auditing tools to make auditing work for you, and to demonstrate and provide value back to your business. This book will help you set up an auditing solution that works for you and your needs. It shows how to collect the audit data that you need, centralize that data for easy reporting, and generate audit reports using built-in SQL Server functionality for use by your own team, developers, and organization’s auditors.
What You Will Learn
Understand why auditing is important for troubleshooting, compliance, and governance
Track changes and key events using SQL Server Audit and Extended Events
Track SQL Server configuration changes for governance and troubleshooting
Utilize change data capture and temporal tables to track data changes in SQL Server tables
Centralize auditing data from all your databases for easy querying and reporting
Configure auditing on Azure SQL, Azure SQL Managed Instance, and AWS RDS SQL Server
Who This Book Is For
Database administrators who need to know what’s changing on their database servers, and those who are making the changes; database-savvy DevOps engineers and developers who are charged with troubleshooting processes and applications; developers and administrators who are responsible for generating reports in support of regulatory compliance reporting and auditing

Pro Database Migration to Azure: Data Modernization for the Enterprise

2022-08-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dustin Dorsey (Onix), Matt Gordon, Denis McDowell, Kevin Kline

Azure Cloud Computing MySQL Oracle SQL data data-engineering data-migration postgresql

Migrate your existing, on-premises applications into the Microsoft Azure cloud platform. This book covers the best practices to plan, implement, and operationalize the migration of a database application from your organization’s data center to Microsoft’s Azure cloud platform. Data modernization and migration is a technologically complex endeavor that can also be taxing from a leadership and operational standpoint. This book covers not only the technology, but also the most important aspects of organization culture, communication, and politics that so frequently derail such projects. You will learn the most important steps to ensuring a successful migration and see battle-tested wisdom from industry veterans. From executive sponsorship, to executing the migration, to the important steps following migration, you will learn how to effectively conduct future migrations and ensure that your team and your database application deliver on the expected business value of the project. This book is unlike any other currently in the market. It takes you through the most critical business and technical considerations and workflows for moving your data and databases into the cloud, with special attention paid to those who are deploying to the Microsoft Data Platform in Azure, especially SQL Server. Although this book focuses on migrating on-premises SQL Server enterprises to hybrid or fully cloud-based Azure SQL Database and Azure SQL Managed Instances, it also covers topics involving migrating non-SQL Server database platforms such as Oracle, MySQL, and PostgreSQL applications to Microsoft Azure.
What You Will Learn
Plan a database migration that ensures smooth project progress, optimal performance, low operating cost, and minimal downtime
Properly analyze and manage non-technical considerations, such as legal compliance, privacy, and team execution
Perform a thorough architectural analysis to select the best Azure services, performance tiers, and cost-containment features
Avoid pitfalls and common reasons for failure relating to corporate culture, intra-office politics, and poor communications
Secure the proper executive champions who can execute the business planning needed for success
Apply proven criteria to determine your future-state architecture and your migration method
Execute your migration using a process proven by the authors over years of successful projects
Who This Book Is For
IT leadership, strategic IT decision makers, project owners and managers, and enterprise and application architects. For anyone looking toward cloud migration projects as the next stage of growth in their careers. Also useful for enterprise DBAs and consultants who might be involved in such projects. Readers should have experience and be competent in designing, coding, implementing, and supporting database applications in an on-premises environment.

Learn dbatools in a Month of Lunches

2022-07-20 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rob Sewell, Chrissy LeMaire, Claudio Silva, Jess Pomfret

GitHub PowerShell Cyber Security SQL SQL Server data data-engineering microsoft-sql-server relational-databases

If you work with SQL Server, dbatools is a lifesaver. This book will show you how to use this free and open source PowerShell module to automate just about every SQL Server task you can imagine, all in just one month! In Learn dbatools in a Month of Lunches you will learn how to:
Perform instance-to-instance and customized migrations
Automate security audits, tempdb configuration, alerting, and reporting
Schedule and monitor PowerShell tasks in SQL Server Agent
Bulk-import any type of data into SQL Server
Install dbatools in secure environments
Written by a group of expert authors including dbatools creator Chrissy LeMaire, Learn dbatools in a Month of Lunches teaches you techniques that will make you more effective (and efficient) than you ever thought possible. In twenty-eight lunchbreak lessons, you’ll learn the most important use cases of dbatools and the favorite functions of its core developers. Stabilize and standardize your SQL Server environment, and simplify your tasks by building automation, alerting, and reporting with this powerful tool.
About the Technology
For SQL Server DBAs, automation is the key to efficiency. Using the open-source dbatools PowerShell module, you can easily execute tasks on thousands of database servers at once, all from the command line. dbatools gives you over 500 pre-built commands, with countless new options for managing SQL Server at scale. There’s nothing else like it.
About the Book
Learn dbatools in a Month of Lunches teaches you how to automate SQL Server using the dbatools PowerShell module. Each 30-minute lesson introduces a new automation that will make your daily duties easier. Following the expert advice of dbatools creator Chrissy LeMaire and other top community contributors, you’ll learn to script everything from backups to disaster recovery.
What's Inside
Performing instance-to-instance and customized migrations
Automating security audits, best practices, and standardized configurations
Administering SQL Server Agent including running PowerShell scripts effectively
Bulk-importing many types of data into SQL Server
Executing advanced tasks and increasing efficiency for everyday administration
About the Reader
For DBAs, accidental DBAs, and systems engineers who manage SQL Server.
About the Authors
Chrissy LeMaire is a GitHub Star and the creator of dbatools. Rob Sewell is a data engineer and a passionate automator. Jess Pomfret and Cláudio Silva are data platform architects. All are Microsoft MVPs.
Quotes
All SQL Server professionals should learn dbatools. With its combination of knowledge transfer, anecdotes, and hands-on labs, this book is the perfect way. - From the Foreword by Anna Hoffman, Databases Product Management, Microsoft
Excellent guide for dbatools with lots of practical tips! Required reading for anyone interested in dbatools. - Ruben Vandeginste, PeopleWare
A must-have for any SQL server developer. - Raushan Kumar Jha, Microsoft
If you want to automate all vital aspects of SQL Server, wait no more! Learn dbatools in a month, with guidance from the best minds in the business. - Ranjit Sahai, RAM Consulting

The Azure Data Lakehouse Toolkit: Building and Scaling Data Lakehouses on Azure with Delta Lake, Apache Spark, Databricks, Synapse Analytics, and Snowflake

2022-07-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ron L'Esteve

AI/ML Analytics Azure BI Cloud Computing Data Lakehouse Databricks Delta ETL/ELT PySpark Snowflake Spark

Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the Data Lakehouse Paradigm and how to efficiently design a cloud-based data lakehouse using highly performant and cutting-edge Apache Spark capabilities using Azure Databricks, Azure Synapse Analytics, and Snowflake. You will learn to write efficient PySpark code for batch and streaming ELT jobs on Azure. And you will follow along with practical, scenario-based examples showing how to apply the capabilities of Delta Lake and Apache Spark to optimize performance, and secure, share, and manage a high volume, high velocity, and high variety of data in your lakehouse with ease. The patterns of success that you acquire from reading this book will help you hone your skills to build high-performing and scalable ACID-compliant lakehouses using flexible and cost-efficient decoupled storage and compute capabilities. Extensive coverage of Delta Lake ensures that you are aware of and can benefit from all that this new, open source storage layer can offer. In addition to the deep examples on Databricks in the book, there is coverage of alternative platforms such as Synapse Analytics and Snowflake so that you can make the right platform choice for your needs. After reading this book, you will be able to implement Delta Lake capabilities, including Schema Evolution, Change Feed, Live Tables, Sharing, and Clones to enable better business intelligence and advanced analytics on your data within the Azure Data Platform. 
What You Will Learn
Implement the Data Lakehouse Paradigm on Microsoft’s Azure cloud platform
Benefit from the new Delta Lake open-source storage layer for data lakehouses
Take advantage of schema evolution, change feeds, live tables, and more
Write functional PySpark code for data lakehouse ELT jobs
Optimize Apache Spark performance through partitioning, indexing, and other tuning options
Choose between alternatives such as Databricks, Synapse Analytics, and Snowflake
Who This Book Is For
Data, analytics, and AI professionals at all levels, including data architect and data engineer practitioners. Also for data professionals seeking patterns of success by which to remain relevant as they learn to build scalable data lakehouses for their organizations and customers who are migrating into the modern Azure Data Platform.

SAP S/4HANA Systems in Hyperscaler Clouds: Deploying SAP S/4HANA in AWS, Google Cloud, and Azure

2022-05-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dhiraj Kumar, Jessica Tischbierek, Johannes Rank, Elena Wolz, André Bögelsack, Utpal Chakraborty

AWS Azure Cloud Computing ERP GCP SAP data data-engineering

This book helps SAP architects and SAP Basis administrators deploy and operate SAP S/4HANA systems on the most common public cloud platforms. Market-leading cloud offerings are covered, including Amazon Web Services, Microsoft Azure, and Google Cloud. You will gain an end-to-end understanding of the initial implementation of SAP S/4HANA systems on those platforms. You will learn how to move away from the big monolithic SAP ERP systems and arrive at an environment with a central SAP S/4HANA system as the digital core surrounded by cloud-native services. The book begins by introducing the core concepts of Hyperscaler cloud platforms that are relevant to SAP. You will learn about the architecture of SAP S/4HANA systems on public cloud platforms, with specific content provided for each of the major platforms. The book simplifies the deployment of SAP S/4HANA systems in public clouds by providing step-by-step instructions and helping you deal with the complexity of such a deployment. Content in the book is based on best practices, industry lessons learned, and architectural blueprints, helping you develop deep insights into the operations of SAP S/4HANA systems on public cloud platforms. Reading this book enables you to build and operate your own SAP S/4HANA system in the public cloud with a minimum of effort.
What You Will Learn
Choose the right Hyperscaler platform for your future SAP S/4HANA workloads
Start deploying your first SAP S/4HANA system in the public cloud
Avoid typical pitfalls during your implementation
Apply and leverage cloud-native services for your SAP S/4HANA system
Save costs by choosing the right architecture and build a robust architecture for your most critical SAP systems
Meet your business’ criteria for availability and performance by having the right sizing in place
Identify further use cases when operating SAP S/4HANA in the public cloud
Who This Book Is For
SAP architects looking for an answer on how to move SAP S/4HANA systems from on-premises into the cloud; those planning to deploy to one of the three major platforms from Amazon Web Services, Microsoft Azure, and Google Cloud Platform; and SAP Basis administrators seeking a detailed and realistic description of how to get started on a migration to the cloud and how to drive that cloud implementation to completion

Data Analysis with Python and PySpark

2022-03-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jonathan Rioux

AI/ML Analytics API Big Data Cloud Computing Data Science Hadoop Pandas PySpark Python Spark apache-spark

Think big about your data! PySpark brings the powerful Spark big data processing engine to the Python ecosystem, letting you seamlessly scale up your data tasks and create lightning-fast pipelines. In Data Analysis with Python and PySpark you will learn how to:
Manage your data as it scales across multiple machines
Scale up your data programs with full confidence
Read and write data to and from a variety of sources and formats
Deal with messy data with PySpark’s data manipulation functionality
Discover new data sets and perform exploratory data analysis
Build automated data pipelines that transform, summarize, and get insights from data
Troubleshoot common PySpark errors
Create reliable long-running jobs
Data Analysis with Python and PySpark is your guide to delivering successful Python-driven data projects. Packed with relevant examples and essential techniques, this practical book teaches you to build pipelines for reporting, machine learning, and other data-centric tasks. Quick exercises in every chapter help you practice what you’ve learned, and rapidly start implementing PySpark into your data systems. No previous knowledge of Spark is required.
About the Technology
The Spark data processing engine is an amazing analytics factory: raw data comes in, insight comes out. PySpark wraps Spark’s core engine with a Python-based API. It helps simplify Spark’s steep learning curve and makes this powerful tool available to anyone working in the Python data ecosystem.
About the Book
Data Analysis with Python and PySpark helps you solve the daily challenges of data science with PySpark. You’ll learn how to scale your processing capabilities across multiple machines while ingesting data from any source, whether that’s Hadoop clusters, cloud data storage, or local data files. Once you’ve covered the fundamentals, you’ll explore the full versatility of PySpark by building machine learning pipelines, and blending Python, pandas, and PySpark code.
What's Inside
Organizing your PySpark code
Managing your data, no matter the size
Scaling up your data programs with full confidence
Troubleshooting common data pipeline problems
Creating reliable long-running jobs
About the Reader
Written for data scientists and data engineers comfortable with Python.
About the Author
As an ML director for a data-driven software company, Jonathan Rioux uses PySpark daily. He teaches the software to data scientists, engineers, and data-savvy business analysts.
Quotes
A clear and in-depth introduction for truly tackling big data with Python. - Gustavo Patino, Oakland University William Beaumont School of Medicine
The perfect way to learn how to analyze and master huge datasets. - Gary Bake, Brambles
Covers both basic and more advanced topics of PySpark, with a good balance between theory and hands-on. - Philippe Van Bergenl, P² Consulting
For beginner to pro, a well-written book to help understand PySpark. - Raushan Kumar Jha, Microsoft

Analytics Optimization with Columnstore Indexes in Microsoft SQL Server: Optimizing OLAP Workloads

2022-02-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Edward Pollack

Analytics BI SQL SQL Server data data-engineering microsoft-sql-server relational-databases

Meet the challenge of storing and accessing analytic data in SQL Server in a fast and performant manner. This book illustrates how columnstore indexes can provide an ideal solution for storing analytic data that leads to faster performing analytic queries and the ability to ask and answer business intelligence questions with alacrity. The book provides a complete walk-through of columnstore indexing that encompasses an introduction, best practices, hands-on demonstrations, and explanations of common mistakes, and presents a detailed architecture that is suitable for professionals of all skill levels. With little or no knowledge of columnstore indexing you can become proficient with columnstore indexes as used in SQL Server, and apply that knowledge in development, test, and production environments. This book serves as a comprehensive guide to the use of columnstore indexes and provides definitive guidelines. You will learn when columnstore indexes should be used, and the performance gains that you can expect. You will also become familiar with best practices around architecture, implementation, and maintenance. Finally, you will know the limitations and common pitfalls to be aware of and avoid. As analytic data can become quite large, the expense to manage it or migrate it can be high. This book shows that columnstore indexing represents an effective storage solution that saves time and money and improves performance for any applications that use it. You will see that columnstore indexes are an effective performance solution that is included in all versions of SQL Server, with no additional costs or licensing required.
What You Will Learn
Implement columnstore indexes in SQL Server
Know best practices for the use and maintenance of analytic data in SQL Server
Use metadata to fully understand the size and shape of data stored in columnstore indexes
Employ optimal ways to load, maintain, and delete data from large analytic tables
Know how columnstore compression saves storage, memory, and time
Understand when a columnstore index should be used instead of a rowstore index
Be familiar with advanced features and analytics
Who This Book Is For
Database developers, administrators, and architects who are responsible for analytic data, especially for those working with very large data sets who are looking for new ways to achieve high performance in their queries, and those with immediate or future challenges to analytic data and query performance who want a methodical and effective solution

Access For Dummies

2021-12-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Laurie A. Ulrich, Ken Cook

Data Management Data Science data data-engineering database-management-tools microsoft-access

Become a database boss, and have fun doing it, with this accessible and easy-to-follow guide to Microsoft Access.
Databases hold the key to organizing and accessing all your data in one convenient place. And you don’t have to be a data science wizard to build, populate, and organize your own. With Microsoft Access For Dummies, you’ll learn to use the latest version of Microsoft’s Access software to power your database needs. Need to understand the essentials before diving in? Check out our Basic Training in Part 1 where we teach you how to navigate the Access workspace and explore the foundations of databases. Ready for more advanced tutorials? Skip right to the sections on Data Management, Queries, or Reporting where we walk you through Access’s more sophisticated capabilities. Not sure if you have Access via Office 2021 or Office 365? No worries: this book covers Access no matter how you access it. The book also shows you how to:
Handle the most common problems that Access users encounter
Import, export, and automatically edit data to populate your next database
Write powerful and accurate queries to find exactly what you’re looking for, exactly when you need it
Microsoft Access For Dummies is the perfect resource for anyone expected to understand, use, or administer Access databases at the workplace, classroom, or any other data-driven destination.

IBM Spectrum Protect Plus Protecting Database Applications

2021-10-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Julien Sauvanet, Kenneth Salerno, Markus Fehling

IBM MongoDB Oracle SQL SQL Server data data-engineering

IBM® Spectrum Protect Plus is a data protection solution that provides near-instant recovery, replication, retention management, and reuse for virtual machines, databases, and application backups in hybrid multicloud environments. This IBM Redpaper publication focuses on protecting database applications. IBM Spectrum® Protect Plus supports backup, restore, and data reuse for multiple databases, such as Oracle, IBM Db2®, MongoDB, Microsoft Exchange, and Microsoft SQL Server. Although other IBM Spectrum Protect Plus features focus on virtual environments, the database and application support of IBM Spectrum Protect Plus includes databases on virtual and physical servers.

Data Engineering on Azure

2021-08-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Vlad Riscutia

AI/ML Analytics Azure Big Data Cloud Computing Data Engineering Data Governance Data Management Data Modelling Data Quality DevOps data

Build a data platform to the industry-leading standards set by Microsoft’s own infrastructure. In Data Engineering on Azure you will learn how to:
Pick the right Azure services for different data scenarios
Manage data inventory
Implement production quality data modeling, analytics, and machine learning workloads
Handle data governance
Use DevOps to increase reliability
Ingest, store, and distribute data
Apply best practices for compliance and access control
Data Engineering on Azure reveals the data management patterns and techniques that support Microsoft’s own massive data infrastructure. Author Vlad Riscutia, a data engineer at Microsoft, teaches you to bring an engineering rigor to your data platform and ensure that your data prototypes function just as well under the pressures of production. You'll implement common data modeling patterns, stand up cloud-native data platforms on Azure, and get to grips with DevOps for both analytics and machine learning.
About the Technology
Build secure, stable data platforms that can scale to loads of any size. When a project moves from the lab into production, you need confidence that it can stand up to real-world challenges. This book teaches you to design and implement cloud-based data infrastructure that you can easily monitor, scale, and modify.
About the Book
In Data Engineering on Azure you’ll learn the skills you need to build and maintain big data platforms in massive enterprises. This invaluable guide includes clear, practical guidance for setting up infrastructure, orchestration, workloads, and governance. As you go, you’ll set up efficient machine learning pipelines, and then master time-saving automation and DevOps solutions. The Azure-based examples are easy to reproduce on other cloud platforms.
What's Inside
Data inventory and data governance
Assure data quality, compliance, and distribution
Build automated pipelines to increase reliability
Ingest, store, and distribute data
Production-quality data modeling, analytics, and machine learning
About the Reader
For data engineers familiar with cloud computing and DevOps.
About the Author
Vlad Riscutia is a software architect at Microsoft.
Quotes
A definitive and complete guide on data engineering, with clear and easy-to-reproduce examples. - Kelum Prabath Senanayake, Echoworx
An all-in-one Azure book, covering all a solutions architect or engineer needs to think about. - Albert Nogués, Danone
A meaningful journey through the Azure ecosystem. You’ll be building pipelines and joining components quickly! - Todd Cook, Appen
A gateway into the world of Azure for machine learning and DevOps engineers. - Krzysztof Kamyczek, Luxoft

Data Modeling for Azure Data Services

2021-07-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Peter ter Braake

Azure ADF BI Cloud Computing Cosmos Data Lake Data Management Data Modelling Data Vault ETL/ELT dimensional modeling NoSQL

Data Modeling for Azure Data Services is an essential guide that delves into the intricacies of designing, provisioning, and implementing robust data solutions within the Azure ecosystem. Through practical examples and hands-on exercises, this book equips you with the knowledge to create scalable, performant, and adaptable database designs tailored to your business needs.
What this Book will help me do
Understand and apply normalization, dimensional modeling, and data vault modeling for relational databases.
Learn to provision and implement scalable solutions like Azure SQL DB and Azure Synapse SQL Pool.
Master how to design and model a Data Lake using Azure Storage efficiently.
Gain expertise in NoSQL database modeling and implementing solutions using Azure Cosmos DB.
Develop ETL/ELT processes effectively using Azure Data Factory to support data integration workflows.
Author(s)
Peter ter Braake brings a wealth of expertise as a data architect and cloud solutions builder specializing in Azure's data services. With hands-on experience in projects requiring sophisticated data modeling and optimization, ter Braake crafts detailed learning material to help professionals level up their database design and Azure deployment skills. Dedicated to explaining complex topics with clarity and approachable language, ter Braake ensures that learners gain not just knowledge but applied competence.
Who is it for?
This book is a valuable resource for business intelligence developers, data architects, and consultants aiming to refine their skills in data modeling within modern cloud ecosystems, particularly Microsoft Azure. Whether you're a beginner with some foundational cloud data management knowledge or an experienced professional seeking to deepen your Azure data services proficiency, this book caters to your learning needs.

Advanced Analytics with Transact-SQL: Exploring Hidden Patterns and Rules in Your Data

2021-07-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dejan Sarka

AI/ML Analytics Azure BI Data Quality Data Science Python SQL Synapse data data-engineering microsoft-sql-server

Learn about business intelligence (BI) features in T-SQL and how they can help you with data science and analytics efforts without the need to bring in other languages such as R and Python. This book shows you how to compute statistical measures using your existing skills in T-SQL. You will learn how to calculate descriptive statistics, including centers, spreads, skewness, and kurtosis of distributions. You will also learn to find associations between pairs of variables, including calculating linear regression formulas and confidence levels with definite integration. No analysis is good without data quality. Advanced Analytics with Transact-SQL introduces data quality issues and shows you how to check for completeness and accuracy, and measure improvements in data quality over time. The book also explains how to optimize queries involving temporal data, such as when you search for overlapping intervals. More advanced time-oriented information in the book includes hazard and survival analysis. Forecasting with exponential moving averages and autoregression is covered as well. Every web/retail shop wants to know the products customers tend to buy together. Trying to predict the target discrete or continuous variable with few input variables is important for practically every type of business. This book helps you understand data science and the advanced algorithms used to analyze data, and terms such as data mining, machine learning, and text mining. Key to many of the solutions in this book are T-SQL window functions. Author Dejan Sarka demonstrates efficient statistical queries that are based on window functions and optimized through algorithms built using mathematical knowledge and creativity. The formulas and usage of those statistical procedures are explained so you can understand and modify the techniques presented. T-SQL is supported in SQL Server, Azure SQL Database, and in Azure Synapse Analytics.
There are so many BI features in T-SQL that it might become your primary analytic database language. If you want to learn how to get information from your data with the T-SQL language that you already are familiar with, then this is the book for you.
What You Will Learn
Describe distribution of variables with statistical measures
Find associations between pairs of variables
Evaluate the quality of the data you are analyzing
Perform time-series analysis on your data
Forecast values of a continuous variable
Perform market-basket analysis to predict customer purchasing patterns
Predict target variable outcomes from one or more input variables
Categorize passages of text by extracting and analyzing keywords
Who This Book Is For
Database developers and database administrators who want to translate their T-SQL skills into the world of business intelligence (BI) and data science. For readers who want to analyze large amounts of data efficiently by using their existing knowledge of T-SQL and Microsoft’s various database platforms such as SQL Server and Azure SQL Database. Also for readers who want to improve their querying by learning new and original optimization techniques.

97 Things Every Data Engineer Should Know

2021-06-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tobias Macey

AI/ML Data Engineering DWH ETL/ELT Modern Data Stack Cyber Security Stitch data data-engineering

Take advantage of today's sky-high demand for data engineers. With this in-depth book, current and aspiring engineers will learn powerful real-world best practices for managing data big and small. Contributors from notable companies including Twitter, Google, Stitch Fix, Microsoft, Capital One, and LinkedIn share their experiences and lessons learned for overcoming a variety of specific and often nagging challenges. Edited by Tobias Macey, host of the popular Data Engineering Podcast, this book presents 97 concise and useful tips for cleaning, prepping, wrangling, storing, processing, and ingesting data. Data engineers, data architects, data team managers, data scientists, machine learning engineers, and software engineers will greatly benefit from the wisdom and experience of their peers. Topics include:
The Importance of Data Lineage - Julien Le Dem
Data Security for Data Engineers - Katharine Jarmul
The Two Types of Data Engineering and Data Engineers - Jesse Anderson
Six Dimensions for Picking an Analytical Data Warehouse - Gleb Mezhanskiy
The End of ETL as We Know It - Paul Singman
Building a Career as a Data Engineer - Vijay Kiran
Modern Metadata for the Modern Data Stack - Prukalpa Sankar
Your Data Tests Failed! Now What? - Sam Bail

Azure Data Factory by Example: Practical Implementation for Data Engineers

2021-06-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Richard Swinbank

Azure ADF Cloud Computing DWH ETL/ELT SQL SSIS data data-engineering microsoft-sql-server relational-databases

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components. The hands-on introduction to ADF found in this book is equally well-suited to data engineers embracing their first ETL/ELT toolset as it is to seasoned veterans of Microsoft’s SQL Server Integration Services (SSIS). The example-driven approach leads you through ADF pipeline construction from the ground up, introducing important ideas and making learning natural and engaging. SSIS users will find concepts with familiar parallels, while ADF-first readers will quickly master those concepts through the book’s steady building up of knowledge in successive chapters. Summaries of key concepts at the end of each chapter provide a ready reference that you can return to again and again. 
What You Will Learn Create pipelines, activities, datasets, and linked services Build reusable components using variables, parameters, and expressions Move data into and around Azure services automatically Transform data natively using ADF data flows and Power Query data wrangling Master flow-of-control and triggers for tightly orchestrated pipeline execution Publish and monitor pipelines easily and with confidence Who This Book Is For Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations

Distributed Data Systems with Azure Databricks

2021-05-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alan Bernardo Palacio

AI/ML Azure ADF Big Data Cloud Computing Databricks Delta ETL/ELT Python Data Streaming TensorFlow data +3 more

In 'Distributed Data Systems with Azure Databricks', you will explore the capabilities of Microsoft Azure Databricks as a platform for building and managing big data pipelines. Learn how to process, transform, and analyze data at scale while developing expertise in training distributed machine learning models and integrating them into enterprise workflows. What this Book will help me do Design and implement Extract, Transform, Load (ETL) pipelines using Azure Databricks. Conduct distributed training of machine learning models using TensorFlow and Horovod. Integrate Azure Databricks with Azure Data Factory for optimized data pipeline orchestration. Utilize Delta Engine for efficient querying and analysis of data within Delta Lake. Employ Databricks Structured Streaming to manage real-time production-grade data flows. Author(s) Alan Bernardo Palacio is an experienced data engineer and cloud computing specialist, with extensive knowledge of the Microsoft Azure platform. With years of practical application of Databricks in enterprise settings, Palacio provides clear, actionable insights through relatable examples. They bring a passion for innovative solutions to the field of big data automation. Who is it for? This book is ideal for data engineers, machine learning engineers, and software developers looking to master Azure Databricks for large-scale data processing and analysis. Readers should have basic familiarity with cloud platforms, understanding of data pipelines, and a foundational grasp of Python and machine learning concepts. It is perfect for those wanting to create scalable and manageable data workflows.

Introducing .NET for Apache Spark: Distributed Processing for Massive Datasets

2021-04-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ed Elliott

AI/ML API Big Data Hive Linux Python Scala Spark SQL Data Streaming apache-spark data +1 more

Get started using Apache Spark via C# or F# and the .NET for Apache Spark bindings. This book is an introduction to both Apache Spark and the .NET bindings. Readers new to Apache Spark will get up to speed quickly using Spark for data processing tasks performed against large and very large datasets. You will learn how to combine your knowledge of .NET with Apache Spark to bring massive computing power to bear by distributed processing of extremely large datasets across multiple servers. This book covers how to get a local instance of Apache Spark running on your developer machine and shows you how to create your first .NET program that uses the Microsoft .NET bindings for Apache Spark. Techniques shown in the book allow you to use Apache Spark to distribute your data processing tasks over multiple compute nodes. You will learn to process data using both batch mode and streaming mode so you can make the right choice depending on whether you are processing an existing dataset or are working against new records in micro-batches as they arrive. The goal of the book is to leave you comfortable in bringing the power of Apache Spark to your favorite .NET language. What You Will Learn Install and configure Spark .NET on Windows, Linux, and macOS Write Apache Spark programs in C# and F# using the .NET bindings Access and invoke the Apache Spark APIs from .NET with the same high performance as Python, Scala, and R Encapsulate functionality in user-defined functions Transform and aggregate large datasets Execute SQL queries against files through Apache Hive Distribute processing of large datasets across multiple servers Create your own batch, streaming, and machine learning programs Who This Book Is For .NET developers who want to perform big data processing without having to migrate to Python, Scala, or R; and Apache Spark developers who want to run natively on .NET and take advantage of the C# and F# ecosystems

Azure Data Engineering Cookbook

2021-04-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Nagaraj Venkatesan, Ahmad Osama

Analytics Azure ADF Cloud Computing Data Engineering Databricks ETL/ELT Synapse data data-engineering

Dive into the world of data engineering with 'Azure Data Engineering Cookbook' to master building efficient ETL workflows using Microsoft Azure Data services. Whether you're working on batch processing solutions or real-time analytics, this book is your guide to implementing effective, scalable data operations. What this Book will help me do Design and implement efficient ETL pipelines for batch and real-time processing on MS Azure. Understand the use of Azure Blob storage for managing large data sets. Ingest, process, and analyze data using tools like Azure Synapse and Databricks. Develop and secure automation pipelines using Azure Data Factory. Leverage Azure Stream Analytics for real-time data processing workflows. Author(s) Ahmad Osama and Nagaraj Venkatesan bring years of expertise in cloud solutions and data engineering. Renowned for their practical teaching approach, they have helped countless professionals master the intricacies of Azure. Their focus is on equipping readers with actionable skills for real-world data challenges. Who is it for? This book is ideal for data engineers and database professionals aiming to hone their expertise in advanced Azure data engineering tasks. Readers should have a working knowledge of Azure fundamentals and basic data engineering concepts. If you're a technical architect or ETL developer seeking to transition or enhance your skills in Azure's ecosystem, you'll find immense value here.

Building Custom Tasks for SQL Server Integration Services: The Power of .NET for ETL for SQL Server 2019 and Beyond

2021-02-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Andy Leonard

Azure ADF Azure DevOps Cloud Computing DevOps ETL/ELT SQL SSIS data data-engineering etl

Build custom SQL Server Integration Services (SSIS) tasks using Visual Studio Community Edition and C#. Bring all the power of Microsoft .NET to bear on your data integration and ETL processes, and for no added cost over what you’ve already spent on licensing SQL Server. New in this edition is a demonstration deploying a custom SSIS task to the Azure Data Factory (ADF) Azure-SSIS Integration Runtime (IR). All examples in this new edition are implemented in C#. Custom task developers are shown how to implement custom tasks using the widely accepted and default language for .NET development. Why are custom components necessary? Because even though the SSIS catalog of built-in tasks and components is a marvel of engineering, gaps remain in the available functionality. One such gap is a constraint of the built-in SSIS Execute Package Task, which does not allow SSIS developers to select SSIS packages from other projects in the SSIS Catalog. Examples in this book show how to create a custom Execute Catalog Package task that allows SSIS developers to execute tasks from other projects in the SSIS Catalog. Building on the examples and patterns in this book, SSIS developers may create any task to which they aspire, custom tailored to their specific data integration and ETL needs.
What You Will Learn Configure and execute Visual Studio in the way that best supports SSIS task development Create a class library as the basis for an SSIS task, and reference the needed SSIS assemblies Properly sign assemblies that you create in order to invoke them from your task Implement source code control via Azure DevOps, or your own favorite tool set Troubleshoot and execute custom tasks as part of your own projects Create deployment projects (MSIs) for distributing code-complete tasks Deploy custom tasks to Azure Data Factory Azure-SSIS IRs in the cloud Create advanced editors for custom task parameters Who This Book Is For Database administrators and developers who are involved in ETL projects built around SQL Server Integration Services (SSIS). Readers do not need a background in software development with C#. Most important is a desire to optimize ETL efforts by creating custom-tailored tasks for execution in SSIS packages, on-premises or in ADF Azure-SSIS IRs.

Introducing Microsoft Access Using Macro Programming Techniques: An Introduction to Desktop Database Development by Example

2020-12-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Flavio Morgado

data data-engineering database-management-tools microsoft-access

Learn Microsoft Access by building a powerful database application from start to finish. Microsoft Access ships with every version of Office, from Office 2019 to Office 365 Home and Personal editions. Most people understand the value of having a reliable contact database, but few realize that Access can be an incredibly valuable data tool and an excellent gateway for learning database development. Introducing Microsoft Access Using Macro Programming Techniques approaches database development from a practical and experiential standpoint. You will learn important data concepts as you journey through each step of creating a database using Access. The example you will build takes advantage of a massive amount of data from an external source of nutritional data (USDA). You will leverage this freely available repository of information in multiple ways, putting Access to the test in creating powerful business solutions that you can then apply to your own data sets. The tables and records in this database will be used to demonstrate key relational principles in Access, including how to use the relationship window to understand the relationships between tables and how to create different objects such as queries, forms, reports, and macros. Using this approach, you will learn how desktop database development can be a powerful solution to meet your business needs.
What You Will Learn Discover the relational database and how it is different from other databases Create database tables and establish relationships between them to create a solid relational database system Understand the concept and importance of referential integrity (RI) in data and databases Use different types of Access queries to extract the information you need from the database Show database information in individual, customized windows using Access Forms Present insightful information about the database using Access Reports Automate your database solutions with macros Who This Book Is For Anyone who wants to learn how to build a database using Microsoft Access to create customized solutions. It is also useful for those working in IT managing large contact data sets (healthcare, retail, etc.) who need to learn the basics in order to create a professional database solution. Readers should have access to some version of Microsoft Access in order to perform the exercises in this book.

What Is a Data Lake?

2020-11-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alex Gorelik

Analytics AWS Azure BI Big Data Cloud Computing Data Governance Data Lake Data Management GCP data data-engineering +2 more

A revolution is occurring in data management regarding how data is collected, stored, processed, governed, managed, and provided to decision makers. The data lake is a popular approach that harnesses the power of big data and marries it with the agility of self-service. With this report, IT executives and data architects will focus on the technical aspects of building a data lake for your organization. Alex Gorelik from Facebook explains the requirements for building a successful data lake that business users can easily access whenever they have a need. You'll learn the phases of data lake maturity, common mistakes that lead to data swamps, and the importance of aligning data with your company's business strategy and gaining executive sponsorship. You'll explore: The ingredients of modern data lakes, such as the use of different ingestion methods for different data formats, and the importance of the three Vs: volume, variety, and velocity Building blocks of successful data lakes, including data ingestion, integration, persistence, data governance, and business intelligence and self-service analytics State-of-the-art data lake architectures offered by Amazon Web Services, Microsoft Azure, and Google Cloud
