DWH

Snowflake Security: Securing Your Snowflake Data Cloud

2021-10-05 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Yoav Cohen (Satori) , Ben Herzberg (Satori)

Cloud Computing Data Engineering Cyber Security Snowflake data data-engineering

This book is your complete guide to Snowflake security, covering account security, authentication, data access control, logging and monitoring, and more. It will help you make sure that you are using the security controls in the right way, are on top of access control, and are making the most of the security features in Snowflake. Snowflake is the fastest growing cloud data warehouse in the world, and having the right methodology to protect the data is important both to data engineers and security teams. It allows for faster data enablement for organizations, as well as reducing security risks, meeting compliance requirements, and solving data privacy challenges. There are currently tens of thousands of people who are either data engineers/data ops in Snowflake-using organizations, or security people in such organizations. This book provides guidance when you want to apply certain capabilities, such as data masking, row-level security, column-level security, tackling role hierarchy, building monitoring dashboards, etc., to your organization. What You Will Learn Implement security best practices for Snowflake Set up user provisioning, MFA, OAuth, and SSO Set up a Snowflake security model Design roles architecture Use advanced access control such as row-based security and dynamic masking Audit and monitor your Snowflake Data Cloud Who This Book Is For Data engineers, data privacy professionals, and security teams either with security knowledge (preferably some data security knowledge) or with data engineering knowledge; in other words, either “Snowflake people” or “data people” who want to get security right, or “security people” who want to make sure that Snowflake gets handled right in terms of security

Amazon Redshift Cookbook

2021-07-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Harshida Patel (AWS) , Thiyagarajan Arumugam , Shruti Worlikar (AWS Analytics)

Analytics Cloud Computing ETL/ELT Redshift Cyber Security amazon-redshift data data-engineering relational-databases

Dive into the world of Amazon Redshift with this comprehensive cookbook, packed with practical recipes to build, optimize, and manage modern data warehousing solutions. From understanding Redshift's architecture to implementing advanced data warehousing techniques, this book provides actionable guidance to harness the power of Amazon Redshift effectively. What this Book will help me do Master the architecture and core concepts of Amazon Redshift to architect scalable data warehouses. Optimize data pipelines and automate ETL processes for seamless data ingestion and management. Leverage advanced features like concurrency scaling and Redshift Spectrum for enhanced analytics. Apply best practices for security and cost optimization in Redshift projects. Gain expertise in scaling data warehouse solutions to accommodate large-scale analytics needs. Author(s) Shruti Worlikar, Thiyagarajan Arumugam, and Harshida Patel are seasoned experts in data warehousing and analytics with extensive experience using Amazon Redshift. Their backgrounds in implementing scalable data solutions make their insights practical and grounded. Through their collaborative writing, they aim to make complex topics approachable to learners of various skill levels. Who is it for? This book is tailored for professionals such as data warehouse developers, data engineers, and data analysts looking to master Amazon Redshift. It suits intermediate to advanced practitioners with a basic understanding of data warehousing and cloud technologies. Readers seeking to optimize Redshift for cost, performance, and security will find this guide invaluable.

97 Things Every Data Engineer Should Know

2021-06-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tobias Macey

AI/ML Data Engineering ETL/ELT Modern Data Stack Microsoft Cyber Security Stitch data data-engineering

Take advantage of today's sky-high demand for data engineers. With this in-depth book, current and aspiring engineers will learn powerful real-world best practices for managing data big and small. Contributors from notable companies including Twitter, Google, Stitch Fix, Microsoft, Capital One, and LinkedIn share their experiences and lessons learned for overcoming a variety of specific and often nagging challenges. Edited by Tobias Macey, host of the popular Data Engineering Podcast, this book presents 97 concise and useful tips for cleaning, prepping, wrangling, storing, processing, and ingesting data. Data engineers, data architects, data team managers, data scientists, machine learning engineers, and software engineers will greatly benefit from the wisdom and experience of their peers. Topics include: The Importance of Data Lineage - Julien Le Dem Data Security for Data Engineers - Katharine Jarmul The Two Types of Data Engineering and Data Engineers - Jesse Anderson Six Dimensions for Picking an Analytical Data Warehouse - Gleb Mezhanskiy The End of ETL as We Know It - Paul Singman Building a Career as a Data Engineer - Vijay Kiran Modern Metadata for the Modern Data Stack - Prukalpa Sankar Your Data Tests Failed! Now What? - Sam Bail

Azure Data Factory by Example: Practical Implementation for Data Engineers

2021-06-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Richard Swinbank

Azure ADF Cloud Computing ETL/ELT Microsoft SQL SSIS data data-engineering microsoft-sql-server relational-databases

Data engineers who need to hit the ground running will use this book to build skills in Azure Data Factory v2 (ADF). The tutorial-first approach to ADF taken in this book gets you working from the first chapter, explaining key ideas naturally as you encounter them. From creating your first data factory to building complex, metadata-driven nested pipelines, the book guides you through essential concepts in Microsoft’s cloud-based ETL/ELT platform. It introduces components indispensable for the movement and transformation of data in the cloud. Then it demonstrates the tools necessary to orchestrate, monitor, and manage those components. The hands-on introduction to ADF found in this book is equally well-suited to data engineers embracing their first ETL/ELT toolset as it is to seasoned veterans of Microsoft’s SQL Server Integration Services (SSIS). The example-driven approach leads you through ADF pipeline construction from the ground up, introducing important ideas and making learning natural and engaging. SSIS users will find concepts with familiar parallels, while ADF-first readers will quickly master those concepts through the book’s steady building up of knowledge in successive chapters. Summaries of key concepts at the end of each chapter provide a ready reference that you can return to again and again. 
What You Will Learn Create pipelines, activities, datasets, and linked services Build reusable components using variables, parameters, and expressions Move data into and around Azure services automatically Transform data natively using ADF data flows and Power Query data wrangling Master flow-of-control and triggers for tightly orchestrated pipeline execution Publish and monitor pipelines easily and with confidence Who This Book Is For Data engineers and ETL developers taking their first steps in Azure Data Factory, SQL Server Integration Services users making the transition toward doing ETL in Microsoft’s Azure cloud, and SQL Server database administrators involved in data warehousing and ETL operations

Automating the Modern Data Warehouse

2021-03-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Steve Swoyer

AI/ML Cloud Computing Data Governance Data Management data data-engineering data-warehouse storage-repositories

The opportunity to modernize and improve the enterprise data warehouse is one of the best reasons for moving your application to the cloud. A data warehouse can access a greater diversity of use cases and practices than is possible in an existing environment. In this report, researcher and analyst Stephen Swoyer offers a comprehensive overview of the benefits and challenges of implementing a cloud-based data warehouse. Senior IT decision makers, chief data officers, and data professionals will learn about the shifts and new trends in the data management landscape. Explore ways to improve data management, build a data warehouse strategy, and learn how to modernize a data warehouse effectively. Understand how AI, machine learning, self-service data integration, and built-in developer-oriented services have transformed the data warehouse role Use data warehouses to work with cloud-based data lakes for end-to-end data management and data governance Explore how data warehouse platforms as a service (PaaS) pave the way to automation Migrate, manage, and secure a data warehouse in a hybrid or multicloud environment

Snowflake Cookbook

2021-02-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Hamid Mahmood Qureshi , Hammad Sharif

Analytics Cloud Computing Snowflake Spark SQL data data-engineering

The "Snowflake Cookbook" is your guide to mastering Snowflake's unique cloud-centric architecture. This book provides detailed recipes for building modern data pipelines, configuring efficient virtual warehouses, ensuring robust data protection, and optimizing cost-performance, all while leveraging Snowflake's distinctive features such as data sharing and time travel. What this Book will help me do Set up and configure Snowflake's architecture for optimized performance and cost efficiency. Design and implement robust data pipelines using SQL and Snowflake's specialized features. Secure, manage, and share data efficiently with built-in Snowflake capabilities. Apply performance tuning techniques to enhance your Snowflake implementations. Extend Snowflake's functionality with tools like Spark Connector for advanced workflows. Author(s) Hamid Mahmood Qureshi and Hammad Sharif are both seasoned experts in data warehousing and cloud computing technologies. With extensive experience implementing analytics solutions, they bring a hands-on approach to teaching Snowflake. They are ardent proponents of empowering readers towards creating effective and scalable data solutions. Who is it for? This book is perfect for data warehouse developers, data analysts, cloud architects, and anyone managing cloud data solutions. If you're familiar with basic database concepts or just stepping into Snowflake, you'll find practical guidance here to deepen your understanding and functional expertise in cloud data warehousing.

Data Lake Analytics on Microsoft Azure: A Practitioner's Guide to Big Data Engineering

2020-10-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Pankaj Khattar , Harsh Chawla

AI/ML Analytics Azure Big Data Cosmos Data Analytics Data Engineering Data Lake Data Science Databricks Hadoop IoT +7 more

Get a 360-degree view of how the journey of data analytics solutions has evolved from monolithic data stores and enterprise data warehouses to data lakes and modern data warehouses. This book includes comprehensive coverage of how: To architect data lake analytics solutions by choosing suitable technologies available on Microsoft Azure The advent of microservices applications covering ecommerce or modern solutions built on IoT and how real-time streaming data has completely disrupted this ecosystem These data analytics solutions have been transformed from solely understanding the trends from historical data to building predictions by infusing machine learning technologies into the solutions Data platform professionals who have been working on relational data stores, non-relational data stores, and big data technologies will find the content in this book useful. The book also can help you start your journey into the data engineer world as it provides an overview of advanced data analytics and touches on data science concepts and various artificial intelligence and machine learning technologies available on Microsoft Azure. What Will You Learn You will understand the: Concepts of data lake analytics, the modern data warehouse, and advanced data analytics Architecture patterns of the modern data warehouse and advanced data analytics solutions Phases (such as Data Ingestion, Store, Prep and Train, and Model and Serve) of data analytics solutions and technology choices available on Azure under each phase In-depth coverage of real-time and batch mode data analytics solutions architecture Various managed services available on Azure such as Synapse analytics, event hubs, Stream analytics, CosmosDB, and managed Hadoop services such as Databricks and HDInsight Who This Book Is For Data platform professionals, database architects, engineers, and solution architects

BigQuery for Data Warehousing: Managed Data Analysis in the Google Cloud

2020-09-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Mark Mucchetti

AI/ML BigQuery Cloud Computing Data Science GCP data data-engineering google-bigquery

Create a data warehouse, complete with reporting and dashboards using Google’s BigQuery technology. This book takes you from the basic concepts of data warehousing through the design, build, load, and maintenance phases. You will build capabilities to capture data from the operational environment, and then mine and analyze that data for insight into making your business more successful. You will gain practical knowledge about how to use BigQuery to solve data challenges in your organization. BigQuery is a managed cloud platform from Google that provides enterprise data warehousing and reporting capabilities. Part I of this book shows you how to design and provision a data warehouse in the BigQuery platform. Part II teaches you how to load and stream your operational data into the warehouse to make it ready for analysis and reporting. Parts III and IV cover querying and maintaining, helping you keep your information relevant with other Google Cloud Platform services and advanced BigQuery. Part V takes reporting to the next level by showing you how to create dashboards to provide at-a-glance visual representations of your business situation. Part VI provides an introduction to data science with BigQuery, covering machine learning and Jupyter notebooks. What You Will Learn Design a data warehouse for your project or organization Load data from a variety of external and internal sources Integrate other Google Cloud Platform services for more complex workflows Maintain and scale your data warehouse as your organization grows Analyze, report, and create dashboards on the information in the warehouse Become familiar with machine learning techniques using BigQuery ML Who This Book Is For Developers who want to provide business users with fast, reliable, and insightful analysis from operational data, and data analysts interested in a cloud-based solution that avoids the pain of provisioning their own servers.

Data Management at Scale

2020-07-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Piethein Strengholt

Analytics Data Governance Data Management Master Data Management Cyber Security data data-engineering data-warehouse storage-repositories

As data management and integration continue to evolve rapidly, storing all your data in one place, such as a data warehouse, is no longer scalable. In the very near future, data will need to be distributed and available for several technological solutions. With this practical book, you’ll learn how to migrate your enterprise from a complex and tightly coupled data landscape to a more flexible architecture ready for the modern world of data consumption. Executives, data architects, analytics teams, and compliance and governance staff will learn how to build a modern scalable data landscape using the Scaled Architecture, which you can introduce incrementally without a large upfront investment. Author Piethein Strengholt provides blueprints, principles, observations, best practices, and patterns to get you up to speed. Examine data management trends, including technological developments, regulatory requirements, and privacy concerns Go deep into the Scaled Architecture and learn how the pieces fit together Explore data governance and data security, master data management, self-service data marketplaces, and the importance of metadata

What Is Data Engineering?

2019-12-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Lewis Gavin

Data Engineering data data-engineering

The demand for data scientists is well-known, but when it comes time to build solutions based on data, your company also needs data engineers—people with strong data warehousing and programming backgrounds. In fact, whether you’re powering self-driving cars or creating music playlists, this field has emerged as one of the most important in modern business. In this report, Lewis Gavin explores key aspects of data engineering and presents a case study from Spotify that demonstrates the tremendous value of this role.

Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

2019-12-20 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Donna Strok , Dmitry Shirokov , Dmitry Anoshin

Analytics AWS Azure BI Cloud Computing Data Analytics Databricks ETL/ELT GCP Matillion Microsoft Cyber Security +4 more

Explore the modern market of data analytics platforms and the benefits of using Snowflake computing, the data warehouse built for the cloud. With the rise of cloud technologies, organizations prefer to deploy their analytics using cloud providers such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform. Cloud vendors are offering modern data platforms for building cloud analytics solutions to collect data and consolidate into single storage solutions that provide insights for business users. The core of any analytics framework is the data warehouse, and previously customers did not have many choices of platform to use. Snowflake was built specifically for the cloud and it is a true game changer for the analytics market. This book will help onboard you to Snowflake and present best practices for deploying and using the Snowflake data warehouse. In addition, it covers modern analytics architecture and use cases. It provides use cases of integration with leading analytics software such as Matillion ETL, Tableau, and Databricks. Finally, it covers migration scenarios for on-premise legacy data warehouses. What You Will Learn Know the key functionalities of Snowflake Set up security and access with cluster Bulk load data into Snowflake using the COPY command Migrate from a legacy data warehouse to Snowflake Integrate the Snowflake data platform with modern business intelligence (BI) and data integration tools Who This Book Is For Those working with data warehouse and business intelligence (BI) technologies, and existing and potential Snowflake users

Building Big Data Applications

2019-11-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Krish Krishnan

Big Data data data-engineering data-warehouse storage-repositories

Building Big Data Applications helps data managers and their organizations make the most of unstructured data with an existing data warehouse. It provides readers with what they need to know to make sense of how Big Data fits into the world of Data Warehousing. Readers will learn about infrastructure options and integration and come away with a solid understanding of how to leverage various architectures for integration. The book includes a wide range of use cases that will help data managers visualize reference architectures in the context of specific industries (healthcare, big oil, transportation, software, etc.). Explores various ways to leverage Big Data by effectively integrating it into the data warehouse Includes real-world case studies which clearly demonstrate Big Data technologies Provides insights on how to optimize current data warehouse infrastructure and integrate newer infrastructure matching data processing workloads and requirements

Google BigQuery: The Definitive Guide

2019-10-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jordan Tigani (MotherDuck) , Valliappa Lakshmanan

Agile/Scrum BigQuery Cloud Computing GCP data data-engineering google-bigquery

Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you’ll examine how to analyze data at scale to derive insights from large datasets efficiently. Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the BigQuery team, provide best practices for modern data warehousing within an autoscaled, serverless public cloud. Whether you want to explore parts of BigQuery you’re not familiar with or prefer to focus on specific tasks, this reference is indispensable.

Mastering SQL Server 2017

2019-08-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christian Cote , Milos Radivojevic , William Durkin , Dejan Sarka , Matija Lah

AI/ML Azure BI Docker ETL/ELT JSON Linux Microsoft Python SQL SQL Server SSIS +4 more

Leverage the power of SQL Server 2017 Integration Services to build data integration solutions with ease Key Features Work with temporal tables to access information stored in a table at any time Get familiar with the latest features in SQL Server 2017 Integration Services Program and extend your packages to enhance their functionality Book Description Microsoft SQL Server 2017 uses the power of R and Python for machine learning and containerization-based deployment on Windows and Linux. By learning how to use the features of SQL Server 2017 effectively, you can build scalable apps and easily perform data integration and transformation. You'll start by brushing up on the features of SQL Server 2017. This Learning Path will then demonstrate how you can use Query Store, columnstore indexes, and In-Memory OLTP in your apps. You'll also learn to integrate Python code in SQL Server and graph database implementations for development and testing. Next, you'll get up to speed with designing and building SQL Server Integration Services (SSIS) data warehouse packages using SQL Server Data Tools. Toward the concluding chapters, you'll discover how to develop SSIS packages designed to maintain a data warehouse using the data flow and other control flow tasks. By the end of this Learning Path, you'll be equipped with the skills you need to design efficient, high-performance database applications with confidence. This Learning Path includes content from the following Packt books: SQL Server 2017 Developer's Guide by Milos Radivojevic, Dejan Sarka, et al. SQL Server 2017 Integration Services Cookbook by Christian Cote, Dejan Sarka, et al.
What you will learn Use columnstore indexes to make storage and performance improvements Extend database design solutions using temporal tables Exchange JSON data between applications and SQL Server Migrate historical data to Microsoft Azure by using Stretch Database Design the architecture of a modern Extract, Transform, and Load (ETL) solution Implement ETL solutions using Integration Services for both on-premise and Azure data Who this book is for This Learning Path is for database developers and solution architects looking to develop ETL solutions with SSIS, and explore the new features in SSIS 2017. Advanced analysis practitioners, business intelligence developers, and database consultants dealing with performance tuning will also find this book useful. Basic understanding of database concepts and T-SQL is required to get the best out of this Learning Path.

Data Warehousing with Greenplum, 2nd Edition

2019-07-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Marshall Presser

Analytics Data Analytics RDBMS Cyber Security SQL data data-engineering data-warehouse storage-repositories

Data professionals are confronting the most disruptive change since relational databases appeared in the 1980s. SQL is still a major tool for data analytics, but conventional relational database management systems can’t handle the increasing size and complexity of today’s datasets. This updated edition teaches you best practices for Greenplum Database, the open source massively parallel processing (MPP) database that accommodates large sets of nonrelational and relational data. Marshall Presser, field CTO at Pivotal, introduces Greenplum’s approach to data analytics and data-driven decisions, beginning with its shared-nothing architecture. IT managers, developers, data analysts, system architects, and data scientists will all gain from exploring data organization and storage, data loading, running queries, and learning to perform analytics in the database. Discover how MPP and Greenplum will help you go beyond the traditional data warehouse. This ebook covers: Greenplum features, use case examples, and techniques for optimizing use Four Greenplum deployment options to help you balance security, cost, and time to usability Why each networked node in Greenplum’s architecture includes an independent operating system, memory, and storage Additional tools for monitoring, managing, securing, and optimizing query responses in the Pivotal Greenplum commercial database

Data Architecture: A Primer for the Data Scientist, 2nd Edition

2019-04-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Mary Levins , Daniel Linstedt , W. H. Inmon

Analytics Big Data Data Science data data-engineering

Over the past 5 years, the concept of big data has matured, data science has grown exponentially, and data architecture has become a standard part of organizational decision-making. Throughout all this change, the basic principles that shape the architecture of data have remained the same. There remains a need for people to take a look at the "bigger picture" and to understand where their data fit into the grand scheme of things. Data Architecture: A Primer for the Data Scientist, Second Edition addresses the larger architectural picture of how big data fits within the existing information infrastructure or data warehousing systems. This is an essential topic not only for data scientists, analysts, and managers but also for researchers and engineers who increasingly need to deal with large and complex sets of data. Until data are gathered and can be placed into an existing framework or architecture, they cannot be used to their full potential. Drawing upon years of practical experience and using numerous examples and case studies from across various industries, the authors seek to explain this larger picture into which big data fits, giving data scientists the necessary context for how pieces of the puzzle should fit together. New case studies include expanded coverage of textual management and analytics New chapters on visualization and big data Discussion of new visualizations of the end-state architecture

The Enterprise Big Data Lake

2019-03-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alex Gorelik

Big Data Data Lake Data Science data data-engineering data-lake storage-repositories

The data lake is a daring new approach for harnessing the power of big data technology and providing convenient self-service capabilities. But is it right for your company? This book is based on discussions with practitioners and executives from more than a hundred organizations, ranging from data-driven companies such as Google, LinkedIn, and Facebook, to governments and traditional corporate enterprises. You’ll learn what a data lake is, why enterprises need one, and how to build one successfully with the best practices in this book. Alex Gorelik, CTO and founder of Waterline Data, explains why old systems and processes can no longer support data needs in the enterprise. Then, in a collection of essays about data lake implementation, you’ll examine data lake initiatives, analytic projects, experiences, and best practices from data experts working in various industries. Get a succinct introduction to data warehousing, big data, and data science Learn various paths enterprises take to build a data lake Explore how to build a self-service model and best practices for providing analysts access to the data Use different methods for architecting your data lake Discover ways to implement a data lake from experts in different industries

Learning PostgreSQL 11 - Third Edition

2019-01-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christopher Travers , Andrey Volkov

Data Modelling Python data data-engineering postgresql relational-databases

Immerse yourself in the capabilities of PostgreSQL 11 with this comprehensive beginner's guide. Learning PostgreSQL 11 will take you through relational database fundamentals and advanced database functionality, empowering you to build efficient and scalable database solutions with confidence. By the end of this book, you'll have mastery over PostgreSQL's features to develop, manage, and optimize your own databases. What this Book will help me do Gain a solid understanding of relational database principles and the PostgreSQL ecosystem. Learn to install PostgreSQL, create a database, and design a data model effectively. Develop skills to create, manipulate, and optimize tables, views, and efficient indexes. Utilize server-side programming with PL/pgSQL and advanced data types like JSONB. Enhance database reliability and performance, and connect to your Python applications seamlessly. Author(s) Christopher Travers and Andrey Volkov bring their collective expertise and practical experience to this book. Christopher has a strong background in software development and database systems, with years of hands-on involvement with PostgreSQL. Andrey has contributed significantly to innovative database solutions, emphasizing clear and actionable instructions. Together, they aim to demystify PostgreSQL for learners of all backgrounds. Who is it for? This book is crafted for developers, database administrators, and tech enthusiasts who want to delve into PostgreSQL. Beginners with no prior database experience will find its approach accessible, while those aiming to enhance their skills with PostgreSQL's latest features will benefit immensely. It's ideal for anyone seeking to build solid database or data warehousing applications with modern capabilities and best practices.

Streaming Change Data Capture

2018-06-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Kevin Petrie (Eckerson Group) , Dan Potter , Itamar Ankorion

Analytics Cloud Computing Data Lake Hadoop IoT Marketing Data Streaming data data-engineering data-lake storage-repositories

There are many benefits to becoming a data-driven organization, including the ability to accelerate and improve business decision accuracy through the real-time processing of transactions, social media streams, and IoT data. But those benefits require significant changes to your infrastructure. You need flexible architectures that can copy data to analytics platforms at near-zero latency while maintaining 100% production uptime. Fortunately, a solution already exists. This ebook demonstrates how change data capture (CDC) can meet the scalability, efficiency, real-time, and zero-impact requirements of modern data architectures. Kevin Petrie, Itamar Ankorion, and Dan Potter—technology marketing leaders at Attunity—explain how CDC enables faster and more accurate decisions based on current data and reduces or eliminates full reloads that disrupt production and efficiency. The book examines: How CDC evolved from a niche feature of database replication software to a critical data architecture building block Architectures where data workflow and analysis take place, and their integration points with CDC How CDC identifies and captures source data updates to assist high-speed replication to one or more targets Case studies on cloud-based streaming and streaming to a data lake and related architectures Guiding principles for effectively implementing CDC in cloud, data lake, and streaming environments The Attunity Replicate platform for efficiently loading data across all major database, data warehouse, cloud, streaming, and Hadoop platforms

Next-Generation Big Data: A Practical Guide to Apache Kudu, Impala, and Spark

2018-06-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Butch Quinto

Alteryx Analytics BI Big Data Cloud Computing Data Governance DataViz Apache HBase HDFS Kafka MySQL Oracle

Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies. Next-Generation Big Data takes a holistic approach, covering the most important aspects of modern enterprise big data. The book covers not only the main technology stack but also the next-generation tools and applications used for big data warehousing, data warehouse optimization, real-time and batch data ingestion and processing, real-time data visualization, big data governance, data wrangling, big data cloud deployments, and distributed in-memory big data computing. Finally, the book offers extensive and detailed coverage of big data case studies from Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard.

What You’ll Learn
Install Apache Kudu, Impala, and Spark to modernize enterprise data warehouse and business intelligence environments, complete with real-world, easy-to-follow examples and practical advice
Integrate HBase, Solr, Oracle, SQL Server, MySQL, Flume, Kafka, HDFS, and Amazon S3 with Apache Kudu, Impala, and Spark
Use StreamSets, Talend, Pentaho, and CDAP for real-time and batch data ingestion and processing
Utilize Trifacta, Alteryx, and Datameer for data wrangling and interactive data processing
Turbocharge Spark with Alluxio, a distributed in-memory storage platform
Deploy big data in the cloud using Cloudera Director
Perform real-time data visualization and time series analysis using Zoomdata, Apache Kudu, Impala, and Spark
Understand enterprise big data topics such as big data governance, metadata management, data lineage, impact analysis, and policy enforcement, and how to use Cloudera Navigator to perform common data governance tasks
Implement big data use cases such as big data warehousing, data warehouse optimization, Internet of Things, real-time data ingestion and analytics, complex event processing, and scalable predictive modeling
Study real-world big data case studies from innovative companies, including Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard

Who This Book Is For
BI and big data warehouse professionals interested in gaining practical and real-world insight into next-generation big data processing and analytics using Apache Kudu, Impala, and Spark; and those who want to learn more about other advanced enterprise topics
