data-warehouse

Unlocking dbt: Design and Deploy Transformations in Your Cloud Data Warehouse

2025-09-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dustin Dorsey (Onix), Cameron Cyr

CI/CD Cloud Computing dbt DWH Git Modern Data Stack Python SQL data data-engineering storage-repositories

Master the art of data transformation with the second edition of this trusted guide to dbt. Building on the foundation of the first edition, this updated volume offers a deeper, more comprehensive exploration of dbt’s capabilities—whether you're new to the tool or looking to sharpen your skills. It dives into the latest features and techniques, equipping you with the tools to create scalable, maintainable, and production-ready data transformation pipelines. Unlocking dbt, Second Edition introduces key advancements, including the semantic layer, which allows you to define and manage metrics at scale, and dbt Mesh, empowering organizations to orchestrate decentralized data workflows with confidence. You’ll also explore more advanced testing capabilities, expanded CI/CD and deployment strategies, and enhancements in documentation—such as the newly introduced dbt Catalog. As in the first edition, you’ll learn how to harness dbt’s power to transform raw data into actionable insights, while incorporating software engineering best practices like code reusability, version control, and automated testing. From configuring projects with the dbt Platform or open source dbt to mastering advanced transformations using SQL and Jinja, this book provides everything you need to tackle real-world challenges effectively. 
What You Will Learn
• Understand dbt and its role in the modern data stack
• Set up projects using both the cloud-hosted dbt Platform and the open source project
• Connect dbt projects to cloud data warehouses
• Build scalable models in SQL and Python
• Configure development, testing, and production environments
• Capture reusable logic with Jinja macros
• Incorporate version control into your data transformation code
• Seamlessly connect your projects using dbt Mesh
• Build and manage a semantic layer using dbt
• Deploy dbt using CI/CD best practices

Who This Book Is For
Current and aspiring data professionals, including architects, developers, analysts, engineers, data scientists, and consultants who are beginning the journey of using dbt as part of their data pipeline’s transformation layer. Readers should have a foundational knowledge of writing basic SQL statements, development best practices, and working with data in an analytical context such as a data warehouse.

Express Learning - Data Warehousing and Data Mining, 1st Edition by Pearson

2024-05-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by ITL Education

DWH data data-engineering storage-repositories

Express Learning is a series of books designed as quick reference guides to important undergraduate courses. The organized and accessible format of these books allows students to learn important concepts in an easy-to-understand, question-and-answer format. These portable learning tools have been designed as one-stop references for students to understand and master the subjects by themselves.

Book Contents –

Chapter 1: Introduction to Data Warehouse
Chapter 2: Building a Data Warehouse
Chapter 3: Data Warehouse: Architecture
Chapter 4: OLAP Technology
Chapter 5: Introduction to Data Mining
Chapter 6: Data Preprocessing
Chapter 7: Mining Association Rules
Chapter 8: Classification and Prediction
Chapter 9: Cluster Analysis
Chapter 10: Advanced Techniques of Data Mining and Its Applications
Index

Architecting a Modern Data Warehouse for Large Enterprises: Build Multi-cloud Modern Distributed Data Warehouses with Azure and AWS

2023-12-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Abhishek Mishra, Anjani Kumar, Sanjeev Kumar (Tesa SE)

AWS Azure BI Big Data Cloud Computing Data Governance Data Lake Data Lakehouse Delta DWH Pandas Cyber Security +4 more

Design and architect new-generation cloud-based data warehouses using Azure and AWS. This book provides an in-depth understanding of how to build modern cloud-native data warehouses, as well as their history and evolution. The book starts by covering foundational data warehouse concepts and introduces modern features such as distributed processing, big data storage, data streaming, and processing data on the cloud. You will gain an understanding of the synergy, relevance, and usage of standard data warehousing practices in the modern world of distributed data processing. The authors walk you through the essential concepts of Data Mesh, Data Lake, Lakehouse, and Delta Lake, and they demonstrate the services and offerings available on Azure and AWS for data orchestration, data democratization, data governance, data security, and business intelligence. After completing this book, you will be ready to design and architect enterprise-grade, cloud-based modern data warehouses using industry best practices and guidelines.

What You Will Learn
• Understand the core concepts underlying modern data warehouses
• Design and build cloud-native data warehouses
• Gain a practical approach to architecting and building data warehouses on Azure and AWS
• Implement modern data warehousing components such as Data Mesh, Data Lake, Delta Lake, and Lakehouse
• Process data through pandas and evaluate your model’s performance using metrics such as F1-score, precision, and recall
• Apply deep learning to supervised, semi-supervised, and unsupervised anomaly detection tasks for tabular datasets and time series applications

Who This Book Is For
Experienced developers, cloud architects, and technology enthusiasts looking to build cloud-based modern data warehouses using Azure and AWS.

Automating the Modern Data Warehouse

2021-03-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Steve Swoyer

AI/ML Cloud Computing Data Governance Data Management DWH data data-engineering storage-repositories

The opportunity to modernize and improve the enterprise data warehouse is one of the best reasons for moving your applications to the cloud. In the cloud, a data warehouse can support a greater diversity of use cases and practices than is possible in an existing environment. In this report, researcher and analyst Stephen Swoyer offers a comprehensive overview of the benefits and challenges of implementing a cloud-based data warehouse. Senior IT decision makers, chief data officers, and data professionals will learn about the shifts and new trends in the data management landscape. Explore ways to improve data management, build a data warehouse strategy, and learn how to modernize a data warehouse effectively.

• Understand how AI, machine learning, self-service data integration, and built-in developer-oriented services have transformed the data warehouse role
• Use data warehouses to work with cloud-based data lakes for end-to-end data management and data governance
• Explore how data warehouse platforms as a service (PaaS) pave the way to automation
• Migrate, manage, and secure a data warehouse in a hybrid or multicloud environment

Data Management at Scale

2020-07-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Piethein Strengholt

Analytics Data Governance Data Management DWH Master Data Management Cyber Security data data-engineering storage-repositories

As data management and integration continue to evolve rapidly, storing all your data in one place, such as a data warehouse, is no longer scalable. In the very near future, data will need to be distributed and available for several technological solutions. With this practical book, you’ll learn how to migrate your enterprise from a complex and tightly coupled data landscape to a more flexible architecture ready for the modern world of data consumption. Executives, data architects, analytics teams, and compliance and governance staff will learn how to build a modern scalable data landscape using the Scaled Architecture, which you can introduce incrementally without a large upfront investment. Author Piethein Strengholt provides blueprints, principles, observations, best practices, and patterns to get you up to speed.

• Examine data management trends, including technological developments, regulatory requirements, and privacy concerns
• Go deep into the Scaled Architecture and learn how the pieces fit together
• Explore data governance and data security, master data management, self-service data marketplaces, and the importance of metadata

Building Big Data Applications

2019-11-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Krish Krishnan

Big Data DWH data data-engineering storage-repositories

Building Big Data Applications helps data managers and their organizations make the most of unstructured data with an existing data warehouse. It provides readers with what they need to know to make sense of how Big Data fits into the world of data warehousing. Readers will learn about infrastructure options and integration and come away with a solid understanding of how to leverage various architectures for integration. The book includes a wide range of use cases that will help data managers visualize reference architectures in the context of specific industries (healthcare, big oil, transportation, software, etc.).

• Explores various ways to leverage Big Data by effectively integrating it into the data warehouse
• Includes real-world case studies that clearly demonstrate Big Data technologies
• Provides insights on how to optimize current data warehouse infrastructure and integrate newer infrastructure matching data processing workloads and requirements

Data Warehousing with Greenplum, 2nd Edition

2019-07-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Marshall Presser

Analytics Data Analytics DWH RDBMS Cyber Security SQL data data-engineering storage-repositories

Data professionals are confronting the most disruptive change since relational databases appeared in the 1980s. SQL is still a major tool for data analytics, but conventional relational database management systems can’t handle the increasing size and complexity of today’s datasets. This updated edition teaches you best practices for Greenplum Database, the open source massively parallel processing (MPP) database that accommodates large sets of nonrelational and relational data. Marshall Presser, field CTO at Pivotal, introduces Greenplum’s approach to data analytics and data-driven decisions, beginning with its shared-nothing architecture. IT managers, developers, data analysts, system architects, and data scientists will all gain from exploring data organization and storage, data loading, running queries, and learning to perform analytics in the database. Discover how MPP and Greenplum will help you go beyond the traditional data warehouse.

This ebook covers:
• Greenplum features, use case examples, and techniques for optimizing use
• Four Greenplum deployment options to help you balance security, cost, and time to usability
• Why each networked node in Greenplum’s architecture includes an independent operating system, memory, and storage
• Additional tools for monitoring, managing, securing, and optimizing query responses in the Pivotal Greenplum commercial database

Hands-On Data Warehousing with Azure Data Factory

2018-05-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Christian Cote, Giuseppe Ciaburro, Michelle Gutzait

AI/ML Analytics Azure ADF BI Cloud Computing Data Engineering Data Lake Databricks DWH ETL/ELT Power BI +6 more

Dive into the world of ETL (Extract, Transform, Load) with 'Hands-On Data Warehousing with Azure Data Factory'. This book guides readers through the essential techniques for working with Azure Data Factory and SQL Server Integration Services to design, implement, and optimize ETL solutions for both on-premises and cloud data environments.

What This Book Will Help Me Do
• Understand and utilize Azure Data Factory and SQL Server Integration Services to build ETL solutions.
• Design scalable and high-performance ETL architectures tailored to modern data problems.
• Integrate various Azure services, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark, into your workflows.
• Troubleshoot and optimize ETL pipelines and address common challenges in data processing.
• Create insightful Power BI dashboards to visualize and interact with data from your ETL workflows.

Author(s)
Christian Cote, Michelle Gutzait, and Giuseppe Ciaburro bring a wealth of experience in data engineering and cloud technologies to this practical guide. Combining expertise in the Azure ecosystem and hands-on data warehousing, they deliver actionable insights for working professionals.

Who Is It For?
This book is crafted for software professionals working in data engineering, especially those specializing in ETL processes. Readers with a foundational knowledge of SQL Server and cloud infrastructures will benefit most. If you aspire to implement state-of-the-art ETL pipelines or enhance existing workflows with ADF and SSIS, this book is an ideal resource.

Handbook of Data Structures and Applications, 2nd Edition

2018-02-21 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sartaj Sahni, Dinesh P. Mehta

Big Data data data-engineering storage-repositories

This book provides a comprehensive survey of data structures of various types. The second edition has been revised and updated with new chapters on Bloom Filters, Binary Decision Diagrams, Data Structures for Cheminformatics, and Data Structures for Big Data Stores.

Exam Ref 70-767 Implementing a SQL Data Warehouse

2017-11-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Raj Uchhana, Jose Chinchilla

BI Data Quality DWH ETL/ELT Modern Data Stack Microsoft SQL SSIS data data-engineering storage-repositories

Prepare for Microsoft Exam 70-767 and help demonstrate your real-world mastery of skills for managing data warehouses. This exam is intended for Extract, Transform, Load (ETL) data warehouse developers who create business intelligence (BI) solutions. Their responsibilities include data cleansing as well as ETL and data warehouse implementation. The reader should have experience installing and implementing a Master Data Services (MDS) model, using MDS tools, and creating a Master Data Manager database and web application. The reader should understand how to design and implement ETL control flow elements and work with a SQL Server Integration Services (SSIS) package.

Focus on the expertise measured by these objectives:
• Design, implement, and maintain a data warehouse
• Extract, transform, and load data
• Build data quality solutions

This Microsoft Exam Ref:
• Organizes its coverage by exam objectives
• Features strategic, what-if scenarios to challenge you
• Assumes you have working knowledge of relational database technology and incremental database extraction, as well as experience with designing ETL control flows, using and debugging SSIS packages, accessing and importing or exporting data from multiple sources, and managing a SQL data warehouse

Implementing a SQL Data Warehouse

About the Exam
Exam 70-767 focuses on skills and knowledge required for working with relational database technology.

About Microsoft Certification
Passing this exam earns you credit toward a Microsoft Certified Professional (MCP) or Microsoft Certified Solutions Associate (MCSA) certification that demonstrates your mastery of data warehouse management. Passing this exam as well as Exam 70-768 (Developing SQL Data Models) earns you credit toward a Microsoft Certified Solutions Associate (MCSA) SQL 2016 Business Intelligence (BI) Development certification. See full details at: microsoft.com/learning

Data Warehousing in the Age of Artificial Intelligence

2017-10-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eric Boutin, Mike Boyarski, Gary Orenstein, Conor Doherty

AI/ML Analytics BI Cloud Computing DWH data data-engineering storage-repositories

Nearly 7,000 new mobile applications appear every day, and a constant stream of data gives them life. Many organizations rely on a predictive analytics model to turn data into useful business information and ensure the predictions remain accurate as data changes. It can be a complex, time-consuming process. This book shows how to automate and accelerate that process using machine learning (ML) on a modern data warehouse that runs on any cloud. Product specialists from MemSQL explain how today’s modern data warehouses provide the foundations to implement ML algorithms that run efficiently. Through several real-time use cases, you’ll learn how to quickly identify the right metrics to make actionable business decisions. This book explores foundational ML and artificial intelligence concepts to help you understand:
• How data warehouses accelerate deployment and simplify manageability
• How companies choose between cloud and on-premises deployments for building data processing applications
• Ways to build analytics and visualizations for business intelligence on historical data
• The technologies and architecture for building and deploying real-time data pipelines

This book demonstrates specific models and examples for building supervised and unsupervised real-time ML applications, and gives practical advice on how to choose between building an ML pipeline and buying an existing solution. If you need to use data accurately and efficiently, a real-time data warehouse is a critical business tool.

Data Warehousing with Greenplum

2017-07-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Marshall Presser

Analytics BI Data Analytics Data Engineering DWH RDBMS Cyber Security SQL data data-engineering storage-repositories

Relational databases haven’t gone away, but they are evolving to integrate messy, disjointed unstructured data into a cleansed repository for analytics. With massively parallel processing (MPP), the latest generation of analytic data warehouses is helping organizations move beyond business intelligence to processing a variety of advanced analytic workloads. These MPP databases expose their power through the familiarity of SQL. This report introduces the Greenplum Database, recently released as an open source project by Pivotal Software. Lead author Marshall Presser of Pivotal Data Engineering takes you through the Greenplum approach to data analytics and data-driven decisions, beginning with Greenplum’s shared-nothing architecture. You’ll explore data organization and storage, data loading, running queries, and performing analytics in the database.

You’ll learn:
• How each networked node in Greenplum’s architecture features an independent operating system, memory, and storage
• Four deployment options to help you balance security, cost, and time to usability
• Ways to organize data, including distribution, storage, partitioning, and loading
• How to use Apache MADlib for in-database analytics, and GPText to process and analyze free-form text
• Tools for monitoring, managing, securing, and optimizing query responses available in the Pivotal Greenplum commercial database

World-Class Warehousing and Material Handling, 2nd Edition

2016-03-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Edward Frazelle

data data-engineering storage-repositories

The classic guide to warehouse operations, now fully revised and updated with the latest strategies, best practices, and case studies. Under the influence of e-commerce, supply chain collaboration, globalization, and quick response, warehouses today are being asked to do more with less. The expectation now is that warehouses execute a larger number of smaller transactions, handle and store more items, provide more product and service customization, process more returns, offer more value-added services, and receive and ship more international orders. Compounding the difficulty of meeting this increased demand is the fact that warehouses now have less time to process an order, less margin for error, and fewer skilled personnel. How can a warehouse not only stay afloat but thrive in today’s marketplace? Efficiency and accuracy are the keys to success in warehousing. Despite today's just-in-time production mentality and efforts to eliminate warehouses and their inventory carrying costs, effective warehousing continues to play a critical bottom-line role for companies worldwide. World-Class Warehousing and Material Handling, 2nd Edition is the first widely published methodology for warehouse problem solving across all areas of the supply chain, providing an organized set of principles that can be used to streamline all types of warehousing operations. Readers will discover state-of-the-art tools, metrics, and methodologies for dramatically increasing the effectiveness, accuracy, and overall productivity of warehousing operations.

This comprehensive resource provides authoritative answers on such topics as:
• The seven principles of world-class warehousing
• Warehouse activity profiling
• Warehouse performance measures
• Warehouse automation and computerization
• Receiving, storage, and retrieval operations
• Picking and packing, and humanizing warehouse operations

Written by one of today's recognized logistics thought leaders, this fully updated comprehensive resource presents timeless insights for planning and managing 21st-century warehouse operations.

About the Author
Dr. Ed Frazelle is President and CEO of Logistics Resources International and Executive Director of The RightChain Institute. He is also the founding director of The Logistics Institute at Georgia Tech, the world's largest center for supply chain research and professional education.

Agile Data Warehousing for the Enterprise

2015-09-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ralph Hughes

Agile/Scrum BI CI/CD Data Engineering DWH data data-engineering storage-repositories

Building upon his earlier book that detailed agile data warehousing programming techniques for the Scrum master, Ralph's latest work illustrates the agile interpretations of the remaining software engineering disciplines: Requirements management benefits from streamlined templates that not only define projects quickly, but ensure nothing essential is overlooked. Data engineering receives two new "hyper modeling" techniques, yielding data warehouses that can be easily adapted when requirements change without having to invest in ruinously expensive data-conversion programs. Quality assurance advances with not only a stereoscopic top-down and bottom-up planning method, but also the incorporation of the latest in automated test engines. Use this step-by-step guide to deepen your own application development skills through self-study, show your teammates the world's fastest and most reliable techniques for creating business intelligence systems, or ensure that the IT department working for you is building your next decision support system the right way.

• Learn how to quickly define scope and architecture before programming starts
• Includes techniques of process and data engineering that enable iterative and incremental delivery
• Demonstrates how to plan and execute quality assurance plans and includes a guide to continuous integration and automated regression testing
• Presents program management strategies for coordinating multiple agile data mart projects so that over time an enterprise data warehouse emerges
• Use the provided 120-day road map to establish a robust, agile data warehousing program

Building a Scalable Data Warehouse with Data Vault 2.0

2015-09-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Michael Olschimke (Scalefree), Daniel Linstedt

Agile/Scrum Data Quality Data Vault DWH Modern Data Stack SQL SSIS data data-engineering storage-repositories

The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of different sizes, from small to large-size corporations. Due to its simplified design, which is adapted from nature, the Data Vault 2.0 standard helps prevent typical data warehousing failures. "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations to create a technical data warehouse layer. The book discusses how to build the data warehouse incrementally using the agile Data Vault 2.0 methodology. In addition, readers will learn how to create the input layer (the stage layer) and the presentation layer (data mart) of the Data Vault 2.0 architecture, including implementation best practices. Drawing upon years of practical experience and using numerous examples and an easy-to-understand framework, Dan Linstedt and Michael Olschimke discuss:
• How to load each layer using SQL Server Integration Services (SSIS), including automation of the Data Vault loading processes
• Important data warehouse technologies and practices
• Data Quality Services (DQS) and Master Data Services (MDS) in the context of the Data Vault architecture

• Provides a complete introduction to data warehousing, applications, and the business context so readers can get up and running fast
• Explains theoretical concepts and provides hands-on instruction on how to build and implement a data warehouse
• Demystifies data vault modeling with beginning, intermediate, and advanced techniques
• Discusses the advantages of the data vault approach over other techniques, also including the latest updates to Data Vault 2.0 and multiple improvements to Data Vault 1.0

Leveraging DB2 10 for High Performance of Your Data Warehouse

2014-01-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Enzo Cialini, Whei-Jen Chen, Bhuvana Balaji, Michael Kwok, Scott Andrus, Jessica Rockwood, Roman B. Melnyk

Analytics BI DWH IBM Linux SQL Unix data data-engineering storage-repositories

Building on the business intelligence (BI) framework and capabilities that are outlined in InfoSphere Warehouse: A Robust Infrastructure for Business Intelligence, SG24-7813, this IBM® Redbooks® publication focuses on the new business insight challenges that have arisen in the last few years and the new technologies in IBM DB2® 10 for Linux, UNIX, and Windows that provide powerful analytic capabilities to meet those challenges. This book is organized into two parts. The first part provides an overview of data warehouse infrastructure and DB2 Warehouse, and outlines the planning and design process for building your data warehouse. The second part covers the major technologies that are available in DB2 10 for Linux, UNIX, and Windows. We focus on functions that help you get the most value and performance from your data warehouse. These technologies include database partitioning, intrapartition parallelism, compression, multidimensional clustering, range (table) partitioning, data movement utilities, database monitoring interfaces, infrastructures for high availability, DB2 workload management, data mining, and relational OLAP capabilities. A chapter on BLU Acceleration gives you all of the details about this exciting DB2 10.5 innovation that simplifies and speeds up reporting and analytics. Easy to set up and self-optimizing, BLU Acceleration eliminates the need for indexes, aggregates, or time-consuming database tuning to achieve top performance and storage efficiency. No SQL or schema changes are required to take advantage of this breakthrough technology. This book is primarily intended for use by IBM employees, IBM clients, and IBM Business Partners.

The Definitive Guide to Warehousing: Managing the Storage and Handling of Materials and Products in the Supply Chain

2013-12-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by CSCMP, Brian C. Keller, Scott B. Keller

data data-engineering storage-repositories

This is the most authoritative and complete guide to planning, implementing, measuring, and optimizing world-class supply chain warehousing processes. Straight from the Council of Supply Chain Management Professionals (CSCMP), it explains each warehousing option, basic warehousing storage and handling operations, strategic planning, and the effects of warehousing design and service decisions on total logistics costs and customer service. This reference introduces crucial concepts including product handling, labor management, warehouse support, and extended value chain processes, facility ownership, planning, and strategy decisions; materials handling; warehouse management systems; Auto-ID, AGVs, and much more. Step by step, The Definitive Guide to Warehousing helps you optimize all facets of warehousing, one of the most pivotal areas of supply chain management.

Coverage includes:
• Basic warehousing management concepts and their essential role in demand fulfillment
• Key elements, processes, and interactions in warehousing operations management
• Principles and strategies for effectively planning and managing warehouse operations
• Principles and strategies for designing materials handling operations in warehousing facilities
• Critical roles of technology in managing warehouse operations and product flows
• Best practices for assessing the performance of warehousing operations using standard metrics and frameworks

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition

2013-07-01 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Margy Ross, Ralph Kimball

Analytics BI Big Data Data Analytics DWH ETL/ELT dimensional modeling data data-engineering storage-repositories

Updated new edition of Ralph Kimball's groundbreaking book on dimensional modeling for data warehousing and business intelligence! The first edition of Ralph Kimball's The Data Warehouse Toolkit introduced the industry to dimensional modeling, and now his books are considered the most authoritative guides in this space. This new third edition is a complete library of updated dimensional modeling techniques, the most comprehensive collection ever. It covers new and enhanced star schema dimensional modeling patterns, adds two new chapters on ETL techniques, includes new and expanded business matrices for 12 case studies, and more.
• Authored by Ralph Kimball and Margy Ross, known worldwide as educators, consultants, and influential thought leaders in data warehousing and business intelligence
• Begins with fundamental design recommendations and progresses through increasingly complex scenarios
• Presents unique modeling techniques for business applications such as inventory management, procurement, invoicing, accounting, customer relationship management, big data analytics, and more
• Draws real-world case studies from a variety of industries, including retail sales, financial services, telecommunications, education, health care, insurance, e-commerce, and more

Design dimensional databases that are easy to understand and provide fast query response with The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.

Big Data Imperatives: Enterprise 'Big Data' Warehouse, 'BI' Implementations and Analytics

2013-06-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Madhu Jagadeesh, Harsha Srivatsa, Soumendra Mohanty

Analytics BI Big Data Data Analytics DWH Marketing data data-engineering storage-repositories

Big Data Imperatives, focuses on resolving the key questions on everyone's mind: Which data matters? Do you have enough data volume to justify the usage? How you want to process this amount of data? How long do you really need to keep it active for your analysis, marketing, and BI applications? Big data is emerging from the realm of one-off projects to mainstream business adoption; however, the real value of big data is not in the overwhelming size of it, but more in its effective use. This book addresses the following big data characteristics: Very large, distributed aggregations of loosely structured data - often incomplete and inaccessible Petabytes/Exabytes of data Millions/billions of people providing/contributing to the context behind the data Flat schema's with few complex interrelationships Involves time-stamped events Made up of incomplete data Includes connections between data elements that must be probabilistically inferred Big Data Imperatives explains 'what big data can do'. It can batch process millions and billions of records both unstructured and structured much faster and cheaper. Big data analytics provide a platform to merge all analysis which enables data analysis to be more accurate, well-rounded, reliable and focused on a specific business capability. Big Data Imperatives describes the complementary nature of traditional data warehouses and big-data analytics platforms and how they feed each other. This book aims to bring the big data and analytics realms together with a greater focus on architectures that leverage the scale and power of big data and the ability to integrate and apply analytics principles to data which earlier was not accessible. This book can also be used as a handbook for practitioners; helping them on methodology,technical architecture, analytics techniques and best practices. 
At the same time, this book intends to hold the interest of those new to big data and analytics by giving them deep insight into the realm of big data. What you'll learn: understanding the technology and implementation of big data platforms and their usage for analytics; big data architectures; big data design patterns; implementation best practices. Who this book is for: This book is designed for IT professionals, data warehousing and business intelligence professionals, data analysis professionals, architects, developers, and business users.

Data Warehousing in the Age of Big Data

2013-05-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Krish Krishnan

Big Data Data Governance DataViz DWH Hadoop Apache HBase Hive NoSQL data data-engineering storage-repositories

Data Warehousing in the Age of Big Data will help you and your organization make the most of unstructured data with your existing data warehouse. As Big Data continues to revolutionize how we use data, it doesn't have to create more confusion. Expert author Krish Krishnan helps you make sense of how Big Data fits into the world of data warehousing in clear and concise detail. The book is presented in three distinct parts. Part 1 discusses Big Data, its technologies, and use cases from early adopters. Part 2 addresses data warehousing, its shortcomings, and new architecture options, workloads, and integration techniques for Big Data and the data warehouse. Part 3 deals with data governance, data visualization, information life-cycle management, data scientists, and implementing a Big Data–ready data warehouse. Extensive appendixes include case studies from vendor implementations and a special segment on how to build a healthcare information factory. Ultimately, this book will help you navigate the complex layers of Big Data and data warehousing while showing you how to think effectively about using all these technologies and architectures to design the next-generation data warehouse. You will learn how to leverage Big Data by effectively integrating it into your data warehouse. The book includes real-world examples and use cases that clearly demonstrate Hadoop, NoSQL, HBase, Hive, and other Big Data technologies, and shows you how to optimize and tune your current data warehouse infrastructure and integrate newer infrastructure that matches your data processing workloads and requirements.

talk-data.com

Unlocking dbt: Design and Deploy Transformations in Your Cloud Data Warehouse

Express Learning - Data Warehousing and Data Mining, 1st Edition by Pearson

Architecting a Modern Data Warehouse for Large Enterprises: Build Multi-cloud Modern Distributed Data Warehouses with Azure and AWS

Automating the Modern Data Warehouse

Data Management at Scale

Building Big Data Applications

Data Warehousing with Greenplum, 2nd Edition

Hands-On Data Warehousing with Azure Data Factory

Handbook of Data Structures and Applications, 2nd Edition

Exam Ref 70-767 Implementing a SQL Data Warehouse

Data Warehousing in the Age of Artificial Intelligence

Data Warehousing with Greenplum

World-Class Warehousing and Material Handling, 2nd Edition

Agile Data Warehousing for the Enterprise

Building a Scalable Data Warehouse with Data Vault 2.0

Leveraging DB2 10 for High Performance of Your Data Warehouse

The Definitive Guide to Warehousing: Managing the Storage and Handling of Materials and Products in the Supply Chain

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition

Big Data Imperatives: Enterprise 'Big Data' Warehouse, 'BI' Implementations and Analytics

Data Warehousing in the Age of Big Data