talk-data.com talk-data.com

Topic

Modern Data Stack

16

tagged

Activity Trend

28 peak/qtr
2020-Q1 2026-Q1

Activities

16 activities · Newest first

Unlocking dbt: Design and Deploy Transformations in Your Cloud Data Warehouse

Master the art of data transformation with the second edition of this trusted guide to dbt. Building on the foundation of the first edition, this updated volume offers a deeper, more comprehensive exploration of dbt’s capabilities—whether you're new to the tool or looking to sharpen your skills. It dives into the latest features and techniques, equipping you with the tools to create scalable, maintainable, and production-ready data transformation pipelines. Unlocking dbt, Second Edition introduces key advancements, including the semantic layer, which allows you to define and manage metrics at scale, and dbt Mesh, empowering organizations to orchestrate decentralized data workflows with confidence. You’ll also explore more advanced testing capabilities, expanded CI/CD and deployment strategies, and enhancements in documentation—such as the newly introduced dbt Catalog. As in the first edition, you’ll learn how to harness dbt’s power to transform raw data into actionable insights, while incorporating software engineering best practices like code reusability, version control, and automated testing. From configuring projects with the dbt Platform or open source dbt to mastering advanced transformations using SQL and Jinja, this book provides everything you need to tackle real-world challenges effectively. What You Will Learn Understand dbt and its role in the modern data stack Set up projects using both the cloud-hosted dbt Platform and open source project Connect dbt projects to cloud data warehouses Build scalable models in SQL and Python Configure development, testing, and production environments Capture reusable logic with Jinja macros Incorporate version control with your data transformation code Seamlessly connect your projects using dbt Mesh Build and manage a semantic layer using dbt Deploy dbt using CI/CD best practices Who This Book Is For Current and aspiring data professionals, including architects, developers, analysts, engineers, data scientists, and consultants who are beginning the journey of using dbt as part of their data pipeline’s transformation layer. Readers should have a foundational knowledge of writing basic SQL statements, development best practices, and working with data in an analytical context such as a data warehouse.

The Definitive Guide to Data Integration

Master the modern data stack with 'The Definitive Guide to Data Integration.' This comprehensive book covers the key aspects of data integration, including data sources, storage, transformation, governance, and more. Equip yourself with the knowledge and hands-on skills to manage complex datasets and unlock your data's full potential. What this Book will help me do Understand how to integrate diverse datasets efficiently using modern tools. Develop expertise in designing and implementing robust data integration workflows. Gain insights into real-time data processing and cloud-based data architectures. Learn best practices for data quality, governance, and compliance in integration. Master the use of APIs, workflows, and transformation patterns in practice. Author(s) The authors, None Bonnefoy, None Chaize, Raphaël Mansuy, and Mehdi Tazi, are seasoned experts in data engineering and integration. They bring years of experience in modern data technologies and consulting. Their approachable writing style ensures that readers at various skill levels can grasp complex concepts effectively. Who is it for? This book is ideal for data engineers, architects, analysts, and IT professionals. Whether you're new to data integration or looking to deepen your expertise, this guide caters to individuals seeking to navigate the challenges of the modern data stack.

Automating Data Transformations

The modern data stack has evolved rapidly in the past decade. Yet, as enterprises migrate vast amounts of data from on-premises platforms to the cloud, data teams continue to face limitations executing data transformation at scale. Data transformation is an integral part of the analytics workflow--but it's also the most time-consuming, expensive, and error-prone part of the process. In this report, Satish Jayanthi and Armon Petrossian examine key concepts that will enable you to automate data transformation at scale. IT decision makers, CTOs, and data team leaders will explore ways to democratize data transformation by shifting from activity-oriented to outcome-oriented teams--from manufacturing-line assembly to an approach that lets even junior analysts implement data with only a brief code review. With this insightful report, you will: Learn how successful data systems rely on simplicity, flexibility, user-friendliness, and a metadata-first approach Adopt a product-first mindset (data as a product, or DaaP) for developing data resources that focus on discoverability, understanding, trust, and exploration Build a transformation platform that delivers the most value, using a column-first approach Use data architecture as a service (DAaaS) to help teams build and maintain their own data infrastructure as they work collaboratively About the authors: Armon Petrossian is CEO and cofounder of Coalesce. Previously, he was part of the founding team at WhereScape in North America, where he served as national sales manager for almost a decade. Satish Jayanthi is CTO and cofounder of Coalesce. Prior to that, he was senior solutions architect at WhereScape, where he met his cofounder Armon.

Fabric Resiliency and Best Practices for IBM c-type Products

This IBM Redpaper publication describes best practices for deploying and using advanced Cisco NX-OS features to identify, monitor, and protect Fibre Channel (FC) Storage Area Networks (SANs) from problematic devices and media behavior. The paper focuses on the IBM c-type SAN switches with firmware Cisco MDS NX-OS Release 8.4(2a).

97 Things Every Data Engineer Should Know

Take advantage of today's sky-high demand for data engineers. With this in-depth book, current and aspiring engineers will learn powerful real-world best practices for managing data big and small. Contributors from notable companies including Twitter, Google, Stitch Fix, Microsoft, Capital One, and LinkedIn share their experiences and lessons learned for overcoming a variety of specific and often nagging challenges. Edited by Tobias Macey, host of the popular Data Engineering Podcast, this book presents 97 concise and useful tips for cleaning, prepping, wrangling, storing, processing, and ingesting data. Data engineers, data architects, data team managers, data scientists, machine learning engineers, and software engineers will greatly benefit from the wisdom and experience of their peers. Topics include: The Importance of Data Lineage - Julien Le Dem Data Security for Data Engineers - Katharine Jarmul The Two Types of Data Engineering and Data Engineers - Jesse Anderson Six Dimensions for Picking an Analytical Data Warehouse - Gleb Mezhanskiy The End of ETL as We Know It - Paul Singman Building a Career as a Data Engineer - Vijay Kiran Modern Metadata for the Modern Data Stack - Prukalpa Sankar Your Data Tests Failed! Now What? - Sam Bail

Data Pipelines Pocket Reference

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack. You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions. You'll learn: What a data pipeline is and how it works How data is moved and processed on modern data infrastructure, including cloud platforms Common tools and products used by data engineers to build pipelines How pipelines support analytics and reporting needs Considerations for pipeline maintenance, testing, and alerting

Implementing VersaStack with Cisco ACI Multi-Pod and IBM HyperSwap for High Availability

The IBM HyperSwap® high availability (HA) function allows business continuity in a hardware failure, power failure, connectivity failure, or disasters, such as fire or flooding. It is available on the IBM SAN Volume Controller and IBM FlashSystem products. This IBM Redbooks publication covers the preferred practices for implementing Cisco VersaStack with IBM HyperSwap. The following are some of the topics covered in this book: Cisco Application Centric Infrastructure to showcase Cisco's ACI with Nexus 9Ks Cisco Fabric Interconnects and Unified Computing System (UCS) management capabilities Cisco Multilayer Director Switch (MDS) to showcase fabric channel connectivity Overall IBM HyperSwap solution architecture Differences between HyperSwap and Metro Mirroring, Volume Mirroring, and Stretch Cluster Multisite IBM SAN Volume Controller (SVC) deployment to showcase HyperSwap configuration and capabilities This book is intended for pre-sales and post-sales technical support professionals and storage administrators who are tasked with deploying a VersaStack solution with IBM HyperSwap.

Exam Ref 70-767 Implementing a SQL Data Warehouse

Prepare for Microsoft Exam 70-767–and help demonstrate your real-world mastery of skills for managing data warehouses. This exam is intended for Extract, Transform, Load (ETL) data warehouse developers who create business intelligence (BI) solutions. Their responsibilities include data cleansing as well as ETL and data warehouse implementation. The reader should have experience installing and implementing a Master Data Services (MDS) model, using MDS tools, and creating a Master Data Manager database and web application. The reader should understand how to design and implement ETL control flow elements and work with a SQL Service Integration Services package. Focus on the expertise measured by these objectives: • Design, and implement, and maintain a data warehouse • Extract, transform, and load data • Build data quality solutionsThis Microsoft Exam Ref: • Organizes its coverage by exam objectives • Features strategic, what-if scenarios to challenge you • Assumes you have working knowledge of relational database technology and incremental database extraction, as well as experience with designing ETL control flows, using and debugging SSIS packages, accessing and importing or exporting data from multiple sources, and managing a SQL data warehouse. Implementing a SQL Data Warehouse About the Exam Exam 70-767 focuses on skills and knowledge required for working with relational database technology. About Microsoft Certification Passing this exam earns you credit toward a Microsoft Certified Professional (MCP) or Microsoft Certified Solutions Associate (MCSA) certification that demonstrates your mastery of data warehouse management Passing this exam as well as Exam 70-768 (Developing SQL Data Models) earns you credit toward a Microsoft Certified Solutions Associate (MCSA) SQL 2016 Business Intelligence (BI) Development certification. See full details at: microsoft.com/learning

VersaStack Solution by Cisco and IBM with Oracle RAC, IBM FlashSystem V9000, and IBM Spectrum Protect

Dynamic organizations want to accelerate growth while reducing costs. To do so, they must speed the deployment of business applications and adapt quickly to any changes in priorities. Organizations today require an IT infrastructure that is easy, efficient, and versatile. The VersaStack solution by Cisco and IBM® can help you accelerate the deployment of your data centers. It reduces costs by more efficiently managing information and resources while maintaining your ability to adapt to business change. The VersaStack solution combines the innovation of Cisco UCS Integrated Infrastructure with the efficiency of the IBM Storwize® storage system. The Cisco UCS Integrated Infrastructure includes the Cisco Unified Computing System (Cisco UCS), Cisco Nexus and Cisco MDS switches, and Cisco UCS Director. The IBM FlashSystem® V9000 enhances virtual environments with its Data Virtualization, IBM Real-time Compression™, and IBM Easy Tier® features. These features deliver extraordinary levels of performance and efficiency. The VersaStack solution is Cisco Application Centric Infrastructure (ACI) ready. Your IT team can build, deploy, secure, and maintain applications through a more agile framework. Cisco Intercloud Fabric capabilities help enable the creation of open and highly secure solutions for the hybrid cloud. These solutions accelerate your IT transformation while delivering dramatic improvements in operational efficiency and simplicity. Cisco and IBM are global leaders in the IT industry. The VersaStack solution gives you the opportunity to take advantage of integrated infrastructure solutions that are targeted at enterprise applications, analytics, and cloud solutions. The VersaStack solution is backed by Cisco Validated Designs (CVD) to provide faster delivery of applications, greater IT efficiency, and less risk. This IBM Redbooks® publication is aimed at experienced storage administrators who are tasked with deploying a VersaStack solution with Oracle Real Application Clusters (RAC) and IBM Spectrum™ Protect.

VersaStack Solution by Cisco and IBM with IBM DB2, IBM Spectrum Control, and IBM Spectrum Protect

Dynamic organizations want to accelerate growth while reducing costs. To do so, they must speed the deployment of business applications and adapt quickly to any changes in priorities. Organizations require an IT infrastructure to be easy, efficient, and versatile. The VersaStack solution by Cisco and IBM® can help you accelerate the deployment of your datacenters. It reduces costs by more efficiently managing information and resources while maintaining your ability to adapt to business change. The VersaStack solution combines the innovation of Cisco Unified Computing System (Cisco UCS) Integrated Infrastructure with the efficiency of the IBM Storwize® storage system. The Cisco UCS Integrated Infrastructure includes the Cisco UCS, Cisco Nexus and Cisco MDS switches, and Cisco UCS Director. The IBM Storwize V7000 storage system enhances virtual environments with its Data Virtualization, IBM Real-time Compression™, and IBM Easy Tier® features. These features deliver extraordinary levels of performance and efficiency. The VersaStack solution is Cisco Application Centric Infrastructure (ACI) ready. Your IT team can build, deploy, secure, and maintain applications through a more agile framework. Cisco Intercloud Fabric capabilities help enable the creation of open and highly secure solutions for the hybrid cloud. These solutions accelerate your IT transformation while delivering dramatic improvements in operational efficiency and simplicity. Cisco and IBM are global leaders in the IT industry. The VersaStack solution gives you the opportunity to take advantage of integrated infrastructure solutions that are targeted at enterprise applications, analytics, and cloud solutions. The VersaStack solution is backed by Cisco Validated Designs (CVDs) to provide faster delivery of applications, greater IT efficiency, and less risk. This IBM Redbooks® publication is aimed at experienced storage administrators that are tasked with deploying a VersaStack solution with IBM DB2® High Availability (DB2 HA), IBM Spectrum™ Protect, and IBM Spectrum Control™.

VersaStack Solution by Cisco and IBM with SQL, Spectrum Control, and Spectrum Protect

Dynamic organizations want to accelerate growth while reducing costs. To do so, they must speed the deployment of business applications and adapt quickly to any changes in priorities. Organizations today require an IT infrastructure to be easy, efficient, and versatile. The VersaStack solution by Cisco and IBM® can help you accelerate the deployment of your data centers. It reduces costs by more efficiently managing information and resources while maintaining your ability to adapt to business change. The VersaStack solution combines the innovation of Cisco UCS Integrated Infrastructure with the efficiency of the IBM Storwize® storage system. The Cisco UCS Integrated Infrastructure includes the Cisco Unified Computing System (Cisco UCS), Cisco Nexus and Cisco MDS switches, and Cisco UCS Director. The IBM Storwize V7000 enhances virtual environments with its Data Virtualization, IBM Real-time Compression™, and IBM Easy Tier® features. These features deliver extraordinary levels of performance and efficiency. The VersaStack solution is Cisco Application Centric Infrastructure (ACI) ready. Your IT team can build, deploy, secure, and maintain applications through a more agile framework. Cisco Intercloud Fabric capabilities help enable the creation of open and highly secure solutions for the hybrid cloud. These solutions accelerate your IT transformation while delivering dramatic improvements in operational efficiency and simplicity. Cisco and IBM are global leaders in the IT industry. The VersaStack solution gives you the opportunity to take advantage of integrated infrastructure solutions that are targeted at enterprise applications, analytics, and cloud solutions. The VersaStack solution is backed by Cisco Validated Designs (CVD) to provide faster delivery of applications, greater IT efficiency, and less risk. This IBM Redbooks® publication is aimed at experienced storage administrators that are tasked with deploying a VersaStack solution with Microsoft Sequel (SQL), IBM Spectrum™ Protect, and IBM Spectrum Control™.

Building a Scalable Data Warehouse with Data Vault 2.0

The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of different sizes, from small to large-size corporations. Due to its simplified design, which is adapted from nature, the Data Vault 2.0 standard helps prevent typical data warehousing failures. "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations to create a technical data warehouse layer. The book discusses how to build the data warehouse incrementally using the agile Data Vault 2.0 methodology. In addition, readers will learn how to create the input layer (the stage layer) and the presentation layer (data mart) of the Data Vault 2.0 architecture including implementation best practices. Drawing upon years of practical experience and using numerous examples and an easy to understand framework, Dan Linstedt and Michael Olschimke discuss: How to load each layer using SQL Server Integration Services (SSIS), including automation of the Data Vault loading processes. Important data warehouse technologies and practices. Data Quality Services (DQS) and Master Data Services (MDS) in the context of the Data Vault architecture. Provides a complete introduction to data warehousing, applications, and the business context so readers can get-up and running fast Explains theoretical concepts and provides hands-on instruction on how to build and implement a data warehouse Demystifies data vault modeling with beginning, intermediate, and advanced techniques Discusses the advantages of the data vault approach over other techniques, also including the latest updates to Data Vault 2.0 and multiple improvements to Data Vault 1.0

Microsoft SQL Server 2014 Business Intelligence Development Beginner's Guide

Microsoft SQL Server 2014 Business Intelligence Development Beginner's Guide introduces you to Microsoft's BI tools and systems. You'll gain hands-on experience building solutions that handle data warehousing, reporting, and predictive analytics. With step-by-step tutorials, you'll be equipped to transform data into actionable insights. What this Book will help me do Understand and implement multidimensional data models using SSAS and MDX. Write and use DAX queries and leverage SSAS tabular models effectively. Improve and maintain data integrity using MDS and DQS tools. Design and develop polished, insightful dashboards and reports using PerformancePoint, Power View, and SSRS. Explore advanced data analysis features, such as Power Query, Power Map, and basic data mining techniques. Author(s) Abolfazl Radgoudarzi and Reza Rad are experienced practitioners and educators in the field of business intelligence. They specialize in SQL Server BI technologies and have extensive careers helping organizations harness data for decision-making. Their approach combines clear explanations with practical examples, ensuring readers can effectively apply what they learn. Who is it for? This book is ideal for database developers, system analysts, and IT professionals looking to build strong foundations in Microsoft SQL Server's BI technologies. Beginners in business intelligence or data management will find the topics accessible. Intermediate practitioners will expand their ability to build complete BI solutions. It's designed for anyone eager to develop skills in data modeling, analysis, and visualization.

Microsoft SQL Server 2012 Master Data Services 2/E, 2nd Edition

Deploy and Maintain an Integrated MDS Architecture Harness your master data and grow revenue while reducing administrative costs. Thoroughly revised to cover the latest MDS features, Microsoft SQL Server 2012 Master Data Services, Second Edition shows how to implement and manage a centralized, customer-focused MDS framework. See how to accurately model business processes, load and cleanse data, enforce business rules, eliminate redundancies, and publish data to external systems. Security, SOA and Web services, and legacy data integration are also covered in this practical guide. Install Microsoft SQL Server 2012 Master Data Services Build custom MDS models and entityspecific staging tables Load and cleanse data from disparate sources Logically group assets into collections and hierarchies Ensure integrity using versioning and business rules Configure security at functional, object, and attribute levels Extend functionality with SOA and Web services Facilitate collaboration using the MDS Excel Add-In Export data to subscribing systems through SQL views

Microsoft SQL Server 2008 R2 Master Data Services

Best Practices for Deploying and Managing Master Data Services (MDS) Effectively manage master data and drive better decision making across your enterprise with detailed instruction from two MDS experts. Microsoft SQL Server 2008 R2 Master Data Services Implementation & Administration shows you how to use MDS to centralize the management of key data within your organization. Find out how to build an MDS model, establish hierarchies, govern data access, and enforce business rules. Legacy system integration and security are also covered. Real-world programming examples illustrate the material presented in this comprehensive guide. Create a process-agnostic solution for managing your business domains Learn how to take advantage of the data modeling capabilities of MDS Manage hierarchies and consolidations across your organization Import data by using SQL Server Integration Services and T-SQL statements Ensure data accuracy and completeness by using business rules and versioning Employ role-based security at functional, object, and attribute levels Design export views and publish data to subscribing systems Use Web services to progrmmatically interact with your implementation

Implementing the Cisco MDS 9000 in an Intermix FCP, FCIP, and FICON Environment

This IBM Redbooks publication describes how to install and configure the Cisco MDS 9000 family in an IBM Fibre Connection (FICON), Fibre Channel Protocol (FCP), and Fibre Channel over IP (FCIP) environment. We have tried to consolidate as much of the critical information as possible while covering procedures and tasks that are likely to be encountered on a daily basis. Each of the products described has much more functionality than we could ever hope to cover in just one book. The IBM/Cisco SAN portfolio is rich in quality products that bring a vast amount of technicality and vitality to the SAN world. Their inclusion and selection is based on a thorough understanding of the storage networking environment that positions Cisco and IBM, and therefore its customers and partners, in an ideal position to take advantage by their subsequent deployment. In this book we cover the latest announcements related to the IBM/Cisco SAN family. We show how these announcements and the products can be implemented in both a mainframe and an open systems environment. We address some of the key concepts that they bring to the market, and in each case, we give an overview of those functions that are essential to building a robust SAN environment in an effective and concise manner.