talk-data.com

Topic: Modern Data Stack (298 tagged)

Activity Trend: 28 peak/qtr (2020-Q1 to 2026-Q1)

Activities: 298 activities · Newest first

Summary The reason for collecting, cleaning, and organizing data is to make it usable by the organization. One of the most common and widely used methods of access is through a business intelligence dashboard. Superset is an open source option that has been gaining popularity due to its flexibility and extensible feature set. In this episode Maxime Beauchemin discusses how data engineers can use Superset to provide self-service access to data and deliver analytics. He digs into how it integrates with your data stack, how you can extend it to fit your use case, and why open source systems are a good choice for your business intelligence. If you haven’t already tried out Superset, then this conversation is well worth your time. Give it a listen and then take it for a test drive today.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage, and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature, which instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow and dbt, and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.

RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more. Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and ZenDesk help you go beyond event streaming. With RudderStack you can use all of your customer data to answer more difficult questions and then send those insights to your whole customer data stack. Sign up free at dataengineeringpodcast.com/rudder today.

Your host is Tobias Macey, and today I’m interviewing Max Beauchemin about Superset, an open source platform for data exploration, dashboards, and business intelligence.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing what Superset is?
Superset is becoming part of the reference architecture for a modern data stack. What are the factors that have contributed to its popularity over other tools such as Redash, Metabase, Looker, etc.?
Where do dashboarding and exploration tools like Superset fit in the responsibilities and workflow of a data engineer?
What are some of the challenges that Superset faces in being performant when working with large data sources?

Which data sources have you found to be the most challenging to work with?

What are some anti-patterns that users of Superset might fall into?

Data Pipelines Pocket Reference

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack. You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions. You'll learn:
What a data pipeline is and how it works
How data is moved and processed on modern data infrastructure, including cloud platforms
Common tools and products used by data engineers to build pipelines
How pipelines support analytics and reporting needs
Considerations for pipeline maintenance, testing, and alerting
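The batch extract-transform-load pattern that the book covers can be sketched in a few lines. This is a minimal illustration using only the standard library; the CSV data, column names, and table layout are hypothetical, not taken from the book.

```python
# Minimal batch ETL sketch: extract raw CSV, transform types, load to a
# warehouse table (SQLite standing in for a real warehouse).
import csv
import io
import sqlite3

RAW_CSV = """order_id,amount,country
1,19.99,us
2,5.00,ca
3,42.50,us
"""

def extract(raw: str) -> list:
    """Extract: read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list) -> list:
    """Transform: cast types and normalize country codes."""
    return [(int(r["order_id"]), float(r["amount"]), r["country"].upper())
            for r in rows]

def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INT, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

A real pipeline would add the concerns the book discusses on top of this skeleton: scheduling, incremental loads, testing, and alerting.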

Summary The data warehouse has become the central component of the modern data stack. Building on this pattern, the team at Hightouch have created a platform that synchronizes information about your customers out to third party systems for use by marketing and sales teams. In this episode Tejas Manohar explains the benefits of sourcing customer data from one location for all of your organization to use, the technical challenges of synchronizing the data to external systems with varying APIs, and the workflow for enabling self-service access to your customer data by your marketing teams. This is an interesting conversation about the importance of the data warehouse and how it can be used beyond just internal analytics.
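The sync pattern described in this episode, often called reverse ETL, can be sketched as a loop that reads rows from the warehouse and upserts them into an external tool. The schema and the `CrmClient` class below are hypothetical stand-ins, not Hightouch's actual implementation; a real destination would be a CRM's REST API with its own rate limits and field mappings.

```python
# Reverse-ETL sketch: read customer rows from the warehouse and upsert
# them into a (stubbed) external CRM, keyed by email.
import sqlite3

class CrmClient:
    """Hypothetical CRM destination; records upserts keyed by email."""
    def __init__(self):
        self.contacts = {}

    def upsert(self, email: str, fields: dict) -> None:
        self.contacts.setdefault(email, {}).update(fields)

def sync(conn: sqlite3.Connection, crm: CrmClient) -> int:
    """Push each warehouse row to the CRM; return the number of rows synced."""
    rows = conn.execute(
        "SELECT email, lifetime_value FROM customers").fetchall()
    for email, ltv in rows:
        crm.upsert(email, {"lifetime_value": ltv})
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email TEXT, lifetime_value REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("a@example.com", 120.0), ("b@example.com", 40.0)])
crm = CrmClient()
synced = sync(conn, crm)
```

The hard parts the episode gets into, varying destination APIs, retries, and change detection so only modified rows are pushed, all live behind this simple loop.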

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage, and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature, which instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow and dbt, and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.

This episode of the Data Engineering Podcast is sponsored by Datadog, a unified monitoring and analytics platform built for developers, IT operations teams, and businesses in the cloud age. Datadog provides customizable dashboards, log management, and machine-learning-based alerts in one fully integrated platform so you can seamlessly navigate, pinpoint, and resolve performance issues in context. Monitor all your databases, cloud services, containers, and serverless functions in one place with Datadog’s 400+ vendor-backed integrations. If an outage occurs, Datadog provides seamless navigation between your logs, infrastructure metrics, and application traces in just a few clicks to minimize downtime. Try it yourself today by starting a free 14-day trial and receive a Datadog t-shirt after installing the agent. Go to dataengineeringpodcast.com/datadog today to see how you can enhance visibility into your stack with Datadog.

Your host is Tobias Macey, and today I’m interviewing Tejas Manohar about Hightouch, a data platform that helps you sync your customer data from your data warehouse to your CRM, marketing, and support tools.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of what you are building at Hightouch and your motivation for creating it?
What are the main points of friction for teams who are trying to make use of customer data?
Where is Hightouch positioned in the ecosystem of customer data tools such as Segment, Mixpanel

Beyond the Modern Data Stack: dbt Cloud Metadata in Mode

In this video, Benn Stancil, President and Founder of Mode, discusses new ways to align the optimal application boundaries in the modern data stack, providing a set of guidelines for determining how and where to draw the lines between your many tools. He also illustrates these boundaries with an example, demonstrating how metadata surfaced in an analytics tool like Mode can increase overall data confidence.

The post-modern data stack

dbt is an essential part of the modern data stack. Over the past four years, the most innovative and forward-thinking data teams have implemented a best-of-breed approach to analytics. This approach has solved many problems, but it has also created new ones. In this video, Drew Banin, Chief Product Officer and co-founder of Fishtown Analytics, will share his vision for the data stack of the future.

Implementing VersaStack with Cisco ACI Multi-Pod and IBM HyperSwap for High Availability

The IBM HyperSwap® high availability (HA) function allows business continuity in the event of a hardware failure, power failure, connectivity failure, or disasters such as fire or flooding. It is available on the IBM SAN Volume Controller and IBM FlashSystem products. This IBM Redbooks publication covers the preferred practices for implementing Cisco VersaStack with IBM HyperSwap. The following are some of the topics covered in this book:
Cisco Application Centric Infrastructure to showcase Cisco's ACI with Nexus 9Ks
Cisco Fabric Interconnects and Unified Computing System (UCS) management capabilities
Cisco Multilayer Director Switch (MDS) to showcase Fibre Channel connectivity
Overall IBM HyperSwap solution architecture
Differences between HyperSwap and Metro Mirroring, Volume Mirroring, and Stretch Cluster
Multisite IBM SAN Volume Controller (SVC) deployment to showcase HyperSwap configuration and capabilities

This book is intended for pre-sales and post-sales technical support professionals and storage administrators who are tasked with deploying a VersaStack solution with IBM HyperSwap.

Send us a text Jayson Gehri directs the marketing team for Hybrid Data Management at IBM, following roles as marketing director for Dell and Quest Software. In this special episode, he lets us know what to watch for as IBM kicks off its annual THINK Conference, happening this year in the heart of downtown San Francisco from February 12th to the 15th.


Shownotes
00:00 - Check us out on YouTube and SoundCloud!
00:05 - Be sure to check out other MDS episodes here!
00:10 - Connect with Producer Steve Moore on LinkedIn & Twitter
00:15 - Connect with Producer Liam Seston on LinkedIn & Twitter
00:20 - Connect with Producer Rachit Sharma on LinkedIn
00:25 - Connect with Host Al Martin on LinkedIn & Twitter
00:40 - Connect with Jayson Gehri on LinkedIn & Twitter
00:55 - Get more info on THINK
01:30 - Pier 39
02:30 - Rob Thomas
02:35 - Arvind Krishna

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Exam Ref 70-767 Implementing a SQL Data Warehouse

Prepare for Microsoft Exam 70-767 and help demonstrate your real-world mastery of skills for managing data warehouses. This exam is intended for Extract, Transform, Load (ETL) data warehouse developers who create business intelligence (BI) solutions. Their responsibilities include data cleansing as well as ETL and data warehouse implementation. The reader should have experience installing and implementing a Master Data Services (MDS) model, using MDS tools, and creating a Master Data Manager database and web application. The reader should understand how to design and implement ETL control flow elements and work with a SQL Server Integration Services package.

Focus on the expertise measured by these objectives:
• Design, implement, and maintain a data warehouse
• Extract, transform, and load data
• Build data quality solutions

This Microsoft Exam Ref:
• Organizes its coverage by exam objectives
• Features strategic, what-if scenarios to challenge you
• Assumes you have working knowledge of relational database technology and incremental database extraction, as well as experience with designing ETL control flows, using and debugging SSIS packages, accessing and importing or exporting data from multiple sources, and managing a SQL data warehouse

About the Exam: Exam 70-767 focuses on skills and knowledge required for working with relational database technology.

About Microsoft Certification: Passing this exam earns you credit toward a Microsoft Certified Professional (MCP) or Microsoft Certified Solutions Associate (MCSA) certification that demonstrates your mastery of data warehouse management. Passing this exam as well as Exam 70-768 (Developing SQL Data Models) earns you credit toward a Microsoft Certified Solutions Associate (MCSA) SQL 2016 Business Intelligence (BI) Development certification. See full details at: microsoft.com/learning

VersaStack Solution by Cisco and IBM with Oracle RAC, IBM FlashSystem V9000, and IBM Spectrum Protect

Dynamic organizations want to accelerate growth while reducing costs. To do so, they must speed the deployment of business applications and adapt quickly to any changes in priorities. Organizations today require an IT infrastructure that is easy, efficient, and versatile. The VersaStack solution by Cisco and IBM® can help you accelerate the deployment of your data centers. It reduces costs by more efficiently managing information and resources while maintaining your ability to adapt to business change. The VersaStack solution combines the innovation of Cisco UCS Integrated Infrastructure with the efficiency of the IBM Storwize® storage system. The Cisco UCS Integrated Infrastructure includes the Cisco Unified Computing System (Cisco UCS), Cisco Nexus and Cisco MDS switches, and Cisco UCS Director. The IBM FlashSystem® V9000 enhances virtual environments with its Data Virtualization, IBM Real-time Compression™, and IBM Easy Tier® features. These features deliver extraordinary levels of performance and efficiency. The VersaStack solution is Cisco Application Centric Infrastructure (ACI) ready. Your IT team can build, deploy, secure, and maintain applications through a more agile framework. Cisco Intercloud Fabric capabilities help enable the creation of open and highly secure solutions for the hybrid cloud. These solutions accelerate your IT transformation while delivering dramatic improvements in operational efficiency and simplicity. Cisco and IBM are global leaders in the IT industry. The VersaStack solution gives you the opportunity to take advantage of integrated infrastructure solutions that are targeted at enterprise applications, analytics, and cloud solutions. The VersaStack solution is backed by Cisco Validated Designs (CVD) to provide faster delivery of applications, greater IT efficiency, and less risk. 
This IBM Redbooks® publication is aimed at experienced storage administrators who are tasked with deploying a VersaStack solution with Oracle Real Application Clusters (RAC) and IBM Spectrum™ Protect.

VersaStack Solution by Cisco and IBM with IBM DB2, IBM Spectrum Control, and IBM Spectrum Protect

Dynamic organizations want to accelerate growth while reducing costs. To do so, they must speed the deployment of business applications and adapt quickly to any changes in priorities. Organizations require an IT infrastructure to be easy, efficient, and versatile. The VersaStack solution by Cisco and IBM® can help you accelerate the deployment of your datacenters. It reduces costs by more efficiently managing information and resources while maintaining your ability to adapt to business change. The VersaStack solution combines the innovation of Cisco Unified Computing System (Cisco UCS) Integrated Infrastructure with the efficiency of the IBM Storwize® storage system. The Cisco UCS Integrated Infrastructure includes the Cisco UCS, Cisco Nexus and Cisco MDS switches, and Cisco UCS Director. The IBM Storwize V7000 storage system enhances virtual environments with its Data Virtualization, IBM Real-time Compression™, and IBM Easy Tier® features. These features deliver extraordinary levels of performance and efficiency. The VersaStack solution is Cisco Application Centric Infrastructure (ACI) ready. Your IT team can build, deploy, secure, and maintain applications through a more agile framework. Cisco Intercloud Fabric capabilities help enable the creation of open and highly secure solutions for the hybrid cloud. These solutions accelerate your IT transformation while delivering dramatic improvements in operational efficiency and simplicity. Cisco and IBM are global leaders in the IT industry. The VersaStack solution gives you the opportunity to take advantage of integrated infrastructure solutions that are targeted at enterprise applications, analytics, and cloud solutions. The VersaStack solution is backed by Cisco Validated Designs (CVDs) to provide faster delivery of applications, greater IT efficiency, and less risk. 
This IBM Redbooks® publication is aimed at experienced storage administrators that are tasked with deploying a VersaStack solution with IBM DB2® High Availability (DB2 HA), IBM Spectrum™ Protect, and IBM Spectrum Control™.

VersaStack Solution by Cisco and IBM with SQL, Spectrum Control, and Spectrum Protect

Dynamic organizations want to accelerate growth while reducing costs. To do so, they must speed the deployment of business applications and adapt quickly to any changes in priorities. Organizations today require an IT infrastructure to be easy, efficient, and versatile. The VersaStack solution by Cisco and IBM® can help you accelerate the deployment of your data centers. It reduces costs by more efficiently managing information and resources while maintaining your ability to adapt to business change. The VersaStack solution combines the innovation of Cisco UCS Integrated Infrastructure with the efficiency of the IBM Storwize® storage system. The Cisco UCS Integrated Infrastructure includes the Cisco Unified Computing System (Cisco UCS), Cisco Nexus and Cisco MDS switches, and Cisco UCS Director. The IBM Storwize V7000 enhances virtual environments with its Data Virtualization, IBM Real-time Compression™, and IBM Easy Tier® features. These features deliver extraordinary levels of performance and efficiency. The VersaStack solution is Cisco Application Centric Infrastructure (ACI) ready. Your IT team can build, deploy, secure, and maintain applications through a more agile framework. Cisco Intercloud Fabric capabilities help enable the creation of open and highly secure solutions for the hybrid cloud. These solutions accelerate your IT transformation while delivering dramatic improvements in operational efficiency and simplicity. Cisco and IBM are global leaders in the IT industry. The VersaStack solution gives you the opportunity to take advantage of integrated infrastructure solutions that are targeted at enterprise applications, analytics, and cloud solutions. The VersaStack solution is backed by Cisco Validated Designs (CVD) to provide faster delivery of applications, greater IT efficiency, and less risk. 
This IBM Redbooks® publication is aimed at experienced storage administrators who are tasked with deploying a VersaStack solution with Microsoft SQL Server, IBM Spectrum™ Protect, and IBM Spectrum Control™.

Building a Scalable Data Warehouse with Data Vault 2.0

The Data Vault was invented by Dan Linstedt at the U.S. Department of Defense, and the standard has been successfully applied to data warehousing projects at organizations of different sizes, from small firms to large corporations. Due to its simplified design, which is adapted from nature, the Data Vault 2.0 standard helps prevent typical data warehousing failures. "Building a Scalable Data Warehouse" covers everything one needs to know to create a scalable data warehouse end to end, including a presentation of the Data Vault modeling technique, which provides the foundations to create a technical data warehouse layer. The book discusses how to build the data warehouse incrementally using the agile Data Vault 2.0 methodology. In addition, readers will learn how to create the input layer (the stage layer) and the presentation layer (data mart) of the Data Vault 2.0 architecture, including implementation best practices. Drawing upon years of practical experience and using numerous examples and an easy-to-understand framework, Dan Linstedt and Michael Olschimke discuss:
How to load each layer using SQL Server Integration Services (SSIS), including automation of the Data Vault loading processes
Important data warehouse technologies and practices
Data Quality Services (DQS) and Master Data Services (MDS) in the context of the Data Vault architecture

This book:
Provides a complete introduction to data warehousing, applications, and the business context so readers can get up and running fast
Explains theoretical concepts and provides hands-on instruction on how to build and implement a data warehouse
Demystifies data vault modeling with beginning, intermediate, and advanced techniques
Discusses the advantages of the data vault approach over other techniques, also including the latest updates to Data Vault 2.0 and multiple improvements to Data Vault 1.0
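The core of the Data Vault modeling technique the book presents is the split between hubs (business keys) and satellites (descriptive attributes with history). Here is a compact sketch of that split, with SQLite standing in for a warehouse; the table names, columns, and sample data are illustrative, not taken from the book.

```python
# Data Vault sketch: a hub stores the business key once under a hashed
# surrogate key; a satellite stores attributes as an append-only history.
import hashlib
import sqlite3

def hash_key(business_key: str) -> str:
    """Data Vault 2.0 derives surrogate keys by hashing the business key."""
    return hashlib.md5(business_key.encode()).hexdigest()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY,
                           customer_bk TEXT, load_dts TEXT);
CREATE TABLE sat_customer (customer_hk TEXT, name TEXT,
                           city TEXT, load_dts TEXT);
""")

def load_customer(bk: str, name: str, city: str, dts: str) -> None:
    hk = hash_key(bk)
    # Hub insert is idempotent: each business key is loaded exactly once.
    conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?)",
                 (hk, bk, dts))
    # Satellite is append-only: every change becomes a new historical row.
    conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
                 (hk, name, city, dts))

load_customer("C-100", "Ada", "London", "2024-01-01")
load_customer("C-100", "Ada", "Paris", "2024-02-01")  # city changed
hubs = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
sats = conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0]
```

This separation is what makes the model incremental and auditable: new attributes or sources add satellites without restructuring the hubs. Links between hubs, which the book also covers, are omitted here for brevity.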

Microsoft SQL Server 2014 Business Intelligence Development Beginner's Guide

Microsoft SQL Server 2014 Business Intelligence Development Beginner's Guide introduces you to Microsoft's BI tools and systems. You'll gain hands-on experience building solutions that handle data warehousing, reporting, and predictive analytics. With step-by-step tutorials, you'll be equipped to transform data into actionable insights.

What this book will help me do:
Understand and implement multidimensional data models using SSAS and MDX
Write and use DAX queries and leverage SSAS tabular models effectively
Improve and maintain data integrity using MDS and DQS tools
Design and develop polished, insightful dashboards and reports using PerformancePoint, Power View, and SSRS
Explore advanced data analysis features, such as Power Query, Power Map, and basic data mining techniques

Authors: Abolfazl Radgoudarzi and Reza Rad are experienced practitioners and educators in the field of business intelligence. They specialize in SQL Server BI technologies and have extensive careers helping organizations harness data for decision-making. Their approach combines clear explanations with practical examples, ensuring readers can effectively apply what they learn.

Who is it for? This book is ideal for database developers, system analysts, and IT professionals looking to build strong foundations in Microsoft SQL Server's BI technologies. Beginners in business intelligence or data management will find the topics accessible. Intermediate practitioners will expand their ability to build complete BI solutions. It's designed for anyone eager to develop skills in data modeling, analysis, and visualization.

Microsoft SQL Server 2012 Master Data Services 2/E, 2nd Edition

Deploy and Maintain an Integrated MDS Architecture

Harness your master data and grow revenue while reducing administrative costs. Thoroughly revised to cover the latest MDS features, Microsoft SQL Server 2012 Master Data Services, Second Edition shows how to implement and manage a centralized, customer-focused MDS framework. See how to accurately model business processes, load and cleanse data, enforce business rules, eliminate redundancies, and publish data to external systems. Security, SOA and Web services, and legacy data integration are also covered in this practical guide.

Install Microsoft SQL Server 2012 Master Data Services
Build custom MDS models and entity-specific staging tables
Load and cleanse data from disparate sources
Logically group assets into collections and hierarchies
Ensure integrity using versioning and business rules
Configure security at functional, object, and attribute levels
Extend functionality with SOA and Web services
Facilitate collaboration using the MDS Excel Add-In
Export data to subscribing systems through SQL views

Microsoft SQL Server 2008 R2 Master Data Services

Best Practices for Deploying and Managing Master Data Services (MDS)

Effectively manage master data and drive better decision making across your enterprise with detailed instruction from two MDS experts. Microsoft SQL Server 2008 R2 Master Data Services Implementation & Administration shows you how to use MDS to centralize the management of key data within your organization. Find out how to build an MDS model, establish hierarchies, govern data access, and enforce business rules. Legacy system integration and security are also covered. Real-world programming examples illustrate the material presented in this comprehensive guide.

Create a process-agnostic solution for managing your business domains
Learn how to take advantage of the data modeling capabilities of MDS
Manage hierarchies and consolidations across your organization
Import data by using SQL Server Integration Services and T-SQL statements
Ensure data accuracy and completeness by using business rules and versioning
Employ role-based security at functional, object, and attribute levels
Design export views and publish data to subscribing systems
Use Web services to programmatically interact with your implementation

Implementing the Cisco MDS 9000 in an Intermix FCP, FCIP, and FICON Environment

This IBM Redbooks publication describes how to install and configure the Cisco MDS 9000 family in an IBM Fibre Connection (FICON), Fibre Channel Protocol (FCP), and Fibre Channel over IP (FCIP) environment. We have tried to consolidate as much of the critical information as possible while covering procedures and tasks that are likely to be encountered on a daily basis. Each of the products described has much more functionality than we could ever hope to cover in just one book. The IBM/Cisco SAN portfolio is rich in quality products that bring a vast amount of technical capability and vitality to the SAN world. Their inclusion and selection is based on a thorough understanding of the storage networking environment, which puts Cisco and IBM, and therefore their customers and partners, in an ideal position to benefit from their deployment. In this book we cover the latest announcements related to the IBM/Cisco SAN family. We show how these announcements and the products can be implemented in both a mainframe and an open systems environment. We address some of the key concepts that they bring to the market, and in each case, we give an overview of those functions that are essential to building a robust SAN environment in an effective and concise manner.

Is BI Too Big for Small Data?

This is a talk about how we thought we had Big Data, and we built everything planning for Big Data, but then it turns out we didn't have Big Data, and while that's nice and fun and seems more chill, it's actually ruining everything, and I am here asking you to please help us figure out what we are supposed to do now.

📓 Resources
Big Data is Dead: https://motherduck.com/blog/big-data-...
Small Data Manifesto: https://motherduck.com/blog/small-dat...
Is Excel Immortal?: https://benn.substack.com/p/is-excel-immortal
Small Data SF: https://www.smalldatasf.com/



Mode founder David Wheeler challenges the data industry's obsession with "big data," arguing that most companies are actually working with "small data," and our tools are failing us. This talk deconstructs the common sales narrative for BI tools, exposing why the promise of finding game-changing insights through data exploration often falls flat. If you've ever built dashboards nobody uses or wondered why your analytics platform doesn't deliver on its promises, this is a must-watch reality check on the modern data stack.

We explore the standard BI demo, where an analyst uncovers a critical insight by drilling into event data. This story sells tools like Tableau and Power BI, but it rarely reflects reality, leading to a "revolving door of BI" as companies swap tools every few years. Discover why the narrative of the intrepid analyst finding a needle in the haystack only works in movies and how this disconnect creates a cycle of failed data initiatives and unused "trashboards."

The presentation traces our belief that "data is the new oil" back to the early 2010s, with examples from Target's predictive analytics and Facebook's growth hacking. However, these successes were built on truly massive datasets. For most businesses, analyzing small data results in noisy charts that offer vague "directional vibes" rather than clear, actionable insights. We contrast the promise of big data analytics with the practical challenges of small data interpretation.
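The claim that small data yields only "directional vibes" has a simple statistical basis: the uncertainty around an observed rate shrinks with the square root of the sample size. The numbers below are made up for illustration, not taken from the talk.

```python
# Why small samples give vague answers: 95% margin of error for an
# observed proportion, using the normal approximation.
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% normal-approximation margin of error for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.05  # a hypothetical 5% observed conversion rate
small = margin_of_error(p, 200)        # e.g. one week of a small product's signups
large = margin_of_error(p, 2_000_000)  # e.g. a consumer-scale event stream
```

With n = 200 the margin is about ±3 percentage points, so the "5%" could plausibly be anywhere from roughly 2% to 8%, a directional vibe at best; at n = 2,000,000 the same chart pins the rate to within a few hundredths of a point.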

Finally, learn actionable strategies for extracting real value from the data you actually have. We argue that BI tools should shift focus from data exploration to data interpretation, helping users understand what their charts actually mean. Learn why "doing things that don't scale," like manually analyzing individual customer journeys, can be more effective than complex models for small datasets. This talk offers a new perspective for data scientists, analysts, and developers looking for better data analysis techniques beyond the big data hype.

Think Inside the Box: Constraints Drive Data Warehousing Innovation

As a Head of Data or a one-person data team, keeping the lights on for the business while running all things data-related as efficiently as possible is no small feat. This talk will focus on tactics and strategies to manage within and around constraints, including monetary costs, time and resources, and data volumes.

📓 Resources
Big Data is Dead: https://motherduck.com/blog/big-data-...
Small Data Manifesto: https://motherduck.com/blog/small-dat...
Why Small Data?: https://benn.substack.com/p/is-excel-...
Small Data SF: https://www.smalldatasf.com/



Learn how your data team can drive innovation and maximize ROI by embracing constraints, drawing inspiration from SpaceX's revolutionary cost-effective approach. This video challenges the "abundance mindset" prevalent in the modern data stack, where easily scalable cloud data warehouses and a surplus of tools often lead to unmanageable data models and underutilized dashboards. We explore a focused data strategy for extracting maximum value from small data, shifting the paradigm from "more data" to more impact.

To maximize value, data teams must move beyond being order-takers and practice strategic stakeholder management. Discover how to use frameworks like the stakeholder engagement matrix to prioritize high-impact business leaders and align your work with core business goals. This involves speaking the language of business growth models, not technical jargon about data pipelines or orchestration, ensuring your data engineering efforts resonate with key decision-makers and directly contribute to revenue-generating activities.

Embracing constraints is key to innovation and effective data project management. We introduce the Iron Triangle—a fundamental engineering concept balancing scope, cost, and time—as a powerful tool for planning data projects and having transparent conversations with the business. By treating constraints not as limitations but as opportunities, data engineers and analysts can deliver higher-quality data products without succumbing to scope creep or uncontrolled costs.

A critical component of this strategy is understanding the Total Cost of Ownership (TCO), which goes far beyond initial compute costs to include ongoing maintenance, downtime, and the risk of vendor pricing changes. Learn how modern, efficient tools like DuckDB and MotherDuck are designed for cost containment from the ground up, enabling teams to build scalable, cost-effective data platforms. By making the true cost of data requests visible, you can foster accountability and make smarter architectural choices. Ultimately, this guide provides a blueprint for resisting data stack bloat and turning cost and constraints into your greatest assets for innovation.
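The TCO framing above can be made concrete with simple arithmetic. The numbers and the cost categories below are entirely hypothetical, chosen only to show how maintenance labor and downtime can dwarf the compute line item that vendor comparisons usually focus on.

```python
# Illustrative total-cost-of-ownership calculation: compute spend is one
# line item next to ongoing maintenance labor and expected downtime cost.
def annual_tco(compute: float, eng_hours_per_month: float,
               hourly_rate: float, downtime_hours: float,
               downtime_cost_per_hour: float) -> float:
    """Sum yearly compute, maintenance labor, and expected downtime cost."""
    maintenance = eng_hours_per_month * 12 * hourly_rate
    downtime = downtime_hours * downtime_cost_per_hour
    return compute + maintenance + downtime

# Hypothetical comparison: a heavyweight warehouse vs a lean local engine.
big_stack = annual_tco(compute=60_000, eng_hours_per_month=40,
                       hourly_rate=100, downtime_hours=20,
                       downtime_cost_per_hour=500)
lean_stack = annual_tco(compute=6_000, eng_hours_per_month=10,
                        hourly_rate=100, downtime_hours=20,
                        downtime_cost_per_hour=500)
```

In this made-up scenario, maintenance labor is the largest cost of the heavyweight stack, which is exactly the kind of hidden line item the talk urges teams to surface when making architectural choices.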