talk-data.com

Topic: Modern Data Stack (298 tagged activities)

Activity trend: peak of 28 activities per quarter, 2020-Q1 to 2026-Q1

Activities: 298, newest first

Summary Building data products is an undertaking that has historically required substantial investments of time and talent. With the rise of cloud platforms and self-serve data technologies, the barrier to entry is dropping. Shane Gibson co-founded AgileData to make analytics accessible to companies of all sizes. In this episode he explains the design of the platform and how it builds on agile development principles to help you focus on delivering value.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale your warehouse up and down based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan's active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

Prefect is the modern dataflow automation platform for the modern data stack, empowering data practitioners to build, run, and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn't get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in gluing together the disparate pieces of a pipeline and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100 million business-critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.

Data engineers don't enjoy writing, maintaining, and modifying ETL pipelines all day, every day, especially once they realize that 90% of all major data sources like Google Analytics, Salesforce, AdWords, Facebook, and spreadsheets are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get: real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping that precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24×7 live support, makes it consistently voted by users as the leader in the data pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata and sign up for a free 14-day trial that also comes with 24×7 support.

Your host is Tobias Macey and today I'm interviewing Shane Gibson about AgileData

Summary A lot of the work that goes into data engineering is trying to make sense of the "data exhaust" from other applications and services. There is an undeniable amount of value and utility in that information, but it also introduces significant cost and time requirements. In this episode Nick King discusses how you can be intentional about data creation in your applications and services to reduce the friction and errors involved in building data products and ML applications. He also describes the considerations involved in bringing behavioral data into your systems, and the ways that he and the rest of the Snowplow team are working to make that an easy addition to your platforms.


Summary Business intelligence has grown beyond its initial manifestation as dashboards and reports. In its current incarnation, the need for analytics, and for opportunities to answer questions with data, has become ubiquitous. In this episode Amir Orad discusses the Sisense platform and how it facilitates the embedding of analytics and data insights in every aspect of organizational and end-user experiences.

Your host is Tobias Macey and today I’m interviewing Amir Orad about Sisense, a platform focused on providing intelligent analytics

Beyond pretty graphs: How end-to-end lineage drives better actions

Everyone is talking about data lineage these days, and for good reason. Data lineage helps ensure better data quality across your modern data stack. But not everyone speaks the same lineage language. Data engineers use lineage for impact and root cause analysis. Analysts and analytics engineers use lineage to trace jobs and transformations in their warehouses. And consumers use lineage to understand why data never reached its expected destination. This results in a narrow, siloed view of lineage in which only one group benefits. It's time to stop using siloed lineage views for pretty graphs and start using end-to-end lineage to drive focused actions (see the sketch after the list below). In the talk, you will learn:

• How data quality needs differ for data engineers, analysts, and consumers

• How data lineage should drive actions

• A real-world example of end-to-end data lineage with Airflow, dbt, Spark, and Redshift
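To make the "focused actions" idea concrete, here is a minimal sketch of impact and root cause analysis over a lineage graph. The node names and edges are hypothetical, and networkx stands in for wherever lineage is actually stored (dbt manifests, query logs, a catalog API):

```python
# Minimal sketch: impact and root cause analysis over a lineage graph.
# Nodes and edges are hypothetical stand-ins for real extracted lineage.
import networkx as nx

lineage = nx.DiGraph()
lineage.add_edges_from([
    ("raw.orders", "stg_orders"),
    ("stg_orders", "fct_orders"),
    ("fct_orders", "dashboard.revenue"),
    ("fct_orders", "ml.churn_features"),
])

def impacted_assets(node: str) -> set:
    """Everything downstream of `node`: what breaks if it changes."""
    return nx.descendants(lineage, node)

def root_causes(node: str) -> set:
    """Everything upstream of `node`: where a bad value may originate."""
    return nx.ancestors(lineage, node)

print(impacted_assets("stg_orders"))
# {'fct_orders', 'dashboard.revenue', 'ml.churn_features'}
print(root_causes("dashboard.revenue"))
# {'raw.orders', 'stg_orders', 'fct_orders'}
```

An engineer paged about a broken dashboard walks the ancestors; an engineer about to change stg_orders walks the descendants before shipping. The same graph serves both groups, which is the end-to-end point.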

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Back to the Future: Where Dimensional Modeling Enters the Modern Data Stack

dbt's powerful capabilities allow data teams to deliver data products and analytics solutions to solve business problems faster than ever. Yet even with the best modern technologies, challenges arise. How can you be certain that what you're building will stand up to changing requirements? How can you connect disparate parts of your business to derive new insights? The answer may be a blast from the past, but the fundamentals never change. Learn how to apply fundamental techniques, like dimensional modeling, to modern tools, helping you build scalable and reusable solutions to solve data problems today and in the future.
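As a rough illustration of the technique the talk revisits, here is a minimal sketch of dimensional modeling in pandas: a flat extract is split into a dimension with a surrogate key and a fact table of measures. The column names are hypothetical; in a dbt project each frame would be its own model:

```python
# Minimal sketch of dimensional modeling: split a flat extract into a
# dimension (one row per entity, with a surrogate key) and a fact table
# (measures plus foreign keys). All names here are hypothetical.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_email": ["a@x.com", "b@y.com", "a@x.com"],
    "customer_region": ["EU", "US", "EU"],
    "amount": [120.0, 80.0, 45.5],
})

# Dimension: deduplicated customer attributes with a stable surrogate key.
dim_customer = (
    orders[["customer_email", "customer_region"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
dim_customer["customer_key"] = dim_customer.index + 1

# Fact: one row per order, carrying only measures and dimension keys.
fct_orders = orders.merge(dim_customer, on=["customer_email", "customer_region"])[
    ["order_id", "customer_key", "amount"]
]
print(fct_orders)
```

New customer attributes land in the dimension without touching the fact table, which is why the pattern stands up to changing requirements.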

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Escape from Data Island - Orchestrate and Connect Your Data Stack for Smooth Sailing

The Modern Data Stack is becoming more and more fragmented. With new tools and processes popping up continuously, it's easy to get stranded on various "data islands", with everything running independently. In this session (a minimal orchestration sketch follows the list below), we'll teach you:

  • What benefits you gain by turning your Modern Data Stack into an Integrated Data Stack

  • How Shipyard can help you quickly connect the data tools you already use

  • How orchestration is the missing step in your data journey to get your team off “data islands”

  • How dbt fits into the picture of a connected data stack
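As promised above, here is one way to picture the jump from "data islands" to an integrated stack: a single orchestration script that chains the steps and stops the chain on failure. The commands and the ingestion script are hypothetical stand-ins, not Shipyard's actual API:

```python
# Minimal sketch: chain ingestion and transformation so a failure stops
# everything downstream. Commands are hypothetical stand-ins.
import subprocess
import sys

STEPS = [
    ["python", "ingest_from_api.py"],                 # hypothetical ingestion job
    ["dbt", "build", "--select", "state:modified+"],  # transform what changed
]

def notify(message: str) -> None:
    # Placeholder for a real alert hook (Slack webhook, pager, etc.).
    print(f"ALERT: {message}")

for step in STEPS:
    if subprocess.run(step).returncode != 0:
        notify(f"Step failed: {' '.join(step)}")
        sys.exit(1)  # nothing downstream runs on stale or missing inputs

notify("Pipeline finished: every step succeeded")
```

An orchestrator generalizes this script: dependencies instead of a flat list, retries, and visibility, but the core value is the same ordering-and-failure contract.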

Check the slides here: https://docs.google.com/presentation/d/1NT7RnMtTLxb5ew5_VXXzciJusfbBKZJx86B6BtNZv90/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Field-level lineage with dbt, ANTLR, and Snowflake

Lineage is a critical component of any root cause analysis, impact analysis, and overall analytics health assessment workflow. But it hasn't always been easy to create, particularly at the field level. In this session, Mei Tao, Helena Munoz, and Xuanzi Han (Monte Carlo) tackle this challenge head-on by leveraging some of the most popular tools in the modern data stack, including dbt, Airflow, Snowflake, and ANother Tool for Language Recognition (ANTLR). Learn how they designed the data model, query parser, and larger database design for field-level lineage, highlighting learnings, wrong turns, and best practices developed along the way.
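To show the shape of the problem (not the session's actual ANTLR-based parser), here is a toy Python extractor that maps output columns to source columns for trivially simple SELECT statements. Real SQL needs a full grammar with aliases, joins, and expressions, which is exactly why the team reached for ANTLR:

```python
# Illustrative only: a toy column-lineage extractor for statements of the
# form `SELECT col, col FROM table`. The output maps each result column
# to its source `table.column`, which is the essence of field-level lineage.
import re

def field_lineage(sql: str) -> dict:
    m = re.match(r"select\s+(.+?)\s+from\s+(\w+)",
                 sql.strip(), re.IGNORECASE | re.DOTALL)
    if not m:
        raise ValueError("only simple SELECT ... FROM ... supported here")
    cols = [c.strip() for c in m.group(1).split(",")]
    table = m.group(2)
    return {col: f"{table}.{col}" for col in cols}

print(field_lineage("SELECT order_id, amount FROM stg_orders"))
# {'order_id': 'stg_orders.order_id', 'amount': 'stg_orders.amount'}
```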

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Introducing dbt with Databricks

In this live, instructor-led hands-on lab, you'll learn how to build a modern data stack with Databricks and dbt, using dbt to manage data transformations in Databricks and performing exploratory data analysis on the clean data sets using Databricks SQL. With the lakehouse architecture, built on an open data lake, data analysts, analytics engineers, and data scientists can use dbt and Databricks to work with the freshest and most complete data and quickly derive new insights for accurate decision-making.

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Keynote: The End of the Road for The Modern Data Stack You Know

The products that make up the “modern data stack” have all grown to prominence over the past decade. In this heady time, so much has changed about how data work is done.

But some of the “rules of engagement” that defined the original modern data stack are starting to break down. As a result, big changes are coming for the data tooling ecosystem.

The end result? Better, more integrated tooling, used by more humans inside of every company, that actually understands the data that it is operating on.

This modern data stack—if we still want to call it that!—will be unrecognizable to its former self.

Check the slides here: https://docs.google.com/presentation/d/1G0c3w19AwBEWEzyd9vwTKK5zMvXR76-NPQn6x0xZoSg/view

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Modern Data Management: how to set up your data for success

Got your Modern Data Stack set up; now what? A mature data practice goes beyond setting up the data pipeline and ensures there are both systems and processes in place to make it easy for everyone to find and understand data. At Select Star, we work with many organizations to enable data discovery, so the "tribal knowledge" of data is searchable and understandable for everyone. In this session, we'll share best practices and change management tips for setting up a data discovery portal and making it the single source of truth of data for your business.

Check the slides here: https://docs.google.com/presentation/d/1F3CPBhWenf2jt5hmXrhXvOBe5wei6hcLUj95jZtVZEw/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Maximizing data leverage at Vendr with dbt and Metaplane

How do you support exponentially growing companies without breaking as a data team? The answer is increasing your leverage with tools and processes. This session centers around four principles to achieve this goal: 1. don’t reinvent the wheel, 2. make your own job easier, 3. save time for innovation, and 4. invest in onboarding.

First, Vendr's first data leader will share his learnings on building a stack and team that scaled as the company, a SaaS buying platform with customers like GitLab, Brex, and The Washington Post, grew 10x from 30 to 300 employees in under two years.

Second, we'll give a demo of how Metaplane pulls lineage and metadata from a modern data stack that is centered around dbt. By the end of the demo, you'll know how to set up tests, extract lineage throughout your data stack, and triage data quality alerts.

Check the slides here: https://docs.google.com/presentation/d/15dQJIGeGhG0WGO6MLXtxWhmf8neY-u0c8ZLRG9GJB-s/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

dbt and MDS in small-batch academic research: a working example

Academia/open science is an as-yet untapped market for analytics engineering, as well as one that could benefit greatly from the tight coupling of data transformation and software engineering best practices. But introducing dbt into this context comes with its own set of challenges. In this session, Šimon Podhajský (iLife Technologies) explains what's slowing progress here, and what academics can do to advance this work.

Check the slides here: https://docs.google.com/presentation/d/1aw_cs6V0n-oT9Lp7Vq3MNcRbthEFJEYwcvkCBfuzlR0/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Demystifying event streams: Transforming events into tables with dbt

Pulling data directly out of application databases is commonplace in the MDS, but it is also risky. Apps change quickly, and application teams might update database schemas in unexpected ways, leading to pipeline failures, data quality issues, and data delivery slowdowns. There is a better way. In this session, Charlie Summers (Merit) describes how their organization transforms application event streams into analytics-ready tables that are more resilient to event schema changes.
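A minimal sketch of the pattern, with hypothetical event payloads and field names: the loader maps versioned events onto a stable row shape, so a new field added by the application team does not break the table:

```python
# Minimal sketch: map versioned application events onto a stable
# analytics row shape. Unknown new fields are ignored instead of
# breaking the pipeline; all field names here are hypothetical.
STABLE_COLUMNS = ["event_id", "user_id", "occurred_at", "amount"]

def to_row(event: dict) -> dict:
    payload = event.get("payload", {})
    row = {col: payload.get(col) for col in STABLE_COLUMNS}
    row["schema_version"] = event.get("schema_version", "unknown")
    return row

events = [
    {"schema_version": "1", "payload": {"event_id": "e1", "user_id": "u1",
                                        "occurred_at": "2022-10-01", "amount": 10.0}},
    # v2 added a field the warehouse doesn't know about yet -- still loads fine.
    {"schema_version": "2", "payload": {"event_id": "e2", "user_id": "u2",
                                        "occurred_at": "2022-10-02", "amount": 5.0,
                                        "coupon_code": "FALL22"}},
]
rows = [to_row(e) for e in events]
print(rows)
```

Because events describe what happened rather than how the app stores it, the contract between application and warehouse survives schema churn on either side.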

Check the slides here: https://docs.google.com/presentation/d/1K5PcoVshiHKZs_xI3K4P5JRNYTkbmnQJPMl8NmBlGfo/edit

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Operational AI for the Modern Data Stack

The opportunities for AI and machine learning are everywhere in modern businesses, but today's MLOps ecosystem is drowning in complexity. In this talk, we'll show how to use dbt and Continual to scale operational AI — from customer churn predictions to inventory forecasts — without complex engineering or operational burden.

Check the slides here: https://docs.google.com/presentation/d/1vNcQxCjAK4xZVZC1ZHzqBzPiJE7uwhDIVWGeT9Poi1U/edit#slide=id.g15b1f544dd5_0_1500

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Preparing for the Next Wave: Data Apps

Data apps are the next wave in analytics engineering. The explosion of data volume and variety, combined with increasing demand for analytics from consumers and a leap in cloud data technologies, has triggered an evolution of traditional analytics into the realm of modern data apps. The question is: how do you prepare for this wave? In this session we'll explore real-world examples of modern data apps, and how the modern data stack is advancing to support sub-second, high-concurrency analytics to meet the new wave of demand. We will cover: performance challenges, semi-structured data, data freshness, data modeling, and toolsets.
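As a minimal sketch of one standard answer to the sub-second, high-concurrency problem (not necessarily the session's approach): precompute aggregates on a refresh schedule so each data-app request reads a tiny rollup instead of scanning raw events. Table and column names are hypothetical:

```python
# Minimal sketch: pre-aggregation for a data app. The rollup is rebuilt
# per refresh interval (a data-freshness trade-off), so serving a request
# is a cheap lookup rather than a full scan. Names are hypothetical.
import pandas as pd

events = pd.DataFrame({
    "day": ["2022-10-01", "2022-10-01", "2022-10-02"],
    "product": ["a", "b", "a"],
    "revenue": [10.0, 5.0, 7.5],
})

# Built once per refresh interval.
daily_revenue = events.groupby(["day", "product"], as_index=False)["revenue"].sum()

def serve(day: str) -> pd.DataFrame:
    """Answer a data-app request from the rollup, not the raw events."""
    return daily_revenue[daily_revenue["day"] == day]

print(serve("2022-10-01"))
```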

Check the slides here: https://docs.google.com/presentation/d/1MC18SgT_ZHOJePjYizz_WT7dVveaycNw/edit?usp=sharing&ouid=110293204340061069659&rtpof=true&sd=true

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

The modern data team

The "socio" is inseparable from the "technical". In fact, technological change often begets social and organizational change.

And in the data space, the technical changes that some now refer to as the "modern data stack" call for changes in how teams work with data, and in turn how data specialists work within those teams. Enter the Modern Data Team.

In this talk, Abhi Sivasailam will unpack the changing landscape of data roles and teams and what this looks like in action at Flexport. Come learn how Flexport approaches data contracts, management, and governance, and the central role that Analytics Engineers and Product Analysts play in these processes.

Check the slides here: https://docs.google.com/presentation/d/1Sgm3J6EkeKQf5D1MKopsLLAMOhAZ05CxDlei2mbDE90/edit#slide=id.g16424dcc8d3_0_1145

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

When the Real World Messes with Your Schedule: Event-Driven dbt Models for the MDS

The real world is unreliable. Planes take off late, trains leave early, and cars break down. Sometimes, we need to get data from a source without a standard connector. Sometimes, a schedule really doesn't cut it. In this talk, we'll build a pipeline that responds to events to ensure that data is delivered quickly and reliably. We'll also ensure it can handle failure and keep bad data from clogging the plumbing.
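As a minimal sketch of the event-driven idea (the paths, commands, and tag are hypothetical stand-ins, not the talk's actual pipeline): an arriving file triggers a dbt build, and a failed build quarantines the input instead of letting bad data clog the plumbing:

```python
# Minimal sketch: respond to arriving files instead of a fixed schedule,
# and quarantine inputs whose build fails. Paths, commands, and the model
# tag are hypothetical; loading the file into the warehouse is elided.
import shutil
import subprocess
from pathlib import Path

INCOMING = Path("incoming")
QUARANTINE = Path("quarantine")

def on_file_arrived(path: Path) -> None:
    # Build (and test) only the models that consume this kind of data.
    result = subprocess.run(["dbt", "build", "--select", "tag:event_driven"])
    if result.returncode != 0:
        QUARANTINE.mkdir(exist_ok=True)
        shutil.move(str(path), QUARANTINE / path.name)  # keep bad data out of the pipes
        print(f"quarantined {path.name}; downstream models untouched")

for new_file in INCOMING.glob("*.json"):
    on_file_arrived(new_file)
```

In production the polling loop would be replaced by a real event source (object-store notifications, a queue, a webhook), but the trigger-build-quarantine contract is the same.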

Check the slides here: https://docs.google.com/presentation/d/1W9p7H4l0fUr7iAJ3GxEGUTmWGtmc_iu02N-MKb2BSFM/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Podcast episode
by Igor Vieira (Grupo Boticário), Marcus Guidoti (Grupo Boticário), Rafael Ubertini (Grupo Boticário)

The Modern Data Stack is one of the biggest trends in the data field. The term became famous as an approach that companies can use to adapt quickly to change, not only in their business but in the market itself, by plugging in solutions that handle different stages of data within the company, from collection to documentation. And to tell us what applying Modern Data Stack concepts looks like in very practical, day-to-day terms, we invited our friends from Grupo Boticário for this chat. Come check it out!

Take part in State of Data Brazil, the largest survey of the Brazilian data market: https://www.stateofdata.com.br/

Meet our participants: Igor Vieira's LinkedIn, Rafael Ubertini's LinkedIn, Marcus Guidoti's LinkedIn

See the Medium post for all of the episode's references: https://medium.com/data-hackers/o-que-%C3%A9-a-modern-data-stack-data-hackers-podcast-59-a084bf351016

Summary Agile methodologies have been adopted by a majority of teams for building software applications. Applying those same practices to data can prove challenging due to the number of systems that need to be included to implement a complete feature. In this episode Shane Gibson shares practical advice and insights from his years of experience as a consultant and engineer working in data about how to adopt agile principles in your data work so that you can move faster and provide more value to the business, while building systems that are maintainable and adaptable.

Your host is Tobias Macey and today I’m interviewing Shane Gibson about how to bring Agile practices to your data management workflows

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what AgileData is and the story behind it?
What are the main industries and/or use cases that you are focused on supporting?
The data ecosystem has been trying on different paradigms from software development for some time now (e.g. DataOps, version control, etc.). What are the aspects of Agile that do and don't map well to data engineering/analysis?
One of the perennial challenges of data analysis is how to approach data modeling. How do you balance the need to provide value with the long-term impacts of incomplete or underinformed modeling decisions made in haste at the beginning of a project?

How do you design in affordances for refactoring of the data models without breaking downstream assets?

Another aspect of implementing data products/platforms is how to manage permissions and governance. What are the incremental ways that those principles can be incorporated early and evolved along with the overall analytical products?
What are some of the organizational design strategies that you find most helpful when establishing or training a team who is working on data products?
In order to have a useful target to work toward it's necessary to understand what the data consumers are hoping to achieve. What are some of the challenges of doing requirements gathering for data products? (e.g. not knowing what information is available, consumers not understanding what's hard vs. easy, etc.)

How do you work with the "customers" to help them understand what a reasonable scope is and translate that to the actual project stages for the engineers?

What are some of the perennial questions or points of confusion that you have had to address with your clients on how to design and implement analytical assets?
What are the most interesting, innovative, or unexpected ways that you have seen agile principles used for data?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on AgileData?
When is agile the wrong choice for a data project?
What do you have planned for the future of AgileData?

Contact Info

LinkedIn
@shagility on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

AgileData
OptimalBI
How To Make Toast
Data Mesh
Information Product Canvas
DataKitchen

Podcast Episode

Great Expectations

Podcast Episode

Soda Data

Podcast Episode

Google DataStore
Unfix.work
Activity Schema

Podcast Episode

Data Vault

Podcast Episode

Star Schema
Lean Methodology
Scrum
Kanban

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Atlan

Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right data set? Or tried to understand what a column name means?

Our friends at Atlan started out as a data team themselves and faced all of this collaboration chaos, so they began building Atlan as an internal tool. Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more.

Go to dataengineeringpodcast.com/atlan and sign up for a free trial. If you're a Data Engineering Podcast listener, you get credits worth $3,000 on an annual subscription.

Prefect

Prefect is the modern dataflow automation platform for the modern data stack, empowering data practitioners to build, run, and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn't get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in gluing together the disparate pieces of a pipeline and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100 million business-critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.

Summary Logistics and supply chains have come under increased stress and scrutiny in recent years. In order to stay ahead of customer demands, businesses need to be able to react quickly and intelligently to changes, which requires fast and accurate insights into their operations. Pathway is a streaming database engine that embeds artificial intelligence into the storage layer, with functionality designed to support the spatiotemporal data that is crucial for shipping and logistics. In this episode Adrian Kosowski explains how the Pathway product got started, how its design simplifies the creation of data products that support supply chain operations, and how developers can help to build an ecosystem of applications that allow businesses to accelerate their time to insight.
