talk-data.com

Topic

dbt

dbt (data build tool)

data_transformation analytics_engineering sql

758 tagged

Activity Trend: 134 peak/qtr (2020-Q1 to 2026-Q1)

Activities

758 activities · Newest first

Nisha Paliwal, who leads enterprise data tech at Capital One, joins Tristan to discuss building a strong data culture in the world of AI. She is the co-author of the book Secrets of AI Value Creation. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Dr. Eirini Kalliamvakou is a senior researcher at GitHub Next. Eirini has built a career on studying software engineers: how to measure their productivity, how developer experience impacts productivity, and more. Recently, Eirini has been working on quantifying the impact of GitHub Copilot. Does it actually help software engineers be more productive? Tristan and Eirini explore how to quantify developer productivity in the first place, before finally arriving at whether or not Copilot makes a difference. In the search for real business value, this research is a real bellwether of things to come. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs. Join data practitioners and data leaders this October in Las Vegas at Coalesce, the analytics engineering conference hosted by dbt Labs. Register now at coalesce.getdbt.com. Listeners of this show can use the code podcast20 for a 20% discount.

The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. We've released a special edition series of minisodes of our podcast. Recorded live at Data Connect 2024, our host Michael Toland engages in short, sweet, informative, and delightful conversations with five prevalent practitioners who are forging their way forward in data and technology. In this minisode, Michael reconnects with his former colleague Lindsay Murphy as she delves into a crucial yet often overlooked aspect of data management—cost containment. Lindsay's session at Data Connect 2024 emphasizes the importance of considering costs as a critical piece of your data team's ROI. While data teams often focus on value creation and return on investment, they can easily lose sight of the expenses associated with the complex stacks they build. Lindsay offers practical insights on how to strike a balance between innovation and cost-efficiency. Plus, a special shout-out to Lindsay's new podcast, Women Lead Data—hurrah! This podcast is set to inspire and empower women in the data industry, providing a platform for sharing experiences, insights, and strategies for success.

About our host Michael Toland: Michael is a Product Management Coach and Consultant with Pathfinder Product, a Test Double Operation. Since 2016, Michael has worked on large-scale system modernizations and migration initiatives at Verizon. Outside his professional career, Michael serves as the Treasurer for the New Leaders Council, mentors with Venture for America, sings with the Columbus Symphony, and writes satire for his blog Dignified Product. He is excited to discuss data product management with the podcast audience. Connect with Michael on LinkedIn.

About our guest Lindsay Murphy: Lindsay is a data leader with 13 years of experience in building and scaling data teams. She has successfully launched and led data initiatives at startups such as BenchSci, Maple, and Secoda. Her expertise includes developing internal data products, implementing modern data stack infrastructures, building and mentoring data engineering teams, and crafting data strategies that align with organizational goals. An active member of the data community, Lindsay organizes the Toronto Modern Data Stack Meetup group, which boasts over 2,500 members. She has also taught Advanced dbt to more than 100 students through Uplimit and hosts a weekly podcast, Women Lead Data, where she shares insights and amplifies the voices of women in the data industry. Connect with Lindsay on LinkedIn.

All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn. Apply to be a guest or nominate a practitioner.

Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights!

Summary

Data contracts are both an enforcement mechanism for data quality, and a promise to downstream consumers. In this episode Tom Baeyens returns to discuss the purpose and scope of data contracts, emphasizing their importance in achieving reliable analytical data and preventing issues before they arise. He explains how data contracts can be used to enforce guarantees and requirements, and how they fit into the broader context of data observability and quality monitoring. The discussion also covers the challenges and benefits of implementing data contracts, the organizational impact, and the potential for standardization in the field.
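To make the "API for data" idea concrete, here is a minimal, illustrative sketch of a contract check in Python. The contract format and the orders_contract/validate names are invented for illustration; this is not Soda's data contract syntax, which is what the episode actually discusses.

```python
# Illustrative only: a hand-rolled data contract check, not Soda's syntax.
from dataclasses import dataclass

@dataclass
class ColumnSpec:
    name: str
    dtype: str          # expected Python type name, e.g. "int", "float"
    nullable: bool = True

# A "contract" is the promise a producer makes to downstream consumers.
orders_contract = [
    ColumnSpec("order_id", "int", nullable=False),
    ColumnSpec("customer_id", "int", nullable=False),
    ColumnSpec("amount", "float"),
]

def validate(rows: list[dict], contract: list[ColumnSpec]) -> list[str]:
    """Return a list of contract violations for a batch of rows."""
    violations = []
    for spec in contract:
        for i, row in enumerate(rows):
            value = row.get(spec.name)
            if value is None:
                if not spec.nullable:
                    violations.append(f"row {i}: {spec.name} is null")
            elif type(value).__name__ != spec.dtype:
                violations.append(
                    f"row {i}: {spec.name} expected {spec.dtype}, got {type(value).__name__}"
                )
    return violations

if __name__ == "__main__":
    batch = [{"order_id": 1, "customer_id": 7, "amount": 19.99},
             {"order_id": 2, "customer_id": None, "amount": 5.0}]
    print(validate(batch, orders_contract))  # flags the null customer_id
```

In a pipeline, a non-empty violations list would typically fail the task before bad data propagates downstream, which is the circuit-breaker behavior referenced in the episode links.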

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. At Outshift, the incubation engine from Cisco, they are driving innovation in AI, cloud, and quantum technologies with the powerful combination of enterprise strength and startup agility. Their latest innovation for the AI ecosystem is Motific, addressing a critical gap in going from prototype to production with generative AI. Motific is your vendor and model-agnostic platform for building safe, trustworthy, and cost-effective generative AI solutions in days instead of months. Motific provides easy integration with your organizational data, combined with advanced, customizable policy controls and observability to help ensure compliance throughout the entire process. Move beyond the constraints of traditional AI implementation and ensure your projects are launched quickly and with a firm foundation of trust and efficiency. Go to motific.ai today to learn more! Your host is Tobias Macey and today I'm interviewing Tom Baeyens about using data contracts to build a clearer API for your data.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe the scope and purpose of data contracts in the context of this conversation?
In what way(s) do they differ from data quality/data observability?
Data contracts are also known as the API for data, can you elaborate on this?
What are the types of guarantees and requirements that you can enforce with these data contracts?
What are some examples of constraints or guarantees that cannot be represented in these contracts?
Are data contracts related to the shift-left?
The obvious application of data contracts is in the context of pipeline execution flows to prevent failing checks from propagating further in the data flow. What are some of the other ways that these contracts can be integrated into an organization's data ecosystem?
How did you approach the design of the syntax and implementation for Soda's data contracts?
Guarantees and constraints around data in different contexts have been implemented in numerous tools and systems. What are the areas of overlap in e.g. dbt, Great Expectations?
Are there any emerging standards or design patterns around data contracts/guarantees that will help encourage portability and integration across tooling/platform contexts?
What are the most interesting, innovative, or unexpected ways that you have seen data contracts used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data contracts at Soda?
When are data contracts the wrong choice?
What do you have planned for the future of data contracts?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Soda
Podcast Episode
JBoss
Data Contract
Airflow
Unit Testing
Integration Testing
OpenAPI
GraphQL
Circuit Breaker Pattern
SodaCL
Soda Data Contracts
Data Mesh
Great Expectations
dbt Unit Tests
Open Data Contracts
ODCS == Open Data Contract Standard
ODPS == Open Data Product Specification

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

OpenLineage is an open standard for lineage data collection, integrated into the Airflow codebase, facilitating lineage collection across providers like Google, Amazon, and more. Atlan Data Catalog is a 3rd generation active metadata platform that is a single source of trust, unifying cataloging, data discovery, lineage, and governance experience. We will demonstrate what OpenLineage is and how, with minimal and intuitive setup across Airflow and Atlan, it presents a unified view of workflows and efficient cross-platform lineage collection, including column-level lineage, across various technologies (Python, Spark, dbt, SQL, etc.) and clouds (AWS, Azure, GCP, etc.) - all orchestrated by Airflow. This integration enables further use cases in automated metadata management by making operational pipelines dataset-aware for self-service exploration. The talk will also demonstrate real-world challenges and resolutions for lineage consumers in improving audit and compliance accuracy through column-level lineage traceability across the data estate, and briefly overview the most recent OpenLineage developments and planned future enhancements.

Airflow is often used for running data pipelines, which themselves connect with other services through the provider system. However, it is also increasingly used as an engine under-the-hood for other projects building on top of the DAG primitive. For example, Cosmos is a framework for automatically transforming dbt DAGs into Airflow DAGs, so that users can supplement the developer experience of dbt with the power of Airflow. This session dives into how a select group of these frameworks (Cosmos, Meltano, Chronon) use Airflow as an engine for orchestrating complex workflows their systems depend on. In particular, we will discuss ways that we’ve increased Airflow performance to meet application-specific demands (high-task-count Cosmos DAGs, streaming jobs in Chronon), new Airflow features that will evolve how these frameworks use Airflow under the hood (DAG versioning, dataset integrations), and paths we see these projects taking over the next few years as Airflow grows. Airflow is not just a DAG platform, it’s an application platform!
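For readers unfamiliar with Cosmos, the snippet below is a minimal sketch of the pattern the talk describes: pointing Cosmos at an existing dbt project so its models are rendered as an Airflow DAG. It follows the DbtDag/ProjectConfig/ProfileConfig interfaces from astronomer-cosmos; the project path, profile names, and dag_id are placeholders, not values from the talk.

```python
# Minimal Cosmos sketch: render a dbt project as an Airflow DAG.
# Paths and profile names are placeholders for your own project.
from datetime import datetime

from cosmos import DbtDag, ProjectConfig, ProfileConfig

jaffle_shop = DbtDag(
    dag_id="jaffle_shop_dbt",
    project_config=ProjectConfig("/usr/local/airflow/dags/dbt/jaffle_shop"),
    profile_config=ProfileConfig(
        profile_name="jaffle_shop",
        target_name="dev",
        profiles_yml_filepath="/usr/local/airflow/dags/dbt/profiles.yml",
    ),
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
```

The appeal of this approach, as the talk notes, is that each dbt node becomes a regular Airflow task, so retries, alerting, and scheduling come from Airflow rather than from dbt itself.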

Looking for a way to streamline your data workflows and master the art of orchestration? As we navigate the complexities of modern data engineering, dynamic workflows and complex data pipeline dependencies are becoming increasingly common in Airflow. To empower data engineers to use Airflow as the main orchestrator, Airflow Datasets can be easily integrated into your data journey. This session will showcase dynamic workflow orchestration in Airflow and how to manage multi-DAG dependencies with multi-dataset listening. We'll take you through a real-time data pipeline with Pub/Sub messaging integration and dbt in a Google Cloud environment, ensuring data transformations are triggered only upon new data ingestion, moving away from rigid time-based scheduling or the use of sensors and other legacy ways to trigger a DAG.
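As a rough illustration of the dataset-driven pattern described above, the sketch below uses Airflow's Dataset API (Airflow 2.4+) so a transformation DAG runs only after upstream data lands. The URIs and task bodies are placeholders; in the session's setup the ingestion side would be driven by Pub/Sub and the transformation by dbt.

```python
# Dataset-driven scheduling sketch: the consumer DAG runs when the producer
# updates the dataset, instead of on a fixed time schedule.
from datetime import datetime

from airflow.datasets import Dataset
from airflow.decorators import dag, task

raw_orders = Dataset("bigquery://my-project/raw/orders")  # placeholder URI

@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def ingest_orders():
    @task(outlets=[raw_orders])  # marks the dataset as updated on success
    def load_from_pubsub():
        ...  # pull messages and land them in the raw table

    load_from_pubsub()

# Multi-dataset listening: pass several datasets, e.g. schedule=[raw_orders, raw_customers]
@dag(schedule=[raw_orders], start_date=datetime(2024, 1, 1), catchup=False)
def transform_orders():
    @task
    def run_dbt_models():
        ...  # e.g. shell out to `dbt build --select orders`

    run_dbt_models()

ingest_orders()
transform_orders()
```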

Balyasny Asset Management (BAM) is a diversified global investment firm founded in 2001 with over $20 billion in assets under management. As dbt took hold at BAM, we had multiple teams building dbt projects against Snowflake, Redshift, and SQL Server. The common question was: How can we quickly and easily productionise our projects? Airflow is the orchestrator of choice at BAM, but our dbt users ranged from Airflow power users to people who’d never heard of Airflow before. We built a single solution on top of Cosmos that allowed us to:
Decouple the dbt project from the Airflow repository
Have each dbt node run as a separate Airflow task
Allow users to run dbt with little to no Airflow knowledge
Enable users to have fine-grained control over how dbt is run and to combine it with other Airflow tasks
Provide observability, monitoring, and alerting.
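The "each dbt node as a separate task, combined with other Airflow tasks" goal is roughly what Cosmos's DbtTaskGroup enables. Below is a hedged sketch of that pattern, not BAM's internal solution; the paths, profile values, and surrounding tasks are placeholders.

```python
# Sketch: embed a dbt project as a task group inside a larger Airflow DAG,
# so dbt models sit alongside ordinary Airflow tasks. Not BAM's internal code.
from datetime import datetime

from airflow.decorators import dag, task
from cosmos import DbtTaskGroup, ProjectConfig, ProfileConfig

profile = ProfileConfig(
    profile_name="analytics",                               # placeholder
    target_name="prod",
    profiles_yml_filepath="/opt/airflow/dbt/profiles.yml",  # placeholder
)

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def analytics_pipeline():
    @task
    def extract():
        ...  # land raw data before dbt runs

    dbt_models = DbtTaskGroup(
        group_id="dbt_models",
        project_config=ProjectConfig("/opt/airflow/dbt/analytics"),
        profile_config=profile,
    )

    @task
    def notify():
        ...  # alerting / downstream refresh after dbt completes

    extract() >> dbt_models >> notify()

analytics_pipeline()
```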

dbt has become the de facto standard for data teams building reliable and trustworthy SQL code leveraging a modern data stack architecture. The dbt logic needs to be orchestrated, and jobs scheduled to meet business expectations. That’s where Airflow comes into play. In this quick introduction session, you’ll learn:
How to leverage dbt-Core & Airflow to orchestrate pipelines
Write DAGs in a Pythonic way
Apply best practices on your jobs
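For readers new to this combination, here is a minimal, hedged sketch of the simplest approach the session alludes to: invoking dbt Core from an Airflow DAG with a BashOperator. The project and profiles paths are placeholders; heavier-weight options such as Cosmos are covered in other sessions on this page.

```python
# Minimal dbt-Core + Airflow sketch: run and test a dbt project on a schedule.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DBT_DIR = "/opt/airflow/dbt/my_project"  # placeholder path

with DAG(
    dag_id="dbt_core_daily",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=f"dbt run --project-dir {DBT_DIR} --profiles-dir {DBT_DIR}",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command=f"dbt test --project-dir {DBT_DIR} --profiles-dir {DBT_DIR}",
    )
    dbt_run >> dbt_test  # only test models after the run succeeds
```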

There are many Airflow tutorials. However, many don’t show the full process of sourcing, transforming, testing, alerting, documenting, and finally supplying data. This talk will go over how to piece together an end-to-end Airflow project that transforms raw data to be consumable by the business. It will include how various technologies can all be orchestrated by Airflow to satisfy the needs of analysts, engineers, and business stakeholders. The talk will be divided into the following sections:
Introduction: Introducing the business problem and how we came up with the solution design
Data sourcing: Fetching and storing API data using basic operators and hooks
Transformation and Testing: How to use dbt to build and test models based on the raw data
Alerting: Alerting the necessary parties when any part of this DAG fails using Slack
Consumption: How to make dynamic data accessible to business stakeholders
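As a sketch of the alerting step described above, the callback below posts to a Slack incoming webhook when a task fails. The webhook URL is a placeholder, and the Airflow Slack provider's operators are an alternative to this hand-rolled approach.

```python
# Sketch: notify Slack when any task in the DAG fails, via an incoming webhook.
# SLACK_WEBHOOK_URL is a placeholder; in practice store it in a secret/Connection.
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

def notify_slack_on_failure(context):
    """Airflow on_failure_callback: receives the failing task's context."""
    ti = context["task_instance"]
    message = (
        f":red_circle: Task failed: {ti.dag_id}.{ti.task_id} "
        f"(run {context['run_id']}) - logs: {ti.log_url}"
    )
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)

# Attach it to every task in a DAG via default_args, e.g.:
# DAG(..., default_args={"on_failure_callback": notify_slack_on_failure})
```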

The integration between dbt and Airflow is a popular topic in the community, both in previous editions of Airflow Summit, in Coalesce and the #airflow-dbt Slack channel. Astronomer Cosmos ( https://github.com/astronomer/astronomer-cosmos/ ) stands out as one of the libraries that strives to enhance this integration, having over 300k downloads per month. During its development, we’ve encountered various performance challenges in terms of scheduling and task execution. While we’ve managed to address some, others remain to be resolved. This talk describes how Cosmos works, the improvements made over the last 1.5 years, and the roadmap. It also aims to collect feedback from the community on how we can further improve the experience of running dbt in Airflow.

In this talk, we’ll discuss how Instacart leverages Apache Airflow to orchestrate a vast network of data pipelines, powering both our core infrastructure and dbt deployments. As a data-driven company, Airflow plays a critical role in enabling us to execute large and intricate pipelines securely, compliantly, and at scale. We’ll delve into the following key areas:
a. High-Throughput Cluster Management: We’ll explore how we manage and maintain our Airflow cluster, ensuring the efficient execution of over 2,000 DAGs across diverse use cases.
b. Centralized Airflow Vision: We’ll outline our plans for establishing a company-wide, centralized Airflow cluster, consolidating all Airflow instances at Instacart.
c. Custom Airflow Tooling: We’ll showcase the custom tooling we’ve developed to manage YML-based DAGs, execute DAGs on external ECS workers, leverage Terraform for cluster deployment, and implement robust cluster monitoring at scale.
By sharing our extensive experience with Airflow, we aim to contribute valuable insights to the Airflow community.
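YML-driven DAG factories of the kind mentioned above generally follow a simple pattern: parse a config file and register one DAG per file. The sketch below is a generic version of that pattern; the YAML schema, keys, and paths are invented for illustration and are not Instacart's internal tooling.

```python
# Generic YAML-to-DAG factory sketch (not Instacart's internal tooling).
# Each YAML file is assumed to declare a dag_id, an optional schedule,
# and an ordered list of bash steps: [{name: ..., command: ...}, ...].
from datetime import datetime
from pathlib import Path

import yaml
from airflow import DAG
from airflow.operators.bash import BashOperator

CONFIG_DIR = Path("/opt/airflow/dag_configs")  # placeholder directory

def build_dag(config: dict) -> DAG:
    with DAG(
        dag_id=config["dag_id"],
        schedule=config.get("schedule", "@daily"),
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ) as dag:
        previous = None
        for step in config["steps"]:
            current = BashOperator(task_id=step["name"], bash_command=step["command"])
            if previous is not None:
                previous >> current  # chain steps in declared order
            previous = current
    return dag

# Register one DAG per YAML file in module globals, which is how
# Airflow's DAG parser discovers dynamically generated DAGs.
for path in CONFIG_DIR.glob("*.yml"):
    cfg = yaml.safe_load(path.read_text())
    globals()[cfg["dag_id"]] = build_dag(cfg)
```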

Summary

Data lakehouse architectures have been gaining significant adoption. To accelerate adoption in the enterprise Microsoft has created the Fabric platform, based on their OneLake architecture. In this episode Dipti Borkar shares her experiences working on the product team at Fabric and explains the various use cases for the Fabric service.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm interviewing Dipti Borkar about her work on Microsoft Fabric and performing analytics on data without...

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Microsoft Fabric is and the story behind it?
Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. What are the motivating factors that you see for that trend?
Microsoft has been investing heavily in open source in recent years, and the Fabric platform relies on several open components. What are the benefits of layering on top of existing technologies rather than building a fully custom solution?
What are the elements of Fabric that were engineered specifically for the service?
What are the most interesting/complicated integration challenges?
How has your prior experience with Ahana and Presto informed your current work at Microsoft?
AI plays a substantial role in the product. What are the benefits of embedding Copilot into the data engine?
What are the challenges in terms of safety and reliability?
What are the most interesting, innovative, or unexpected ways that you have seen the Fabric platform used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data lakes generally, and Fabric specifically?
When is Fabric the wrong choice?
What do you have planned for the future of data lake analytics?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Microsoft Fabric Ahana episode DB2 Distributed Spark Presto Azure Data MAD Landscape

Podcast Episode ML Podcast Episode

Tableau dbt Medallion Architecture Microsoft Onelake ORC Parquet Avro Delta Lake Iceberg

Podcast Episode

Hudi

Podcast Episode

Hadoop PowerBI

Podcast Episode

Velox Gluten Apache XTable GraphQL Formula 1 McLaren

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: Starburst

This episode is brought to you by Starburst - an end-to-end data lakehouse platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by T

Yohei Nakajima is an investor by day and coder by night. In particular, one of his projects, an AI agent framework called BabyAGI that creates a plan-execute loop, got a ton of attention in the past year. The truth is that AI agents are an extremely experimental space, and depending on how strict you want to be with your definition, there aren't a lot of production use cases today.  Yohei discusses the current state of AI agents and where they might take us.  For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
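To give a feel for what a plan-execute loop is, here is a heavily simplified, generic sketch. It is not BabyAGI's actual code, and call_llm is a hypothetical stand-in for whatever model API an agent would use.

```python
# Generic plan-execute agent loop sketch (not BabyAGI's implementation).
# `call_llm` is a hypothetical placeholder for a real model API call.
from collections import deque

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

def plan_execute_loop(objective: str, max_steps: int = 10) -> list[str]:
    # Plan: ask the model to break the objective into an initial task list.
    tasks = deque(call_llm(f"Break this objective into tasks:\n{objective}").splitlines())
    results = []
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()
        # Execute: work on the current task, grounded in prior results.
        result = call_llm(f"Objective: {objective}\nTask: {task}\nPrior results: {results}")
        results.append(result)
        # Re-plan: let the model add or reprioritize the remaining tasks.
        new_plan = call_llm(f"Given the results so far, list remaining tasks for: {objective}")
        tasks = deque(new_plan.splitlines())
    return results
```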

Misha Panko has worked in data for a long time, including on high performance data teams at Uber and Google. Today, Misha is the co-founder and CEO of Motif Analytics, a product focused on helping growth and ops teams understand their event data. In this episode, Tristan and Misha nerd out about the state of the art in computational neuroscience, where Misha got his PhD. They then go deep into event stream data and how it differs from classical fact and dimension data, and why it needs different analytical tools. Make sure to check out the back half of the episode, where they dive into AI and how Motif is applying breakthroughs in language modeling to train foundation models of event sequences—check out his team's blog post on their work. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
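To illustrate why event-sequence data calls for different tools than fact and dimension tables, here is a small pandas sketch of an order-dependent question (did a user sign up before their first purchase?). The column names and data are invented for illustration; this is not Motif's approach.

```python
# Sketch: an order-dependent question over an event stream, the kind of
# sequence analysis that is awkward to express over fact/dimension tables.
import pandas as pd

events = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 2, 3],
        "event": ["visit", "signup", "purchase", "visit", "purchase", "signup"],
        "ts": pd.to_datetime(
            ["2024-01-01 09:00", "2024-01-01 09:05", "2024-01-02 10:00",
             "2024-01-01 11:00", "2024-01-01 11:30", "2024-01-03 08:00"]
        ),
    }
)

def signed_up_before_purchase(group: pd.DataFrame) -> bool:
    """True if the user's first signup happens before their first purchase."""
    g = group.sort_values("ts")
    signup_ts = g.loc[g.event == "signup", "ts"].min()
    purchase_ts = g.loc[g.event == "purchase", "ts"].min()
    return pd.notna(signup_ts) and pd.notna(purchase_ts) and signup_ts < purchase_ts

converted = events.groupby("user_id").apply(signed_up_before_purchase)
print(converted)          # per-user conversion flag, in sequence order
print(converted.mean())   # funnel conversion rate across users
```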