The Arc of Data Innovation (w/ Bob Muglia, former CEO of Snowflake)

2023-07-12 · The Analytics Engineering Podcast Listen

podcast_episode

by Tristan Handy (dbt Labs) , Bob Muglia (Snowflake; Microsoft) , Julia Schottenstein (dbt labs)

Analytics Analytics Engineering dbt LLM Microsoft RDBMS Snowflake

Bob Muglia likely needs no introduction. The former CEO of Snowflake led the company during its early, transformational years after a long career at Microsoft and Juniper. Bob recently released the book The Datapreneurs about the arc of innovation in the data industry, starting with the first relational databases all the way to the present craze of LLMs and beyond. In this conversation with Tristan and Julia, Bob shares insights into the future of data engineering and its potential business impact while offering a glimpse into his professional journey. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Peter Hanssens - Building Awesome Data Communities in Australia and Beyond!

2023-07-12 · The Joe Reis Show Listen

podcast_episode

by Peter Hanssens (DataEngBytes) , Joe Reis (DeepLearning.AI)

Peter Hanssens is the founder of DataEngBytes, the forward-thinking conference on all things data engineering. In this chat, we talk about what to expect at the 2023 edition of DataEngBytes, the tech scene in Australia, his views on the current and future field of data engineering, and much more.

data #dataengineering #dataengbytes

Reduce Friction In Your Business Analytics Through Entity Centric Data Modeling

2023-07-09 · Data Engineering Podcast Listen

podcast_episode

by Maxime Beauchemin (Preset) , Tobias Macey

Activity Schema AI/ML Airflow Analytics Data Management Data Modelling dbt ETL/ELT GitHub Informatica dimensional modeling Python +3 more

Summary

For business analytics the way that you model the data in your warehouse has a lasting impact on what types of questions can be answered quickly and easily. The major strategies in use today were created decades ago when the software and hardware for warehouse databases were far more constrained. In this episode Maxime Beauchemin of Airflow and Superset fame shares his vision for the entity-centric data model and how you can incorporate it into your own warehouse design.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack Your host is Tobias Macey and today I'm interviewing Max Beauchemin about the concept of entity-centric data modeling for analytical use cases

Interview

Introduction How did you get involved in the area of data management? Can you describe what entity-centric modeling (ECM) is and the story behind it?

How does it compare to dimensional modeling strategies? What are some of the other competing methods Comparison to activity schema

What impact does this have on ML teams? (e.g. feature engineering)

What role does the tooling of a team have in the ways that they end up thinking about modeling? (e.g. dbt vs. informatica vs. ETL scripts, etc.)

What is the impact on the underlying compute engine on the modeling strategies used?

What are some examples of data sources or problem domains for which this approach is well suited?

What are some cases where entity centric modeling techniques might be counterproductive?

What are the ways that the benefits of ECM manifest in use cases that are down-stream from the warehouse?

What are some concrete tactical steps that teams should be thinking about to implement a workable domain model using entity-centric principles?

How does this work across business domains within a given organization (especially at "enterprise" scale)?

What are the most interesting, innovative, or unexpected ways that you have seen ECM used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on ECM?

When is ECM the wrong choice?

What are your predictions for the future direction/adoption of ECM or other modeling techniques?

Contact Info

mistercrunch on GitHub LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers

Links

Entity Centric Modeling Blog Post Max's Previous Apperances

Defining Data Engineering with Maxime Beauchemin Self Service Data Exploration And Dashboarding With Superset Exploring The Evolving Role Of Data Engineers Alumni Of AirBnB's Early Years Reflect On What They Learned About Building Data Driven Organizations

Apache Airflow Apache Superset Preset Ubisoft Ralph Kimball The Rise Of The Data Engineer The Downfall Of The Data Engineer The Rise Of The Data Scientist Dimensional Data Modeling Star Schema Databas

5 Minute Friday - "I Have People Skills!"

2023-07-07 · The Joe Reis Show Listen

podcast_episode

by Joe Reis (DeepLearning.AI)

I'm starting to see more and more discussions about soft skills and people skills. In this episode, I talk about why tech skills are table stakes, and soft skills are where you need to level up if you want to boost your career.

Office Space clip: https://www.youtube.com/watch?v=hNuu9CpdjIo

If you like this show, give it a 5-star rating on your favorite podcast platform.

Purchase Fundamentals of Data Engineering at your favorite bookseller.

Subscribe to my Substack: https://joereis.substack.com/

Tristan Handy - Balancing Competing Tensions and Handling Complexity

2023-07-06 · The Joe Reis Show Listen

podcast_episode

by Tristan Handy (dbt Labs) , Joe Reis (DeepLearning.AI)

dbt

Tristan Handy and I chat about balancing competing tensions, both personally and leading dbt Labs. We also discuss the power of organizational behavior, naming problems to solve, and home remodeling.

This is different from the normal interviews you'll hear with Tristan, and I hope you enjoy it!

dbtlabs #data #analyticsengineering

If you like this show, give it a 5-star rating on your favorite podcast platform.

Purchase Fundamentals of Data Engineering at your favorite bookseller.

Subscribe to my Substack: https://joereis.substack.com/

DataOps In Data Engineering - Audio Blog

2023-07-05 · Secrets of Data Analytics Leaders Listen

podcast_episode

DataOps

The unbundling of the data ecosystem is causing organizations to “duct tape” products and frameworks together to build their solutions and data delivery processes. Organizations fail to build and deploy end-to-end, automated, repeatable data-driven systems, ignoring data engineering & dataops principles as well as best practices. Published at: https://www.eckerson.com/articles/dataops-in-data-engineering

Should AI Bots Build Your Data Pipelines? Part IV - Audio Blog

2023-07-05 · Secrets of Data Analytics Leaders Listen

podcast_episode

AI/ML

This blog recommends guiding principles for successful implementation of language models to assist data engineering. Published at: https://www.eckerson.com/articles/should-ai-bots-build-your-data-pipelines-part-iv-guiding-principles-for-success-with-language-models-and-data-engineering

Should AI Bots Build Your Data Pipelines Part III - Audio Blog

2023-07-05 · Secrets of Data Analytics Leaders Listen

podcast_episode

AI/ML GenAI

An emerging approach to generative AI will help data engineering teams achieve much-needed productivity gains while controlling risk. Published at: https://www.eckerson.com/articles/should-ai-bots-build-your-data-pipelines-part-iii-the-emergence-of-small-language-models-for-data-engineering

How Data Engineering Teams Power Machine Learning With Feature Platforms

2023-07-03 · Data Engineering Podcast Listen

podcast_episode

by Razi Raziuddin , Tobias Macey

AI/ML Data Management Python SaaS SQL

Summary

Feature engineering is a crucial aspect of the machine learning workflow. To make that possible, there are a number of technical and procedural capabilities that must be in place first. In this episode Razi Raziuddin shares how data engineering teams can support the machine learning workflow through the development and support of systems that empower data scientists and ML engineers to build and maintain their own features.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack Your host is Tobias Macey and today I'm interviewing Razi Raziuddin about how data engineers can empower data scientists to develop and deploy better ML models through feature engineering

Interview

Introduction How did you get involved in the area of data management? What is feature engineering is and why/to whom it matters?

A topic that commonly comes up in relation to feature engineering is the importance of a feature store. What are the tradeoffs for that to be a separate infrastructure/architecture component?

What is the overall lifecycle of a feature, from definition to deployment and maintenance?

How is this distinct from other forms of data pipeline development and delivery? Who are the participants in that workflow?

What are the sharp edges/roadblocks that typically manifest in that lifecycle? What are the interfaces that are needed for data scientists/ML engineers to be able to self-serve their feature management?

What is the role of the data engineer in supporting those interfaces? What are the communication/collaboration channels that are necessary to make the overall process a success?

From an implementation/architecture perspective, what are the patterns that you have seen teams build around for feature development/serving? What are the most interesting, innovative, or unexpected ways that you have seen feature platforms used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on feature engineering? What are the resources that you find most helpful in understanding and designing feature platforms?

Contact Info

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers

Links

FeatureByte DataRobot Feature Store Feast Feature Store Feathr Kaggle Yann LeCun

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Sponsored By: Rudderstack:

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations fo