Ryan Dolley - Moving Beyond Dashboards & the Evolution of BI

2023-04-13 · The Joe Reis Show

podcast_episode

by Ryan Dolley (GoodData), Joe Reis (DeepLearning.AI)

AI/ML Analytics BI GenAI

Ryan Dolley and I chat about why BI needs to evolve, moving beyond dashboards, the impact of generative AI on analytics, SuperDataBros, and more.

#data #analytics #businessintelligence #datascience

If you like this show, give it a 5-star rating on your favorite podcast platform.

Purchase Fundamentals of Data Engineering at your favorite bookseller.

Check out my substack: https://joereis.substack.com/

Snowflake SnowPro™ Advanced Architect Certification Companion: Hands-on Preparation and Practice

2023-04-13 · O'Reilly Data Engineering Books

book

by Ruchi Soni

Analytics DWH Cyber Security Snowflake data data-engineering

Master the intricacies of Snowflake and prepare for the SnowPro Advanced Architect Certification exam with this comprehensive study companion. This book provides robust and effective study tools to help you prepare for the exam and is also designed for those who are interested in learning the advanced features of Snowflake. The practical examples and in-depth background on theory in this book help you unleash the power of Snowflake in building a high-performance system. The best practices demonstrated in the book help you use Snowflake more powerfully and effectively as a data warehousing and analytics platform. Reading this book and reviewing the concepts will help you gain the knowledge you need to take the exam. The book guides you through a study of the different domains covered on the exam: Accounts and Security, Snowflake Architecture, Data Engineering, and Performance Optimization. You’ll also be well positioned to apply your newly acquired practical skills to real-world Snowflake solutions. You will gain a deep understanding of Snowflake that helps you take full advantage of its architecture to deliver valuable analytics insight to your business.

What You Will Learn

Gain the knowledge you need to prepare for the exam
Review in-depth theory on Snowflake to help you build high-performance systems
Broaden your skills as a data warehouse designer to cover the Snowflake ecosystem
Optimize performance and costs associated with your use of the Snowflake data platform
Share data securely both inside your organization and with external partners
Apply your practical skills to real-world Snowflake solutions

Who This Book Is For

Anyone who is planning to take the SnowPro Advanced Architect Certification exam, those who want to move beyond traditional database technologies and build their skills to design and architect solutions using Snowflake services, and veteran database professionals seeking an on-the-job reference to understand one of the newest and fastest-growing technologies in data.

An Exploration Of The Composable Customer Data Platform

2023-04-10 · Data Engineering Podcast

podcast_episode

by Tejas Manohar (Hightouch), Darren Haken (Autotrader UK), Tobias Macey

AI/ML CDP Cloud Computing Data Lake Data Management Data Modelling dbt DWH Python Data Streaming

Summary

The customer data platform is a category of services that was developed early in the evolution of the current era of cloud services for data processing. When it was difficult to wire together event collection, data modeling, reporting, and activation, it made sense to buy monolithic products that handled every stage of the customer data lifecycle. Now that the data warehouse has taken center stage, a new approach of composable customer data platforms is emerging. In this episode Darren Haken is joined by Tejas Manohar to discuss how Autotrader UK is addressing their customer data needs by building on top of their existing data stack.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack

Your host is Tobias Macey and today I'm interviewing Darren Haken and Tejas Manohar about building a composable CDP and how you can start adopting it incrementally.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what you mean by a "composable CDP"?

What are some of the key ways that it differs from the ways that we think of a CDP today?

What are the problems that you were focused on addressing at Autotrader that are solved by a CDP?

One of the promises of the first generation CDP was an opinionated way to model your data so that non-technical teams could own this responsibility. What do you see as the risks/tradeoffs of moving CDP functionality into the same data stack as the rest of the organization?

What about companies that don't have the capacity to run a full data infrastructure?

Beyond the core technology of the data warehouse, what are the other evolutions/innovations that allow for a CDP experience to be built on top of the core data stack?

added burden on core data teams to generate event-driven data models

When iterating toward a CDP on top of the core investment of the infrastructure to feed and manage a data warehouse, what are the typical first steps?

What are some of the components in the ecosystem that help to speed up the time to adoption? (e.g. pre-built dbt packages for common transformations, etc.)

What are the most interesting, innovative, or unexpected ways that you have seen CDPs implemented?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on CDP related functionality?

When is a CDP (composable or monolithic) the wrong choice?

What do you have planned for the future of the CDP stack?

Contact Info

Darren

LinkedIn @DarrenHaken on Twitter

Tejas

LinkedIn @tejasmanohar on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Autotrader
Hightouch

Customer Studio

CDP == Customer Data Platform
Segment

Podcast Episode

mPar

Mastering Self-Learning in Machine Learning - Aaisha Muhammad

2023-04-07 · DataTalks.Club

podcast_episode

by Aaisha Muhammad

AI/ML GitHub HTML LLM

We talked about:

Aaisha’s background
How homeschooling affects self-study
Deciding on what to learn about
Establishing whether a resource is good
How Aaisha focuses on learning
Deciding on what kind of project to build
Find research materials
Aaisha’s experience with the Data Talks Club ML Zoomcamp
ML Zoomcamp projects
Aaisha’s interest in bioinformatics
Keeping motivated with deadlines
Notes and time-tracking tools
Drawbacks to self-studying
Aaisha’s interest in machine learning
Aaisha’s least favorable part of ML Zoomcamp
Helping people as a way to learn
Using ChatGPT as a “study group”
Is it possible to use self-studying to learn high-level topics
Switching topics to avoid burnout
Aaisha’s resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/aaisha-muhammad/
Twitter: https://twitter.com/ZealousMushroom
Github: https://github.com/AaishaMuhammad
Website: http://www.aaishamuhammad.co.za/

Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

5 Minute Friday - Is Data Modeling on Life Support?

2023-04-07 · The Joe Reis Show

podcast_episode

by Joe Reis (DeepLearning.AI)

AI/ML Data Modelling Data Streaming

Is data modeling on life support? I posed this question to LinkedIn earlier this week. It got a fair number of replies, some supportive and others saying I'm full of sh*t. In this 5 minute Friday nerdy rant, I unpack what I mean by data modeling being on life support, and where I think data modeling needs to go given newer practices like streaming and machine learning, which aren't currently discussed in data modeling circles.

LinkedIn post about data modeling on life support: https://www.linkedin.com/posts/josephreis_dataengineering-datamodeling-data-activity-7048722463010013185-OyIy

#dataengineering #datamodel #data


Shane Gibson - Making Data Modeling Accessible

2023-04-07 · The Joe Reis Show

podcast_episode

by Shane Gibson, Joe Reis (DeepLearning.AI)

Data Modelling

Shane Gibson joins the show to discuss how to make data modeling more accessible, why the world's moved past traditional data modeling, enabling data mesh, and more.

Shane's LinkedIn: https://www.linkedin.com/in/shagility/

Shagility: https://shagility.nz/

Shane's podcasts: https://shagility.nz/podcasts/


Mapping The Data Infrastructure Landscape As A Venture Capitalist

2023-04-03 · Data Engineering Podcast

podcast_episode

by Matt Turck (FirstMark Capital), Tobias Macey

AI/ML Cloud Computing Data Management Databricks Dataiku dbt DuckDB ETL/ELT GenAI Hadoop Hudi Iceberg

Summary

The data ecosystem has been building momentum for several years now. As a venture capital investor Matt Turck has been trying to keep track of the main trends and has compiled his findings into the MAD (ML, AI, and Data) landscape reports each year. In this episode he shares his experiences building those reports and the perspective he has gained from the exercise.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Businesses that adapt well to change grow 3 times faster than the industry average. As your business adapts, so should your data. RudderStack Transformations lets you customize your event data in real-time with your own JavaScript or Python code. Join The RudderStack Transformation Challenge today for a chance to win a $1,000 cash prize just by submitting a Transformation to the open-source RudderStack Transformation library. Visit dataengineeringpodcast.com/rudderstack today to learn more.

Your host is Tobias Macey and today I'm interviewing Matt Turck about his annual report on the Machine Learning, AI, & Data landscape and the insights around data infrastructure that he has gained in the process.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what the MAD landscape report is and the story behind it?

At a high level, what is your goal in the compilation and maintenance of your landscape document?

What are your guidelines for what to include in the landscape?

As the data landscape matures, how have you seen that influence the types of projects/companies that are founded?

What are the product categories that were only viable when capital was plentiful and easy to obtain?

What are the product categories that you think will be swallowed by adjacent concerns, and which are likely to consolidate to remain competitive?

The rapid growth and proliferation of data tools helped establish the "Modern Data Stack" as a de-facto architectural paradigm. As we move into this phase of contraction, what are your predictions for how the "Modern Data Stack" will evolve?

Is there a different architectural paradigm that you see as growing to take its place?

How has your presentation and the types of information that you collate in the MAD landscape evolved since you first started it?

What are the most interesting, innovative, or unexpected product and positioning approaches that you have seen while tracking data infrastructure as a VC and maintainer of the MAD landscape?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on the MAD landscape over the years?

What do you have planned for future iterations of the MAD landscape?

Contact Info

Website @mattturck on Twitter MAD Landscape Comments Email

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?


Links

MAD Landscape
First Mark Capital
Bayesian Learning
AI Winter
Databricks
Cloud Native Landscape
LUMA Scape
Hadoop Ecosystem
Modern Data Stack
Reverse ETL
Generative AI
dbt
Transform

Podcast Episode

Snowflake IPO
Dataiku
Iceberg

Podcast Episode

Hudi

Podcast Episode

DuckDB

Podcast Episode

Trino
Y42

Podcast Episode

Mozart Data

Podcast Episode

Keboola
MPP Database

The intro and outro music is f

The Secret Sauce of Data Science Management - Shir Meir Lador

2023-03-31 · DataTalks.Club

podcast_episode

by Shir Meir Lador

Agile/Scrum AI/ML Data Science GitHub HTML

We talked about:

Shir’s background
Debrief culture
The responsibilities of a group manager
Defining the success of a DS manager
The three pillars of data science management
Managing up
Managing down
Managing across
Managing data science teams vs business teams
Scrum teams, brainstorming, and sprints
The most important skills and strategies for DS and ML managers
Making sure proof of concepts get into production

Links:

The secret sauce of data science management: https://www.youtube.com/watch?v=tbBfVHIh-38
Lessons learned leading AI teams: https://blogs.intuit.com/2020/06/23/lessons-learned-leading-ai-teams/
How to avoid conflicts and delays in the AI development process (Part I): https://blogs.intuit.com/2020/12/08/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-i/
How to avoid conflicts and delays in the AI development process (Part II): https://blogs.intuit.com/2021/01/06/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-ii/
Leading AI teams deck: https://drive.google.com/drive/folders/1_CnqjugtsEbkIyOUKFHe48BeRttX0uJG
Leading AI teams video: https://www.youtube.com/watch?app=desktop&v=tbBfVHIh-38


[Radar Recap] Unleashing the Power of Data Teams in 2023

2023-03-30 · DataFramed

podcast_episode

by Vijay Yadav (Center for Mathematical Sciences at Merck), Vanessa Gonzalez (Transamerica)

AI/ML Analytics BI Data Science

In 2023, businesses are relying more heavily on data science and analytics teams than ever before. However, simply having a team of talented individuals is not enough to guarantee success. In the last of our RADAR 2023 sessions, Vijay Yadav and Vanessa Gonzalez outline the keys to building high-impact data teams in 2023. They discuss the hallmarks of a high-performing data team, the importance of the diversity of background and skill set needed to build impactful data teams, setting up career pathways for data scientists, and more.

Vijay Yadav is a highly respected data and analytics thought leader with over 20 years of experience in data product development, data engineering, and advanced analytics. As Director of Quantitative Sciences - Digital, Data, and Analytics at Merck, he leads data & analytics teams in creating AI/ML-driven data products to drive digital transformation. Vijay has held numerous leadership positions at various companies and is known for his ability to lead global teams to achieve high-impact results.

Vanessa Gonzalez is the Sr. Director of Data Science and Innovation at Businessolver, where she leads the Computational Linguistics, Machine Learning Engineering, Data Science, BI Analytics, and BI Engineering teams. She is experienced in leading data transformations and in performing analytical and management functions that contribute to the goals and growth objectives of organizations and divisions.

Listen in as Vanessa and Vijay share how to enable data teams to flourish in an ever-evolving data landscape.

Unlocking The Potential Of Streaming Data Applications Without The Operational Headache At Grainite

2023-03-25 · Data Engineering Podcast

podcast_episode

by Ashish Kumar (Grainite), Abhishek Chauhan (Grainite), Tobias Macey

AI/ML Data Management Data Science JavaScript Modern Data Stack Python React Data Streaming

Summary

The promise of streaming data is that it allows you to react to new information as it happens, rather than introducing latency by batching records together. The peril is that building a robust and scalable streaming architecture is always more complicated and error-prone than you think it's going to be. After experiencing this unfortunate reality for themselves, Abhishek Chauhan and Ashish Kumar founded Grainite so that you don't have to suffer the same pain. In this episode they explain why streaming architectures are so challenging, how they have designed Grainite to be robust and scalable, and how you can start using it today to build your streaming data applications without all of the operational headache.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Businesses that adapt well to change grow 3 times faster than the industry average. As your business adapts, so should your data. RudderStack Transformations lets you customize your event data in real-time with your own JavaScript or Python code. Join The RudderStack Transformation Challenge today for a chance to win a $1,000 cash prize just by submitting a Transformation to the open-source RudderStack Transformation library. Visit dataengineeringpodcast.com/rudderstack today to learn more.

Hey there podcast listener, are you tired of dealing with the headache that is the 'Modern Data Stack'? We feel your pain. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it: it’s all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to do its thing. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70-80% on costs. If you're fed up with the 'Modern Data Stack', give TimeXtender a try. Head over to dataengineeringpodcast.com/timextender where you can do two things: watch us build a data estate in 15 minutes and start for free today.

Join in with the event for the global data community, Data Council Austin. From March 28-30th 2023, they'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount of 20% off your ticket by using the promo code dataengpod20. Don't miss out on their only event this year! Visit: dataengineeringpodcast.com/data-council today

Your host is Tobias Macey and today I'm interviewing Ashish Kumar and Abhishek Chauhan about Grainite, a platform designed to give you a single place to build streaming data applications.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what Grainite is and the story behind it?

What are the personas that you are focused on addressing with Grainite?

What are some of the most complex aspects of building streaming data applications in the absence of something like Grainite?

How does Grainite work to reduce that complexity?

What are some of the commonalities that you see in the teams/organizations that find their way to Grainite?

What are some of the higher-order projects that teams are able to build when they are using Grainite as a starting point vs. where they would be spending effort on a fully managed streaming architecture?

Can you describe how Grainite is architected?

How have the design and goals of the platform changed/evolved since you first started working on it?

Wh

SE4ML - Software Engineering for Machine Learning - Nadia Nahar

2023-03-24 · DataTalks.Club

podcast_episode

by Nadia Nahar

Agile/Scrum AI/ML GitHub HTML

We talked about:

Nadia’s background
Academic research in software engineering
Design patterns
Software engineering for ML systems
Problems that people in industry have with software engineering and ML
Communication issues and setting requirements
Artifact research in open source products
Product vs model
Nadia’s open source product dataset
Failure points in machine learning projects
Finding solutions to issues using Nadia’s dataset and experience
The problem of siloing data scientists and other structure issues
The importance of documentation and checklists
Responsible AI
How data scientists and software engineers can work in an Agile way

Links:

Model Card: https://arxiv.org/abs/1810.03993
Datasheets: https://arxiv.org/abs/1803.09010
Factsheets: https://arxiv.org/abs/1808.07261
Research Paper: https://www.cs.cmu.edu/~ckaestne/pdf/icse22_seai.pdf
Arxiv version: https://arxiv.org/pdf/2110.


Dave Langer - Excel is Awesome! Teaching Data to the Masses

2023-03-22 · The Joe Reis Show

podcast_episode

by Dave Langer, Joe Reis (DeepLearning.AI)

Dave Langer teaches data literacy with the world's most popular data tool - Excel. We chat about why Excel is awesome, ways to teach data to the masses, and much more.

Dave Langer LinkedIn: https://www.linkedin.com/in/davelanger/

Dave on Data YouTube: https://www.youtube.com/@davidlanger8217

Website: https://www.daveondata.com/


Aligning Data Security With Business Productivity To Deploy Analytics Safely And At Speed

2023-03-19 · Data Engineering Podcast

podcast_episode

by Yoav Cohen (Satori), Tobias Macey

AI/ML Analytics CDP Data Management Data Science JavaScript Modern Data Stack Python Cyber Security

Summary

As with all aspects of technology, security is a critical element of data applications, and the different controls can be at cross purposes with productivity. In this episode Yoav Cohen from Satori shares his experiences as a practitioner in the space of data security and how to align with the needs of engineers and business users. He also explains why data security is distinct from application security and some methods for reducing the challenge of working across different data systems.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Join in with the event for the global data community, Data Council Austin. From March 28-30th 2023, they'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount of 20% off your ticket by using the promo code dataengpod20. Don't miss out on their only event this year! Visit: dataengineeringpodcast.com/data-council today

RudderStack makes it easy for data teams to build a customer data platform on their own warehouse. Use their state of the art pipelines to collect all of your data, build a complete view of your customer and sync it to every downstream tool. Sign up for free at dataengineeringpodcast.com/rudder

Hey there podcast listener, are you tired of dealing with the headache that is the 'Modern Data Stack'? We feel your pain. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it: it’s all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to do its thing. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70-80% on costs. If you're fed up with the 'Modern Data Stack', give TimeXtender a try. Head over to dataengineeringpodcast.com/timextender where you can do two things: watch us build a data estate in 15 minutes and start for free today.

Your host is Tobias Macey and today I'm interviewing Yoav Cohen about the challenges that data teams face in securing their data platforms and how that impacts the productivity and adoption of data in the organization.

Interview

Introduction

How did you get involved in the area of data management?

Data security is a very broad term. Can you start by enumerating some of the different concerns that are involved?

How has the scope and complexity of implementing security controls on data systems changed in recent years?

In your experience, what is a typical number of data locations that an organization is trying to manage access/permissions within?

What are some of the main challenges that data/compliance teams face in establishing and maintaining security controls?

How much of the problem is technical vs. procedural/organizational?

As a vendor in the space, how do you think about the broad categories/boundary lines for the different elements of data security? (e.g. masking vs. RBAC, etc.)

What are the different layers that are best suited to managing each of those categories? (e.g. masking and encryption in storage layer, RBAC in warehouse, etc.)

What are some of the ways that data security and organizational productivity are at odds with each other?

What are some of the shortcuts that you see teams and individuals taking to address the productivity hit from security controls?

What are some of the methods that you have found to be most effective at mitigating or even improving productivity impacts through security controls?

How does up-front design of the security layers improve the final outcome vs. trying to bolt on security after the platform is already in use?

How can education about the motivations for different security practices improve compliance and user experience?

What are the most interesting, innovative, or unexpected ways that you have seen data teams align data security and productivity?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on data security technology?

What are the areas of data security that still need improvements?

Contact Info

Yoav Cohen

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?


Links

Satori

Podcast Episode

Data Masking
RBAC == Role Based Access Control
ABAC == Attribute Based Access Control
Gartner Data Security Platform Report

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Starting a Consultancy in the Data Space - Aleksander Kruszelnicki

2023-03-17 · DataTalks.Club Listen

podcast_episode

by Aleksander Kruszelnicki

GitHub HTML Marketing

We talked about:

Aleksander’s background
The difficulty of selling data stack as a service
How Aleksander got into consulting
The Mom Test – extracting feedback from people
User interviews
Why Aleksander’s data stack as a service startup was not viable
How Aleksander decided to switch to consulting
Finding clients to consult
Figuring out how to position your services
Geographical limitations
Figuring out your target audience
The importance of networking and marketing
Pricing your services
The pitfalls of daily and hourly pricing and how to balance incentives
Is Germany a good place to found a company?
Aleksander’s book recommendations

Links:

LinkedIn: https://www.linkedin.com/in/alkrusz/
Twitter: https://twitter.com/alkrusz
Website: www.leukos.io

Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Zach Wilson - REAL talk on being a data engineer, entrepreneurship, ADHD, and much more

2023-03-15 · The Joe Reis Show Listen

podcast_episode

by Zach Wilson (Airbnb) , Joe Reis (DeepLearning.AI)

Zach Wilson is one of my favorite people, and when we chat, it's total honesty and great vibes. In this episode, we discuss his transition from a staff data engineer at Airbnb to an entrepreneur (!), and we both talk about our experiences with ADHD, the data engineering field today, content creation, and a ton in between.

If you like this show, give it a 5-star rating on your favorite podcast platform.

Purchase Fundamentals of Data Engineering at your favorite bookseller.

Check out my substack: https://joereis.substack.com/

Biohacking for Data Scientists and ML Engineers - Ruslan Shchuchkin

2023-03-10 · DataTalks.Club Listen

podcast_episode

by Ruslan Shchuchkin

AI/ML GitHub HTML

We talked about:

Ruslan’s background
Fighting procrastination and perfectionism
What is biohacking?
The role of dopamine and other hormones in daily life
How meditation can help
The influence light has on our bodies
Behavioral biohacking
Daylight lamps and using light to wake up
Sleep cycles
How nutrition affects productivity
Measuring productivity
Examples of unsuccessful biohacking attempts
Stoicism, voluntary discomfort, and self-challenges
Biohacking risks and ways to prevent them
Coffee and tea biohacking
Using self-reflection and tracking to measure results
Mindset shifting
Stoicism book recommendation
Work/life balance
Ruslan’s biohacking resource recommendation

Links:

LinkedIn: https://www.linkedin.com/in/ruslanshchuchkin/

Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Use Your Data Warehouse To Power Your Product Analytics With NetSpring

2023-03-10 · Data Engineering Podcast Listen

podcast_episode

by Priyendra Deshwal (NetSpring) , Tobias Macey

AI/ML Amplitude Analytics API CDP Data Lake Data Management Data Science DWH ETL/ELT GDPR/CCPA Mixpanel

Summary

With the rise of the web and digital business came the need to understand how customers are interacting with the products and services that are being sold. Product analytics has grown into its own category and brought with it several services with generational differences in how they approach the problem. NetSpring is a warehouse-native product analytics service that allows you to gain powerful insights into your customers and their needs by combining your event streams with the rest of your business data. In this episode Priyendra Deshwal explains how NetSpring is designed to empower your product and data teams to build and explore insights around your products in a streamlined and maintainable workflow.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Join in with the event for the global data community, Data Council Austin. From March 28-30th 2023, they'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount of 20% off your ticket by using the promo code dataengpod20. Don't miss out on their only event this year! Visit: dataengineeringpodcast.com/data-council today!

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder

Your host is Tobias Macey and today I'm interviewing Priyendra Deshwal about how NetSpring is using the data warehouse to deliver a more flexible and detailed view of your product analytics.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what NetSpring is and the story behind it?
What are the activities that constitute "product analytics" and what are the roles/teams involved in those activities?
When teams first come to you, what are the common challenges that they are facing and what are the solutions that they have attempted to employ?
Can you describe some of the challenges involved in bringing product analytics into enterprise or highly regulated environments/industries?
How does a warehouse-native approach simplify that effort?
There are many different players (both commercial and open source) in the product analytics space. Can you share your view on the role that NetSpring plays in that ecosystem?
How is the NetSpring platform implemented to be able to best take advantage of modern warehouse technologies and the associated data stacks?
What are the prerequisites for an organization's infrastructure/data maturity for being able to benefit from NetSpring?
How have the goals and implementation of the NetSpring platform evolved from when you first started working on it?
Can you describe the steps involved in integrating NetSpring with an organization's existing warehouse?
What are the signals that NetSpring uses to understand the customer journeys of different organizations?
How do you manage the variance of the data models in the warehouse while providing a consistent experience for your users?
Given that you are a product organization, how are you using NetSpring to power NetSpring?
What are the most interesting, innovative, or unexpected ways that you have seen NetSpring used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on NetSpring?
When is NetSpring the wrong choice?
What do you have planned for the future of NetSpring?

Contact Info

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

NetSpring
ThoughtSpot
Product Analytics
Amplitude
Mixpanel
Customer Data Platform
GDPR
CCPA
Segment (Podcast Episode)
Rudderstack (Podcast Episode)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

TimeXtender: TimeXtender is a holistic, metadata-driven solution for data integration, optimized for agility. TimeXtender provides all the features you need to build a future-proof infrastructure for ingesting, transforming, modelling, and delivering clean, reliable data in the fastest, most efficient way possible.

You can't optimize for everything all at once. That's why we take a holistic approach to data integration that optimises for agility instead of fragmentation. By unifying each layer of the data stack, TimeXtender empowers you to build data solutions 10x faster while reducing costs by 70%-80%. We do this for one simple reason: because time matters.

Go to dataengineeringpodcast.com/timextender today to get started for free!

Rudderstack:

RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.

RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team.

RudderStack also supports real-time use cases. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again.

Visit dataengineeringpodcast.com/rudderstack to sign up for free today, and snag a free T-Shirt just for being a Data Engineering Podcast listener.

Data Council: Join us at the event for the global data community, Data Council Austin. From March 28-30th 2023, we'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount off tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit: dataengineeringpodcast.com/data-council Promo Code: dataengpod20

Support Data Engineering Podcast

Chris Tabb - Data Monetization and Business Value

2023-03-08 · The Joe Reis Show Listen

podcast_episode

by Chris Tabb (LEIT DATA) , Joe Reis (DeepLearning.AI)

Chris Tabb (LEIT Data) and I hang out at my house and chat about data monetization and business value.

What the heck are those things? Good question. Listen and find out.

LEIT Data: https://www.leit-data.com/

Chris Tabb: https://www.linkedin.com/in/chris-tabb-datatips/

If you like this show, give it a 5-star rating on your favorite podcast platform.

Purchase Fundamentals of Data Engineering at your favorite bookseller.

Check out my substack: https://joereis.substack.com/

Exploring The Nuances Of Building An Intentional Data Culture

2023-03-06 · Data Engineering Podcast Listen

podcast_episode

by Pete Soderling (Data Council) , Maggie Hays (DataHub) , Tobias Macey

AI/ML Data Management dbt Modern Data Stack Python

Summary

The ecosystem for data professionals has matured to the point that there are a large and growing number of distinct roles. With the scope and importance of data steadily increasing it is important for organizations to ensure that everyone is aligned and operating in a positive environment. To help facilitate the nascent conversation about what constitutes an effective and productive data culture, the team at Data Council have dedicated an entire conference track to the subject. In this episode Pete Soderling and Maggie Hays join the show to explore this topic and their experience preparing for the upcoming conference.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Hey there podcast listener, are you tired of dealing with the headache that is the 'Modern Data Stack'? We feel your pain. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it: it’s all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to do its thing. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70-80% on costs. If you're fed up with the 'Modern Data Stack', give TimeXtender a try. Head over to dataengineeringpodcast.com/timextender where you can do two things: watch us build a data estate in 15 minutes and start for free today.

Your host is Tobias Macey and today I'm interviewing Pete Soderling and Maggie Hays about the growing importance of establishing and investing in an organization's data culture and their experience forming an entire conference track around this topic.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what your working definition of "Data Culture" is?
In what ways is a data culture distinct from an organization's corporate culture? How are they interdependent?
What are the elements that are most impactful in forming the data culture of an organization?
What are some of the motivations that teams/companies might have in fighting against the creation and support of an explicit data culture?
Are there any strategies that you have found helpful in counteracting those tendencies?
In terms of the conference, what are the factors that you consider when deciding how to group the different presentations into tracks or themes?
What are the experiences that you have had personally and in community interactions that led you to elevate data culture to be its own track?
What are the broad challenges that practitioners are facing as they develop their own understanding of what constitutes a healthy and productive data culture?
What are some of the risks that you considered when forming this track and evaluating proposals?
What are your criteria for determining whether this track is successful?
What are the most interesting, innovative, or unexpected aspects of data culture that you have encountered through developing this track?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on selecting presentations for this year's event?
What do you have planned for the future of this topic at Data Council events?

Contact Info

Pete

@petesoder on Twitter
LinkedIn

Maggie

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Data Council (Podcast Episode)
Data Community Fund
DataHub (Podcast Episode)
Database Design For Mere Mortals by Michael J. Hernandez (affiliate link)
SOAP
REST
Econometrics
DBA == Database Administrator
Conway's Law
dbt (Podcast Episode)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

TimeXtender: TimeXtender is a holistic, metadata-driven solution for data integration, optimized for agility. TimeXtender provides all the features you need to build a future-proof infrastructure for ingesting, transforming, modelling, and delivering clean, reliable data in the fastest, most efficient way possible.

You can't optimize for everything all at once. That's why we take a holistic approach to data integration that optimises for agility instead of fragmentation. By unifying each layer of the data stack, TimeXtender empowers you to build data solutions 10x faster while reducing costs by 70%-80%. We do this for one simple reason: because time matters.

Go to dataengineeringpodcast.com/timextender today to get started for free!

Support Data Engineering Podcast

Analytics for a Better World - Parvathy Krishnan

2023-03-03 · DataTalks.Club Listen

podcast_episode

by Parvathy Krishnan (Analytics for a Better World)

Analytics GitHub HTML

We talked about:

Parvathy’s background
Brainstorming sessions with nonprofits to establish data maturity
Example of an Analytics for a Better World project
The overall data maturity situation of nonprofits vs private sector
Solving the skill gap
Publicly available content
The Analytics for a Better World Academy
The Academy’s target audience
How researchers can work with Analytics for a Better World
Improving data maturity in nonprofit organizations
People, processes, and technology
Typical tools that Analytics for a Better World recommends to nonprofits
Profiles in nonprofits
Does Analytics for a Better World have a need for data engineers?
The Analytics for a Better World team
Factors that help organizations become more data-driven
Parvathy’s resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/parvathykrishnank/
Twitter: https://twitter.com/ABWInstitute
GitHub: https://github.com/Analytics-for-a-Better-World
Website: https://analyticsbetterworld.org/

Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html
