talk-data.com

Topic

Analytics

data_analysis insights metrics

4552

tagged

Activity Trend: 398 peak/qtr, 2020-Q1 to 2026-Q1

Activities

4552 activities · Newest first

Summary

Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to re-write the InfluxDB engine he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Join us at the top event for the global data community, Data Council Austin. From March 26th-28th, 2024, we'll play host to hundreds of attendees, 100 top speakers and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data and sharing their insights and learnings through deeply technical talks. As a listener to the Data Engineering Podcast you can get a special discount off regular priced and late bird tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit dataengineeringpodcast.com/data-council and use code dataengpod20 to register today!

Your host is Tobias Macey and today I'm interviewing Paul Dix about his investment in the Apache Arrow ecosystem and how it led him to create the latest PFAD in database design.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing the FDAP stack and how the components combine to provide a foundational architecture for database engines?

This was the core of your recent re-write of the InfluxDB engine. What were the design goals and constraints that led you to this architecture?

Each of the architectural components is well engineered for its particular scope. What is the engineering work that is involved in building a cohesive platform from those components?

One of the major benefits of using open source components is the network effect of ecosystem integrations. That can also be a risk when the community vision for the project doesn't align with your own goals. How have you worked to mitigate that risk in your specific platform?

Can you describe the

Benn Stancil, cofounder and CTO at Mode, returns to The Analytics Engineering Podcast to discuss the evolution of the term "modern data stack" and its value today. Tristan wrote on this idea for The Analytics Engineering Roundup in "Is the Modern Data Stack Still a Useful Idea?" For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Inside Economics welcomes back Mark Calabria, the former director of the Federal Housing Finance Agency. We discuss the current housing affordability crisis and what policymakers should do to address it, the FHFA’s response to the COVID-19 pandemic, and the risks posed by nonbank mortgage companies. The group also takes up the role of the Federal Home Loan Banks. Plenty of debate, and even some agreement. For more info, see Mark Calabria's book, Shelter from the Storm. Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or Comments, please email us at [email protected]. We would love to hear from you.    To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

podcast_episode
by Cris deRitis, Mark Zandi (Moody's Analytics), Marisa DiNatale (Moody's Analytics)

Mark, Marisa, and Cris take a deep dive into current housing market trends.  They consider the demand and supply drivers that are depressing existing home sales and pushing homebuyers towards new construction.  Along the way, the team deconstructs mortgage rates and provides its best estimate of the nation's housing deficit.  Mark challenges Marisa and Cris to come up with solutions to the housing crisis and wonders if we'll ever experience another sharp increase in foreclosures.   Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.


We’ve heard so much about the value and capabilities of generative AI over the past year, and we’ve all become accustomed to the chat interfaces of our preferred models. One of the main concerns many of us have had has been privacy. Is OpenAI keeping the data and information I give to ChatGPT secure? One of the touted solutions to this problem is running LLMs locally on your own machine, but with the hardware cost that comes with it, running LLMs locally has not been possible for many of us. That might now be starting to change.

Nuri Cankaya is VP of AI Marketing at Intel. Prior to Intel, Nuri spent 16 years at Microsoft, starting out as a Technical Evangelist, and leaving the organization as the Senior Director of Product Marketing. He ran the GTM team that helped generate adoption of GPT in Microsoft Azure products. La Tiffaney Santucci is Intel’s AI Marketing Director, specializing in their Edge and Client products. La Tiffaney has spent over a decade at Intel, focussing on partnerships with Dell, Google, Amazon and Microsoft.

In the episode, Richie, Nuri and La Tiffaney explore AI’s impact on marketing analytics, the adoption of AI in the enterprise, how AI is being integrated into existing products, the workflow for implementing AI into business processes and the challenges that come with it, the importance of edge AI for instant decision-making in use cases like self-driving cars, the emergence of AI engineering as a distinct field of work, the democratization of AI, what the state of AGI might look like in the near future and much more.

About the AI and the Modern Data Stack DataFramed Series

This week we’re releasing 4 episodes focused on how AI is changing the modern data stack and the analytics profession at large. The modern data stack is often an ambiguous and all-encompassing term, so we intentionally wanted to cover the impact of AI on the modern data stack from different angles. Here’s what you can expect:

Why the Future of AI in Data will be Weird with Benn Stancil, CTO at Mode & Field CTO at ThoughtSpot: covering how AI will change analytics workflows and tools
How Databricks is Transforming Data Warehousing and AI with Ari Kaplan, Head Evangelist & Robin Sutara, Field CTO at Databricks: covering Databricks, data intelligence and how AI tools are changing data democratization
Adding AI to the Data Warehouse with Sridhar Ramaswamy, CEO at Snowflake: covering Snowflake and its uses, how generative AI is changing the attitudes of leaders towards data, and how to improve your data management
Accelerating AI Workflows with Nuri Cankaya, VP of AI Marketing & La Tiffaney Santucci, AI Marketing Director at Intel: covering AI’s impact on marketing analytics, how AI is being integrated into existing products, and the democratization of AI

Links Mentioned in the Show:

Intel OpenVINO™ toolkit
Intel Developer Clouds for Accelerated Computing
AWS Re:Invent
[Course] Implementing AI Solutions in Business
Related Episode: Intel CTO Steve Orrin on How Governments Can Navigate the Data & AI Revolution
Sign up: https://www.datacamp.com/radar-analytics-edition...

In today's episode, host Jason Foster is joined by Olivia Duane Adams (Libby), Chief Advocacy Officer and co-founder of Alteryx, and Jason Belland, Vice President of the Global Sparked Education Program at Alteryx. Together, they delve into the critical role of data analytics in modern education and its transformative impact on bridging skill gaps and empowering individuals worldwide. Join the conversation as they explore the evolving landscape of skill acquisition, discuss the importance of collaboration among academia, industry, and government, and highlight how democratising access to data education is revolutionising industries across the globe.

In this episode of the Data Career Podcast, Avery talks with his childhood friend, Paul Ahlstrom, about his journey into data analytics from a non-technical background.

Paul emphasises the importance of networking, understanding the business, and getting the requirements right at the start.

They also explore the day-to-day life of a data analyst, how to make yourself useful to the business, as well as how to manage senior stakeholders.

Connect with Paul Ahlstrom:

🤝 Connect on Linkedin

✉️ Discover what we wish we knew about landing the dream job

🤖 Data Analytics Answers At Your Finger Tips

🤝 Ace your data analyst interview with the interview simulator

📩 Get my weekly email with helpful data career tips

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(14:31) - The Importance of Networking in Job Hunting
(22:44) - Understanding User Behavior through Data
(23:12) - The Role of SQL in Data Analysis
(23:25) - Business Use Cases for Data Analysis
(27:55) - The Art of Reporting in Data Analysis
(29:14) - The Importance of Asking the Right Questions
(31:17) - The Role of Communication in Data Analysis
(31:46) - The Power of Iterative Analytics
(39:47) - Understanding the Business Context in Data Analysis

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

Snowflake has been foundational in the data space for years. In the mid-2010s, the platform was a major driver of moving data to the cloud. More recently, it's become apparent that combining data and AI in the cloud is key to accelerating innovation. Snowflake has been rapidly adding AI features to provide value to the modern data stack, but what’s really been going on under the hood?

At the time of recording, Sridhar Ramaswamy was the SVP of AI at Snowflake; he was appointed CEO of Snowflake in February 2024. Sridhar was formerly Co-Founder of Neeva, acquired in 2023 by Snowflake. Before founding Neeva, Ramaswamy oversaw Google's advertising products, including search, display, video advertising, analytics, shopping, payments, and travel. He joined Google in 2003 and was part of the growth of AdWords and Google's overall advertising business. He spent more than 15 years at Google, where he started as a software engineer and rose to SVP of Ads & Commerce.

In the episode, Richie and Sridhar explore Snowflake and its uses, how generative AI is changing the attitudes of leaders towards data, how NLP and AI have impacted enterprise business operations as well as new applications of AI in an enterprise environment, the challenges of enterprise search, the importance of data quality, management and the role of semantic layers in the effective use of AI, a look into Snowflake's products including Snowpilot and Cortex, the collaboration required for successful data and AI projects, advice for organizations looking to improve their data management and much more.

About the AI and the Modern Data Stack DataFramed Series

This week we’re releasing 4 episodes focused on how AI is changing the modern data stack and the analytics profession at large. The modern data stack is often an ambiguous and all-encompassing term, so we intentionally wanted to cover the impact of AI on the modern data stack from different angles. Here’s what you can expect:

Why the Future of AI in Data will be Weird with Benn Stancil, CTO at Mode & Field CTO at ThoughtSpot: covering how AI will change analytics workflows and tools
How Databricks is Transforming Data Warehousing and AI with Ari Kaplan, Head Evangelist & Robin Sutara, Field CTO at Databricks: covering Databricks, data intelligence and how AI tools are changing data democratization
Adding AI to the Data Warehouse with Sridhar Ramaswamy, CEO at Snowflake: covering Snowflake and its uses, how generative AI is changing the attitudes of leaders towards data, and how to improve your data management
Accelerating AI Workflows with Nuri Cankaya, VP of AI Marketing & La Tiffaney Santucci, AI Marketing Director at Intel: covering AI’s impact on marketing analytics, how AI is being integrated into existing products, and the democratization of AI

Links Mentioned in the Show:

Snowflake
Snowflake acquires Neeva to accelerate search in the Data Cloud through generative AI
Use AI in Seconds with Snowflake Cortex
[Course] Introduction to Snowflake
Related Episode: Why AI will Change Everything - with Former Snowflake CEO, Bob Muglia
Sign up to a...

Mastering Microsoft Fabric: SAASification of Analytics

Learn and explore the capabilities of Microsoft Fabric, the latest evolution in cloud analytics suites. This book will help you understand how users can leverage a Microsoft Office-like experience for data management and advanced analytics. The book starts with an overview of the evolution of analytics from on-premises to cloud infrastructure as a service (IaaS), platform as a service (PaaS), and now software as a service (SaaS), and provides an introduction to Microsoft Fabric. You will learn how to provision Microsoft Fabric in your tenant, along with the key capabilities of SaaS analytics products and the advantages of using Fabric in the enterprise analytics platform. OneLake and Lakehouse for data engineering are discussed, as well as OneLake for data science. Author Ghosh teaches you about data warehouse offerings inside Microsoft Fabric and the new data integration experience, which brings Azure Data Factory and the Power Query Editor of Power BI together in a single platform. Also demonstrated is Real-Time Analytics in Fabric, including capabilities such as Kusto query and database. You will understand how the new event stream feature integrates with OneLake and other computations. You will also learn how to configure the real-time alert capability in a zero-code manner and go through the Power BI experience in the Fabric workspace. Fabric pricing and licensing are also covered. After reading this book, you will understand the capabilities of Microsoft Fabric and its integration with current and upcoming Azure OpenAI capabilities.

What You Will Learn

Build OneLake for all data like OneDrive for Microsoft Office
Leverage shortcuts for cross-cloud data virtualization in Azure and AWS
Understand upcoming OpenAI integration
Discover new event streaming and Kusto query inside Fabric real-time analytics
Utilize seamless tooling for machine learning and data science

Who This Book Is For

Citizen users and experts in the data engineering and data science fields, along with chief AI officers

Databricks started out as a platform for using Spark, a big data analytics engine, but it's grown a lot since then. Databricks now allows users to leverage their data and AI projects in the same place, ensuring ease of use and consistency across operations. The Databricks platform is converging on the idea of data intelligence, but what does this mean, how will it help data teams and organizations, and where does AI fit in the picture?

Ari is Databricks’ Head of Evangelism and "The Real Moneyball Guy" - the popular movie was partly based on his analytical innovations in Major League Baseball. He is a leading influencer in analytics, artificial intelligence, data science, and high-growth business innovation. Ari was previously the Global AI Evangelist at DataRobot, Nielsen’s regional VP of Analytics, Caltech Alumni of the Decade, President Emeritus of the worldwide Independent Oracle Users Group, on Intel’s AI Board of Advisors, Sports Illustrated Top Ten GM Candidate, an IBM Watson Celebrity Data Scientist, and on the Crain’s Chicago 40 Under 40. He's also written 5 books on analytics, databases, and baseball.

Robin is the Field CTO at Databricks. She has consulted with hundreds of organizations on data strategy, data culture, and building diverse data teams. Robin has had an eclectic career path in technical and business functions with more than two decades in tech companies, including Microsoft and Databricks. She also has achieved multiple academic accomplishments from her juris doctorate to a masters in law to engineering leadership. From her first technical role as an entry-level consumer support engineer to her current role in the C-Suite, Robin supports creating an inclusive workplace and is the current co-chair of Women in Data Safety Committee. She was also recognized in 2023 as a Top 20 Women in Data and Tech, as well as DataIQ 100 Most Influential People in Data.

In the episode, Richie, Ari, and Robin explore Databricks, the application of generative AI in improving services operations and providing data insights, data intelligence, and lakehouse technology, the wide-ranging applications of generative AI, how AI tools are changing data democratization, the challenges of data governance and management and how tools like Databricks can help, how jobs in data and AI are changing and much more.

About the AI and the Modern Data Stack DataFramed Series

This week we’re releasing 4 episodes focused on how AI is changing the modern data stack and the analytics profession at large. The modern data stack is often an ambiguous and all-encompassing term, so we intentionally wanted to cover the impact of AI on the modern data stack from different angles. Here’s what you can expect:

Why the Future of AI in Data will be Weird with Benn Stancil, CTO at Mode & Field CTO at ThoughtSpot: covering how AI will change analytics workflows and tools
How Databricks is Transforming Data Warehousing and AI with Ari Kaplan, Head Evangelist & Robin Sutara, Field CTO at Databricks: covering Databricks, data intelligence and how AI tools are changing data democratization
Adding AI to the Data Warehouse with Sridhar Ramaswamy, CEO at Snowflake: covering Snowflake and its uses, how generative AI is changing the attitudes of leaders towards data, and how to improve your data management
Accelerating AI Workflows with Nuri Cankaya, VP of AI Marketing & La Tiffaney Santucci, AI Marketing Director at Intel: covering AI’s impact on marketing analytics, how AI is being integrated into existing products, and the democratization of AI

Links Mentioned in the Show:

Databricks
Delta Lake
https://mlflow.org/...

This week, I'm chatting with Karen Meppen, a founding member of the Data Product Leadership Community and a Data Product Architect and Client Services Director at Hakkoda. Today, we're tackling the difficult topic of developing data products in situations where a product-oriented culture and data infrastructures may still be emerging or “at odds” with a human-centered approach. Karen brings extensive experience and a strong belief in how to effectively negotiate the early stages of data maturity. Together we look at the major hurdles that businesses encounter when trying to properly exploit data products, as well as the necessity of leadership support and strategy alignment in these initiatives. Karen's insights offer a roadmap for those seeking to adopt a product and UX-driven methodology when significant tech or cultural hurdles may exist.

Highlights/ Skip to:

I introduce Karen Meppen and the challenges of dealing with data products in places where the data and tech aren't quite there yet (00:00)
Karen shares her thoughts on what it's like working with "immature data" (02:27)
Karen breaks down what a data product actually is (04:20)
Karen and I discuss why having executive buy-in is crucial for moving forward with data products (07:48)
The sometimes fuzzy definition of "data products." (12:09)
Karen defines “shadow data teams” and explains how they sometimes conflict with tech teams (17:35)
How Karen identifies the nature of each team to overcome common hurdles of connecting tech teams with business units (18:47)
How she navigates conversations with tech leaders who think they already understand the requirements of business users (22:48)
Using design prototypes and design reviews with different teams to make sure everyone is on the same page about UX (24:00)
Karen shares stories from earlier in her career that led her to embrace human-centered design to ensure data products actually meet user needs (28:29)
We reflect on our chat about UX, data products, and the “producty” approach to ML and analytics solutions (42:11)

Quotes from Today’s Episode

"It’s not really fair to get really excited about what we hear about or see on LinkedIn, at conferences, etc. We get excited about the shiny things, and then want to go straight to it when [our] organization [may not be] ready to do that, for a lot of reasons." - Karen Meppen (03:00)

"If you do not have support from leadership and this is not something [they are]  passionate about, you probably aren’t a great candidate for pursuing data products as a way of working." - Karen Meppen (08:30)

"Requirements are just friendly lies." - Karen, quoting Brian about how data teams need to interpret stakeholder requests  (13:27)

"The greatest challenge that we have in technology is not technology, it’s the people, and understanding how we’re using the technology to meet our needs." - Karen Meppen (24:04)

"You can’t automate something that you haven’t defined. For example, if you don’t have clarity on your tagging approach for your PII, or just the nature of all the metadata that you’re capturing for your data assets and what it means or how it’s handled—to make it good, then how could you possibly automate any of this that hasn’t been defined?" - Karen Meppen (38:35)

"Nothing upsets an end-user more than lifting-and-shifting an existing report with the same problems it had in a new solution that now they’ve never used before." - Karen Meppen (40:13)

“Early maturity may look different in many ways depending upon the nature of  business you’re doing, the structure of your data team, and how it interacts with folks.” (42:46) 

Links

Data Product Leadership Community: https://designingforanalytics.com/community/
Karen Meppen on LinkedIn: https://www.linkedin.com/in/karen--m/
Hakkōda, Karen's company, for more insights on data products and services: https://hakkoda.io/

podcast_episode
by Val Kroll, Julie Hoyer, Kirsten Lum (storytellers.ai), Tim Wilson (Analytics Power Hour - Columbus (OH)), Moe Kiss (Canva), Michael Helbling (Search Discovery)

Is it just us, or does it seem like we're going to need to start plotting the pace of change in the world of analytics on a logarithmic scale? The evolution of the space is exciting, but it can also be a bit dizzying. And intimidating! There's so much to learn, and there are only so many hours in a day! Why did we choose that [insert totally unrelated field of study] degree program?! These questions and more—including a quick explanation of bootstrapping for Tim's benefit, which is NOT bootstrapping or bootstrap—are the subject of the latest episode of the show, with Kirsten Lum, the CTO of storytellers.ai, joining us to discuss strategies and tactics for the technically-non-technical analyst to thrive in an increasingly technical analytics world. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

One of the biggest surprises of the generative AI revolution over the past 2 years lies in the counter-intuitiveness of its most successful use cases. Counter to most predictions made about AI years ago, AI-assisted coding, specifically AI-assisted data work, has surprisingly been one of the biggest killer apps of generative AI tools and copilots. However, what happens when we take this notion even further? What will analytics workflows look like when generative AI tools can also assist us in problem-solving? What type of analytics use cases can we expect to operationalize, and what tools can we expect to work with when AI systems can provide scalable qualitative data instead of relying on imperfect quantitative proxies? Today’s guest calls this future “weird”.

Benn Stancil is the Field CTO at ThoughtSpot. He joined ThoughtSpot in 2023 as part of its acquisition of Mode, where he was a Co-Founder and CTO. While at Mode, Benn held roles leading Mode’s data, product, marketing, and executive teams. He regularly writes about data and technology at benn.substack.com. Prior to founding Mode, Benn worked on analytics teams at Microsoft and Yammer.

Throughout the episode, Benn and Adel talk about the nature of AI-assisted analytics workflows, the potential for generative AI in assisting problem-solving, how he imagines analytics workflows to look in the future, and a lot more.

About the AI and the Modern Data Stack DataFramed Series

This week we’re releasing 4 episodes focused on how AI is changing the modern data stack and the analytics profession at large. The modern data stack is often an ambiguous and all-encompassing term, so we intentionally wanted to cover the impact of AI on the modern data stack from different angles. Here’s what you can expect:

Why the Future of AI in Data will be Weird with Benn Stancil, CTO at Mode & Field CTO at ThoughtSpot: covering how AI will change analytics workflows and tools
How Databricks is Transforming Data Warehousing and AI with Ari Kaplan, Head Evangelist & Robin Sutara, Field CTO at Databricks: covering Databricks, data intelligence and how AI tools are changing data democratization
Adding AI to the Data Warehouse with Sridhar Ramaswamy, CEO at Snowflake: covering Snowflake and its uses, how generative AI is changing the attitudes of leaders towards data, and how to improve your data management
Accelerating AI Workflows with Nuri Cankaya, VP of AI Marketing & La Tiffaney Santucci, AI Marketing Director at Intel: covering AI’s impact on marketing analytics, how AI is being integrated into existing products, and the democratization of AI

Links Mentioned in the Show:

Mode Analytics
ThoughtSpot acquires Mode: Empowering data teams to bring Generative AI to BI
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
[Course] Generative AI for Business
[Skill Track] SQL Fundamentals
Related Episode: The Future of Marketing Analytics with Cory Munchbach, CEO at...

Summary

A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (user-friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. In this episode Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offers the ease of use and execution speed of data warehouses with the infinite storage and scalability of data lakes.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Join in with the event for the global data community, Data Council Austin. From March 26th-28th, 2024, they'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount of 20% off your ticket by using the promo code dataengpod20. Don't miss out on their only event this year! Visit dataengineeringpodcast.com/data-council today.

Your host is Tobias Macey and today I'm interviewing Dain Sundstrom about building a data lakehouse with Trino and Iceberg.

Interview

Introduction

How did you get involved in the area of data management?

To start, can you share your definition of what constitutes a "Data Lakehouse"?

What are the technical/architectural/UX challenges that have hindered the progression of lakehouses? What are the notable advancements in recent months/years that make them a more viable platform choice?

There are multiple tools and vendors that have adopted the "data lakehouse" terminology. What are the benefits offered by the combination of Trino and Iceberg?

What are the key points of comparison for that combination in relation to other possible selections?

What are the pain points that are still prevalent in lakehouse architectures as compared to warehouse or vertically integrated systems?

What progress is being made (within or across the ecosystem) to address those sharp edges?

For someone who is interested in building a data lakehouse with Trino and Iceberg, how does that influence their selection of other platform elements? What are the differences in terms of pipeline design/access and usage patterns when using a Trino

My voice is sort of working, and I chat about Tristan Handy's article that raised quite a ruckus this week, "Is the "Modern Data Stack" Still a Useful Idea?"

In the end, the Modern Data Stack won - people use the cloud for analytics. And everything ends, so I'm excited for what's next.

Article: https://roundup.getdbt.com/p/is-the-modern-data-stack-still-a?r=oc02

podcast_episode
by Matt Colyar (Moody's Analytics), Cris deRitis, Mark Zandi (Moody's Analytics), Marisa DiNatale (Moody's Analytics)

The Inside Economics team is joined by CPI guru and colleague Matt Colyar to discuss the bevy of inflation-related data released this week. First the team dissects the Federal Reserve’s CCAR stress test scenarios and laments the perpetually inconvenient timing of their release. Talk turns to the root causes for the inflation of the past few years and why shelter inflation is so stubborn. The team imagines themselves on the FOMC for a day and what they would do with interest rates going forward. Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or Comments, please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

In this podcast episode, Avery talks with Akshay Swaminathan, co-author of the book 'Winning with Data Science: A Handbook for Business Leaders'.

They discuss the vague terms often thrown around in the data science industry and underline the importance of understanding these terms in order to make effective business decisions.

Swaminathan highlights the need to leverage data science in healthcare and real estate.

The episode covers various aspects of data analysis, including prediction, association, and description.

Connect with Akshay Swaminathan:

🤝 Connect on Linkedin

📘 Learn About Winning with Data Science

✉️ Discover what we wish we knew about landing the dream job

🤖 Data Analytics Answers At Your Finger Tips

🤝 Ace your data analyst interview with the interview simulator

📩 Get my weekly email with helpful data career tips

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(12:14) - The Value of Data Science in Business
(14:57) - The Role of Data Science in Problem Solving
(20:36) - Understanding the Difference Between Data Science and Data Analytics
(25:39) - The Importance of Framing Business Questions for Data Science
(30:48) - The Power of Meeting Business Needs with Data Science: A Real-Life Example
(38:26) - The Limitations of Data Science

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

Summary

Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free! Your host is Tobias Macey and today I'm interviewing Andy Jefferson about how to solve the problem of data sharing.

Interview

Introduction How did you get involved in the area of data management? Can you start by giving some context and scope of what we mean by "data sharing" for the purposes of this conversation? What is the current state of the ecosystem for data sharing protocols/practices/platforms?

What are some of the main challenges/shortcomings that teams/organizations experience with these options?

What are the technical capabilities that need to be present for an effective data sharing solution?

How does that change as a function of the type of data? (e.g. tabular, image, etc.)

What are the requirements around governance and auditability of data access that need to be addressed when sharing data? What are the typical boundaries along which data access requires special consideration for how the sharing is managed? Many data platform vendors have their own interfaces for data sharing. What are the shortcomings of those options, and what are the opportunities for abstracting the sharing capability from the underlying platform? What are the most interesting, innovative, or unexpected ways that you have seen data sharing/Bobsled used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on data sharing? When is Bobsled the wrong choice? What do you have planned for the future of data sharing?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine

podcast_episode
by Cris deRitis, Mark Zandi (Moody's Analytics), Bill Adams (Comerica Bank), Marisa DiNatale (Moody's Analytics)

Bill Adams, Chief Economist of Dallas-based Comerica Bank, joins the Inside Economics team to assess the economic outlook and consider a range of economic issues from consumer credit to China’s prospects. We also learned what he is most anxious about, and it isn’t the outcome of the Super Bowl. For more info on Bill Adams click here. Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or Comments, please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

It's not easy being the head of data & analytics at a large organization. You must align a large team across multiple disciplines; you must deal with oodles of legacy systems and tools that hamper innovation; and you must deliver business value fast to keep executives at bay and your job intact. You also need to recruit dynamic managers who can push the envelope while meeting operational objectives. And when you falter, which you inevitably will, you have to rebound fast.

No one knows these lessons better than Tiffany Perkins-Munn. She currently runs a 275-person data & analytics team at JP Morgan Chase that consists of data engineers, data scientists, behavioral economists, and business intelligence experts. She thrives on versatility, having earned a Ph.D. in Social-Personality Psychology with an interdisciplinary focus on Advanced Quantitative Methods. Building on this foundation, she has accumulated vast experience in the art of managing data & analytics teams during her 23 years in technical and managerial roles in the financial services industry.

In this interview, you’ll learn:

  1. Tiffany’s secret for aligning a large data & analytics team and keeping it from splitting into silos of specialization.
  2. Her favorite techniques for recruiting the right people to her team.
  3. How to wade through the thicket of legacy systems and deliver innovative solutions quickly.
  4. The impact of GenAI on her operations and the financial services industry.
  5. How to advance your career in data & analytics.