talk-data.com

Topic: Analytics

Tags: data_analysis · insights · metrics

4552 tagged activities

Activity Trend: 398 peak/qtr (2020-Q1 to 2026-Q1)

Activities

4552 activities · Newest first

In this episode, we dive deep into linear programming and its applications in various business scenarios.

We discuss its powerful use in optimizing critical business decisions and also touch on how it was utilized in crucial roles at ExxonMobil to maximize refinery profits.

Tune in to uncover how linear programming can be a game-changer in your data career journey.
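To make the idea concrete, here is a minimal linear programming sketch in Python (all numbers are hypothetical, and scipy's linprog is just one of many solvers): choose production volumes of two fuel blends to maximize profit subject to capacity constraints, in the spirit of the refinery example discussed in the episode.

```python
# Hedged toy example: maximize 4x + 3y (profit per barrel of gasoline/diesel)
# subject to x + y <= 100 (crude available) and 3x + 2y <= 240 (refining hours).
# linprog minimizes, so the objective is negated.
from scipy.optimize import linprog

c = [-4, -3]                      # negated per-barrel profits (hypothetical)
A_ub = [[1, 1],                   # crude barrels used per unit of each blend
        [3, 2]]                   # refining hours used per unit of each blend
b_ub = [100, 240]                 # available crude, available hours

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)      # optimal volumes, here [40. 60.]
print(-res.fun)   # maximum profit, here 340.0
```

The same shape scales to real refinery models, which simply carry many more decision variables and constraints.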

🌟Leave your review and download the bonus!

🤝 Ace your data analyst interview with the interview simulator

📩 Get my weekly email with helpful data career tips

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(05:32) - Introduction to Linear Programming

(09:28) - Understanding Linear Programming

(12:12) - Applications of Linear Programming

(14:55) - Maximizing and Minimizing in Linear Programming

(18:43) - Applying Linear Programming to Business

(33:20) - Using Linear Regression to Determine Slopes

(35:06) - A Real-World Application

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

In this episode, I’m chatting with former Gartner analyst Sanjeev Mohan who is the Co-Author of Data Products for Dummies. Throughout our conversation, Sanjeev shares his expertise on the evolution of data products, and what he’s seen as a result of implementing practices that prioritize solving for use cases and business value. Sanjeev also shares a new approach of structuring organizations to best implement ownership and accountability of data product outcomes. Sanjeev and I also explore the common challenges of product adoption and who is responsible for user experience. I purposefully had Sanjeev on the show because I think we have pretty different perspectives from which we see the data product space.

Highlights / Skip to:

I introduce Sanjeev Mohan, co-author of Data Products for Dummies (00:39)

Sanjeev expands more on the concept of writing a “for Dummies” book (00:53)

Sanjeev shares his definition of a data product, including both a technical and a business definition (01:59)

Why Sanjeev believes organizational changes and accountability are the keys to preventing the acceleration of shipping data products with little to no tangible value (05:45)

How Sanjeev recommends getting buy-in for data product ownership from other departments in an organization (11:05)

Sanjeev and I explore adoption challenges and the topic of user experience (13:23)

Sanjeev explains what role is responsible for user experience and design (19:03)

Who should be responsible for defining the metrics that determine business value (28:58)

Sanjeev shares some case studies of companies who have adopted this approach to data products and their outcomes (30:29)

Where companies are finding data product managers currently (34:19)

Sanjeev expands on his perspective regarding the importance of prioritizing business value and use cases (40:52)

Where listeners can get Data Products for Dummies, and learn more about Sanjeev’s work (44:33)

Quotes from Today’s Episode

“You may slap a label of data product on existing artifact; it does not make it a data product because there’s no sense of accountability. In a data product, because they are following product management best practices, there must be a data product owner or a data product manager. There’s a single person [responsible for the result].” — Sanjeev Mohan (09:31)

“I haven’t even mentioned the word data mesh because data mesh and data products, they don’t always have to go hand-in-hand. I can build data products, but I don’t need to go into the—do all of data mesh principles.” – Sanjeev Mohan (26:45)

“We need to have the right organization, we need to have a set of processes, and then we need a simplified technology which is standardized across different teams. So, this way, we have the benefit of reusing the same technology. Maybe it is Snowflake for storage, DBT for modeling, and so on. And the idea is that different teams should have the ability to bring their own analytical engine.” – Sanjeev Mohan (27:58)

“Generative AI, right now as we are recording, is still in a prototyping phase. Maybe in 2024, it’ll go heavy-duty production. We are not in prototyping phase for data products for a lot of companies. They’ve already been experimenting for a year or two, and now they’re actually using them in production. So, we’ve crossed that tipping point for data products.” – Sanjeev Mohan (33:15)

“Low adoption is a problem that’s not just limited to data products. How long have we had data catalogs, but they have low adoption. So, it’s a common problem.” – Sanjeev Mohan (39:10)

“That emphasis on technology first is a wrong approach. I tell people that I’m sorry to burst your bubble, but there are no technology projects, there are only business projects. Technology is an enabler. You don’t do technology for the sake of technology; you have to serve a business cause, so let’s start with that and keep that front and center.” – Sanjeev Mohan (43:03)

Links

Data Products for Dummies: https://www.dataops.live/dataproductsfordummies

“What Exactly is A Data Product” article: https://medium.com/data-mesh-learning/what-exactly-is-a-data-product-7f6935a17912

It Depends: https://www.youtube.com/@SanjeevMohan

Chief Data Analytics and Product Officer of Equifax: https://www.youtube.com/watch?v=kFY7WGc-jFM

SanjMo Consulting: https://www.sanjmo.com/

dataops.live: https://dataops.live

dataops.live/dataproductsfordummies: https://dataops.live/dataproductsfordummies

LinkedIn: https://www.linkedin.com/in/sanjmo/

Medium articles: https://sanjmo.medium.com

Summary

Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up. As the sophistication increases, so does the complexity, leading to challenges for user experience. Jignesh Patel has been researching these areas for several years in his work as a professor at Carnegie Mellon University. In this episode he illuminates the landscape of problems that we are faced with and how his research is aimed at helping to solve these problems.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Your host is Tobias Macey and today I'm interviewing Jignesh Patel about the research that he is conducting on technical scalability and user experience improvements around data management

Interview

Introduction

How did you get involved in the area of data management?

Can you start by summarizing your current areas of research and the motivations behind them?

What are the open questions today in technical scalability of data engines?

What are the experimental methods that you are using to gain understanding of the opportunities and practical limits of those systems?

As you strive to push the limits of technical capacity in data systems, how does that impact the usability of the resulting systems?

When performing research and building prototypes of the projects, what is your process for incorporating user experience into the implementation of the product?

What are the main sources of tension between technical scalability and user experience/ease of comprehension?

What are some of the positive synergies that you have been able to realize between your teaching, research, and corporate activities?

In what ways do they produce conflict, whether personally or technically?

What are the most interesting, innovative, or unexpected ways that you have seen your research used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on research of the scalability limits of data systems?

What is your heuristic for when a given research project needs to be terminated or productionized?

What do you have planned for the future of your academic research?

Contact Info

Website

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

podcast_episode
by Dante DeAntonio (Moody's Analytics), Cris deRitis, Mark Zandi (Moody's Analytics), Marisa DiNatale (Moody's Analytics)

This week’s podcast focuses on the jobs report for December. The usual cast of characters discusses the job catch-up (not ketchup) in government and healthcare, and its implications. Everyone agreed that despite the considerable cross-currents in the numbers, it was a good report.

Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or Comments, please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

This is the blueprint to becoming a data analyst & I will walk you through the levels step-by-step to becoming one this year.

Follow this blueprint, and I promise you can become a data analyst. If you’d like more of my free resources, including a more in-depth webinar where I talk more about projects & networking, click the link in the description.

Practice SQL with:

🛠️ Analyst Builder

🖱️ Stratascratch

🐒 DataLemur

🤝 Ace your data analyst interview with the interview simulator

📩 Get my weekly email with helpful data career tips

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(00:08) - Level 1

(04:20) - Level 2

(06:15) - Level 3

(07:50) - Level 4

(09:10) - Level 5

(13:05) - Level 6

(15:30) - Level 7

(18:20) - Level 8

(19:42) - Level 9

(21:50) - Level 10

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

How to Become a Data Analyst

Start a brand-new career in data analytics with no-nonsense advice from a self-taught data analytics consultant.

In How to Become a Data Analyst: My Low-Cost, No Code Roadmap for Breaking into Tech, data analyst and analytics consultant Annie Nelson walks you through how she took the reins and made a dramatic career change to unlock new levels of career fulfilment and enjoyment. In the book, she talks about the adaptability, curiosity, and persistence you’ll need to break free from the 9-5 grind and how data analytics—with its wide variety of skills, roles, and options—is the perfect field for people looking to refresh their careers. Annie offers practical and approachable data portfolio-building advice to help you create one that’s manageable for an entry-level professional but will still catch the eye of employers and clients.

You’ll also find:

Deep dives into the learning journey required to step into a data analytics role

Ways to avoid getting lost in the maze of online courses and certifications you can find online—while still obtaining the skills you need to be competitive

Explorations of the highs and lows of Annie’s career-change journey and job search—including what was hard, what was easy, what worked well, and what didn’t

Strategies for using ChatGPT to help you in your job search

A must-read roadmap to a brand-new and exciting career in data analytics, How to Become a Data Analyst is the hands-on tutorial that shows you exactly how to succeed.

Summary

Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchak, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Your host is Tobias Macey and today I'm interviewing Andrey Korchak about how to manage data in a fintech environment

Interview

Introduction

How did you get involved in the area of data management?

Can you start by summarizing the data challenges that are particular to the fintech ecosystem?

What are the primary sources and types of data that fintech organizations are working with?

What are the business-level capabilities that are dependent on this data?

How do the regulatory and business requirements influence the technology landscape in fintech organizations?

What does a typical build vs. buy decision process look like?

Fraud prediction in banks, for example, is one of the most well-established applications of machine learning in industry. What are some of the other ways that ML plays a part in fintech?

How does that influence the architectural design/capabilities for data platforms in those organizations?

Data governance is a notoriously challenging problem. What are some of the strategies that fintech companies are able to apply to this problem given their regulatory burdens?

What are the most interesting, innovative, or unexpected approaches to data management that you have seen in the fintech sector?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on data in fintech?

What do you have planned for the future of your data capabilities at Monite?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Monite

ISO 27001

Tesseract

GitOps

SWIFT Protocol

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Starburst:

This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by Trino, Starburst runs petabyte-scale SQL analytics fast at a fraction of the cost of traditional methods, helping you meet all your data needs ranging from AI/ML workloads to data applications to complete analytics.

Trusted by the teams at Comcast and Doordash, Starburst delivers the adaptability and flexibility a lakehouse ecosystem promises, while providing a single point of access for your data and all your data governance allowing you to discover, transform, govern, and secure all in one place. Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Try Starburst Galaxy today, the easiest and fastest way to get started using Trino, and get $500 of credits free: dataengineeringpodcast.com/starburst

RudderStack:

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Materialize:

You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date.

That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing.

Go to materialize.com today and get 2 weeks free!

Support Data Engineering Podcast

podcast_episode
by John Toohig (Raymond James), Cris deRitis, Mark Zandi (Moody's Analytics), Marisa DiNatale (Moody's Analytics)

John Toohig, head of wholesale trading for Raymond James, makes a return appearance on Inside Economics. He last joined us in the wake of the banking crisis this past March and made the case that the banking system, while bowed, would not break. He was right. Join us to hear what John is now saying about the system, loan growth and quality, and what it all means for the Fed and the economy. For more on John Toohig, click here. Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or Comments, please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

Welcome back to another podcast episode of Data Unchained! This time, I am pleased to welcome Tim Tutt, #CEO and #CoFounder of Night Shift Development, Inc. Tim and his company work with clients on analyzing and generating data at exponential rates. In this episode, I talk with Tim about the security and technology that goes into analytics for humans while maintaining federal governance policies.

#data #datascience #datagovernance #podcast #analysts #dataanalytics

Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic

Music promoted by https://www.free-stock-music.com

Creative Commons Attribution 3.0 Unported License: https://creativecommons.org/licenses/by/3.0/deed.en_US

Hosted on Acast. See acast.com/privacy for more information.

Data Observability for Data Engineering

"Data Observability for Data Engineering" introduces you to the foundational concepts of observing and validating data pipeline health. With real-world projects and Python code examples, you'll gain hands-on experience in improving data quality and minimizing risks, enabling you to implement strategies that ensure accuracy and reliability in your data systems. What this Book will help me do Master data observability techniques to monitor and validate data pipelines effectively. Learn to collect and analyze meaningful metrics to gauge and improve data quality. Develop skills in Python programming specific to applying data concepts such as observable data state. Address scalability challenges using state-of-the-art observability frameworks and practices. Enhance your ability to manage and optimize data workflows ensuring seamless operation from start to end. Author(s) Authors Michele Pinto and Sammy El Khammal bring a wealth of experience in data engineering and observing scalable data systems. Pinto specializes in constructing robust analytics platforms while Khammal offers insights into integrating software observability into massive pipelines. Their collaborative writing style ensures readers find both practical advice and theoretical foundations. Who is it for? This book is geared toward data engineers, architects, and scientists who seek to confidently handle pipeline challenges. Whether you're addressing specific issues or wish to introduce proactive measures in your team, this guide meets the needs of those ready to leverage observability as a key practice.

Data Science for Web3

Discover how to navigate the world of Web3 data with 'Data Science for Web3,' an expertly crafted guide by Gabriela Castillo Areco. Through practical examples, industry insights, and real-world use cases, you will learn the skills needed to analyze blockchain data and extract actionable business insights.

What this Book will help me do

Understand blockchain transactions and data structures to build robust datasets.

Leverage on-chain and off-chain data for valuable Web3 business insights.

Create DeFi- and NFT-specific datasets for targeted analysis.

Develop machine learning models tailored for blockchain use cases.

Apply data science techniques to innovate in the Web3 ecosystem.

Author(s)

Gabriela Castillo Areco is a seasoned data scientist and an expert in blockchain analytics. With years of experience in the technology and finance sectors, Gabriela brings a practical perspective to understanding intricate data within the emerging Web3 paradigm. Her engaging approach makes technical concepts accessible and actionable.

Who is it for?

This book is ideal for data professionals such as analysts, scientists, or engineers, aiming to harness the potential of blockchain analytics. It's also suitable for business professionals exploring data-driven opportunities within Web3. Whether you're a beginner or an experienced learner with some Python background, this book will meet you at your level.
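As a taste of the kind of on-chain data pull the book builds toward, here is a hedged sketch using the web3.py library (assuming its v6 API; the RPC endpoint is a placeholder you would supply): fetch the latest block and flatten its transactions into rows ready for analysis.

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://YOUR-NODE-RPC-URL"))  # placeholder endpoint

# Pull the most recent block with full transaction objects included.
block = w3.eth.get_block("latest", full_transactions=True)

# Flatten transactions into plain dicts, ready for a DataFrame or a database.
rows = [
    {
        "hash": tx["hash"].hex(),
        "from": tx["from"],
        "to": tx["to"],                                 # None for contract creation
        "value_eth": w3.from_wei(tx["value"], "ether"),
    }
    for tx in block.transactions
]
print(len(rows), "transactions in block", block.number)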

Regardless of profession, the work we do leaves behind a trace of actions that help us achieve our goals. This is especially true for those who work with data. For large enterprises, where there are seemingly countless processes happening at any one time, keeping track of these processes is crucial. Given the scale of these processes, one small efficiency gain can lead to a staggering amount of time and money saved. Process mining is a data-driven approach to process analysis that uses event logs to extract process-related information. It can separate inferred facts from exact truths and uncover what really happens in a variety of operations.

Wil van der Aalst is a full professor at RWTH Aachen University, leading the Process and Data Science (PADS) group. He is also the Chief Scientist at Celonis, part-time affiliated with the Fraunhofer FIT, and a member of the Board of Governors of Tilburg University. His research interests include process mining, Petri nets, business process management, workflow management, process modeling, and process analysis. Wil van der Aalst has published over 275 journal papers, 35 books (as author or editor), 630 refereed conference/workshop publications, and 85 book chapters.

Cong Yu leads the CeloAI group at Celonis, focusing on bringing advanced AI technologies to EMS products, building up capabilities for their knowledge platform, and ultimately helping enterprises reduce process inefficiencies and achieve operational excellence. Previously, Cong was Principal (Research) Scientist / Research Director at Google Research NYC from September 2010 to July 2022, leading the NYSD/Beacon Research Group, and also taught at the NYU Courant Institute of Mathematical Sciences.

In the episode, Wil, Cong, and Richie explore process mining and its development over the past 25 years, the differences between process mining and ML, AI, and data mining, popular use cases of process mining, adoption by large enterprises like BMW, HP, and Dell, the requirements for an effective process mining system, the role of predictive analytics and data engineering in process mining, how to scale process mining systems, prospects within the field, and much more.

Links Mentioned in the Show:

Celonis

Gartner’s Magic Quadrant for Process Mining

PM4Py

Process Query Language (PQL)

[Course] Business Process Analytics in R
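For a concrete feel of the event-log-driven discovery Wil describes, here is a minimal sketch using PM4Py from the show links (the XES file name is hypothetical): read an event log and discover a Petri net with the inductive miner.

```python
import pm4py

# Load an event log (XES is the standard interchange format for process mining).
log = pm4py.read_xes("orders_event_log.xes")  # hypothetical file

# Discover a process model from the recorded behavior.
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

# Render the discovered Petri net to inspect what really happens.
pm4py.view_petri_net(net, initial_marking, final_marking)
```

The discovered model can then be compared against the intended process (conformance checking) to separate what should happen from what actually does.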

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Land your data analyst job combining these 3 things in my bootcamp

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

Today I am sharing some highlights for 2023 from the podcast, and also letting you all know I’ll be taking a break from the podcast for the rest of December, but I’ll be back with a new episode on January 9th, 2024. I’ve also got two links to share with you—details inside!

Transcript

Greetings everyone - I’m taking a little break from Experiencing Data over December of 2023, but I’ll be back in January with more interviews and insights on leveraging UX design and product management to create indispensable data products, machine learning apps, and decision support tools.

Experiencing Data turned five years old back in November, with over 130 episodes to date! I still can’t believe it’s been going that long and how far we’ve come.

Some highlights for me in 2023 included launching the Data Product Leadership Community, finding out that the show is now in the top 2% of all podcasts worldwide according to ListenNotes, and most of all, hearing from you that the podcast, and my writing, and the guests that  I have brought on are having an impact on your work, your careers, and hopefully the lives of your customers, users, and stakeholders as well! 

So, for now, I’ve got just two links for you:

If you’re wondering how to:

support the show yourself with a really fast review on Apple Podcasts, record a quick audio question for me to answer on the show, or join my free Insights mailing list where I share my bi-weekly ideas and thoughts and 1-page episode summaries of all the show drops that I put out here on Experiencing Data…

…just head over to designingforanalytics.com/podcast and you’ll get links to all those things there.

And secondly, if you need help increasing customer adoption, delight, the business value, or the usability of your analytics and machine learning applications in 2024, I invite you to set up a free discovery call with me 1 on 1. 

You bring the questions, I’ll bring my ears, and by the end of the call, I’ll give you my best advice on how to move forward with your situation – whether it’s working with me or not. To schedule one of those free discovery calls, visit designingforanalytics.com/go

And finally, there will be some news coming out next year with the show, as well as my business, so I hope you’ll hop on the mailing list and stay tuned, that’s probably the best place to do that. And if you celebrate holidays in December and January, I hope they’re safe, enjoyable, and rejuvenating. Until 2024, stay tuned right here - and in the words of the great Arnold Schwarzenegger, I’ll be back.

For those who celebrate or acknowledge it, Christmas is now in the rearview mirror. Father Time has a beard that reaches down to his toes, and he's ready to hand over the clock to an absolutely adorable little Baby Time when 2024 rolls in. That means it's time for our annual set of reflections on the analytics and data science industry. Somehow, the authoring of this description of the show was completely unaided by an LLM, although the show did include quite a bit of discussion around generative AI. It also included the announcement of a local LLM based on all of our podcast episodes to date (updated with each new episode going forward!), which you can try out here! The discussion was wide-ranging beyond AI: Google Analytics 4, Marketing Mix Modelling (MMM), the technical/engineering side of analytics versus the softer skills of creative analytical thought and engaging with stakeholders, and more, as well as a look ahead to 2024! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Summary

Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. Elad Eldor has experienced these challenges first-hand, leading to his work writing the book "Kafka Troubleshooting in Production". In this episode he highlights the sources of complexity that contribute to Kafka's operational difficulties, and some of the main ways to identify and mitigate potential sources of trouble.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Your host is Tobias Macey and today I'm interviewing Elad Eldor about operating Kafka in production and how to keep your clusters stable and performant

Interview

Introduction

How did you get involved in the area of data management?

Can you describe your experiences with Kafka?

What are the operational challenges that you have had to overcome while working with Kafka?

What motivated you to write a book about how to manage Kafka in production?

There are many options now for persistent data queues. What are the factors to consider when determining whether Kafka is the right choice?

In the case where Kafka is the appropriate tool, there are many ways to run it now. What are the considerations that teams need to work through when determining whether/where/how to operate a cluster?

When provisioning a Kafka cluster, what are the requirements that need to be considered when determining the sizing?

What are the axes along which size/scale need to be determined?

The core promise of Kafka is that it is a durable store for continuous data. What are the mechanisms that are available for preventing data loss? (see the config sketch after this list)

Under what circumstances can data be lost?

What are the different failure conditions that cluster operators need to be aware of?

What are the monitoring strategies that are available?
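Picking up the data-loss question above, here is a hedged sketch (broker address, topic name, and sizing are hypothetical) using the confluent-kafka Python client: replication is set at topic creation, and acks=all plus idempotence on the producer mean an acknowledged write survives a single broker failure when min.insync.replicas is at least 2.

```python
from confluent_kafka import Producer
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # hypothetical broker
# Replication factor 3 with min.insync.replicas=2 tolerates one broker failure
# without losing acknowledged writes.
admin.create_topics([NewTopic("orders", num_partitions=6, replication_factor=3,
                              config={"min.insync.replicas": "2"})])

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "acks": "all",               # wait for all in-sync replicas to acknowledge
    "enable.idempotence": True,  # retries cannot introduce duplicates
})
producer.produce("orders", key=b"order-1", value=b'{"amount": 42}',
                 callback=lambda err, msg: print(err or msg.offset()))
producer.flush()
```

The trade-off, which the episode's sizing questions circle around, is that stronger durability settings increase write latency and broker load, so capacity planning has to account for them.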

Send us a text

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don’t), where industry insights meet laid-back banter. Whether you’re a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let’s get into the heart of data, unplugged style!

In this episode, we’re joined by Maryam, an Analytics Engineer with a passion for challenges and a knack for curiosity. From sewing to yoga, Maryam brings a unique perspective to our tech-centric discussions.

Analytics Engineer Insights: Maryam discusses her role, the rise of Analytics Engineers, and their essential tools. Read more about Analytics Engineering.

The Emerging Role of AI Translator: Exploring the link between Analytics Engineers and AI Translators, and the skills required in these evolving fields. Learn about AI Translator.

Mistral AI’s New Developments: Analyzing Mistral AI’s latest model and its implications for the industry. Discover Mistral AI’s update.

ChatGPT – A Double-Edged Sword: Discussing the impacts of ChatGPT on the AI landscape and the pace of innovation. Reflect on ChatGPT’s impact.

ChatGPT & Job Applications: A fresh take on how ChatGPT is influencing job applications and hiring processes.

Engineering Management Insights: Exploring whether becoming an Engineering Manager is a path worth considering.

Intro music courtesy of fesliyanstudios.com.

We streamed live!

podcast_episode
by Carl Tannenbaum (Northern Trust), Cris deRitis, Mark Zandi (Moody's Analytics), Marisa DiNatale (Moody's Analytics)

In the last podcast before the holidays, Inside Economics chats with Carl Tannenbaum, Chief Economist of Northern Trust, about the economy, financial system, Fed and forecasting. The group took it as a good omen that they had a Tannenbaum and a DiNatale on the podcast just before Christmas (you may need to Google the names). Not that they attribute their forecasting process to the use of such signs. For more on Carl Tannenbaum, click here. Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or Comments, please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.