This might just be the most interesting and thought-provoking episode of Leaders of Analytics yet. Why? Without even noticing it, you make hundreds of ethical decisions every day. Many of these decisions you probably don’t recognise as being grounded in ethical principles because they are so ingrained in your subconscious. AI, on the other hand, doesn’t make decisions based on ethics unless ethical behaviour is somehow picked up in the training data. Therefore, we must make AI ethical by design, but that is not easy. Many of the ethical dilemmas arising from AI are difficult to solve because the problems are so novel in a human context. Yet we all need to get used to dealing with these ethical dilemmas at scale as we implement AI in our business operations. To understand the unwieldy world of ethical AI, I recently spoke to James Brusseau, a philosopher at Pace University specialising in AI ethics. His academic research explores the human experience of artificial intelligence in the areas of privacy, freedom, authenticity and personal identity, and he works with organisations around the world to develop ethical AI applications. In this episode of Leaders of Analytics, we discuss:
What AI ethics is and why it’s important
The most common dilemmas or challenges we face when it comes to AI ethics
Whether AI-driven curation of information is a good thing or a bad thing
How we can develop a framework for dealing with ethical dilemmas at scale
How governments might regulate AI or introduce other incentives to achieve ethical AI by design
How leaders can prepare for managing and governing the ethical implications of using AI in their operations, and much more.
Summary The modern data stack has been gaining a lot of attention recently with a rapidly growing set of managed services for different stages of the data lifecycle. With all of the available options it is possible to run a scalable, production-grade data platform with a small team, but there are still sharp edges and integration challenges to work through. Peter Fishman and Dan Silberman experienced these difficulties firsthand and created Mozart Data to provide a single, easy-to-use option for getting started with the modern data stack. In this episode they explain how they designed a user experience that makes working with data more accessible to organizations without a data team, while allowing more advanced users to build out more complex workflows. They also share their thoughts on the modern data ecosystem and how it improves the availability of analytics for companies of all sizes.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.
Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold.
Your host is Tobias Macey and today I’m interviewing Peter Fishman and Dan Silberman about Mozart Data and how they are building a unified experience for the modern data stack.
Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Mozart Data is and the story behind it?
The promise of the "modern data stack" is that it’s all delivered as a service to make it easier to set up. What are the missing pieces that make something like Mozart necessary?
What are the main workflows or industries that you are focusing on?
Who are the main personas that you are building Mozart for?
How has that combination of user persona and industry focus informed your decisions around feature priorities and user experience?
Can you describe how you have architected the Mozart platform?
How have you approached the bu
Summary The data that you have access to affects the questions that you can answer. By using external data sources you can drastically increase the range of analysis that is available to your organization. The challenge comes in all of the operational aspects of finding, accessing, organizing, and serving that data. In this episode Mark Hookey discusses how he and his team at Demyst do all of the DataOps for external data sources so that you don’t have to, including the systems necessary to organize and catalog the various collections that they host, the various serving layers to provide query interfaces that match your platform, and the utility of having a single place to access a multitude of information. If you are having trouble answering questions for your business with the data that you generate and collect internally, then it is definitely worthwhile to explore the information available from external sources.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more. The first 10 people to request a personalized product tour will receive an exclusive Monte Carlo Swag box.
Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
Your host is Tobias Macey and today I’m interviewing Mark Hookey about Demyst Data, a platform for operationalizing external data.
Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Demyst is and the story behind it?
What are the services and systems that you provide for organizations to incorporate external sources in their data workflows?
Who are your target customers?
What are some examples of data sets that an organization might want to use in their analytics?
How are these different from SaaS data that an organization might integrate with tools such as Stitch and Fivetran?
What are some of the challenges that are introduced by working with these external data sets?
If an organization isn’t using Demyst what are some
Dive into the world of advanced analytics and visualizations in Power BI with "Extending Power BI with Python and R". This comprehensive guide will teach you how to integrate Python and R scripting into your Power BI projects, allowing you to build data models, transform data, and create rich visualizations. Learn practical techniques to make your Power BI dashboards more interactive and insightful.
What this Book will help me do
Master the integration of Python and R scripts into Power BI to enhance its functionality.
Learn to implement advanced data transformations and enrichments using external APIs.
Create advanced visualizations and custom visuals with R for improved analytics.
Perform advanced data analysis including handling missing data using Python and R.
Leverage machine learning techniques within Power BI projects to extract actionable insights.
Author(s)
Zavarella is a data science expert and renowned author specializing in data analytics and visualization tools. With years of experience working with Power BI, Python, and R in diverse data-driven projects, Zavarella offers a unique perspective on enhancing Power BI capabilities. Passionate about teaching, they craft clear and impactful tutorials for learners.
Who is it for?
This book is perfect for business intelligence professionals, data scientists, and business analysts who already use Power BI and want to augment its features with Python and R. If you have a foundational understanding of Power BI and some basic familiarity with Python and R, this book will help you explore their combined potential for advanced analytics.
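To give a flavour of the kind of scripting the book covers: in a Power BI Python visual, the fields dragged onto the visual arrive as a pandas DataFrame named dataset, and the script’s matplotlib output becomes the visual. Below is a minimal sketch of that pattern, not code from the book; the Date and Sales columns are hypothetical, and the stub at the top simply keeps the script runnable outside Power BI.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Inside a Power BI Python visual, `dataset` is injected automatically.
# Outside Power BI we stub it with made-up data so the sketch still runs.
try:
    dataset  # probe for the Power BI-injected DataFrame
except NameError:
    dataset = pd.DataFrame({
        "Date": pd.date_range("2024-01-01", periods=120, freq="D"),
        "Sales": range(120),
    })

# Aggregate daily rows to monthly totals, then plot.
monthly = (
    dataset.assign(Date=pd.to_datetime(dataset["Date"]))
    .set_index("Date")["Sales"]
    .resample("M")
    .sum()
)

fig, ax = plt.subplots(figsize=(8, 4))
monthly.plot(ax=ax, marker="o")
ax.set_title("Monthly Sales")
ax.set_ylabel("Sales")
plt.tight_layout()
plt.show()  # Power BI captures the matplotlib figure as the visual
```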
Learn Tableau by working through concrete examples and issues that you are likely to face in your day-to-day work. Author Shankar Arul starts by teaching you the fundamentals of data analytics before moving on to the core concepts of Tableau. You will learn how to create calculated fields, and about the currently available calculation functionalities in Tableau, including Basic Expressions, Level of Detail (LOD) Expressions, and Table Calculations. As the book progresses, you’ll be walked through comparisons and trend calculations using tables. A concluding chapter on dashboarding will show you how to build actionable dashboards to communicate analysis and visualizations. You’ll also see how Tableau can complement and communicate with Excel. After completing this book, you will be ready to tackle the challenges of data analytics using Tableau without getting bogged down by the technicalities of the tool.
What Will You Learn
Master the core concepts of Tableau
Automate and simplify dashboards to help business users
Understand the basics of data visualization techniques
Leverage powerful features such as parameters, table calculations, level of detail expressions, and more
Who is This Book For
Business analysts, data analysts, and financial analysts.
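Tableau’s calculation language is its own, but the concepts travel well. As a rough analogy only — not the book’s code — here is how a FIXED LOD expression and a running-sum table calculation map onto pandas, using made-up sales data:

```python
import pandas as pd

# Hypothetical sales rows standing in for a Tableau data source.
df = pd.DataFrame({
    "Region":  ["East", "East", "West", "West", "West"],
    "Product": ["A", "B", "A", "B", "C"],
    "Sales":   [100, 150, 200, 50, 75],
})

# Tableau: {FIXED [Region] : SUM([Sales])}
# pandas:  a group-level aggregate broadcast back to every row.
df["RegionSales"] = df.groupby("Region")["Sales"].transform("sum")

# Tableau: RUNNING_SUM(SUM([Sales])) partitioned by Region
# pandas:  a cumulative sum within each group.
df["RunningSales"] = df.groupby("Region")["Sales"].cumsum()

print(df)
```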
Mark, Ryan, and Cris discuss inflation throughout the history of the United States and whether we're in the midst of a new inflationary era. Full episode transcript here.
Questions or comments? Please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.
On this episode, we chat with Aileen Crowley, former Vice President of Global Streaming Marketing at Universal Music Group. Before leaving the major label world in November 2020, Aileen devised data-driven streaming strategy for developing artists, working directly with artist management to translate streaming analytics, develop artist release strategies, and implement plans for audience growth.
Prior to that, Aileen was the General Manager of DigSin, a subscription-based independent music label focused on singles, playlisting, and data, as well as being an artist manager—and that was after spending almost seven years at world-renowned consulting firm McKinsey & Co. Today, Aileen runs The Streaming Story, a website dedicated to contextualizing streaming success with the narrative surrounding that success. Since recording this interview, Aileen has teamed up with Lark42, a digital consultancy that solves hard problems in the music, data, blockchain, streaming and startup space. You can connect with Aileen on LinkedIn here. If you want more free insights, follow our podcast, our blog, and our socials. If you're an artist with a free Chartmetric account, sign up for the artist plan, made exclusively for you, here. If you're new to Chartmetric, follow the URL above after creating a free account here.
Summary One of the perennial challenges posed by data lakes is how to keep them up to date as new data is collected. With the improvements in streaming engines it is now possible to perform all of your data integration in near real time, but it can be challenging to understand the proper processing patterns to make that performant. In this episode Ori Rafael shares his experiences at Upsolver building scalable stream processing for integrating and analyzing data, and the tradeoffs involved when coming from a batch-oriented mindset.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.
Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold.
Your host is Tobias Macey and today I’m interviewing Ori Rafael about strategies for building stream and batch processing patterns for data lake analytics.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of the state of the market for data lakes today?
What are the prevailing architectural and technological patterns that are being used to manage these systems?
Batch and streaming systems have been used in various combinations since the early days of Hadoop. The Lambda architecture has largely been abandoned, so what is the answer for today’s data lakes? What are the challenges presented by streaming approaches to data transformations?
The batch model for processing is intuitive despite its latency problems. What are the benefits that it provides?
The core concept for data orchestration is the DAG. How does that manifest in a streaming context?
In batch processing, idempotent/immutable datasets are created by re-running the entire pipeline when logic changes need to be made. Given that there is no definitive start or end of a stream, what are the options for amending logical errors in transformations?
What are some of the da
Diane Swonk, Chief Economist of Grant Thornton, joins Mark, Cris, and Ryan to discuss the current state of the American consumer. They focus on the factors driving holiday sales, excess savings, and an outlook on inflation and its effects on consumers. Full episode transcript can be found here.
Recommended Read: The Passionate Economist: Finding the Power and Humanity Behind the Numbers, by Diane Swonk; purchase a copy here.
Questions or comments? Please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.
Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Delve into the serverless world of Amazon Athena with the comprehensive book 'Serverless Analytics with Amazon Athena'. This guide introduces you to the power of Athena, showing you how to efficiently query data in Amazon S3 using SQL without the hassle of managing infrastructure. With clear instructions and practical examples, you'll master querying structured, unstructured, and semi-structured data seamlessly.
What this Book will help me do
Effectively query and analyze both structured and unstructured data stored in S3 using Amazon Athena.
Integrate Athena with other AWS services to create powerful, secure, and cost-efficient data workflows.
Develop ETL pipelines and machine learning workflows leveraging Athena's compatibility with AWS Glue.
Monitor and troubleshoot Athena queries for consistent performance and build scalable serverless data solutions.
Implement security best practices and optimize costs when managing your Athena-driven data solutions.
Author(s)
Virtuoso, along with co-authors Mert Turkay Hocanin and Wishnick, brings a wealth of experience in cloud solutions, serverless technologies, and data engineering. They excel in demystifying complex technical topics and have a passion for empowering readers with practical skills and knowledge.
Who is it for?
This book is tailored for business intelligence analysts, application developers, and system administrators who want to harness Amazon Athena for seamless, cost-efficient data analytics. It suits individuals with basic SQL knowledge looking to expand their capabilities in querying and processing data. Whether you're managing growing datasets or building data-driven applications, this book provides the know-how to get it right.
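As a taste of what working with Athena from code looks like, here is a minimal sketch using boto3. Athena runs queries asynchronously, so the client submits the SQL, polls for completion, and then fetches results; the database name, table, and S3 output bucket below are placeholders, not anything from the book.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Submit the query; Athena returns immediately with an execution id.
qid = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS n FROM logs GROUP BY status",
    QueryExecutionContext={"Database": "my_database"},           # placeholder
    ResultConfiguration={"OutputLocation": "s3://my-results/"},  # placeholder
)["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    result = athena.get_query_results(QueryExecutionId=qid)
    for row in result["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```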
David is Sr. Director of Data at Lyst, and as leader of their analytics + data science teams he has followed the evolution of data roles closely over the past decade. David spends a lot of time thinking about career progression + data team structure, and in this conversation with Tristan + Julia they dive into the classic individual contributor vs manager conundrum, migrating between warehouses, and reactive vs proactive data workflows. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
Send us a text
Want to be featured as a guest on Making Data Simple? Reach out to us at [[email protected]] and tell us why you should be next.
Abstract Making Data Simple Podcast is hosted by Al Martin, VP, IBM Expert Services Delivery, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun. This week on Making Data Simple, we have Nick Amabile. Nick is the CEO of DAS42, a US data analytics consulting firm that helps companies make better decisions, faster. Founded in 2015, DAS42 is made up of data analysts, scientists, business professionals, and engineers who provide end-to-end data services — including data strategy, tech stack integrations, application implementation, and enterprise analytics training. Nick’s philosophy is centered around the two components critical to achieving data-driven success: building an effective data analytics environment and building a data-centric company culture.
Show Notes
2:15 – Nick’s history
4:22 – DAS42
8:52 – Is your brand consulting?
11:07 – What do you do different?
14:39 – What’s important about consulting?
18:25 – Is managed services cost effective?
21:18 – What still surprises you?
25:32 – What metrics do you look at?
28:11 – How has the pandemic affected you?
32:28 – Why venture capital fund?
34:49 – What are your practices today?
Website: das42
Email – [email protected]
Connect with the Team
Producer Kate Brown - LinkedIn.
Producer Steve Templeton - LinkedIn.
Host Al Martin - LinkedIn and Twitter.
Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.
The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.
Learn data science with Python by building five real-world projects! Experiment with card game predictions, tracking disease outbreaks, and more, as you build a flexible and intuitive understanding of data science.
In Data Science Bookcamp you will learn:
Techniques for computing and plotting probabilities
Statistical analysis using SciPy
How to organize datasets with clustering algorithms
How to visualize complex multi-variable datasets
How to train a decision tree machine learning algorithm
In Data Science Bookcamp you’ll test and build your knowledge of Python with the kind of open-ended problems that professional data scientists work on every day. Downloadable data sets and thoroughly-explained solutions help you lock in what you’ve learned, building your confidence and making you ready for an exciting new data science career.
About the Technology
A data science project has a lot of moving parts, and it takes practice and skill to get all the code, algorithms, datasets, formats, and visualizations working together harmoniously. This unique book guides you through five realistic projects, including tracking disease outbreaks from news headlines, analyzing social networks, and finding relevant patterns in ad click data.
About the Book
Data Science Bookcamp doesn’t stop with surface-level theory and toy examples. As you work through each project, you’ll learn how to troubleshoot common problems like missing data, messy data, and algorithms that don’t quite fit the model you’re building. You’ll appreciate the detailed setup instructions and the fully explained solutions that highlight common failure points. In the end, you’ll be confident in your skills because you can see the results.
What's Inside
Web scraping
Organize datasets with clustering algorithms
Visualize complex multi-variable datasets
Train a decision tree machine learning algorithm
About the Reader
For readers who know the basics of Python. No prior data science or machine learning skills required.
About the Author
Leonard Apeltsin is the Head of Data Science at Anomaly, where his team applies advanced analytics to uncover healthcare fraud, waste, and abuse.
Quotes
Valuable and accessible… a solid foundation for anyone aspiring to be a data scientist. - Amaresh Rajasekharan, IBM Corporation
Really good introduction of statistical data science concepts. A must-have for every beginner! - Simone Sguazza, University of Applied Sciences and Arts of Southern Switzerland
A full-fledged tutorial in data science including common Python libraries and language tricks! - Jean-François Morin, Laval University
This book is a complete package for understanding how the data science process works end to end. - Ayon Roy, Internshala
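One of the skills the book builds up to — training a decision tree — looks roughly like this in scikit-learn. This is a generic sketch on a bundled toy dataset, not code from the book’s projects:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a small bundled dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a shallow tree; max_depth keeps it interpretable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, tree.predict(X_test)))
```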
There are so many ways to use AI technology in retail to improve customer experience, optimise supply chains and reduce waste. Yet it seems to me that most innovations in the retail industry over the last 30 years have focused on automating labour-intensive tasks. In my personal opinion, the retail customer experience has not improved markedly in my lifetime, and in some cases, it has gotten worse. Anyone who’s ever interacted with a self-checkout machine will know what I mean. So, what is next for the retail industry, and what can technology and data science do to improve efficiency and customer experience across the many disparate parts of retailing? To answer these questions, I recently spoke to Shantha Mohan, a true expert in the field. Shantha is currently an Executive in Residence at the Integrated Innovation Institute at Carnegie Mellon University, where she co-delivers courses, contributes to curriculum design, and mentors students in their projects and practicums. Shantha is also a co-founder and long-time executive of Retail Solutions Inc (RSi), where she ran the company’s worldwide product development team that built the products and services which made the company a leader in retail analytics solutions used by consumer packaged goods companies and retailers across the globe. She holds a PhD in Operations Management and a Bachelor of Engineering in Electronics and Communication Engineering. In this episode of Leaders of Analytics, we discuss:
The applications of AI in retail with the most potential, for online and in-store shopping respectively
The differences between retail in developed and developing countries, and how AI must be customised for different markets across the globe
The typical consequences of items being out of stock, and how AI and other relevant technologies can help combat out-of-stock problems
Whether AI in retail will increase or diminish the ability of small retailers to compete, and much more.
In this episode of DataFramed, we speak with Andy Cotgreave, Technical Evangelist at Tableau about the role of data storytelling when driving change with analytics, and the importance of the analyst role within a data-driven organization.
Throughout the episode, Andy discusses his background, the skills every analyst should know to equip organizations with better data-driven decision making, his best practices for data storytelling, how he thinks about data literacy and ways to spread it within the organization, the importance of community when creating a data-driven organization, and more.
Relevant links from the interview:
We’d love your feedback! Let us know which topics you’d like us to cover and what you think of DataFramed by answering this 30-second survey
Check out our upcoming webinar with Andy
Check out Andy's book
Become a Tableau expert
Emilie Mazzacurati, Global Head of Moody's Climate Solutions, joins Mark, Ryan, and Cris to discuss the global economic impact of climate change, the potential effects of a carbon tax on the economy, and the climate risk policies in President Biden's Build Back Better plan. Full episode transcript here.
Questions or comments? Please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.
Send us a text
Want to be featured as a guest on Making Data Simple? Reach out to us at [[email protected]] and tell us why you should be next.
Abstract Hosted by Al Martin, VP, IBM Expert Services Delivery, Making Data Simple provides the latest thinking on big data, A.I., and the implications for the enterprise from a range of experts. This week on Making Data Simple, we have Tim Freestone. Tim is the founder of Alooba, a skills assessment platform for analytics, data science and data engineering. Alooba helps businesses identify the best candidates who apply for roles within their company.
Show Notes
4:46 – How do you go from economics teacher to head of business intelligence?
7:53 – Do CVs matter anymore?
13:22 – What business problem is Alooba solving?
16:05 – Do you have any data that supports your theory?
19:01 – Why analytics, data science, data engineering?
20:26 – What do you do that others don’t?
23:50 – How does Alooba define success?
25:42 – Who’s your target client base?
32:40 – Is there a customer you can talk about?
36:24 – What does Alooba mean?
Alooba
Connect with the Team
Producer Kate Brown - LinkedIn.
Producer Steve Templeton - LinkedIn.
Host Al Martin - LinkedIn and Twitter.
Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.
The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.
Mark, Ryan, and Cris dissect the October U.S. employment report and what it says about the state of the economy, wage growth, and inflation. We knew there were Hunger Games and Squid Games. Now add the Zandi Games. Full episode transcript here.
Questions or comments? Please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.
Summary The precursor to widespread adoption of cloud data warehouses was the creation of customer data platforms. Acting as a centralized repository of information about how your customers interact with your organization, they drove a wave of analytics about how to improve products based on actual usage data. A natural outgrowth of that capability is the more recent growth of reverse ETL systems that use those analytics to feed back into the operational systems used to engage with the customer. In this episode Tejas Manohar and Rachel Bradley-Haas share the story of their own careers and experiences coinciding with these trends. They also discuss the current state of the market for these technological patterns and how to take advantage of them in your own work.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform! In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence, reducing time to detection and resolution from weeks or days to just minutes. Go to dataengineeringpodcast.com/montecarlo and start trusting your data with Monte Carlo today!
Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
Your host is Tobias Macey and today I’m interviewing Rachel Bradley-Haas and Tejas Manohar about the combination of operational analytics and the customer data platform.
Interview
Introduction
How did you get involved in the area of data management?
Can we start by discussing what it means to have a "customer data platform"?
What are the challenges that organizations face in establishing a unified view of their customer interactions?
How do the presence of multiple product lines impact the ability to understand the relationship with the customer?
We have been building data warehouses and business intelligence systems for decades. How does the idea of a CDP differ from the approaches of those previous generations?
A recent outgrowth of the focus on creating a CDP is the introduction of "operational analytics", which was initially termed "reverse ETL". What are your opinions on the semantics and importance of these names?
What is the relationship between a CDP and operational analytics? (can you have one without the other?)
How have the capabilities
Julien has a unique history of building open frameworks that make data platforms interoperable. He's contributed in various ways to Apache Arrow, Apache Iceberg, Apache Parquet, and Marquez, and is currently leading OpenLineage, an open framework for data lineage collection and analysis. In this episode, Tristan & Julia dive into how open source projects grow to become standards, and why data lineage in particular is in need of an open standard. They also cover some of the compelling use cases for this data lineage metadata, and where you might be able to deploy it in your work. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.