talk-data.com

Topic: API (Application Programming Interface)

Tags: integration, software_development, data_exchange

Activity Trend

Peak of 65 activities per quarter, 2020-Q1 through 2026-Q1

Activities

856 activities · Newest first

Summary

This podcast started almost exactly six years ago, and the technology landscape was much different than it is now. In that time there have been a number of generational shifts in how data engineering is done. In this episode I reflect on some of the major themes and take a brief look forward at some of the upcoming changes.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Your host is Tobias Macey and today I'm reflecting on the major trends in data engineering over the past 6 years

Interview

Introduction
6 years of running the Data Engineering Podcast
Around the first time that data engineering was discussed as a role
Followed on from hype about "data science"
Hadoop era
Streaming
Lambda and Kappa architectures
Not really referenced anymore
"Big Data" era of capture everything has shifted to focusing on data that presents value
Regulatory environment increases risk, better tools introduce more capability to understand what data is useful
Data catalogs
Amundsen and Alation
Orchestration engine
Oozie, etc. -> Airflow and Luigi -> Dagster, Prefect, Flyte, etc.
Orchestration is now a part of most vertical tools
Cloud data warehouses
Data lakes
DataOps and MLOps
Data quality to data observability
Metadata for everything
Data catalog -> data discovery -> active metadata
Business intelligence
Read only reports to metric/semantic layers
Embedded analytics and data APIs
Rise of ELT
dbt
Corresponding introduction of reverse ETL

What are the most interesting, unexpected, or challenging lessons that you have learned while running the podcast?
What do you have planned for the future of the podcast?

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows.
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
The Machine Learning Podcast helps you go from idea to production with machine learning.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: Materialize

Looking for the simplest way to get the freshest data possible to your teams? Because let's face it: if real-time were easy, everyone would be using it. Look no further than Materialize, the streaming database you already know how to use.

Materialize’s PostgreSQL-compatible interface lets users leverage the tools they already use, with unsurpassed simplicity enabled by full ANSI SQL support. Delivered as a single platform with the separation of storage and compute, strict-serializability, active replication, horizontal scalability and workload isolation — Materialize is now the fastest way to build products with streaming data, drastically reducing the time, expertise, cost and maintenance traditionally associated with implementation of real-time features.
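Materialize's PostgreSQL compatibility means ordinary Postgres drivers should work against it. Below is a minimal sketch using psycopg2; the hostname, credentials, and the `orders` source are placeholders invented for illustration, not a documented setup:

```python
# Minimal sketch: talk to Materialize over the PostgreSQL wire protocol.
# Hostname, credentials, and the 'orders' source are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="my-instance.materialize.example",  # placeholder host
    port=6875,                               # Materialize's default port
    user="materialize",
    dbname="materialize",
)
conn.autocommit = True

with conn.cursor() as cur:
    # An incrementally maintained view over a streaming source: results
    # stay fresh without re-running the query in batch.
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS revenue_by_region AS
        SELECT region, SUM(amount) AS total
        FROM orders
        GROUP BY region
    """)
    cur.execute("SELECT region, total FROM revenue_by_region")
    for region, total in cur.fetchall():
        print(region, total)
```

The point of the sketch is the workflow rather than the schema: you express a standing SQL query once, and subsequent reads return continuously updated results.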

Sign up now for early access to Materialize and get started with the power of streaming data with the same simplicity and low implementation cost as batch cloud data warehouses.

Go to materialize.com
Support Data Engineering Podcast

IBM Software Systems Integration: With IBM MQ Series for JMS, IBM FileNet Case Manager, and IBM Business Automation Workflow

Examine the working details for real-world Java programs used for system integration with IBM software, applying various API libraries (as used by banking and insurance companies). This book includes the step-by-step procedure to use the IBM FileNet Case Manager 5.3.3 Case Builder solution and the similar IBM system, IBM Business Automation Workflow, to create an audit system. You'll learn how to implement the workflow with a client Java Message Service (JMS) Java method developed with Workflow Custom Operations System Step components. Using IBM Cognos Analytics Version 11.2, you'll be able to create new views for IBM Case Manager Analytics for custom time dimensions. The book also explains the SQL code and procedures required to create example Online Analytical Processing (OLAP) cubes with multi-level time dimensions for IBM Case Manager analytics. IBM Software Systems Integration features the most up-to-date systems software procedures using tested API calls.
What You Will Learn
Review techniques for generating custom IBM JMS code
Create a new custom view for a multi-level time dimension
See how a Java program can provide the IBM FileNet document management API calls for content store folder and document replication
Configure Java components for content engine events
Who This Book Is For
IT consultants, systems and solution architects.

In this episode, Conor and Bryce finish their conversation with Jane Losare-Lusby about the Rust Programming Language.
Link to Episode 108 on Website

Twitter
ADSP: The Podcast
Conor Hoekstra
Bryce Adelstein Lelbach
About the Guest: Jane Losare-Lusby is currently on both the Rust Library Team and the Rust Library API Team. She is also the Error Handling Project Group Lead, the Rust Foundation Project Director of Collaboration, and a Principal Rust Open Source Engineer at Futurewei Technologies.

Show Notes

Date Recorded: 2022-11-02
Date Released: 2022-12-16
https://cheats.rs/
ADSP Episode 106: Jane Losare-Lusby on Rust!
ADSP Episode 107: Jane Losare-Lusby on Rust! (Part 2)
Rust Evangelism Strike Force
Rust Evangelism Strikeforce
Rust Governance Teams
A List of Companies that Use Array Languages (J, K, APL, q)
A List of companies that use Haskell
Hoogle
Roogle
Kotlin Programming Language
Carbon Language: An experimental successor to C++ - Chandler Carruth - CppNorth 2022
Carbon Github
Awesome Rust Mentors
Clojure Bridge
Intro Song Info: Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons — Attribution 3.0 Unported — CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Oracle Autonomous Database in Enterprise Architecture

Explore the capabilities of Oracle Autonomous Database (ADB) to improve enterprise-level data management. Through this book, you will dive deep into deploying, managing, and securing ADBs using Oracle Cloud Infrastructure (OCI). Gain hands-on experience with high-availability setups, data migration methods, and advanced security measures to elevate your enterprise architecture.
What this Book will help me do
Understand the key considerations for planning, migrating, and maintaining Oracle Autonomous Databases.
Learn to implement high availability solutions using Autonomous Data Guard in ADB environments.
Master the configuration of backup, restore, and disaster recovery strategies within OCI.
Implement advanced security practices including encryption and IAM policy management.
Gain proficiency in leveraging ADB features like APEX, SQL Developer Web, and REST APIs for rapid application development.
Author(s)
The authors, Sharma, Krishnakumar KM, and Panda, are experts in database systems, particularly in Oracle technologies. With years of hands-on experience implementing enterprise solutions and training professionals, they have pooled their knowledge to craft a resource-rich guide filled with practical advice.
Who is it for?
This book is ideal for cloud architects, database administrators, and implementation consultants seeking to leverage Oracle's Autonomous Database for enhanced automation, security, and scalability. It is well-suited for professionals with foundational knowledge of Linux, OCI, and databases. Aspiring cloud engineers and students aiming to understand modern database management will also benefit greatly.

Data Visualization with Python and JavaScript, 2nd Edition

How do you turn raw, unprocessed, or malformed data into dynamic, interactive web visualizations? In this practical book, author Kyran Dale shows data scientists and analysts--as well as Python and JavaScript developers--how to create the ideal toolchain for the job. By providing engaging examples and stressing hard-earned best practices, this guide teaches you how to leverage the power of best-of-breed Python and JavaScript libraries. Python provides accessible, powerful, and mature libraries for scraping, cleaning, and processing data. And while JavaScript is the best language when it comes to programming web visualizations, its data processing abilities can't compare with Python's. Together, these two languages are a perfect complement for creating a modern web-visualization toolchain. This book gets you started. You'll learn how to:
Obtain data you need programmatically, using scraping tools or web APIs: Requests, Scrapy, Beautiful Soup
Clean and process data using Python's heavyweight data processing libraries within the NumPy ecosystem: Jupyter notebooks with pandas+Matplotlib+Seaborn
Deliver the data to a browser with static files or by using Flask, the lightweight Python server, and a RESTful API
Pick up enough web development skills (HTML, CSS, JS) to get your visualized data on the web
Use the data you've mined and refined to create web charts and visualizations with Plotly, D3, Leaflet, and other libraries
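As a small taste of the toolchain the book describes, the sketch below chains three of the libraries it covers: Requests to pull JSON from a web API (the URL is a placeholder), pandas to clean it, and Flask to serve the result to the browser as a RESTful endpoint ready for D3 or Plotly:

```python
# Fetch -> clean -> serve: a miniature version of the book's toolchain.
import pandas as pd
import requests
from flask import Flask, jsonify

app = Flask(__name__)

def fetch_clean_data() -> pd.DataFrame:
    # Placeholder endpoint; any API returning a JSON array of records works.
    resp = requests.get("https://example.com/api/records", timeout=10)
    resp.raise_for_status()
    # Load the raw JSON into a DataFrame and drop incomplete rows.
    return pd.DataFrame(resp.json()).dropna()

@app.route("/data")
def data():
    # Hand the cleaned rows to the front-end charting library as JSON.
    return jsonify(fetch_clean_data().to_dict(orient="records"))

if __name__ == "__main__":
    app.run(debug=True)
```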

In this episode, Conor and Bryce continue their conversation with Jane Losare-Lusby about the Rust Programming Language.
Link to Episode 107 on Website

Twitter
ADSP: The Podcast
Conor Hoekstra
Bryce Adelstein Lelbach
About the Guest: Jane Losare-Lusby is currently on both the Rust Library Team and the Rust Library API Team. She is also the Error Handling Project Group Lead, the Rust Foundation Project Director of Collaboration, and a Principal Rust Open Source Engineer at Futurewei Technologies.

Show Notes

Date Recorded: 2022-11-02
Date Released: 2022-12-09
https://cheats.rs/
ADSP Episode 106: Jane Losare-Lusby on Rust!
Rust std::slice::iter
Rust std::IntoIterator::into_iter
C++20 Concepts
Rust Traits
C++ Pattern Matching Proposal
C++ Pattern matching using is and as
O3DCON 2022: Keynote C++ Horizons - Bryce Adelstein Lelbach
www.crates.io
ADSP Episode 92: Special Guest Kate Gregory!
C++Club Episode 155: WG21 October mailing, Carbon, Cpp2, Safety
Intro Song Info: Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons — Attribution 3.0 Unported — CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

In this episode, Conor and Bryce talk to Jane Losare-Lusby about the Rust Programming Language.
Link to Episode 106 on Website

Twitter
ADSP: The Podcast
Conor Hoekstra
Bryce Adelstein Lelbach
About the Guest: Jane Losare-Lusby is currently on both the Rust Library Team and the Rust Library API Team. She is also the Error Handling Project Group Lead, the Rust Foundation Project Director of Collaboration, and a Principal Rust Open Source Engineer at Futurewei Technologies.

Show Notes

Date Recorded: 2022-11-02
Date Released: 2022-12-02
https://cheats.rs/
Rustacean Station: Error Handling in Rust with Jane Losare-Lusby
Are We Podcast Yet with Jane Losare-Lusby
ADSP poll about becoming a Rust podcast
Conor’s Tweet about /cpp vs /rust
ADSP Episode 101: C++ Developers Try Rust!
C++23 std::views::zip
Rust std::iter::Iterator::zip
Rust Clippy
Rust Traits
C++20 Concepts
Esteban Küber on Twitter
Rust unsafe
Rust miri
This Week in Rust
Rust Analyzer
Rust std::iter::Iterator::flat_map
Rust std::iter::Iterator::enumerate
Intro Song Info: Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons — Attribution 3.0 Unported — CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Today I’m discussing something we’ve been talking about a lot on the podcast recently - the definition of a “data product.” While my definition is still a work in progress, I think it’s worth putting out into the world at this point to get more feedback. In addition to sharing my definition of data products (as defined the “producty” way), on today’s episode I also discuss some of the non-technical skills that data product managers (DPMs) in the ML and AI space need if they want to achieve good user adoption of their solutions. I’ll also share my thoughts on whether data scientists can make good data product managers, what a DPM can do to better understand your users and stakeholders, and how product and UX design factors into this role.

Highlights/ Skip to:

I introduce my reasons for sharing my definition of a data product (0:46)
My definition of data product (7:26)
Thinking the “producty” way (8:14)
My thoughts on necessary skills for data PMs (in particular, AI & machine learning product management) (12:21)
How data scientists can become good data product managers (DPMs) by taking off the data science hat (13:42)
Understanding the role of UX design within the context of DPM (16:37)
Crafting your sales and marketing strategies to emphasize the value of your product to the people who can use or purchase it (23:07)
How to build a team that will help you increase adoption of your data product (30:01)
How to build relationships with stakeholders/customers that allow you to find the right solutions for them (33:47)
Letting go of a technical identity to develop a new identity as a DPM who can lead a team to build a product that actually gets used (36:32)

Quotes from Today’s Episode
“This is what’s missing in some of the other definitions that I see around data products [...] they’re not talking about it from the customer of the data product lens. And that orientation sums up all of the work that I’m doing and trying to get you to do as well, which is to put the people at the center of the work that you’re doing and not the data science, engineering, tech, or design. I want you to put the people at the center.” (6:12)
“A data product is a data-driven, end-to-end, human-in-the-loop decision support solution that’s so valuable, users would potentially pay to use it.” (7:26)
“I want to plunge all the way in and say, ‘if you want to do this kind of work, then you need to be thinking the product-y way.’ And this means inherently letting go of some of the data science-y way of thinking and the data-first kinds of ways of thinking.” (11:46)
“I’ve read in a few places that data scientists don’t make for good data product managers. [While it may be true that they’re more introverted,] I don’t think that necessarily means that there’s an inherent problem with data scientists becoming good data product managers. I think the main challenge will be—and this is the same thing for almost any career transitioning into product management—is knowing when to let go of your former identity and wear the right hat at the right time.” (14:24)
“Make better things for people that will improve their life and their outcomes and the business value will follow if you’ve properly aligned those two things together.” (17:21)
“The big message here is this: there is always a design and experience, whether it is an API, or a platform, a dashboard, a full application, etc. Since there are no null design choices, how much are you going to intentionally shape that UX, or just pray that it comes out good on the other end? Prayer is not really a reliable strategy. If you want to routinely do this work right, you need to put intention behind it.” (22:33)
“Relationship building is a must, and this is where applying user experience research can be very useful—not just for users, but also with stakeholders. It’s learning how to ask really good questions and learning the feelings, emotions, and reasons why people ask your team to build the thing that they’ve asked for. Learning how to dig into that is really important.” (26:26)

Links
Designing for Analytics
Community
Work With Me
Email
Record a question

R 4 Data Science Quick Reference: A Pocket Guide to APIs, Libraries, and Packages

In this handy, quick reference book you'll be introduced to several R data science packages, with examples of how to use each of them. All concepts will be covered concisely, with many illustrative examples using the following APIs: readr, tibble, forcats, lubridate, stringr, tidyr, magrittr, dplyr, purrr, ggplot2, modelr, and more. With R 4 Data Science Quick Reference, you'll have the code, APIs, and insights to write data science-based applications in the R programming language. You'll also be able to carry out data analysis. All source code used in the book is freely available on GitHub.
What You'll Learn
Implement applicable R 4 programming language specification features
Import data with readr
Work with categories using forcats, time and dates with lubridate, and strings with stringr
Format data using tidyr and then transform that data using magrittr and dplyr
Write functions with R for data science, data mining, and analytics-based applications
Visualize data with ggplot2 and fit data to models using modelr
Who This Book Is For
Programmers new to R's data science, data mining, and analytics packages. Some prior coding experience with R in general is recommended.

Data as a Product: How dbt powers Slido’s Data API

What do you do when you find out that your team is being tasked with building a single platform that should be able to serve everyone's data needs, no matter whether they are internal (from within your company) or external (your customers)? What's more it's expected to be fast, stable, granular, sophisticated, simple, scalable, usable, easy to maintain, compatible… the list goes on.

Well, time to find a new-school solution. We'll walk you through our story of how and why we built Slido's Data API using everyone's favourite Analytics Engineering tool, dbt.

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Standardizing the unstandardized: dbt modeling for Web 3.0

Web 3.0 is constantly evolving: every day there are new smart contracts, project updates, tokens and chains. And just because the data is on the blockchain and public does not mean it's easily accessible and digestible. It may be easy to monitor specific parts of the blockchain such as a particular smart contract, but how do you build a scalable infrastructure flexible enough to account for any new business request in a rapidly evolving industry?

Join Alec Kamra (Mythical Games) as he shows you how to do just this. As a blockchain gaming platform, Mythical Games has built a stable and flexible multi-chain solution using Google BQ's public datasets and a few external APIs that allow them to monitor all trades and transfers for their business needs.
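For a flavor of the BigQuery side of such a stack, here is a minimal sketch using the official google-cloud-bigquery client. The crypto_ethereum public dataset and its column names are assumptions based on Google's public blockchain datasets, not necessarily the tables Mythical Games queries:

```python
# Sketch: rank yesterday's busiest token contracts from a public dataset.
# Assumes application-default credentials and the google-cloud-bigquery
# package; dataset and column names are assumptions, adjust per chain.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT token_address, COUNT(*) AS transfers
    FROM `bigquery-public-data.crypto_ethereum.token_transfers`
    WHERE DATE(block_timestamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    GROUP BY token_address
    ORDER BY transfers DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.token_address, row.transfers)
```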

Check the slides here: https://docs.google.com/presentation/d/14wcMHKAGhm9qvZ2NlPSv1wQj5FJF-ttL4k3lAivt6Fk/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Automating CI/CD in dbt Cloud: Sunrun's story

Does a two-step deployment workflow for developing, testing, and deploying code to dbt Cloud sound possible? Sunrun thinks so. Join James Sorensen and Jared Stout to learn how they used GitHub Actions and API integrations with dbt Cloud and Jira to entirely automate the CI/CD workflow, saving the team time and worry when moving through SOX certification.
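For context, dbt Cloud exposes an administrative API whose v2 "trigger job run" endpoint is the kind of call a GitHub Actions step can make after a merge. A minimal sketch, with the account ID, job ID, and token as placeholders:

```python
# Sketch: trigger a dbt Cloud job run from CI via the v2 admin API.
# Account ID, job ID, and the token are placeholders for your own values.
import os
import requests

ACCOUNT_ID = "12345"  # placeholder dbt Cloud account
JOB_ID = "67890"      # placeholder job to run
TOKEN = os.environ["DBT_CLOUD_API_TOKEN"]

resp = requests.post(
    f"https://cloud.getdbt.com/api/v2/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/",
    headers={"Authorization": f"Token {TOKEN}"},
    json={"cause": "CI: merge to main"},  # the API records why the run started
    timeout=30,
)
resp.raise_for_status()
print("Triggered run:", resp.json()["data"]["id"])
```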

Check the slides here: https://docs.google.com/presentation/d/1ZecU0-TN8SxNFpdKdkVksuDjpUy6XiaulqBdfqhLb68/edit#slide=id.g15507761f0b_0_10

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Intelligent Document Processing with AWS AI/ML

Dive into the world of Intelligent Document Processing (IDP) with the power of AWS AI/ML. This book guides you from understanding the challenges of document processing to building effective IDP pipelines using advanced AWS APIs and Python. Through hands-on projects and real-world applications, this book will equip you with the skills needed to automate and unlock value from your document workflows.
What this Book will help me do
Understand the stages and challenges of the Intelligent Document Processing pipeline.
Learn how to automate document processing workflows using AWS AI services.
Acquire practical insights into Python libraries for document processing.
Discover industry applications including healthcare and financial sectors.
Develop the skill to solve real-world IDP problems with AI/ML.
Author(s)
Sonali Sahu is a seasoned AI/ML consultant and author with a focus on innovative technologies for industry problems. With extensive hands-on project experience and deep expertise in AWS AI/ML tools, she bridges the gap between theory and application. Her writing is approachable and practical, aimed to empower technical practitioners to excel.
Who is it for?
This book is aimed at developers, data scientists, and technical professionals wanting to leverage AWS AI/ML for document processing. Aimed at intermediate-level professionals, the content helps those with a working knowledge of Python or AI tools to enhance their skills. Whether you're in healthcare, finance, or a similar field, this book equips you to address document-centric problems using cutting-edge solutions.
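As a taste of the first stage of such a pipeline, here is a minimal sketch using boto3 and Amazon Textract's synchronous text-detection call; the file name and region are placeholders, and AWS credentials are assumed to be configured:

```python
# Sketch: extract printed text lines from a document image with Textract.
import boto3

textract = boto3.client("textract", region_name="us-east-1")  # placeholder region

with open("invoice.png", "rb") as f:  # placeholder document
    response = textract.detect_document_text(Document={"Bytes": f.read()})

# Textract returns a list of blocks; LINE blocks carry the detected text
# along with a confidence score for downstream validation rules.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(f"{block['Confidence']:.1f}%  {block['Text']}")
```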

Data orchestration uses caching, APIs, and centralized metadata to help compute engines access data in hybrid or multi-cloud environments. Data platform engineers can use data orchestration to gain simple, flexible, and high-speed access to distributed data for modern analytics and AI projects. Published at: https://www.eckerson.com/articles/data-orchestration-simplifying-data-access-for-analytics

Summary

Data observability is a product category that has seen massive growth and adoption in recent years. Monte Carlo is in the vanguard of companies who have been enabling data teams to observe and understand their complex data systems. In this episode founders Barr Moses and Lior Gavish rejoin the show to reflect on the evolution and adoption of data observability technologies and the capabilities that are being introduced as the broader ecosystem adopts the practices.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.
Your host is Tobias Macey and today I’m interviewing Barr Moses and Lior Gavish about the state of the market for data observability and their own work at Monte Carlo

Interview

Introduction
How did you get involved in the area of data management?
Can you give the elevator pitch for Monte Carlo?

What are the notable changes in the Monte Carlo product and business since our last conversation in October 2020?

You were one of the early entrants in the market of data quality/data observability products. In your work to gain visibility and traction you invested substantially in content creation (blog posts, presentations, round table conversations, etc.). How would you summarize the focus of your initial efforts?
Why do you think data observability has really taken off? A few years ago, the category barely existed – what’s changed?
There’s a larger debate within

Summary

The dream of every engineer is to automate all of their tasks. For data engineers, this is a monumental undertaking. Orchestration engines are one step in that direction, but they are not a complete solution. In this episode Sean Knapp shares his views on what constitutes proper automation and the work that he and his team at Ascend are doing to help make it a reality.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.
Your host is Tobias Macey and today I’m interviewing Sean Knapp about the role of data automation in building maintainable systems

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what you mean by the term "data automation" and the assumptions that it includes?
One of the perennial challenges of automation is that there are always steps that are resistant to being performed without human involvement. What are some of the tasks that you have found to be common problems in that sense?
What are the different concerns that need to be included in a stack that supports fully automated data workflows?
There was recently an interesting article suggesting that the "left-to-right" approach to data workflows is backwards. In your experience, what would be required to allow for triggering data processes based on the needs of the data consumers? (e.g. "make sure that this BI dashboard is up to date every 6 hours")
What are the
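One way to picture the consumer-driven triggering raised in that last question: the consumer declares how stale it may be, and a runner works backwards through its upstream datasets only when that freshness requirement is violated. The sketch below is purely illustrative; the dataset names, SLA table, and refresh stub are invented for this example and are not Ascend's implementation:

```python
# Toy illustration of pull-based, freshness-driven orchestration.
from datetime import datetime, timedelta

FRESHNESS_SLA = {"bi_dashboard": timedelta(hours=6)}           # consumer's declared need
UPSTREAM = {"bi_dashboard": ["orders_cleaned", "revenue_agg"]}  # dependency chain
last_refreshed = {"bi_dashboard": datetime(2022, 1, 1)}         # stub state store

def refresh(dataset: str) -> None:
    print(f"refreshing {dataset}")  # stand-in for a real pipeline run

def ensure_fresh(consumer: str) -> None:
    # Only do work when the consumer's freshness SLA is actually violated.
    if datetime.utcnow() - last_refreshed[consumer] > FRESHNESS_SLA[consumer]:
        for dataset in UPSTREAM[consumer]:  # refresh dependencies first
            refresh(dataset)
        refresh(consumer)
        last_refreshed[consumer] = datetime.utcnow()

ensure_fresh("bi_dashboard")
```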

Summary

The position of Chief Data Officer (CDO) is relatively new in the business world and has not been universally adopted. As a result, not everyone understands what the responsibilities of the role are, when you need one, and how to hire for it. In this episode Tracy Daniels, CDO of Truist, shares her journey into the position, her responsibilities, and her relationship to the data professionals in her organization.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.
Your host is Tobias Macey and today I’m interviewing Tracy Daniels about the role and responsibilities of the Chief Data Officer and how it is evolving along with the ecosystem

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what your path to CDO of Truist has been?

As a CDO, what are your responsibilities and scope of influence?

Not every organization has an explicit position for the CDO. What are the factors that determine when that should be a distinct role?

What is the relationship and potential overlap with a CTO?

As the CDO of Truist, what are some of the projects/activities that are vying for your time and attention?
Can you share the composition of your teams and how you think about organizational structure and integration for data professionals in your company?
What are the industry and business trends that are having the greatest impact on your work as a

In this episode, we’re talking to Brook Lovatt, Chief Executive Officer at Cloudentity. Cloudentity is a company that provides application and security teams with a better way to automate and control how information is shared over APIs.
We talk about the problems Cloudentity solves and how it came to be, along with the options available to today’s SaaS companies when it comes to building a security authorization layer. Brook shares some of the positive impacts of facilitating data sharing.
We discuss the differences between data and API, how SaaS has changed over time, the shift towards more product-oriented CEOs (and the advantages of this as a company scales), and the trend of selling software directly to developers.
Finally, we look at the growing importance of being a product specialist, and what the future holds for SaaS and developers.
This episode is brought to you by Qrvey. The tools you need to take action with your data, on a platform built for maximum scalability, security, and cost efficiencies. If you’re ready to reduce complexity and dramatically lower costs, contact us today at qrvey.com. Qrvey, the modern no-code analytics solution for SaaS companies on AWS.
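Authorization layers like the one discussed here typically build on standard flows such as OAuth 2.0 client credentials, where a service exchanges its client ID and secret for a bearer token before calling a protected API. The sketch below is generic; every URL, credential, and scope is a placeholder rather than Cloudentity's actual API:

```python
# Generic OAuth 2.0 client-credentials flow (RFC 6749); all values are placeholders.
import requests

# Step 1: exchange client credentials for a short-lived access token.
token_resp = requests.post(
    "https://auth.example.com/oauth2/token",
    data={"grant_type": "client_credentials", "scope": "read:accounts"},
    auth=("my-client-id", "my-client-secret"),
    timeout=10,
)
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

# Step 2: present the bearer token when calling the protected API.
api_resp = requests.get(
    "https://api.example.com/v1/accounts",
    headers={"Authorization": f"Bearer {access_token}"},
    timeout=10,
)
print(api_resp.status_code, api_resp.json())
```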

Summary

Data mesh is a frequent topic of conversation in the data community, with many debates about how and when to employ this architectural pattern. The team at Agile Lab has first-hand experience helping large enterprise organizations evaluate and implement their own data mesh strategies. In this episode Paolo Platter shares the lessons they have learned in that process, the Data Mesh Boost platform that they have built to reduce some of the boilerplate required to make it successful, and some of the considerations to make when deciding if a data mesh is the right choice for you.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in gluing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.
The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.
Your host is Tobias Macey and today I’m interviewing Paolo Platter about Agile Lab’s lessons learned through helping large enterprises establish their own data mesh

Interview

Introduction
How did you get involved in the area of data management?
Can you share your experiences working with data mesh implementations?
What were the stated goals of project engagements that led to data mesh implementations?
What are some examples of projects where you explored data mesh as an option and decided that it was a poor fit?
What are some of the technical and process investments that are necessary to support a mesh str

Summary

Data lineage is the roadmap for your data platform, providing visibility into all of the dependencies for any report, machine learning model, or data warehouse table that you are working with. Because of its centrality to your data systems it is valuable for debugging, governance, understanding context, and myriad other purposes. This means that it is important to have an accurate and complete lineage graph so that you don’t have to perform your own detective work when time is in short supply. In this episode Ernie Ostic shares the approach that he and his team at Manta are taking to build a complete view of data lineage across the various data systems in your organization and the useful applications of that information in the work of every data stakeholder.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.
Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in gluing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.
Your host is Tobias Macey and today I’m interviewing Ernie Ostic about Manta, an automated data lineage service for managing visibility and quality of your data workflows

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Manta is and the story behind it?
What are the core problems that Manta aims to solve?
Data lineage and metadata systems are a hot topic right now. What i