talk-data.com

Topic: AWS Glue
Tags: etl, data_catalog, aws (6 tagged)

Activity Trend: 10 peak/qtr, 2020-Q1 to 2026-Q1

Activities: 6 activities · Newest first

This is part two of the framework; if you missed part one, head to episode 175 and start there so you're all caught up. 

In this episode of Experiencing Data, I continue my deep dive into the MIRRR UX Framework for designing trustworthy agentic AI applications. Building on Part 1’s “Monitor” and “Interrupt,” I unpack the three R’s (Redirect, Rerun, and Rollback) and share practical strategies for data product managers and leaders tasked with creating AI systems people will actually trust and use. I explain human-centered approaches to thinking about automation and how to handle unexpected outcomes in agentic AI applications without losing user confidence. I’m hoping this control framework will help you get more value out of your data while simultaneously creating value for the human stakeholders, users, and customers.
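The episode frames these as UX controls rather than code, but for readers who think in implementation terms, here is a minimal, hypothetical Python sketch of how an agent task might expose the five MIRRR controls. Every class, method, and the claims-triage example below are my own illustration under that assumption; they are not from the episode or any specific product.

```python
# Illustrative only: a toy controller showing how an agentic task might expose
# the MIRRR controls (Monitor, Interrupt, Redirect, Rerun, Rollback).
# All names here are hypothetical; the episode describes UX patterns, not an API.
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class AgentTask:
    name: str
    run: Callable[[dict], Any]            # the automated step itself
    history: list = field(default_factory=list)
    paused: bool = False

    def monitor(self) -> dict:
        """Expose current state so a human can watch what the agent is doing."""
        return {"task": self.name, "runs": len(self.history), "paused": self.paused}

    def interrupt(self) -> None:
        """Let a human pause this task before it proceeds any further."""
        self.paused = True

    def redirect(self, new_inputs: dict) -> Any:
        """Hand the task back with corrected inputs instead of letting it continue."""
        self.paused = False
        return self.execute(new_inputs)

    def rerun(self) -> Any:
        """Redo the last step after an unexpected result, using the same inputs."""
        last_inputs, _ = self.history[-1]
        return self.execute(last_inputs)

    def rollback(self) -> None:
        """Discard the most recent result so the earlier state is restored."""
        self.history.pop()

    def execute(self, inputs: dict) -> Any:
        if self.paused:
            raise RuntimeError(f"{self.name} is interrupted; resume or redirect first")
        result = self.run(inputs)
        self.history.append((inputs, result))
        return result


# Hypothetical claims-triage step that a human adjuster oversees.
triage = AgentTask(
    name="claim-triage",
    run=lambda claim: "auto-approve" if claim["amount"] < 1000 else "route-to-adjuster",
)
print(triage.execute({"amount": 250}))    # -> auto-approve
triage.rollback()                         # adjuster discards that decision
print(triage.redirect({"amount": 2500}))  # -> route-to-adjuster
```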

Highlights / Skip to:

Introducing the MIRRR UX Framework (1:08)
Designing for trust and user adoption, plus the perspectives you should be including when designing these systems (2:31)
How Monitor and Interrupt controls let humans pause anything from a single AI task to the entire agent (3:17)
Redirection, explained through use cases for claims adjusters working on insurance claims, so adjusters (users) can focus on the important decisions (4:35)
Rerun controls, which let humans redo an agentic task after unexpected results, preventing errors and building trust in early AI rollouts (11:12)
Rerun vs. Redirect: what the difference is in the context of AI, using additional use cases from the insurance claims processing domain (12:07)
Empathy and user experience in AI adoption, and how the most useful insights come from directly observing users, not from analytics (18:28)
Thinking about agentic AI as glue for existing applications and workflows, or as a worker (27:35)

Quotes from Today’s Episode

"The value of AI isn’t just about technical capability, it’s based in large part on whether the end-users will actually trust and adopt it. If we don’t design for trust from the start, even the most advanced AI can fail to deliver value."

"In agentic AI, knowing when to automate is just as important as knowing what to automate. Smart product and design decisions mean sometimes holding back on full automation until the people, processes, and culture are ready for it."

"Sometimes the most valuable thing you can do is slow down, create checkpoints, and give people a chance to course-correct before the work goes too far in the wrong direction."

"Reruns and rollbacks shouldn’t be seen as failures, they’re essential safety mechanisms that protect both the integrity of the work and the trust of the humans in the loop. They give people the confidence to keep using the system, even when mistakes happen."

"You can’t measure trust in an AI system by counting logins or tracking clicks. True adoption comes from understanding the people using it, listening to them, observing their workflows, and learning what really builds or breaks their confidence."

"You’ll never learn the real reasons behind a team’s choices by only looking at analytics, you have to actually talk to them and watch them work."

"Labels matter, what you call a button or an action can shape how people interpret and trust what will happen when they click it."

Links

Part 1: The MIRRR UX Framework for Designing Trustworthy Agentic AI Applications 

Michael Toland is a Product Management Consultant and blog contributor with Test Double, residing in Columbus, OH. His experience spans 8 formal years of internal Product Management, plus a few additional years of doing product management without even knowing what the field really was. In this episode, Michael shared how a data-empowered company the size of Verizon was able to drastically reduce time-to-market metrics, experiment, and run data product MVPs in production. The reference data became a cornerstone of Verizon's go-to-market strategy and a glue for different teams and departments. One of the key takeaways is that to deliver value with data products and architect them effectively, one does not need to be a data wizard but rather have a passion for solving problems. Michael is also the author of an infrequently updated product satire site, Dignified Product.

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

Bookmarklet Maker: Discover how to automate tasks with the Bookmarklet Maker, a tool for turning scripts into handy browser bookmarks.

RouteLLM Framework: Explore the RouteLLM framework by LMSys and Anyscale, designed to optimize the cost-performance ratio of LLM routers. Learn more about this collaboration at LMSys and Anyscale.

Q for SQL on CSV/TSV: Meet Q, a command-line tool that lets you run SQL queries directly on CSV or TSV files, simplifying data exploration from your terminal (see the sketch after this list).

DuckDB Community Extensions: Check out the latest updates in DuckDB's community extensions and see how this database system is evolving.

Apple Intelligence and AI Maximalism: Explore Apple's AI strategy, their avoidance of chat UIs, risk management with OpenAI, and the shift of compute costs to users.

Being Glue: Delve into the challenges of being "Glue" at work. Explore why women are more likely to take on non-promotable work and how this affects career progression and workplace dynamics.
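As a quick, hedged illustration of the SQL-on-CSV idea behind q and DuckDB, here is a minimal sketch using DuckDB's Python API. The file name and columns are invented for the example; q offers the same kind of query from the command line.

```python
# Minimal illustration of running SQL directly against a CSV file with DuckDB.
# 'orders.csv' and its columns are made up for this example.
import duckdb

# DuckDB can scan the CSV in place; no import or schema definition step needed.
result = duckdb.sql("""
    SELECT country, count(*) AS orders
    FROM read_csv_auto('orders.csv')
    GROUP BY country
    ORDER BY orders DESC
""")
print(result)
```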

Summary

All of the advancements in our technology are built on the principle of abstraction. These abstractions are valuable until they break down, which is inevitable. In this episode the host Tobias Macey shares his reflections on recent experiences where the abstractions leaked, along with some observations on how to deal with that situation in a data platform architecture.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack

Your host is Tobias Macey, and today I'm sharing some thoughts and observations about abstractions and impedance mismatches from my experience building a data lakehouse with an ELT workflow.

Interview

Introduction

Impact of community tech debt
Hive Metastore: new work being done, but not widely adopted

Tensions between automation and correctness
Data type mapping: integer types, complex types (a small sketch follows this outline)
Naming things (keys/column names from APIs to databases)

Disaggregated databases: pros and cons
Flexibility and cost control
Not as much tooling investment as in Snowflake/BigQuery/Redshift

Data modeling
Dimensional modeling vs. answering today's questions

What are the most interesting, unexpected, or challenging lessons that you have learned while working on your data platform?
When is ELT the wrong choice?
What do you have planned for the future of your data platform?
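To make the type-mapping tension above concrete, here is a small, hypothetical Python sketch of the guesswork an ELT loader has to do when turning API JSON into warehouse columns. The type choices and naming rules are illustrative assumptions, not the ones used on the platform discussed here.

```python
# Illustrative sketch of the data type mapping tension: an ELT loader has to
# guess warehouse column types from JSON payloads, and the automated choice is
# not always the correct one. All names and rules are hypothetical.
import json


def infer_column_type(value) -> str:
    if isinstance(value, bool):        # bool must be checked before int in Python
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"                # safe default; SMALLINT/INT would be tighter
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, (dict, list)):
        return "JSON"                  # complex types: keep raw, or flatten into columns?
    return "VARCHAR"


def to_column_name(api_key: str) -> str:
    # API keys like "orderId" or "shipping.address" rarely survive unchanged as
    # warehouse column names; this is where naming drift creeps in.
    return api_key.replace(".", "_").replace("-", "_").lower()


record = json.loads('{"orderId": 42, "total": 19.99, "items": [{"sku": "A1"}], "gift": false}')
schema = {to_column_name(k): infer_column_type(v) for k, v in record.items()}
print(schema)
# {'orderid': 'BIGINT', 'total': 'DOUBLE', 'items': 'JSON', 'gift': 'BOOLEAN'}
```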

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

dbt Airbyte

Podcast Episode

Dagster

Podcast Episode

Trino

Podcast Episode

ELT Data Lakehouse Snowflake BigQuery Redshift Technical Debt Hive Metastore AWS Glue

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: RudderStack

RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.

RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team.

RudderStack also supports real-time use cases. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again.

Visit dataengineeringpodcast.com/rudderstack to sign up for free today, and snag a free T-Shirt just for being a Data Engineering Podcast listener.

Support Data Engineering Podcast

Data science and machine learning are integral parts of most large-scale product manufacturing processes and are used to understand customer needs, detect quality issues, automate repetitive tasks and optimise supply chains. It’s an invisible glue that helps us produce more things for less, and in a timely fashion. To learn more about this fascinating topic, I recently spoke to Ranga Ramesh who is Senior Director, Quality Innovation and Transformation at Georgia-Pacific. Georgia-Pacific is one of the world’s largest manufacturers of consumer paper products and uses AI technologies throughout their manufacturing process. In this episode of Leaders of Analytics, we explore how computer vision and machine learning can be used to classify tissue paper softness and instantly detect quality issues that could otherwise render large volumes of product useless. Ranga’s work is featured as a case study in our recently published book, Demystifying AI for the Enterprise.

Summary

The theory behind how a tool is supposed to work and the realities of putting it into practice are often at odds with each other. Learning the pitfalls and best practices from someone who has gained that knowledge the hard way can save you from wasted time and frustration. In this episode James Meickle discusses his recent experience building a new installation of Airflow. He points out the strengths, design flaws, and areas of improvement for the framework. He also describes the design patterns and workflows that his team has built to allow them to use Airflow as the basis of their data science platform.
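For readers who have not used Airflow, here is a minimal DAG for orientation only; it assumes a recent Airflow 2.x install and is not the installation or the design patterns James describes, and the task bodies are just stubs.

```python
# A minimal, illustrative Airflow DAG: two Python tasks run in sequence once a day.
# Assumes Airflow 2.4+ (the `schedule` argument); not the deployment from the episode.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull data from the source system")


def load():
    print("write data to the warehouse")


with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run extract before load.
    extract_task >> load_task
```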

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Your host is Tobias Macey and today I’m interviewing James Meickle about his experiences building a new Airflow installation

Interview

Introduction

How did you get involved in the area of data management?
What was your initial project requirement?

What tooling did you consider in addition to Airflow?
What aspects of the Airflow platform led you to choose it as your implementation target?

Can you describe your current deployment architecture?

How many engineers are involved in writing tasks for your Airflow installation?

What resources were the most helpful while learning about Airflow design patterns?

How have you architected your DAGs for deployment and extensibility?

What kinds of tests and automation have you put in place to support the ongoing stability of your deployment?
What are some of the dead-ends or other pitfalls that you encountered during the course of this project?
What aspects of Airflow have you found to be lacking that you would like to see improved?
What did you wish someone had told you before you started work on your Airflow installation?

If you were to start over would you make the same choice?
If Airflow wasn’t available what would be your second choice?

What are your next steps for improvements and fixes?

Contact Info

@eronarn on Twitter Website eronarn on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Quantopian Harvard Brain Science Initiative DevOps Days Boston Google Maps API Cron ETL (Extract, Transform, Load) Azkaban Luigi AWS Glue Airflow Pachyderm

Podcast Interview

AirBnB Python YAML Ansible REST (Representational State Transfer) SAML (Security Assertion Markup Language) RBAC (Role-Based Access Control) Maxime Beauchemin

Medium Blog

Celery Dask

Podcast Interview

PostgreSQL

Podcast Interview

Redis Cloudformation Jupyter Notebook Qubole Astronomer

Podcast Interview

Gunicorn Kubernetes Airflow Improvement Proposals Python Enhancement Proposals (PEP)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast