talk-data.com

Topic: AI/ML (Artificial Intelligence/Machine Learning)

Related tags: data_science, algorithms, predictive_analytics

9014 activities tagged

Activity Trend: 1532 peak activities per quarter (2020-Q1 to 2026-Q1)

Activities

9014 activities · Newest first

talk
by Tito Osadebey (Keele University; Synectics Solutions; Unify)

Fairness and inclusivity are critical challenges as AI systems influence decisions in healthcare, finance, and everyday life. Yet, most fairness frameworks are developed in limited contexts, often overlooking the data diversity needed for global reliability.

In this talk, Tito Osadebey shares lessons from his research on bias in computer vision models to highlight where fairness efforts often fall short and how data professionals can address these gaps. He’ll outline practical principles for building and evaluating inclusive AI systems, discuss pitfalls that lead to hidden biases, and explore what “fairness” really means in practice.

Tito Osadebey is an AI researcher and data scientist whose work focuses on fairness, inclusivity, and ethical representation in AI systems. He recently published a paper on bias in computer vision models using Nigerian food images, which examines how underrepresentation of the Global South affects model performance and trust.

Tito has contributed to research and industry projects spanning computer vision, NLP, GenAI and data science with organisations including Keele University, Synectics Solutions, and Unify. His work has been featured on BBC Radio, and he led a team from Keele University which secured 3rd place globally at the 2025 IEEE MetroXraine Forensic Handwritten Document Analysis Challenge.

He is passionate about making AI systems more inclusive, context-aware, and equitable, bridging the gap between technical innovation and human understanding.

In this session, we’ll show how to structure and deliver the right context to large language models (LLMs) so they can actually reason through tasks - not just retrieve answers. We'll cover practical ways to provide context across prompts and tools, using a Model Context Repository to make your AI apps much smarter.
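To make that idea concrete, here is a minimal sketch, assuming a context repository is essentially a keyed store of reusable snippets that get assembled into a prompt. The class and method names below are hypothetical illustrations, not the speakers' actual implementation.

```python
# Illustrative sketch only: a toy "context repository" that stores reusable
# context snippets and assembles the relevant ones into a prompt. These names
# are hypothetical, not the session's actual code.
from dataclasses import dataclass, field


@dataclass
class ContextRepository:
    snippets: dict[str, str] = field(default_factory=dict)

    def add(self, key: str, text: str) -> None:
        """Register a reusable piece of context (schema, policy, examples, ...)."""
        self.snippets[key] = text

    def build_prompt(self, task: str, keys: list[str]) -> str:
        """Assemble only the context relevant to this task into one prompt."""
        context = "\n\n".join(self.snippets[k] for k in keys if k in self.snippets)
        return f"Use the context below to reason through the task.\n\n{context}\n\nTask: {task}"


repo = ContextRepository()
repo.add("schema", "orders(order_id, customer_id, total, created_at)")
repo.add("policy", "Never expose raw customer identifiers in answers.")
prompt = repo.build_prompt("Which month had the highest order total?", ["schema", "policy"])
print(prompt)  # this string would then be sent to whichever LLM you use
```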

In a context where artificial intelligence is shaping strategic stakes, developing sovereign AI is essential to guarantee autonomy, security, and technological control. At Thales, this means translating the theoretical principles of digital sovereignty into practical, robust solutions that comply with the requirements of the defence sector, enabling the operability and exportability of our solutions.

Brought to You By:

• Statsig — The unified platform for flags, analytics, experiments, and more. Statsig enables two cultures at once: continuous shipping and experimentation. Companies like Notion went from single-digit experiments per quarter to over 300 experiments with Statsig. Start using Statsig with a generous free tier, and a $50K startup program.

• Linear — The system for modern product development. When most companies hit real scale, they start to slow down, and are faced with “process debt.” This often hits software engineers the most. Companies switch to Linear to hit a hard reset on this process debt – ones like Scale cut their bug resolution in half after the switch. Check out Linear’s migration guide for details.

What’s it like to work as a software engineer inside one of the world’s biggest streaming companies? In this special episode recorded at Netflix’s headquarters in Los Gatos, I sit down with Elizabeth Stone, Netflix’s Chief Technology Officer. Before becoming CTO, Elizabeth led data and insights at Netflix and was VP of Science at Lyft. She brings a rare mix of technical depth, product thinking, and people leadership.

We discuss what it means to be “unusually responsible” at Netflix, how engineers make decisions without layers of approval, and how the company balances autonomy with guardrails for high-stakes projects like Netflix Live. Elizabeth shares how teams self-reflect and learn from outages and failures, why Netflix doesn’t do formal performance reviews, and what new grads bring to a company known for hiring experienced engineers. This episode offers a rare inside look at how Netflix engineers build, learn, and lead at a global scale.

Timestamps:
(00:00) Intro
(01:44) The scale of Netflix
(03:31) Production software stack
(05:20) Engineering challenges in production
(06:38) How the Open Connect delivery network works
(08:30) From pitch to play
(11:31) How Netflix enables engineers to make decisions
(13:26) Building Netflix Live for global sports
(16:25) Learnings from Paul vs. Tyson for NFL Live
(17:47) Inside the control room
(20:35) What being unusually responsible looks like
(24:15) Balancing team autonomy with guardrails for Live
(30:55) The high talent bar and introduction of levels at Netflix
(36:01) The Keeper Test
(41:27) Why engineers leave or stay
(44:27) How AI tools are used at Netflix
(47:54) AI’s highest-impact use cases
(50:20) What new grads add and why senior talent still matters
(53:25) Open source at Netflix
(57:07) Elizabeth’s parting advice for new engineers to succeed at Netflix

The Pragmatic Engineer deepdives relevant for this episode:
• The end of the senior-only level at Netflix
• Netflix revamps its compensation philosophy
• Live streaming at world-record scale with Ashutosh Agrawal
• Shipping to production
• What is good software architecture?

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

The future of education is being reshaped by AI-powered personalization. Traditional online learning platforms offer static content that doesn't adapt to individual needs, but new technologies are creating truly interactive experiences that respond to each learner's context, pace, and goals. How can personalized AI tutoring bridge the gap between mass education and the gold standard of one-on-one human tutoring? What if every professional could have a private tutor that understands their industry, role, and specific challenges? As organizations invest in upskilling their workforce, the question becomes: how can we leverage AI to make learning more engaging, effective, and accessible for everyone?

As the Co-Founder & CEO of DataCamp, Jonathan Cornelissen has helped grow DataCamp to upskill over 10M+ learners and 2800+ teams and enterprise clients. He is interested in everything related to data science, education, and entrepreneurship. He holds a Ph.D. in financial econometrics and was the original author of an R package for quantitative finance.

Yusuf Saber is a technology leader and entrepreneur with extensive experience building and scaling data-driven organizations across the Middle East. He is the Founder of Optima and a Venture Partner at COTU Ventures, with previous leadership roles at talabat, including VP of Data and Senior Director of Data Science and Engineering. Earlier in his career, he co-founded BulkWhiz and Trustious, and led data science initiatives at Careem. Yusuf holds research experience from ETH Zurich and began his career as an engineering intern at Mentor Graphics.

In the episode, Richie, Jo and Yusuf explore the innovative AI-driven learning platform Optima, its unique approach to personalized education, the potential for AI to enhance learning experiences, the future of AI in education, the challenges and opportunities in creating dynamic, context-aware learning environments, and much more.

Links Mentioned in the Show:
• Read more about the announcement
• Try the AI-Native Courses: Intro to SQL, Intro to AI for Work

New to DataCamp?
• Learn on the go using the DataCamp mobile app
• Empower your business with world-class data and AI skills with DataCamp for business

Data’s everywhere, but so often it feels… stuck. Joining us today is Jure Leskovec, Chief Scientist at Kumo and a Stanford Professor who's fundamentally reshaped how we understand networks—from Pinterest's recommendations to tracking the spread of disease. We’ll unpack why structured data is lagging behind the AI revolution, exploring how techniques like Graph Neural Networks are finally unlocking its potential, and how this all plays out in real-world applications.

00:57 Meet Jure Leskovec
02:31 Knowing When to Move On
04:01 Academia versus Industry
07:30 Learnings from Pinterest
10:28 The Kumo Pitch
17:57 The Secret Sauce
25:51 Monetization
27:12 Only the Enterprise?
29:49 The Sandbox to Try Before Buy
31:42 The Best Use Cases
35:00 Summarizing
37:38 Predicting AI
40:15 What's True and No One Agrees
41:19 Learning

LinkedIn: https://www.linkedin.com/in/leskovec/ Website: https://kumo.ai/ Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

AI Systems Performance Engineering

Elevate your AI system performance capabilities with this definitive guide to maximizing efficiency across every layer of your AI infrastructure. In today's era of ever-growing generative models, AI Systems Performance Engineering provides engineers, researchers, and developers with a hands-on set of actionable optimization strategies. Learn to co-optimize hardware, software, and algorithms to build resilient, scalable, and cost-effective AI systems that excel in both training and inference. Authored by Chris Fregly, a performance-focused engineering and product leader, this resource transforms complex AI systems into streamlined, high-impact AI solutions.

Inside, you'll discover step-by-step methodologies for fine-tuning GPU CUDA kernels, PyTorch-based algorithms, and multinode training and inference systems. You'll also master the art of scaling GPU clusters for high performance, distributed model training jobs, and inference servers. The book ends with a 175+-item checklist of proven, ready-to-use optimizations.

• Codesign and optimize hardware, software, and algorithms to achieve maximum throughput and cost savings
• Implement cutting-edge inference strategies that reduce latency and boost throughput in real-world settings
• Utilize industry-leading scalability tools and frameworks
• Profile, diagnose, and eliminate performance bottlenecks across complex AI pipelines
• Integrate full stack optimization techniques for robust, reliable AI system performance
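The profiling theme is easy to try at home. Below is a minimal sketch using PyTorch's built-in profiler (not code from the book) to surface which operators dominate a forward pass; the toy model and sort key are illustrative choices.

```python
# Minimal profiling sketch with torch.profiler -- not taken from the book,
# just an illustration of the kind of bottleneck hunting it covers.
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)
x = torch.randn(64, 1024)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():  # add GPU timings when a CUDA device is present
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    with torch.no_grad():
        model(x)

# Sort the hotspots to see which ops dominate the forward pass.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```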

API testing covers so many things - functionality, data, performance, security. We'd like to know as much as we can about our APIs, but we've got so little time. Can AI help? You bet. It can help with planning, test-case suggestions, preparation, documenting the tests, and integration with our favorite tools. Things that took hours now take seconds. API testing is changing. We want to take advantage of AI's power and make sure that our testing is not only productive, but effective. I'll show you how.
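As a rough illustration of the "case suggestions" idea, here is a hedged sketch that asks an LLM to propose test cases for an endpoint via the OpenAI Python client; the endpoint spec and model name are placeholders, not material from the talk.

```python
# Hedged sketch: asking an LLM to propose API test cases, which a human (or a
# follow-up step) then turns into executable tests. The endpoint description
# and model name are placeholders, not taken from the talk.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

endpoint_spec = """
POST /v1/orders
Body: {"customer_id": int, "items": [{"sku": str, "qty": int}]}
Returns 201 with {"order_id": int} or 400 on validation errors.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are an API test designer."},
        {"role": "user", "content": (
            "Suggest functional, boundary, and negative test cases for this "
            "endpoint, as a bullet list:\n" + endpoint_spec
        )},
    ],
)
print(response.choices[0].message.content)
```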

In this conversation, Dr. Cecilia Dones and I discuss the social skills we're losing as AI becomes more integrated into our lives. We explore the erosion of social norms, from AI companions joining Zoom calls without consent, to endless enshittified content, to my son's generation calling AI girlfriends "clankers". Is there hope? We break down the "rage currency" that dominates media and the positive AI stories that go unheard. The biggest takeaway: as the world becomes more synthetic, "showing up" in person will become the ultimate "premium value."

🏆 Follow this roadmap w/ The Data Analytics Accelerator (My Bootcamp): https://datacareerjumpstart.com/daa ⌚ TIMESTAMPS 00:19 - Step 1: Skills 02:33 - Step 2: Data Roles 06:38 - Step 3: Projects 10:22 - Step 4: Portfolio 13:20 - Step 5: Resume & LinkedIn 17:59 - Step 6: Job Hunting 21:12 - Step 7: Interviews 22:53 - The SPN Method 💌 Join 30k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter 🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training 👩‍💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa 👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com//interviewsimulator  🔗 CONNECT WITH AVERY 🎥 YouTube Channel 🤝 LinkedIn 📸 Instagram 🎵 TikTok 💻 Website Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://DataCareerJumpstart.com/daa https://www.datacareerjumpstart.com/daa

I missed my parents, so I built an AI that talks like them. This isn’t about replacing people—it’s about remembering the voices that make us feel safe. In this 90-minute episode of Data & AI with Mukundan, we explore what happens when technology stops chasing efficiency and starts chasing empathy. Mukundan shares the story behind “What Would Mom & Dad Say?”, a Streamlit + GPT-4 experiment that generates comforting messages in the voice of loved ones.

You’ll hear:
• The emotional spark that inspired the project
• The plain-English prompts anyone can use to teach AI empathy
• Boundaries & ethics of emotional AI
• How this project reframed loneliness, creativity, and connection

Takeaway: AI can’t love you—but it can remind you of the people who do.

🔗 Try the free reflection prompts below:

THE ONE-PROMPT VERSION: “What Would Mom & Dad Say?”
“You are speaking to me as one of my parents. Choose the tone I mention: either Mom (warm and reflective) or Dad (practical and encouraging). First, notice the emotion in what I tell you—fear, stress, guilt, joy, or confusion—and name it back to me so I feel heard. Then reply in 3 parts:
1. Start by validating what I’m feeling, in a caring way.
2. Share a short story, lesson, or perspective that fits the situation.
3. End with one hopeful or guiding question that helps me think forward.
Keep your words gentle, honest, and simple. No technical language. Speak like someone who loves me and wants me to feel calm and capable again.”
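For readers who want to see how such a prompt might be wired up, here is a minimal sketch combining Streamlit with the OpenAI chat API. It is an illustrative assumption of the setup, not Mukundan's actual project code; the model name and UI labels are placeholders.

```python
# Minimal sketch (not the project's actual code): wiring the episode's
# one-prompt idea into a Streamlit page backed by the OpenAI chat API.
import streamlit as st
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

PARENT_PROMPT = (
    "You are speaking to me as one of my parents ({tone}). Name the emotion "
    "you hear, validate it, share a short fitting story or lesson, and end "
    "with one hopeful question. Keep the language gentle and simple."
)

st.title("What Would Mom & Dad Say?")
tone = st.radio("Whose voice?", ["Mom (warm and reflective)", "Dad (practical and encouraging)"])
message = st.text_area("What's on your mind?")

if st.button("Ask") and message:
    reply = client.chat.completions.create(
        model="gpt-4",  # the episode mentions GPT-4; any chat model works here
        messages=[
            {"role": "system", "content": PARENT_PROMPT.format(tone=tone)},
            {"role": "user", "content": message},
        ],
    )
    st.write(reply.choices[0].message.content)
```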

Join the Discussion (comments hub): https://mukundansankar.substack.com/notes

Tools I use for my Podcast and Affiliate Partners:
• Recording Partner: Riverside → Sign up here (affiliate)
• Host Your Podcast: RSS.com (affiliate)
• Research Tools: Sider.ai (affiliate)
• Sourcetable AI: Join Here (affiliate)

🔗 Connect with Me:
• Free Email Newsletter
• Website: Data & AI with Mukundan
• GitHub: https://github.com/mukund14
• Twitter/X: @sankarmukund475
• LinkedIn: Mukundan Sankar
• YouTube: Subscribe

Today, we’re joined by Chris McHenry, Chief Product Officer at Aviatrix, a cloud native network security company. We talk about:
• Prerequisites to driving operational efficiency with agentic AI
• Bridging the gap between security & engineering so organizations can go fast & be secure
• What’s required in order for agentic AI to create a magical moment
• With cloud powering so much of our society, the need to get security right
• The security challenges introduced by agentic AI apps, including new attack vectors

Data storytelling isn't just about presenting numbers—it's about creating shared wisdom that drives better decision-making. In our increasingly polarized world, we often miss that most people actually have reasonable views hidden behind the loudest voices. But how can technology help us cut through the noise and build genuine understanding? What if AI could help us share stories across different communities and contexts, making our collective knowledge more accessible? From reducing unnecessary meetings to enabling more effective collaboration, the way we exchange information is evolving rapidly. Are you prepared for a future where AI helps us communicate more effectively rather than replacing human judgment?

Professor Alex “Sandy” Pentland is a leading computational scientist, co-founder of the MIT Media Lab and Media Lab Asia, and a HAI Fellow at Stanford. Recognized by Forbes as one of the world’s most powerful data scientists, he played a key role in shaping the GDPR through the World Economic Forum and contributed to the UN’s Sustainable Development Goals as one of the Secretary General’s “Data Revolutionaries.” His accolades include MIT’s Toshiba Chair, election to the U.S. National Academy of Engineering, the Harvard Business Review McKinsey Award, and the DARPA 40th Anniversary of the Internet Award. Pentland has served on advisory boards for organizations such as the UN Secretary General, UN Foundation, Consumers Union, and formerly for the OECD, Google, AT&T, and Nissan. Companies originating from his lab have driven major innovations, including India’s Aadhaar digital identity system, Alibaba’s news and advertising arm, and the world’s largest rural health service network. His more recent ventures span mental health (Ginger.io), AI interaction management (Cogito), delivery optimization (Wise Systems), financial privacy (Akoya), and fairness in social services (Prosperia). A mentor to over 80 PhD students—many now leading in academia, research, or entrepreneurship—Pentland helped pioneer fields such as computational social science, wearable computing, and modern biometrics. His books include Social Physics, Honest Signals, Building the New Economy, and Trusted Data.

In the episode, Richie and Sandy explore the role of storytelling in data and AI, how technology reshapes our narratives, the impact of AI on decision-making, the importance of shared wisdom in communities, and much more.

Links Mentioned in the Show:
• MIT Media Lab
• Sandy’s Books
• deliberation.io
• Connect with Sandy
• Skill Track: Artificial Intelligence (AI) Leadership
• Related Episode: The Human Element of AI-Driven Transformation with Steve Lucas, CEO at Boomi
• Rewatch RADAR AI

New to DataCamp?
• Learn on the go using the DataCamp mobile app
• Empower your business with world-class data and AI skills with DataCamp for business

Building B2B analytics and AI tools that people will actually pay for and use is hard. The reality is, your product won’t deliver ROI if no one’s using it. That’s why first principles thinking says you have to solve the usage problem first.

In this episode, I’ll explain why the key to user adoption is designing with the flow of work—building your solution around the natural workflows of your users to minimize the behavior changes you’re asking them to make. When users clearly see the value in your product, it becomes easier to sell and removes many product-related blockers along the way.

We’ll explore how product design impacts sales, the difference between buyers and users in enterprise contexts, and why challenging the “data/AI-first” mindset is essential. I’ll also share practical ways to align features with user needs, reduce friction, and drive long-term adoption and impact.

If you’re ready to move beyond the dashboard and start building products that truly fit the way people work, this episode is for you.

Highlights/Skip to: 

• The core argument: why solving for user adoption first helps demonstrate ROI and facilitate sales in B2B analytics and AI products (1:34)
• How showing the value to actual end users—not just buyers—makes it easier to sell your product (2:33)
• Why designing for outcomes instead of outputs (dashboards, etc.) leads to better adoption and long-term product value (8:16)
• How to “see” beyond users’ surface-level feature requests and solutions so you can solve for the actual, unspoken need—leading to an indispensable product (10:23)
• Reframing feature requests as design-actionable problems (12:07)
• Solving for unspoken needs vs. customer-requested features and functions (15:51)
• Why “disruption” is the wrong approach for product development (21:19)

Quotes: 

“Customers’ tolerance for poorly designed B2B software has decreased significantly over the last decade. People now expect enterprise tools to function as smoothly and intuitively as the consumer apps they use every day. 

Clunky software that slows down workflows is no longer acceptable, regardless of the data it provides. If your product frustrates users or requires extra effort to achieve results, adoption will suffer.

Even the most powerful AI or analytics engine cannot compensate for a confusing or poorly structured interface. Enterprises now demand experiences that are seamless, efficient, and aligned with real workflows. 

This shift means that product design is no longer a secondary consideration; it is critical to commercial success.  Founders and product leaders must prioritize usability, clarity, and delight in every interaction. Software that is difficult to use increases the risk of churn, lengthens sales cycles, and diminishes perceived value. Products must anticipate user needs and deliver solutions that integrate naturally into existing workflows. 

The companies that succeed are the ones that treat user experience as a strategic differentiator. Ignoring this trend creates friction, frustration, and missed opportunities for adoption and revenue growth. Design quality is now inseparable from product value and market competitiveness.  The message is clear: if you want your product to be adopted, retain customers, and win in the market, UX must be central to your strategy.”

“No user really wants to ‘check a dashboard’ or use a feature for its own sake. Dashboards, charts, and tables are outputs, not solutions. What users care about is completing their tasks, solving their problems, and achieving meaningful results. 

Designing around workflows rather than features ensures your product is indispensable. A workflow-first approach maps your solution to the actual tasks users perform in the real world. 

When we understand the jobs users need to accomplish, we can build products that deliver real value and remove friction. Focusing solely on features or data can create bloated products that users ignore or struggle to use. 

Outputs are meaningless if they do not fit into the context of a user’s work. The key is to translate user needs into actionable workflows and design every element to support those flows. 

This approach reduces cognitive load, improves adoption, and ensures the product's ROI is realized. It also allows you to anticipate challenges and design solutions that make workflows smoother, faster, and more efficient. 

By centering design on actual tasks rather than arbitrary metrics, your product becomes a tool users can’t imagine living without. Workflow-focused design directly ties to measurable outcomes for both end users and buyers. It shifts the conversation from features to value, making adoption, satisfaction, and revenue more predictable.”

“Just because a product is built with AI or powerful data capabilities doesn’t mean anyone will adopt it. Long-term value comes from designing solutions that users cannot live without. It’s about creating experiences that take people from frustration to satisfaction to delight. 

Products must fit into users’ natural workflows and improve their performance, efficiency, and outcomes. Buyers' perceived ROI is closely tied to meaningful adoption by end users. If users struggle, churn rises, and financial impact is diminished, regardless of technical sophistication. 

Designing for delight ensures that the product becomes a positive force in the user’s daily work. It strengthens engagement, reduces friction, and builds customer loyalty. 

High-quality UX allows the product to demonstrate value automatically, without constant explanations or hand-holding. Delightful experiences encourage advocacy, referrals, and easier future sales. 

The real power of design lies in aligning technical capabilities with human behavior and workflow. 

When done correctly, this approach transforms a tool into an indispensable part of the user’s job and a demonstrable asset for the business. 

Focusing on usability, satisfaction, and delight creates long-term adoption and retention, which is the ultimate measure of product success.”

“Your product should enter the user’s work stream like a raft on a river, moving in the same direction as their workflow. Users should not have to fight the current or stop their flow to use your tool. 

Introducing friction or requiring users to change their behavior increases risk, even if the product delivers ROI. The more naturally your product aligns with existing workflows, the easier it is to adopt and the more likely it is to be retained. 

Products that feel intuitive and effortless become indispensable, reducing conversations about usability during demos. By matching the flow of work, your solution improves satisfaction, accelerates adoption, and enhances perceived value. 

Disrupting workflows without careful observation can create new problems, frustrate users, and slow down sales. The goal is to move users from frustration to satisfaction to delight, all while achieving the intended outcomes. 

Designing with the flow of work ensures that every feature, interface element, and interaction fits seamlessly into the tasks users already perform. It allows users to focus on value instead of figuring out how to use the product. 

This alignment is key to unlocking adoption, retaining customers, and building long-term loyalty. 

Products that resist the natural workflow may demonstrate ROI on paper but fail in practice due to friction and low engagement. 

Success requires designing a product that supports the user’s journey downstream without interruption or extra effort. 

When you achieve this, adoption becomes easier, sales conversations smoother, and long-term retention higher.”

AI and data analytics are transforming business, and your data career can’t afford to be left behind. 🎙️ In this episode of Data Career School, I sit down with Ketan Mudda, Director of Data Science & AI Solutions at Walmart, to explore how AI is reshaping retail, analytics, and decision-making—and what it means for students, job seekers, and early-career professionals in 2026.

We dive into:
• How AI is driving innovation and smarter decisions in retail and business
• Essential skills data professionals need to thrive in an AI-first world
• How AI tools like ChatGPT are changing the way analysts work
• What employers look for beyond technical expertise
• Strategies to future-proof your data career

Ketan also shares his journey from Credit Risk Analyst at HSBC to leading AI-driven initiatives at one of the world’s largest retailers.

Whether you’re starting your data career, exploring AI’s impact on business, or curious about analytics in action, this episode is packed with actionable insights, inspiration, and career guidance.

🎙️ Hosted by Amlan Mohanty — creator of Data Career School, where we explore AI, data analytics, and the future of work. Follow me: 📺 YouTube 🔗 LinkedIn 📸 Instagram

🎧Listen now to level up your data career!

Chapters
00:00 The Journey of Ketan Mudda
05:18 AI's Transformative Impact on Industries
12:49 Responsible AI Practices
14:28 The Role of Education in Data Science
23:18 AI and the Future of Jobs
28:03 Embracing AI Tools for Success
29:44 The Importance of Networking
31:40 Curiosity and Continuous Learning
32:50 Storytelling in Data Science Leadership
36:22 Focus on AI Ethics and Change Management
41:03 Learning How to Learn
44:57 Identifying Problems Over Tools

Summary: In this episode of the Data Engineering Podcast, Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems, only 50% trust their organization's data overall. Ariel explains why truly productionizing AI demands broader, continuously refreshed data with stronger automation and governance, and highlights the challenges posed by unstructured data and vector stores. The conversation covers the need to shift from manual reviews to automated pipelines, the resurgence of metadata and master data management, and the importance of guardrails, traceability, and agent governance. Ariel also predicts a growing convergence between data teams and application integration teams and advises leaders to focus on high-value use cases, aggressive pipeline automation, and cataloging and governing the coming sprawl of AI agents, all while using AI to accelerate data engineering itself.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Ariel Pohoryles about data management investments that organizations are making to enable them to scale AI implementations.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing the motivation and scope of your recent survey on data management investments for AI across your respondents?
What are the key takeaways that were most significant to you?
The survey reveals a fascinating paradox: 77% of leaders trust the data used by their AI systems, yet only half trust their organization's overall data quality. For our data engineering audience, what does this suggest about how companies are currently sourcing data for AI? Does it imply they are using narrow, manually-curated "golden datasets," and what are the technical challenges and risks of that approach as they try to scale?
The report highlights a heavy reliance on manual data quality processes, with one expert noting companies feel it's "not reliable to fully automate validation" for external or customer data. At the same time, maturity in "Automated tools for data integration and cleansing" is low, at only 42%. What specific technical hurdles or organizational inertia are preventing teams from adopting more automation in their data quality and integration pipelines?
There was a significant point made that with generative AI, "biases can scale much faster," making automated governance essential. From a data engineering perspective, how does the data management strategy need to evolve to support generative AI versus traditional ML models? What new types of data quality checks, lineage tracking, or monitoring for feedback loops are required when the model itself is generating new content based on its own outputs?
The report champions a "centralized data management platform" as the "connective tissue" for reliable AI. How do you see the scale and data maturity impacting the realities of that effort?
How do architectural patterns in the shape of cloud warehouses, lakehouses, data mesh, data products, etc. factor into that need for centralized/unified platforms?
A surprising finding was that a third of respondents have not fully grasped the risk of significant inaccuracies in their AI models if they fail to prioritize data management. In your experience, what are the biggest blind spots for data and analytics leaders?
Looking at the maturity charts, companies rate themselves highly on "Developing a data management strategy" (65%) but lag significantly in areas like "Automated tools for data integration and cleansing" (42%) and "Conducting bias-detection audits" (24%). If you were advising a data engineering team lead based on these findings, what would you tell them to prioritize in the next 6-12 months to bridge the gap between strategy and a truly scalable, trustworthy data foundation for AI?
The report states that 83% of companies expect to integrate more data sources for their AI in the next year. For a data engineer on the ground, what is the most important capability they need to build into their platform to handle this influx?
What are the most interesting, innovative, or unexpected ways that you have seen teams addressing the new and accelerated data needs for AI applications?
What are some of the noteworthy trends or predictions that you have for the near-term future of the impact that AI is having or will have on data teams and systems?

Contact Info
LinkedIn

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
Boomi
Data Management
Integration & Automation Demo
Agentstudio
Data Connector Agent Webinar
Survey Results
Data Governance
Shadow IT
Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

The problem of address matching arises when the address of one physical place is written in two or more different ways. This situation is very common in companies that receive customer records from different sources. The differences can be classified as syntactic or semantic. In the first type, the meaning is the same but the way the addresses are written differs: for example, "Street" vs "St". In the second type, the meaning is not exactly the same: for example, "Road" instead of "Street". To solve this problem and match addresses, we have a couple of approaches. The first, and simpler, uses similarity metrics. The second uses natural language processing and transformers. This is a hands-on talk intended for data process analysts. We will go through these solutions implemented in a Jupyter notebook using Python.
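As a flavour of the first approach, here is a small sketch using only the standard library's difflib for syntactic similarity; the talk's notebook may use different libraries, and the abbreviation table here is an illustrative assumption.

```python
# Small sketch of the similarity-metric approach to address matching,
# using only the standard library; the abbreviation table is illustrative.
from difflib import SequenceMatcher

# Cheap syntactic normalization: lowercase, strip punctuation, expand abbreviations.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize(address: str) -> str:
    tokens = [t.strip(".,").lower() for t in address.split()]
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


pairs = [
    ("12 Baker St", "12 Baker Street"),    # syntactic difference -> score 1.0
    ("12 Baker Road", "12 Baker Street"),  # semantic difference -> lower score
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

The second approach would typically embed each address with a sentence-transformer model and compare the embeddings with cosine similarity, which can catch semantic differences such as "Road" vs "Street" that plain string metrics miss.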

LLMs have a lot of hype around them these days. Let’s demystify how they work and see how we can put them in context for data science use. As data scientists, we want to make sure our results are inspectable, reliable, reproducible, and replicable. We already have many tools to help us on this front. However, LLMs pose a new challenge: we may not always get the same results back from a query. This means working out the areas where LLMs excel, and using those behaviors in our data science artifacts. This talk will introduce you to LLMs, the Chatlas package, and how they can be integrated into a Shiny app to create an AI-powered dashboard (using querychat). We’ll see how we can leverage the tasks LLMs are good at to better our data science products.
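For orientation, here is a minimal sketch of the Chatlas piece, assuming chatlas's ChatOpenAI constructor and its chat() method as currently documented (the Shiny and querychat wiring is not shown); treat the names, model, and system prompt as assumptions and check the package docs before relying on them.

```python
# Sketch under assumptions: uses chatlas's ChatOpenAI wrapper and chat() method
# as documented at the time of writing; verify against the current chatlas docs.
from chatlas import ChatOpenAI

chat = ChatOpenAI(
    model="gpt-4o",
    system_prompt="You answer questions about the palmerpenguins dataset only.",
)

# Keeping the system prompt and every turn in one object makes the exchange
# easier to inspect and re-run -- the reproducibility concern raised above.
response = chat.chat("Which species has the longest average bill length?")
print(response)
```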

AI/ML workloads depend heavily on complex software stacks, including numerical computing libraries (SciPy, NumPy), deep learning frameworks (PyTorch, TensorFlow), and specialized toolchains (CUDA, cuDNN). However, integrating these dependencies into Bazel-based workflows remains challenging due to compatibility issues, dependency resolution, and performance optimization. This session explores the process of creating and maintaining Bazel packages for key AI/ML libraries, ensuring reproducibility, performance, and ease of use for researchers and engineers.